Phase 1 of my project involved writing sparse matrix algorithms. For that purpose, I have written two classes, the DOKMatrix class and the LILMatrix class. DOKMatrix is good for add, sub, mult and easy fast editing of matrices. It currently has sparse cholesky decomposition, LDL decomposition algorithms implemented, and a fast matrix multiplication algorithm. But these decompositions only work for symmetric matrices. So I was multiplying by A.T and solving the equation A.T * A * x = A.T * B. There is an overhead of two matrix multiplications here. Further, this method could only give the absolute value of the determinant of a general matrix.
Enter LIL. LIL is a list of list of elements. It has N lists which store the N rows of a matrix. A row is a list which stores only the non-zero elements of that row in the form of (j, value) tuple pair. Row operations in LILMatrix are very fast and intuitive. This enable me to write sparsified gaussian elimination and reduced row echelon form of a matrix. This completes the bare essentials of sparse matrix algorithms !
Some benchmarks:
LDL decomposition on DOKMatrix
In [5]: A = randInvDOKMatrix(10,2); A = A.T * A # creates a banded 10*10 matrix, with diagonals till 2nd diagonal full. In [6]: %timeit A._LDL_sparse() # On python floats 1000 loops, best of 3: 894 us per loop In [7]: A.applyfunc(S) In [8]: %timeit A._LDL_sparse() # On sympy Integers and Rationals. 100 loops, best of 3: 2.34 ms per loop In [9]: A.sparsity() Out[9]: 0.42 In [11]: A = randInvDOKMatrix(100, 10); A = A.T * A # For a 100 * 100 Matrix, with a band of 2*10 full diagonals In [12]: %timeit A._LDL_sparse() 1 loops, best of 3: 221 ms per loop In [13]: A.applyfunc(S) In [14]: %timeit A._LDL_sparse() 1 loops, best of 3: 712 ms per loop In [15]: A.sparsity() Out[15]: 0.3316
Reduction to Upper Triangular Matrix using Gaussian Elimination on LILMatrix
In [20]: %timeit A.gauss_col() 1000 loops, best of 3: 280 us per loop In [21]: A = randInvLILMatrix(10,2) In [22]: %timeit A.gauss_col() 1000 loops, best of 3: 259 us per loop In [23]: A.applyfunc(S) In [24]: %timeit A.gauss_col() 1000 loops, best of 3: 744 us per loop In [25]: A.sparsity() Out[25]: 0.26 In [26]: A = randInvLILMatrix(100,10) In [27]: %timeit A.gauss_col() 10 loops, best of 3: 77.2 ms per loop In [28]: A.applyfunc(S) In [29]: %timeit A.gauss_col() 1 loops, best of 3: 241 ms per loop In [30]: A.sparsity() Out[30]: 0.1691
Finding inverse of a LILMatrix using RREF
In [8]: A = randInvLILMatrix(10,2) In [9]: %timeit A.inv_rref() 1000 loops, best of 3: 1.75 ms per loop In [10]: A.applyfunc(S) In [11]: %timeit A.inv_rref() 100 loops, best of 3: 9.04 ms per loop In [12]: A.sparsity() Out[12]: 0.26 In [13]: A = randInvLILMatrix(100,10) In [14]: %timeit A.inv_rref() 1 loops, best of 3: 851 ms per loop In [15]: A.applyfunc(S) In [16]: %timeit A.inv_rref() 1 loops, best of 3: 10.6 s per loop In [17]: A.sparsity() Out[17]: 0.1704 Dense Matrix speed (for comparison) In [18]: B = A.toMatrix() In [19]: %time C = B.inv() CPU times: user 175.02 s, sys: 3.73 s, total: 178.75 s Wall time: 186.35 s
Solving Ax=B by RREFing on a augmented matrix.
In [5]: A = randLILMatrix(10,11, sparsity=0.6) In [6]: %timeit A.rref() 1000 loops, best of 3: 1.01 ms per loop In [7]: A.applyfunc(S) In [8]: %timeit A.rref() 100 loops, best of 3: 2.67 ms per loop In [9]: A = randLILMatrix(100,101, sparsity=0.6) In [10]: %timeit A.rref() 1 loops, best of 3: 999 ms per loop In [11]: A = randLILMatrix(100,101, sparsity=0.3) In [12]: %timeit A.rref() 1 loops, best of 3: 787 ms per loop In [13]: A.applyfunc(S) In [14]: %timeit A.rref() 1 loops, best of 3: 9.04 s per loop In [18]: A.sparsity() Out[18]: 0.304158415842
asmeurer
June 22, 2011 at 7:09 am
The values of A appear to be Python ints, not floats.
Was it intentional that you changed the slicing syntax? A[0][0] doesn’t work for me.
Sherjil Ozair
June 22, 2011 at 2:41 pm
Yes, they’re python ints. But I have division imported from __future__. So the ints become floats soon enough.
Yes A[0] wouldn’t mean anything right now. Should it return the first row of the matrix ? The slicing syntax currently is A[r1:r2,c1:c2].
vks
June 22, 2011 at 6:11 pm
I think A[0] should at least work when A.cols or A.rows == 1, this way you can easily use such matrices as vectors.
Sherjil Ozair
June 23, 2011 at 2:09 pm
More typechecks in the __getitem__. Performance regression, if you ask me. I’ll have to figure out something.
vks
June 23, 2011 at 9:18 pm
Yes, but it should be in the high level interface. It’s fine if the low-level interface does not have this syntactic sugar.