I’m currently on a week-long vacation with my family, so I’m not doing much work. I plan to pick up speed when I return on the 24th.


While refactoring the Matrix class to make it more modular and to remove high-level functionality, I faced some problems that I’d rather discuss with the list first.

I will write to the list to discuss what the public API of the matrix module should be. Only when that is settled can we proceed with the restructuring.

I’m proud to say that the algorithmic part of my project is done. I’m now cleaning up my commit history, writing docstrings and tests, and cleaning up the code.


```
In [1]: A = randInvLILMatrix(100,5)
In [2]: A.sparsity()
Out[2]: 0.0819
In [3]: A = randInvLILMatrix(100,10)
In [4]: A.sparsity()
Out[4]: 0.1705
In [5]: %timeit L, U, P = A.LU_sparse()
10 loops, best of 3: 138 ms per loop
In [6]: B = A.toMatrix()
In [7]: %timeit L, U, P = B.LUdecomposition()
1 loops, best of 3: 5.02 s per loop
In [8]: A = randInvLILMatrix(100,5)
In [9]: A.sparsity()
Out[9]: 0.0831
In [10]: %timeit L, U, P = A.LU_sparse()
10 loops, best of 3: 86.5 ms per loop
In [11]: B = A.toMatrix()
In [12]: %timeit L, U, P = B.LUdecomposition()
1 loops, best of 3: 4.6 s per loop
In [13]: A = randInvLILMatrix(1000,10)
In [14]: %timeit L, U, P = A.LU_sparse()
1 loops, best of 3: 8.86 s per loop
In [15]: A.sparsity()
Out[15]: 0.017837
In [16]: B = A.toMatrix()
In [17]: %time L, U, P = B.LUdecomposition()
```

The last command took more than 5 minutes to execute; I grew impatient and started writing this blog post. :)

Almost all of the algorithmic part of my project is finished, except for nullspaces. But LILMatrix’s RREF works nicely, so nullspaces should be straightforward.

As discussed on IRC, I will now start work on a Matrix_ class (a transitional name). Matrix_ will have a smart constructor which decides which of the three representations (dense, DOK, LIL) to use internally. The current Matrix, along with my DOKMatrix and LILMatrix classes, will be made low-level, that is, not directly usable by the user, and many error checks will be removed from them. Matrix_ will be the user-level class containing all the methods the user needs; these will basically perform error checks and then call the corresponding method on the internal representation. Matrix_ will also handle seamless interconversion between the representations. For example, if a matrix object has a LILMatrix representation but the user calls .cholesky, which is a method of DOKMatrix, the object will be converted to a DOKMatrix implicitly and .cholesky called.

In a few days, I will add a blog post detailing more about how I plan to write this interface.

I welcome any design ideas you might have.


1. [**New code**] Add code that I have written to SparseMatrix.

2. [**Removing code duplication**] Observe which functions can be used by both Matrix and SparseMatrix and put them in a common superclass, call it _Matrix for now. _Matrix will derive from object. Remove those functions from Matrix and SparseMatrix (if they are there).

3. [**SparseMatrix and Matrix derive from a common superclass _Matrix**] Make SparseMatrix derive from _Matrix rather than Matrix. This will probably make a lot of tests fail; fix those tests. If a test exercises a SparseMatrix object using a dense algorithm, then that test is not good and should be removed: users should use the dense matrix for dense algorithms. If a test is for functionality that the dense matrix shares, then that function could be moved into the _Matrix superclass. The _Matrix class can be thought of as a collection of functions that do not depend on the representation of the matrix; it can also define some interfaces that a subclass should implement.

4. [**Structuring**] Put the three classes in separate files, and also split the test file. Change imports accordingly.

5. [**Renaming**] Rename Matrix to DenseMatrix, _Matrix to Matrix, and SparseMatrix to DOKMatrix. This might break tests all over sympy, but the fixes are most likely trivial. Fix those tests. Because of the renaming, the Matrix(…) constructor won’t work. A small stopgap for now would be a `__new__` method which diverts construction to the DenseMatrix constructor. (This will be changed later.)

6. [**More tests**] Now that SparseMatrix is a respectable, working class in its own right, write tests for old and new functionality.

At this point, we have DOKMatrix(Matrix), DenseMatrix(Matrix), and Matrix(object) in three separate files, with tests for each in separate files. Matrix(…) works and returns a DenseMatrix object, and isinstance(A, Matrix) also works. Only after this is complete will I add another file, lilmatrix.py, containing the LILMatrix class deriving from Matrix.

7. [**LILMatrix**] Add LILMatrix code that I have written. Write tests for it.

8. [**Misc**] Write miscellaneous functions and ensure all operations can be performed on both sparse matrices, even if the corresponding algorithm has not been written for that representation.

(Note: one major example is that LILMatrix does not support an efficient matrix multiplication algorithm. If a user computes LILMatrix * LILMatrix, both matrices will be converted to DOKMatrix, multiplied, and the product returned as a DOKMatrix. Note that LILMatrix * LILMatrix -> DOKMatrix. I don’t think that is a problem: even if the user assumes the product is a LILMatrix and invokes a LILMatrix operation on it, then if DOKMatrix does not have that operation, the product will be converted back to a LILMatrix and the operation performed. In short, matrix interconversions will be implicit. Of course, the user will also be given explicit conversion functions like .to_dokmatrix.)

At this point, all three matrix representations will have been implemented, along with tests.

9. [**Domainifying**] Add the `domain` kwarg to the relevant constructors and functions. Write some more tests to check matrix functionality over some domains from polys, like QQ, FF(q), etc. Write a random matrix generator over such domains. A small experiment I did indicated that this is reasonably easy, but it surfaces some bugs. Fixing them depends on changing polys code, and for this reason might be slow.

10. [**Putting it all together**] Write the Matrix constructor, which will take into account 1. the domain specified and 2. the sparsity of the data passed to it. If the domain is not explicitly specified, I would use construct_domain to set it. The user can also explicitly state that the matrix should have no domain, i.e. domain=object. If the given data is sparse enough, it will be passed to one of the sparse matrix classes.

11+. Uncharted.
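The sparsity-based dispatch in step 10 could be sketched roughly as below. The function name, the dict-of-keys input format, and the 0.2 cutoff are all illustrative assumptions, not the planned implementation; a real constructor would also consult the `domain` kwarg.

```python
def choose_representation(data, n, m, threshold=0.2):
    """Pick a representation from the sparsity of the input.

    `data` maps (i, j) -> value for the non-zero entries of an
    n x m matrix.  The 0.2 cutoff is an arbitrary illustration,
    not a tuned value.
    """
    sparsity = len(data) / float(n * m)
    return 'dense' if sparsity > threshold else 'sparse'
```

The constructor would then hand the data to DenseMatrix or to one of the sparse classes depending on the returned tag.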

Many of these steps can be carried out in parallel, but the ones that cannot rely on the completion of the previous steps. I will need the community’s help here to review each step and tell me when, in their view, a step is ‘complete’, so that I can confidently move on to the next one. The lack of this is leaving me in a slight state of confusion about what to work on next.

Things have also been a little slow lately due to the above problem, and also due to my lack of experience with git. I have never contributed to open source before and have no expertise in handling large collections of commits. I’m learning fast, though, with Vinzent’s help.

I hope to get much of this done by the mid-term evaluation, but I think (I’m not sure, though) steps 1–10 will take some 10–15 days beyond July 15th.

Suggestions are most welcome.


I used `if j` somewhere to check that j is not None, but j could also take integer values, and j == 0 made the check evaluate to False. I’ve not done much work this week, other than figuring out how to manage a large number of commits in git.
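The bug boils down to the following. The `first_nonzero` helper is made up for the illustration; the point is only the truthiness pitfall.

```python
def first_nonzero(row):
    """Return the column index of the first non-zero entry of `row`,
    or None if there is none (a made-up helper that returns an index
    which may legitimately be 0)."""
    for j, elem in enumerate(row):
        if elem != 0:
            return j
    return None

j = first_nonzero([7, 0, 3])   # j == 0: a perfectly valid index

# Buggy check: `if j:` is False both for None and for the index 0.
buggy = bool(j)
# Correct check: compare against None explicitly.
correct = j is not None
```

With `if j:`, the valid pivot at column 0 is silently treated as "no pivot found"; `if j is not None:` distinguishes the two cases.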

Once I add all the stuff to SparseMatrix, and before adding LILMatrix, I will have to create a few abstract classes. My plan is to make Matrix the superclass, and have DenseMatrix (currently Matrix), DOKMatrix (currently SparseMatrix), and LILMatrix derive from it. The Matrix class would contain non-algorithmic and utility code that doesn’t depend on the representation of the matrix. This also lets one use isinstance(A, Matrix) to mean any of the matrices.

For backward compatibility, the Matrix(…) constructor will still work as it does now. In my view, it should go through the data once, check whether it is sparse enough, and then delegate construction to DenseMatrix or one of the sparse matrices. So all matrices that are now used will be called DenseMatrix instead of Matrix, but that is hardly a change, since isinstance(A, Matrix) would still work. We would also need this preprocessing by the Matrix constructor because of the domain of the matrix: if the user does not explicitly specify one, the Matrix constructor would run a modified construct_domain to determine which domain the elements belong to.
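For reference, this is roughly what construct_domain does today (assuming the current sympy import path, `sympy.polys.constructor`): it inspects a list of elements and returns the smallest domain containing them, together with the elements converted into that domain.

```python
from sympy import Integer, Rational
from sympy.polys.constructor import construct_domain

# All integers -> the integer ring ZZ.
dom_int, elems_int = construct_domain([Integer(1), Integer(2)])

# A rational in the mix -> the rational field QQ.
dom_rat, elems_rat = construct_domain([Rational(1, 2), Integer(3)])
```

A Matrix constructor could call this on the flattened entries and attach the resulting domain to the matrix.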

There is also Matrix.domain, which needs some work. But I think I will only be able to do that properly once the class hierarchy is set.

LILMatrix currently employs only partial pivoting in Gaussian elimination and rref. Vinzent is not too happy about that, as partial pivoting does not take the sparsity of the matrix into account, and for particular matrices the sparse matrix might fill in completely and become dense. I’m searching for pivoting strategies that preserve sparsity and can also work for symbolic matrices.

This paper [1] deals with sparse symbolic structure prediction of the LU factors of a matrix, but only under partial pivoting.

This paper [2] contains some literature on complete pivoting, i.e. using column ordering to minimize fill-in of the LU factors. I will search for a more explicit algorithm implementing Gaussian elimination with complete pivoting.

My thoughts on this: if a user wants to solve Ax = b for a non-singular matrix A, he should use the Cholesky decomposition methods of DOKMatrix. However, for RREF and nullspaces of a singular matrix, he should use LILMatrix’s Gauss methods with partial pivoting. For fast multiplication, he should use DOKMatrix. LILMatrix currently has no multiplication algorithm and probably never will, as column operations on a LILMatrix are not efficient at all; it will, however, divert multiplication to DOKMatrix.

I will write another blog in 1-2 days detailing more about my design ideas for the matrix module.

[1] gauss.cs.ucsb.edu/publication/GrigoriGilbertCosnardStructHall.pdf

[2] http://www.cs.uwaterloo.ca/research/tr/1984/CS-84-43.pdf


Due to the vast powers that the Expr/Poly classes have provided us with, large expressions are easy to form and common in calculations, resulting in expression blowup. Factorizing a matrix with simple elements like x, x+1, x**2 in it gives factor matrices with very large expressions, all of which simplify to small, simple expressions. It is evident that sympy is lacking somewhere.

What it lacks is expressions under division. The polys are smart enough to handle addition, subtraction and multiplication, but for division they don’t use any of that computational power. Thus the fundamental four operations are not complete in the true sense.

Many algorithms, especially in linear algebra, assume that the elements they operate on belong to a field, that is, that division is possible. Sympy Exprs would perform brilliantly if division were made clean.

Look here.

```
In [13]: a
Out[13]: x
In [14]: a = (a + 1)**-1
In [15]: a
Out[15]: 1/(x + 1)
In [16]: a = (a + 1)**-1
In [17]: a
Out[17]: 1/(1 + 1/(x + 1))
In [18]: a = (a + 1)**-1
In [19]: a
Out[19]: 1/(1 + 1/(1 + 1/(x + 1)))
In [20]: a = (a + 1)**-1
In [21]: a
Out[21]: 1/(1 + 1/(1 + 1/(1 + 1/(x + 1))))
In [22]: a = (a + 1)**-1
In [23]: a
Out[23]: 1/(1 + 1/(1 + 1/(1 + 1/(1 + 1/(x + 1)))))
In [24]: a.simplify()
Out[24]: (3*x + 5)/(5*x + 8)
```

The big expression in Out[23] simplified to the simplest rational function. Why wasn’t it simplified automatically?

In my view, a should be simplified automatically, starting with Out[17]. Out[17] should be

(x + 1)/(x + 2)

This would be easy with a Frac class.
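As a sanity check on the idea: the normalization the hypothetical Frac class would perform at every step is essentially what sympy’s existing cancel() does on demand. Calling it by hand after each operation keeps the result in lowest terms, so the nested towers from the session above never form.

```python
from sympy import symbols, cancel, simplify

x = symbols('x')

a = x
for _ in range(5):
    # A Frac class would normalize automatically; here cancel()
    # is called by hand after each operation to mimic that.
    a = cancel((a + 1)**-1)

# a is now the flat rational function (3*x + 5)/(5*x + 8),
# matching Out[24] of the session above.
```

A Frac class would make this canonical form the default representation, instead of something the user must request.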

The reason I’m pitching so hard for a Frac class is that when I learned that sympy polys can take cos(x) and 1/x as gens, I realized that if we implement the Frac class, almost all symbolic needs would be covered. The four fundamental operators, +, -, *, /, would be implemented in the true sense in sympy. Of course, things like sqrt will still not be supported by the Frac class, but it is common to see things like

(sqrt(x) + x)/(x**(3/2) + 1)

which can be treated as a Frac with generator x**(1/2).

Sympy will be able to operate on common expressions like

(4*sin(x) + 5*cos(x) + 4)/(11*sin(x) + 2*cos(x) + 12)

Currently, if the above expression is a, then

```
In [49]: (a + 1)**-1
Out[49]: 1/((4*sin(x) + 5*cos(x) + 4)/(11*sin(x) + 2*cos(x) + 12) + 1)
```

which is just sad.

Hence, IMHO, the Frac class would be a great step forward and a great asset for sympy.


Enter LIL. LIL is a list of lists. It has N lists storing the N rows of a matrix; each row is a list storing only the non-zero elements of that row as (j, value) pairs. Row operations on a LILMatrix are very fast and intuitive. This enabled me to write sparsified Gaussian elimination and reduced row echelon form of a matrix, which completes the bare essentials of sparse matrix algorithms!
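The storage scheme and a typical row operation can be sketched as below. The `ToyLIL` class and its method names are made up for the illustration; the real LILMatrix differs, but the row lists of sorted (j, value) pairs are the same idea.

```python
class ToyLIL:
    """Toy list-of-lists storage: rows[i] holds sorted (j, value)
    pairs for the non-zero entries of row i (a sketch, not the
    real LILMatrix)."""

    def __init__(self, n, m):
        self.n, self.m = n, m
        self.rows = [[] for _ in range(n)]

    def row_add(self, r1, r2, k):
        """row r1 <- row r1 + k * row r2, touching only non-zeros."""
        merged = {}
        for j, v in self.rows[r1]:
            merged[j] = v
        for j, v in self.rows[r2]:
            merged[j] = merged.get(j, 0) + k * v
        # drop entries cancelled to zero, keep columns sorted
        self.rows[r1] = sorted((j, v) for j, v in merged.items() if v != 0)
```

Because an elimination step only walks the stored pairs of the two rows involved, its cost depends on the number of non-zeros rather than on the full row length, which is what makes sparse Gaussian elimination fast here.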

Some benchmarks:

**LDL decomposition on DOKMatrix**

```
In [5]: A = randInvDOKMatrix(10,2); A = A.T * A  # a banded 10*10 matrix, full up to the 2nd diagonal
In [6]: %timeit A._LDL_sparse()  # on Python floats
1000 loops, best of 3: 894 us per loop
In [7]: A.applyfunc(S)
In [8]: %timeit A._LDL_sparse()  # on sympy Integers and Rationals
100 loops, best of 3: 2.34 ms per loop
In [9]: A.sparsity()
Out[9]: 0.42
In [11]: A = randInvDOKMatrix(100, 10); A = A.T * A  # a 100*100 matrix with a band of 2*10 full diagonals
In [12]: %timeit A._LDL_sparse()
1 loops, best of 3: 221 ms per loop
In [13]: A.applyfunc(S)
In [14]: %timeit A._LDL_sparse()
1 loops, best of 3: 712 ms per loop
In [15]: A.sparsity()
Out[15]: 0.3316
```

**Reduction to Upper Triangular Matrix using Gaussian Elimination on LILMatrix**

```
In [20]: %timeit A.gauss_col()
1000 loops, best of 3: 280 us per loop
In [21]: A = randInvLILMatrix(10,2)
In [22]: %timeit A.gauss_col()
1000 loops, best of 3: 259 us per loop
In [23]: A.applyfunc(S)
In [24]: %timeit A.gauss_col()
1000 loops, best of 3: 744 us per loop
In [25]: A.sparsity()
Out[25]: 0.26
In [26]: A = randInvLILMatrix(100,10)
In [27]: %timeit A.gauss_col()
10 loops, best of 3: 77.2 ms per loop
In [28]: A.applyfunc(S)
In [29]: %timeit A.gauss_col()
1 loops, best of 3: 241 ms per loop
In [30]: A.sparsity()
Out[30]: 0.1691
```

**Finding inverse of a LILMatrix using RREF**

```
In [8]: A = randInvLILMatrix(10,2)
In [9]: %timeit A.inv_rref()
1000 loops, best of 3: 1.75 ms per loop
In [10]: A.applyfunc(S)
In [11]: %timeit A.inv_rref()
100 loops, best of 3: 9.04 ms per loop
In [12]: A.sparsity()
Out[12]: 0.26
In [13]: A = randInvLILMatrix(100,10)
In [14]: %timeit A.inv_rref()
1 loops, best of 3: 851 ms per loop
In [15]: A.applyfunc(S)
In [16]: %timeit A.inv_rref()
1 loops, best of 3: 10.6 s per loop
In [17]: A.sparsity()
Out[17]: 0.1704
```

**Dense matrix speed (for comparison)**

```
In [18]: B = A.toMatrix()
In [19]: %time C = B.inv()
CPU times: user 175.02 s, sys: 3.73 s, total: 178.75 s
Wall time: 186.35 s
```

**Solving Ax = b by RREF on an augmented matrix**

```
In [5]: A = randLILMatrix(10,11, sparsity=0.6)
In [6]: %timeit A.rref()
1000 loops, best of 3: 1.01 ms per loop
In [7]: A.applyfunc(S)
In [8]: %timeit A.rref()
100 loops, best of 3: 2.67 ms per loop
In [9]: A = randLILMatrix(100,101, sparsity=0.6)
In [10]: %timeit A.rref()
1 loops, best of 3: 999 ms per loop
In [11]: A = randLILMatrix(100,101, sparsity=0.3)
In [12]: %timeit A.rref()
1 loops, best of 3: 787 ms per loop
In [13]: A.applyfunc(S)
In [14]: %timeit A.rref()
1 loops, best of 3: 9.04 s per loop
In [18]: A.sparsity()
Out[18]: 0.304158415842
```


Or so it seemed. Only the absolute value of the determinant can be computed from this factorization for a general matrix, since the method involves A.T * A.

det(A.T * A) = det(L * L.T). Since det(A.T) = det(A), we have det(A.T * A) = det(A.T) * det(A) = det(A)**2, so det(A)**2 = det(L)**2, which implies abs(det(A)) = abs(det(L)).

Thus, using Cholesky (or similarly LDL) cannot give me the sign of the determinant. Still, this method is a good way to check whether a matrix is singular, even if incomplete.
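The identity is easy to check with sympy’s dense Matrix (assuming its cholesky method, which factors a symmetric positive definite matrix as L * L.T):

```python
from sympy import Matrix, simplify

A = Matrix([[1, 2], [3, 4]])  # det(A) == -2
M = A.T * A                   # symmetric and positive definite
L = M.cholesky()              # M == L * L.T

# det(M) == det(A)**2, so det(L) only recovers |det(A)|;
# the sign of det(A) is lost.
```

Here det(A) is -2 while det(L) comes out as 2, which is exactly the sign loss described above.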

Having tried for a few days now, I feel that the DOKMatrix representation does not support Gaussian elimination intuitively. The best I can do is a pseudo-sparse Gaussian elimination that iterates even over the zero elements, but that would just be equivalent to the dense algorithm.

The DOK matrix offers O(1) access time, but sacrifices dynamic iteration. In my code for the above two decompositions, I had to generate the LIL structure of the matrix before starting the algorithm and then use it for iteration. This paradigm fails if the matrix in question changes dynamically during the algorithm, that is, if we need the LIL of matrix A while A is being edited. In the Cholesky decomposition of A, the LIL of A is used to get the LIL of L, and then the numerical Cholesky formula is applied to get L. This works because we have an algorithm that pre-determines the structure of the result matrix. This is NOT the case with Gaussian elimination.

In essence, what I need for Gaussian elimination is the sparsity structure of the matrix being computed AND the ability to edit the matrix with reasonable complexity (I know O(1) would be unrealistically ambitious).

CSR storage (compressed sparse row) more or less fits my needs. It takes only O(2*C + N) memory, consisting of two lists of size C and a list of size N. Element access is O(c), where c is the (maximum) number of non-zero elements in a row. An element can be added to the matrix with one insertion into each of the two lists, so Python’s list finds itself useful for the job: though list.insert is O(n), it is super-optimized in CPython. Moreover, the operations in Gaussian elimination are row-based, that is, complete rows are modified in a single operation, and CSR is well suited to this.
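A toy version of the layout makes the access patterns concrete. The `ToyCSR` name and methods are made up for the sketch; `rowptr` is kept with the usual N + 1 entries so that row i is always the slice `rowptr[i]:rowptr[i+1]` of the two parallel lists.

```python
class ToyCSR:
    """Toy CSR storage: `values` and `cols` each hold the C non-zero
    entries, and `rowptr` has N + 1 entries; row i occupies the slice
    rowptr[i]:rowptr[i+1] of `values`/`cols` (a sketch)."""

    def __init__(self, n, m):
        self.n, self.m = n, m
        self.values = []
        self.cols = []
        self.rowptr = [0] * (n + 1)

    def __getitem__(self, key):
        i, j = key
        # scan only the non-zeros of row i: O(c) access
        for k in range(self.rowptr[i], self.rowptr[i + 1]):
            if self.cols[k] == j:
                return self.values[k]
        return 0

    def __setitem__(self, key, value):
        i, j = key
        # insert keeping the row's columns sorted; list.insert is
        # O(n) but well optimized in CPython, as noted above
        k = self.rowptr[i]
        while k < self.rowptr[i + 1] and self.cols[k] < j:
            k += 1
        if k < self.rowptr[i + 1] and self.cols[k] == j:
            self.values[k] = value
            return
        self.values.insert(k, value)
        self.cols.insert(k, j)
        for r in range(i + 1, self.n + 1):
            self.rowptr[r] += 1
```

Row-based updates, as in Gaussian elimination, can splice a whole new row into `values`/`cols` in one go rather than inserting element by element.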

As the structure is slightly non-trivial, the algorithms will be a little complicated. But I think we’ll have to live with that, as CSR is a popular and powerful representation. Still, the algorithms will be much less complicated than the ones in BLAS/LAPACK.

Thus, after investing (or wasting) much of my time trying to implement Gaussian elimination for DOKMatrix, I conclude that it cannot be done. I would like to implement the CSR sparse structure and implement Gaussian elimination for it. It would be slow compared to DOK for element access, but still orders of magnitude faster than the dense version.


The two fastest algorithms for symbolic Cholesky decomposition in the research paper are algorithm 4.2, which is row-based, and algorithm 1.3, which is column-based. Algorithm 2.4 merely involves pre-computing the elimination tree and using it in conjunction with algorithm 4.2; this division of work increases the performance of the row-based algorithm. Algorithm 1.3, being column-based, like elimination trees themselves, does not need any pre-computation of the elimination tree. Both symbolic decomposition algorithms are O(N * c), and the elimination tree algorithm is O(N), where N is the dimension of the matrix, C the total number of non-zeros in the matrix, and c the average number of non-zeros in a row/column.

The Cholesky recurrence relations, incidentally, are also row-based: they involve dot products of row vectors of the matrix. So algorithm 4.2 was the obvious choice, as it gives me the row structure of L, which I use later in the numerical factorization to sparsify the dot products of row vectors, since I know which elements in a row are non-zero. I was able to exploit this to write the numerical factorization in O(C * c**2).
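For the elimination-tree part, the classic path-compression algorithm (the one popularized by Liu; a sketch of the standard method, not necessarily the paper’s exact formulation) computes the parent array from the lower-triangular non-zero pattern. The `rows` input format here is an assumption made for the example.

```python
def elimination_tree(rows, n):
    """Compute parent[] of the elimination tree of a symmetric matrix.

    `rows[k]` lists the column indices j < k with A[k, j] != 0.
    Uses the classic path-compression trick via `ancestor`;
    parent[k] == -1 marks a root.
    """
    parent = [-1] * n
    ancestor = [-1] * n
    for k in range(n):
        for j in rows[k]:
            # walk up from j, short-circuiting already-visited
            # paths so each edge is traversed few times
            while ancestor[j] != -1 and ancestor[j] != k:
                nxt = ancestor[j]
                ancestor[j] = k
                j = nxt
            if ancestor[j] == -1:
                ancestor[j] = k
                parent[j] = k
    return parent
```

For a tridiagonal pattern the tree is simply a chain, which is a handy sanity check.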

It factorizes a 1000 * 1000 matrix of sparsity 0.01657 in 506 ms, a 1000 * 1000 matrix of sparsity 0.004702 in 105 ms, a 100 * 100 matrix of sparsity 0.331 in 174 ms, and a 100 * 100 matrix of sparsity 0.6194 in 772 ms.

These tests were done on a banded sparse test matrix, with Python floats as elements.

I will upload more exhaustive tests when I’m finished optimizing both the decompositions.

The WIP code can be found at [2] if anyone’s interested.

The next few days will be devoted to optimizing and cleaning this code, and writing matrix functionalities which make use of the factorizations.

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.163.7506&rep=rep1&type=pdf

[2] https://github.com/sherjilozair/sympy/tree/dokmatrix


Phase 2 involves class abstraction, and making the Matrix class use ground types, as Polys does. I haven’t figured out the details of how I will do this and have only a general idea.

I’m currently using the DOK (Dictionary Of Keys) structure to write the DOKMatrix class. Python’s dict works great and is highly optimized, so it makes sense to exploit its performance.

The problem with DOK is that iterating over the non-zero elements is not supported intuitively. dict.keys() gives all the non-zero positions, and by sorting and grouping the keys we can obtain the LIL (list of lists) representation, which provides very efficient row and column iterators. And row and column iterators are much needed, in Gaussian elimination for example.

The question now is: do I maintain both representations, or do I compute the LIL when required?

The ideal way, I think, would be to have the two representations in different classes, and convert the DOK matrix to LIL for the algorithm and then back again. But the GSoC timeline leaves too little time to build both DOK and LIL properly.

There are two ways to go from here: compute the LIL each time it is needed, or use a cache. I invite discussion on what to do in this matter.
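The caching option could look roughly like this: rebuild the LIL view lazily and invalidate it on every write. The `CachedDOK` class is a made-up sketch of the idea, not real sympy code.

```python
class CachedDOK:
    """A DOK dict plus a lazily rebuilt LIL view (a sketch of the
    caching option, not real sympy code)."""

    def __init__(self, n):
        self.n = n
        self.dok = {}      # (i, j) -> value for non-zero entries
        self._lil = None   # cached LIL view; None means stale

    def __setitem__(self, key, value):
        self.dok[key] = value
        self._lil = None   # any write invalidates the cache

    @property
    def lil(self):
        # rebuild on demand: sort the keys and group them by row,
        # giving sorted (j, value) pairs per row
        if self._lil is None:
            rows = [[] for _ in range(self.n)]
            for (i, j) in sorted(self.dok):
                rows[i].append((j, self.dok[(i, j)]))
            self._lil = rows
        return self._lil
```

The trade-off is the usual one: repeated reads between writes cost nothing extra, while write-heavy algorithms degenerate to the recompute-every-time option.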

I think I will be able to write the Gaussian elimination in 10–15 days, along with all the auxiliary functions required for it.

Comments, suggestions and ideas are most welcome.
