Sparse matrix implementation in Python

In engineering practice, most large matrices are sparse, so knowing how to handle sparse matrices matters a great deal. Using the Python implementation as an example, this article first discusses how sparse matrices are stored and represented.

1. A preliminary study of the sparse module
The scipy package in Python contains a module called sparse, which is designed specifically for working with sparse matrices. Most of this article is based on the sparse module.
The first step is naturally to import the sparse module:

from scipy import sparse

2.coo_matrix
coo_matrix is the simplest storage method. Three arrays, row, col and data, store the information about the non-zero elements.
The three arrays have the same length: row holds the row of each element, col holds its column, and data holds its value. Generally speaking, coo_matrix is mainly used to create matrices, because it does not support adding, deleting or modifying individual elements. Once the matrix has been created, it is usually converted to another storage format.

>>> row = [2,2,3,2]
>>> col = [3,4,2,3]
>>> data = [1,2,3,4]  # position (2,3) appears twice, so 1 and 4 are summed
>>> c = sparse.coo_matrix((data,(row,col)),shape=(5,6))
>>> print(c.toarray())
[[0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 5 2 0]
 [0 0 3 0 0 0]
 [0 0 0 0 0 0]]
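
A coo_matrix can also be built from a dense array. Printing the matrix itself, rather than its dense form, lists the stored (row, col) positions and their values:

from scipy.sparse import coo_matrix
A = coo_matrix([[1, 2], [2, 4]])
print(A)

result (the exact formatting depends on the SciPy version):

  (0, 0)    1
  (0, 1)    2
  (1, 0)    2
  (1, 1)    4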
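
The remark that a coo_matrix is normally converted after creation can be sketched as follows. This is a minimal sketch, not the only way to do it: the data values are illustrative assumptions, and CSR is just one possible target format. It also shows that repeated (row, col) positions are kept separately in COO and summed on conversion.

```python
import numpy as np
from scipy import sparse

# Two entries share position (2, 3); COO stores both of them as given.
row = np.array([2, 2, 3, 2])
col = np.array([3, 4, 2, 3])
data = np.array([1, 2, 3, 4])  # illustrative values (assumption)

c = sparse.coo_matrix((data, (row, col)), shape=(5, 6))
stored = c.nnz  # number of stored entries, duplicates included

# Converting to CSR sums the duplicates and yields a format that
# supports element access and fast arithmetic.
r = c.tocsr()
value = r[2, 3]
```

Here stored counts four entries, while the converted matrix holds three, with the two values at (2, 3) added together.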

3.dok_matrix and lil_matrix
dok_matrix and lil_matrix are suited to building a matrix by adding elements one at a time. The strategy of dok_matrix is to use a dictionary to record the non-zero elements of the matrix: each key is a (row, column) tuple giving the position of an element, and the value is the value of that element.

>>> import numpy as np
>>> from scipy.sparse import dok_matrix
>>> S = dok_matrix((5, 5), dtype=np.float32)
>>> for i in range(5):
...     for j in range(5):
...         S[i, j] = i + j
...
>>> print(S.toarray())
[[ 0.  1.  2.  3.  4.]
 [ 1.  2.  3.  4.  5.]
 [ 2.  3.  4.  5.  6.]
 [ 3.  4.  5.  6.  7.]
 [ 4.  5.  6.  7.  8.]]
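
To make the dictionary idea concrete, here is a small sketch that starts from a plain Python dict keyed by (row, column) and copies it into a dok_matrix. The dict `entries` and its values are hypothetical, chosen only for illustration; the point is that only the listed positions are stored.

```python
import numpy as np
from scipy.sparse import dok_matrix

# Hypothetical non-zero entries: (row, col) -> value
entries = {(0, 1): 2.0, (2, 3): 5.0, (4, 4): 1.0}

S = dok_matrix((5, 5), dtype=np.float64)
for (i, j), v in entries.items():
    S[i, j] = v  # element-wise assignment is cheap in DOK format

dense = S.toarray()  # dense view, for inspection only
csr = S.tocsr()      # convert once construction is finished
```

Only three elements are stored, however large the matrix shape is.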

lil_matrix stores the non-zero elements in two lists of lists: data holds the non-zero values of each row, and rows holds the column indices of those values. This format is also well suited to adding elements one by one, and it gives quick access to the data of a single row.

>>> from scipy.sparse import lil_matrix
>>> l = lil_matrix((6,5))
>>> l[2,3] = 1
>>> l[3,4] = 2
>>> l[3,2] = 3
>>> print(l.toarray())
[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.]
 [ 0.  0.  3.  0.  2.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]
>>> print(l.data)
[[] [] [1.0] [3.0, 2.0] [] []]
>>> print(l.rows)
[[] [] [3] [2, 4] [] []]
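
The per-row layout also makes it easy to read a single row back, and a finished lil_matrix can be converted before any arithmetic is done. The sketch below reuses the matrix from the example above; the choice of CSR as the target and the vector of ones are assumptions made for illustration.

```python
import numpy as np
from scipy.sparse import lil_matrix

l = lil_matrix((6, 5))
l[2, 3] = 1
l[3, 4] = 2
l[3, 2] = 3

# Row 3 as two parallel lists: column indices and the values stored there.
cols_in_row3 = list(l.rows[3])
vals_in_row3 = list(l.data[3])

# Convert to CSR before arithmetic such as a matrix-vector product.
csr = l.tocsr()
y = csr @ np.ones(5)  # each entry of y is the sum of one row
```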

As the analysis above shows, these two formats are generally used to build a matrix by gradually adding non-zero elements; the result is then converted to another storage format that supports fast computation.

4.dia_matrix
This is a diagonal storage method. In the stored array, each column holds one diagonal and each row corresponds to a row of the original matrix. A diagonal whose elements are all 0 is omitted.
If the original matrix is close to a diagonal matrix, the compression rate is very high.
The picture below, found on the Internet, makes the principle easy to understand.
[Figure: dia_matrix storage layout]
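
In scipy the same idea looks as follows. This is a minimal sketch with illustrative values: note that scipy's dia_matrix lays out the stored array with one diagonal per row of data, and a separate offsets array says which diagonal each row belongs to (0 is the main diagonal, negative values lie below it, positive values above it).

```python
import numpy as np
from scipy.sparse import dia_matrix

# Three stored diagonals, each given as one row of `data`.
data = np.array([[1, 2, 3, 4]]).repeat(3, axis=0)
offsets = np.array([0, -1, 2])  # main, first below, second above

d = dia_matrix((data, offsets), shape=(4, 4))
dense = d.toarray()
# dense is:
# [[1 0 3 0]
#  [1 2 0 4]
#  [0 2 3 0]
#  [0 0 3 4]]
```

Entries of a stored diagonal that would fall outside the matrix are simply ignored, which is why the shorter off-diagonals use only part of their row in data.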

5.csr_matrix and csc_matrix
csr_matrix, short for Compressed Sparse Row, compresses the matrix by row. CSR needs three arrays: the values, the column indices, and the row offsets. The values and column indices mean the same as in coo. The row offsets give, for each row, the position in the values array of that row's first element, followed by one final entry equal to the total number of stored elements. csc_matrix (Compressed Sparse Column) is the same scheme applied by column instead of by row.
The picture below, also found on the Internet, illustrates the principle.
[Figure: CSR storage layout]
Here is how to use it in Python:

>>> from scipy.sparse import csr_matrix
>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])
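
To see what the row offsets do, the sketch below decodes the same three arrays by hand: row i owns the slice indptr[i]:indptr[i + 1] of indices and data. It then converts the matrix to CSC to show the column-wise counterpart. The variable names rows, csc and col2 are introduced here only for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
m = csr_matrix((data, indices, indptr), shape=(3, 3))

# Decode each row into (column, value) pairs using the row offsets.
rows = []
for i in range(3):
    start, end = indptr[i], indptr[i + 1]
    rows.append([(int(j), int(v))
                 for j, v in zip(indices[start:end], data[start:end])])

# CSC stores the same matrix compressed by column, so column slices are cheap.
csc = m.tocsc()
col2 = csc[:, [2]].toarray().ravel()
```

CSR is therefore the natural choice when rows are accessed, and CSC when columns are.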

Reprinted: https://blog.csdn.net/bitcarmanlee/article/details/52668477
