Modify the value of csr_matrix

Update the value of csr_matrix

Convenient way

The sparse matrix is as follows:

import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.array([[1, 0, 2, 0, 3], 
                         [0, 4, 0, 5, 0]]))

When you need to set every element that is less than 3 to 0, the obvious way is:

x[x < 3] = 0

But on a large sparse matrix this approach raises the following warning:

/home/miniconda3/lib/python3.6/site-packages/scipy/sparse/compressed.py:282: SparseEfficiencyWarning: Comparing a sparse matrix with a scalar greater than zero using < is inefficient, try using >= instead.
  warn(bad_scalar_msg, SparseEfficiencyWarning)

In a large sparse matrix, most of the elements are 0. Every one of those zeros also satisfies < 3, so the comparison has to produce a True entry for almost every position, and the assignment then has to touch all of them. That makes this approach very inefficient. The >= comparison mentioned in the warning is more efficient because it can only be true at stored nonzero entries.
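To make the difference concrete, here is a small sketch (not part of the original post) that counts how many True entries each comparison produces on the example matrix. The variable names ge_mask and lt_mask are made up for illustration, and the warning is silenced only to keep the output clean.

```python
import warnings

import numpy as np
from scipy.sparse import SparseEfficiencyWarning, csr_matrix

x = csr_matrix(np.array([[1, 0, 2, 0, 3],
                         [0, 4, 0, 5, 0]]))

# x >= 3 can only be True at stored (nonzero) entries, so the mask stays sparse.
ge_mask = x >= 3

# x < 3 is also True at every implicit zero, so the mask is nearly dense.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", SparseEfficiencyWarning)
    lt_mask = x < 3

n_ge = ge_mask.count_nonzero()  # number of True entries in x >= 3
n_lt = lt_mask.count_nonzero()  # number of True entries in x < 3
print(n_ge, n_lt)  # 3 7
```

On this 2 x 5 example the gap is small, but on a matrix with millions of implicit zeros the < mask would hold an entry for almost every position.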

So instead of testing every element against 3, we can test only the nonzero elements and set those that are less than 3 to 0. First build a mask over the nonzero elements:

nonzero_mask = np.array(x[x.nonzero()] < 3)[0]

Then you can get the corresponding row and column indices and set those entries to 0:

rows = x.nonzero()[0][nonzero_mask]
cols = x.nonzero()[1][nonzero_mask]
x[rows, cols] = 0

# Finally, remove the explicitly stored zeros (this happens in place)
x.eliminate_zeros()
print(x.todense())

Output:

[[0 0 0 0 3]
 [0 4 0 5 0]]
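The steps above call x.nonzero() three times. As a minimal sketch, they could be wrapped in a helper that calls it once. The function name zero_small_entries and its threshold parameter are hypothetical, introduced here only for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix


def zero_small_entries(x, threshold):
    """Set nonzero entries of x below threshold to 0, in place (hypothetical helper)."""
    rows, cols = x.nonzero()  # coordinates of the nonzero entries, computed once
    # Flatten to 1-D so the mask can index the coordinate arrays directly
    values = np.asarray(x[rows, cols]).ravel()
    mask = values < threshold
    x[rows[mask], cols[mask]] = 0  # only touches entries that are already stored
    x.eliminate_zeros()            # drop the explicit zeros from storage
    return x


x = csr_matrix(np.array([[1, 0, 2, 0, 3],
                         [0, 4, 0, 5, 0]]))
zero_small_entries(x, 3)
print(x.toarray())
```

Because the assignment only overwrites entries that already exist, the sparsity structure does not grow before eliminate_zeros() shrinks it.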

My implementation

During development, modifying the values of a sparse matrix comes up often. Previously I did this by traversing the attributes of csr_matrix directly: you walk over the stored entries yourself and modify the contents of data.

    import numpy as np
    from scipy.sparse import csr_matrix

    x = csr_matrix(np.array([[1, 0, 2, 0, 3], 
                            [0, 4, 0, 5, 0]]))
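    print("1--- \n", x)
    print("2--- \n", x.toarray())
    row, col = x.shape
    # Set every stored value that is less than 3 to 0
    for i in range(row):
        # x.data[x.indptr[i]:x.indptr[i+1]] holds the stored values of row i
        for r in range(x.indptr[i], x.indptr[i+1]):
            if x.data[r] < 3:
                x.data[r] = 0
    print("3--- \n", x.toarray())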

The output is as follows. If the indptr and data attributes of csr_matrix are unfamiliar, refer to a description of how sparse matrices are stored.

1--- 
   (0, 0)       1
  (0, 2)        2
  (0, 4)        3
  (1, 1)        4
  (1, 3)        5
2--- 
 [[1 0 2 0 3]
 [0 4 0 5 0]]
3--- 
 [[0 0 0 0 3]
 [0 4 0 5 0]]
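For readers unfamiliar with the CSR layout, the sketch below (an addition, not from the original post) prints the three underlying arrays for the example matrix. It then shows one possible vectorized alternative to the explicit loop: since every stored value lives in x.data, a boolean mask on that array has the same effect. The names data_before, indices_before and indptr_before are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.array([[1, 0, 2, 0, 3],
                         [0, 4, 0, 5, 0]]))

data_before = x.data.copy()        # stored values, row by row
indices_before = x.indices.copy()  # column index of each stored value
indptr_before = x.indptr.copy()    # row i occupies data[indptr[i]:indptr[i + 1]]
print(data_before)     # [1 2 3 4 5]
print(indices_before)  # [0 2 4 1 3]
print(indptr_before)   # [0 3 5]

# Vectorized version of the loop: mask the stored values directly
x.data[x.data < 3] = 0
x.eliminate_zeros()  # remove the explicit zeros left behind in storage
print(x.toarray())
```

Note that the loop version leaves the zeroed entries stored explicitly. Calling eliminate_zeros() afterwards, as done here, removes them.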


Origin blog.csdn.net/qq_32507417/article/details/111639904