Update the value of csr_matrix
Convenient way
The sparse matrix is as follows:
import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.array([[1, 0, 2, 0, 3],
                         [0, 4, 0, 5, 0]]))
When you need to set every element that is less than 3 to 0, the first approach that comes to mind is:
x[x < 3] = 0
But on a large sparse matrix this approach triggers the following warning:
/home/miniconda3/lib/python3.6/site-packages/scipy/sparse/compressed.py:282: SparseEfficiencyWarning: Comparing a sparse matrix with a scalar greater than zero using < is inefficient, try using >= instead.
warn(bad_scalar_msg, SparseEfficiencyWarning)
This is because most elements of a large sparse matrix are 0. When comparing with <, all of those zero elements also satisfy the condition and have to be reassigned, so this approach is extremely inefficient. That is why the warning suggests >= as the more efficient comparison.
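To make the cost concrete, here is a minimal sketch that counts how many positions the `<` comparison selects on the small example matrix. The variable names `stored_in_x` and `true_in_mask` are illustrative and not part of the original post, and the warning is silenced only to keep the output clean.

```python
import warnings

import numpy as np
from scipy.sparse import csr_matrix, SparseEfficiencyWarning

x = csr_matrix(np.array([[1, 0, 2, 0, 3],
                         [0, 4, 0, 5, 0]]))

# Silence the efficiency warning for this demonstration only.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", SparseEfficiencyWarning)
    mask = x < 3  # every implicit zero also satisfies 0 < 3

stored_in_x = x.nnz                        # entries actually stored in x
true_in_mask = int(mask.toarray().sum())   # positions selected by the mask
print(stored_in_x, true_in_mask)           # 5 7
```

The mask selects more positions than the matrix stores, and on a realistically sized sparse matrix that gap grows with the number of implicit zeros.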
Instead, we can restrict the comparison to the non-zero elements: rather than testing every element against 3, we test only the stored non-zero elements and set those that are less than 3 to 0.
nonzero_mask = np.array(x[x.nonzero()] < 3)[0]
Then get the corresponding row and column indices:
rows = x.nonzero()[0][nonzero_mask]
cols = x.nonzero()[1][nonzero_mask]
x[rows, cols] = 0
print(x.todense())
# Finally, eliminate the explicit zeros
x.eliminate_zeros() # This happens inplace
[[0 0 0 0 3]
[0 4 0 5 0]]
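The following sketch illustrates why the final eliminate_zeros() call matters: assigning 0 to existing entries keeps them in storage as explicit zeros until they are removed. It repeats the steps above with `np.asarray(...).ravel()` in place of `np.array(...)[0]` to build the mask, which is a substitution made here and not the original author's code. The names `all_rows`, `all_cols`, `small`, `stored_before`, `visible_nonzero` and `stored_after` are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.array([[1, 0, 2, 0, 3],
                         [0, 4, 0, 5, 0]]))

# Coordinates of the non-zero entries, then a mask over just those entries.
all_rows, all_cols = x.nonzero()
small = np.asarray(x[all_rows, all_cols]).ravel() < 3

# Overwrite the selected entries with 0.
x[all_rows[small], all_cols[small]] = 0

stored_before = x.nnz                # stored entries, explicit zeros included
visible_nonzero = x.count_nonzero()  # entries whose value is really non-zero
x.eliminate_zeros()                  # drops the explicit zeros in place
stored_after = x.nnz

print(stored_before, visible_nonzero, stored_after)
```

After eliminate_zeros(), the number of stored entries matches the number of truly non-zero values again.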
My implementation
During development, I often need to modify the values of a sparse matrix. Previously, I usually did this by traversing the attributes of csr_matrix directly: you walk through the rows yourself and modify the contents of data.
import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.array([[1, 0, 2, 0, 3],
                         [0, 4, 0, 5, 0]]))
print("1---")
print(x)
print("2---")
print(x.toarray())
row, col = x.shape
# Set every stored value that is less than 3 to 0
for i in range(row):
    # indptr[i]:indptr[i+1] is the slice of data that belongs to row i
    for r in range(x.indptr[i], x.indptr[i+1]):
        if x.data[r] < 3:
            x.data[r] = 0
print("3---")
print(x.toarray())
The output is as follows. If you are not familiar with the indptr and data attributes of csr_matrix, you can refer to how sparse matrices are stored.
1---
(0, 0) 1
(0, 2) 2
(0, 4) 3
(1, 1) 4
(1, 3) 5
2---
[[1 0 2 0 3]
[0 4 0 5 0]]
3---
[[0 0 0 0 3]
[0 4 0 5 0]]
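Because data holds only the stored values, the same update can also be written without a Python-level loop. This is a sketch of an alternative and not the original author's code: it applies a NumPy boolean mask to x.data and then calls eliminate_zeros() to drop the resulting explicit zeros. The names `data_before`, `indptr` and `result` are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.array([[1, 0, 2, 0, 3],
                         [0, 4, 0, 5, 0]]))

# CSR layout of the example: data holds the stored values row by row,
# and indptr[i]:indptr[i+1] is the slice of data that belongs to row i.
data_before = x.data.copy()   # [1 2 3 4 5]
indptr = x.indptr.copy()      # [0 3 5]

# Vectorized equivalent of the nested loop: touch only the stored values.
x.data[x.data < 3] = 0
x.eliminate_zeros()           # remove the explicit zeros in place

result = x.toarray()
print(result)
```

This touches only the stored entries, like the loop, but leaves the iteration to NumPy.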