How to retain N smallest elements in a given row of numpy array?

Ashwin Geet D'Sa :

Given a 2-D numpy matrix, how can I retain the N smallest elements in each row and change the rest of them to 0 (zero)?

For example, with N=3 and the input array:

1   2   3   4   5
4   3   6   1   0
6   5   3   1   2

Expected output:

1   2   3   0   0
0   3   0   1   0
0   0   3   1   2

Following is the code that I have tried, and it works:

import numpy as np

# distance_matrix is the given 2D array
N = 3
for i in range(distance_matrix.shape[0]):
    # (N+1)-th smallest value of row i (index N of the sorted row)
    threshold = np.sort(distance_matrix[i])[N]
    for j in range(distance_matrix.shape[1]):
        # keep elements below the threshold, zero out the rest
        if distance_matrix[i, j] >= threshold:
            distance_matrix[i, j] = 0

# return distance_matrix

However, this operation involves iterating over every single element. Is there a faster way to solve this using np.argsort() or any other function?
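For reference, the per-row threshold logic of the loop above can itself be vectorized with np.sort along axis=1, with no Python loops at all. This is a minimal sketch assuming a numeric 2-D array and N smaller than the number of columns; the function name keep_below_nth is hypothetical and it returns a copy instead of modifying the input:

```python
import numpy as np

def keep_below_nth(a, n):
    """Return a copy of `a` where, in each row, only elements strictly
    smaller than that row's (n+1)-th smallest value are kept; the rest
    become 0. Mirrors the loop above, including its behaviour on ties."""
    # column vector holding the (n+1)-th smallest value of every row
    thresh = np.sort(a, axis=1)[:, n:n + 1]
    # the comparison broadcasts the per-row threshold across its row
    return np.where(a < thresh, a, 0)

distance_matrix = np.array([[1, 2, 3, 4, 5],
                            [4, 3, 6, 1, 0],
                            [6, 5, 3, 1, 2]])
print(keep_below_nth(distance_matrix, 3))
```

Note that np.sort does a full sort of every row; np.partition and np.argpartition only need to put one element in its sorted position, which is what makes them cheaper for this task.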

Divakar :

Approach #1

Here's one using np.argpartition for performance -

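import numpy as np

N = 3        # number of smallest elements to keep in each row
newval = 0   # value assigned to every other element
# a is the input 2-D array; it is modified in place
np.put_along_axis(a,np.argpartition(a,N,axis=1)[:,N:],newval,axis=1)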

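Explanation : np.argpartition(a, N, axis=1) returns, for each row, indices arranged so that the element at position N is the one that would be there if the row were sorted, every index before it points to a value no larger than it, and every index after it points to a value no smaller. So, basically, consider this as two partitions: the first holds the N smallest elements along that axis and the second holds the rest. We need to reset the second partition, which we select with [:,N:], and we use np.put_along_axis to do the resetting. Ties are broken arbitrarily, so exactly N positions per row keep their original values. (N must be smaller than the number of columns, since kth has to be a valid index.)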
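To see what the two partitions look like, here's a small sketch that inspects the output of np.argpartition on the sample array and then builds the same result without modifying the input, using np.take_along_axis and a boolean mask. The names idx, part, mask and result are just illustrative:

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5],
              [4, 3, 6, 1, 0],
              [6, 5, 3, 1, 2]])
N = 3

idx = np.argpartition(a, N, axis=1)        # per-row indices, partitioned at position N
part = np.take_along_axis(a, idx, axis=1)  # row values in that partitioned order

# Columns :N hold the N smallest values of each row, columns N: hold the rest.
# The order inside each partition is unspecified, so compare max against min.
print(part[:, :N].max(axis=1) <= part[:, N:].min(axis=1))  # [ True  True  True]

# Non-mutating variant: mark the first partition in a mask, zero everything else.
mask = np.zeros(a.shape, dtype=bool)
np.put_along_axis(mask, idx[:, :N], True, axis=1)
result = np.where(mask, a, 0)
print(result)
```

The mask-based variant costs an extra boolean array, but leaves a untouched, which can be handy when the original distances are still needed.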

Sample run -

In [144]: a # input array
Out[144]: 
array([[1, 2, 3, 4, 5],
       [4, 3, 6, 1, 0],
       [6, 5, 3, 1, 2]])

In [145]: np.put_along_axis(a,np.argpartition(a,3,axis=1)[:,3:],0,axis=1)

In [146]: a
Out[146]: 
array([[1, 2, 3, 0, 0],
       [0, 3, 0, 1, 0],
       [0, 0, 3, 1, 2]])

Approach #2

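Here's another, again with np.argpartition, but this time just slicing out the (N+1)-th smallest element per row (position N of the partition) and then resetting every element greater than or equal to it. As a consequence, if the N-th and (N+1)-th smallest values of a row are equal, every copy of that value is reset, so such a row keeps fewer than N elements (this matches the behaviour of the loop in the question). Here's the implementation -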

# reset every element >= the (N+1)-th smallest value of its row
a[a>=a[np.arange(len(a)), np.argpartition(a,N,axis=1)[:,N],None]] = 0
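The two approaches only differ when a row has ties at the boundary. A quick check on a hypothetical one-row array with repeated values, assuming the same N = 3:

```python
import numpy as np

N = 3
row = np.array([[1, 1, 1, 1, 5]])  # four copies of the smallest value

# Approach #1: reset the second partition, so exactly N entries survive
a1 = row.copy()
np.put_along_axis(a1, np.argpartition(a1, N, axis=1)[:, N:], 0, axis=1)
print(np.count_nonzero(a1))  # 3 (which of the 1s gets zeroed is unspecified)

# Approach #2: reset everything >= the (N+1)-th smallest value, so all ties go
a2 = row.copy()
a2[a2 >= a2[np.arange(len(a2)), np.argpartition(a2, N, axis=1)[:, N], None]] = 0
print(a2)  # [[0 0 0 0 0]]
```

So Approach #1 is the one to use when exactly N survivors per row are required; Approach #2 reproduces the thresholding semantics of the original loop.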

Timings on a scaled-up version (the sample array repeated 10000 times along axis 0) -

In [184]: a = np.array([[1,2,3,4,5],[4,3,6,1,0],[6,5,3,1,2]])

In [185]: a = np.repeat(a,10000,axis=0)

In [186]: %timeit np.put_along_axis(a,np.argpartition(a,3,axis=1)[:,3:],0,axis=1)
1.78 ms ± 5.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [187]: a = np.array([[1,2,3,4,5],[4,3,6,1,0],[6,5,3,1,2]])

In [188]: a = np.repeat(a,10000,axis=0)

In [189]: %timeit a[a>=a[np.arange(len(a)), np.argpartition(a,3,axis=1)[:,3],None]] = 0
1.54 ms ± 54.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

