Fastest way for boolean matrix computations

Franc Weser:

I have a boolean matrix with 1.5E6 rows and 20E3 columns, similar to this example:

M = [[ True,  True, False,  True, ...],
     [False,  True,  True,  True, ...],
     [False, False, False, False, ...],
     [False,  True, False, False, ...],
     ...
     [ True,  True, False, False, ...]
     ]

Also, I have another matrix N (1.5E6 rows, 1 column):

 N = [[ True],
      [False],
      [ True],
      [ True],
      ...
      [ True]
      ]

What I need to do is go through each pair of columns from matrix M (1&1, 1&2, 1&3, ..., 1&n, 2&1, 2&2, etc.), combine them with the AND operator, and count how many rows are True in both the combined result and matrix N.

My Python/Numpy code would look like this:

for i in range(M.shape[1]):
  for j in range(M.shape[1]):
    result = M[:,i] & M[:,j] # Combine the columns with AND operator
    count = np.sum(result & N.ravel()) # Count rows where the pair and N are all True
    ... # Save the count for the i and j pair

The problem is that going through 20E3 x 20E3 combinations with two for loops is computationally expensive (it takes around 5-10 days to compute). A better option I tried is comparing each column against the whole matrix M:

for i in range(M.shape[1]):
  result = M[:, i][:, None] & M # Broadcast the column against the whole matrix (np.tile or np.repeat would also work)
  counts = np.sum(result & N, axis=0) # N (a single column) broadcasts across all columns
  ... # Save the counts

This reduces the overhead and cuts the calculation time to around 10% of the original, but it still takes a day or so to compute.
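
For reference, this single-loop version can be written with plain NumPy broadcasting; a minimal, self-contained sketch on small random data (the shapes here are arbitrary, just to keep it runnable):

import numpy as np

np.random.seed(0)
M = np.random.rand(1000, 50) > 0.5  # stand-in for the 1.5E6 x 20E3 matrix
N = np.random.rand(1000, 1) > 0.5

out = np.empty((M.shape[1], M.shape[1]), np.int64)
for i in range(M.shape[1]):
  result = M[:, i][:, None] & M        # broadcast column i against every column
  out[i] = np.sum(result & N, axis=0)  # N (a single column) broadcasts across the columns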

My question is: what is the fastest way (maybe something other than Python?) to make these calculations (basically just AND and SUM)?

I was thinking about low-level languages, GPU processing, quantum computing, etc., but I don't know much about any of these, so any advice regarding the direction is appreciated!

Additional thoughts: I am currently wondering whether there is a fast way of using the dot product (as Divakar proposed) to compute triplets of combinations:

def compute(M, N):
    out = np.zeros((M.shape[1], M.shape[1], M.shape[1]), np.int32)
    for i in range(M.shape[1]):
        for j in range(M.shape[1]):
            for k in range(M.shape[1]):
                result = M[:, i] & M[:, j] & M[:, k]
                out[i, j, k] = np.sum(result & N.ravel())
    return out
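
A vectorized form of this triplet computation might be possible with np.einsum, along the same lines (a sketch I have not verified at full scale; the (n, n, n) int32 output alone would be roughly 32 TB for n = 20E3, so it would only be practical for a small subset of columns):

def compute_triplets(M, N):
    # Sums M[i, j] * M[i, k] * M[i, l] * N[i] over the rows i; casting one
    # operand to int32 keeps the accumulation numeric instead of boolean.
    return np.einsum('ij,ik,il,i->jkl',
                     M.astype(np.int32), M, M, N.ravel(),
                     optimize=True)
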
Divakar:

Simply use np.einsum to get all the counts -

np.einsum('ij,ik,i->jk',M,M.astype(int),N.ravel())

Feel free to play around with the optimize flag in np.einsum. Also, feel free to experiment with different dtype conversions.
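
For instance, both knobs can be combined like so (just an illustration of the same contraction) -

counts_int = np.einsum('ij,ik,i->jk', M, M.astype(np.int32), N.ravel(), optimize=True)
counts_f32 = np.einsum('ij,ik,i->jk', M, M.astype(np.float32), N.ravel(), optimize=True)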

To leverage the GPU, we can use the tensorflow package, which also supports einsum.
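
A minimal sketch of what that could look like with TensorFlow 2.x (the boolean arrays are cast to float32 here, since tf.einsum expects numeric tensors; not benchmarked) -

import tensorflow as tf

M_tf = tf.cast(tf.convert_to_tensor(M), tf.float32)
N_tf = tf.cast(tf.convert_to_tensor(N.ravel()), tf.float32)

# tf.einsum runs on the GPU automatically when one is visible to TensorFlow
counts = tf.einsum('ij,ik,i->jk', M_tf, M_tf, N_tf)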

Faster alternatives with np.dot -

(M&N).T.dot(M.astype(int))
(M&N).T.dot(M.astype(np.float32))
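
As a quick sanity check, the dot-product form can be compared against the double loop from the question on small random data (a minimal sketch) -

np.random.seed(0)
M = np.random.rand(200, 30) > 0.5
N = np.random.rand(200, 1) > 0.5

ref = np.array([[np.sum(M[:, i] & M[:, j] & N.ravel())
                 for j in range(M.shape[1])]
                for i in range(M.shape[1])])
out = (M & N).T.dot(M.astype(np.int32))
assert np.array_equal(ref, out)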

Timings -

In [110]: np.random.seed(0)
     ...: M = np.random.rand(500,300)>0.5
     ...: N = np.random.rand(500,1)>0.5

In [111]: %timeit np.einsum('ij,ik,i->jk',M,M.astype(int),N.ravel())
     ...: %timeit (M&N).T.dot(M.astype(int))
     ...: %timeit (M&N).T.dot(M.astype(np.float32))
227 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
66.8 ms ± 198 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
3.26 ms ± 753 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

And take it a bit further with float32 conversions for both of the boolean arrays -

In [122]: %%timeit
     ...: p1 = (M&N).astype(np.float32)
     ...: p2 = M.astype(np.float32)
     ...: out = p1.T.dot(p2)
2.7 ms ± 34.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
