Python Numpy vectorize nested for-loops for combinatorics

This solution is 5x faster for n=100:

import itertools
import numpy as np

# n and A as defined in the question
coms = np.fromiter(itertools.combinations(np.arange(n), 3), 'i,i,i').view(('i', 3))
best = A[coms].min(1).max(1)  # min over each triple of rows, then max over columns
at = best.argmin()
global_best = best[at]
save_rows = coms[at]

The first line is a bit convoluted but turns the result of itertools.combinations into a NumPy array which contains all possible [i,j,k] index combinations.

From there, it's a simple matter of indexing into A using all the possible index combinations, then reducing along the appropriate axes.
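To see what the shapes look like, here is a minimal sketch with a small n (the printed shapes assume n = 4):

import itertools
import numpy as np

n = 4
A = np.random.rand(n, n)

coms = np.fromiter(itertools.combinations(np.arange(n), 3), 'i,i,i').view(('i', 3))
print(coms.shape)                    # (4, 3): C(4, 3) = 4 index triples
print(A[coms].shape)                 # (4, 3, 4): the 3 gathered rows per triple
print(A[coms].min(1).shape)          # (4, 4): element-wise min over each triple
print(A[coms].min(1).max(1).shape)   # (4,): one score per triple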

This solution consumes a lot more memory because it materializes the full array of all possible combinations, A[coms]. It saves time for smallish n, say under 250, but for large n the memory traffic becomes very high and it may be slower than the original code.
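
As a rough back-of-the-envelope estimate of that footprint (assuming A is float64, 8 bytes per element):

n = 100
n_comb = n * (n - 1) * (n - 2) // 6    # C(100, 3) = 161700 index triples
print(n_comb * 3 * n * 8 / 1e9)        # ~0.39 GB for A[coms] alone
# the element count grows roughly as n**4 / 2, so n = 250 already needs ~15 GB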


Working in chunks combines the speed of vectorized calculations with a bounded memory footprint, avoiding MemoryErrors. Below is an example of converting the nested loops to chunked vectorization.

Starting from the same variables as in the question, a chunk length is defined, so that calculations are vectorized inside each chunk and the loop runs only over chunks instead of over combinations.

chunk = 2000 # define chunk length; if too small, the code won't take advantage 
             # of vectorization, if too large, excessive memory usage will 
             # slow down execution, or a MemoryError will be raised 
combinations = itertools.combinations(range(n),3) # generate iterator containing 
                                        # all possible combinations of 3 rows
N = n*(n-1)*(n-2)//6 # number of combinations (the length cannot be retrieved 
                     # from combinations because it is an iterator)
# generate a list containing how many elements of combinations will be retrieved 
# per iteration
n_chunks, remainder = divmod(N,chunk)
counts_list = [chunk] * n_chunks
if remainder:
    counts_list.append(remainder)

# Iterate one chunk at a time, using vectorized code to treat the chunk
for counts in counts_list:
    # retrieve combinations in current chunk
    current_comb = np.fromiter(combinations,dtype='i,i,i',count=counts)\
                     .view(('i',3)) 
    # maximum of element-wise minimum in current chunk
    chunk_best = np.minimum(np.minimum(A[current_comb[:,0],:],A[current_comb[:,1],:]),
                            A[current_comb[:,2],:]).max(axis=1) 
    ravel_save_row = chunk_best.argmin() # index of the minimum of maximums in current chunk
    # check if current chunk contains global minimum
    if chunk_best[ravel_save_row] < global_best: 
        global_best = chunk_best[ravel_save_row]
        save_rows = current_comb[ravel_save_row]
print(global_best, save_rows)
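
A rule of thumb for picking chunk (a rough estimate, again assuming float64): the per-chunk working set is about chunk * 3 * n * 8 bytes, which stays modest even for large n:

chunk, n = 2000, 500
print(chunk * 3 * n * 8 / 1e6)   # ~24 MB per chunk at n = 500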

I ran some performance comparisons against the nested loops, obtaining the following results (chunk = 1000):

  • n=100
    • Nested loops: 1.13 s ± 16.6 ms
    • Work by chunks: 108 ms ± 565 µs
  • n=150
    • Nested loops: 4.16 s ± 39.3 ms
    • Work by chunks: 523 ms ± 4.75 ms
  • n=500
    • Nested loops: 3min 18s ± 3.21 s
    • Work by chunks: 1min 12s ± 1.6 s

Note

After profiling the code, I found that np.min took the longest, through its call to np.minimum.reduce. Converting it directly to chained np.minimum calls improved performance a bit.
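
For reference, the two equivalent formulations look like this (a minimal sketch, where a, b, c stand for three rows of A):

import numpy as np

a, b, c = np.random.rand(3, 1000)

slow = np.array([a, b, c]).min(axis=0)   # np.min dispatches to np.minimum.reduce
fast = np.minimum(np.minimum(a, b), c)   # chained ufunc calls, no stacked copy

assert np.array_equal(slow, fast)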


You can use combinations from itertools, which is part of the Python standard library, and it will help you remove all those nested loops.

from itertools import combinations
import numpy as np

n = 100
np.random.seed(2)
A = np.random.rand(n,n)
global_best = 1000000000000000.0

for i, j, k in combinations(range(n), 3):
    # element-wise minimum of the three rows, then its maximum over columns
    local_best = np.amax(np.array([A[i,:], A[j,:], A[k,:]]).min(0))
    if local_best < global_best:
        global_best = local_best
        save_rows = [i, j, k]

print(global_best, save_rows)