Compute the pairwise distance in scipy with missing values

If I understand you correctly, you want the distance computed over only those dimensions for which both vectors have valid values.

Unfortunately, pdist doesn't understand masked arrays in that sense, so I modified your semi-solution so that it doesn't discard information. It is, however, neither the most efficient nor the most readable solution:

# assumes: import itertools; import numpy as np; from scipy.spatial.distance import pdist
np.array([pdist(data[s][:, ~np.isnan(data[s]).any(axis=0)], "euclidean") for s in map(list, itertools.combinations(range(data.shape[0]), 2))]).ravel()

The outer np.array(...) and .ravel() are only there to flatten the result into the one-dimensional shape you would expect from pdist.

itertools.combinations produces all possible pairs of row indices of the data array.
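To make the pairing concrete, here is a small sketch of what itertools.combinations yields. The row count `n_rows` is a made-up value for illustration only; in the one-liner it is data.shape[0].

```python
import itertools

n_rows = 4  # hypothetical number of rows, stands in for data.shape[0]

# Every unordered pair of row indices, in the same order pdist uses
# for its condensed output: (0,1), (0,2), ..., (n-2, n-1).
pairs = list(itertools.combinations(range(n_rows), 2))
print(pairs)
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

For n rows this gives n*(n-1)/2 pairs, which is also the length of the array pdist returns.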

I then index data with each pair (it must be a list and not a tuple, since a tuple would be read as one index per axis rather than as a selection of two rows) and filter out the NaN columns for that pair, just as your code did.
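If readability matters more than brevity, the same idea can be unrolled into a loop. This is only a sketch of the approach described above, not a tuned implementation; the function name `pdist_ignore_nan` and the sample `data` are my own inventions for illustration.

```python
import itertools

import numpy as np
from scipy.spatial.distance import pdist, squareform


def pdist_ignore_nan(data):
    """Euclidean distance per row pair, using only columns valid in both rows.

    Hypothetical helper: same logic as the one-liner, written out step by step.
    """
    data = np.asarray(data, dtype=float)
    distances = []
    for i, j in itertools.combinations(range(data.shape[0]), 2):
        pair = data[[i, j]]                  # list index -> 2 x n_columns array
        valid = ~np.isnan(pair).any(axis=0)  # columns where neither row is NaN
        # pdist on two rows returns a length-1 array; take its only element.
        distances.append(pdist(pair[:, valid], "euclidean")[0])
    return np.array(distances)


# Made-up sample data with one missing value in rows 0 and 2.
data = np.array([
    [0.0, 0.0, np.nan],
    [3.0, 4.0, 1.0],
    [np.nan, 0.0, 1.0],
])

condensed = pdist_ignore_nan(data)  # distances 5, 0 and 4 for pairs (0,1), (0,2), (1,2)
square = squareform(condensed)      # optional: expand to a full 3 x 3 matrix
```

Note that each distance is computed over a different number of dimensions, so the values are not directly comparable unless you normalise them yourself. A pair with no shared valid columns is not handled specially here.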