Good way to create a similarity / distance matrix for a large dataset

If you have Mathematica 10.3 or later, you can use DistanceMatrix:

DistanceMatrix[dataset2, DistanceFunction -> NormalizedSquaredEuclideanDistance]

I'm assuming the same data as defined by kglr, since you have not given us an example. If you don't have Mathematica 10.3, there is still HierarchicalClustering`DistanceMatrix, which is used in the same way once the package has been loaded with Needs["HierarchicalClustering`"].
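For older versions, the package route might look like the following. This is a minimal sketch and assumes the legacy HierarchicalClustering` package ships with your installation; dataset2 is the same random stand-in data used elsewhere in this thread, not your real data.

```mathematica
(* Assumption: the legacy HierarchicalClustering` package is available in your version *)
Needs["HierarchicalClustering`"]

(* Stand-in data: 5 rows with 7 numeric columns each *)
dataset2 = RandomReal[1, {5, 7}];

(* The fully qualified name makes it explicit that the package function is meant *)
HierarchicalClustering`DistanceMatrix[dataset2,
  DistanceFunction -> NormalizedSquaredEuclideanDistance]
```

Writing the full context name is a precaution: on versions that also have the built-in DistanceMatrix, it avoids any ambiguity about which of the two functions is called.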


dataset2 = RandomReal[1, {5, 7}]; (* this stands for dataset[[All,2;;]] in your case*)

dataset2 // MatrixForm


output = Outer[NormalizedSquaredEuclideanDistance, dataset2, dataset2, 1];
output // MatrixForm
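If you are on 10.3 or later, you can check that the Outer approach and the built-in DistanceMatrix agree. The sketch below assumes the same random stand-in data; viaOuter and viaBuiltin are illustrative names only.

```mathematica
(* Stand-in data: 5 rows with 7 numeric columns each *)
dataset2 = RandomReal[1, {5, 7}];

(* All pairwise distances via Outer; the final 1 makes Outer pair up whole rows *)
viaOuter = Outer[NormalizedSquaredEuclideanDistance, dataset2, dataset2, 1];

(* The same matrix via the built-in function (requires version 10.3 or later) *)
viaBuiltin = DistanceMatrix[dataset2,
   DistanceFunction -> NormalizedSquaredEuclideanDistance];

(* One row and one column per data row *)
Dimensions[viaOuter]

(* Largest elementwise difference; should be zero up to rounding *)
Max[Abs[viaOuter - viaBuiltin]]
```

Note that Outer evaluates the distance for every ordered pair of rows, so each symmetric entry is computed twice. For a large dataset that is one reason to compare timings against DistanceMatrix before settling on an approach.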
