Checking for duplicates in sublists
There are a lot of questions on this site already about deleting duplicates and filtering lists. You'll probably learn a lot if you take the time to search for and read some of them. From several you could learn that the second argument of DeleteDuplicates
is rarely a highly efficient way to approach a problem such as this and it is far better to use GatherBy
where possible.
Compare the performance:
ss = Subsets[Range@16, {8}];
DeleteDuplicates[ss, Times @@ Take[#1, 4] == Times @@ Take[#2, 4] &] // Timing // First
First /@ GatherBy[ss, Times @@ Take[#, 4] &] // Timing // First
1.919
0.
The second method is so fast we must run it multiple times slow it down:
Do[First /@ GatherBy[ss, Times @@ Take[#, 4] &], {400}] // Timing // First
1.919
So we see that GatherBy
is about 400 times faster on this size of problem. On larger sets it will be comparatively even faster as it has superior computational complexity.
For fun, as an additional optimization we can compute the product in a vectorized form (with tuning thanks to Michael E2) and extract our indices using Szabolcs's fine method from the second link below. This is roughly 850 times faster than DeleteDuplicates
:
Do[
With[{prod = Times @@ Take[ss\[Transpose], 4]},
ss[[GatherBy[Range@Length@prod, prod[[#]] &][[All, 1]]]]
],
{850}
] // Timing // First
1.903
Related examples:
How to represent a list as a cycle
How to efficiently find positions of duplicates?
DeleteDuplicates[] does not work as expected on floating point values
Sort+Union on a list
Intersection for lists of numeric data
How about:
DeleteDuplicates[list, Times @@ Take[#1, 4] == Times @@ Take[#2, 4] &]
If your L1
is long (length n
) it may be reasonable to calculate each Product
once rather than n-1
times like DeleteDuplicates
does.
Let's calculate those products and then we can compare only the results.
l2 = {Times @@ #[[ ;; 4]], #} & /@ list;
DeleteDuplicates[l2, #1[[ 1]] == #2[[ 1]] &][[;; , 2]]
We can see, it does matter:
n = 10^3;
L1 = RandomInteger[{1, 4}, {n, 7}];
DeleteDuplicates[L1,
Times @@ Take[#1, 4] == Times @@ Take[#2, 4] &
] // AbsoluteTiming // First
0.203125
DeleteDuplicates[{Times @@ #[[ ;; 4]], #} & /@ L1 ,
#1[[ 1]] == #2[[ 1]] &
][[ ;; , 2]] // AbsoluteTiming // First
0.078125