Fastest way to find unique combinations of list
Here's some Python code based on the generating function approach outlined in this Math Forum article. For each letter appearing in the input we create a polynomial 1 + x + x^2 + ... + x^k
, where k
is the number of times that the letter appears. We then multiply those polynomials together: the n
th coefficient of the resulting polynomial then tells you how many combinations of length n
there are.
We'll represent a polynomial simply as a list of its (integer) coefficients, with the first coefficient representing the constant term, the next coefficient representing the coefficient of x
, and so on. We'll need to be able to multiply such polynomials, so here's a function for doing so:
def polymul(p, q):
"""
Multiply two polynomials, represented as lists of coefficients.
"""
r = [0]*(len(p) + len(q) - 1)
for i, c in enumerate(p):
for j, d in enumerate(q):
r[i+j] += c*d
return r
With the above in hand, the following function computes the number of combinations:
from collections import Counter
from functools import reduce
def ncombinations(it, k):
"""
Number of combinations of length *k* of the elements of *it*.
"""
counts = Counter(it).values()
prod = reduce(polymul, [[1]*(count+1) for count in counts], [1])
return prod[k] if k < len(prod) else 0
Testing this on your examples:
>>> ncombinations("abcd", 2)
6
>>> ncombinations("abab", 2)
3
>>> ncombinations("abbb", 2)
2
>>> ncombinations("aaaa", 2)
1
And on some longer examples, demonstrating that this approach is feasible even for long-ish inputs:
>>> ncombinations("abbccc", 3) # the math forum example
6
>>> ncombinations("supercalifragilisticexpialidocious", 10)
334640
>>> from itertools import combinations # double check ...
>>> len(set(combinations(sorted("supercalifragilisticexpialidocious"), 10)))
334640
>>> ncombinations("supercalifragilisticexpialidocious", 20)
1223225
>>> ncombinations("supercalifragilisticexpialidocious", 34)
1
>>> ncombinations("supercalifragilisticexpialidocious", 35)
0
>>> from string import printable
>>> ncombinations(printable, 50) # len(printable)==100
100891344545564193334812497256
>>> from math import factorial
>>> factorial(100)//factorial(50)**2 # double check the result
100891344545564193334812497256
>>> ncombinations("abc"*100, 100)
5151
>>> factorial(102)//factorial(2)//factorial(100) # double check (bars and stars)
5151
Start with a regular recursive definition of combinations() but add a test to only recurse when the lead value at that level hasn't been used before:
def uniq_comb(pool, r):
""" Return an iterator over a all distinct r-length
combinations taken from a pool of values that
may contain duplicates.
Unlike itertools.combinations(), element uniqueness
is determined by value rather than by position.
"""
if r:
seen = set()
for i, item in enumerate(pool):
if item not in seen:
seen.add(item)
for tail in uniq_comb(pool[i+1:], r-1):
yield (item,) + tail
else:
yield ()
if __name__ == '__main__':
from itertools import combinations
pool = 'ABRACADABRA'
for r in range(len(pool) + 1):
assert set(uniq_comb(pool, r)) == set(combinations(pool, r))
assert dict.fromkeys(uniq_comb(pool, r)) == dict.fromkeys(combinations(pool, r))