Find unique pairs in list of pairs
You could scan the list from start to end, while maintaining a map
of encountered pairs to their first position. Whenever you process a pair, you check to see if you've encountered it before. If that's the case, both the first encounter's index in b and the current encounter's index must be set to False. Otherwise, we just add the current index to the map of encountered pairs and change nothing about b. b will start initially all True
. To keep things equivalent wrt [1,2]
and [2,1]
, I'd first simply sort the pair, to obtain a stable representation. The code would look something like this:
def proc(a):
b = [True] * len(a) # Better way to allocate this
filter = {}
idx = 0
for p in a:
m = min(p)
M = max(p)
pp = (m, M)
if pp in filter:
# We've found the element once previously
# Need to mark both it and the current value as "False"
# If we encounter pp multiple times, we'll set the initial
# value to False multiple times, but that's not an issue
b[filter[pp]] = False
b[idx] = False
else:
# This is the first time we encounter pp, so we just add it
# to the filter for possible later encounters, but don't affect
# b at all.
filter[pp] = idx
idx++
return b
The time complexity is O(len(a))
which is good, but the space complexity is also O(len(a))
(for filter
), so this might not be so great. Depending on how flexible you are, you can use an approximate filter such as a Bloom filter.
ctr = Counter(frozenset(x) for x in a)
b = [ctr[frozenset(x)] == 1 for x in a]
We can use Counter to get counts of each list (turn list to frozenset to ignore order) and then for each list check if it only appears once.
#-*- coding : utf-8 -*-
a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]
result = filter(lambda el:(a.count([el[0],el[1]]) + a.count([el[1],el[0]]) == 1),a)
bool_res = [ (a.count([el[0],el[1]]) + a.count([el[1],el[0]]) == 1) for el in a]
print result
print bool_res
wich gives :
[[3, 5]]
[False, False, False, True, False]
Here's a solution with NumPy that 10 times faster than the suggested frozenset
solution:
a = numpy.array(a)
a.sort(axis=1)
b = numpy.ascontiguousarray(a).view(
numpy.dtype((numpy.void, a.dtype.itemsize * a.shape[1]))
)
_, inv, ct = numpy.unique(b, return_inverse=True, return_counts=True)
print(ct[inv] == 1)
Sorting is fast and makes sure that the edges
[i, j]
,[j, i]
in the original array identify with each other. Much faster thanfrozenset
s ortuple
s.Row uniquification inspired by https://stackoverflow.com/a/16973510/353337.
Speed comparison for different array sizes:
The plot was created with
from collections import Counter
import numpy
import perfplot
def fs(a):
ctr = Counter(frozenset(x) for x in a)
b = [ctr[frozenset(x)] == 1 for x in a]
return b
def with_numpy(a):
a = numpy.array(a)
a.sort(axis=1)
b = numpy.ascontiguousarray(a).view(
numpy.dtype((numpy.void, a.dtype.itemsize * a.shape[1]))
)
_, inv, ct = numpy.unique(b, return_inverse=True, return_counts=True)
res = ct[inv] == 1
return res
perfplot.save(
"out.png",
setup=lambda n: numpy.random.randint(0, 10, size=(n, 2)),
kernels=[fs, with_numpy],
labels=["frozenset", "numpy"],
n_range=[2 ** k for k in range(15)],
xlabel="len(a)",
)