Find unique pairs in list of pairs

You could scan the list from start to end, while maintaining a map of encountered pairs to their first position. Whenever you process a pair, you check to see if you've encountered it before. If that's the case, both the first encounter's index in b and the current encounter's index must be set to False. Otherwise, we just add the current index to the map of encountered pairs and change nothing about b. b will start initially all True. To keep things equivalent wrt [1,2] and [2,1], I'd first simply sort the pair, to obtain a stable representation. The code would look something like this:

def proc(a):
  b = [True] * len(a) # Better way to allocate this
  filter = {}
  idx = 0
  for p in a:
    m = min(p)
    M = max(p)
    pp = (m, M)
    if pp in filter:
      # We've found the element once previously
      # Need to mark both it and the current value as "False"
      # If we encounter pp multiple times, we'll set the initial
      # value to False multiple times, but that's not an issue
      b[filter[pp]] = False
      b[idx] = False
    else:
      # This is the first time we encounter pp, so we just add it
      # to the filter for possible later encounters, but don't affect
      # b at all.
      filter[pp] = idx
    idx++
  return b

The time complexity is O(len(a)) which is good, but the space complexity is also O(len(a)) (for filter), so this might not be so great. Depending on how flexible you are, you can use an approximate filter such as a Bloom filter.


ctr = Counter(frozenset(x) for x in a)
b = [ctr[frozenset(x)] == 1 for x in a]

We can use Counter to get counts of each list (turn list to frozenset to ignore order) and then for each list check if it only appears once.


#-*- coding : utf-8 -*-
a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]
result = filter(lambda el:(a.count([el[0],el[1]]) + a.count([el[1],el[0]]) == 1),a)
bool_res = [ (a.count([el[0],el[1]]) + a.count([el[1],el[0]]) == 1) for el in a]
print result
print bool_res

wich gives :

[[3, 5]]
[False, False, False, True, False]

Here's a solution with NumPy that 10 times faster than the suggested frozenset solution:

a = numpy.array(a)
a.sort(axis=1)
b = numpy.ascontiguousarray(a).view(
    numpy.dtype((numpy.void, a.dtype.itemsize * a.shape[1]))
)
_, inv, ct = numpy.unique(b, return_inverse=True, return_counts=True)
print(ct[inv] == 1)
  • Sorting is fast and makes sure that the edges [i, j], [j, i] in the original array identify with each other. Much faster than frozensets or tuples.

  • Row uniquification inspired by https://stackoverflow.com/a/16973510/353337.

Speed comparison for different array sizes:

enter image description here

The plot was created with

from collections import Counter
import numpy
import perfplot


def fs(a):
    ctr = Counter(frozenset(x) for x in a)
    b = [ctr[frozenset(x)] == 1 for x in a]
    return b


def with_numpy(a):
    a = numpy.array(a)
    a.sort(axis=1)
    b = numpy.ascontiguousarray(a).view(
        numpy.dtype((numpy.void, a.dtype.itemsize * a.shape[1]))
    )
    _, inv, ct = numpy.unique(b, return_inverse=True, return_counts=True)
    res = ct[inv] == 1
    return res


perfplot.save(
    "out.png",
    setup=lambda n: numpy.random.randint(0, 10, size=(n, 2)),
    kernels=[fs, with_numpy],
    labels=["frozenset", "numpy"],
    n_range=[2 ** k for k in range(15)],
    xlabel="len(a)",
)

Tags:

Python

List