What is the fastest algorithm for intersection of two sorted lists?
Assume that A
has m
elements and B
has n
elements, with m ≥ n
. Information theoretically, the best we can do is
(m + n)!
lg -------- = n lg (m/n) + O(n)
m! n!
comparisons, since in order to verify an empty intersection, we essentially have to perform a sorted merge. We can get within a constant factor of this bound by iterating through B
and keeping a "cursor" in A
indicating the position at which the most recent element of B
should be inserted to maintain sorted order. We use exponential search to advance the cursor, for a total cost that is on the order of
lg x_1 + lg x_2 + ... + lg x_n,
where x_1 + x_2 + ... + x_n = m + n
is some integer partition of m
. This sum is O(n lg (m/n))
by the concavity of lg
.
I don't know if this is the fastest option but here's one that runs in O(n+m)
where n
and m
are the sizes of your lists:
- Loop over both lists until one of them is empty in the following way:
- Advance by one on one list.
- Advance on the other list until you find a value that is either equal or greater than the current value of the other list.
- If it is equal, the element belongs to the intersection and you can append it to another list
- If it is greater that the other element, advance on the other list until you find a value equal or greater than this value
- as said, repeat this until one of the lists is empty