The intersection of two sorted arrays

Use set_intersection as here. The usual implementation would work similar to the merge part of merge-sort algorithm.


Since this looks like a HW...I'll give you the algorithm:

Let arr1,arr2 be the two sorted arrays of length La and Lb.
Let i be index into the array arr1.
Let j be index into the array arr2.
Initialize i and j to 0.

while(i < La and j < Lb) do

    if(arr1[i] == arr2[j]) { // found a common element.
        print arr[i] // print it.
        increment i // move on.
        increment j
    }
    else if(arr1[i] > arr2[j])
        increment j // don't change i, move j.
    else
        increment i // don't change j, move i.
end while

I've been struggling with same problem for a while now, so far I came with:

  1. Linear matching which will yield O(m+n) in worst case. You basically keep two pointers (A and B) each pointing to beginning of each array. Then advance pointer which points to smaller value, until you reach end of one of arrays, that would indicate no intersection. If at any point you have *A == *B - here comes your intersection.

  2. Binary matching. Which yields ~ O(n*log(m)) in worst case. You basically pick smaller array and perform binary search in bigger array of all elements of the smaller array. If you want to be more fancy, you can even use last position where binary search failed and use it as starting point for next binary search. This way you marginally improve worst case but for some sets it might perform miracles :)

  3. Double binary matching. It's a variation of regular binary matching. Basically you get element from the middle of smaller array and do binary search in bigger array. If you find nothing then you cut smaller array in half (and yes you can toss element you already used) and cut bigger array in half (use binary search failure point). And then repeat for each pair. Results are better than O(n*log(m)) but I am too lazy to calculate what they are.

Those are two most basic ones. Both have merits. Linear is a bit easier to implement. Binary one is arguably faster (although there are plenty of cases when linear matching will outperform binary).

If anyone knows anything better than that I would love to hear it. Matching arrays is what I do these days.

P.S. don't quote me on terms "linear matching" and "binary matching" as I made them up myself and there are probably fancy name for it already.