How to find the kth smallest element in the union of two sorted arrays?

I hope I am not answering your homework, as it has been over a year since this question was asked. Here is a tail-recursive solution that makes O(log(len(a)) + log(len(b))) recursive calls, since every call discards half of one of the arrays.

Assumption: the inputs are valid, i.e., both arrays are sorted in ascending order and k is a 0-based index in the range [0, len(a)+len(b)-1].

Base cases:

If one of the arrays is empty, the answer is the kth element of the other array.

Reduction steps:

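If the mid index of a plus the mid index of b is less than k:
- If the mid element of a is greater than the mid element of b, we can ignore the first half of b (up to and including its mid element) and reduce k by the number of discarded elements.
- Otherwise, ignore the first half of a (up to and including its mid element) and reduce k by the number of discarded elements.
If k is less than or equal to the sum of the mid indices of a and b:
- If the mid element of a is greater than the mid element of b, we can safely ignore the second half of a (from its mid element onward).
- Otherwise, we can ignore the second half of b (from its mid element onward).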

Code:

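def kth_smallest(arr1, arr2, k):
    # Returns the k-th smallest element (k is 0-indexed) of two ascending lists.
    # Base cases: one list is empty, so the answer is the other list's k-th element.
    if len(arr1) == 0:
        return arr2[k]
    elif len(arr2) == 0:
        return arr1[k]

    mida1 = len(arr1) // 2  # integer division
    mida2 = len(arr2) // 2
    if mida1 + mida2 < k:
        # Drop the first half (through the mid) of the list with the smaller mid element (arr1 on ties).
        if arr1[mida1] > arr2[mida2]:
            return kth_smallest(arr1, arr2[mida2+1:], k - mida2 - 1)
        else:
            return kth_smallest(arr1[mida1+1:], arr2, k - mida1 - 1)
    else:
        # Drop the second half (from the mid on) of the list with the larger mid element (arr2 on ties).
        if arr1[mida1] > arr2[mida2]:
            return kth_smallest(arr1[:mida1], arr2, k)
        else:
            return kth_smallest(arr1, arr2[:mida2], k)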

Please note that my solution creates new copies of the smaller arrays (slices) in every call, so each call also costs time linear in the slice length. This can easily be eliminated by passing only start and end indices into the original arrays.
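A minimal sketch of that index-based variant, assuming the same 0-indexed k and ascending inputs as above. It is written as a loop rather than recursion, and the name kth_smallest_by_index is hypothetical. Each window [lo, hi) stands for the slice the recursive version would have created, so no copying happens.

```python
def kth_smallest_by_index(a, b, k):
    """Return the k-th smallest (0-indexed) element of two ascending lists.

    Same halving rule as the recursive version, but each list is represented
    by a window [lo, hi) into the original list instead of a fresh slice.
    """
    a_lo, a_hi = 0, len(a)
    b_lo, b_hi = 0, len(b)
    while True:
        # Base cases: one window is empty, so index straight into the other.
        if a_lo == a_hi:
            return b[b_lo + k]
        if b_lo == b_hi:
            return a[a_lo + k]

        mid_a = (a_hi - a_lo) // 2  # mid offset within the current window
        mid_b = (b_hi - b_lo) // 2
        if mid_a + mid_b < k:
            # Drop the first half (through the mid) of the window with the smaller mid.
            if a[a_lo + mid_a] > b[b_lo + mid_b]:
                b_lo += mid_b + 1
                k -= mid_b + 1
            else:
                a_lo += mid_a + 1
                k -= mid_a + 1
        else:
            # Drop the second half (from the mid on) of the window with the larger mid.
            if a[a_lo + mid_a] > b[b_lo + mid_b]:
                a_hi = a_lo + mid_a
            else:
                b_hi = b_lo + mid_b


print(kth_smallest_by_index([1, 3, 5, 7], [2, 4, 6, 8], 4))  # 5
```

Every iteration shrinks one window, so the loop performs O(log(len(a)) + log(len(b))) iterations with O(1) work each.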

Many people have answered this "kth smallest element from two sorted arrays" question, but usually with only general ideas, not clear working code or an analysis of the boundary conditions.

Here I'd like to elaborate on it carefully, following the way I worked it out, to help novices understand, together with my working Java code. A1 and A2 are two arrays sorted in ascending order, with lengths size1 and size2 respectively. We need to find the k-th smallest element (k is 1-indexed here) in the union of those two arrays. We reasonably assume that (k > 0 && k <= size1 + size2), which implies that A1 and A2 can't both be empty.

First, let's approach this question with a slow O(k) algorithm. The method is to compare the first elements of both arrays, A1[0] and A2[0]. Take the smaller one, say A1[0], and put it into our pocket. Then compare A1[1] with A2[0], and so on. Repeat this until our pocket holds k elements. Very important: in the first step, we can only commit A1[0] to our pocket. We can NOT include or exclude A2[0]!!!
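For comparison, the same O(k) "pocket" idea is almost free in Python: heapq.merge lazily merges already-sorted iterables, so skipping the first k - 1 merged values with itertools.islice yields the k-th smallest. A minimal sketch using the 1-indexed k of this answer; the name kth_smallest_merge is hypothetical.

```python
import heapq
from itertools import islice


def kth_smallest_merge(a1, a2, k):
    """Return the k-th smallest (1-indexed) element of two ascending sequences in O(k) time."""
    if not 1 <= k <= len(a1) + len(a2):
        raise ValueError("k must be in [1, len(a1) + len(a2)]")
    # heapq.merge yields the union in ascending order without building it;
    # islice skips the k - 1 smaller elements that go into the "pocket".
    return next(islice(heapq.merge(a1, a2), k - 1, None))


print(kth_smallest_merge([1, 3, 5, 7], [2, 4, 6, 8], 3))  # 3
```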

The following O(k) code returns the element just before the correct answer (the (k-1)-th smallest). I use it here to show my idea and to analyze the boundary conditions; correct code follows after it:

private E kthSmallestSlowWithFault(int k) {
    int size1 = A1.length, size2 = A2.length;

    int index1 = 0, index2 = 0;
    // base case, k == 1
    if (k == 1) {
        if (size1 == 0) {
            return A2[index2];
        } else if (size2 == 0) {
            return A1[index1];
        } else if (A1[index1].compareTo(A2[index2]) < 0) {
            return A1[index1];
        } else {
            return A2[index2];
        }
    }

    /* in the next loop, we always assume there is one next element to compare with, so we can
     * commit to the smaller one. What if the last element is the kth one?
     */
    if (k == size1 + size2) {
        if (size1 == 0) {
            return A2[size2 - 1];
        } else if (size2 == 0) {
            return A1[size1 - 1];
        } else if (A1[size1 - 1].compareTo(A2[size2 - 1]) < 0) {
            return A1[size1 - 1];
        } else {
            return A2[size2 - 1];
        }
    }

    /*
     * The loop below executes only when k > 1. In each iteration we commit to one element, until
     * we reach the (index1 + index2 == k - 1) case. But the answer is not correct: it is always one
     * element short, because we haven't merged the base case into this loop yet.
     */
    int lastElementFromArray = 0;
    while (index1 + index2 < k - 1) {
        if (A1[index1].compareTo(A2[index2]) < 0) {
            index1++;
            lastElementFromArray = 1;
            // commit to one element from array A1, but that element is at (index1 - 1)!!!
        } else {
            index2++;
            lastElementFromArray = 2;
        }
    }
    if (lastElementFromArray == 1) {
        return A1[index1 - 1];
    } else {
        return A2[index2 - 1];
    }
}

The most powerful idea is that in each loop iteration we always use the base-case approach. After committing to the current smallest element, we get one step closer to the target: the k-th smallest element. Never jump into the middle and make yourself confused and lost!

By observing the base cases k == 1 and k == size1 + size2 in the code above, and combining them with the fact that A1 and A2 can't both be empty, we can turn the logic into the more concise style below.

Here is slow but correct working code:

private E kthSmallestSlow(int k) {
    // System.out.println("this is an O(k) speed algorithm, very concise");
    int size1 = A1.length, size2 = A2.length;

    int index1 = 0, index2 = 0;
    while (index1 + index2 < k - 1) {
        if (size1 > index1 && (size2 <= index2 || A1[index1].compareTo(A2[index2]) < 0)) {
            index1++; // here we commit to the original index1 element, not the incremented one!!!
        } else {
            index2++;
        }
    }
    // below is the (index1 + index2 == k - 1) base case
    // also eliminate the risk of referring to an element outside of index boundary
    if (size1 > index1 && (size2 <= index2 || A1[index1].compareTo(A2[index2]) < 0)) {
        return A1[index1];
    } else {
        return A2[index2];
    }
}

Now we can try a faster algorithm that runs in O(log k). Similarly, compare A1[k/2 - 1] with A2[k/2 - 1]; if A1[k/2 - 1] is smaller, then all the elements from A1[0] to A1[k/2 - 1] should go into our pocket. The idea is not to commit just one element per iteration: the first step commits k/2 elements. Again, we can NOT include or exclude A2[0] to A2[k/2 - 1] yet. So in the first step we can't go more than k/2 elements, in the second step no more than k/4 elements, and so on.

After each step we get much closer to the k-th element, and the steps get smaller and smaller until index1 + index2 == k - 1 (the last steps have size 1). Then we can refer to the simple and powerful base case again.

Here is the correct working code:

private E kthSmallestFast(int k) {
    // System.out.println("this is an O(log k) speed algorithm with meaningful variables name");
    int size1 = A1.length, size2 = A2.length;

    int index1 = 0, index2 = 0, step = 0;
    while (index1 + index2 < k - 1) {
        step = (k - index1 - index2) / 2;
        int step1 = index1 + step;
        int step2 = index2 + step;
        if (size1 > step1 - 1
                && (size2 <= step2 - 1 || A1[step1 - 1].compareTo(A2[step2 - 1]) < 0)) {
            index1 = step1; // commit to element at index = step1 - 1
        } else {
            index2 = step2;
        }
    }
    // the base case of (index1 + index2 == k - 1)
    if (size1 > index1 && (size2 <= index2 || A1[index1].compareTo(A2[index2]) < 0)) {
        return A1[index1];
    } else {
        return A2[index2];
    }
}
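The Java methods above read fields A1 and A2 and a type parameter E that are not shown. For readers who want to run the idea without that scaffolding, here is a hedged Python translation of kthSmallestFast with the same 1-indexed k and the same tie-breaking toward the second array; the name kth_smallest_fast is hypothetical.

```python
def kth_smallest_fast(a1, a2, k):
    """Return the k-th smallest (1-indexed) element of two ascending lists in O(log k) steps.

    A port of the Java kthSmallestFast; assumes 1 <= k <= len(a1) + len(a2).
    """
    size1, size2 = len(a1), len(a2)
    index1 = index2 = 0  # elements already committed to the "pocket" from each list
    while index1 + index2 < k - 1:
        step = (k - index1 - index2) // 2  # at most half of what is still missing
        step1 = index1 + step
        step2 = index2 + step
        # Commit the block from a1 if it has `step` more elements and its last one is
        # smaller (or a2 cannot supply `step` more); otherwise commit the block from a2.
        if size1 > step1 - 1 and (size2 <= step2 - 1 or a1[step1 - 1] < a2[step2 - 1]):
            index1 = step1
        else:
            index2 = step2
    # Base case index1 + index2 == k - 1: the answer is the smaller next candidate.
    if size1 > index1 and (size2 <= index2 or a1[index1] < a2[index2]):
        return a1[index1]
    return a2[index2]


print(kth_smallest_fast([1, 3, 5, 7], [2, 4, 6, 8], 5))  # 5
```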

Some people may worry: what if (index1 + index2) jumps over k - 1? Could we miss the base case (k - 1 == index1 + index2)? That's impossible, because each step is at most half of the remaining distance k - index1 - index2. It's like adding up 0.5 + 0.25 + 0.125 + ...: you never go beyond 1.
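The halving argument can be made exact. Writing r for the number of elements still missing from the pocket (a symbol introduced here only for this derivation), one iteration of the loop in kthSmallestFast does:

```latex
\begin{aligned}
r &= k - \mathit{index1} - \mathit{index2} \ge 2 && \text{(loop condition)} \\
\mathit{step} &= \lfloor r/2 \rfloor \ge 1 \\
r' &= r - \mathit{step} = \lceil r/2 \rceil, \qquad 1 \le r' < r
\end{aligned}
```

So r never drops below 1, i.e. index1 + index2 never passes k - 1, and because r strictly decreases the loop stops exactly at r = 1, the base case. Since r roughly halves each time, this takes about log2 k iterations.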

Of course, it is very easy to turn the above code into a recursive algorithm (start it with index1 = index2 = 0, size1 = A1.length and size2 = A2.length):

private E kthSmallestFastRecur(int k, int index1, int index2, int size1, int size2) {
    // System.out.println("this is an O(log k) speed algorithm with meaningful variables name");

    // the base case of (index1 + index2 == k - 1)
    if (index1 + index2 == k - 1) {
        if (size1 > index1 && (size2 <= index2 || A1[index1].compareTo(A2[index2]) < 0)) {
            return A1[index1];
        } else {
            return A2[index2];
        }
    }

    int step = (k - index1 - index2) / 2;
    int step1 = index1 + step;
    int step2 = index2 + step;
    if (size1 > step1 - 1 && (size2 <= step2 - 1 || A1[step1 - 1].compareTo(A2[step2 - 1]) < 0)) {
        index1 = step1;
    } else {
        index2 = step2;
    }
    return kthSmallestFastRecur(k, index1, index2, size1, size2);
}

I hope the above analysis and Java code help you understand. But never copy my code as your homework! Cheers ;)

You've got it, just keep going! And be careful with the indexes...

To simplify a bit, I'll assume that the lengths N and M of a and b are both at least k, so the complexity here is O(log k), which is at most O(log N + log M).

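Code (Python):

def kth_smallest(a, b, k):
    # k is 1-indexed; assumes len(a) >= k and len(b) >= k.
    # i = how many of the k smallest elements come from a, j = k - i from b.
    lo, hi = 0, k
    while lo < hi:
        i = (lo + hi) // 2
        j = k - i
        if a[i] < b[j - 1]:
            lo = i + 1  # a[i] beats b[j-1], so take more elements from a
        else:
            hi = i      # taking i elements from a may already be enough
    i, j = lo, k - lo
    # The k-th smallest is the larger of the last elements taken from each array.
    if i == 0:
        return b[j - 1]
    if j == 0:
        return a[i - 1]
    return max(a[i - 1], b[j - 1])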

For the correctness proof you can use the loop invariant i + j = k, but I won't do all your homework :)


Tags: Algorithm, Arrays, Binary Search, Divide And Conquer
