Easy interview question got harder: given numbers 1..100, find the missing number(s) given exactly k are missing

Here's a summary of Dimitris Andreou's link.

Remember sum of i-th powers, where i=1,2,..,k. This reduces the problem to solving the system of equations

a₁ + a₂ + ... + a_k = b₁

a₁² + a₂² + ... + a_k² = b₂

...

a₁^k + a₂^k + ... + a_k^k = b_k

Using Newton's identities, knowing b_i allows to compute

c₁ = a₁ + a₂ + ... a_k

c₂ = a₁a₂ + a₁a₃ + ... + a_k-1a_k

...

c_k = a₁a₂ ... a_k

If you expand the polynomial (x-a₁)...(x-a_k) the coefficients will be exactly c₁, ..., c_k - see Viète's formulas. Since every polynomial factors uniquely (ring of polynomials is an Euclidean domain), this means a_i are uniquely determined, up to permutation.

This ends a proof that remembering powers is enough to recover the numbers. For constant k, this is a good approach.

However, when k is varying, the direct approach of computing c₁,...,c_k is prohibitely expensive, since e.g. c_k is the product of all missing numbers, magnitude n!/(n-k)!. To overcome this, perform computations in Z_q field, where q is a prime such that n <= q < 2n - it exists by Bertrand's postulate. The proof doesn't need to be changed, since the formulas still hold, and factorization of polynomials is still unique. You also need an algorithm for factorization over finite fields, for example the one by Berlekamp or Cantor-Zassenhaus.

High level pseudocode for constant k:

Compute i-th powers of given numbers
Subtract to get sums of i-th powers of unknown numbers. Call the sums b_i.
Use Newton's identities to compute coefficients from b_i; call them c_i. Basically, c₁ = b₁; c₂ = (c₁b₁ - b₂)/2; see Wikipedia for exact formulas
Factor the polynomial x^k-c₁x^k-1 + ... + c_k.
The roots of the polynomial are the needed numbers a₁, ..., a_k.

For varying k, find a prime n <= q < 2n using e.g. Miller-Rabin, and perform the steps with all numbers reduced modulo q.

EDIT: The previous version of this answer stated that instead of Z_q, where q is prime, it is possible to use a finite field of characteristic 2 (q=2^(log n)). This is not the case, since Newton's formulas require division by numbers up to k.

You will find it by reading the couple of pages of Muthukrishnan - Data Stream Algorithms: Puzzle 1: Finding Missing Numbers. It shows exactly the generalization you are looking for. Probably this is what your interviewer read and why he posed these questions.

Now, if only people would start deleting the answers that are subsumed or superseded by Muthukrishnan's treatment, and make this text easier to find. :)

Also see sdcvvc's directly related answer, which also includes pseudocode (hurray! no need to read those tricky math formulations :)) (thanks, great work!).

We can solve Q2 by summing both the numbers themselves, and the squares of the numbers.

We can then reduce the problem to

k1 + k2 = x
k1^2 + k2^2 = y

Where x and y are how far the sums are below the expected values.

Substituting gives us:

(x-k2)^2 + k2^2 = y

Which we can then solve to determine our missing numbers.

Easy interview question got harder: given numbers 1..100, find the missing number(s) given exactly k are missing

Tags:

Algorithm

Math

Related

Recent Posts