Lower bound in algorithmic puzzle

This is a very fun problem, which I have encountered in the literature as the knights and spies problem. Good computers are called knights, because they always tell the truth, while bad computers are called spies, because they say whatever they want. (Traditionally, a liar would be called a knave.) I will use this terminology because I feel it is more consistent with other liar/truth-teller puzzles.

Let me briefly discuss an alternative objective: figure out everyone's identity, not just a single knight's. I encourage you to think about this problem before reading further. This is an interesting problem, in part because it's exciting how many different bounds I have seen people come up with:

  • $n^2$ or $n(n-1)$ by asking everyone about everyone
  • $\Theta(n\sqrt{n})$ using a standard "square-root" trick or $\Theta(n\log{n})$ by being smarter about the recursion
  • $5n + O(1)$, $3n + O(1)$, and $2n + O(1)$ by increasingly efficient methods related to pairing up people
  • $3n/2 + O(1)$ optimally!

The 2009 paper "Knights, spies, games and ballot sequences" by Mark Wildon proves that the final bound is optimal, computing the exact value of the $O(1)$ for every $n$. Mark Wildon has a webpage with additional information about the problem, which notes that this solution was previously published by Pavel M. Blecher in the 1983 paper "On a logical problem". The strategy used in these papers, which Wildon calls the "Spider Interrogation Strategy", is fantastic.

Your problem of identifying a single knight is more challenging! I'll cut straight to the chase: the optimal bound is $n-1-H(n-1)$, where $H(k)$ is the number of 1 bits in the binary representation of $k$. I haven't carefully read the other answer to this question, but as far as I can tell, this bound can easily be achieved by a trivial modification of that strategy.

The upper bound isn't too bad, but the lower bound is much trickier. When my friends and I thought about (and solved!) this problem, we used a reduction to the following problem:

There are two players, the Questioner and God, and a collection of (nonnegative) numbers. The Questioner's aim is to achieve a violation of the (strict) triangle inequality, that is, have a single number be greater than or equal to the sum of the rest. God's aim is to not have this happen for as long as possible. Each move consists of the Questioner asking about two numbers, to which God replies "sum" or "difference"; the two numbers $a$ and $b$ are then replaced by either $a+b$ or $\lvert a-b\rvert$ accordingly. Since the number of numbers decreases by one each turn, eventually there will be two numbers and the (strict) triangle inequality is necessarily violated. If God manages to not lose before then, we say that "God wins".
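To make the rules concrete, here is a minimal Python sketch of a single move and of the end condition (the function names are mine, not from any of the papers):

```python
def apply_answer(position, i, j, answer):
    """Replace position[i] and position[j] by a+b ("sum") or |a-b| ("difference")."""
    a, b = position[i], position[j]
    merged = a + b if answer == "sum" else abs(a - b)
    return [x for k, x in enumerate(position) if k not in (i, j)] + [merged]

def questioner_wins(position):
    """The Questioner has won once one number is >= the sum of all the others."""
    total = sum(position)
    return any(2 * x >= total for x in position)
```

For example, starting from three ones, `apply_answer([1, 1, 1], 0, 1, "sum")` yields a position containing `1` and `2`, at which point only two numbers remain and the game is over.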

Here are some fun facts:

  • Consider a starting position of $n$ ones. Then God wins if and only if $n$ is 1 more than a power of 2.

  • Suppose God has a winning position. Then his answer to any question the Questioner can ask is forced; that is, it is always the case that one of the two answers is losing. We call this "Uniqueness". The way we think about it is that God somehow has to be very careful in answering every single question, and indeed, if you look at optimal play in "endgame" positions, it's hard to predict what the correct answer for God is.

  • Suppose the starting position is $n=2^k+1$ ones. By the above two points, this is a winning position and God must carefully answer every question so as not to lose. Nonetheless, the answers to the first $(n-3)/2$ questions asked will necessarily be "sum".
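The first fact can be verified by a brute-force game-tree search for small $n$. This is my own sanity check, exponential in $n$, with positions memoized as sorted multisets:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def god_wins(position):
    """position: sorted tuple of nonnegative ints; True if God survives to two numbers."""
    if len(position) <= 2:
        return True                      # reached two numbers: God wins
    total = sum(position)
    if any(2 * x >= total for x in position):
        return False                     # triangle inequality already violated
    n = len(position)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = position[i], position[j]
            rest = position[:i] + position[i + 1:j] + position[j + 1:]
            # God needs a safe reply ("sum" or "difference") to every question.
            if not any(god_wins(tuple(sorted(rest + (v,))))
                       for v in (a + b, abs(a - b))):
                return False
    return True

winners = {n for n in range(2, 11) if god_wins((1,) * n)}
```

In the range tested, `winners` should come out as $\{2, 3, 5, 9\}$, exactly the $n$ that are 1 more than a power of 2.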

So there's this phenomenon that in a certain class of positions, God answers blindly for the first half of the time, and then has to be very careful afterwards. It's very weird.

We finally classified winning positions. It turns out that the first bullet point above is true because $\binom{2n}{n}/2$ is odd only when $n$ is a power of 2.
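The parity fact is easy to check numerically (the check below is mine; the equivalence follows from Kummer's theorem, since the 2-adic valuation of $\binom{2n}{n}$ is the number of 1 bits of $n$):

```python
from math import comb

# C(2n, n) is even for every n >= 1, so dividing by 2 is exact.
odd_halves = {n for n in range(1, 513) if (comb(2 * n, n) // 2) % 2 == 1}
powers_of_two = {n for n in range(1, 513) if n & (n - 1) == 0}
assert odd_halves == powers_of_two
```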

Some more quick observations:

  • Regarding the reduction: this sum/difference game is just what you get if the spies are knaves. In that case, the knights and the spies form two groups of people, each of which supports its own members and accuses the others.

  • The lower bound $n-1-H(n-1)$ holds even if you're trying to find a knight or a spy.

Let me finish with some references to the literature. This result has been published many times! The term used in the literature for this game is "the majority game".

You have a group of $n$ items, each with one of $k$ labels. One of the labels is shared by a majority of the items. You are allowed to ask if two items have the same label, and the goal is to minimize the number of questions necessary to identify a majority element.

The best studied case is when $k=2$. The problem is often presented as played by a questioner and an answerer. This is not exactly the same game, because in principle spies could tell the truth. However, the lower bound reduction above is to the case when the spies are knaves, and then this is the same game.

Here's a summary of the relevant literature:

  • Michael J. Fischer and Steven L. Salzberg.
    Finding a majority among $n$ votes.
    J. Algorithms 3 (1982) 375--379.

    They consider the problem when $k$ is unknown, and a majority may or may not exist. They prove that the optimal bound is then $\lceil 3n/2\rceil - 2$ questions. You may recognize this as the bound from Mark Wildon's paper. Now, I am a little confused, because it doesn't seem to me that the problems map onto each other exactly, but I find it hard to imagine that it's an accident.

    Actually, the algorithm they present seems strikingly similar to Wildon's "spider interrogation strategy". Because Fischer and Salzberg work in a slightly different model (in particular, one that is symmetric with respect to asking $x$ about $y$ and asking $y$ about $x$), some of the details have to change. I don't think the two strategies map onto each other exactly (in terms of the questions asked), but they are similar.

  • Saks, Michael E. and Werman, Michael.
    On computing majority by comparisons.
    Combinatorica 11 (1991), no. 4, 383--387.

    They show that if $n$ is odd, the optimal bound is $n-H(n)$, where $H(n)$ is the number of 1-bits in $n$.

    Their analysis proceeds by first reformulating the game as one between a selector and an assigner. A position in the game is a multiset of integers, and the rules are as in the sum-difference game, e.g. "the game ends when the largest number in the multiset is at least half the total." They then have a slick proof of the upper bound! (In their notation it's a lower bound.) I mention this because even though the upper bound isn't hard, some analyses get mired in minor details.

    They then define a 2-adic valuation invariant which my friends and I also discovered. Their proof uses generating functions which ours did not, but I believe the actual invariant is the same as ours. Finally, they apply the invariant result to the position consisting of $2h+1$ ones by computing the valuation of $\binom{2h}{h}$.

  • Alonso, Laurent; Reingold, Edward M.; and Schott, René.
    Determining the majority.
    Inform. Process. Lett. 47 (1993), no. 5, 253--255.

    They present a short proof of the previous paper's results. The upper bound uses the same neat trick. They then prove the lower bound with essentially the same 2-adic invariant technique, except with a different exposition. Admittedly, it's cleaner and doesn't use generating functions.

  • Wiener, Gábor
    Search for a majority element.
    International Conference (of the Forum for Interdisciplinary Mathematics) on Combinatorics, Information Theory and Statistics (Portland, ME, 1997). J. Statist. Plann. Inference 100 (2002), no. 2, 313--318.

    He proves some related results, including the following:

    If asking about $x$ and $y$ is an optimal question, then there exists an optimal question that doesn't ask about $x$. It follows that the optimal number of questions is monotonic, in the sense that adding another 1 into the position does not decrease the optimal number of questions.


Here's a more detailed explanation of the method I sketched in a comment. This method lowers the upper bound in some cases.

We'll perform an operation which (1) reduces the number of computers under consideration (by about half), and (2) preserves the fact that more than half of the computers under consideration are good. Repeating this operation eventually reduces the number of computers to 1, at which point the more-than-half condition tells us that the computer is good, and we're done.

The operation is this:

  1. Pair up all the computers arbitrarily. (Maybe one computer is left out; see step 3.)
  2. In each pair, ask one computer about the other. If the answer is "bad", discard both computers; if the answer is "good", discard just the testing computer and keep the tested one.
  3. If there was one computer left out of the pairing in step 1, then either keep or discard it so that you keep in total an odd number of computers.
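The three steps can be simulated directly. This is my own sketch; spies can answer however they like, and for concreteness this simple adversary has every spy answer "good". `True` marks a good computer:

```python
import random

def halve(computers, rng):
    """One round: pair up, ask, keep survivors; returns (kept, questions asked)."""
    computers = computers[:]
    rng.shuffle(computers)                       # step 1: arbitrary pairing
    leftover = computers.pop() if len(computers) % 2 else None
    kept, questions = [], 0
    for asker, asked in zip(computers[0::2], computers[1::2]):
        questions += 1
        says_good = asked if asker else True     # knight tells truth; spy says "good"
        if says_good:
            kept.append(asked)                   # step 2: keep only the tested one
    if leftover is not None and len(kept) % 2 == 0:
        kept.append(leftover)                    # step 3: keep an odd total
    return kept, questions

def find_knight(computers, rng):
    """Repeat the operation until one computer remains; given a good majority,
    the proof below guarantees the survivor is good."""
    questions = 0
    while len(computers) > 1:
        computers, q = halve(computers, rng)
        questions += q
    return computers[0], questions
```

Running this on, say, 100 computers of which 51 are good always ends at a good computer, using fewer than $n$ questions.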

Proof that after this operation, more than half of the computers are good: Let $G$ be the number of good computers to start with, and $B$ the number of bad computers. Write $(GG)$ for the number of pairs produced in step 1 with both computers good; write $(GB)$ for the number of pairs with the testing computer good and the tested computer bad; write $(BG)$ and $(BB)$ similarly. Let $G_2$ and $B_2$ denote respectively the number of good and bad computers that are kept after step 2, and let $G_3$ and $B_3$ denote the corresponding number for after step 3. We know that $G > B$ and want to show that $G_3 > B_3$.

In step 2, every $(GG)$ pair answers "good", so you keep the second good computer; this shows $G_2 \ge (GG)$. On the other hand, the only way for a bad computer to survive step 2 is if a bad computer vouches for it; this shows $B_2 \le (BB)$. Now consider three cases.

Case 1: There were an even number of computers to start with. Then $G = 2(GG) + (GB) + (BG)$ and $B = 2(BB) + (GB) + (BG)$. By hypothesis, $G > B$, so these identities imply $(GG) > (BB)$, which gives $G_2 > B_2$. Since there is no unpaired computer, $G_3 = G_2$ and $B_3 = B_2$, so we're done.

Case 2: There were an odd number of computers to start with, and the unpaired computer was good. Then $G = 2(GG) + (GB) + (BG) + 1$ and $B = 2(BB) + (GB) + (BG)$, which since $G > B$ implies $(GG) \ge (BB)$, so $G_2 \ge B_2$. If $G_2 > B_2$ then $G_3 > B_3$ whether or not we keep the unpaired good computer. If $G_2 = B_2$ then the total number of computers kept after step 2 is even, so in step 3 we keep the unpaired good computer, obtaining $G_3 = G_2+1 > B_2 = B_3$.

Case 3: There were an odd number of computers to start with, and the unpaired computer was bad. Then $G = 2(GG) + (GB) + (BG)$ and $B = 2(BB) + (GB) + (BG) + 1$, which since $G > B$ implies $(GG) > (BB)$. As in case 1 this yields $G_2 > B_2$. If $G_2 = B_2 + 1$ then the total number of computers kept after step 2 is odd, so in step 3 we discard the unpaired bad computer, obtaining $G_3 = G_2 > B_2 = B_3$. If $G_2 > B_2 + 1$ then $G_3 > B_3$ whether or not we keep the unpaired bad computer.

So in all cases, at the end of step 3 we have kept more good computers than bad computers, as desired.
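The case analysis can also be checked exhaustively for small $n$: whatever answers the spies give and however the pairing falls out, one round preserves the good majority. This enumeration is mine; identities are booleans with `True` = good, and pairing consecutive elements of every identity sequence covers all pairings up to relabelling:

```python
from itertools import product

def one_round(idents, spy_answers):
    """idents: tuple of booleans; spy_answers: one free answer per pair."""
    pairs = list(zip(idents[0::2], idents[1::2]))
    leftover = idents[-1] if len(idents) % 2 else None
    kept = []
    for (asker, asked), free in zip(pairs, spy_answers):
        answer = asked if asker else free        # knights truthful, spies arbitrary
        if answer:
            kept.append(asked)                   # keep only the tested computer
    if leftover is not None and len(kept) % 2 == 0:
        kept.append(leftover)                    # step 3: keep an odd total
    return kept

for n in range(2, 10):
    for idents in product([True, False], repeat=n):
        if 2 * sum(idents) <= n:
            continue                             # need a strict good majority
        for spy_answers in product([True, False], repeat=n // 2):
            kept = one_round(idents, spy_answers)
            assert sum(kept) > len(kept) - sum(kept)
```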


As Alon said in comments, in the worst case (when all tests say "good") this method uses $Q(n) = n - h(n)$ questions, where $h(n)$ is the number of 1 bits in the binary representation of $n$. (This can be easily proved by (strong) induction.) Some specific values:
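Here is the worst-case count computed directly from the procedure (assuming, as above, that every answer is "good"), checked against $n - h(n)$:

```python
def worst_case_questions(n):
    """Questions used by the halving operation when every answer is "good"."""
    questions = 0
    while n > 1:
        pairs = n // 2
        questions += pairs           # one question per pair
        survivors = pairs            # every tested computer survives
        if n % 2 == 1 and survivors % 2 == 0:
            survivors += 1           # step 3: keep the leftover for an odd total
        n = survivors
    return questions

assert all(worst_case_questions(n) == n - bin(n).count("1") for n in range(1, 4096))
```

In particular, `worst_case_questions(100)` returns 97.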

  • $Q(100) = 97$, just as in the question.
  • $Q(2^m) = 2^m-1$, which is slightly worse than the estimate in the question, which in this case is $2^m-3$ (for $m\ge 2$, I presume).
  • In particular, $Q(2) = 1$, which is suboptimal, since the hypothesis that more than half the computers are good here tells us that all of them are good, so there is no need to ask questions. But I guess adding this special case to the algorithm saves us at most one question.
  • $Q(2^m-1) = 2^m - m - 1$, which is asymptotically better than the estimate in the question (it's $n-\log n$ instead of $n-c$).
  • In particular, $Q(7) = 4$, slightly improving the estimate in the question.

Here's another way to look at this puzzle. Consider the following game. There is a (directed) graph with $2n$ vertices, labelled $x_1,\dotsc,x_n$ and $\neg x_1,\dotsc,\neg x_n$. At the beginning of the game, the graph has no edges. Every round, player A chooses two numbers $i$ and $j$; player B draws either the directed edge $x_i\to x_j$ or the directed edge $x_i\to\neg x_j$. At the end of the round, player A may claim victory by choosing some number $k$ and drawing $x_k\to\neg x_k$; player A then wins if, interpreting the graph as specifying a boolean formula (where each directed edge represents an implication, which is a 2-clause), it is not possible to satisfy that boolean formula with more than half of the variables set to "true". Player A's goal is to win as quickly as possible; player B's goal is to delay player A's victory as long as possible.

In short, the players jointly construct an instance of MAX 2-SAT, with player A trying to make it unsatisfiable and player B trying to keep it satisfiable.

(To see the correspondence: when player A chooses $k$, adds $x_k\to\neg x_k$, and shows that the resulting MAX 2-SAT instance is unsatisfiable, that amounts to a proof (by contraposition) that, if more than half the computers are good, then the answers so far establish that computer $k$ is good.)
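For small instances, player A's victory condition can be checked by brute force. This is my own sketch; a literal is encoded as a pair `(i, sign)`, and an edge `(p, q)` stands for the implication $p \to q$:

```python
from itertools import product

def majority_satisfiable(n, implications):
    """Is some assignment with > n/2 variables true consistent with all implications?"""
    def holds(lit, assignment):
        i, sign = lit
        return assignment[i] == sign
    for assignment in product([False, True], repeat=n):
        if 2 * sum(assignment) <= n:
            continue                 # only majority-true assignments count
        if all(holds(q, assignment) or not holds(p, assignment)
               for p, q in implications):
            return True
    return False
```

Player A wins by exhibiting a $k$ such that adding the edge $x_k\to\neg x_k$ makes `majority_satisfiable` return `False`.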

I have no insights from this way of thinking about the puzzle, except that it makes me suspect the problem is hard.


Several related problems are discussed in papers of Martin Aigner (and his coauthors). See for example M. Aigner: Two Colors and More. In: Csiszár, Katona, Tardos (eds.): Entropy, Search, Complexity, Springer 2007, pp. 9--26.

This paper mentions the Blecher paper in which the Knights and Spies problem (which Aigner calls the liar problem) is solved. The version where we only have to find one knight was posed as an open problem at the 2001 COSSAC meeting in Ischia (http://www.di.unisa.it/COSSAC/Program.html) and was solved independently in 2002 by Aigner (see the paper above) and by me (G. Wiener: A Solution to the Liar Problem. In: Proceedings of the 3rd Japanese-Hungarian Workshop on Discrete Mathematics and its Applications, Tokyo, Japan, 21/01/2003--24/01/2003, pp. 232--239). Aigner solved a somewhat generalized version of the problem, in which it is known that there are at least $k > n/2$ knights, and proved that the number of questions needed is exactly $2(n-k)-H(n-k)$. He also proved a similar result for the original problem, which is only sketched in the Blecher paper: if there are at least $k > n/2$ knights, then exactly $2n-k-1$ questions are needed.