[Crypto] Why are $\lceil 1/\operatorname{entropy-per-bit} \rceil$ number of bits not sufficient to generate an unbiased bit?
Solution 1:
Entropy and bias are not the same.
Yes, total entropy is additive so as you suggest 7 bits of badrand()
produce a total of 1.064 bits of entropy. So? How would you use that? In cryptography we aim to use some source to produce a stream of independently and identically distributed random bits.
Assume a plaintext ($p$) of octets XORed with an octet keystream ($k$) to generate ciphertext ($c$) as $c = p \oplus k$. Where would you get 8 bits of $k$ from? We would need to use up at least $\frac{8}{0.152} \approx 53$ bits from badrand(). So the issue becomes: how do you transform those 53 bits into 8 bits usable as an unbiased keystream?
We could use a randomness extractor such as a 2-universal hash function. That can be as simple as vector multiplication. The maths (the Leftover Hash Lemma) governing the transformation tells us that the output bias is $\epsilon_{hash} = 2^{-(sn - k)/2}$, where $s$ is the entropy per input bit, $n$ the number of input bits consumed and $k$ the number of output bits. In your case, $\epsilon_{hash} = 2^{-(0.152 \times 53 - 8)/2} \approx 0.98$. Not much improvement over the raw bit stream! The maths only works if $n$ is increased to >849, which then gives you a small enough bias for $k$ to be cryptographically useful.
As stated in the linked paper, there is a large entropy loss as an unavoidable(?) consequence of the transformation. The loss can be reduced and the extraction made more efficient, but nothing comes entirely for free.
Speaking of entropy loss: the most important factor that can reduce the entropy loss is the output size of the extractor function. Your example uses 8 bits, so the entropy loss is just over 99%. That's bad. Imagine using SHA-512 instead of an 8-column-wide vector extractor. You'd hash 4211 bits to produce 512. The loss is now 88%.
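For concreteness, here is a small Python sketch of these Leftover Hash Lemma figures. The source model ($P(1)=0.9$, hence $s=-\log_2 0.9\approx0.152$) and the $2^{-64}$ target bias are assumptions chosen to match the numbers quoted above and in the next answer:

```python
# Leftover Hash Lemma bound: bias <= 2^(-(s*n - k)/2) when n biased input bits
# with entropy rate s are hashed down to k output bits by a 2-universal hash.
# Assumption: each badrand() bit is 1 with probability 0.9, so
# s = -log2(0.9) ~= 0.152 bits of (min-)entropy per input bit.
from math import ceil, log2

s = -log2(0.9)

def lhl_bias(n, k):
    """Bias bound after hashing n biased input bits down to k output bits."""
    return 2.0 ** (-(s * n - k) / 2)

def input_bits_needed(k, sec=64):
    """Smallest n for which lhl_bias(n, k) <= 2**-sec."""
    return ceil((k + 2 * sec) / s)

print(round(lhl_bias(53, 8), 2))    # 0.98: hashing 53 bits down to 8 barely helps
for k in (1, 8, 512):
    n = input_bits_needed(k)
    print(k, n, f"{100 * (1 - k / n):.1f}% of the raw bits lost")
# k=1   -> n=849  biased bits per output bit
# k=8   -> n=895  (> 849), entropy loss just over 99%
# k=512 -> n=4211 (the SHA-512 example), loss ~88%
```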
Solution 2:
The other answer actually says that you need 2527 bits of input for each 256 bits of hash output. That's 9.9 bits per bit, not much worse than 7.
The same calculation gives 849 input bits per output bit if you use a 1-bit hash, but that doesn't mean there's no way to produce a first output bit with fewer than 849 input bits. The hashing approach isn't provably optimal for every use case, it's just a good choice for many use cases.
A simple approach that uses fewer than 849 bits is the classic method of reading a pair of bits, returning the first one if they're unequal, or discarding them and trying again if they're equal. Once you've retried enough times that the probability you've reached this iteration is small enough, you can just return 0 (or 1 if you prefer), making the worst-case bit usage bounded. This needs an average of about 11 bits/bit, and a maximum of... $2\lceil\log_{0.9^2}\sqrt{2^{-64}\ln(2)/4}\rceil = 220$, I think. This isn't optimal either; it just shows that cheaper methods exist when you need a bit ASAP.
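A minimal sketch of this pair-discarding extractor with the retry cap, where badrand() is simulated as a bit that is 1 with probability 0.9 (an assumption, matching the other answers); the cap of 110 pairs corresponds to the 220-bit worst case above:

```python
# Classic pair-discarding (von Neumann) extractor with bounded retries.
import random

def badrand():
    """Stand-in for the biased source: returns 1 with probability 0.9."""
    return 1 if random.random() < 0.9 else 0

def unbiased_bit(max_pairs=110):
    """One (almost) unbiased bit, consuming at most 2 * max_pairs biased bits."""
    for _ in range(max_pairs):
        a, b = badrand(), badrand()
        if a != b:
            return a    # P(a=1, b=0) = P(a=0, b=1), so this bit is unbiased
    return 0            # reached only with negligible probability

print(sum(unbiased_bit() for _ in range(100_000)) / 100_000)   # ~0.5
```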
It's easy to prove the impossibility of producing a sufficiently unbiased output bit from 7 bits of input. The input is one of 128 discrete values, and the function has no other source of randomness, so it must deterministically return either 0 or 1 for each input. This is a subset-sum problem: you must find a subset of the 128 products $\{\prod S : S\in \{0.1,0.9\}^7\}$ (counted with multiplicity) that adds to a value in the acceptable probability range. If you multiply the elements of the set by $10^7$ so they're integers, they are all multiples of $9$ except for a single $1$, while the sum you're trying to reach is congruent to 5 mod 9. Thus, you can't do better than output bit probabilities of $0.5\pm 4\cdot10^{-7}$, and that's only if the subset-sum problem has a solution. The other answer says that an entropy of $1-2^{-64}$ bits/bit requires $P(x_i = 0)$ and $P(x_i = 1)$ to be $\approx 0.5 \pm 2^{-66}$. I think that's wrong and it should be $2^{-33}$, but both values are less than $4\cdot10^{-7}$.
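A quick check of the mod-9 step (scaling by $10^7$ turns each outcome probability $0.1^{7-k}\,0.9^k$ into the integer $9^k$, where $k$ is the number of 1 bits):

```python
# Residues mod 9 of the 128 scaled outcome probabilities vs. the scaled target 0.5.
from itertools import product

vals = [9 ** sum(bits) for bits in product((0, 1), repeat=7)]
print(sorted(set(v % 9 for v in vals)))   # [0, 1]: one value is 1, the rest are multiples of 9
print((5 * 10 ** 6) % 9)                  # 5: the target 0.5 * 10^7 is unreachable exactly
```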
An interesting question to which I don't know the answer is the smallest number of input bits for which you can produce a sufficiently unbiased output using this method, but an extension of this argument shows that it has to be at least 11 (or 21 if I'm wrong about the $2^{-33}$), making this less efficient in the long run than the 256-bit hashing method.
Solution 3:
I will take a slightly different approach, a side step to a simpler problem to gain intuition. Say we have a fair 6-sided die and we wish to draw a number uniformly from 1 to 4. It can't be done with a single die roll, even though that single roll has enough entropy, more than the 2 bits we need: to map the six outcomes onto the numbers 1-4 we must map multiple outcomes to the same output, and we cannot allocate the same number of original outcomes to each. With 2 such die rolls there is no problem: we can easily extract a single unbiased bit from each roll and combine them into a uniformly random number 1-4.
If we want a number between 1 and 5 uniformly, it becomes more difficult. With a fixed number of die rolls the space of outcomes has size $6^n$, which is never divisible by 5 for any $n$ (its prime decomposition only ever contains 2s and 3s). So we can either take a sufficiently large $n$ to make the bias as small as we want (but never 0), or, more commonly, use a simple algorithm with probabilistic runtime which may need an unbounded number of rolls: roll the die; if the outcome is 1-5, return it; if it is 6, repeat. This produces an unbiased result, but with some small probability it will run for an arbitrarily long time.
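A minimal Python sketch of both constructions, with `random.randint(1, 6)` standing in for the fair die:

```python
import random

def d6():
    """A fair six-sided die."""
    return random.randint(1, 6)

def uniform_1_to_4():
    """Two rolls: odd/even is a 50/50 split on a fair d6, giving one unbiased bit per roll."""
    return 1 + 2 * (d6() % 2) + (d6() % 2)

def uniform_1_to_5():
    """Rejection sampling: unbiased, but the number of rolls is unbounded."""
    while True:
        r = d6()
        if r <= 5:
            return r

print(uniform_1_to_4(), uniform_1_to_5())
```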
So having enough entropy isn't sufficient.
Solution 4:
Why are $\lceil 1/\operatorname{entropy-per-bit} \rceil$ number of bits not sufficient to generate an unbiased bit?
Because the question is formulated for producing just one (nearly) unbiased bit. For producing a large number of (nearly) unbiased bits, that many input bits per output bit would be enough.
Assume $n$ independent input bits $b_j$, each set with known probability exactly $\alpha$, thus $\operatorname{entropy-per-bit}=-\alpha\log_2(\alpha)-(1-\alpha)\log_2(1-\alpha)$. The question is for $\alpha=0.9$, thus $\operatorname{entropy-per-bit}\approx0.468996$, and $\lceil 1/\operatorname{entropy-per-bit} \rceil=3$.
We can make an explicit algorithm generating $m$ bits from $n=\lceil(m+\ell)/\operatorname{entropy-per-bit}\rceil$ bits of the biased source, with advantage $\mathcal O(2^{-\ell})$ (including vanishing individual bit bias) for an adversary trying to distinguish the output from $m$ uniform random bits.
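As a quick numeric illustration of these two formulas (the choice $m=256$, $\ell=64$ below is just an example, not from the question):

```python
# Shannon entropy per biased input bit, and the input size n = ceil((m + l) / H).
from math import ceil, log2

alpha = 0.9
H = -alpha * log2(alpha) - (1 - alpha) * log2(1 - alpha)
print(round(H, 6), ceil(1 / H))      # 0.468996, 3

m, l = 256, 64                       # example output size and security margin
print(ceil((m + l) / H))             # 683 biased input bits
```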
One of the simplest such algorithms, generating $m$ bits from $n$, goes:
- $x\gets0$ and $y\gets1$ (these are real values, or rationals when $\alpha$ is rational)
- for each of the $n$ input bits $b_j$
  - if $b_j=0$, then $y\gets\alpha\,x+(1-\alpha)\,y$, else $x\gets\alpha\,x+(1-\alpha)\,y$
- output the $m$ bits of the binary representation of $\lfloor x\,2^m\rfloor$.
This is a so-called arithmetic coder. There are slightly more complex variants that most of the time output some bits before the end. There are other variants that use bounded memory (here, we need storage proportional to $m$), at the expense of a small bias.
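Here is a minimal Python sketch of this extractor, using exact rationals via `fractions.Fraction` so there is no rounding; the simulated source and the parameters $m=256$, $n=683$ (from the example above) are illustrative assumptions:

```python
# Direct transcription of the arithmetic-coder extractor, with alpha = 9/10 exact.
from fractions import Fraction
import random

ALPHA = Fraction(9, 10)

def extract(bits, m):
    """Map biased input bits (P(b=1) = ALPHA) to the m bits of floor(x * 2^m)."""
    x, y = Fraction(0), Fraction(1)
    for b in bits:
        if b == 0:
            y = ALPHA * x + (1 - ALPHA) * y
        else:
            x = ALPHA * x + (1 - ALPHA) * y
    return int(x * 2 ** m)     # floor, since x >= 0

# Illustrative use: n = 683 biased bits -> m = 256 nearly uniform bits.
biased = [1 if random.random() < 0.9 else 0 for _ in range(683)]
print(format(extract(biased, 256), "0256b"))
```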
Back to the problem of generating a single output bit, as unbiased as possible, from a fixed number $n\ge1$ of input bits.
There are $2^n$ possible values of the $n$ input bits. For any such $n$-bit bitstring, denote by $i$ the corresponding integer per the big-endian binary convention (thus $0\le i<2^n$), and by $\|i\|=k$ the number of ones in $i$ (thus $0\le k\le n$). Value $i$ has probability $p_i=(1-\alpha)^{n-k}\,\alpha^k$. There are $n\choose k$ values $i$ sharing the same probability, which we note $q_k=(1-\alpha)^{n-k}\,\alpha^k$, and correspondingly $1=\sum_k{n\choose k}\,q_k$. Recall $n\choose k$ is given by Pascal's triangle.
For each of the $2^n$ possible values $i$, we can decide if it will output a $0$ or a $1$. The probability of a $1$ at the output is the sum of the $p_i$ for the $i$ we decide will output a $1$. That's $2^{(2^n)}$ possible assignments, which quickly becomes too much to explore. However, we can make simplifications:
- The only thing that matters to the final probability is how many $i$ with a given $\|i\|=k$ output a $1$. That's an integer $m_k\in[0,{n\choose k}]$, and the final probability of a $1$ is $\sum_k m_k\,q_k$.
- In the search for the $m_k$ leading to $p$ closest to $1/2$, we can force $m_n=1$ (meaning that if all the input bits are set, that is $i=2^n-1$, the output will be $1$). That's because changing all the $m_k$ to $m'_k={n\choose k}-m_k$ will change the probability of a $1$ from $p$ to $p'=1-p$, leaving the bias from $1/2$ unchanged in absolute value.
For example, with $n=2$, we can have $m_0\in\{0,1\}$, $m_1\in\{0,1,2\}$, for a total of only $2\times3=6$ possibilities of the outcome as a function of the two input bits $b_0$ and $b_1$. We show the corresponding probability $p$ that the output is $1$, and the value of $\alpha$ for $p=1/2$, if any. $$\begin{array}{cc|cccc|c|l} &&0&0&1&1&b_0\\ &&0&1&0&1&b_1\\ \hline m_0&m_1&&&&&p&\alpha\text{ for }p=1/2\\ \hline 0&0&0&0&0&1&\alpha^2&1/\sqrt2\\ 0&1&0&0&1&1&\alpha&1/2\\ 0&2&0&1&1&1&2\alpha-\alpha^2&1-1/\sqrt2\\ 1&0&1&0&0&1&1-2\alpha+2\alpha^2&1/2\\ 1&1&1&0&1&1&1-\alpha+\alpha^2\\ 1&2&1&1&1&1&1\\ \hline &&0&1&1&2&k\\ \end{array}$$
I don't know where the question's figure of $n=849$ bits for $2^{-64}$ bias and $\alpha=0.9$ exactly comes from, but it's much too high. With $n=6$ we can't get better than $p=0.4685\ldots$, but with $n=7$, $m_0=1$, $m_1=3$, $m_2=6$, $m_3=0$, $m_4=6$, $m_5=3$, $m_6=0$, $m_7=1$ gets $p=0.4999996$, and I think we get one extra decimal (a little over 3 bits) for each increment of $n$.
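A quick exact check of this $n=7$ assignment with rational arithmetic:

```python
# Verify the m_k assignment above: p = sum_k m_k * (1-alpha)^(n-k) * alpha^k.
from fractions import Fraction
from math import comb

alpha, n = Fraction(9, 10), 7
m = {0: 1, 1: 3, 2: 6, 3: 0, 4: 6, 5: 3, 6: 0, 7: 1}

assert all(0 <= m[k] <= comb(n, k) for k in m)       # each m_k is feasible
p = sum(m[k] * (1 - alpha) ** (n - k) * alpha ** k for k in range(n + 1))
print(p, float(p))     # 1249999/2500000 = 0.4999996, i.e. 0.5 - 4*10^-7
```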
Pseudocode implementing this strategy goes
- $i\gets0$ and $k\gets0$
- for each of the $n=7$ input bits $b_j$
  - $i\gets2i+b_j$
  - $k\gets k+b_j$
- if $k=7$, return $1$;
- if $k=5$ and $i\le\mathtt{0110111_b}$, return $1$;
- if $k=4$ and $i\le\mathtt{0100111_b}$, return $1$;
- if $k=2$ and $i\le\mathtt{0001100_b}$, return $1$;
- if $k=1$ and $i\le\mathtt{0000100_b}$, return $1$;
- if $k=0$, return $1$;
- return $0$.
Note: the binary constant for $k$ is the ${m_k}^\text{th}$ integer with exactly $k$ bit(s) set.
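A runnable Python transcription of this pseudocode, plus a Monte Carlo sanity check with a simulated biased source (each bit set with probability $\alpha=0.9$, an assumption matching the setup above):

```python
import random

def extract_one_bit(bits):
    """bits: the n = 7 biased input bits b_j, in the order they are read."""
    i, k = 0, 0
    for b in bits:
        i = 2 * i + b      # i: big-endian integer value of the input bits
        k = k + b          # k: number of ones among the input bits
    if k == 7:
        return 1
    if k == 5 and i <= 0b0110111:
        return 1
    if k == 4 and i <= 0b0100111:
        return 1
    if k == 2 and i <= 0b0001100:
        return 1
    if k == 1 and i <= 0b0000100:
        return 1
    if k == 0:
        return 1
    return 0

trials = 10 ** 6
ones = sum(extract_one_bit([1 if random.random() < 0.9 else 0 for _ in range(7)])
           for _ in range(trials))
print(ones / trials)       # ~0.5 (exactly 0.4999996 in the limit)
```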
A use case of this algorithm is to generate one almost unbiased bit from $n=7$ throws of a 10-sided die with only one of the 10 sides marked. This is not a common setup, and correspondingly this algorithm is seldom used. That's because in practice we seldom know exactly the $\alpha$ of a biased source. In that case, the applied cryptographer feeds the input to a CSPRNG.