How Does Random Noise Typically Look?
I think people might be misinterpreting the question.
The easy fact that I think Gil has in mind is that if you randomly choose a quantum noise operation subject to certain properties — in particular, that the total error rate is very low — then for most such choices the errors will be highly correlated. The orthodox interpretation is that this is an artificial way to choose a quantum noise model.
One analogous classical question is as follows: If you randomly choose a stochastic map on the probability space of bit strings, with similar properties, what will it look like?
Gil proposes a randomly chosen permutation subject to the condition that at most $t$ bits are flipped. I am not sure that that is really a good analogy to the quantum noise model that he proposes.
The Poisson process that flips bits is a specific model of random noise on bit strings. It is not a randomly chosen model of noise.
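For concreteness, here is a minimal sketch of that specific model (in Python, with my own choices of $n$ and a per-bit flip rate $t$): each bit is flipped independently with probability $t$, which is the natural discrete-time stand-in for the Poisson bit-flip process.

```python
import numpy as np

rng = np.random.default_rng(0)

def independent_flip_noise(bits, t, rng):
    """Flip each bit independently with probability t.

    This is a *specific* noise model on bit strings (the discrete-time
    stand-in for the Poisson flipping process), not a randomly chosen one.
    """
    flips = rng.random(bits.shape) < t
    return bits ^ flips

n, t = 50, 0.05
x = np.zeros(n, dtype=int)                       # the all-zeros string
y = independent_flip_noise(x, t, rng)
print("number of flipped bits:", int(y.sum()))   # typically close to t*n = 2.5
```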
Gil has added the more specific question of why a random stochastic map conditioned on few errors will have the "ambush noise" property that he describes for a random unitary operator. There is a more complicated version, in which each bit separately has a bound on its error, and a simpler version, where you just demand that on average at most $tn$ of the $n$ bits are changed. I will look at the simpler version. As a first step, the different inputs to the stochastic map do not communicate in this question. Each bit string, for instance the string of 0s, is sent to an output according to a probability distribution which is itself chosen uniformly from the simplex of distributions on all $2^n$ bit strings. So I will concentrate on just what happens to the 0 string.
One remark is that the fate of just the 0 string is actually statistically identical whether you choose a random stochastic map classically or a random unitary operator quantumly. The first column of a random unitary operator is a random vector in $\mathbb{C}P^{N-1}$ (with $N = 2^n$ for us). The induced map from $\mathbb{C}P^{N-1}$ to the $(N-1)$-simplex of distributions is a toric moment map, and a famous fact (due to Archimedes for $\mathbb{C}P^1$) is that toric moment maps are measure-preserving. The difference between a random stochastic map and a random unitary first appears with pairwise correlations, and when $N$ is large only very-high-order correlations are significant. If Gil has an argument for random unitaries, this remark suggests that it also applies to random stochastic maps.
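As a quick numerical sanity check of this remark (a Python sketch; the dimension $N$ and sample size are arbitrary choices of mine), one can compare the squared amplitudes of a uniformly random unit vector in $\mathbb{C}^N$ — which is how the first column of a Haar-random unitary is distributed — with points drawn uniformly from the probability simplex:

```python
import numpy as np

rng = np.random.default_rng(1)
N, samples = 16, 200_000

# First column of a Haar-random unitary = a normalized complex Gaussian vector.
z = rng.normal(size=(samples, N)) + 1j * rng.normal(size=(samples, N))
p_quantum = np.abs(z) ** 2
p_quantum /= p_quantum.sum(axis=1, keepdims=True)

# Uniform measure on the (N-1)-simplex = Dirichlet(1, ..., 1).
p_classical = rng.dirichlet(np.ones(N), size=samples)

# A few statistics of the two ensembles; by the moment-map fact they should agree.
stats = {
    "mean of p_0": lambda p: p[:, 0].mean(),
    "mean of max_k p_k": lambda p: p.max(axis=1).mean(),
    "mean of sum_k p_k^2": lambda p: (p ** 2).sum(axis=1).mean(),
}
for name, stat in stats.items():
    print(f"{name}:  quantum {stat(p_quantum):.4f}   classical {stat(p_classical):.4f}")
```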
More directly, a distribution on bit strings induces a distribution on Hamming weights of bit strings. This is a linear map from a huge simplex $\Delta_{N-1}$ to the smaller simplex $\Delta_n$ whose corners are labelled by the Hamming weights $0,1,\ldots,n$. We are interested in the push-forward of uniform measure. Let $p_0,\ldots,p_n$ be barycentric coordinates on $\Delta_n$; $p_k$ is also just the probability that the random bit string has weight $k$. The push-forward of uniform measure on $\Delta_{N-1}$ then has density $f(p) \propto \prod_k p_k^{\binom{n}{k}-1}$ (the Dirichlet distribution with parameters $\binom{n}{0},\ldots,\binom{n}{n}$). So, in this induced measure on $\Delta_n$, there is an enormous statistical attraction toward the corners with middle values of $k$, i.e. weights $k \approx n/2$.
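Here is a small numerical check of this push-forward (a Python sketch; $n$ and the sample size are my own arbitrary choices): draw a point uniformly from $\Delta_{N-1}$, group its coordinates by Hamming weight, and compare the empirical mean of each $p_k$ with the value $\binom{n}{k}/2^n$ predicted by the Dirichlet description above.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(2)
n = 8
N = 2 ** n
samples = 20_000

weights = np.array([bin(s).count("1") for s in range(N)])   # Hamming weight of each bit string

# Uniform measure on the huge simplex Delta_{N-1} is Dirichlet(1, ..., 1).
q = rng.dirichlet(np.ones(N), size=samples)

# Push forward to Delta_n: p_k = total probability of the strings of weight k.
p = np.stack([q[:, weights == k].sum(axis=1) for k in range(n + 1)], axis=1)

# Under the Dirichlet(C(n,0), ..., C(n,n)) description, E[p_k] = C(n,k) / 2^n.
for k in range(n + 1):
    print(f"k = {k}:  empirical E[p_k] = {p[:, k].mean():.4f}   predicted = {comb(n, k) / N:.4f}")
```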
Now let's impose the restriction that the total error is at most $tn$, in other words that $$\sum_k kp_k \le tn.$$ This cuts the simplex $\Delta_n$ by a hyperplane. At this point I'll switch to a rough calculation. If $t \ll \frac12$ and you maximize the log of the probability density $\log f(p)$, the maximum puts as much of the probability as the constraint allows in the weights $k \approx n/2$: roughly a $2t$ fraction of the mass sits at weights near $n/2$ (saturating the budget, since $(n/2)\cdot 2t = tn$), with the remaining $1-2t$ near weight $0$. That's because the corresponding term in $\log f(p)$ is $\bigl(\binom{n}{k}-1\bigr)\log p_k$, and the logarithmic dependence on $p_k$ is outweighed by the sizes of the coefficients. There is also a geometric factor if you are close to a sharp corner of the cut simplex; however, after taking a logarithm this geometric factor is of order $n$, which is again much smaller than the binomial coefficients.
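To spell out that rough calculation (my own sketch; I drop the $-1$'s in the exponents, which are negligible here): maximizing $\sum_k \binom{n}{k}\log p_k$ subject to $\sum_k p_k = 1$ and $\sum_k k\,p_k = tn$ gives the Lagrange condition $$p_k = \frac{\binom{n}{k}}{\lambda + \mu k}.$$ The denominator varies only linearly in $k$ while $\binom{n}{k}$ varies exponentially, so whatever mass sits at weights $k \ge 1$ is still concentrated near $k \approx n/2$; if that mass is $\varepsilon$, the constraint forces $\varepsilon \cdot (n/2) \approx tn$, i.e. $\varepsilon \approx 2t$, with the remaining $1 - 2t$ at weight $0$. In other words, the typical low-error random noise is silent most of the time and catastrophic when it acts.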
I am still thinking about the original problem, but let me discuss the side problem of random strings.
In the infinite case the proper way to designate "random" is fairly similar to the "random tree" problem brought up recently: given a string length $n$, every possible string of that length occurs with the same asymptotic frequency. This is how "normal" is defined for real numbers, to capture the notion that their digits occur randomly.
This is problematic in the finite case, because the string 1111111111 occurs with the same probability as the string 1100101010. From a "pure" standpoint there appears to be nothing we can say, but in applied circumstances, with a large enough sample set, it is possible to get a notion of bias.
(A figure here plotted $n/p(n)$ against $n$; the original linked to the picture source and to more information on what I'm discussing.)
We would expect (letting $p(n)$ be the number of complete permutation strings — windows of ten consecutive digits containing each of $0$–$9$ exactly once — among the first $n$ digits of a string) that in base ten $n/p(n)$ would converge to $10^{10}/10! \approx 2755.7$, since any given window is such a permutation with probability $10!/10^{10}$.
The "synthesized" data in the graph above comes from the infamous Rand Corporation work A Million Random Digits, and does appear to have a sort of bias.
However, even in the case of a sequence where $n/p(n)$ does appear to converge (like the digits of $\pi$), for small $n$ it is impossible to determine anything from $n/p(n)$. This suggests to me that the noise-model problem can only be approached (if at all) in the large-$n$ regime.
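As a rough illustration of this statistic (a Python sketch with pseudorandom digits standing in for a "truly random" string; the lengths and the overlapping-window convention are my own assumptions), one can count the length-ten windows that contain every digit and watch $n/p(n)$ drift toward $10^{10}/10! \approx 2755.7$:

```python
import numpy as np

rng = np.random.default_rng(3)
digits = rng.integers(0, 10, size=1_000_000)     # pseudorandom base-10 digits

p = 0                                            # running count of "complete permutation" windows
checkpoints = {10_000, 100_000, 1_000_000}

for i in range(9, len(digits)):
    window = digits[i - 9:i + 1]
    if len(set(window.tolist())) == 10:          # all ten digits appear in a length-10 window
        p += 1
    n = i + 1
    if n in checkpoints:
        ratio = n / p if p else float("inf")
        print(f"n = {n:>9}:  p(n) = {p:>4},  n/p(n) = {ratio:.1f}")

print("expected limit: 10**10 / 10! =", 10 ** 10 / 3628800)
```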
I think there is a point of possible confusion about what it means to say that an error has occurred. When I think about a noise process on a quantum system, I think of the following. Given an initial state $\rho$ of the system, the noise process is a completely positive map $\mathcal{E}$ that produces the output state $\sigma = \mathcal{E}(\rho)$. So, unless the channel is the identity channel, there is always something nontrivial happening (for at least some choice of initial state $\rho$). So has an error occurred? In some sense, yes. Even if the channel has an operator-sum decomposition in which a large fraction of the support is on the identity, there is in general still an error.

I think the point of confusion is that for these types of channels, when you measure the state it will collapse into either a state with no error or a state with (potentially) lots of errors. So, yes, the errors are highly correlated. But the channel is trying to model some kind of conditional independence: upon measurement, you think of the channel as either having acted trivially, or having done something bad, such as applying a randomly sampled Pauli error. Conditioned on something bad happening, you have independence.
(For anyone unfamiliar with quantum channels, a classical model showing this kind of behavior would be a kind of "catastrophic loss" channel: you put all $n$ bits into the channel, and with probability $1-\epsilon$ they are faithfully transmitted, while with probability $\epsilon$ the whole bit string is thrown away and replaced with noise, e.g. a string sampled uniformly from the hypercube.)
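Here is a minimal simulation of that classical channel (a Python sketch; $n$, $\epsilon$, and the number of trials are my own choices). The average per-bit error rate is small, of order $\epsilon/2$, but conditioned on any error occurring at all, roughly half of the bits are wrong — exactly the highly correlated behavior described above.

```python
import numpy as np

rng = np.random.default_rng(4)
n, eps, trials = 100, 0.01, 200_000

x = np.zeros((trials, n), dtype=int)                  # send the all-zeros string each time
catastrophe = rng.random(trials) < eps                # with prob eps, dump the string and send noise
noise = rng.integers(0, 2, size=(trials, n))
y = np.where(catastrophe[:, None], noise, x)

errors = (y != x).sum(axis=1)                         # number of wrong bits per trial
print("average per-bit error rate:        ", errors.mean() / n)          # ~ eps/2
print("fraction of trials with any error: ", (errors > 0).mean())        # ~ eps
print("mean errors given at least one:    ", errors[errors > 0].mean())  # ~ n/2
```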
Also, following up on Gil's comment in Greg's answer, one obvious way to define a random stochastic map would be to just pick random probability distributions and make them the rows of a stochastic matrix. Picking a random distribution could be done by choosing your favorite measure on the simplex whose vertices are labeled by bit strings. You'd still have to decide how to choose the dependencies between rows. Even in the simple and natural case of uniform measure and independently chosen rows, it is not clear to me what this looks like.
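For what it's worth, here is that simple and natural case in code (a Python sketch under my own reading of the construction: each row drawn independently and uniformly from the simplex, i.e. Dirichlet$(1,\ldots,1)$ over the $2^n$ bit strings, for a small $n$). Looking at the row belonging to the all-zeros string, its mass over Hamming weights is already concentrated near $n/2$, in line with the binomial attraction toward middle weights discussed earlier in the thread.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 8
N = 2 ** n

# A random stochastic map: each row is an independent uniform point of the simplex,
# i.e. Dirichlet(1, ..., 1) over the 2^n bit strings.
S = rng.dirichlet(np.ones(N), size=N)        # S[x, y] = Pr(input string x -> output string y)

weights = np.array([bin(s).count("1") for s in range(N)])

# What happens to the all-zeros input: group its row by output Hamming weight.
row = S[0]
weight_dist = np.array([row[weights == k].sum() for k in range(n + 1)])
print("Pr(output has Hamming weight k), k = 0..n:")
print(np.round(weight_dist, 4))              # typically peaked around k = n/2, not near k = 0
```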