Ignorance in statistical mechanics

I wouldn't say the ignorance interpretation is a relic of the early days of statistical mechanics. It was first proposed by Edwin Jaynes in 1957 (see http://bayes.wustl.edu/etj/node1.html, papers 9 and 10, and also number 36 for a more detailed version of the argument) and proved controversial up until fairly recently. (Jaynes argued that the ignorance interpretation was implicit in the work of Gibbs, but Gibbs himself never spelt it out.) Until recently, most authors preferred an interpretation in which (for a classical system at least) the probabilities in statistical mechanics represented the fraction of time the system spends in each state, rather than the probability of it being in a particular state at the present time. This old interpretation makes it impossible to reason about transient behaviour using statistical mechanics, and this is ultimately what makes switching to the ignorance interpretation useful.

In response to your numbered points:

(i) I'll answer the "whose ignorance?" part first. The answer to this is "an experimenter with access to macroscopic measuring instruments that can measure, for example, pressure and temperature, but cannot determine the precise microscopic state of the system." If you knew precisely the underlying wavefunction of the system (together with the complete wavefunction of all the particles in the heat bath if there is one, along with the Hamiltonian for the combined system) then there would be no need to use statistical mechanics at all, because you could simply integrate the Schrödinger equation instead. The ignorance interpretation of statistical mechanics does not claim that Nature changes her behaviour depending on our ignorance; rather, it claims that statistical mechanics is a tool that is only useful in those cases where we have some ignorance about the underlying state or its time evolution. Given this, it doesn't really make sense to ask whether the ignorance interpretation can be confirmed experimentally.

(ii) I guess this depends on what you mean by "consistent with." If two people have different knowledge about a system then there's no reason in principle that they should agree on their predictions about its future behaviour. However, I can see one way to approach this question. I don't know how to express it in terms of density matrices (quantum mechanics isn't really my thing), so let's switch to a classical system. Alice and Bob both express their knowledge about the system as a probability density function over $x$, the set of possible states of the system (i.e. the vector of positions and velocities of each particle) at some particular time. Now, if there is no value of $x$ to which both Alice and Bob assign a positive probability density then they can be said to be inconsistent, since every state that Alice considers possible is ruled out by Bob, and vice versa. If such a value of $x$ does exist then Alice and Bob can both be "correct" in their state of knowledge if the system turns out to be in that particular state. I will continue this idea below.
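To make that consistency condition concrete, here is a minimal numerical sketch for a discretized state space (the particular distributions and the choice of four states are invented purely for illustration):

```python
import numpy as np

# Two states of knowledge over the same (discretized) set of microstates.
p_alice = np.array([0.0, 0.5, 0.5, 0.0])  # Alice: the system is in state 1 or 2
p_bob   = np.array([0.0, 0.0, 0.3, 0.7])  # Bob: the system is in state 2 or 3

# They are "consistent" in the sense above if at least one microstate
# receives positive probability from both of them.
consistent = np.any((p_alice > 0) & (p_bob > 0))
print(consistent)  # True: both allow state 2
```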

(iii) Again I don't really know how to convert this into the density matrix formalism, but in the classical version of statistical mechanics, a macroscopic ensemble assigns a probability (or a probability density) to every possible microscopic state, and this is what you use to determine how heavily represented a particular microstate is in a given ensemble. In the density matrix formalism the pure states are analogous to the microscopic states in the classical one. To get the weight of a particular pure state $|\psi\rangle$ out of a density matrix $\rho$ you use the projection operator onto that state: the weight is $\langle\psi|\rho|\psi\rangle = \mathrm{tr}(\rho\,|\psi\rangle\langle\psi|)$, and the principles are similar in both formalisms.
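A minimal numerical sketch of that recipe, for a spin-$\frac{1}{2}$ example of my own choosing:

```python
import numpy as np

# The weight that a density matrix rho assigns to a pure state |psi> is the
# expectation value of the projector onto that state: <psi|rho|psi> = tr(rho |psi><psi|).
up, down = np.array([1.0, 0.0]), np.array([0.0, 1.0])

rho = 0.5 * np.outer(up, up) + 0.5 * np.outer(down, down)  # maximally mixed spin-1/2

P_up = np.outer(up, up)        # projector |up><up|
print(np.trace(rho @ P_up))    # 0.5 = <up|rho|up>
```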

I agree that the measure you are looking for is $D_\textrm{KL}(A||B) = \sum_i p_A(i) \log \frac{p_A(i)}{p_B(i)}$. (I guess this is $\mathrm{tr}(\rho_A (\log \rho_A - \log \rho_B))$ in the density matrix case, which looks like what you wrote apart from a change of sign.) In the case where A is a pure state, this just gives $-\log p_B(i)$, the negative logarithm of the probability that Bob assigns to that particular pure state. In information theory terms, this can be interpreted as the "surprisal" of state $i$, i.e. the amount of information that must be supplied to Bob in order to convince him that state $i$ is indeed the correct one. If Bob considers state $i$ to be unlikely then he will be very surprised to discover it is the correct one.

If B assigns zero probability to state $i$ then the measure will diverge to infinity, meaning that Bob would take an infinite amount of convincing in order to accept something that he was absolutely certain was false. If A is a mixed state, this will happen as long as A assigns a positive probability to any state to which B assigns zero probability. If A and B are the same then this measure will be 0. Therefore the measure $D_\textrm{KL}(A||B)$ can be seen as a measure of how "incompatible" two states of knowledge are. Since the KL divergence is asymmetric I guess you also have to consider $D_\textrm{KL}(B||A)$, which is something like the degree of implausibility of B from A's perspective.
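Here is a small numerical sketch of these properties in the discrete classical case (the distributions and the function name `kl` are mine): the pure-state case reduces to Bob's surprisal, the divergence blows up when Bob assigns zero probability to something A allows, and it vanishes when the two states of knowledge coincide.

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) = sum_i p_i log(p_i / q_i); infinite if q_i = 0 where p_i > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                      # convention: 0 * log 0 = 0
    if np.any(q[mask] == 0):
        return np.inf
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p_B = np.array([0.7, 0.2, 0.1, 0.0])  # Bob's state of knowledge
p_A = np.array([0.0, 0.0, 1.0, 0.0])  # A is "pure": all weight on state 2

print(kl(p_A, p_B))  # -log(0.1) ~= 2.30: Bob's surprisal at finding state 2
print(kl(p_B, p_A))  # inf: Bob allows states that A rules out completely
print(kl(p_B, p_B))  # 0.0: identical knowledge, zero divergence
```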

I'm aware that I've skipped over some things, as there was quite a lot to write and I don't have much time to do it. I'll be happy to expand it if any of it is unclear.

Edit (in reply to the edit at the end of the question): The answer to the question "When may (or may not) a microstate $\phi$ be regarded as a macrostate $\rho_0$ without affecting the predictability of the macroscopic observations?" is "basically never." I will address this in classical mechanics terms because it's easier for me to write in that language. Macrostates are probability distributions over microstates, so the only time a macrostate can behave in the same way as a microstate is if the macrostate happens to be a fully peaked probability distribution (with entropy 0, assigning $p=1$ to one microstate and $p=0$ to the rest) and remains that way throughout the time evolution.

You write in a comment "if I have a definite penny on my desk with a definite temperature, how can it have several different pure states?" But (at least in Jaynes' version of the MaxEnt interpretation of statistical mechanics), the temperature is not a property of the microstate but of the macrostate. Its reciprocal is the partial derivative of the entropy with respect to the internal energy, $1/T = \partial S/\partial U$ (with $k_\mathrm{B}=1$). Essentially what you're doing is (1) finding the macrostate with the maximum (information) entropy compatible with the internal energy being equal to $U$, then (2) finding the macrostate with the maximum entropy compatible with the internal energy being equal to $U+dU$, then (3) taking the difference and dividing by $dU$. When you're talking about microstates instead of macrostates the entropy is always 0 (precisely because you have no ignorance) and so it makes no sense to do this.
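As a concrete illustration of steps (1)-(3), here is a sketch for a toy system with three energy levels of my own choosing. It uses the standard fact that the maximum-entropy distribution at fixed mean energy is the Boltzmann distribution $p_i \propto e^{-\beta E_i}$; the function name and numerical details are mine.

```python
import numpy as np

E = np.array([0.0, 1.0, 2.0])  # toy energy levels (arbitrary choice)

def maxent_entropy(U):
    # Find beta such that the Boltzmann mean energy equals U (simple bisection;
    # the mean energy is a decreasing function of beta for these levels).
    lo, hi = -50.0, 50.0
    for _ in range(200):
        beta = 0.5 * (lo + hi)
        p = np.exp(-beta * E)
        p /= p.sum()
        if p @ E > U:
            lo = beta          # mean energy still too high -> need larger beta
        else:
            hi = beta
    return -np.sum(p * np.log(p))  # information entropy of the maxent macrostate

U, dU = 0.6, 1e-4
inv_T = (maxent_entropy(U + dU) - maxent_entropy(U)) / dU  # dS/dU = 1/T (k_B = 1)
print(1.0 / inv_T)  # the temperature assigned to the macrostate with <E> = U
```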

Now you might want to say something like "but if my penny does have a definite pure state that I happen to be ignorant of, then surely it would behave in exactly the same way if I did know that pure state." This is true, but if you knew precisely the pure state then you would (in principle) no longer have any need to use temperature in your calculations, because you would (in principle) be able to calculate precisely the fluxes in and out of the penny, and hence you'd be able to give exact answers to the questions that statistical mechanics can only answer statistically.

Of course, you would only be able to calculate the penny's future behaviour over very short time scales, because the penny is in contact with your desk, whose precise quantum state you (presumably) do not know. You would therefore have to replace your pure-state description of the penny with a mixed one pretty rapidly. The fact that this happens is one reason why you can't in general simply replace the mixed state with a single "most representative" pure state and use the evolution of that pure state to predict the future evolution of the system.

Edit 2: the classical versus quantum cases. (This edit is the result of a long conversation with Arnold Neumaier in chat, linked in the question.)

In most of the above I've been talking about the classical case, in which a microstate is something like a big vector containing the positions and velocities of every particle, and a macrostate is simply a probability distribution over a set of possible microstates. Systems are conceived of as having a definite microstate, but the practicalities of macroscopic measurements mean that for all but the simplest systems we cannot know what it is, and hence we model it statistically.

In this classical case, Jaynes' arguments are (to my mind) pretty much unassailable: if we lived in a classical world, we would have no practical way to know precisely the position and velocity of every particle in a system like a penny on a desk, and so we would need some kind of calculus to allow us to make predictions about the system's behaviour in spite of our ignorance. When one examines what an optimal such calculus would look like, one arrives precisely at the mathematical framework of statistical mechanics (Boltzmann distributions and all the rest). By considering how one's ignorance about a system can change over time one arrives at results that (it seems to me at least) would be impossible to state, let alone derive, in the traditional frequentist interpretation. The fluctuation theorem is an example of such a result.

In a classical world there would be no reason in principle why we couldn't know the precise microstate of a penny (along with that of anything it's in contact with). The only reasons for not knowing it are practical ones. If we could overcome such issues then we could predict the microstate's time-evolution precisely. Such predictions could be made without reference to concepts such as entropy and temperature. In Jaynes' view at least, these are purely macroscopic concepts and don't strictly have meaning on the microscopic level. The temperature of your penny is determined both by Nature and by what you are able to measure about Nature (which depends on the equipment you have available). If you could measure the (classical) microstate in enough detail then you would be able to see which particles had the highest velocities and thus be able to extract work via a Maxwell's demon type of apparatus. Effectively you would be partitioning the penny into two subsystems, one containing the high-energy particles and one containing the lower-energy ones; these two systems would effectively have different temperatures.

My feeling is that all of this should carry over on to the quantum level without difficulty, and indeed Jaynes presented much of his work in terms of the density matrix rather than classical probability distributions. However there is a large and (I think it's fair to say) unresolved subtlety involved in the quantum case, which is the question of what really counts as a microstate for a quantum system.

One possibility is to say that the microstate of a quantum system is a pure state. This has a certain amount of appeal: pure states evolve deterministically like classical microstates, and the density matrix can be derived by considering probability distributions over pure states. However the problem with this is distinguishability: some information is lost when going from a probability distribution over pure states to a density matrix. For example, there is no experimentally distinguishable difference between the mixed states $\frac{1}{2}(\mid \uparrow \rangle \langle \uparrow \mid + \mid \downarrow \rangle \langle \downarrow \mid)$ and $\frac{1}{2}(\mid \leftarrow \rangle \langle \leftarrow \mid + \mid \rightarrow \rangle \langle \rightarrow \mid)$ for a spin-$\frac{1}{2}$ system. If one considers the microstate of a quantum system to be a pure state then one is committed to saying there is a difference between these two states, it's just that it's impossible to measure. This is a philosophically difficult position to maintain, as it's open to being attacked with Occam's razor.
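The indistinguishability claim is easy to check numerically; a minimal sketch (the basis conventions and variable names are mine):

```python
import numpy as np

# Build both mixtures for a spin-1/2 system and compare the density matrices.
up, down = np.array([1, 0], complex), np.array([0, 1], complex)
right = (up + down) / np.sqrt(2)   # x-basis states
left  = (up - down) / np.sqrt(2)

rho_z = 0.5 * (np.outer(up, up.conj())     + np.outer(down, down.conj()))
rho_x = 0.5 * (np.outer(left, left.conj()) + np.outer(right, right.conj()))

print(np.allclose(rho_z, rho_x))  # True: both equal I/2, so no measurement distinguishes them
```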

However, this is not the only possibility. Another possibility is to say that even pure quantum states represent our ignorance about some underlying, deeper level of physical reality. If one is willing to sacrifice locality then one can arrive at such a view by interpreting quantum states in terms of a non-local hidden variable theory.

Another possibility is to say that the probabilities one obtains from the density matrix do not represent our ignorance about any underlying microstate at all, but instead they represent our ignorance about the results of future measurements we might make on the system.

I'm not sure which of these possibilities I prefer. The point is just that on the philosophical level the ignorance interpretation is trickier in the quantum case than in the classical one. But in practical terms it makes very little difference - the results derived from the much clearer classical case can almost always be re-stated in terms of the density matrix with very little modification.


I'll complement @Nathaniel's answer with the fact that 'knowledge' can have physical implications linked to the behaviour of Nature. The problem goes back to Maxwell's demon, who converts his knowledge of the system into work. Recent works (like arXiv:0908.0424, The work value of information) show that the information-theoretic entropies quantifying our knowledge of the system are connected to the extractable work in the same way as the physical entropies are.

To sum all this up in a few words: "Nature [does not] change its behaviour depending on how much we ignore", but "how much we ignore" changes the amount of work we can extract from Nature.


When it comes to discussions of these matters, I make the following comment, which starts with a quotation from Landau and Lifshitz, book 5, chapter 5:

The averaging by means of the statistical matrix ... has a twofold nature. It comprises both the averaging due to the probabilistic nature of the quantum description (even when as complete as possible) and the statistical averaging necessitated by the incompleteness of our information concerning the object considered.... It must be borne in mind, however, that these constituents cannot be separated; the whole averaging procedure is carried out as a single operation, and cannot be represented as the result of successive averagings, one purely quantum-mechanical and the other purely statistical.

... and the following ...

It must be emphasized that the averaging over various $\psi$ states, which we have used in order to illustrate the transition from a complete to an incomplete quantum-mechanical description, has only a very formal significance. In particular, it would be quite incorrect to suppose that the description by means of the density matrix signifies that the subsystem can be in various $\psi$ states with various probabilities and that the average is over these probabilities. Such a treatment would be in conflict with the basic principles of quantum mechanics.


So we have two statements:

Statement A: You cannot "untie" quantum mechanical and statistical uncertainty in the density matrix.
(This is just a restatement of the quotations above.)

Statement B: Quantum mechanical uncertainty cannot be expressed in terms of mere "ignorance" about a system.
(I'm sure that this is self-evident from all that we know about quantum mechanics.)

Finally:
Therefore: The uncertainty in the density matrix cannot be expressed in terms of mere "ignorance" about a system.