Can a single classical particle have any entropy?
First of all, we must distinguish between two things that are called entropy. There's a microscopic entropy, also called Shannon entropy, which is a functional on the probability distributions you can assign to a given system:
$\displaystyle H[p] = -\sum_{x \in \mathcal{X}}\; p(x) \log(p(x))$
where $\mathcal{X}$ is the set in which your variable $x$ takes values. And there's a "macroscopic entropy", which is merely the value of the functional above evaluated on a specific family of distributions parametrized by some variable $\theta$:
$S(\theta)=-\sum_{x \in \mathcal{X}}\; p(x|\theta) \log(p(x|\theta))$
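To make the distinction concrete, here is a minimal Python sketch of $H[p]$ as a functional: it takes a whole distribution and returns a single number (the two example distributions are arbitrary illustrations):

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy H[p] = -sum_x p(x) log p(x), in nats."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                              # take 0 log 0 = 0
    return -np.sum(p * np.log(p))

# H is a functional: it assigns one real number to each distribution.
print(shannon_entropy([0.5, 0.5]))            # log(2) ~ 0.693
print(shannon_entropy([0.9, 0.05, 0.05]))     # smaller: less uncertainty
```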
Now, what happens in thermodynamics and equilibrium statistical physics is that you have a specific family of distributions to substitute in the first expression: the Gibbs equilibrium distribution:
$p(x | V, T, N) = \frac{1}{Z}e^{-\frac{E(x)}{T}}$
where, as an example, we have as parameters the volume, temperature and number of particles, and $E(x)$ is the energy of the specific configuration $x$. If you substitute this specific family of distributions into $H[p]$, what you get is the thermodynamic equilibrium entropy, and this is what physicists usually call entropy: a state function depending on the parameters of the Gibbs distribution (as opposed to a functional that associates a real value with each possible choice of distribution). Now, to find the appropriate physical equilibrium for this system when those parameters are allowed to vary, you must maximize this entropy (1).
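To see this "macroscopic entropy" in action, here is a toy evaluation of $H[p]$ on the Gibbs family (the three energy levels are an arbitrary assumption, not any particular physical system, and I set $k_B = 1$ as in the formula above); the resulting $S(T)$ grows with temperature towards $\log 3$:

```python
import numpy as np

def gibbs_entropy(energies, T):
    """Entropy of the Gibbs distribution p(x|T) = exp(-E(x)/T) / Z
    over a finite set of configurations, with k_B = 1."""
    E = np.asarray(energies, dtype=float)
    w = np.exp(-E / T)
    p = w / w.sum()                       # Z is just the normalization
    return -np.sum(p * np.log(p))

# Toy three-level "system" (the levels are an assumption for illustration).
levels = [0.0, 1.0, 2.0]
for T in [0.1, 1.0, 10.0]:
    print(T, gibbs_entropy(levels, T))    # S(T) increases towards log(3)
```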
Now here it's common to make the following distinction: $x$ is a microscopic variable that specifies the detailed configuration of the system, while $V$, $T$ and $N$ are macroscopic parameters. It doesn't need to be so. In the specific case of statistical physics, the origin of the distribution function is the fact that there are so many degrees of freedom that it's impossible (and even undesirable) to follow them all, so we settle for a statistical description. Under these assumptions it's natural to expect that the distribution would be over microscopic variables with macroscopic parameters. But this is not the only reason one might use a distribution function.
You could have other sources of ignorance. As an example, consider the following problem: suppose we recently discovered a new planet in a solar system that contains two other planets (4). Its position $\vec{x}$ and velocity $\vec{v}$ at a given instant $t = 0$ have been measured within some precision $\sigma_x$ and $\sigma_v$. Let's assume that the possible sources of measurement error are additive. Then it's reasonable to assume a Gaussian probability distribution for the position and velocity of the planet:
$\displaystyle p(\vec{x}(0), \vec{v}(0) \mid \sigma_x, \sigma_v) =\frac{1}{Z} \exp\left(-\frac{|\vec{x}(0)|^2}{2\sigma_x^2} -\frac{|\vec{v}(0)|^2}{2\sigma_v^2} \right)$
where $Z$ is a normalization constant (and the coordinates are taken relative to the measured values). Now suppose we want to predict this planet's position in the future, given the current positions of the other planets and their uncertainties. We would have a distribution:
$\displaystyle p(\vec{x}(t), \vec{v}(t) \mid \sigma_x, \sigma_v, \sigma_{x,i},\sigma_{v,i}) = \int d\vec{x}(0)\, d\vec{v}(0) \prod_{i=1}^{2} d\vec{x}_i(0)\, d\vec{v}_i(0)\; p(\vec{x}(0), \vec{v}(0) \mid \sigma_x, \sigma_v) \prod_{i=1}^{2} p(\vec{x}_i(0), \vec{v}_i(0) \mid \sigma_{x,i},\sigma_{v,i})\; p(\vec{x}(t), \vec{v}(t) \mid \vec{x}(0), \vec{v}(0),\vec{x}_1(0), \vec{v}_1(0), \vec{x}_2(0), \vec{v}_2(0))$
where $p(\vec{x}(t), \vec{v}(t) \mid \vec{x}(0), \vec{v}(0),\vec{x}_1(0), \vec{v}_1(0), \vec{x}_2(0), \vec{v}_2(0))$ would take Newton's equations of motion into account (since the dynamics is deterministic, it is concentrated on the trajectory determined by the initial conditions). Note that there's a small number of particles here: just 3. And the only source of "randomness" is the fact that I don't know the positions and velocities precisely (for a technological reason, not a fundamental one: I have limited telescopes, for example).
I can substitute this distribution into the definition of entropy and calculate a "macroscopic entropy" that depends on the time, on the measured data for the planets, and on the measurement precisions:
$\displaystyle S(t;\sigma_x,\sigma_v,\sigma_{x,i},\sigma_{v,i}) = - \int d\vec{x}\, d\vec{v}\; p(\vec{x}, \vec{v} \mid t, \sigma_x,\sigma_v,\sigma_{x,i},\sigma_{v,i}) \log \left[p(\vec{x}, \vec{v} \mid t, \sigma_x,\sigma_v,\sigma_{x,i},\sigma_{v,i})\right]$
What does this entropy mean? Something quite close to what thermodynamic entropy means! It is the logarithm of the volume of the region of configuration space where I expect to find the given planet at instant $t$ (2). And it's the entropy of a 'single particle'.
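Here is a minimal numerical sketch of the whole construction, with one big simplification so that the code stays short: instead of three mutually interacting planets, I take a single planet orbiting a star fixed at the origin with $GM = 1$ (all numbers here are assumptions chosen purely for illustration). I sample the uncertain initial conditions, push every sample through Newton's equations, and estimate the "macroscopic entropy" of the time-$t$ distribution under a Gaussian approximation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: one planet around a fixed star at the origin, GM = 1.
GM = 1.0
x0, v0 = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # "measured" values
sigma_x, sigma_v = 0.01, 0.01                          # measurement precisions

def accel(x):
    r = np.linalg.norm(x, axis=-1, keepdims=True)
    return -GM * x / r**3

def propagate(x, v, t, dt=1e-3):
    """Leapfrog integration of Newton's equations up to time t."""
    for _ in range(int(t / dt)):
        v = v + 0.5 * dt * accel(x)
        x = x + dt * v
        v = v + 0.5 * dt * accel(x)
    return x, v

# Sample the initial distribution p(x(0), v(0) | sigma_x, sigma_v) ...
n = 2000
X = x0 + sigma_x * rng.standard_normal((n, 2))
V = v0 + sigma_v * rng.standard_normal((n, 2))

# ... push every sample through the deterministic dynamics ...
Xt, Vt = propagate(X, V, t=5.0)

# ... and estimate the entropy of the time-t distribution under a Gaussian
# approximation: S ~ 0.5 * log((2*pi*e)^d * det(Cov)).
Z = np.hstack([Xt, Vt])
cov = np.cov(Z, rowvar=False)
d = Z.shape[1]
S = 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])
print("estimated phase-space entropy at t = 5:", S)
```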
There's no problem with that. I can even have situations where I must maximize this entropy! Suppose I don't know the position of planet 2, but I do know that all three planets have coplanar orbits. There are well-defined procedures in information and inference theory that tell me one way of dealing with this: find the value of $\vec{x}_2$ that maximizes the entropy, subject to the constraint that all orbits are in the same plane, and then substitute this value into the original distribution. This is often called the "principle of maximum ignorance".
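The planetary version is awkward to code compactly, so here is the same constrained-maximization machinery in its most generic form (the support $\{0,\dots,5\}$ and the mean constraint are arbitrary assumptions of mine): among all distributions compatible with the constraint, numerically pick the one of maximum entropy, which comes out in the exponential (Gibbs-like) family, exactly as the maximum entropy principle predicts:

```python
import numpy as np
from scipy.optimize import minimize

# Generic maximum-entropy illustration (not the planetary problem above):
# among all distributions on x = 0..5 with a prescribed mean, find the one
# with the largest entropy.
x = np.arange(6.0)
target_mean = 1.5            # the "constraint" (an arbitrary assumed value)

def neg_entropy(p):
    p = np.clip(p, 1e-12, None)
    return np.sum(p * np.log(p))

constraints = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},        # normalization
    {"type": "eq", "fun": lambda p: p @ x - target_mean},  # mean constraint
]
p0 = np.full(x.size, 1.0 / x.size)
res = minimize(neg_entropy, p0, bounds=[(0, 1)] * x.size,
               constraints=constraints)
print("max-entropy distribution:", np.round(res.x, 4))
# The optimum is (numerically) of the form p(x) proportional to exp(-lambda*x).
```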
There are interpretations of thermodynamics and statistical physics as an instance of this type of inference problem (please refer to the works of E. T. Jaynes; I'll give a list of references below). In this interpretation there's nothing special about the fact that you have many degrees of freedom, besides the fact that this is what makes you ignorant about the details of the system. This ignorance is what brings probabilities, entropies and maximum entropy principles to the table.
Rephrasing it a bit: probabilities and entropies are a part of your description when ignorance is built into your model. This ignorance could be a fundamental one - you can't know something about your system; a technical one - you could know it if you had better instruments; or even, as in the case of statistical physics, a deliberate one - you could know it, at least in principle, but you choose to leave the detail out because it isn't relevant at the scale you're interested in. But the details of how you use probabilities, entropies and maximum entropy principles are completely agnostic about the sources of your ignorance. They are a tool for dealing with ignorance, no matter the reasons why you are ignorant.
(1) For information-theoretic arguments for why we must maximize entropy in thermodynamics, please refer to E. T. Jaynes' famous book "Probability Theory: The Logic of Science" (3) and this series of articles:
Jaynes, E. T., 1957, Information Theory and Statistical Mechanics, Phys. Rev., 106, 620
Jaynes, E. T., 1957, Information Theory and Statistical Mechanics II, Phys. Rev., 108, 171.
Another interesting source:
Caticha, A., 2008, Lectures on Probability, Entropy and Statistical Physics, arXiv:0808.0012
(2) This can be given a rigorous meaning within information theory. For any distribution $p(x)$, let the set $A_\epsilon$ be defined as the smallest set of points with probability greater than $1 - \epsilon$. Then the size of this set must be of order:
$\log |A_\epsilon| = S + O(\epsilon)$
For another form of this result, see the book "Elements of Information Theory" by Cover and Thomas.
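For what it's worth, here is a quick numerical check of footnote (2) in its standard form for long i.i.d. sequences (the biased coin with $p = 0.3$ and $\epsilon = 0.05$ are arbitrary assumptions): the per-symbol log-size of the smallest set of sequences carrying probability at least $1 - \epsilon$ approaches the entropy as the sequence length grows:

```python
import numpy as np
from scipy.stats import binom
from scipy.special import gammaln, logsumexp

p, eps = 0.3, 0.05
H = -(p * np.log(p) + (1 - p) * np.log(1 - p))   # entropy per flip, in nats

for n in [10, 100, 1000]:
    k = np.arange(n + 1)
    # For p < 1/2 a sequence is more probable the fewer heads it has, so the
    # smallest covering set is "all sequences with at most k_star heads".
    k_star = np.searchsorted(binom.cdf(k, n, p), 1 - eps)
    log_binom = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
    log_size = logsumexp(log_binom[: k_star + 1])     # log |A_eps|
    print(n, "log|A_eps|/n =", round(log_size / n, 4), " H =", round(H, 4))
```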
(3) Some of Jaynes's rants about quantum theory in this book may look odd today, but let's excuse him; he made some mistakes too. Just focus on the probability theory, information theory and statistical physics material, which is quite amazing. :)
(4) It seems that dealing with this kind of problem in celestial mechanics was actually one of the first things that got Laplace interested in probabilities, and apparently he used it in his celestial mechanics calculations. The other problem that drew his attention towards probability theory was... gambling! Hahaha...
Entropy is a concept in thermodynamics and statistical physics, but its value only becomes indisputable if one can also talk about the system in thermodynamic terms.
To do so in statistical physics, one needs to be in the thermodynamic limit i.e. the number of degrees of freedom must be much greater than one. In fact, we can say that the thermodynamic limit requires the entropy to be much greater than one (times $k_B$, if you insist on SI units).
In the thermodynamic limit, the concept of entropy becomes independent of the chosen ensemble - microcanonical vs canonical etc. - up to corrections that are negligible relative to the overall entropy (of either of them).
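A quick way to see this numerically (a sketch, assuming $N$ independent two-level systems with levels $0$ and $1$, and $k_B = 1$): compare the canonical entropy per degree of freedom at temperature $T$ with the microcanonical entropy, the log of the number of microstates at the matching energy; the difference per degree of freedom shrinks as $N$ grows:

```python
import numpy as np
from scipy.special import gammaln

T = 1.0
f = 1.0 / (1.0 + np.exp(1.0 / T))      # mean occupation of the upper level
s_canonical = -(f * np.log(f) + (1 - f) * np.log(1 - f))   # per spin

for N in [10, 100, 10_000, 1_000_000]:
    M = int(round(f * N))              # number of excited spins at matching energy
    S_micro = gammaln(N + 1) - gammaln(M + 1) - gammaln(N - M + 1)  # log C(N, M)
    print(N, "canonical - microcanonical, per spin:", s_canonical - S_micro / N)
```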
A single particle, much like any system, may be assigned an entropy of $\ln(N)$, where $N$ is the number of physically distinct but de facto indistinguishable states in which the particle may be found. So if the particle is located in a box and its wave function may be written as a combination of $N$ small wave packets occupying appropriately large volumes, the entropy will be $\ln(N)$.
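In code, this counting is nothing but the entropy of the uniform distribution over the $N$ effectively indistinguishable states (the value $N = 8$ is an arbitrary choice):

```python
import numpy as np

N = 8
p = np.full(N, 1.0 / N)          # no further information: uniform over N states
S = -np.sum(p * np.log(p))
print(S, np.log(N))              # both equal ln(8) ~ 2.079
```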
However, the concept of entropy is simply not a high-precision concept for systems away from the thermodynamic limit. Entropy is not a strict function of the "pure state" of the system; if you want to be precise about the value, it also depends on the exact ensemble of the other microstates that you consider indistinguishable.
If you consider larger systems with $N$ particles, the entropy usually scales like $N$, so each particle contributes something comparable to 1 bit to the entropy, if you divide the entropy equally among them. However, to calculate the actual coefficients, all the conceivable interactions between the particles etc. matter.
The concept of entropy is very difficult because of the following day-to-day fact: when we have a macroscopic mechanical system, we can look at the system all the time, and know exactly where everything is. In such a situation, we know what each particle is doing at all times, the evolution is deterministic, and the concept of entropy is meaningless.
But the process of looking at particles to find out where they are always produces entropy. Acquiring information about the positions of molecules cannot be done in a way that decreases the entropy of the particles plus the measuring devices. This is an important point, and it can be proven easily from Liouville's theorem. If you start off ignorant of the position of a particle, it occupies some phase space volume. The only way to shrink that volume is to couple trajectories, so that you correlate the trajectory of the atoms in a measuring device with the trajectory of the particle. You can do this by adding an interaction Hamiltonian, and this will reduce the phase space volume of the particle given the measuring-device trajectory, but the total phase space volume is conserved in the process, so there is uncertainty in the measuring-device trajectories which more than compensates for the loss of uncertainty in the position of the particle.
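Here is a minimal Gaussian sketch of that bookkeeping, under assumptions of my own choosing: one particle $(x, p)$, one pointer $(X, P)$, and the linear flow generated by an interaction Hamiltonian $H_{\rm int} = g\,xP$ integrated for a time $\tau$. The Jacobian of the flow has unit determinant (Liouville), so the total Gaussian entropy is unchanged, while the conditional entropy of $x$ given the pointer reading drops and the pointer's own entropy rises to compensate:

```python
import numpy as np

def gauss_entropy(cov):
    """Differential entropy (nats) of a Gaussian with covariance matrix cov."""
    cov = np.atleast_2d(cov)
    d = cov.shape[0]
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

# Phase-space variables ordered (x, p, X, P): particle position/momentum and
# pointer position/momentum. The numbers and the x*P coupling are assumptions
# chosen to keep the bookkeeping transparent.
sigma = np.diag([1.0, 1.0, 0.01, 0.01]) ** 2   # particle uncertain, pointer sharp

# Flow generated by H_int = g*x*P for time tau: X picks up g*tau*x,
# p picks up -g*tau*P; x and P are unchanged.
g_tau = 5.0
A = np.array([[1, 0, 0, 0],
              [0, 1, 0, -g_tau],
              [g_tau, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
sigma_after = A @ sigma @ A.T

print("det A =", np.linalg.det(A))                          # = 1: Liouville
print("total entropy before:", gauss_entropy(sigma))
print("total entropy after: ", gauss_entropy(sigma_after))  # unchanged

def conditional_var(cov, i, j):
    """Variance of variable i given variable j (jointly Gaussian)."""
    return cov[i, i] - cov[i, j] ** 2 / cov[j, j]

# The coupling sharpens our knowledge of x given the pointer reading X,
# but the entropy of the pointer itself grows to pay for it.
print("H(x) before:           ", gauss_entropy(sigma[:1, :1]))
print("H(x | X) after:        ", gauss_entropy(conditional_var(sigma_after, 0, 2)))
print("pointer entropy before:", gauss_entropy(sigma[2:, 2:]))
print("pointer entropy after: ", gauss_entropy(sigma_after[2:, 2:]))
```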
The conservation of phase space probability volume is counterintuitive, because we have the intuition that looking at particles classically doesn't disturb them. In fact, if you bounce very weak classical EM radiation off the particles, you can see them without disturbing them. But this is because classical fields do not have a thermal equilibrium, and when they are near zero over all space, they are infinitely cold. So what you are doing is dumping the entropy of the particles into the infinite zero-temperature reservoir provided by the field, and extracting the position information from the field in the process.
If you put the field on a lattice, to avoid the classical Rayleigh-Jeans divergence in thermal equilibrium, then you can define a thermal state for the classical field. If the field is in this thermal state, it gives you no information about the particle positions. If you add a little bit of non-thermal field to measure the particles with, the interaction with the particles dumps the phase space volume of the original uncertainty in the particles' positions into the field, at a finite entropy cost per bit acquired, just by Liouville's theorem.
The entropy is a well-defined classical quantity, even for a single particle. When you have no information about the particle's position, but you know its energy, the entropy is given by the information-theoretic integral $-\int \rho\log\rho$. You can extract as much information as you want about the particle by measuring its position more and more accurately, but this process will always dump an equal amount of entropy into the measuring device. All this follows from Liouville's theorem.
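As a last tiny illustration (1-D and position only, a deliberate simplification of the statement above): the differential entropy of a particle known only to lie in a box of length $L$ is $\log L$, and every halving of that uncertainty removes $\log 2$ from the particle, which by the argument above must show up in the measuring device:

```python
import numpy as np

# -integral of rho*log(rho) for rho = 1/L uniform on a box of length L is log(L).
def box_entropy(L):
    return np.log(L)

L = 1.0
for k in range(4):
    print(f"position known to within L/{2**k}: S = {box_entropy(L / 2**k):.3f}")
# Each halving removes log(2) ~ 0.693 nats from the particle; by Liouville's
# theorem at least that much entropy must appear in the measuring device.
```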
This is the reason that entropy is often confusing. When discussing entropy, you need to take into account what you know about the system, much as in quantum mechanics.