What's the physical meaning of the statement that "photons don't have positions"?
We could spend forever playing whac-a-mole with all of the confusing/confused statements that continue popping up on this subject, on PhysicsForums and elsewhere. Instead of doing that, I'll offer a general perspective that, for me at least, has been refreshingly clarifying.
I'll start by reviewing a general no-go result, which applies to all relativistic QFTs, not just to photons. Then I'll explain how the analogous question for electrons would be answered, and finally I'll extend the answer to photons. The reason for doing this in that order will probably be clear in hindsight.
A general no-go result
First, here's a review of the fundamental no-go result for relativistic QFT in flat spacetime:
In QFT, observables are associated with regions of spacetime (or just space, in the Schrödinger picture). This association is part of the definition of any given QFT.
In relativistic QFT, the Reeh-Schlieder theorem implies that an observable localized in a bounded region of spacetime cannot annihilate the vacuum state. Intuitively, this is because the vacuum state is entangled with respect to location.
Particles are defined relative to the vacuum state. By definition, the vacuum state has zero particles, so the Reeh-Schlieder theorem implies that an observable representing the number of particles in a given bounded region of spacetime cannot exist: if an observable is localized in a bounded region of spacetime, then it can't always register zero particles in the vacuum state.
That's the no-go result, and it's very general. It's not restricted to massless particles or to particles of helicity $\geq 1$. For example, it also applies to electrons. The no-go result says that we can't satisfy both requirements: in relativistic QFT, we can't have a detector that is both
perfectly reliable,
localized in a strictly bounded region.
But here's the important question: how close can we get to satisfying both of these requirements?
Warm-up: electrons
First consider the QFT of non-interacting electrons, with Lagrangian $L\sim \overline\psi(i\gamma\partial+m)\psi$. The question is about photons, and I'll get to that, but let's start with electrons because then we can use the electron mass $m$ to define a length scale $\hbar/mc$ to which other quantities can be compared.
To construct observables that count electrons, we can use the creation/annihilation operators. We know from QFT 101 how to construct creation/annihilation operators from the Dirac field operators $\psi(x)$, and we know that this relationship is non-local (and non-localizable) because of the function $\omega(\vec p) = (\vec p^2+m^2)^{1/2}$ in the integrand, as promised by Reeh-Schlieder.
However, for electrons with sufficiently low momentum, this function might as well be $\omega\approx m$. If we replace $\omega\to m$ in the integrand, then the relationship between the creation/annihilation operators and the field operators becomes local. Making this replacement changes the model from relativistic to non-relativistic, so the Reeh-Schlieder theorem no longer applies. That's why we can have electron-counting observables that satisfy both of the above requirements in the non-relativistic approximation.
Said another way: Observables associated with mutually spacelike regions are required to commute with each other (the microcausality requirement). The length scale $\hbar/mc$ is the scale over which commutators of our quasi-local detector-observables fall off with increasing spacelike separation. Since the non-zero tails of those commutators fall off exponentially with characteristic length $\hbar/mc$, we won't notice them in experiments that have low energy/low resolution compared to $\hbar/mc$.
Instead of compromising strict localization, we can compromise strict reliability instead: we can construct observables that are localized in a strictly bounded region and that almost annihilate the vacuum state. Such an observable represents a detector that is slightly noisy. The noise is again negligible for low-resolution detectors — that is, for detector-observables whose localization region is much larger than the scale $\hbar/mc$.
This is why non-relativistic few-particle quantum mechanics works — for electrons.
Photons
Now consider the QFT of the electromagnetic field by itself, which I'll call QEM. All of the observables in this model can be expressed in terms of the electric and magnetic field operators, and again we know from QFT 101 how to construct creation/annihilation operators that define what "photon" means in this model: they are the positive/negative frequency parts of the field operators. This relationship is manifestly non-local. We can see this from the explicit expression, but we can also anticipate it more generally: the definition of positive/negative frequency involves the infinite past/future, and thanks to the time-slice principle, this implies access to arbitrarily large spacelike regions.
In QEM, there is no characteristic scale analogous to $\hbar/mc$, because $m=0$. The ideas used above for electrons still work, except that the deviations from localization and/or reliability don't fall off exponentially with any characteristic scale. They fall off like a power of the distance instead.
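To get a feel for "exponential versus power-law" tails, here is a minimal numerical sketch. It is not the Dirac or Maxwell calculation: as a stand-in, it assumes a free real scalar field in 3+1 dimensions (units $\hbar=c=1$), whose equal-time vacuum two-point function is $m K_1(mr)/(4\pi^2 r)$ for mass $m$ and $1/(4\pi^2 r^2)$ for $m=0$. The function names are mine, chosen for illustration.

```python
import math

def bessel_k1(x, steps=20000, t_max=20.0):
    """Modified Bessel function K_1(x), x > 0, from the integral
    K_1(x) = int_0^inf exp(-x cosh t) cosh t dt (trapezoid rule)."""
    h = t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(t)
    return total * h

def massive_correlator(r, m):
    """Equal-time vacuum correlator of a free scalar of mass m at separation r."""
    return m * bessel_k1(m * r) / (4 * math.pi**2 * r)

def massless_correlator(r):
    """Same correlator for m = 0: a pure power law, with no length scale."""
    return 1.0 / (4 * math.pi**2 * r**2)

if __name__ == "__main__":
    m = 1.0  # so the Compton length 1/m is the unit of length
    for r in (0.01, 0.1, 1.0, 5.0, 10.0, 20.0):
        ratio = massive_correlator(r, m) / massless_correlator(r)
        print(f"r = {r:6.2f}   massive/massless = {ratio:.3e}")
```

At separations well below $1/m$ the two correlators agree; beyond it the massive one is suppressed roughly like $e^{-mr}$, while the massless one only ever falls off like a power of $r$.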
As far as this question is concerned, that's really the only difference between the electron case and the photon case. That's enough of a difference to prevent us from constructing a model for photons that is analogous to non-relativistic quantum mechanics for electrons, but it's not enough of a difference to prevent photon-detection observables from being both localized and reliable for most practical purposes. The larger we allow its localization region to be, the more reliable (less noisy) a photon detector can be. Our definition of how-good-is-good-enough needs to be based on something else besides QEM itself, because QEM doesn't have any characteristic length-scale of its own. That's not an obstacle to having relatively well-localized photon-observables in practice, because there's more to the real world than QEM.
Position operators
What is a position operator? Nothing that I said above refers to such a thing. Instead, everything I said above was expressed in terms of observables that represent particle detectors (or counters). I did that because the starting point was relativistic QFT, and QFT is expressed in terms of observables that are localized in bounded regions.
Actually, non-relativistic QM can also be expressed that way. Start with the traditional formulation in terms of the position operator $X$. (I'll consider only one dimension for simplicity.) This single operator $X$ is really just a convenient way of packaging-and-labeling a bunch of mutually-commuting projection operators, namely the operators $P(R)$ that project a wavefunction $\Psi(x)$ onto the part with $x\in R$, cutting off the parts with $x\notin R$. In fancy language, the commutative von Neumann algebra generated by $X$ is the same as the commutative von Neumann algebra generated by all of the $P(R)$s, so aside from how things are labeled with "eigenvalues," they both represent the same observable as far as Born's rule is concerned. If we look at how non-relativistic QM is derived from its relativistic roots, we see that the $P(R)$s are localized within the region $R$ by QFT's definition of "localized" — at least insofar as the non-relativistic approximation is valid. In this sense, non-relativistic single-particle QM is, like QFT, expressed in terms of observables associated with bounded regions of space. The traditional formulation of single-particle QM obscures this.
Here's the point: when we talk about a position operator for an electron in a non-relativistic model, we're implicitly talking about the projection operators $P(R)$, which are associated with bounded regions of space. The position operator $X$ is a neat way of packaging all of those projection operators and labeling them with a convenient spatial coordinate, so that we can use concise statistics like means and standard deviations, but you can't have $X$ without also having the projection operators $P(R)$, because the existence of the former implies the existence of the latter (through the spectral theorem, or through the von-Neumann-algebra fanciness that I mentioned above).
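The packaging of $X$ out of the $P(R)$s can be illustrated with a toy discretization. This is only a sketch under assumptions of my own: a single particle on a hypothetical 8-point grid, with wavefunctions represented as vectors, so that each $P(R)$ is a diagonal 0/1 matrix.

```python
import numpy as np

N = 8
x = np.linspace(-1.0, 1.0, N, endpoint=False)  # grid points -1, -0.75, ..., 0.75

def projector(mask):
    """P(R): keep the components of the wavefunction with x in R, zero the rest."""
    return np.diag(np.asarray(mask, dtype=float))

P_left = projector(x < 0)
P_right = projector(x >= 0)
P_mid = projector(np.abs(x) < 0.5)

# X is the single-point projectors, each labeled by its coordinate and summed.
X = sum(xi * projector(np.arange(N) == i) for i, xi in enumerate(x))

# A normalized toy wavefunction, and Born's rule for "found in region R".
psi = np.exp(-(x - 0.2) ** 2 / 0.1).astype(complex)
psi /= np.linalg.norm(psi)

def probability(P, psi):
    return float(np.real(np.vdot(psi, P @ psi)))

if __name__ == "__main__":
    print("P(left) + P(right) =", probability(P_left, psi) + probability(P_right, psi))
    print("<X> =", float(np.real(np.vdot(psi, X @ psi))))
```

All of the $P(R)$ commute with one another and with $X$, and the mean of $X$ is just a weighted summary of the region probabilities; nothing in $X$ goes beyond what the projectors already contain.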
So... can a photon have a position operator? If by position operator we mean something like the projection operators $P(R)$, which are both (1) localized in a strictly bounded region and (2) strictly reliable as "detectors" of things in that region, then the answer is no. A photon can't have a position operator for the same reason that a photon can't have a non-relativistic approximation: for a photon, there is no characteristic length scale analogous to $\hbar/mc$ to which the size of a localization region can be compared, without referring to something other than the electromagnetic field itself. What we can do is use the usual photon creation/annihilation operators to construct photon-detecting/counting observables that are not strictly localized in any bounded region but whose "tails" are negligible compared to anything else that we care about (outside of QEM), if the quasi-localization region is large enough.
What is a physical consequence?
What is a physical consequence of the non-existence of a strict position operator? Real localized detectors are necessarily noisy. The more localized they are, the noisier they must be. Reeh-Schlieder guarantees this, both for electrons and for photons, the main difference being that for electrons, the effect decreases exponentially as the size of the localization region is increased. For photons, it decreases only like a power of the size.
The statement "photons do not have a position operator" may mean different things depending on who you ask.
To me, this statement means something very specific: EM radiation does not consist of particles that could be observed at some point of space and could be described by a function $\psi(r_1,r_2,\ldots,r_N)$ in the sense of Born's interpretation. Instead, EM radiation itself is everywhere, and is properly described by a function of 3 spatial coordinates: the thing to be studied is the EM field, not some particles of light. The field can be a c-number or a q-number, but the point is that the entity to be described is a field, not any set of particles. This view means there are no actual "particles of radiation" flying around in hydrogen molecules, in contrast to electrons, of which there are two in every neutral hydrogen molecule.
"Particles of light" and "photons" are somewhat problematic terms, because there is no clear, universally adopted concept behind them. The originator of the word "photon" meant something very different from what the term has been used for since the end of the 1920s. Today, it is often meant as shorthand for "chunk of energy $hf$ transferred between matter and radiation of frequency $f$"; it may be distributed in some region of space but it is not localized at any single point of space.
Of course, one can go to the simple examples and talk about things such as "1 photon in mode (1,1,1,1), 2 photons in mode (2,2,2,2)" as a state of the EM field in a box, but these are states of the whole system; one cannot go and find some real things at some point of space within the box more precisely than "in the box".
When an optics experiment is done using a laser beam, it is perfectly meaningful to talk about photons being in the beam.
Usual laser light is well described by a classical EM wave with a definite electric field strength vector and wave vector. This means it does not have any definite number of photons in it; it is better described (if needed) as a coherent state. One can talk about photons in superposition, but then there is no definite number of photons of any definite kind there. The photons there are a mathematical fiction, spread from minus infinity to plus infinity.
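To illustrate "no definite number of photons", here is a small sketch that assumes the textbook single-mode coherent state, for which the photon-number probabilities are Poissonian, $P(n) = e^{-\bar n}\,\bar n^{\,n}/n!$. The mean $\bar n = 4$ and the cutoff at $n = 80$ are arbitrary illustrative choices.

```python
import math

def photon_number_distribution(mean_photons, n_max):
    """Poisson probabilities P(0..n_max) for a single-mode coherent state."""
    return [
        math.exp(-mean_photons) * mean_photons**n / math.factorial(n)
        for n in range(n_max + 1)
    ]

def mean_and_variance(probs):
    mean = sum(n * p for n, p in enumerate(probs))
    var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
    return mean, var

if __name__ == "__main__":
    probs = photon_number_distribution(4.0, 80)
    mean, var = mean_and_variance(probs)
    print("mean photon number:", mean)
    print("variance:", var)
    print("largest single probability:", max(probs))
```

The variance equals the mean, and no single photon number has a probability anywhere near 1, which is the sense in which such a state has no definite photon count.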
We can also speak of a photon being emitted by an atom, in which case it is obviously localized near the atom when the emission occurs.
Yes, but this region is huge: its size is greater than the wavelength of the emitted radiation. The claim is that it makes no sense to assign a position to that emitted radiation within this region.
Furthermore, in the usual analysis of the double slit experiment one has, at least implicitly, a wavefunction for the photon, which successfully recovers the high school result.
Yes, this is because diffraction at a slit can be roughly analyzed with simplified models such as diffraction of a scalar field. This does not necessarily mean that a wave function of photons is a useful concept in general problems of the interaction of light and matter. Try to describe spontaneous emission in terms of a "wave function of the photon".
Actually, notwithstanding the no-go result, there is a position vector for photons; but it is singular in much the same sense that spherical coordinates are singular.
The issue can be best addressed by looking at the Wigner classification - but within the framework of symplectic geometry, rather than Hilbert spaces.
The real meaning and import of the no-go theorem is that the Wigner class to which photons belong (which I term, below, the helical subfamily of the luxons, or the "helions") has no spin-orbit decomposition, so that the usual expressions for spin and position cannot be developed for helions. The symplectic geometry for the helion subclass shares many features in common with the symplectic geometry for magnetic monopoles (the latter of which is discussed in LNP 107), except that the roles of the (q,p) coordinates are reversed.
Like all symplectic geometries, the coordinates for a symplectic leaf pair off into (q,p) pairs, and the helions have 3 Darboux pairs, which can be arranged (with a little manipulation and adjustment) into the usual form $(\mathbf r,\mathbf P)$ for position and momentum. But unlike the Newton-Wigner position vector, $\mathbf r$ is singular when expressed as a function of $(\mathbf J,\mathbf K,\mathbf P,E)$ = (angular momentum, moving mass moment, momentum, energy). It has a coordinate singularity of the above-mentioned type.
The Wigner classes for the Poincaré group consist of the following:
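(0) Homogeneous classes (unnamed by Wigner) ($\mathbf P \equiv \mathbf 0$, $E \equiv 0$),
(1) Tardions ($P^2 < \alpha E^2$), where I will use $\alpha = 1/c^2$ here and in the following,
(2) Luxons ($P^2 = \alpha E^2$), with $\mathbf P \not\equiv \mathbf 0$,
(3) Tachyons ($P^2 > \alpha E^2$),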
where $\equiv$ refers to conditions that hold on the symplectic leaf that characterizes the given representation.
(Most of what I describe here and below, by the way, also applies to non-relativistic theory, by taking α = 0; except that the Luxons and Tachyons merge into a single unnamed family: the mass 0 representations for the Bargmann group - a class I named the "Synchrons". I also coined the term "Vacuon" for class (0).)
Over all classes, there are two invariants:
m² = M² − αP² = constant: mass shell constraint,
W² − αW₀² = constant: "spin/helicity shell" constraint
(the latter name being for lack of a better term),
where, for convenience, I will also use $M = \alpha E$ for "moving mass" here and below, and where
$(W_0,\mathbf W) = (\mathbf P\cdot\mathbf J,\; M\mathbf J + \mathbf P\times\mathbf K)$
is the Pauli-Lubanski vector. For tardions, the second invariant reduces to
W² − αW₀² = m² S² (tardions only)
where S is the spin; and there are decompositions for:
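Angular Momentum (Spin-Orbit): $\mathbf J = \mathbf r\times\mathbf P + \mathbf S$
Moving Mass Moment: $\mathbf K = M\mathbf r - \mathbf P t + \alpha\,\mathbf P\times\mathbf S/(m + M)$
where $t$ may be arbitrarily selected, and $\mathbf r$ adjusted accordingly. This can be inverted to express $(\mathbf r,\mathbf S)$ in terms of $(\mathbf J,\mathbf K)$, the result yielding what is known as the "Newton-Wigner" position vector for tardions.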
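For all families (1), (2), (3), there is a sub-family given by a vanishing Pauli-Lubanski vector, $(W_0,\mathbf W) = (0,\mathbf 0)$, called "spin 0". For this class, too, there is a similar decomposition:
Angular Momentum: $\mathbf J = \mathbf r\times\mathbf P$
Moving Mass Moment: $\mathbf K = M\mathbf r - \mathbf P t$
and one can write
$\mathbf r = \mathbf K/M + \mathbf v t$, $\mathbf P = M\mathbf v$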
The indeterminacy in t - the same as what occurs generally for tardions - characterizes the trajectory for a worldline:
$\{\,(\mathbf r,t) \in \mathbb R^3\times\mathbb R : \mathbf r = \mathbf K/M + \mathbf v t\,\}$.
For this subclass, $\mathbf W \equiv \mathbf 0$, and $W_0 \equiv 0$, which results as a secondary constraint.
For the quantized form of the symplectic decomposition, $\mathbf K$ and $M$ are represented by operators that do not commute with one another (their brackets are $[\mathbf K,M] = i\hbar\alpha\,\mathbf P$), so the quotient $\mathbf K/M$ is only determined up to "factor ordering ambiguity" - which here means: up to an undetermined multiple of $\mathbf P/M$, i.e. $\mathbf v t$. So the $-\mathbf P t$ term in the expression for $\mathbf K$ already comes out automatically, in the quantized form of the classification.
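To make the factor-ordering remark concrete, here is a short worked check. It assumes only the bracket $[\mathbf K, M] = i\hbar\alpha\,\mathbf P$ and that $\mathbf P$ commutes with $M$:

$$[\mathbf K, M^{-1}] = -M^{-1}[\mathbf K, M]\,M^{-1} = -\,i\hbar\alpha\,\mathbf P\,M^{-2},$$

$$M^{-1}\mathbf K = \mathbf K M^{-1} - [\mathbf K, M^{-1}] = \mathbf K M^{-1} + i\hbar\alpha\,\mathbf P\,M^{-2},$$

$$\tfrac12\left(\mathbf K M^{-1} + M^{-1}\mathbf K\right) = \mathbf K M^{-1} + \frac{i\hbar\alpha}{2M}\,\frac{\mathbf P}{M}.$$

So any two orderings of the quotient differ by a multiple of $\mathbf P/M$, which is a term of the same form as $\mathbf v t$.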
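For spin non-zero tardions, the expression for $\mathbf r$ is $\mathbf r = \mathbf r_0 + \mathbf v t$, where $\mathbf r_0$ is:
The Newton-Wigner Position Vector: $\mathbf r_0 = \mathbf K/M - \alpha\,\mathbf v\times\mathbf W/(m(m + M))$.
The expression for $\mathbf S$ is
Spin Vector: $\mathbf S = \mathbf W/m - \alpha W_0\,\mathbf P/(m(m + M))$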
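A quick numerical round trip helps confirm that these expressions hang together. The sketch below is my own check, not part of the classification: it assumes the decomposition $\mathbf J = \mathbf r\times\mathbf P + \mathbf S$, $\mathbf K = M\mathbf r - \mathbf P t + \alpha\,\mathbf P\times\mathbf S/(m+M)$, the Pauli-Lubanski components $(W_0,\mathbf W) = (\mathbf P\cdot\mathbf J, M\mathbf J + \mathbf P\times\mathbf K)$, and the inversion $\mathbf r_0 = \mathbf K/M - \alpha\,\mathbf v\times\mathbf W/(m(m+M))$ with $\mathbf v = \mathbf P/M$. The numerical values of $\alpha$, $m$, and $t$ are arbitrary.

```python
import numpy as np

def tardion_roundtrip(seed=0, alpha=0.25, m=1.3, t=0.7):
    """Build (J, K) from random (r, P, S), then recover (r, S) from (J, K, P, M)."""
    rng = np.random.default_rng(seed)
    r, P, S = rng.normal(size=(3, 3))
    M = np.sqrt(m**2 + alpha * (P @ P))  # mass shell: m^2 = M^2 - alpha P^2
    v = P / M

    # Spin-orbit decomposition of the generators.
    J = np.cross(r, P) + S
    K = M * r - P * t + alpha * np.cross(P, S) / (m + M)

    # Pauli-Lubanski vector.
    W0 = P @ J
    W = M * J + np.cross(P, K)

    # Inversion: spin vector and Newton-Wigner position.
    S_rec = W / m - alpha * W0 * P / (m * (m + M))
    r0 = K / M - alpha * np.cross(v, W) / (m * (m + M))
    r_rec = r0 + v * t

    spin_shell = W @ W - alpha * W0**2  # should equal m^2 S^2
    return r, S, r_rec, S_rec, spin_shell, m**2 * (S @ S)

if __name__ == "__main__":
    r, S, r_rec, S_rec, lhs, rhs = tardion_roundtrip()
    print("position recovered:", np.allclose(r, r_rec))
    print("spin recovered:    ", np.allclose(S, S_rec))
    print("W^2 - alpha W0^2 = m^2 S^2:", np.isclose(lhs, rhs))
```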
The most important features of the classes and subclasses are that:
(a) they are each characterized by the invariants and by what conditions apply to them,
(b) subsidiary invariants may also occur for subfamilies,
(c) the number of free parameters left over after removing the constraints from the set $(\mathbf J,\mathbf K,\mathbf P,M)$ (or $(\mathbf J,\mathbf K,\mathbf P,E)$) is even,
(d) the remaining free parameters pair off into (q,p) variables - which is the essential statement of the Darboux Theorem,
(e) upon quantization, these pairs yield Heisenberg pairs - and this is where the Heisenberg relations come from.
For classes (1)-(3), the spin-0 systems have 4 constraints (0 Pauli-Lubanski vector) and, thus, 6 free variables, which combine to give you the 3 Heisenberg pairs $(\mathbf r,\mathbf P)$. The extra parameter $t$ can be normalized to 0 ... which is how it is normally done with the Newton-Wigner vector ... and so is inessential. (In the quantized version of the symplectic classification, one normalizes $\mathbf K/M - \mathbf v t$ to the symmetric product $\tfrac12(\mathbf K M^{-1} + M^{-1}\mathbf K)$.)
For class (0) there are subsidiary invariants $K^2 - \alpha J^2$ and $\mathbf J\cdot\mathbf K$ that emerge, so that only 4 parameters at most are left free. The subclasses may have 2 pairs of Darboux coordinates (a "vacuum with spin and moment") or 0 (the "vacuum"); in the latter case the additional constraints are just $K^2 = \alpha J^2$ and $\mathbf J \equiv \mathbf 0$.
For class (1), the spin non-zero subclasses (i.e. where S² > 0) have 4 Darboux pairs. The fourth pair corresponds to the azimuthal component of angular momentum and the longitude and is normally quantized by the "m" number for spin states.
I won't describe class (3) in any detail, since it is a mess. The spin non-zero subfamilies all have 4 Darboux pairs.
Class (2), the Luxons, has 3 subclasses,
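(a) spin 0: $(\mathbf W, W_0) \equiv (\mathbf 0, 0)$,
(b) helical: $\mathbf W \parallel \mathbf P$, i.e. $\mathbf W\times\mathbf P \equiv \mathbf 0$ (or equivalently, $W^2 \equiv \alpha W_0^2$), with $\mathbf W \not\equiv \mathbf 0$,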
(c) general (or "continuous spin"), W² − αW₀² > 0
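Note that the identity $\mathbf P\cdot\mathbf W = M W_0$ follows from the definition of the Pauli-Lubanski vector, so from the constraint $M^2 = \alpha P^2$, it must follow that $W^2 - \alpha W_0^2 \geq 0$. Equality can only occur if $\mathbf W \parallel \mathbf P$, which is why the constraints $\mathbf W\times\mathbf P \equiv \mathbf 0$ and $W^2 \equiv \alpha W_0^2$ are equivalent for Luxons.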
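Spelled out, and assuming the Pauli-Lubanski components $(W_0,\mathbf W) = (\mathbf P\cdot\mathbf J,\; M\mathbf J + \mathbf P\times\mathbf K)$ with $\mathbf P \neq \mathbf 0$, the argument runs as follows:

$$\mathbf P\cdot\mathbf W = M\,\mathbf P\cdot\mathbf J + \mathbf P\cdot(\mathbf P\times\mathbf K) = M W_0,$$

$$M^2 W_0^2 = (\mathbf P\cdot\mathbf W)^2 \leq P^2\,W^2 \quad\text{(Cauchy-Schwarz)},$$

$$M^2 = \alpha P^2 \;\Rightarrow\; \alpha P^2 W_0^2 \leq P^2 W^2 \;\Rightarrow\; W^2 - \alpha W_0^2 \geq 0,$$

with equality exactly when the Cauchy-Schwarz inequality is saturated, i.e. when $\mathbf W$ is parallel to $\mathbf P$.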
The most important properties of these subclasses are that:
(a) the spin 0 subclass has only 3 Darboux pairs, which can be represented as $(\mathbf r,\mathbf P)$,
(b₀) helicity (i.e. the component of $\mathbf J$ parallel to $\mathbf P$) is a subsidiary invariant for the helical subclass,
(b₁) the helical subclass, therefore, also has only 3 Darboux pairs(!),
(c) the continuous spin class has 4 Darboux pairs, and they are not represented by any spin orbit decomposition(!!).
Photons fall into the helical subfamily. The same is true for all fundamental particles ... in their true massless states before they are endowed with the appearance of mass by interaction with the Higgs. The reason for this is that weak nuclear charge is a multiple of left helicity for matter and right helicity for anti-matter and - by virtue of being a charge - it must first and foremost be an invariant property of the particle, which means the particles can only be helions or spin 0. That's why a Higgs mechanism is required for electroweak theory.
There is no spin-orbit decomposition, per se, for the helical subfamily, simply because there are only 3 Darboux pairs, rather than 4. Photon helicity is not spin! Classically, this corresponds to the fact (as Hehl has frequently pointed out) that the free electromagnetic field has no spin current and presents a symmetric stress tensor. For the interacting electromagnetic field (i.e. the field in a medium) the spin current would be proportional to $\mathbf D\times\mathbf E + \mathbf B\times\mathbf H$, which is only non-zero if the constitutive laws for $(\mathbf D,\mathbf B)$ versus $(\mathbf E,\mathbf H)$ ... or $(\mathbf D,\mathbf H)$ versus $(\mathbf E,\mathbf B)$ ... are non-isotropic.
For electromagnetic fields inside a medium (like water) light goes slower than light speed in vacuo, so the corresponding dressed quanta would fall into the tardion class and would have spin-orbit decompositions. In the quantized version of this, one would probably represent such "fields inside media" by effective Lagrangians, integrating out the external modes comprising the medium. The dressed photons would then acquire, in addition to the two values m = ±1 that come out of helicity, an extra mode for m = 0; that is, the dressed photons would "acquire mass". This is directly related to the very phenomenon in solid state physics that inspired the idea of the Higgs mechanism itself.
The question you're asking is: what about the helical subfamily? Since there are 3 Darboux pairs, then they do admit a quantization that has 3 Heisenberg pairs, notwithstanding the so-called no-go theorem. What it is really saying is that there is no spin-orbit decomposition and no analogue of the Newton-Wigner position operator that can be derived in that way.
However, there is a position operator, simply by virtue of the fact that the symplectic representation has 3 pairs of Darboux coordinates! The situation, like that of mapping coordinates for the sphere, is that at some point, the coordinates will go singular.
The sphere does not admit a globally non-zero linearly independent pair of vector fields on it. A similar situation occurs with the symplectic geometry that characterizes the helions. The similarity of its symplectic geometry to that of the magnetic monopole has been noted in the literature. The situation is analogous, except for the (q,p) reversal.
To write down a position operator, you can start by simply writing down a decomposition analogous to the "spin-helicity" decomposition for tardions:
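$\mathbf J = \mathbf r\times\mathbf P + \eta\,\mathbf P/M$, $\mathbf K = M\mathbf r - \mathbf P t$ ⇒ $W_0 = \eta P^2/M$, $\mathbf W = \eta\,\mathbf P$,
the helicity being $\eta P/M = \eta c$.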
It does, indeed, work - except that the $\mathbf r$-$\mathbf r$ Poisson bracket relations acquire a deficit that is proportional to $\eta$. It's possible to adjust the definition of $\mathbf r$ to eliminate this deficit, resulting in a bona fide Heisenberg pair set for $(\mathbf r,\mathbf P)$, but the expression for $\mathbf r$ will be singular as a function of $(\mathbf J,\mathbf K,\mathbf P,E)$. It's a coordinate indeterminacy, like that which the spherical coordinates (r,θ,φ) have at the poles when expressed as functions of Cartesian coordinates (x,y,z).
Would you like to see what it is? (Chomping at the bit, after all this long discussion, hmm?) Should I tell you? (Tease, tease!) No I think I'll end the reply here and leave it hanging...
Well, on second consideration...
They're in my notes somewhere and I'll have to look and check (and review it closely).
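Here it is. There is no one solution. Instead, you need to pick a unit vector $\mathbf n$. Then you can write down the decomposition:
$\mathbf J = \mathbf r\times\mathbf P + \dfrac{\eta P^2}{M}\,\dfrac{\mathbf n\times(\mathbf P\times\mathbf n)}{|\mathbf n\times\mathbf P|^2}$, $\mathbf K = M\mathbf r - \mathbf P t + \eta\,\dfrac{(\mathbf n\cdot\mathbf P)\,\mathbf n\times\mathbf P}{|\mathbf n\times\mathbf P|^2}$.
This is obtained by taking the unadjusted $\mathbf r$ and making an adjustment $(\mathbf J,\mathbf K) \to (\mathbf J + \delta\mathbf r\times\mathbf P,\; \mathbf K + M\,\delta\mathbf r)$ for a suitable $\delta\mathbf r$ that fixes the deficit in the $\mathbf r$-$\mathbf r$ brackets, while preserving $(W_0,\mathbf W)$.
The representation goes singular in directions $\mathbf P \parallel \mathbf n$, so you need a second $\mathbf n$-vector to cover this region of the symplectic geometry. Two coordinate maps and regions, at minimum, are required to cover the symplectic geometry.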
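As a sanity check on this kind of decomposition, here is a small numerical sketch. It is my own illustration and assumes the $\mathbf n$-dependent form $\mathbf J = \mathbf r\times\mathbf P + (\eta P^2/M)\,\mathbf n\times(\mathbf P\times\mathbf n)/|\mathbf n\times\mathbf P|^2$, $\mathbf K = M\mathbf r - \mathbf P t + \eta\,(\mathbf n\cdot\mathbf P)\,\mathbf n\times\mathbf P/|\mathbf n\times\mathbf P|^2$ on the luxon shell $M^2 = \alpha P^2$. It verifies only that $(W_0,\mathbf W) = (\eta P^2/M, \eta\,\mathbf P)$ is preserved; it does not test the Poisson brackets.

```python
import numpy as np

def helion_generators(r, P, n, eta, t, alpha):
    """Return (M, J, K) for the n-dependent helion decomposition (assumed form)."""
    M = np.sqrt(alpha) * np.linalg.norm(P)  # luxon shell: M^2 = alpha P^2
    n = n / np.linalg.norm(n)               # make sure n is a unit vector
    n_cross_P = np.cross(n, P)
    denom = n_cross_P @ n_cross_P           # vanishes when P is parallel to n
    J = np.cross(r, P) + (eta * (P @ P) / M) * np.cross(n, np.cross(P, n)) / denom
    K = M * r - P * t + eta * (n @ P) * n_cross_P / denom
    return M, J, K

def pauli_lubanski(M, J, K, P):
    """(W0, W) = (P.J, M J + P x K)."""
    return P @ J, M * J + np.cross(P, K)

if __name__ == "__main__":
    r = np.array([0.3, -1.2, 0.5])
    P = np.array([1.0, 2.0, -0.5])
    n = np.array([0.0, 0.0, 1.0])
    eta, t, alpha = 0.7, 0.4, 0.25
    M, J, K = helion_generators(r, P, n, eta, t, alpha)
    W0, W = pauli_lubanski(M, J, K, P)
    print("W0 == eta P^2 / M:", np.isclose(W0, eta * (P @ P) / M))
    print("W  == eta P:      ", np.allclose(W, eta * P))
```

The denominator $|\mathbf n\times\mathbf P|^2$ is where the coordinate singularity shows up: it goes to zero as $\mathbf P$ approaches the direction of $\mathbf n$.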
It's the same situation that occurs with magnetic monopoles, and η plays a role analogous to the electric-magnetic charge product.
To find $\mathbf r$, you'll have to solve the above relations for $\mathbf r$, which I'll leave to you and to the interested reader.
If you examine the little group for this subclass, using $(\boldsymbol\omega,\upsilon,\boldsymbol\varepsilon,\tau)$ to denote infinitesimal (rotations, boosts, spatial translations, time translations), you will find that it includes
(1) rotations $\boldsymbol\omega \parallel \mathbf P$,
i.e. rotations along the axis collinear with $\mathbf P$, or "helical" rotations,
(2) spatial translations $\boldsymbol\varepsilon \parallel \mathbf P$
combined with time translations $\tau$ such that $\varepsilon = c\tau$,
(3) transverse boosts/rotations, $\boldsymbol\omega,\upsilon \perp \mathbf P$,
combined with a compensating translation $\boldsymbol\varepsilon$,
such that $\boldsymbol\omega = (\mathbf P/P)\times\upsilon/c$ and $P^2\,\boldsymbol\varepsilon + \eta\,\boldsymbol\omega = \mathbf 0$.
Properties (1) and (2) single out $\mathbf r$ as a center-of-mass worldline, while property (3), which is just a "null boost" (combined with a translation perpendicular to both the boost and $\mathbf P$), shows that there is a compensating relocation of the worldline under a transverse boost.