Could Legolas actually see that far?
Fun question!
As you pointed out,
$$\theta \approx 1.22\frac{\lambda}{D}$$
For a human-like eye, with a maximum pupil diameter of about $9\ \mathrm{mm}$, and choosing the shortest wavelength in the visible spectrum of about $390\ \mathrm{nm}$, the angular resolution works out to about $5.3\times10^{-5}$ (radians, of course). At a distance of $24\ \mathrm{km}$, this corresponds to a linear resolution ($\theta d$, where $d$ is the distance) of about $1.2\ \mathrm m$. So counting mounted riders seems plausible, since they are probably separated by one to a few times this resolution. Comparing their heights, which are on the order of the resolution, would be more difficult, but might still be possible with dithering. Does Legolas perhaps wiggle his head around a lot while he's counting?

Dithering only helps when the image sampling (in this case, by elven photoreceptors) is worse than the resolution of the optics. Human eyes apparently have an equivalent pixel spacing of something like a few tenths of an arcminute, while the diffraction-limited resolution computed above is about a fifth of an arcminute, so dithering or some other technique would be necessary to take full advantage of the optics.
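A quick numerical check of these numbers (just a sketch; the 9 mm pupil, 390 nm wavelength, and 24 km distance are the values used above):

```python
# Rayleigh criterion for a single circular aperture, with the values quoted above.
wavelength = 390e-9   # m, short end of the visible spectrum
pupil = 9e-3          # m, generous dark-adapted pupil diameter
distance = 24e3       # m, distance to the riders

theta = 1.22 * wavelength / pupil                        # angular resolution in radians
print(f"angular resolution: {theta:.1e} rad")            # ~5.3e-5 rad
print(f"linear resolution:  {theta * distance:.1f} m")   # ~1.3 m at 24 km, the 'about 1.2 m' above
```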
An interferometer has an angular resolution equal to that of a telescope with a diameter equal to the separation between its two most widely separated detectors. Legolas has two detectors (eyeballs) separated by about 10 times the diameter of his pupils, $75\ \mathrm{mm}$ or so at most. This would give him a linear resolution of about $15\ \mathrm{cm}$ at a distance of $24\ \mathrm{km}$, probably sufficient to compare the heights of mounted riders.
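The same formula applied to the eye separation as an effective aperture (a sketch under the idealized assumption that the two eyes really could be combined interferometrically; the 75 mm baseline is the figure above):

```python
# Treat the ~75 mm pupil-to-pupil separation as the diameter of a single aperture.
wavelength = 390e-9   # m
baseline = 75e-3      # m, approximate separation of the two eyes
distance = 24e3       # m

theta = 1.22 * wavelength / baseline
print(f"{theta:.1e} rad -> {theta * distance * 100:.0f} cm at 24 km")  # ~6.3e-6 rad -> ~15 cm
```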
However, interferometry is a bit more complicated than that. With only two detectors and a single fixed separation, only features with angular separations equal to the resolution are resolved, and direction is important as well. If Legolas' eyes are oriented horizontally, he won't be able to resolve structure in the vertical direction using interferometric techniques. So he'd at the very least need to tilt his head sideways, and probably also jiggle it around a lot (including some rotation) again to get decent sampling of different baseline orientations. Still, it seems like with a sufficiently sophisticated processor (elf brain?) he could achieve the reported observation.
Luboš Motl points out some other possible difficulties with interferometry in his answer, primarily that the combination of a polychromatic source and a detector spacing many times larger than the observed wavelength leads to no correlation in the phase of the light entering the two detectors. While true, Legolas may be able to get around this if his eyes (specifically the photoreceptors) are sufficiently sophisticated so as to act as a simultaneous high-resolution imaging spectrometer or integral field spectrograph and interferometer. This way he could pick out signals of a given wavelength and use them in his interferometric processing.
A couple of the other answers and comments mention the potential difficulty of drawing a sight line to a point $24\ \mathrm{km}$ away due to the curvature of the Earth. As has been pointed out, Legolas just needs to have an advantage in elevation of about $45\ \mathrm m$ (the radial distance from a circle $6400\ \mathrm{km}$ in radius to a tangent drawn $24\ \mathrm{km}$ along the circumference; Middle-Earth is apparently about Earth-sized, or may be Earth in the past, though I can't really nail this down with a canonical source after a quick search). He doesn't need to be on a mountaintop or anything, so it seems reasonable to just assume that the geography allows a line of sight.
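The required elevation can be checked directly (a sketch, using the 6400 km radius and 24 km distance quoted above):

```python
import math

R = 6400e3   # m, Earth-like radius
d = 24e3     # m, distance along the surface

# Drop of the surface below a tangent line, d along the circumference from the tangent point.
drop_exact = R * (1 - math.cos(d / R))
drop_approx = d**2 / (2 * R)   # small-angle approximation

print(f"{drop_exact:.0f} m exact, {drop_approx:.0f} m approximate")   # ~45 m either way
```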
Finally, a bit about "clean air". In astronomy (if you haven't guessed my field yet, now you know) we refer to distortions caused by the atmosphere as "seeing". Seeing is often measured in arcseconds ($3600'' = 60' = 1^\circ$), referring to the limit imposed on angular resolution by atmospheric distortions. The best seeing, achieved from mountaintops in perfect conditions, is about $1''$, or in radians $4.8\times10^{-6}$. This is about the same angular resolution as Legolas' amazing interferometric eyes.

I'm not sure what seeing would be like horizontally across a distance of $24\ \mathrm{km}$. On the one hand, there is a lot more air than when looking up vertically; the atmosphere is thicker than $24\ \mathrm{km}$, but its density drops rapidly with altitude. On the other hand, the relatively uniform density and temperature at fixed altitude would cause less variation in refractive index than in the vertical direction, which might improve seeing. If I had to guess, I'd say that for very still air at uniform temperature he might get seeing as good as $1''$, but under more realistic conditions with the Sun shining, mirage-like effects probably take over, limiting the resolution that Legolas can achieve.
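For comparison, here is $1''$ of seeing next to the interferometric estimate from above (a sketch; the numbers are the ones already used in this answer):

```python
import math

distance = 24e3                              # m
seeing = math.pi / (180 * 3600)              # 1 arcsecond in radians, ~4.8e-6
interferometric = 1.22 * 390e-9 / 75e-3      # ~6.3e-6 rad, from the 75 mm baseline above

print(f"1'' seeing:   {seeing:.1e} rad -> {seeing * distance * 100:.0f} cm at 24 km")
print(f"eye baseline: {interferometric:.1e} rad -> {interferometric * distance * 100:.0f} cm at 24 km")
```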
Let's first substitute the numbers to see what diameter of the pupil is required by the simple formula: $$ \theta = 1.22 \frac{0.4\,\mu{\rm m}}{D} = \frac{2\,{\rm m}}{24\,{\rm km}} $$ I've substituted the minimal (violet...) wavelength because that color allows a better resolution, i.e. a smaller $\theta$. The height of the knights is two meters. Unless I made a mistake, the diameter $D$ is required to be 0.58 centimeters. That's completely sensible because the maximally opened human pupil is 4–9 millimeters in diameter.
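The same substitution in code (a sketch of exactly the arithmetic above):

```python
wavelength = 0.4e-6   # m, violet end of the visible spectrum
height = 2.0          # m, height of a knight
distance = 24e3       # m

# 1.22 * wavelength / D = height / distance, solved for D
D = 1.22 * wavelength * distance / height
print(f"required pupil diameter: {D * 100:.2f} cm")   # ~0.59 cm, i.e. the ~0.58 cm quoted above
```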
Just like the video says, the diffraction formula therefore marginally allows one to observe not only the presence of the knights – to count them – but also, marginally, their first "internal detailed" properties, perhaps that the pants are darker than the shirt. However, to see whether the leader is 160 cm or 180 cm tall is clearly impossible, because it would require the resolution to be better by another order of magnitude. Just like the video says, it isn't possible with visible light and human eyes. One would need either a 10 times larger eye and pupil, or some ultraviolet light with a 10 times higher frequency.
It doesn't help to make the pupils narrower, because the resolution allowed by the diffraction formula would only get worse. The significantly blurrier images are not helpful as additions to the sharpest image. We know that in the real world of humans, too: if someone's vision is much sharper than someone else's, the second person is pretty much useless in refining the information about some hard-to-see objects.
The atmospheric effects are likely to worsen the resolution relative to the simple expectation above. Even if we have the cleanest air – and it's not just about clean air; we need uniform air at a constant temperature, and so on, and it is never so uniform and static – it still distorts the propagation of light and implies some additional deterioration. All these considerations are of course completely academic for me, who could reasonably ponder whether I can see people sharply enough from 24 meters to count them. ;-)
Even if the atmosphere worsens the resolution by a factor of 5 or so, the knights may still show up as minimal "blurry dots" on the retina, and as long as the separation between knights is greater than the (worsened) linear resolution – say 10 meters – one will be able to count them.
In general, the photoreceptor cells are indeed dense enough so that they don't really worsen the estimated resolution. They're dense enough so that the eye fully exploits the limits imposed by the diffraction formula, I think. Evolution has probably worked up to the limit because it's not so hard for Nature to make the retinas dense and Nature would be wasting an opportunity not to give the mammals the sharpest vision they can get.
Concerning tricks to improve the resolution or to circumvent the diffraction limit, there are almost none. Long-term observations don't help unless one could track the location of the dots with a precision better than the spacing of the photoreceptor cells. Mammals' organs just can't be this static. Image processing using many unavoidably blurry images at fluctuating locations just cannot produce a sharp image.
The trick from the Very Large Array doesn't work, either. That's because the Very Large Array only works for radio (i.e. long) waves: the individual elements in the array measure the phase of the wave, and the information about the relative phase is used to sharpen the information about the source. The phase of visible light – unless it's coming from lasers, and even in that case it is questionable – is completely uncorrelated between the two eyes, because the light is not monochromatic and the distance between the two eyes is vastly greater than the average wavelength. So the two eyes only have the virtue of doubling the overall intensity, and of giving us 3D stereo vision. The latter is clearly irrelevant at a distance of 24 kilometers, too. The angles at which the two eyes must look to see an object 24 km away are measurably different from parallel, but once the muscles adapt to these slightly non-parallel angles, what the two eyes see from 24 km away is indistinguishable.
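To put a number on "vastly greater than the average wavelength" (a trivial sketch; 0.5 μm is just a representative visible wavelength):

```python
baseline = 75e-3      # m, rough separation of the two eyes
wavelength = 0.5e-6   # m, representative visible wavelength

print(f"baseline / wavelength ~ {baseline / wavelength:,.0f}")   # ~150,000 wavelengths across the baseline
```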
Take the following idealized situation:
- the person of interest is standing perfectly still, and is of a fixed homogeneous color
- the background (grass) is of a fixed homogeneous color (significantly different from the person).
- Legolas knows the proportions of people, and the colors of the person of interest and of the background
- Legolas knows the PSF (point-spread function) of his optical system (including his photoreceptors)
- Legolas knows the exact position and orientation of his eyes.
- Assume that there is essentially zero noise in his photoreceptors, and that he has access to the output of each one.
From this, Legolas can calculate the exact response across his retina for any position and (angular) size of the person of interest, including any diffraction effects. He can then compare this exact template to the actual sensor data and pick the one that best matches -- note that this includes matching the manner in which the response rolls off and/or any diffraction fringes around the border of the imaged person (I'm assuming that the sensor cells in his eyes over-sample the PSF of the optical parts of his eyes).
(To make it even simpler: it's pretty obvious that given the PSF, and a black rectangle on a white background, we can compute the exact response of the optical system -- I'm just saying that Legolas can do the same for his eyes and any hypothetical size/color of a person.)
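Here is a minimal one-dimensional sketch of that idea (all numbers hypothetical): generate a template for every candidate size by pushing a dark block on a bright background through a known PSF, then pick the candidate whose template matches the noiseless sensor data.

```python
import numpy as np

def template(height_px, psf, n=400):
    """Forward model: a dark block (the 'person') on a bright background, blurred by the PSF."""
    scene = np.ones(n)
    scene[50:50 + height_px] = 0.2
    return np.convolve(scene, psf, mode="same")

# A broad Gaussian stands in for the combined optical + photoreceptor PSF.
x = np.arange(-60, 61)
psf = np.exp(-0.5 * (x / 20.0) ** 2)
psf /= psf.sum()

true_height = 173                      # hidden "true" height in (sub-resolution) scene pixels
data = template(true_height, psf)      # what the idealized, noiseless retina records

# Compare every hypothesis against the data and keep the best match.
candidates = list(range(150, 200))
errors = [np.sum((template(h, psf) - data) ** 2) for h in candidates]
print(candidates[int(np.argmin(errors))])   # recovers 173 even though the PSF is ~20 pixels wide
```

With noise or turbulence added, the best-fitting template becomes uncertain rather than exact, which is where the limitations below come in.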
The main limitations on this are:
- how many different template hypotheses he considers,
- Any noise or turbulence that distorts his eyes' response away from the calculable ideal response (noise can be alleviated by integration time),
- His ability to control the position and orientation of his eyes: $2\ \rm m$ at $24\ \rm km$ is only about $8\times10^{-5}$ radians, which maps to $\approx 0.8\,\mu m$ displacements in the position of a spot on the outside of his eyes (assumed $1\,\rm cm$ eyeball radius); see the quick check below.
Essentially, I'm sketching out a Bayesian type of super-resolution technique as alluded to on the Super-resolution Wikipedia page.
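A quick check of the last number in that list (the 1 cm eyeball radius is the assumption stated above):

```python
angle = 2.0 / 24e3              # 2 m subtended at 24 km, in radians (~8.3e-5)
eyeball_radius = 0.01           # m, assumed 1 cm
print(angle, angle * eyeball_radius * 1e6)   # ~8.3e-05 rad, ~0.83 micrometres on the eyeball surface
```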
To avoid the problems of mixing the person with his mount, let's assume that Legolas observed the people when they were dismounted, taking a break maybe. He could tell that the leader is tall just by comparing the relative sizes of different people (assuming that they were milling around at separations much greater than his eyes' resolution).
The actual scene in the book has him discerning all this while the riders were mounted, and moving -- at this stage I just have to say "It's a book", but the idea that the diffraction limit is irrelevant when you know a lot about your optical system and what you are looking at is worth noting.
Aside: human rod cells are $O(3{-}5\,\mu m)$ across -- this will impose a low-pass filter on top of any diffraction effects from the pupil.
A Toy Model Illustration of a Similar Problem
Let $B(x; x_0, dx) = 1$ for $x_0 < x < x_0+dx$ and zero otherwise. Convolve $B(x; x_0, dx_1)$ and $B(x; x_0, dx_2)$, with $dx_2 > dx_1$, with some known PSF, assuming that the width of this PSF is much less than either $dx_1$ or $dx_2$ but wide compared to $dx_2 - dx_1$, to produce $I_1(y)$ and $I_2(y)$. (In my conception of this model, this is the response of a single retina cell as a function of the angular position of the eye, $y$.) In other words, take two images of different-sized blocks, and align the images so that the left edges of the two blocks are at the same place.

If you then ask where the right edges of the images cross a selected threshold value, i.e. $I_1(y_1) = I_2(y_2) = T$, you'll find that $y_2 - y_1 = dx_2 - dx_1$, independent of the width of the PSF (given that it is much narrower than either block). A reason why you often want sharp edges is that when noise is present, the measured values of $y_1, y_2$ will vary by an amount that is inversely proportional to the slope of the image; but in the absence of noise, the theoretical ability to measure size differences is independent of the optical resolution.
Note: in comparing this toy model to the Legolas problem, a valid objection can be raised that the PSF is not much smaller than the imaged heights of the people. But it does serve to illustrate the general point.
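A small numerical sketch of the toy model (arbitrary units, Gaussian PSF): the threshold crossings of the two blurred right edges differ by exactly the block-width difference, even though the PSF is far wider than that difference.

```python
import numpy as np

x = np.arange(0, 2000) * 0.01          # spatial grid (arbitrary units)
k = np.arange(-500, 501) * 0.01
sigma = 1.0                            # PSF width: much less than the block widths, much more than their difference
psf = np.exp(-0.5 * (k / sigma) ** 2)
psf /= psf.sum()

def blurred_block(width):
    """Unit-height block B(x; x0, dx) starting at x0 = 5, convolved with the PSF."""
    block = ((x >= 5.0) & (x < 5.0 + width)).astype(float)
    return np.convolve(block, psf, mode="same")

def right_edge_crossing(img, T=0.5):
    """Sub-grid position where the falling right edge crosses the threshold T."""
    i = np.where(img > T)[0][-1]       # last sample above threshold
    return x[i] + (T - img[i]) * (x[i + 1] - x[i]) / (img[i + 1] - img[i])

dx1, dx2 = 8.0, 8.2                    # dx2 - dx1 = 0.2, five times smaller than sigma
y1 = right_edge_crossing(blurred_block(dx1))
y2 = right_edge_crossing(blurred_block(dx2))
print(y2 - y1)                         # ~0.2, i.e. dx2 - dx1, independent of the PSF width
```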