Why is gradient noise better quality than value noise?
This is just my guess.
In short: gradient noise generally leads to more visually appealing textures because it cuts low frequencies and emphasizes frequencies around and above that of the grid spacing.
Let's compare a naive value noise procedure with a naive gradient one, for a grayscale image.
Value noise: we paint the points in the grid with random values (white noise) and fill the surrounding pixels by linear interpolation. This will look ugly because (among other things) some of the random grid points will happen to have similar values, and then there will be large spots with nearly uniform color (low frequency). [*] Specifically, the pixel values in the neighborhood of a grid point will all be similar, so we depend on the neighboring grid points having distinct values to get high frequencies... and these will be at most (with luck) of the order of the grid spacing.
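To make the value noise procedure concrete, here is a minimal 1D sketch. The function name `value_noise_1d`, the wrap-around at the end of the lattice, and the table size are my own choices for illustration, not part of any particular library.

```python
import math
import random

def value_noise_1d(x, values):
    """Linearly interpolate between random values stored at integer lattice points."""
    i = math.floor(x)
    t = x - i                            # position inside the cell, 0 <= t < 1
    a = values[i % len(values)]          # value at the left lattice point
    b = values[(i + 1) % len(values)]    # value at the right lattice point (wraps around)
    return a + t * (b - a)

# Random lattice values (white noise), then 8 samples per cell.
rng = random.Random(0)
values = [rng.random() for _ in range(16)]
samples = [value_noise_1d(k / 8, values) for k in range(16 * 8)]

# The problem case: two neighbors that happen to be equal give a flat cell.
flat = [value_noise_1d(k / 4, [0.5, 0.5]) for k in range(8)]
```

With equal neighbors every sample in `flat` is the same number, which is exactly the uniform spot described above.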
Gradient noise: we compute a random (uniform, white noise) gradient at each grid point, and compute the values by interpolating the dot products of each gradient with the offset vector from its grid point to the pixel. Consider again what happens in the neighborhood of a grid point, specifically on a small circle around it, disregarding the effect of the other, more distant grid points. The computed image value (a dot product) on this small circle visits, smoothly, both the light and the dark side of the range: it is positive in the direction of the gradient, negative in the opposite direction, and zero in between. So we can expect that the image will never have uniform spots, i.e., we will have practically no frequencies below that of the grid spacing.
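Here is a rough 2D sketch of that procedure. The names (`make_gradients`, `gradient_noise_2d`), the unit-length gradients, the wrap-around lattice and the smoothstep weight `fade` are assumptions of mine; the weight matters, because with plain linear weights the neighboring grid points contribute at first order and the "small circle" argument no longer holds cleanly.

```python
import math
import random

def make_gradients(n, seed=0):
    """One random unit gradient per point of an n x n lattice."""
    rng = random.Random(seed)
    grads = {}
    for ix in range(n):
        for iy in range(n):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            grads[(ix, iy)] = (math.cos(angle), math.sin(angle))
    return grads

def fade(t):
    """Smoothstep weight 3t^2 - 2t^3 (flat at t = 0 and t = 1)."""
    return t * t * (3.0 - 2.0 * t)

def lerp(a, b, t):
    return a + t * (b - a)

def gradient_noise_2d(x, y, grads, n):
    ix, iy = math.floor(x), math.floor(y)
    sx, sy = fade(x - ix), fade(y - iy)

    def corner(cx, cy):
        # Dot product of the corner's gradient with the offset from that corner.
        gx, gy = grads[(cx % n, cy % n)]
        return gx * (x - cx) + gy * (y - cy)

    row0 = lerp(corner(ix, iy), corner(ix + 1, iy), sx)
    row1 = lerp(corner(ix, iy + 1), corner(ix + 1, iy + 1), sx)
    return lerp(row0, row1, sy)

grads = make_gradients(4, seed=1)
at_lattice = gradient_noise_2d(2, 3, grads, 4)   # exactly on a grid point
# Sample a small circle (radius 0.01) around the grid point (1, 1).
ring = [gradient_noise_2d(1 + 0.01 * math.cos(a), 1 + 0.01 * math.sin(a), grads, 4)
        for a in (k * math.pi / 8 for k in range(16))]
```

The value is zero on the grid point itself, and the samples in `ring` take both signs, so there is no flat spot around the grid point.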
[*] A similar problem arises in halftoning/dithering: it's visually unpleasant to use binary white noise because of its low-frequency component; a nicer dithering algorithm, such as Floyd-Steinberg, produces high-frequency noise ("blue noise") instead.
The answer is simple and mathematical: it is a matter of mathematical quality.
With value noise, the function generally has no zero at the lattice point; it just takes the value stored there.
If you instead treat the value stored at the lattice point as a gradient, you get a zero of the function there, and the gradient defines the tangent at that point.
The advantage is a smooth transition: the first derivative on the left side of the lattice point matches the one on the right side, so the function passes through the point without a visible kink.
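A quick numerical check of this, as a 1D sketch. The function name `gradient_noise_1d`, the slope table and the smoothstep weight are made up for illustration; the check compares one-sided finite differences at a lattice point with the slope stored there.

```python
import math

def fade(t):
    """Smoothstep weight 3t^2 - 2t^3."""
    return t * t * (3.0 - 2.0 * t)

def gradient_noise_1d(x, slopes):
    i = math.floor(x)
    t = x - i
    g0 = slopes[i % len(slopes)]
    g1 = slopes[(i + 1) % len(slopes)]
    left = g0 * t              # ramp through zero at the left lattice point
    right = g1 * (t - 1.0)     # ramp through zero at the right lattice point
    return left + fade(t) * (right - left)

slopes = [0.8, -0.5, 0.3, -0.9]   # hypothetical slopes at lattice points 0..3
h = 1e-4
value_at_lattice = gradient_noise_1d(2.0, slopes)
slope_from_left = (value_at_lattice - gradient_noise_1d(2.0 - h, slopes)) / h
slope_from_right = (gradient_noise_1d(2.0 + h, slopes) - value_at_lattice) / h
```

Both one-sided slopes come out close to `slopes[2]`, and the value at the lattice point is zero.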
With Perlin's original polynomial of the 3rd degree, however, the second derivative was not zero at the lattice points, meaning there was still some curvature there.
Later he introduced improved Perlin noise with a polynomial of the 5th degree, whose second derivative is also zero at the lattice points.
With this, the transition over a lattice point looks linear no matter how far you "zoom" in, and it is always smooth.
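The two polynomials can be compared directly. Below, `fade3` is 3t^2 - 2t^3 and `fade5` is 6t^5 - 15t^4 + 10t^3; the function names are mine, and the derivatives are written out by hand rather than taken from any library.

```python
def fade3(t):
    return 3 * t**2 - 2 * t**3

def fade3_d1(t):
    return 6 * t - 6 * t**2

def fade3_d2(t):
    return 6 - 12 * t

def fade5(t):
    return 6 * t**5 - 15 * t**4 + 10 * t**3

def fade5_d1(t):
    return 30 * t**4 - 60 * t**3 + 30 * t**2

def fade5_d2(t):
    return 120 * t**3 - 180 * t**2 + 60 * t

# Evaluate the derivatives at the two ends of a cell (the lattice points).
ends = (0, 1)
cubic_first = [fade3_d1(t) for t in ends]
cubic_second = [fade3_d2(t) for t in ends]
quintic_first = [fade5_d1(t) for t in ends]
quintic_second = [fade5_d2(t) for t in ends]
```

Both polynomials have a zero first derivative at t = 0 and t = 1, but only the 5th-degree one also has a zero second derivative there; the cubic gives 6 and -6.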