If the candela is a base SI unit, why isn't the sone an SI unit at all?
Ultimately, you'd have to ask the BIPM.
However, there's a strong case to be made that the average human eye's subjective response to illumination as a function of wavelength (the so-called luminosity function) is relatively uniform over the entire human population, and that it remains relatively stable over any given individual's life. (That is, if one ignores inconvenient facts like the existence of different kinds of colorblindness, which affect several percent of people.)
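To make concrete why a single agreed-upon luminosity function matters, here is a sketch of how it enters photometry. The symbols are my own notation rather than anything from the question: $\Phi_v$ is the luminous flux in lumens, $\Phi_{e,\lambda}$ is the spectral radiant flux in watts per unit wavelength, $V(\lambda)$ is the photopic luminosity function normalized to 1 at its peak near 555 nm, and the conversion constant is approximately 683 lm/W.

```latex
\Phi_v \;=\; K_m \int_0^\infty V(\lambda)\, \Phi_{e,\lambda}(\lambda)\, \mathrm{d}\lambda ,
\qquad K_m \approx 683\ \mathrm{lm/W}
```

The whole construction rests on being able to write down one $V(\lambda)$ for a "standard observer"; the candela then follows as one lumen per steradian.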
By contrast, the human ear's spectral response changes appreciably with age: sensitivity in the high-frequency part of the spectrum starts to degrade as early as one's twenties and thirties (hence e.g. "ultrasonic" ringtones, and other such ways to annoy an entire high-school classroom to the bafflement of the teacher, or their weaponized form). Producing a standardized equivalent of the luminosity function for subjective loudness is therefore much more difficult.