How do you choose the classifications on choropleth maps?
A great reference, not cited enough, is "How Maps Works" by Alan M. McEachren (The Guilford Press, 1995/2004). It's not a quick guide but a comprehensive reflexion on how maps are seen and understood, based on a really impressive scientific survey and the knowledge of practitionners.
I recently purchased Thematic Cartography and Visualization (Slocum et al., 2005), and just skimming it appears to be more than sufficient for my request for general references on the topic of choosing bins. It will certainly give me plenty to read for quite some time, and it wasn't too arduous a decision to buy (there are many older cheap copies floating around).
Note I do not think I would recommend MacEachren's How Maps Work for this question in particular. The book is so monolithic I might have certainly forgot, but I do not remember any straightforward discussion about choosing the number of bins (at least not as straightforward as the chapter devoted to it in the Slocum textbook). If anything I think I remember him mentioning the topic is a bit over-done and has not come to any real conclusion, but I would certainly recommend it as a general reference for data visualization anyway.
There is a crazy amount of literature on the topic, and I will have to do some more self-study to see if I can come up with a more satisfactory answer for classifying the skewed distribution. And I will post back if I have anything more substantive to say.
But for the second question about visualizing small multiple maps I recently came across an article by Cynthia Brewer and Linda Pickle, Evaluation of Methods for Classifying Epidemiological Data on Choropleth Maps in Series (PDF here), that is exactly aimed at my question.
In short there experiments suggest quantiles are the most useful way to represent a series of small multiple maps, both for the ease in interpretation (as I suggested in the question) and the fact that they produce equal area maps in terms of fill when the polygons are roughly the same size. This perhaps is not obvious until you see a counter example, below I have pasted a picture of some small multiple maps in which the classifications are constrained to be equal across the series of different rates of cancer (on page 674 of the cited article).
Because the incidence of liver disease is so much lower than COPD, all of the counties in the top maps tend to fall into the lower classifications. If you can't discriminate patterns within one of the maps you will be unlikely to distingish patterns between maps! Of course if reasonable one should make the classifications consistent, but that is only reasonable for some comparison maps. Also as far as the number of bins they chose 7 in their experiments.