What relationship, if any, is there between the diameter of the Cayley graph and the average distance between group elements?
I can answer one of those questions: The diameter is at most twice the median distance by the pigeonhole principle.
Let $m$ be the median. Let $S$ be the set of products of up to $m$ generators. For any $g\in G$, let $S^{-1} g = \{s^{-1}g | s\in S\}.$
By the definition of median, $|S| >= |G|/2$ with equality only if $m$ is a half-integer. If $|S| > |G|/2$ then for any $g\in G$, $S \cap S^{-1}g$ is nonempty, which lets us write $g$ as a product of two elements of $S$. If $|S| = |G|/2$ then every element of $S$ is a product of at most $m-1/2$ generators. For any $g\in G$, either $S \cap S^{-1}g$ is nonempty or $S \cup S^{-1}g = G$, so any $h\in G$ which requires $m+1/2$ generators must be in $S^{-1}g$, which means $g$ can be written as $h$ times an element of $S$, a product of at most $(m+1/2)+(m-1/2) = 2m$ generators.
Your facts about $S_n$ are actually all facts about Coxeter groups with the generating set given by simple reflections: the distribution is symmetric around the mean, which is half the diameter. It even has a unique local maximum (assuming you allow the floor and roof of the mean to be "one maximum" even if they're different). In the Weyl group case, you can think of this as Hard Lefschetz for the flag variety.
I think perhaps the lesson here is that Coxeter groups are not a good representative of "groups in general."
Your question about the median is equivalent to: if the diamaeter of a group in some set of generators is d, how long does it take to generate half the group? The answer, as you've observed, depends a lot on what the group is and how the generators look. A case lots of people are currently thinking about is that where G is a finite group of Lie type; in this case, the Cayley graph (we now know, thanks to work of Helfgott, Pyber-Szabo, Breuillard-Green-Tao, Golsefidy-Varju, etc.) is an expander; since "most random graphs are expanders," this might be thought of as a reasonably generic case.
Now I am remembering a talk I saw Breuillard give about this so forgive any inaccuracy; but I believe the size of the set of words of length n grows exponentially in n for n up to some multiple of log G (so the number of words is |G|^c for some c < 1) then there's a transitional phase, but then once you've covered c|G| of the words in the group you finish up very quickly. These correspond to the "early period," "middle period," "late period" in this post of Terry's, though in the post he's talking about the closely related random walk rather than the volume balls. In other words, the number of steps necessary to cover half the group is probably very close to the diameter in these cases.