Why does more bandwidth mean higher bit rate in digital transmission?
It's a subtle point, but your thinking is going astray when you think of a 330-Hz tone as somehow conveying 660 bits/second of information. It doesn't — and in fact, a pure tone conveys no information at all other than its presence or absence.
In order to transmit information through a channel, you need to be able to specify an arbitrary sequence of signaling states that are to be transmitted, and (this is the key point) be able to distinguish those states at the other end.
With your 30-330 Hz channel, which is only 300 Hz wide, you can specify 660 states per second, but it will turn out that about 9% of those state sequences will violate the bandwidth limitations of the channel and will be indistinguishable from other state sequences at the far end, so you can't use them. This is why the usable information rate turns out to be 600 b/s.
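As a rough check on those numbers, here is a minimal sketch of the arithmetic, assuming the noiseless Nyquist limit of 2 × bandwidth × log2(levels) for a 2-level code. The function name `nyquist_bit_rate` is made up for illustration.

```python
import math

def nyquist_bit_rate(f_low_hz, f_high_hz, levels=2):
    """Noiseless Nyquist limit: 2 * bandwidth * log2(levels), in bits/second."""
    bandwidth_hz = f_high_hz - f_low_hz
    return 2 * bandwidth_hz * math.log2(levels)

usable = nyquist_bit_rate(30, 330)  # the channel is 330 - 30 = 300 Hz wide
naive = 2 * 330                     # rate you get by looking only at the top frequency
shortfall = 1 - usable / naive      # fraction of the naive rate that is not usable

print(usable, naive, round(shortfall, 3))
```

The shortfall comes out at roughly 9%, which matches the gap between 660 and 600 described above.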
This is only a partial answer, but hopefully it gets at the main points you're misunderstanding.
My problem is that I'm having a hard time understanding why bandwidth relates to bit rate at all. ...
If a zero is expressed as a 30 Hz carrier frequency, a one is expressed as a 330 Hz carrier frequency, and the modulation signal is 330 Hz, then the max bit rate is 660 bps.
If you switch down to 30 Hz for a zero, you need to have about 1/60 s or so to really know you got 30 Hz and not 20 Hz or 50 Hz or something. Really in this case you are just on-off keying your 330 Hz carrier, and the 30 Hz signal that's sent for 1/660 s during the zeros is just confusing things.
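To see how little of each tone fits into one bit period, here is a small sketch that counts cycles per bit at 660 bits/second. The helper name `cycles_per_bit` is hypothetical.

```python
def cycles_per_bit(tone_hz, bit_rate_hz):
    """Number of cycles of a tone that fit into a single bit period."""
    return tone_hz / bit_rate_hz

zero_cycles = cycles_per_bit(30, 660)   # the 30 Hz "zero" tone
one_cycles = cycles_per_bit(330, 660)   # the 330 Hz "one" tone

print(zero_cycles, one_cycles)
```

The 330 Hz tone gets half a cycle per bit, while the 30 Hz tone gets less than a twentieth of a cycle, so the receiver sees almost nothing it could identify as 30 Hz.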
To talk about FSK, let's take a more realistic example. Say you use 1 MHz for the zero and 1.01 MHz for the one, so the spacing is \$\Delta f\$ = 10 kHz. It turns out you need to measure the signal for about \$1/(2\Delta f)\$, in this case 1/20,000 s (50 µs), to be able to reliably distinguish those two frequencies. If you just measured the signal for 1 µs, you wouldn't really be able to tell the difference between a 1 MHz signal and a 1.01 MHz signal (although in an ideal, noise-free scenario you could do it, just as Shannon's formula says the capacity of even a very narrow channel grows without bound as SNR goes to infinity).
So in this example the bit rate you can send is about 20 kb/s, corresponding to 2x the difference between your 1 and 0 frequencies, just as the Nyquist formula leads you to expect for a 2-level code.
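One way to get a feel for the measurement-time argument is to correlate the two FSK tones over different observation windows. The sketch below is only an illustration under simplifying assumptions: both tones are noise-free cosines starting at zero phase, the integral is approximated with a midpoint sum, and `tone_correlation` is a made-up helper name. A correlation near 1 means the two tones look almost identical over that window; a correlation near 0 means a correlating receiver can separate them cleanly.

```python
import math

def tone_correlation(f0_hz, f1_hz, duration_s, samples=20000):
    """Normalized correlation of two zero-phase cosine tones over [0, duration_s]."""
    dt = duration_s / samples
    dot = e0 = e1 = 0.0
    for k in range(samples):
        t = (k + 0.5) * dt  # midpoint of each small time step
        a = math.cos(2 * math.pi * f0_hz * t)
        b = math.cos(2 * math.pi * f1_hz * t)
        dot += a * b
        e0 += a * a
        e1 += b * b
    return dot / math.sqrt(e0 * e1)

F0_HZ, F1_HZ = 1.00e6, 1.01e6
delta_f = F1_HZ - F0_HZ            # 10 kHz spacing
t_long = 1 / (2 * delta_f)         # 1/20,000 s, the window suggested above

rho_short = tone_correlation(F0_HZ, F1_HZ, 1e-6)    # 1 microsecond window
rho_long = tone_correlation(F0_HZ, F1_HZ, t_long)   # 50 microsecond window
bit_rate = 1 / t_long                               # one bit per window

print(rho_short, rho_long, bit_rate)
```

Over 1 µs the correlation stays above 0.99, so the tones are practically the same waveform. Over 1/(2Δf) it drops to essentially zero, and one bit per such window works out to about 20 kb/s.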