What is the purpose of a PLL in a general microcontroller?
The PLL lets you be flexible with clock speed even after you've built the board, and of course, it lets you generate many different frequencies from the one onboard oscillator.
Honestly, being able to generate many frequencies off the onboard RC oscillator is reason enough to have a PLL. That way you can operate flexibly with no external oscillator at all if you don't need one. From there, it's not much more effort to let the PLL take an external oscillator as its reference instead.
The PLL also lets you produce clocks faster than what is possible with a quartz crystal. Even though MEMS oscillators are available which can oscillate at much higher frequencies than quartz, you still might not want to operate directly off of one, since a 400 MHz external oscillator requires you to route a 400 MHz trace.
As for how the PLL works: do you know anything about music? You know how you can listen to a song and clap to the beat? You just keep equal timing between each clap and adjust the timing until each clap lands on a beat. Easy, right?
Now, do you know how you could do two, or even four claps per beat? A PLL does the same thing. You count your own claps and keep the time between them equal, but you adjust that time until every fourth clap lands on the beat that you hear in the song, at which point you stop adjusting. In that way, you produce claps four times as fast as the beat, even though the song itself only gives you the slower beat.
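The clapping analogy can be turned into a toy simulation. This is only a sketch, not a model of any real PLL circuit: the function name, the starting clap spacing, and the two gain values are all assumptions chosen so that the loop settles for four claps per beat. It compares every fourth clap with the beat (the job of the phase detector) and nudges the clap spacing until the two line up.

```python
def lock_claps(beat_period, claps_per_beat=4, initial_clap_period=0.3,
               kp=0.1, ki=0.02, beats=200):
    """Toy model: adjust our clap spacing until every Nth clap lands on the beat.

    All names and gain values are illustrative assumptions; the gains were
    picked for claps_per_beat=4 and are not tuned for other ratios.
    Returns (final_clap_period, final_timing_error).
    """
    clap_time = 0.0    # when our most recent counted clap landed
    integral = 0.0     # accumulated correction (the loop's memory)
    clap_period = initial_clap_period
    error = 0.0
    for beat in range(1, beats + 1):
        # Clap claps_per_beat times at the current spacing.
        clap_time += claps_per_beat * clap_period
        # Compare our last clap with the beat we hear.
        # error > 0 means we clapped early, so the spacing must grow.
        error = beat * beat_period - clap_time
        # Nudge the spacing based on the current and accumulated error.
        integral += ki * error
        clap_period = initial_clap_period + integral + kp * error
    return clap_period, error


period, error = lock_claps(beat_period=1.0)
print(f"clap spacing: {period:.6f} beats, timing error: {error:.2e}")
```

With a beat period of 1.0 the spacing settles at a quarter of a beat, which is the "four times as fast" clock. In a real PLL the claps are the output of a voltage-controlled oscillator and the counting is done by a divider in the feedback path.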
I don't understand how a PLL relates to a microcontroller. I'm not sensing any phase shift or trying to stabilize any signal here, and I don't get how the PLL magically produces a 400MHz clock.
From the point of view of a microcontroller, a PLL is just a frequency multiplier. It takes some reference frequency, such as a 10 MHz oscillator, and generates all the other clock frequencies the microcontroller needs.
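The "frequency multiplier" view comes down to simple arithmetic. Here is a minimal sketch, assuming an idealised PLL with an input divider, a feedback multiplier, and an output divider. The function and parameter names are made up for illustration and do not correspond to any particular chip's registers.

```python
from fractions import Fraction


def pll_output_hz(ref_hz, multiplier, pre_divider=1, post_divider=1):
    """Output frequency of an idealised PLL (hypothetical parameter names).

    The reference is divided by pre_divider, multiplied up by the feedback
    ratio, then divided by post_divider. Fractions keep the result exact.
    """
    return Fraction(ref_hz) * multiplier / (pre_divider * post_divider)


# A 10 MHz reference multiplied by 40 gives a 400 MHz clock.
core_hz = pll_output_hz(10_000_000, multiplier=40)

# The same reference can also yield 48 MHz (10 MHz / 5 * 24),
# a rate commonly needed for USB.
usb_hz = pll_output_hz(10_000_000, multiplier=24, pre_divider=5)

print(core_hz, usb_hz)
```

Real parts restrict which divider and multiplier values are legal and what range the internal oscillator can run over, so the datasheet decides which combinations are actually usable.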
Why is this PLL embedded in the microcontroller? If I want my processor to clock at max 80MHz as written in the specs then I just use an 80MHz external crystal. If some peripherals like USB require faster clock sources then I use a faster crystal and divide the clock to supply multiple slower clocks to other devices.
Unless you happen to be able to find a single oscillator that can be divided down exactly to all the various frequencies you need, this usually isn't practical. Instead, you take a reference clock and multiply it up (or divide it down) as needed. I have seen cheap devices that try to divide down a single clock, and it usually works really badly. They tend to have weird glitches, like producing 48kHz audio that sounds ok but 44.1kHz audio that runs fast, since the LCM of 48000 and 44100 is a large number (7,056,000).
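To put numbers on the audio example, here is a short sketch. The 12.288 MHz clock is an assumed example of a frequency that divides cleanly to 48 kHz, and the helper names are made up for illustration.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def nearest_divided_rate(clock_hz, target_hz):
    """Best rate reachable from clock_hz using one integer divider.

    Returns (divisor, actual_hz, error_percent).
    """
    divisor = max(1, round(clock_hz / target_hz))
    actual_hz = clock_hz / divisor
    error_percent = 100.0 * (actual_hz - target_hz) / target_hz
    return divisor, actual_hz, error_percent


# Smallest frequency that divides exactly to both sample rates.
common_hz = lcm(48_000, 44_100)                         # 7_056_000
ratios = (common_hz // 48_000, common_hz // 44_100)     # (147, 160)

# Assumed example clock: exact for 48 kHz (divide by 256), not for 44.1 kHz.
exact = nearest_divided_rate(12_288_000, 48_000)
off = nearest_divided_rate(12_288_000, 44_100)

print(common_hz, ratios)
print(exact)
print(off)
```

A clock chosen to suit one sample rate leaves the other off by a fraction of a percent, and whether playback ends up fast or slow depends on which way the divider gets rounded. A PLL avoids the problem by synthesising a suitable clock for each rate from the same reference.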
External crystals are more accurate than internal oscillators anyway, so why bother stuffing a PLL in between an accurate external crystal and the processor, especially when I'm not dealing with any high frequency or RF application?
In this case, the PLL uses an external oscillator, so provided it isn't incompetently implemented, it will be very accurate.
To add to the other answers, there are a couple of other reasons why a PLL may be useful:
To reduce EMC emissions (while also saving money, and reducing the chance of glitches)
To quote from ST application note AN1709:
Some microcontrollers have an embedded programmable PLL Clock Generator allowing the usage of standard 3 to 25 MHz crystals to obtain a large range of internal frequencies (up to a few hundred MHz). By these means, the microcontroller can operate with cheaper, medium frequency crystals, while still providing a high frequency internal clock for maximum system performance. The high clock frequency source is contained inside the chip and does not go through the PCB (Printed Circuit Board) tracks and external components. This reduces the potential noise emission of the application.
The use of PLL network also filters CPU clock against external sporadic disturbances (glitches).
To save power
In a low-power product, it can be very useful to have the option to run the processor (and its peripherals) at different speeds depending on what it needs to do at any point in time, or to generate assorted clocks at some times, but not others.
So this may involve increasing the clock speed when necessary, but decreasing it (or turning off the PLL altogether) at other times.
To give a concrete example: I worked on a battery-powered product which normally ran at 8 MHz, with the PLL off. However, periodically, we needed to generate much faster clocks to enable I2S streaming from an external audio chip. So, we spun up the PLL just for the few seconds where we needed those clocks, then shut it down when we were done.
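The "spin it up only when needed" pattern can be sketched as follows. Everything here is hypothetical: `ClockTree` is a stand-in for a chip's clock control registers, not a real API, and the multiplier is an arbitrary illustration value. Real firmware would also wait for the PLL to report lock before switching over to it.

```python
from contextlib import contextmanager


class ClockTree:
    """Hypothetical stand-in for a microcontroller's clock control."""

    def __init__(self, base_hz=8_000_000):
        self.base_hz = base_hz   # clock used while the PLL is off
        self.pll_on = False
        self.multiplier = 1

    @property
    def system_hz(self):
        """Current system clock: multiplied only while the PLL runs."""
        return self.base_hz * self.multiplier if self.pll_on else self.base_hz

    @contextmanager
    def pll(self, multiplier):
        """Run the PLL for the duration of a `with` block, then shut it down."""
        self.multiplier, self.pll_on = multiplier, True
        try:
            yield self
        finally:
            # Always drop back to the slow clock, even if the work failed.
            self.pll_on, self.multiplier = False, 1


clocks = ClockTree()
with clocks.pll(multiplier=6):
    fast_hz = clocks.system_hz   # the audio streaming would happen here
idle_hz = clocks.system_hz       # back to the base clock, PLL off

print(fast_hz, idle_hz)
```

Tying the PLL's lifetime to a scoped block makes it hard to leave it running by accident, which matters when battery life is the goal.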