GPU cores vs. CPU cores
A CPU is a much more general-purpose machine than a GPU. We may talk about using a GPU for "general-purpose" computing (GPGPU), but the two have different strengths.
CPU cores are capable of a wide variety of operations and deal with what is, for all intents and purposes, a randomly branching instruction stream: multiple programs all vying for time on the processor, scheduled by the operating system. They cache and predict as much as they can while still remaining able to handle sudden changes in the instruction stream.
GPUs, on the other hand, are processors designed to deal with data streams. Their cores are designed to run a small set of instructions (a shader program) across a potentially vast amount of data. HD, 2K and 4K screens contain a huge number of pixels, and a shader program must run on every pixel, often over several successive passes, to achieve particular effects. To that end their programs are smaller than a CPU's and their per-core caches are similarly smaller, but their bandwidth to memory is phenomenally higher.
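As a rough illustration of the "small program, vast data" idea, the Python sketch below treats a shader as an ordinary function applied independently to every pixel, and prints how many times that program must run for one frame at a few common resolutions. The 8-bit greyscale pixels, the hypothetical `brighten` function and the exact resolutions chosen for "HD", "2K" and "4K" are assumptions for illustration; a real shader is written in a shading language and executed on the GPU itself.

```python
# Sketch only: a "shader" modelled as a plain Python function. Real shaders are
# written in a shading language (e.g. GLSL or HLSL) and run on the GPU.

# Assumed resolutions for the screen sizes mentioned in the text.
RESOLUTIONS = {
    "HD (1920x1080)": (1920, 1080),
    "2K (2048x1080)": (2048, 1080),
    "4K UHD (3840x2160)": (3840, 2160),
}


def brighten(pixel, amount=20):
    """Hypothetical per-pixel program: it reads only its own 8-bit pixel value."""
    return min(pixel + amount, 255)


def run_shader(shader, frame):
    """Run the same small program on every pixel; no pixel depends on another."""
    return [shader(p) for p in frame]


def pixels_per_frame(width, height):
    """Number of times a per-pixel program runs for a single pass over one frame."""
    return width * height


if __name__ == "__main__":
    for name, (w, h) in RESOLUTIONS.items():
        print(f"{name}: {pixels_per_frame(w, h):,} pixel-program runs per frame")
    print(run_shader(brighten, [0, 100, 250]))  # [20, 120, 255]
```

At an assumed 60 frames per second, the 4K figure (8,294,400 pixels) alone means roughly half a billion runs of the per-pixel program every second, for each pass.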
With suitable programming, either could achieve the same tasks, but the focus on instruction processing versus data processing is what separates a CPU from a GPU.
As such, each type of core is designed to play to those strengths. For a long while GPU shader cores have run at around 1-2 GHz (modern Intel graphics cores list their speeds as 500 MHz to 1.5 GHz), while CPUs have run anywhere from 1.5 GHz to 4 GHz and beyond.
Instruction processing benefits more from the speed of individual cores because it can be difficult or impossible to break an instruction stream down into multiple independent streams; CPUs therefore need high clock speeds to get through instructions quickly. The problem is that the faster you run a core, the more heat it generates, so you hit a limit on how fast you can run it. (There are other technical limitations on clock speed, but that's a story for another time.)
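To make the "hard to break into multiple streams" point concrete, here is a small Python sketch of a computation with a serial dependency chain: counting Collatz steps. The choice of example is an illustration added here, not something from the original answer. Each iteration needs the value produced by the previous one, and which branch runs depends on that value, so extra cores do not help; only executing each step faster does.

```python
def collatz_steps(n):
    """Count the steps for n to reach 1 under the Collatz rules.

    Illustration of an instruction stream that cannot be split: every
    iteration depends on the previous value of n, and the branch taken
    depends on the data, so the steps must run one after another.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        if n % 2 == 0:   # data-dependent branch
            n //= 2
        else:
            n = 3 * n + 1
        steps += 1       # each step needs the previous step's result
    return steps


if __name__ == "__main__":
    print(collatz_steps(27))  # 111
```

By contrast, counting the steps for many different starting values is trivially parallel, because each starting value is its own independent chain; that is the data-processing case.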
Data processing, on the other hand, lends itself to parallelism: the same task (program) runs on different pieces of data, so the more cores you can throw at it, the better. Running cores at a lower speed generates less heat, and less heat means you can fit in more cores and therefore get better data throughput. Hence data tasks benefit from a different (smaller, leaner) type of core than a CPU uses.
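The data-parallel case can be sketched in the same way: split the data into one chunk per worker, run the same program on each chunk independently, and stitch the results back together in order. The sketch below uses Python's `concurrent.futures.ThreadPoolExecutor` purely to show that decomposition; the hypothetical `shade` function and the chunking scheme are assumptions for illustration. Because of CPython's global interpreter lock, threads will not actually speed up this CPU-bound work, whereas a GPU runs many such per-element programs at once in hardware.

```python
from concurrent.futures import ThreadPoolExecutor


def shade(value):
    """Hypothetical per-element program, identical for every data item."""
    return min(value + 20, 255)


def shade_chunk(chunk):
    """Process one chunk; it needs nothing from any other chunk."""
    return [shade(v) for v in chunk]


def parallel_map(data, workers):
    """Split data into roughly equal chunks, one per worker, and process them concurrently."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if not data:
        return []
    size = -(-len(data) // workers)  # ceiling division: chunk length per worker
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the same order as the input chunks.
        results = pool.map(shade_chunk, chunks)
        return [v for chunk in results for v in chunk]


if __name__ == "__main__":
    data = list(range(0, 256, 16))
    # The chunked, concurrent result matches a plain serial loop.
    assert parallel_map(data, workers=4) == [shade(v) for v in data]
    print(parallel_map(data, workers=4))
```

Adding workers only changes how the data is divided, not the program each worker runs, which is why this kind of task scales with core count rather than with per-core speed.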
The end result is that we have two distinct types of processor: one aimed at general-purpose instruction streams, the other at bulk data handling.