Why has the size of L1 cache not increased very much over the last 20 years?
30K of Wikipedia text isn't as helpful as an explanation of why an overly large cache becomes suboptimal. As a cache grows, the latency to find an item in it (factoring in cache misses) begins to approach the latency of looking the item up in main memory. I don't know what proportions CPU designers aim for, but I would think it is something analogous to the 80-20 guideline: you'd like to find your most common data in the cache 80% of the time, and the other 20% of the time you'll have to go to main memory for it (or whatever proportions the CPU designers actually intend).
EDIT: I'm sure it's nowhere near 80%/20%, so substitute X and 1-X. :)
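To make that trade-off concrete, the usual back-of-the-envelope formula is average memory access time (AMAT) = hit time + miss rate * miss penalty. The numbers in this sketch are illustrative assumptions, not figures for any particular CPU:

```c
#include <stdio.h>

int main(void) {
    /* Illustrative assumptions, not real figures for any specific CPU: */
    double hit_time     = 1.0;   /* ns to hit in the L1 cache    */
    double miss_penalty = 60.0;  /* ns to go to main memory      */
    double miss_rate    = 0.05;  /* 5% of accesses miss the L1   */

    /* AMAT = hit_time + miss_rate * miss_penalty */
    double amat = hit_time + miss_rate * miss_penalty;
    printf("average access time: %.1f ns\n", amat);

    /* A bigger cache lowers miss_rate but raises hit_time;       */
    /* past some size, the rising hit_time dominates the savings. */
    return 0;
}
```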
One factor is that L1 fetches start before the TLB translation is complete, so as to decrease latency. With a small enough cache and high enough associativity, the index bits of the address are the same in the virtual and physical addresses, because they fall entirely within the page offset. This probably decreases the cost of maintaining memory coherency with a virtually-indexed, physically-tagged (VIPT) cache, since it avoids aliasing between virtual addresses.
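A quick back-of-the-envelope check of that constraint, using common but assumed parameters (a 32 KiB, 8-way L1 with 64-byte lines and 4 KiB pages):

```c
#include <stdio.h>

/* integer log2 for powers of two */
static unsigned ilog2(unsigned x) { unsigned n = 0; while (x >>= 1) n++; return n; }

int main(void) {
    /* Assumed, typical parameters -- not tied to any specific CPU: */
    unsigned cache_size = 32 * 1024;                     /* 32 KiB L1      */
    unsigned line_size  = 64;                            /* bytes per line */
    unsigned ways       = 8;                             /* associativity  */
    unsigned page_size  = 4096;                          /* 4 KiB pages    */

    unsigned sets        = cache_size / (line_size * ways);  /* 64 sets */
    unsigned offset_bits = ilog2(line_size);                 /* 6       */
    unsigned index_bits  = ilog2(sets);                      /* 6       */
    unsigned page_bits   = ilog2(page_size);                 /* 12      */

    printf("index+offset = %u bits, page offset = %u bits\n",
           offset_bits + index_bits, page_bits);
    /* As long as index+offset <= page offset, the set index is the
       same in virtual and physical addresses, so the L1 can be
       indexed in parallel with the TLB lookup. Growing the cache
       beyond that point means adding ways, which hurts latency --
       one reason L1 sizes have stayed put. */
    return 0;
}
```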
Cache size is influenced by many factors:
Speed of electrical signals (if not the speed of light itself, something of the same order of magnitude; see the rough per-cycle figures sketched after this list):
- 300 meters in one microsecond.
- 30 centimeters in one nanosecond.
Economic cost (circuits at different cache levels may be built differently, and certain cache sizes may simply not be worth their cost):
- Doubling the cache size does not double performance (even if physics allowed a cache that size to stay fast): at small sizes, doubling gives far more than double the benefit; at large sizes, doubling gives almost no extra performance.
- On Wikipedia you can find a chart showing, for example, how little is gained by making caches bigger than 1 MB (larger caches do exist, but keep in mind that they serve multiple processor cores).
- For L1 caches there are presumably similar charts (which vendors don't publish) that make 64 KB a convenient size.
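Here is a rough sketch of why signal speed matters, with assumed numbers (a 3 GHz clock, and signals at roughly half the speed of light, which is in the right ballpark for on-chip wires):

```c
#include <stdio.h>

int main(void) {
    /* Assumed figures for illustration only: */
    double clock_hz     = 3.0e9;   /* 3 GHz core clock                  */
    double signal_speed = 1.5e8;   /* m/s, ~half of c for on-chip wires */

    double cycle_s = 1.0 / clock_hz;          /* ~0.33 ns per cycle     */
    double reach_m = signal_speed * cycle_s;  /* distance per cycle     */

    printf("one cycle: %.2f ns, signal reach: %.1f cm\n",
           cycle_s * 1e9, reach_m * 100.0);
    /* Only ~5 cm of wire can be traversed per cycle (half that for a
       round trip): a cache that must answer within a few cycles has
       to be physically tiny and sit right next to the core. */
    return 0;
}
```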
If L1 cache size hasn't changed since 64 KB, it's because growing it further was no longer worth it. Also note that there is now a greater "culture" of cache awareness: many programmers write "cache-friendly" code and/or use prefetch instructions to reduce latency (see the sketch below).
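As a minimal illustration of what "cache-friendly" means: traversing a matrix row by row touches consecutive cache lines, while traversing it column by column jumps to a new line on almost every access. The GCC/Clang builtin __builtin_prefetch is one way to issue an explicit prefetch hint; the matrix size and prefetch distance are arbitrary assumptions:

```c
#include <stdio.h>

#define N 1024

static float a[N][N];

/* Cache-friendly: walks memory in the order it is laid out,
   so each 64-byte line fetched is fully used. */
static float sum_row_major(void) {
    float s = 0.0f;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            /* Optional explicit prefetch of a line a bit ahead
               (GCC/Clang builtin; a hint the CPU may ignore). */
            if (j + 16 < N)
                __builtin_prefetch(&a[i][j + 16], 0, 1);
            s += a[i][j];
        }
    return s;
}

/* Cache-hostile: strides N*sizeof(float) bytes per access,
   touching a new cache line on almost every iteration. */
static float sum_col_major(void) {
    float s = 0.0f;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

int main(void) {
    printf("%f %f\n", sum_row_major(), sum_col_major());
    return 0;
}
```

Both functions do the same arithmetic, yet on typical hardware the row-major version runs several times faster, purely because of how it uses the cache.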
I once tried writing a simple program that accessed random locations in an array of several megabytes: that program nearly froze the computer, because each random read pulled a whole cache line from RAM into the cache, and since this happened constantly, the program drained nearly all the memory bandwidth, leaving very few resources for the OS.
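A minimal reconstruction of that experiment might look like the following (the array size and iteration count are arbitrary assumptions; the pointer-chasing pattern defeats both the caches and the hardware prefetcher):

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Arbitrary assumption: 256 MiB of 64-bit slots. */
    size_t n = (256u * 1024 * 1024) / sizeof(size_t);
    size_t *buf = malloc(n * sizeof *buf);
    if (!buf) return 1;

    /* Each slot holds a random index, so every read tells us
       where to read next: classic pointer chasing. The address
       pattern is unpredictable, so nearly every access misses
       the caches and costs a full trip to RAM. */
    srand(42);
    for (size_t i = 0; i < n; i++)
        buf[i] = ((size_t)rand() * rand()) % n;

    size_t idx = 0, sink = 0;
    for (size_t i = 0; i < 100u * 1000 * 1000; i++) {
        idx = buf[idx];   /* dependent random read: roughly one  */
        sink += idx;      /* RAM round trip per iteration        */
    }

    printf("%zu\n", sink); /* keep the loop from being optimized out */
    free(buf);
    return 0;
}
```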