Why is L1 cache faster than L2 cache?
No, they're not the same type of RAM, even though they're on the same chip that uses the same manufacturing process.
Of all the caches, the L1 cache is designed for the fastest possible access time (lowest latency), balanced against how much capacity it needs in order to provide an adequate hit rate. It is therefore built using larger transistors and wider metal tracks, trading space and power for speed. The higher-level caches need higher capacities but can afford to be slower, so they use smaller transistors packed more tightly.
L1 is usually split into separate instruction and data caches, while L2 is a unified cache serving a single core. The lower the cache level, the smaller and faster it usually is. As a rough rule of thumb for PC processors:
L1 Cache: 2-3 clock cycle access
L2 Cache: ~10 clock cycle access
L3 Cache: ~20-30 clock cycle access
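Those latency differences are observable from software with a pointer-chasing microbenchmark: chase a random cycle through buffers of increasing size and time the cost per hop. The sketch below is illustrative only, with assumed working-set sizes (roughly L1-sized, L2-sized, and larger-than-L2); in Python the interpreter overhead per hop is large, so the differences are muted compared to the same experiment in C.

```python
# Rough pointer-chasing microbenchmark (a sketch, not a calibrated tool).
# A random cyclic permutation defeats the hardware prefetcher, so each
# hop pays the latency of whichever cache level holds the working set.
import random
import time

def ns_per_access(n_elements, hops=200_000):
    # Build a random cycle: nxt[a] gives the next index to visit.
    order = list(range(n_elements))
    random.shuffle(order)
    nxt = [0] * n_elements
    for a, b in zip(order, order[1:] + order[:1]):
        nxt[a] = b
    i = 0
    start = time.perf_counter()
    for _ in range(hops):
        i = nxt[i]
    elapsed = time.perf_counter() - start
    return elapsed / hops * 1e9  # nanoseconds per hop

# Assumed sizes: small enough for L1 (~32 KiB), around L2 capacity,
# and well beyond L2 so most hops miss to the next level.
for label, n in [("L1-sized", 4_096), ("L2-sized", 65_536), ("beyond-L2", 1_048_576)]:
    print(f"{label:10s} {ns_per_access(n):6.1f} ns/hop")
```

Run with increasing buffer sizes, the ns/hop figure steps up each time the working set spills out of a cache level, which is exactly the latency ladder in the list above.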
The design goal of the L1 cache is to maximize the hit rate (the probability that the desired instruction or data address is in the cache) while keeping the latency as low as possible. Intel uses an L1 cache with a latency of 3 cycles. The L2 cache is shared among one or more L1 caches and is often much, much larger. Whereas the L1 cache is designed to maximize the hit rate, the L2 cache is designed to minimize the miss penalty (the delay incurred when an L1 miss happens).

For chips that have L3 caches, the purpose is specific to the design of the chip. For Intel, L3 caches first made their appearance in 4-way multi-processor systems (Pentium 4 Xeon MP processors) in 2002. L3 caches in this sense greatly reduced delays in multi-threaded environments and took a load off the FSB. At the time, L3 caches were still dedicated to each single-core processor until Intel Dual-Core Xeon processors became available in 2006. In 2009, L3 caches became a mainstay of the Nehalem microprocessors on desktop and multi-socket server systems.
Quote sourced here from "Pinhedd's" response.
There are several reasons why speed is inversely proportional to size. The first that comes to mind is the physical limitation of conductors, where signal propagation is limited to some fraction of the speed of light. An operation may take as long as it takes an electrical signal to travel the longest distance inside the memory tile and back.

Another related reason is the separation of clock domains. Each CPU runs off its own clock generator, which allows the CPU to run at multi-GHz clock rates. The Level-1 cache runs at, and is synced with, the CPU clock, which is the fastest in the system. The Level-2 cache, on the other hand, has to serve many CPUs and runs in a different (slower) clock domain. Not only is the L2 clock slower (larger tile), but crossing a clock-domain boundary adds another delay. Then of course there are the fan-out issues (already mentioned).
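The propagation argument is easy to check on the back of an envelope. The numbers below are assumptions (a 4 GHz core clock and an effective on-chip signal speed of roughly half the speed of light), but they show why a physically large cache tile cannot answer within a single fast cycle:

```python
# Back-of-the-envelope signal-propagation budget (assumed numbers).
c = 3.0e8                 # speed of light in vacuum, m/s
signal_speed = 0.5 * c    # assumed effective speed in on-chip wiring
clock_hz = 4.0e9          # assumed 4 GHz core clock
cycle_s = 1.0 / clock_hz  # one clock period

# Distance a signal can cover in one cycle; halve it because the
# request must reach the cell and the data must travel back.
reach_m = signal_speed * cycle_s
one_way_mm = reach_m / 2 * 1000
print(f"One cycle = {cycle_s * 1e12:.0f} ps, one-way reach = {one_way_mm:.1f} mm")
```

At these assumed numbers one cycle is 250 ps and the one-way reach is under 2 cm, comparable to the dimensions of a large die, so a small, nearby L1 can respond in a couple of cycles while a big L2 tile physically cannot.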