Definition/meaning of Aliasing? (CPU cache architectures)

@Wu yes, you do need to understand a little about virtual memory to understand aliasing. Let me give you a few lines of explanation first:

Let's say I have 1GB of RAM (physical memory). If I want to present my programmer with a view of 4GB of memory, I use virtual memory. With virtual memory, the programmer thinks that he/she has 4GB and writes the program from that perspective, without needing to know how much physical memory exists. The advantage is that the program will run on computers with different amounts of RAM. Also, the program can run on a computer together with other programs (which also consume physical memory).

So here is how virtual memory is implemented. I will describe a simple 1-level virtual memory system (Intel has a 2/3-level system, which just complicates the explanation).

Our problem here is that the programmer has 4 billion addresses and we only have 1 billion places to put them. So, addresses from the virtual address space need to be mapped to the physical address space. This is done using a simple index table called a Page Table. You access the Page Table with a virtual address and it gives you the physical address of that memory location.

Some details: Remember that the physical space is only 1GB, so the system keeps only the most recently accessed 1GB worth of data in physical memory and keeps the rest on disk. When the program requests a particular address, we first check if it is already in physical memory. If so, it is returned to the program. If not, it is brought in from the disk, put into physical memory, and then returned to the program. The latter case is known as a Page Fault.
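To make the hit / page fault flow concrete, here is a minimal sketch in Python. It assumes a least-recently-used eviction policy and a made-up `ToyMemory` class; real operating systems use more elaborate policies, so treat this only as an illustration of the idea above.

```python
from collections import OrderedDict

class ToyMemory:
    """Hypothetical model: a few physical frames backed by 'disk'."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        # virtual page -> physical frame, ordered from least to most recently used
        self.resident = OrderedDict()
        self.page_faults = 0

    def translate(self, vpage):
        if vpage in self.resident:
            # Already in physical memory: just mark it as recently used.
            self.resident.move_to_end(vpage)
            return self.resident[vpage]
        # Page fault: the page has to be brought in from disk.
        self.page_faults += 1
        if len(self.resident) < self.num_frames:
            frame = len(self.resident)          # a free frame is still available
        else:
            _, frame = self.resident.popitem(last=False)  # evict least recently used
        self.resident[vpage] = frame
        return frame

mem = ToyMemory(num_frames=2)
trace = [0, 1, 0, 2, 1]
frames = [mem.translate(p) for p in trace]
print(frames, mem.page_faults)  # [0, 1, 0, 1, 0] 4
```

With only 2 frames, the access to page 2 evicts page 1 (the least recently used), and the final access to page 1 faults again.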

Coming back to aliasing in the context of virtual memory: since there is a mapping from virtual to physical addresses, it is possible to make two virtual addresses map to the same physical address. That is the same as saying that if I look up virtual addresses X and Y in my page table, I will get the same physical address in BOTH cases.
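You can see this kind of aliasing from user space by mapping the same file twice. The sketch below uses Python's standard `mmap` and `tempfile` modules; the variable names are mine. It assumes the default shared mapping mode, and it only demonstrates that two mappings refer to the same underlying memory, not any cache misbehaviour (on hardware with coherent caches the write is simply visible through both).

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE

with tempfile.TemporaryFile() as f:
    f.write(b"\x00" * PAGE)
    f.flush()

    # Two independent mappings of the same file page. Each mapping is placed
    # in its own range of virtual addresses, but both are backed by the same
    # underlying page.
    view_a = mmap.mmap(f.fileno(), PAGE)
    view_b = mmap.mmap(f.fileno(), PAGE)

    view_a[0:5] = b"hello"          # write through the first mapping
    seen_through_b = view_b[0:5]    # read through the second mapping

    view_a.close()
    view_b.close()

print(seen_through_b)
```

The bytes written through `view_a` show up when reading through `view_b`, because both virtual ranges alias the same memory.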

I show below a simple example of an 8-entry Page Table. Say there are 8 virtual addresses and only 3 physical addresses. The page table looks as follows:


     0:    1
     1:   On disk
     2:    2
     3:    1
     4:   On disk
     5:   On disk
     6:   On disk
     7:    0


This means that if virtual address 4 is accessed, you will get a page fault.
If virtual address 3 is accessed, you will get physical address 1.
In this case, virtual addresses 0 and 3 are aliases: both map to the same physical address, 1.
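The same 8-entry table can be written down in a few lines of Python. This is just a sketch of the example above; `ON_DISK`, `PageFault`, `translate` and `find_aliases` are names I made up for illustration.

```python
ON_DISK = None  # marker for entries that are not in physical memory

# Index = virtual address, value = physical address (same table as above).
page_table = [1, ON_DISK, 2, 1, ON_DISK, ON_DISK, ON_DISK, 0]

class PageFault(Exception):
    """Raised when the requested virtual address is not in physical memory."""

def translate(vaddr):
    paddr = page_table[vaddr]
    if paddr is ON_DISK:
        raise PageFault(f"virtual address {vaddr} is on disk")
    return paddr

def find_aliases(table):
    """Group virtual addresses by physical address, keeping groups of 2 or more."""
    by_physical = {}
    for vaddr, paddr in enumerate(table):
        if paddr is not ON_DISK:
            by_physical.setdefault(paddr, []).append(vaddr)
    return {p: v for p, v in by_physical.items() if len(v) > 1}

aliases = find_aliases(page_table)
print(translate(3))  # 1
print(aliases)       # {1: [0, 3]}
```

Calling `translate(4)` raises `PageFault`, matching the first case described above.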

NOTE: I used the terms physical and virtual addresses everywhere to simplify the concept. In a real system, the virtual-to-physical mapping is not done on a per-address basis. Instead, we map chunks of virtual space to physical space. Each chunk is called a Page (that's why the mapping table is called a page table), and the size of the chunk is a property of the ISA, e.g., Intel x86 has 4 KB pages.
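With pages, a translation only replaces the page-number part of the address and leaves the offset within the page untouched. Here is a small sketch assuming 4 KB pages (so a 12-bit offset); the page numbers and frame numbers in `page_table` are invented purely for illustration.

```python
PAGE_SIZE = 4096   # 4 KB pages
OFFSET_BITS = 12   # 2**12 == 4096

def split(vaddr):
    """Return (virtual page number, offset within the page)."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)

def translate(vaddr, page_table):
    vpn, offset = split(vaddr)
    frame = page_table[vpn]                 # look up the physical frame
    return (frame << OFFSET_BITS) | offset  # the offset is carried over unchanged

# Hypothetical mapping: virtual pages 0x12 and 0x40 both map to frame 0x7.
page_table = {0x12: 0x7, 0x40: 0x7}

a = translate(0x12ABC, page_table)
b = translate(0x40ABC, page_table)
print(hex(a), hex(b))  # 0x7abc 0x7abc
```

Two different virtual addresses with the same offset in aliased pages end up at the same physical address.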


You'd need to learn about Virtual Memory first, but basically it's this:

  • The memory addresses your program uses aren't the physical addresses that the RAM uses; they're virtual addresses mapped to physical addresses by the CPU.

  • Multiple virtual addresses can point to the same physical address.

That means you can have two copies of the same data in separate parts of the cache without knowing it. A write to one copy wouldn't update the other, so you'd get wrong results.
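Here is a toy model of that failure, assuming a write-back cache that is looked up purely by virtual address. The `VirtuallyTaggedCache` class is a made-up simplification (one value per address, no sets or ways), not a description of any real CPU.

```python
class VirtuallyTaggedCache:
    """Toy write-back cache keyed only by virtual address."""

    def __init__(self, page_table, memory):
        self.page_table = page_table  # virtual address -> physical address
        self.memory = memory          # physical address -> value
        self.lines = {}               # virtual address -> cached value
        self.dirty = set()            # virtual addresses written but not yet flushed

    def read(self, vaddr):
        if vaddr not in self.lines:   # miss: fill the line from physical memory
            self.lines[vaddr] = self.memory[self.page_table[vaddr]]
        return self.lines[vaddr]

    def write(self, vaddr, value):
        self.lines[vaddr] = value     # write-back: memory is not updated yet
        self.dirty.add(vaddr)

    def flush(self):
        for vaddr in self.dirty:      # write dirty lines back, then empty the cache
            self.memory[self.page_table[vaddr]] = self.lines[vaddr]
        self.lines.clear()
        self.dirty.clear()

page_table = {0: 1, 3: 1}             # virtual 0 and 3 alias physical 1
memory = {1: "old"}
cache = VirtuallyTaggedCache(page_table, memory)

cache.read(0)
cache.read(3)                         # now two cached copies of physical 1
cache.write(0, "new")
stale = cache.read(3)                 # still sees the old copy
cache.flush()
after_flush = cache.read(3)           # refilled from memory after the flush
print(stale, after_flush)  # old new
```

The read through virtual address 3 returns stale data until the cache is flushed, which is why this kind of design needs flushing whenever aliases are in play.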


Edit:

Excerpt of reference:

Cache aliasing occurs when multiple mappings to a physical page of memory have conflicting caching states, such as cached and uncached. Due to these conflicting states, data in that physical page may become corrupted when the processor's cache is flushed. If that page is being used for DMA by a driver, this can lead to hardware stability problems and system lockups.


For those who are still unconvinced:

On ARMv4 and ARMv5 processors, cache is organized as a virtual-indexed, virtual-tagged (VIVT) cache in which both the index and the tag are based on the virtual address. The main advantage of this method is that cache lookups are faster because the translation look-aside buffer (TLB) is not involved in matching cache lines for a virtual address. However, this caching method does require more frequent cache flushing because of cache aliasing, in which the same physical address can be mapped to multiple virtual addresses.