Why is it dangerous when an attacker can control the `n` parameter to `memcpy()`?

Assuming buf's size is either controlled by n or larger than 16, the attacker could set n to any value and use that to read an arbitrary amount of memory. Neither memcpy nor C in general throws exceptions or otherwise prevents this from happening. As long as it doesn't violate any page protections or hit an invalid address, memcpy will continue merrily along until it has copied the requested number of bytes.

I assume that user and this vulnerable block of code are inside a function somewhere, which likely means user resides on the stack. Local variables, the return address, and other bookkeeping information are all kept on the stack. The diagram below shows its structure on x86 (Intel) systems (which most desktop platforms use, and I assume your computer does).

[Diagram: stack frame]

You could obtain the return address this way by making n large enough for memcpy to read forward through the stack frame. user would be in the section of the diagram labeled "Locally declared variables". On 32-bit x86, EBP is a 4-byte value, so if we read past the saved EBP and then copy the next 4 bytes with memcpy, we end up copying the return address.
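
The layout described above can be modelled without invoking undefined behavior. The sketch below is a simplified, hypothetical model of a 32-bit x86 frame as a struct; `struct fake_frame`, `leak_from_frame`, and `return_address_from_leak` are names invented for illustration, and a real frame is laid out by the compiler and may contain padding, canaries, or other saved registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 32-bit x86 stack frame, lowest address first.
 * A real compiler decides the actual layout; this is only an illustration. */
struct fake_frame {
    unsigned char user[16];   /* "locally declared variables" */
    uint32_t saved_ebp;       /* caller's frame pointer */
    uint32_t return_address;  /* where execution resumes after the call */
};

/* Copies n bytes starting at frame->user (offset 0 of the model) into out,
 * the way the vulnerable memcpy(buf, user, n) would. n is clamped to the
 * size of the model so the demonstration itself stays inside the struct.
 * Returns the number of bytes copied. */
size_t leak_from_frame(const struct fake_frame *frame, unsigned char *out, size_t n)
{
    if (n > sizeof *frame) {
        n = sizeof *frame;
    }
    memcpy(out, (const unsigned char *)frame, n);
    return n;
}

/* Pulls the modelled return address back out of a full-size leaked copy. */
uint32_t return_address_from_leak(const unsigned char *leaked)
{
    uint32_t ra;
    memcpy(&ra, leaked + offsetof(struct fake_frame, return_address), sizeof ra);
    return ra;
}
```

In this model, a request for 16 bytes returns only user, while a request for the full size of the struct also hands back the saved EBP and the return address that follow it.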

Note that the above depends on the architecture the program is running on. This paper is about iOS, and since I don't know anything about ARM, the specifics of this information could be somewhat inaccurate.


A good answer has already been given by sasha, but I want to look at this from another angle; specifically, what memcpy actually does (in terms of what code gets executed).

Allowing for the possibility of minor bugs in this quick-and-dirty implementation, a trivial implementation of memcpy() that meets the C89/C99/POSIX function signature and contract might be something not entirely unlike:

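/* copy n bytes starting at source+0, to target+0 through target+(n-1), all inclusive */
void *memcpy(void *target, const void *source, size_t n)
{
    /* void pointers cannot be dereferenced, so copy through byte pointers */
    unsigned char *t = target;
    const unsigned char *s = source;
    size_t i;

    for (i = 0; i < n; i++)
    {
        t[i] = s[i];
        /* or, equivalently here: *t++ = *s++; */
    }
    return target; /* the standard memcpy() returns the target pointer */
}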

Now, a real implementation would probably do the copying in larger chunks than one byte at a time to take advantage of the wide memory (RAM) interconnect buses of today, but the principle remains exactly the same.

For the purposes of your question, the important part to note is that there is no bounds checking. This is by design! There are three important reasons for why this is so:

  1. C is often used as an operating system programming language, and it was designed as a "portable assembler". Thus, the general approach to many of the old library functions (of which memcpy() is one), and the language in general, is that if you can do it in assembler, it should also be doable in C. There are very few things you can do in assembler but not in C.
  2. Given only a pointer to a memory location, there is no way to know how much memory is properly allocated at that location, or even whether the memory pointed to is allocated at all! (A common trick to speed up software in the old days of early x86 systems and DOS was to write directly to the graphics memory to put text on the screen. The graphics memory, obviously, was never allocated by the program itself; it was just known to be accessible at a specific memory address.) The only way to really find out if it works is to read or write the memory and see what happens (and even then, accessing memory outside any allocated object invokes undefined behavior, so basically, the C language standard allows anything to happen).
  3. Basically, arrays decay to pointers: in most expressions, the unindexed array variable is converted to a pointer to the start of the array, and the array's length is not carried along. This is not strictly true in every case, but it's good enough for us right now.
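
Points (2) and (3) can be seen in a few lines. This is a minimal sketch with hypothetical function names (`size_seen_by_owner`, `size_seen_by_callee`): the function that declares the array still knows its size, but a function that merely receives it gets a bare pointer with no length attached.

```c
#include <stddef.h>

/* Inside the function that declares the array, sizeof sees the whole array. */
size_t size_seen_by_owner(void)
{
    char user[16];
    return sizeof user;   /* 16: the compiler knows the array type here */
}

/* Once the array is passed to a function, only a pointer arrives. */
size_t size_seen_by_callee(const char *p)
{
    return sizeof p;      /* size of a pointer, not of the array it points into */
}
```

This is why memcpy(), which only ever receives pointers, has nothing to check n against.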

It follows from (1) that you should be able to copy any memory you want to, from anywhere to anywhere. Memory protection is Someone Else's Problem. Specifically, these days it's the responsibility of the OS and the MMU (now generally part of the CPU), with the relevant portions of the OS themselves likely being written in C...

It follows from (2) that memcpy() and friends need to be told exactly how much data to copy, and they have to trust that the buffer at the target (or whatever else is at the address pointed to by the target pointer) is sufficiently large to hold that data. Memory allocation is The Programmer's Problem.

It follows from (3) that we can't tell how much data is safe to copy. Making sure memory allocations (both source and destination) are sufficient is The Programmer's Problem.
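
One way the programmer can take on that responsibility is to carry the buffer sizes alongside the pointers and check the requested length against both before copying. The helper below is a hypothetical sketch (`copy_checked` is not a standard function), and it is only as trustworthy as the sizes the caller passes in.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies n bytes only if both buffers are large enough.
 * The caller must supply the true sizes; C cannot discover them.
 * Returns 0 on success, -1 if the request would overrun either buffer. */
int copy_checked(void *dst, size_t dst_size,
                 const void *src, size_t src_size, size_t n)
{
    if (dst == NULL || src == NULL) {
        return -1;
    }
    if (n > dst_size || n > src_size) {
        return -1;   /* refuse rather than read or write out of bounds */
    }
    memcpy(dst, src, n);
    return 0;
}
```

With a 16-byte source, a call such as copy_checked(buf, sizeof buf, user, sizeof user, n) succeeds for n up to 16 and is rejected for anything larger, regardless of who chose n.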

When an attacker can control the number of bytes to copy using memcpy(), (2) and (3) break down. If the target buffer is too small, whatever follows it will be overwritten. If you are lucky, that will result in a memory access violation, but neither the C language nor its standard library guarantees that this will happen. (You asked it to copy memory contents, and it either does that or dies trying, but it doesn't know what was intended to be copied.) If you pass a source array that is smaller than the number of bytes you ask memcpy() to copy, there is no reliable way for memcpy() to detect that, and it will happily barrel on past the end of the source array for as long as reading from the source location and writing to the target location works.

If an attacker can control n in your example code so that n is larger than the size of the array on the source side of the copy, memcpy() will, because of the above points, happily keep copying beyond the end of the intended source array. This is basically the Heartbleed attack in a nutshell.

That is why the code leaks data. Exactly what data is leaked depends both on the value of n and how the compiler lays out the machine language code and data in memory. The diagram in sasha's answer gives a good overview, and every architecture is similar but different.

Depending on how exactly your variable buf is declared, allocated and laid out in memory, you might also have what is known as a stack smashing attack, where data needed for the proper operation of the program is overwritten, and the data that overwrote whatever was there is subsequently referred to. In mundane cases this leads to crashes or nigh-impossible-to-debug bugs; in severe, targeted cases, it can lead to arbitrary code execution fully under the control of the attacker.
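
The write side of the problem can be modelled in the same spirit. In this hypothetical sketch, `struct fake_locals` and `unchecked_write` are invented names, and the `is_admin` byte merely stands in for whatever the compiler happens to place after the buffer (a saved pointer, a return address, another variable).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: a local buffer followed in memory by data the
 * program later relies on. */
struct fake_locals {
    char buf[16];
    unsigned char is_admin;   /* stand-in for whatever follows the buffer */
};

/* Writes n attacker-supplied bytes starting at locals->buf without checking
 * n against sizeof locals->buf. n is clamped to the size of the model so the
 * demonstration stays inside the struct; input must hold at least n bytes. */
void unchecked_write(struct fake_locals *locals, const unsigned char *input, size_t n)
{
    if (n > sizeof *locals) {
        n = sizeof *locals;
    }
    memcpy((unsigned char *)locals, input, n);
}
```

Writing 16 bytes fills buf and nothing else; writing 17 also replaces is_admin with a byte of the attacker's choosing.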


I am posting another answer because the two answers here, although both correct, miss an important point of the question in my opinion. The question is about the information leak concerning memory layout.

The presented memcpy might always have a correctly sized output buffer, so even if the attacker controls the size, there might be no risk of stack smashing at this point. Leaking information (as in Heartbleed, as already mentioned by Linuxios) is a potential problem, depending on what information is leaked. In this example, you are leaking the address of publicFunction. This is a real problem, because it defeats Address Space Layout Randomization (ASLR), which is discussed, for example, in "How do ASLR and DEP work?". As soon as you publish the address of publicFunction, the addresses of all other functions in the same module (DLL or EXE file) are effectively published too, and can be used in return-to-libc or return-oriented-programming attacks. You need a different hole than the one presented here for those attacks, though.
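
The reason one leaked address gives away the rest is plain arithmetic: ASLR moves the module as a whole, so the distances between its functions stay fixed. The sketch below assumes two made-up offsets (`OFFSET_PUBLIC_FUNCTION`, `OFFSET_OTHER_FUNCTION`) and hypothetical helper names; an attacker would read the real offsets from their own copy of the binary.

```c
#include <stdint.h>

/* Hypothetical offsets of two functions from the start of the same module.
 * These values are invented for illustration. */
#define OFFSET_PUBLIC_FUNCTION 0x1A30u
#define OFFSET_OTHER_FUNCTION  0x2F80u

/* Recovers the randomized load address of the module from one leaked pointer. */
uintptr_t module_base_from_leak(uintptr_t leaked_public_function)
{
    return leaked_public_function - OFFSET_PUBLIC_FUNCTION;
}

/* Any other function in the module sits at a fixed offset from that base. */
uintptr_t other_function_address(uintptr_t leaked_public_function)
{
    return module_base_from_leak(leaked_public_function) + OFFSET_OTHER_FUNCTION;
}
```

Whatever base the loader picks on a given run, subtracting the known offset of publicFunction from its leaked address recovers that base.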

Tags: C, Memory, Appsec