Why are programs written in C and C++ so frequently vulnerable to overflow attacks?
C and C++, contrary to most other languages, traditionally do not check for overflows. If the source code says to put 120 bytes in an 85-byte buffer, the CPU will happily do so. This is related to the fact that while C and C++ have a notion of array, this notion is compile-time only. At execution time, there are only pointers, so there is no runtime method to check an array access against the conceptual length of that array.
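A minimal sketch of that behaviour, using the hypothetical 85/120 sizes from above (a modern toolchain may warn or add optional hardening, but the language itself requires no check):

```cpp
#include <cstddef>
#include <cstring>

// Copies n bytes into a fixed 85-byte buffer. Nothing in C or C++ forces a
// check that n actually fits; if the caller passes 120, the write simply
// runs past the end of buf (undefined behaviour).
void store(const char *data, std::size_t n) {
    char buf[85];
    std::memcpy(buf, data, n);   // no bounds check happens here
}

int main() {
    char payload[120];
    std::memset(payload, 'A', sizeof payload);
    // The "85" exists only in the source; at run time buf is just a pointer.
    store(payload, sizeof payload);
    return 0;
}
```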
By contrast, most other languages have a notion of array that survives at runtime, so that every array access can be systematically checked by the runtime system. This does not eliminate the bug: if the source code asks for something nonsensical, such as writing 120 bytes into an array of length 85, it still makes no sense. However, the runtime automatically raises an internal error condition (often an "exception", e.g. an ArrayIndexOutOfBoundsException in Java) that interrupts normal execution and does not let the code proceed. This disrupts execution, and often means the whole processing stops (the thread dies), but it normally prevents exploitation beyond a simple denial of service.
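For comparison, here is a small C++ sketch of what such a runtime check looks like, using the checked std::vector::at() accessor as a stand-in for the automatic bounds check that Java performs on every array access:

```cpp
#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<char> buf(85);
    try {
        // at() checks the index at run time; index 119 is outside the
        // 85-element vector, so it throws instead of touching memory
        // past the end of the allocation.
        buf.at(119) = 'A';
    } catch (const std::out_of_range &e) {
        // Execution is interrupted here, much like Java's
        // ArrayIndexOutOfBoundsException; the overflow never happens.
        std::cerr << "caught: " << e.what() << '\n';
    }
    return 0;
}
```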
Basically, a buffer overflow exploit requires the code to perform the overflow (reading or writing past the boundaries of the accessed buffer) and then to keep on doing things beyond that overflow. Most modern languages, contrary to C and C++ (and a few others such as Forth or assembly), don't allow the overflow to really occur and instead shoot the offender. From a security point of view this is much better.
Note that there is some amount of circular reasoning involved: Security issues are frequently linked to C and C++. But how much of that is due to inherent weaknesses of these languages, and how much of it is because those are simply the languages most of the computer infrastructure is written in?
C is intended to be "one step up from assembler". There is no bounds checking other than what you yourself implemented, to squeeze the last clock cycle out of your system.
C++ does offer various improvements over C, the most relevant to security being its container classes (e.g. std::vector and std::string) and, since C++11, smart pointers, which allow you to handle data without having to manually handle memory as well. However, being an evolution of C rather than a completely new language, it still provides the manual memory management mechanics of C, so if you insist on shooting yourself in the foot, C++ does nothing to keep you from it.
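A short sketch of that contrast (the names are purely illustrative): the containers and the smart pointer manage their own memory, while the inherited C-style path at the bottom is still available and still unchecked.

```cpp
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

int main() {
    // The "modern" path: each object owns its memory, grows as needed,
    // and frees itself when it goes out of scope.
    std::string name = "heartbeat";
    name += " payload";                         // grows automatically

    std::vector<char> buf(85);
    buf.resize(120);                            // explicit, managed growth

    auto owned = std::make_unique<char[]>(85);  // freed automatically
    owned[0] = '\0';

    // The inherited C path is still right there, foot-gun included:
    char *raw = static_cast<char *>(std::malloc(85));
    std::memcpy(raw, name.data(), name.size()); // nothing checks the 85
    std::free(raw);                             // and you free it yourself
    return 0;
}
```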
So why are things like SSL, bind, or OS kernels still written in these languages?
Because these languages can modify memory directly, which makes them uniquely suited for a certain type of high-performance, low-level application (like encryption, DNS table lookups, hardware drivers... or Java VMs, for that matter ;-) ).
So, if a piece of security-relevant software is breached, the chances of it being written in C or C++ are high, simply because most security-relevant software is written in C or C++, usually for historical and/or performance reasons. And if it's written in C/C++, the primary attack vector is the buffer overrun.
If it were a different language, it would be a different attack vector, but I am sure there would be security breaches just as well.
Exploiting C/C++ software is easier than exploiting, say, Java software. The same way that exploiting a Windows system is easier than exploiting a Linux system: The former is ubiquitous, well understood (i.e. well-known attack vectors, how to find and how to exploit them), and a lot of people are looking for exploits where the reward / effort ratio is high.
That does not mean the latter is inherently safe (safer, perhaps, but not safe). It means that -- being the harder target with lower benefits -- the Bad Boys aren't wasting as much time on it, yet.
Actually, "heartbleed" was not really a buffer overflow. To make things more "efficient", they put many smaller buffers into one big buffer. The big buffer contained data from various clients. The bug read bytes that it wasn't supposed to read, but it didn't actually read data outside that big buffer. A language that checked for buffer overflows wouldn't have prevented this, because someone went out of their way or prevent any such checks from finding the problem.