What does it mean to have a "file name with NULL bytes in serialized instances"?

A null byte is a byte with the value zero, i.e. 0x00 in hex.

There have been security vulnerabilities related to null bytes. These occur because C uses a null byte as a string terminator, while many other languages (Java, PHP, etc.) do not: they store the length of every string separately, so a null byte can sit in the middle of a string like any other character.

Now, consider a Java web application that accepts file uploads. Perhaps we want to let users upload .jpg files, but nothing else. If a user manages to upload a .jsp file instead, that is a serious security vulnerability, because the server may then execute it as code.

An attacker might try to upload a file named hack.jsp<NUL>.jpg. Let's think about how this will be processed. First, the Java code looks at the file name, sees that it ends in .jpg, and allows the upload. It then calls the operating system library, which is written in C. The C code sees the <NUL> character as the string terminator, so it saves the file as hack.jsp.
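To make the mismatch concrete, here is a minimal sketch in Python (chosen only for brevity; the scenario above is Java). It uses `ctypes` to look at the name the way C code would. The variable names are made up for illustration, and the extension check is a stand-in for whatever validation the application really performs.

```python
import ctypes

# The file name the attacker submits: a NUL byte hidden before ".jpg".
uploaded_name = "hack.jsp\x00.jpg"

# A length-based string check sees the whole name, so it passes.
passes_extension_check = uploaded_name.lower().endswith(".jpg")

# A C char buffer holds the same bytes, but reading it as a C string
# (the .value attribute) stops at the first NUL byte.
c_buffer = ctypes.create_string_buffer(uploaded_name.encode("ascii"))
name_seen_by_c = c_buffer.value.decode("ascii")

print(passes_extension_check)  # True
print(name_seen_by_c)          # hack.jsp
```

The check and the C-level view disagree about what the name is, which is exactly the gap the attack relies on.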

Many languages fix this by explicitly disallowing <NUL> bytes in file names; I know Python and PHP do this. However, if your language does not do this for you, you must do it yourself. More information: OWASP: Null-Byte Injection
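As a rough sketch of doing the check yourself, the function below rejects any name containing a NUL byte before it looks at the extension. The function name `is_safe_upload_name` and the allowed-extension tuple are hypothetical, and a real upload handler would need further checks (path separators, content type, and so on). The last few lines also show Python's built-in behaviour: `open` refuses such a name with a `ValueError` before touching the file system.

```python
def is_safe_upload_name(name: str, allowed_extensions=(".jpg",)) -> bool:
    """Hypothetical helper: reject NUL bytes, then check the extension."""
    if "\x00" in name:
        return False
    return name.lower().endswith(tuple(allowed_extensions))


print(is_safe_upload_name("photo.jpg"))            # True
print(is_safe_upload_name("hack.jsp\x00.jpg"))     # False

# Python itself refuses file names with an embedded NUL byte.
rejected_by_python = False
try:
    open("hack.jsp\x00.jpg", "rb")
except ValueError:
    rejected_by_python = True
print(rejected_by_python)                          # True
```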

I don't know how exactly "serialised instances" is related to this, but I think this gives you some idea what's going on.


Every character has a numeric value as dictated by the corresponding character set. So, for example, in ASCII and the character sets built on it, A is 65 and a is 97. The value 0 is special: it is the "null" character (NUL), which does not stand for any printable character at all.

In C and C++, this "null" character is used to mean the end of a string. So "HELLO" is stored like this:

 H   E   L   L   O  
72  69  76  76  79  00 

The 00 on the end says "stop here". But not all frameworks use C-strings and their "null terminator" format; in fact most do not. Instead, the length of the string is stored up front and the actual contents can contain anything, including these null characters -- characters whose numeric value is 0.
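The two storage conventions can be compared with a small sketch, again in Python for brevity. Python strings are length-based, so they stand in here for the "length stored up front" style; the C-style view is imitated by cutting the bytes at the first zero. All variable names are illustrative only.

```python
# "HELLO" as a C string: the five letters followed by a zero byte.
c_style_bytes = b"HELLO" + b"\x00"
byte_values = list(c_style_bytes)
print(byte_values)            # [72, 69, 76, 76, 79, 0]

# A length-based string may contain a NUL in the middle.
name_with_nul = "HEL\x00LO"
print(len(name_with_nul))     # 6, the NUL counts as a character

# Code that follows the C convention stops reading at the first zero.
c_style_view = name_with_nul.encode("ascii").split(b"\x00", 1)[0]
print(c_style_view)           # b'HEL'
```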

So in this instance, if the name contains one of these null characters, most of the code won't mind; null bytes are allowed. But when the name reaches the FileUpload code, that null causes confusion. Some of the code thinks that the name ends at the null byte, while other bits think that the null byte is just another letter. That confusion, where two bits of code don't agree on what the name actually is, then leads to a security vulnerability.

Tags:

File Upload