Are there any sequences other than ../ which will be interpreted as directory traversal in *nix or Windows?
Using Unicode, it's possible to encode \ and / as multi-byte sequences. If the string comparison functions are not Unicode-aware, a bug could let these characters through.
Wikipedia has a section on this in relation to an old attack on Windows servers:
When Microsoft added Unicode support to their Web server, a new way of encoding ../ was introduced into their code, causing their attempts at directory traversal prevention to be circumvented. Multiple percent encodings, such as %c1%1c and %c0%af, translated into / or \ characters.
Technically, the traversal still uses the slash character; it just isn't encoded as the usual single byte, which may confuse some code.
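To make the encoding trick concrete, here is a rough sketch of how a careless two-byte decoder could turn those byte pairs into path separators. The function names are made up for illustration, and this is not a reconstruction of the actual IIS code, only the bit arithmetic that such a decoder would perform.

```python
from typing import Optional


def lenient_decode_2byte(b0: int, b1: int) -> str:
    """Decode a two-byte sequence the way a careless decoder might.

    It keeps the low 5 bits of the first byte and the low 6 bits of the
    second, without checking for overlong forms or for a valid
    continuation byte.
    """
    return chr(((b0 & 0x1F) << 6) | (b1 & 0x3F))


def strict_decode(raw: bytes) -> Optional[str]:
    """Decode with Python's strict UTF-8 codec; return None on invalid input."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


print(repr(lenient_decode_2byte(0xC0, 0xAF)))  # '/'
print(repr(lenient_decode_2byte(0xC1, 0x1C)))  # '\\'
print(strict_decode(b"\xc0\xaf"))              # None
```

A strict decoder refuses both sequences, which is why a filter that compares raw bytes and a file system layer that decodes leniently can disagree about whether a separator is present.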
I believe the best advice for avoiding such characters would be to disallow any characters for file system paths except a safe subset of ASCII characters. You can also sidestep other issues of allowed characters in some operating systems and file systems at the same time.
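As a minimal sketch of that allowlist idea, something like the following could work. The exact character set, the rule against a leading dot, and the 255-character limit are my own assumptions, so adjust them to your platform.

```python
import re

# Assumed policy: ASCII letters, digits, underscore, hyphen and dot only;
# the first character may not be a dot; at most 255 characters in total.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-][A-Za-z0-9._-]{0,254}")


def is_safe_filename(name: str) -> bool:
    """Return True only if the whole name matches the allowlist."""
    # fullmatch avoids the trailing-newline surprise of '$' with match().
    return SAFE_NAME.fullmatch(name) is not None


print(is_safe_filename("report-2024.txt"))  # True
print(is_safe_filename("../etc/passwd"))    # False
print(is_safe_filename(".."))               # False
```

This checks a single file name rather than a whole path, and it does not cover names that are reserved on Windows, so treat it as a starting point.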
On Windows and Unix - no. There may be obscure operating systems that use different path separators.
To handle encoding securely there is a simple rule: fully decode before doing sanitisation. If you fail to do this, your sanitisation can be circumvented. Imagine an application that does open(urldecode(normalize(path))). If the path contains ../ then normalize will remove it. But if it contains %2e%2e%2f then normalize will do nothing, and urldecode will convert that to ../. This error has led to a number of real-world vulnerabilities, including the well-known IIS Unicode bug.
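Here is a small sketch of the two orderings. BASE, resolve_wrong and resolve_right are hypothetical names, and I'm assuming POSIX-style paths and a single layer of URL encoding.

```python
import posixpath
from urllib.parse import unquote

BASE = "/var/www/files"  # hypothetical base directory


def resolve_wrong(raw: str) -> str:
    """BUG: normalise first, decode second."""
    # normpath sees no '..' segments, because they are still percent-encoded.
    cleaned = posixpath.normpath("/" + raw).lstrip("/")
    # Decoding afterwards reintroduces '../' behind the sanitiser's back.
    return posixpath.join(BASE, unquote(cleaned))


def resolve_right(raw: str) -> str:
    """Decode first, then normalise and check the result stays under BASE."""
    decoded = unquote(raw)
    candidate = posixpath.normpath(posixpath.join(BASE, decoded))
    if not candidate.startswith(BASE + "/"):
        raise ValueError("path escapes base directory")
    return candidate


attack = "%2e%2e%2f%2e%2e%2fetc/passwd"
print(resolve_wrong(attack))  # /var/www/files/../../etc/passwd
try:
    resolve_right(attack)
except ValueError as exc:
    print("rejected:", exc)
```

The important part is that the value you validate is exactly the value you later pass to open, with no further decoding in between.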
Another issue that sometimes appears is nested sequences. Suppose your normalize function does path.replaceAll('../', ''). This can be circumvented by submitting ....// because the inner ../ is removed, leaving ../ behind. The solution is either to completely reject strings that contain forbidden sequences, or to apply the normalisation function repeatedly until no forbidden sequence remains.
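The nested-sequence problem and both fixes can be sketched like this. The function names are illustrative, and the sketch only looks at forward slashes; a real filter on Windows would need to treat backslashes the same way.

```python
def strip_once(path: str) -> str:
    """Flawed: a single pass can leave a new '../' behind."""
    return path.replace("../", "")


def strip_until_stable(path: str) -> str:
    """Repeat the replacement until no '../' remains."""
    while "../" in path:
        path = path.replace("../", "")
    return path


def reject_traversal(path: str) -> str:
    """Alternative: refuse the input instead of trying to repair it."""
    if "../" in path:
        raise ValueError("forbidden sequence in path")
    return path


print(strip_once("....//etc/passwd"))          # ../etc/passwd
print(strip_until_stable("....//etc/passwd"))  # etc/passwd
```

Of the two fixes, rejecting is usually easier to reason about, since a legitimate client has no reason to send these sequences in the first place.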
There are other characters that can have surprising results in paths. A null byte is generally allowed within strings in high-level languages, but when the string is passed to the C library, the null byte is treated as a terminator. File names like evil.php%00.jpg can bypass file extension checks. There was also the IIS semicolon bug.
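To illustrate the null byte mismatch, here is a sketch in which c_view is a stand-in I made up for what a C function would see, and the allowed extension list is just an example.

```python
from urllib.parse import unquote

ALLOWED_EXTENSIONS = (".jpg", ".png")  # example policy, not a recommendation


def has_allowed_extension(name: str) -> bool:
    """Naive check that only looks at the end of the high-level string."""
    return name.lower().endswith(ALLOWED_EXTENSIONS)


def c_view(name: str) -> str:
    """Approximate what a C routine sees: everything before the first NUL."""
    return name.split("\x00", 1)[0]


def check_upload_name(name: str) -> str:
    """Reject embedded NUL bytes before doing any extension check."""
    if "\x00" in name:
        raise ValueError("null byte in file name")
    if not has_allowed_extension(name):
        raise ValueError("extension not allowed")
    return name


name = unquote("evil.php%00.jpg")
print(has_allowed_extension(name))  # True: the check is fooled
print(c_view(name))                 # evil.php
```

Some runtimes now refuse such names on their own (Python 3's open raises ValueError for an embedded null byte, for example), but I would not rely on that across every language and library.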
In general, file names are not a good place for untrusted data. There is the potential for second-order attacks, where other processes that read the directory listing have vulnerabilities. There might be cross-site scripting in a web page that lists files; there have been Windows Explorer vulnerabilities that malicious file names can exploit; and attacking a Unix shell through escape sequences is a recent concern. Instead, I recommend storing the user-supplied file name in a database, and using the record's primary key as the name of the file on disk.
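A minimal sketch of that approach, assuming SQLite and a random UUID as the primary key (an auto-increment integer would do just as well). The table and function names are made up for illustration.

```python
import sqlite3
import uuid
from typing import Optional


def create_store(conn: sqlite3.Connection) -> None:
    """Create the hypothetical uploads table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS uploads ("
        "id TEXT PRIMARY KEY, original_name TEXT NOT NULL)"
    )


def register_upload(conn: sqlite3.Connection, original_name: str) -> str:
    """Store the untrusted name as data; return the key to use on disk."""
    file_id = uuid.uuid4().hex  # 32 hex characters, nothing user-controlled
    conn.execute(
        "INSERT INTO uploads (id, original_name) VALUES (?, ?)",
        (file_id, original_name),
    )
    conn.commit()
    return file_id


def lookup_original_name(conn: sqlite3.Connection, file_id: str) -> Optional[str]:
    """Fetch the user-supplied name for display or download headers."""
    row = conn.execute(
        "SELECT original_name FROM uploads WHERE id = ?", (file_id,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_store(conn)
disk_name = register_upload(conn, "../../etc/passwd")
print(disk_name)                              # random hex string, safe as a file name
print(lookup_original_name(conn, disk_name))  # ../../etc/passwd
```

The original name still needs escaping wherever it is displayed, but it never reaches the file system, so none of the path tricks above apply to it.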