FILTER_FLAG_STRIP_LOW vs FILTER_FLAG_STRIP_HIGH?
FILTER_FLAG_STRIP_LOW: Remove characters with ASCII value < 32
FILTER_FLAG_STRIP_HIGH: Remove characters with ASCII value > 127
The flags are explained in a different page of the documentation.
FILTER_FLAG_STRIP_LOW strips bytes in the input that have a numerical value < 32, most notably null bytes and other control characters such as the ASCII bell. This is a good idea if you intend to pass the input to another application that uses null-terminated strings. In general, characters with a Unicode code point lower than 32 should not occur in user input, except for the newline characters 10 (LF) and 13 (CR) and the tab character 9. Note that the flag strips those as well, so multi-line input loses its line breaks.
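A minimal sketch of what this means for multi-line input. It assumes FILTER_UNSAFE_RAW as the base filter, which does no sanitizing of its own, so only the flag acts; the sample string is made up for illustration.

```php
<?php
// Sample input containing CR, LF, a tab, the ASCII bell and a null byte.
$input = "line one\r\nline\ttwo\x07\0";

// FILTER_UNSAFE_RAW leaves the string alone; FILTER_FLAG_STRIP_LOW then
// removes every byte with a value below 32.
$stripped = filter_var($input, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW);

// The space (32) survives, but the line break and the tab are gone too.
var_dump($stripped === "line onelinetwo");
```

If line breaks matter (a textarea, for example), the flag alone is too blunt and the control characters would have to be filtered more selectively.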
FILTER_FLAG_STRIP_HIGH strips bytes in the input that have a numerical value > 127. In almost every encoding, those bytes represent non-ASCII characters such as ä, ¿, 堆, etc. Passing this flag can be a band-aid for broken string encoding handling, which is a problem that can itself become a security vulnerability. However, non-ASCII characters are to be expected in virtually all user input, so stripping them will mangle legitimate data.
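To illustrate the cost, here is a small sketch that assumes UTF-8 input and again uses FILTER_UNSAFE_RAW so that only the flag acts. The helper isValidUtf8() is hypothetical, not part of the filter extension; it shows one possible alternative, namely checking the encoding rather than stripping bytes.

```php
<?php
// Sample name; the source file is assumed to be saved as UTF-8.
$name = "Zoë 堆";

// In UTF-8 every byte of a multi-byte character is > 127, so the flag
// removes "ë" and "堆" entirely instead of leaving partial characters.
$ascii = filter_var($name, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_HIGH);
var_dump($ascii === "Zo ");

// Hypothetical helper: with the /u modifier, preg_match() fails on
// strings that are not valid UTF-8, so a result of 1 means "valid".
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(isValidUtf8($name));   // well-formed UTF-8
var_dump(isValidUtf8("\x80"));  // lone continuation byte, not valid
```

Rejecting or re-encoding invalid input keeps legitimate non-ASCII text intact, whereas FILTER_FLAG_STRIP_HIGH silently discards it.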
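To summarize:
// FILTER_UNSAFE_RAW is used here so that only the flags act on the string.
// FILTER_SANITIZE_STRING is deprecated as of PHP 8.1 and performs tag
// stripping of its own, which would obscure the effect of the flags.
filter_var("\0aä\x80", FILTER_UNSAFE_RAW) === "\0aä\x80"
filter_var("\0aä\x80", FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW) === "aä\x80"
filter_var("\0aä\x80", FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_HIGH) === "\0a"
filter_var("\0aä\x80", FILTER_UNSAFE_RAW,
           FILTER_FLAG_STRIP_LOW | FILTER_FLAG_STRIP_HIGH) === "a"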