[Wordpress] Does WordPress sanitize arguments to WP_Query?
It's actually a good question. Of course user input cannot be trusted, but sanitizing the same value twice "just in case" isn't an ideal solution either.
The only real answer to this question can be given by looking at the code and following what goes on.
And after reading it for a few hours, here is what I can say about it:
WP_Query sanitizes the values, but it doesn't do it all in one place: each value is sanitized just before it is actually used. It's a very organic, on-the-go approach. Also, not all of the values are sanitized, but in those cases a prepared statement is used for the SQL query.
Let's get into the details
So, what happens when we do:
$query = new WP_Query($args);
The WP_Query class constructor checks whether $args is an empty array. If it's not, it runs its query() method, passing the $args array (which at this point is called $query).
$this->query($query);
The query() method calls the init() method, which unsets any previous values and sets new defaults. Then it runs wp_parse_args() on the $args array. This function does not sanitize anything; it serves as a bridge between the default data and the input data.
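To make the "bridge, not filter" point concrete, here is a simplified standalone sketch of the merging step. The function name parse_args_sketch() is hypothetical: the real wp_parse_args() also accepts objects and query strings, and the explicit defaults are passed here only for illustration.

```php
<?php
// Hypothetical, simplified stand-in for wp_parse_args(): it merges the input
// over the defaults and does NOT sanitize anything.
function parse_args_sketch(array $args, array $defaults = []): array
{
    // Later arrays win for string keys, so user input overrides defaults.
    return array_merge($defaults, $args);
}

$defaults = ['post_type' => 'post', 'posts_per_page' => 10];
$args     = ['post_type' => "page' OR 1=1 --"];

$merged = parse_args_sketch($args, $defaults);
// The suspicious value passes through untouched; only the default is filled in.
```

The untouched post_type value is why the later, per-key sanitization matters.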
The next call is to the get_posts() method, which is in charge of retrieving the posts based on the given query variables.
The first thing called inside the get_posts() method is the parse_query() method, which starts by calling the fill_query_vars() method (this one makes sure that a list of default keys is set; the ones that are not get set to an empty string or an empty array, depending on the key).
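A minimal sketch of that defaulting step, assuming the behaviour described above. fill_query_vars_sketch() is a hypothetical name, and the two key lists are only a small illustrative subset of the keys WordPress actually fills in.

```php
<?php
// Hypothetical sketch of fill_query_vars(): missing scalar keys become ''
// and missing array keys become []. Keys that are present are left as-is.
function fill_query_vars_sketch(array $qv): array
{
    $scalarKeys = ['p', 'page_id', 'name', 'pagename', 'title', 'paged']; // subset
    $arrayKeys  = ['post__in', 'post__not_in', 'category__in', 'tag__in']; // subset

    foreach ($scalarKeys as $key) {
        if (!isset($qv[$key])) {
            $qv[$key] = '';
        }
    }
    foreach ($arrayKeys as $key) {
        if (!isset($qv[$key])) {
            $qv[$key] = [];
        }
    }
    return $qv;
}

$qv = fill_query_vars_sketch(['p' => '5']);
// 'p' is still the raw string '5'; no sanitization has happened yet.
```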
Then, still inside the parse_query() method, the first sanitization takes place.
p is checked with is_scalar() and cleaned with intval().
absint() is also applied to the following values:
page_id
year
monthnum
day
w
paged
hour
minute
second
attachment_id
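The numeric cleanup above can be approximated outside WordPress as follows. absint_sketch() mirrors WordPress's absint(), which is defined as abs( (int) $maybeint ); sanitize_numeric_query_vars() is a hypothetical helper, and the real parse_query() code may handle edge cases (such as negative p values) differently.

```php
<?php
// Stand-in for WordPress's absint(): cast to int, then take the absolute value.
function absint_sketch($maybeint): int
{
    return abs((int) $maybeint);
}

// Hypothetical helper approximating the numeric cleanup in parse_query().
function sanitize_numeric_query_vars(array $qv): array
{
    // 'p': scalar values go through intval(), anything else becomes 0.
    $p = $qv['p'] ?? 0;
    $qv['p'] = is_scalar($p) ? intval($p) : 0;

    $absintKeys = ['page_id', 'year', 'monthnum', 'day', 'w', 'paged',
                   'hour', 'minute', 'second', 'attachment_id'];
    foreach ($absintKeys as $key) {
        if (isset($qv[$key])) {
            $qv[$key] = absint_sketch($qv[$key]);
        }
    }
    return $qv;
}

$clean = sanitize_numeric_query_vars([
    'p'       => '42 OR 1=1',
    'page_id' => '-7',
    'year'    => "2016'; DROP TABLE wp_posts; --",
]);
// Anything after the leading digits is discarded by the integer cast.
```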
Also, a preg_replace('|[^0-9]|'...) is run on m, cat and author to allow only a comma-separated list of positive or negative integers in these.
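Here is a sketch of that character filtering. It assumes a pattern that keeps digits, commas and minus signs, since that is what a comma-separated list of positive or negative integers needs in order to survive; the exact pattern may differ per key in the WordPress source. filter_id_lists() is a hypothetical name.

```php
<?php
// Hypothetical helper: strip every character except digits, commas and '-'.
// The pattern is an assumption based on the description above.
function filter_id_lists(array $qv): array
{
    foreach (['m', 'cat', 'author'] as $key) {
        if (isset($qv[$key])) {
            $qv[$key] = preg_replace('|[^0-9,-]|', '', (string) $qv[$key]);
        }
    }
    return $qv;
}

$qv = filter_id_lists([
    'm'      => '201604abc',
    'cat'    => '5,-3; DROP TABLE wp_posts',
    'author' => '12 UNION SELECT',
]);
```

Note that this strips characters rather than validating: whatever digits remain are kept, but nothing that could break out of an SQL clause does.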
For other values at this point, only the trim() function is used. This is the case for:
pagename
name
title
After this, the method starts checking what type of query we are running. Is it a search? An attachment? A page? A single post? ...
If pagename is set, get_page_by_path($qv['pagename']) is called without sanitizing the value first. But the source of that function shows that the value is passed through esc_sql() before it is used in a database request.
After that, we can see that when the keys post_type or post_status are used, they are both sanitized with sanitize_key() (only lowercase alphanumeric characters, dashes, and underscores are allowed).
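A standalone approximation of sanitize_key() shows what "only lowercase alphanumeric characters, dashes, and underscores" means in practice. sanitize_key_sketch() is a hypothetical name; the real WordPress function also runs its result through the sanitize_key filter, which is omitted here.

```php
<?php
// Approximation of sanitize_key(): non-scalars become '', everything else is
// lowercased and stripped of characters outside a-z, 0-9, '_' and '-'.
function sanitize_key_sketch($key): string
{
    if (!is_scalar($key)) {
        return '';
    }
    return preg_replace('/[^a-z0-9_\-]/', '', strtolower((string) $key));
}

$type   = sanitize_key_sketch('My_Type');
$attack = sanitize_key_sketch("Post'; DROP--");
```

Quotes, semicolons and spaces disappear, so the result can no longer terminate a quoted SQL string.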
For the taxonomy-related parameters, the parse_tax_query() method is called.
category__and, category__in, category__not_in, tag_id, tag__in, tag__not_in and tag__and are sanitized with absint().
tag_slug__in and tag_slug__and are sanitized with sanitize_title_for_query().
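For the array-valued keys, the cleanup amounts to applying absint() to every element. The sketch below assumes that pattern; sanitize_id_list() is a hypothetical helper and absint_sketch() mirrors WordPress's abs( (int) $maybeint ) definition.

```php
<?php
// Stand-in for WordPress's absint().
function absint_sketch($maybeint): int
{
    return abs((int) $maybeint);
}

// Hypothetical helper: clean every ID in an array-valued query var such as
// category__in or tag__in. The (array) cast also accepts a single scalar ID.
function sanitize_id_list($ids): array
{
    return array_map('absint_sketch', (array) $ids);
}

$ids    = sanitize_id_list(['3', '-4', '5) OR (1=1']);
$single = sanitize_id_list('7');
```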
At this point the parse_query() method is over, but we are still inside the get_posts() method.
posts_per_page is sanitized as well (it is cast to an integer).
title is used unsanitized, but inside a prepared statement. You may find this question interesting: Are prepared statements enough to prevent SQL injection?
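The reason an unsanitized title is still safe here is that the value is bound as data rather than spliced into the SQL text. The sketch below illustrates the general idea with PDO and an in-memory SQLite database as stand-ins; it assumes the pdo_sqlite extension is available and is not WordPress's $wpdb API. The posts table is hypothetical.

```php
<?php
// Illustrative only: PDO + in-memory SQLite stand in for WordPress's $wpdb.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, post_title TEXT)');
$pdo->exec("INSERT INTO posts (post_title) VALUES ('Hello world'), ('Secret draft')");

// Untrusted title, only trimmed (as parse_query() does for title).
$title = trim("  ' OR '1'='1  ");

// The placeholder keeps the quotes inside the value from altering the query.
$stmt = $pdo->prepare('SELECT id, post_title FROM posts WHERE post_title = ?');
$stmt->execute([$title]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC); // no post has that literal title

// A legitimate title still matches normally.
$stmt->execute(['Hello world']);
$legit = $stmt->fetchAll(PDO::FETCH_ASSOC);
```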
Then we have post__in and post__not_in, which are sanitized with absint().
If you keep reading the code carefully, you will see that every key is either sanitized before it touches an SQL statement or passed through a prepared statement instead.
So, to answer your original question:
Does WordPress sanitize arguments to WP_Query?
It does sanitize most of them, but not all. For example, pagename, name and title are only "cleaned" with the trim() function (which does not make a value SQL-safe!). But for the values that are not sanitized, a prepared statement is used to perform the database request.
Should you trust this?
Well, in this specific case I would prefer the possibly redundant approach: pre-sanitize everything before you throw it into the query.
As an engineering student, I too would love a solid yes or no answer. But note that the WordPress codebase has evolved organically, so it's just like nature: messy. That doesn't mean it's bad. But messy means there could be an unseen edge case where somebody sneaks in with a bomb. And you can prevent that by simply doubling your guards!
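The "double your guards" approach can be sketched as a small allow-list wrapper that sanitizes each expected key yourself before WP_Query sees it. presanitize_query_args() and its key list are hypothetical; inside WordPress the real absint() and sanitize_key() exist, so the fallback definitions below are skipped and only serve to make the sketch runnable on its own.

```php
<?php
// Fallbacks so the sketch runs outside WordPress; skipped when WP is loaded.
if (!function_exists('absint')) {
    function absint($maybeint)
    {
        return abs((int) $maybeint);
    }
}
if (!function_exists('sanitize_key')) {
    function sanitize_key($key)
    {
        return is_scalar($key)
            ? preg_replace('/[^a-z0-9_\-]/', '', strtolower((string) $key))
            : '';
    }
}

// Hypothetical helper: keep only the keys we expect and sanitize each one,
// even though WP_Query may sanitize it again later.
function presanitize_query_args(array $raw): array
{
    $sanitizers = [
        'post_type'   => 'sanitize_key',
        'post_status' => 'sanitize_key',
        'p'           => 'absint',
        'paged'       => 'absint',
    ];

    $clean = [];
    foreach ($sanitizers as $key => $sanitize) {
        if (array_key_exists($key, $raw)) {
            $clean[$key] = $sanitize($raw[$key]);
        }
    }
    return $clean; // Keys not in the allow-list are dropped entirely.
}

$args = presanitize_query_args([
    'post_type' => "Post'; --",
    'paged'     => '2abc',
    'orderby'   => 'injected',
]);
// Inside WordPress you would then run: $query = new WP_Query($args);
```

The allow-list is the main design choice: unexpected keys never reach WP_Query at all, so you don't have to reason about how each one would be handled internally.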