Characters allowed in GET parameter

The question asks which characters are allowed in GET parameters without encoding or escaping them.

According to RFC3986 (general URL syntax) and RFC7230, section 2.7.1 (HTTP/S URL syntax) the only characters you need to percent-encode are those outside of the query set, see the definition below.

However, there are additional specifications like HTML5, Web forms, and the obsolete Indexed search, W3C recommendation. Those documents add a special meaning to some characters notably, to symbols like = & + ;.

Other answers here suggest that most of the reserved characters should be encoded, including "/" "?". That's not correct. In fact, RFC3986, section 3.4 advises against percent-encoding "/" "?" characters.

it is sometimes better for usability to avoid percent- encoding those characters.

RFC3986 defines query component as:

query       = *( pchar / "/" / "?" )
pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~" 

A percent-encoding mechanism is used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component.

The conclusion is that XYZ part should encode:

special: # % = & ;
Space
sub-delims
out of query set: [ ]
non ASCII encodable characters

Unless special symbols = & ; are key=value separators.

Encoding other characters is allowed but not necessary.


There are reserved characters, that have a reserved meanings, those are delimiters — :/?#[]@ — and subdelimiters — !$&'()*+,;=

There is also a set of characters called unreserved characters — alphanumerics and -._~ — which are not to be encoded.

That means, that anything that doesn't belong to unreserved characters set is supposed to be %-encoded, when they do not have special meaning (e.g. when passed as a part of GET parameter).

See also RFC3986: Uniform Resource Identifier (URI): Generic Syntax


I did a test using the Chrome address bar and a $QUERY_STRING in bash, and observed the following:

~!@$%^&*()-_=+[{]}\|;:',./? and grave (backtick) are passed through as plaintext.

, ", < and > are converted to %20, %22, %3C and %3E respectively.

# is ignored, since it is used by ye olde anchor.

Personally, I'd say bite the bullet and encode with base64 :)