What characters are allowed in an OAuth2 access token?
TL;DR: There's no conflict between the standards. OAuth access tokens can generally contain any printable ASCII character, but if the access token is a Bearer token it must use "token68" syntax to be HTTP/1.1 compliant.
RFC 6749, §1.4 tells us: "An access token is a string" and "usually opaque to the client". §A.12 defines it as one or more printable ASCII characters ([ -~]+ in regex terms).
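As a rough illustration of that definition, here is a minimal Python sketch of the check. The function name is made up for this example; only the character range comes from the RFC.

```python
import re

# access-token = 1*VSCHAR, where VSCHAR = %x20-7E (RFC 6749, Appendix A.12).
# In regex terms: one or more characters from space through tilde.
ACCESS_TOKEN_RE = re.compile(r"[ -~]+")


def is_valid_access_token(token: str) -> bool:
    """Return True if the whole string consists of printable ASCII (hypothetical helper)."""
    return ACCESS_TOKEN_RE.fullmatch(token) is not None
```

Note that this accepts spaces, quotes and braces, so a string like {"a": 1} is a legal access token at this level.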
RFC 6749 defines various methods for obtaining an access token, but doesn't concern itself with how to actually use an access token, other than saying that you "present it" to a resource server, which must validate and then accept or reject it.
But RFC 6749 does require the authorization server to tell the client the token type (another string), which the client can use to determine how the access token is used.
A token type string is either an IANA-registered type name (like Bearer or mac), or a vendor URL (like http://oauth.example.org/v1), though the URL is just a conveniently namespaced identifier, and doesn't have to resolve to anything.
In most deployments, the token type will be Bearer, the semantics of which are defined in RFC 6750.
RFC 6750 defines three methods (§§2.1–2.3) of presenting a Bearer access token to the resource server. The recommended method (which resource servers must support to be standards compliant) is to send it in the HTTP Authorization header (§2.1), in which case the token must be a "b64token" ([-a-zA-Z0-9._~+/]+=* in regex terms).
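On the client side, that constraint could be enforced before building the header. This is only a sketch: bearer_header is a hypothetical helper, and a real client would normally rely on its HTTP or OAuth library instead.

```python
import re

# b64token = 1*( ALPHA / DIGIT / "-" / "." / "_" / "~" / "+" / "/" ) *"="
# (RFC 6750, section 2.1), written as a regex.
B64TOKEN_RE = re.compile(r"[-A-Za-z0-9._~+/]+=*")


def bearer_header(token: str) -> dict:
    """Build the Authorization header for a Bearer token (hypothetical helper)."""
    if B64TOKEN_RE.fullmatch(token) is None:
        # A token outside the b64token grammar cannot be sent unquoted.
        raise ValueError("access token is not a valid b64token")
    return {"Authorization": f"Bearer {token}"}
```

For example, bearer_header("tGzv3JOkF0XG5Qx2TlKWIA") returns a usable header, while a token containing a space or a double quote raises ValueError.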
This matches what the HTTP/1.1 spec calls a "token68" (RFC 7235 §2.1), and is necessary to allow the token to be used unquoted in the HTTP Authorization header. (As for why HTTP/1.1 allows those exact characters, it comes down to historical reasons related to the HTTP/1.0 and Basic authentication standards, as well as limitations in current and historical HTTP implementations. Network protocols are a messy business.)
A "b64token" (aka "token68") permits a subset of ASCII characters usually used with base64 encoding, but (despite the name) the Bearer token does not impose any base64 semantics. It's just an opaque string that the client receives from one server and passes on to another. Implementations may assign semantics to it (e.g. JWT), but that's beyond the OAuth or Bearer token standards.
RFC 6750 doesn't state that a Bearer access token must be a b64token if used with the other two (unrecommended) methods, but given that the client is supposed to be able to choose the method, it wouldn't make much sense to give it a non-b64token token.
Other OAuth token types might not rely on being passed unquoted in an HTTP header (or they might not use HTTP at all), and would thus be free to use any printable ASCII character. This might e.g. be useful for token types that are not opaque to the client; as an example, I'm currently dealing with a setup in which the access token response looks a bit like this:
{
  "access_token": "{\"endpoint\": \"srv8.example.org\", \"session_id\": \"fafc2fd\"}",
  "token_type": "http://vendor.example.org/",
  "expires_in": 3600,
  "refresh_token": "tGzv3JOkF0XG5Qx2TlKWIA"
}
Here, the access token is a JSON-encoded data structure, which the client must act upon (according to rules associated with the vendor token type) to access the protected resource.
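To make the double encoding concrete, a client for such a setup might unpack the response roughly like this. The function name and the error handling are assumptions for illustration; the actual rules are whatever the vendor token type specifies.

```python
import json

# Raw response body as shown above; the raw string keeps the backslashes intact.
RESPONSE_BODY = r'''{
  "access_token": "{\"endpoint\": \"srv8.example.org\", \"session_id\": \"fafc2fd\"}",
  "token_type": "http://vendor.example.org/",
  "expires_in": 3600,
  "refresh_token": "tGzv3JOkF0XG5Qx2TlKWIA"
}'''


def decode_vendor_token(body: str) -> dict:
    """Parse the token response, then parse the access token itself (hypothetical helper)."""
    response = json.loads(body)
    if response["token_type"] != "http://vendor.example.org/":
        raise ValueError("unexpected token type: " + response["token_type"])
    # The access token is a JSON document carried inside a JSON string.
    return json.loads(response["access_token"])


token = decode_vendor_token(RESPONSE_BODY)
```

Here token["endpoint"] is the host to contact and token["session_id"] is the session identifier. The raw access token contains braces, quotes and spaces, so it is valid under RFC 6749 but could never be sent as a Bearer token.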
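TL;DR: The Authorization header for Bearer tokens follows the usage of the Basic scheme defined in RFC 2617, so the token uses a base64-like character set (b64token), although it does not have to be actual base64.
This is stated in the following sentence from RFC 6750: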
The syntax of the "Authorization" header field for this scheme follows the usage of the Basic scheme defined in Section 2 of [RFC2617]
If you check RFC 2617, the following ABNF is what requires base64 encoding for the user credentials:
credentials = "Basic" basic-credentials
basic-credentials = base64-user-pass
But as OP has pointed out, the ABNF for Bearer credentials is defined as b64token, which allows more than base64 encoding. So in real-world implementations we see, for example, JWTs (base64url-encoded segments separated by .) used as bearer tokens. This is acceptable because it falls within the b64token ABNF.
Answers to OP's questions:
- An access token can contain any character in the %x20-7E range. There are no further restrictions; that is the definition of an access token.
- If the access token is a bearer token (token_type=bearer), then it must follow b64token, a.k.a. token68. This makes the access token suitable for the Authorization header.
- RFC 6749 defines the format of the access token. RFC 6750 defines how to use the Authorization header to transmit the access token.
b64token vs token68
There seems to be some confusion about the naming of b64token.
After some searching I came across the following IETF discussions on RFC 7235. RFC 7235 defines the current standard for HTTP authentication (which includes the Authorization header too).
According to those discussions, b64token is a specific encoding, and there were suggestions to rename b64token to token68. That change was made, so b64token essentially refers to token68.
The appendix explains token68 in the HTTP Authorization header (NOTE: these are excerpts; go to the link for the full explanation of the ABNF):
Authorization = credentials
credentials = auth-scheme [ 1*SP ( token68 / [ ( "," / auth-param ) *( OWS "," [ OWS auth-param ] ) ] ) ]
token68 = 1*( ALPHA / DIGIT / "-" / "." / "_" / "~" / "+" / "/" ) *"="
So as far as I can see, RFC 6750 has not been updated with this naming (those definitions were still in progress when it was written).
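For what it's worth, the token68 branch of that grammar can be sketched as a regex for the receiving side. This is a simplified illustration, not a full parser: it ignores the auth-param branch entirely, and parse_credentials is a name made up for this example.

```python
import re

# credentials = auth-scheme 1*SP token68   (token68 branch only)
# auth-scheme is an HTTP "token" (1*tchar); token68 is as quoted above.
CREDENTIALS_RE = re.compile(
    r"(?P<scheme>[!#$%&'*+\-.^_`|~0-9A-Za-z]+)"
    r" +"
    r"(?P<token68>[-A-Za-z0-9._~+/]+=*)"
)


def parse_credentials(value: str):
    """Split an Authorization header value into (scheme, token68), or return None."""
    match = CREDENTIALS_RE.fullmatch(value)
    if match is None:
        return None
    return match["scheme"], match["token68"]
```

A JWT-shaped value such as "Bearer aaa.bbb.ccc" parses fine, since dots are part of token68, whereas a value whose token contains quotes or braces is rejected.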