[Crypto] Is there an encryption scheme that is url-safe AND compact?
Solution 1:
TL;DR: You should use format-preserving encryption (FPE), which is designed to solve exactly this problem.
There are various solutions, some of which are easier to implement than others. I'll estimate their sizes based on your example token and URL.
Some less-than-perfect solutions
As suggested in the comments, a bitwise stream cipher (such as ChaCha) or a block cipher in a streaming mode (such as AES-CTR), followed by Base64 encoding and making the result URL-safe, will only make your string about 42% longer, plus any IV overhead. You may or may not want to worry about the user's ability to modify the data: these ciphers XOR a keystream into the plaintext, so flipping a bit in the ciphertext flips the same bit in the decrypted plaintext. A user who can identify their token data could therefore forge tokens of the same or shorter length. Estimated token length with a 128-bit IV: 61 characters. Estimated URL length: 92 characters.
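To make the malleability point concrete, here is a minimal Python sketch. It assumes a toy keystream built from HMAC-SHA256 in counter mode as a stand-in for ChaCha or AES-CTR (neither is in Python's standard library), a hypothetical plaintext token `user|1234|A`, and unpadded base64url encoding. It is illustrative only, not a vetted cipher.

```python
import base64
import hashlib
import hmac
import secrets


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Toy keystream: HMAC-SHA256(key, iv || counter), concatenated.
    # Stands in for ChaCha20 or AES-CTR; not a vetted cipher.
    blocks = (
        hmac.new(key, iv + i.to_bytes(4, "big"), hashlib.sha256).digest()
        for i in range(-(-length // 32))
    )
    return b"".join(blocks)[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def b64url(data: bytes) -> str:
    # base64url with the '=' padding stripped, so nothing needs percent-encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encrypt(key: bytes, plaintext: bytes) -> str:
    iv = secrets.token_bytes(16)  # 128-bit IV, sent in front of the ciphertext
    return b64url(iv + xor(plaintext, keystream(key, iv, len(plaintext))))


def decrypt(key: bytes, token: str) -> bytes:
    raw = b64url_decode(token)
    iv, ct = raw[:16], raw[16:]
    return xor(ct, keystream(key, iv, len(ct)))


key = secrets.token_bytes(32)
token = encrypt(key, b"user|1234|A")  # hypothetical token contents

# Forgery without the key: XOR the known-plaintext difference into the ciphertext.
raw = bytearray(b64url_decode(token))
raw[16:] = xor(raw[16:], xor(b"user|1234|A", b"user|9999|Z"))
forged = b64url(bytes(raw))
print(decrypt(key, forged))  # b'user|9999|Z'
```

The forger never needs the key; knowing which plaintext sits at which position is enough.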
Adding a MAC defends against token manipulation but adds another 21-22 characters to both lengths. Modern authenticated encryption modes (such as AES-GCM or ChaCha20-Poly1305) compute the authentication tag for you.
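A minimal encrypt-then-MAC sketch in Python, assuming a toy keystream built from HMAC-SHA256 in counter mode (standing in for a real stream cipher), a separate MAC key, and an HMAC-SHA256 tag truncated to 128 bits. The names `seal` and `open_token` and the token contents are hypothetical; in practice a modern authenticated encryption mode handles all of this for you.

```python
import base64
import hashlib
import hmac
import secrets

IV_LEN = 16   # 128-bit IV
TAG_LEN = 16  # HMAC-SHA256 truncated to 128 bits


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Toy HMAC-SHA256 counter-mode keystream (stand-in for a real stream cipher)
    blocks = (
        hmac.new(key, iv + i.to_bytes(4, "big"), hashlib.sha256).digest()
        for i in range(-(-length // 32))
    )
    return b"".join(blocks)[:length]


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def tag_for(mac_key: bytes, iv: bytes, ct: bytes) -> bytes:
    # Encrypt-then-MAC: the tag covers both the IV and the ciphertext
    return hmac.new(mac_key, iv + ct, hashlib.sha256).digest()[:TAG_LEN]


def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> str:
    iv = secrets.token_bytes(IV_LEN)
    ks = keystream(enc_key, iv, len(plaintext))
    ct = bytes(p ^ k for p, k in zip(plaintext, ks))
    return b64url(iv + ct + tag_for(mac_key, iv, ct))


def open_token(enc_key: bytes, mac_key: bytes, token: str) -> bytes:
    raw = b64url_decode(token)
    if len(raw) < IV_LEN + TAG_LEN:
        raise ValueError("token too short")
    iv, ct, tag = raw[:IV_LEN], raw[IV_LEN:-TAG_LEN], raw[-TAG_LEN:]
    # Constant-time comparison; reject before decrypting anything
    if not hmac.compare_digest(tag, tag_for(mac_key, iv, ct)):
        raise ValueError("invalid tag")
    ks = keystream(enc_key, iv, len(ct))
    return bytes(c ^ k for c, k in zip(ct, ks))


enc_key, mac_key = secrets.token_bytes(32), secrets.token_bytes(32)
token = seal(enc_key, mac_key, b"user|1234|A")  # hypothetical token contents

tampered = bytearray(b64url_decode(token))
tampered[IV_LEN] ^= 0x01  # flip one ciphertext bit
try:
    open_token(enc_key, mac_key, b64url(bytes(tampered)))
    rejected = False
except ValueError:
    rejected = True
print(len(token), rejected)  # 58 True
```

For this 11-byte payload the tag grows the token from 36 characters (16 + 11 bytes) to 58 (16 + 11 + 16 bytes), and any bit flip is now rejected.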
You can shave off some characters with shorter IVs and MACs, but I wouldn't recommend going below 64 bits; each 64-bit IV or MAC saves 10-11 characters compared with a 128-bit one. Estimated token length with a 64-bit IV: 50 characters (61 with a 64-bit MAC). Estimated URL length: 81 characters (92 with a 64-bit MAC).
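The 10-11 and 21-22 character figures follow from Base64 packing 3 bytes into 4 characters: how many characters an 8- or 16-byte field adds depends on how the total byte count lines up with 3-byte groups. A small Python check, assuming unpadded base64url:

```python
import base64


def b64url_len(n_bytes: int) -> int:
    # Length of unpadded base64url output for n_bytes of input
    return len(base64.urlsafe_b64encode(bytes(n_bytes)).rstrip(b"="))


def extra_chars(payload_bytes: int, field_bytes: int) -> int:
    # Characters added to a token by appending a field (IV or MAC tag) of field_bytes
    return b64url_len(payload_bytes + field_bytes) - b64url_len(payload_bytes)


for bits in (128, 64):
    costs = sorted({extra_chars(n, bits // 8) for n in range(1, 100)})
    print(bits, costs)
# 128 [21, 22]
# 64 [10, 11]
```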
If we assume that your tokens use a character set of size 65, {A-Za-z0-9@|.}, you could index the set and use a mod-65 stream cipher, which adds keystream values to the character indices modulo 65 and so preserves the character set. Again, IVs and MACs would be advisable, but they could also be expressed in the character set. There aren't any off-the-shelf mod-65 stream ciphers, but one could be constructed based on established stream cipher designs.
Estimated token length: 40 characters (52 with short MAC)
Estimated URL length: 71 characters (83 with short MAC)
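A minimal Python sketch of the mod-65 idea, assuming the 65-character set above, a random 11-character IV drawn from the same set (65^11 is about 2^66), and a toy keystream from HMAC-SHA256 in counter mode with rejection sampling so every digit in [0, 65) is equally likely. It has no MAC, so it is just as malleable as any other stream cipher; the token `alice@example.com|1234|A` is hypothetical.

```python
import hashlib
import hmac
import secrets

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@|."
RADIX = len(ALPHABET)  # 65
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
IV_CHARS = 11  # IV expressed in the same character set


def digit_stream(key: bytes, iv: str, count: int) -> list[int]:
    # Toy keystream of uniform digits in [0, 65): HMAC-SHA256 in counter mode.
    # Bytes >= 195 (the largest multiple of 65 below 256) are discarded to avoid bias.
    limit = 256 - 256 % RADIX
    digits, counter = [], 0
    while len(digits) < count:
        block = hmac.new(key, iv.encode() + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        digits.extend(b % RADIX for b in block if b < limit)
        counter += 1
    return digits[:count]


def encrypt(key: bytes, token: str) -> str:
    iv = "".join(secrets.choice(ALPHABET) for _ in range(IV_CHARS))
    ks = digit_stream(key, iv, len(token))
    body = "".join(ALPHABET[(INDEX[ch] + k) % RADIX] for ch, k in zip(token, ks))
    return iv + body


def decrypt(key: bytes, ciphertext: str) -> str:
    iv, body = ciphertext[:IV_CHARS], ciphertext[IV_CHARS:]
    ks = digit_stream(key, iv, len(body))
    return "".join(ALPHABET[(INDEX[ch] - k) % RADIX] for ch, k in zip(body, ks))


key = secrets.token_bytes(32)
ct = encrypt(key, "alice@example.com|1234|A")  # hypothetical token
print(len(ct), set(ct) <= set(ALPHABET))  # 35 True
print(decrypt(key, ct))                   # alice@example.com|1234|A
```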
My laptop browser has a 65-character address bar, so I prefer URLs with fewer than 64 characters, to inhibit malicious URL suffixes such as @http://evil.com/malware.exe. None of these solutions achieves this for your example.
The designed and standardised solution
You can create URL-friendly tokens and then use format-preserving encryption, which adds zero overhead: the ciphertext has the same length and character set as the plaintext. NIST have a standard for FPE, and googling for "FF3-1 open source" should find an implementation in your language of choice.
Estimated token length: 27 characters
Estimated URL length: 58 characters
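To show what zero overhead means, here is a toy format-preserving cipher in Python: an FF1-style Feistel network over the 64-character base64url alphabet, with HMAC-SHA256 as the round function. This is not FF1 or FF3-1 (both are built on AES, which isn't in Python's standard library) and has not been analysed; for real use, pick a maintained FF1/FF3-1 implementation. The token and tweak values are hypothetical.

```python
import hashlib
import hmac
import secrets

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
RADIX = len(ALPHABET)  # 64, the base64url alphabet
ROUNDS = 10


def to_num(s: str) -> int:
    # Interpret s as a big-endian number in base RADIX
    n = 0
    for ch in s:
        n = n * RADIX + ALPHABET.index(ch)
    return n


def to_str(n: int, length: int) -> str:
    # Inverse of to_num, left-padded to exactly `length` digits
    digits = []
    for _ in range(length):
        n, d = divmod(n, RADIX)
        digits.append(ALPHABET[d])
    return "".join(reversed(digits))


def round_value(key: bytes, tweak: bytes, i: int, half: str, n: int) -> int:
    # Round function: HMAC-SHA256 over round index, total length, tweak and one half.
    # 256 bits comfortably exceed RADIX**m for the short halves used here.
    msg = bytes([i]) + n.to_bytes(4, "big") + len(tweak).to_bytes(4, "big") + tweak + half.encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def encrypt(key: bytes, tweak: bytes, text: str) -> str:
    n = len(text)
    if n < 2:
        raise ValueError("need at least 2 characters")
    u = n // 2
    a, b = text[:u], text[u:]
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else n - u
        c = (to_num(a) + round_value(key, tweak, i, b, n)) % RADIX ** m
        a, b = b, to_str(c, m)
    return a + b


def decrypt(key: bytes, tweak: bytes, text: str) -> str:
    n = len(text)
    if n < 2:
        raise ValueError("need at least 2 characters")
    u = n // 2
    a, b = text[:u], text[u:]
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else n - u
        c = (to_num(b) - round_value(key, tweak, i, a, n)) % RADIX ** m
        a, b = to_str(c, m), a
    return a + b


key = secrets.token_bytes(32)
tweak = b"v1"                        # hypothetical tweak
token = "alice-example-com_1234_A"   # hypothetical URL-friendly token
ct = encrypt(key, tweak, token)
print(len(ct) == len(token), set(ct) <= set(ALPHABET))  # True True
print(decrypt(key, tweak, ct) == token)                 # True
```

Because nothing random is added, the same token and tweak always encrypt to the same ciphertext; that determinism is what buys the zero overhead.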
A 58-character URL is still longer than the address bar on my phone's browser, but this is still pretty good.
Solution 2:
The encryption and the encoding really are orthogonal problems, with orthogonal solutions.
First: make sure your plaintext strings (e.g. an email address followed by |1234|A) are represented in binary with a reasonably compact character encoding. UTF-8 is likely just fine; just be sure you're not, for example, encrypting the internal UTF-16 representation that some languages (e.g. Java) use.
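A quick Python illustration of how much the choice of character encoding matters for an ASCII-only token, using a hypothetical `alice@example.com|1234|A` and UTF-16LE as a stand-in for the UTF-16 code units that Java's `char` type uses:

```python
import base64

token = "alice@example.com|1234|A"   # hypothetical plaintext
utf8 = token.encode("utf-8")
utf16 = token.encode("utf-16-le")    # 2 bytes per character, no byte-order mark


def b64url_len(data: bytes) -> int:
    # Unpadded base64url length of the encoded bytes
    return len(base64.urlsafe_b64encode(data).rstrip(b"="))


print(len(utf8), len(utf16))                 # 24 48
print(b64url_len(utf8), b64url_len(utf16))   # 32 64
```

Encrypting the UTF-16 form would double the token before any IV or tag is even added.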
Second: many ciphers exhibit ciphertext expansion, meaning the ciphertext can be longer than the plaintext. You almost certainly don't want to eliminate expansion completely, because you almost certainly want (a) non-deterministic encryption (encrypting the same message twice with the same key should produce different ciphertexts, which usually requires an IV or nonce to be sent along with the ciphertext) and (b) authenticated encryption, which generally requires some fixed expansion. But since you want to minimize the size of short messages, choose a cipher whose expansion is no more onerous than it needs to be.
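As a worked example of that fixed expansion, assume an authenticated encryption mode with a 96-bit nonce N and a 128-bit tag T (the standard sizes for AES-GCM and ChaCha20-Poly1305), a hypothetical 24-byte plaintext p, and unpadded base64url output, where \ell(n) is the number of characters for n bytes:

```latex
\begin{aligned}
|c| &= |N| + |p| + |T| = 12 + 24 + 16 = 52 \text{ bytes} \\
\ell(n) &= \left\lceil \tfrac{4n}{3} \right\rceil \\
\ell(|p|) &= \ell(24) = 32, \qquad \ell(|c|) = \ell(52) = \left\lceil 69.\overline{3} \right\rceil = 70
\end{aligned}
```

So the nonce and tag add a fixed 28 bytes, turning a 32-character payload into a 70-character token.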
(Advanced topic I don't want to get into: it may be feasible to reduce the fixed overhead of authenticated encryption by using short tags. It's advanced because the security analysis can get a bit complicated.)
Third: instead of percent-encoding regular Base64 text, use one of the special URL-compatible variants of Base64:
Using standard Base64 in URL requires encoding of '+', '/' and '=' characters into special percent-encoded hexadecimal sequences ('+' becomes '%2B', '/' becomes '%2F' and '=' becomes '%3D'), which makes the string unnecessarily longer. For this reason, modified Base64 for URL variants exist (such as base64url in RFC 4648), where the '+' and '/' characters of standard Base64 are respectively replaced by '-' and '_', so that using URL encoders/decoders is no longer necessary and has no impact on the length of the encoded value, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general. Some variants allow or require omitting the padding '=' signs to avoid them being confused with field separators, or require that any such padding be percent-encoded. Some libraries will encode '=' to '.', potentially exposing applications to relative path attacks when a folder name is encoded from user data.
The neat thing about this is that you can implement these with textual substitution on regular Base64 strings.
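A minimal Python sketch of that substitution, checked against the standard library's `base64.urlsafe_b64encode`, which performs the same '+' to '-' and '/' to '_' translation. Stripping the '=' padding is optional here and is undone on decode; the function names are hypothetical.

```python
import base64


def to_base64url(data: bytes, pad: bool = False) -> str:
    # Regular Base64, then the textual substitutions '+' -> '-' and '/' -> '_'
    text = base64.b64encode(data).decode("ascii")
    text = text.replace("+", "-").replace("/", "_")
    return text if pad else text.rstrip("=")


def from_base64url(text: str) -> bytes:
    # Reverse the substitutions and restore any '=' padding that was stripped
    text = text.replace("-", "+").replace("_", "/")
    return base64.b64decode(text + "=" * (-len(text) % 4))


data = bytes([0xFB, 0xFF, 0xBF, 0xFE])    # "+/+//g==" in standard Base64
print(to_base64url(data))                  # -_-__g
print(from_base64url("-_-__g") == data)    # True
```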