Is it ok to remove the equal signs from a base64 string?

Looking at your code:

>>> base64.b64encode(combined.digest(), altchars="AB")
'PeFC3irNFx8fuzwjAzAfEAup9cz6xujsf2gAIH2GdUM='

The string that's being encoded in base64 is the result of a function called digest(). If your digest function is producing fixed length values (e.g. if it's calculating MD5 or SHA1 digests), then the parameter to b64encode will always be the same length.

If the above is true, then you can strip off the trailing equals signs, because there will always be the same number of them. If you do that, simply append the same number of equals signs to the string before you decode.

If the digest is not a fixed length, then it's not safe to trim the equals signs.

Edit: Looks like you might be using a SHA-256 digest? The SHA-256 digest is 256 bits (or 32 bytes). 32 bytes is 10 groups of 3, plus two left over. As the Wikipedia section on padding explains, that means you always have exactly one trailing equals sign. If it is SHA-256, then it'd be OK to strip it, so long as you remember to add it again before decoding.
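
To make the SHA-256 case concrete, here is a minimal sketch, assuming Python 3 and the standard hashlib module (the input bytes are a made-up placeholder, not the asker's data). It strips the single trailing equals sign and puts it back before decoding:

```python
import base64
import hashlib

# Hypothetical input; any bytes produce a 32-byte SHA-256 digest.
digest = hashlib.sha256(b"example input").digest()

encoded = base64.b64encode(digest)   # 44 characters, ending in exactly one '='
stripped = encoded.rstrip(b"=")      # 43 characters, padding removed

# The digest length is fixed, so exactly one '=' has to be restored.
restored = base64.b64decode(stripped + b"=")
```

Because the digest length never changes, the amount of padding to re-append is a constant and does not need to be computed.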


Every 3 bytes you need to encode as Base64 are converted to 4 ASCII characters, and the '=' character is used to pad the result so that there is always a multiple of 4 encoded characters. If you have an exact multiple of 3 bytes then you will get no equals sign. One spare byte means you get two '=' characters at the end. Two spare bytes means you get one '=' character at the end. Depending on how you decode the string, the decoder may or may not accept it as valid without the padding. With the example string you have, it doesn't decode, but some simple strings I've tried do decode.
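
The padding rule above can be checked with a short sketch, assuming Python 3's standard base64 module (padding_count is a hypothetical helper name used only for illustration). It also shows that Python's own decoder is one of the strict ones and rejects a string whose padding has been removed:

```python
import base64
import binascii

def padding_count(num_bytes):
    """Count the '=' characters b64encode emits for num_bytes of input."""
    encoded = base64.b64encode(b"\x00" * num_bytes)
    return len(encoded) - len(encoded.rstrip(b"="))

# 3 bytes: no padding; 4 bytes (one spare): two '='; 5 bytes (two spare): one '='
counts = [padding_count(n) for n in (3, 4, 5, 6)]

# b"AAE=" is the encoding of b"\x00\x01"; without the '=' decoding fails here.
try:
    base64.b64decode(b"AAE")
    strict = False
except binascii.Error:
    strict = True
```

Other decoders may be more lenient, which would explain why some stripped strings decode in one tool and fail in another.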

You can read this page for a better understanding of base64 strings and encoding/decoding.

http://www.nczonline.net/blog/2009/12/08/computer-science-in-javascript-base64-encoding/

There are free online encoders/decoders that you can use to check your output string.


It's fine to remove the equals signs, as long as you know what they do.

Base64 outputs 4 characters for every 3 bytes it encodes (in other words, each character encodes 6 bits). The padding characters are added so that any base64 string is always a multiple of 4 in length; the padding chars don't actually encode any data. (I can't say for sure why this was done: as a way of error checking if a string was truncated, to ease decoding, or something else?)

In any case, that means if you have x base64 characters (sans padding), there will be (-x)%4 padding characters: none when x is a multiple of 4, otherwise 4-(x%4). (Though x%4=1 will never happen due to the factorization of 6 and 8.) Since these contain no actual data, and can be recovered, I frequently strip these off when I want to save space, e.g. the following::

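from base64 import b64encode, b64decode

# encode data and strip the padding
# (b64encode returns bytes in Python 3, so strip b"=" rather than "=")
raw = b'\x00\x01'
enc = b64encode(raw).rstrip(b"=")

# func to restore padding: -len(data) % 4 is the number of '=' removed
def repad(data):
    return data + b"=" * (-len(data) % 4)

raw = b64decode(repad(enc))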
