Last active
July 16, 2024 20:44
base64 that actually encodes URL safe (no '=' nonsense)
"""
base64's `urlsafe_b64encode` uses '=' as padding.
These are not URL safe when used in URL parameters.

Functions below work around this to strip/add back in padding.

See:
https://docs.python.org/2/library/base64.html
https://mail.python.org/pipermail/python-bugs-list/2007-February/037195.html
"""
import base64


def base64_encode(string):
    """
    Removes any `=` used as padding from the encoded string.
    """
    encoded = base64.urlsafe_b64encode(string)
    # A bytes literal works on str in Python 2 and on bytes in Python 3.
    return encoded.rstrip(b"=")


def base64_decode(string):
    """
    Adds back in the required padding before decoding.
    """
    # -len % 4 is the number of '=' that were stripped; it is 0 (not 4)
    # when the length is already a multiple of 4.
    padding = -len(string) % 4
    string = string + (b"=" * padding)
    return base64.urlsafe_b64decode(string)
>>> test = "helloworld"
>>> base64_encode(test)
'aGVsbG93b3JsZA'
>>> e = base64_encode(test)
>>> base64_decode(e)
'helloworld'
>>> test = "Hello World"
>>> encoded = base64_encode(test)
>>> print encoded
SGVsbG8gV29ybGQ
>>> decoded = base64_decode(encoded)
>>> decoded
'Hello World'
>>> decoded == test
True
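The session above is Python 2. As a minimal sketch of the same round trip on Python 3, assuming inputs are `bytes`, the check below redefines the two helpers so it is self-contained and runs them over every possible amount of padding:

```python
import base64


def base64_encode(data: bytes) -> bytes:
    """URL-safe base64 with the trailing '=' padding stripped."""
    return base64.urlsafe_b64encode(data).rstrip(b"=")


def base64_decode(data: bytes) -> bytes:
    """Restore the stripped padding, then decode."""
    # -len(data) % 4 is the number of '=' characters that were stripped.
    return base64.urlsafe_b64decode(data + b"=" * (-len(data) % 4))


# Inputs of 0..8 bytes cover every possible amount of padding.
for n in range(9):
    original = bytes(range(n))
    encoded = base64_encode(original)
    assert b"=" not in encoded
    assert base64_decode(encoded) == original

print(base64_encode(b"helloworld"))  # b'aGVsbG93b3JsZA'
```

On Python 3 the results are `bytes`; call `.decode("ascii")` on the encoded value if a `str` is needed for building a URL.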
Here are 2 one-liners for encoding and decoding:
(lambda string: urlsafe_b64encode(string).rstrip(b"="))(b"this will be converted into base64!")
(lambda string: urlsafe_b64decode(string + b"=" * (-len(string) % 4)))(b"dGhpcyB3aWxsIGJlIGNvbnZlcnRlZCBpbnRvIGJhc2U2NCE")
Examples:
Encoding:
>>> (lambda string: urlsafe_b64encode(string).rstrip(b"="))(b"this will be converted into base64!")
b'dGhpcyB3aWxsIGJlIGNvbnZlcnRlZCBpbnRvIGJhc2U2NCE'
Decoding:
>>> (lambda string: urlsafe_b64decode(string + b"=" * (-len(string) % 4)))(b'dGhpcyB3aWxsIGJlIGNvbnZlcnRlZCBpbnRvIGJhc2U2NCE')
b'this will be converted into base64!'
Note: this requires base64's urlsafe_b64encode and urlsafe_b64decode to be imported like this:
from base64 import urlsafe_b64encode, urlsafe_b64decode
This can be easily changed, as all you need to do is change the calls to the functions (right after "(lambda string:").
While this is a nice workaround, I don't believe it's the "correct" way to do it anymore. Padding isn't the only thing that can break base64 in a URL: the standard alphabet is 0-9, a-z, A-Z, +, and /. + is a special URL character (it means a space in a query string), and / is of course another. With plain b64encode you'd have to take care of those yourself via string replacement (urlsafe_b64encode already swaps them for - and _). I don't recommend that route, as the relevant standards may change at some point. Better to let the libraries handle all of this for you. There are 3 things that might help and be safer.
(1) Instead of stripping the padding you can use the following functions:
(2) These functions aren't really necessary. The correct way to do this is to avoid building your URL from raw base64-encoded values and string concatenation. Instead, do something like:
requests.get("http://example.com", params={"base64_encoded_param": base_64_param})
Requests, along with any other decent URL library, will then percent-encode the parameters for you. No need to keep track of what needs to be percent-encoded.
(3) Use base58 encoding. The alphabet is the digits 1-9 plus the upper- and lowercase letters, excluding zero, uppercase 'O', uppercase 'I', and lowercase 'l'. In other words, no padding and no funny chars, just alphanumerics. It requires an additional library, as I don't think there's a standard library module yet, but I usually find it well worth it to deal only with ASCII alphanumerics and no special characters.
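Point (2) can be sketched without any network call by using the standard library's `urllib.parse.urlencode`, which applies the same kind of percent-encoding. This is only an illustration under assumptions: the sample bytes are chosen so that standard base64 emits `+`, `/`, and `=` padding, and the parameter name is the one from the comment above.

```python
import base64
from urllib.parse import parse_qs, urlencode

# Bytes chosen so that standard base64 emits '+', '/' and '=' padding.
raw = bytes([0xFB, 0xFF, 0xFE, 0xFF])
token = base64.b64encode(raw).decode("ascii")

# urlencode percent-encodes every character that is unsafe in a query string.
query = urlencode({"base64_encoded_param": token})

# parse_qs reverses the percent-encoding on the receiving side.
received = parse_qs(query)["base64_encoded_param"][0]
assert base64.b64decode(received) == raw

print(token)
print(query)
```

The trade-off is a longer URL, since each escaped character takes three bytes, in exchange for not having to strip or restore anything by hand.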