Base64 encoding and decoding with Python

Suppose you have a binary image file that you want to transfer over the network. You're surprised the other party didn't receive the file correctly - it just contained weird characters!

Hmm, it looks like you're trying to send the file in raw bits and bytes, and the medium you're using is designed for streaming text.

What is the workaround to avoid such problems? The answer is Base64 encoding. In this article, I'll show you how to encode and decode binary images using Python. The program is illustrated as a standalone local program, but you can apply the concept to different applications, such as sending encoded images from a mobile device to a server, and many others.

What is Base64?

Before diving into this article, let's define what Base64 means.

Base64 is a method of encoding 8-bit binary data into a format that can be represented with 6 bits per character. Only the characters A-Z, a-z, 0-9, + and / are used to represent data, and = is used for padding. With this encoding, for example, three octets (24 bits) are converted into four 6-bit groups.

The term Base64 is taken from the Multipurpose Internet Mail Extensions (MIME) standard, which was originally developed for encoding email attachments for transmission and is now widely used with HTTP and XML.

Why do we use Base64?

Base64 matters because it allows binary data to be represented in a way that looks and acts like plain text, which makes it more reliable to store in databases, send in emails, or embed in text-based formats such as XML. Base64 is primarily used to represent data in an ASCII string format.

As mentioned in the introduction to this article, sometimes the data will not be readable at all without Base64.
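To make this concrete, here is a minimal sketch of embedding binary data in a text-based format. JSON is used here only as an illustration (the article itself mentions XML), and the byte values and the file name example.bin are made up for the example.

```python
import base64
import json

# Hypothetical payload: a few raw bytes that are not valid text
raw = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])

# json.dumps() cannot serialize bytes directly, so we first encode them
# to Base64 and then turn the result into an ordinary ASCII string.
encoded_text = base64.b64encode(raw).decode('ascii')
message = json.dumps({'filename': 'example.bin', 'data': encoded_text})

# The receiver parses the JSON and decodes the field back to the original bytes
received = json.loads(message)
restored = base64.b64decode(received['data'])

print(restored == raw)  # True
```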

Base64 encoding

Base64 encoding is the process of converting binary data into a limited set of 64 characters. As shown in the first section, these characters are A-Z, a-z, 0-9, + and / (count them: did you notice that they add up to 64?). This character set is the most common one and is called MIME's Base64. It uses A-Z, a-z and 0-9 for the first 62 values, and + and / for the last two values.

Base64-encoded data ends up longer than the original data: as stated above, every 3 bytes of binary data become 4 bytes of Base64-encoded data. This is because each Base64 character carries only 6 bits of the original data, since we are squeezing the data into a smaller character set.
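The size growth can be checked with a short sketch. The helper name encoded_length is made up for this example, and the formula assumes the standard padded output, where every started group of 3 input bytes yields 4 output characters.

```python
import base64
import math

def encoded_length(n_bytes):
    """Length of the padded Base64 output for n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)

for n in (1, 2, 3, 4, 10, 300):
    data = bytes(n)  # n zero bytes, used only as stand-in input
    # Print the input size, the real encoded size, and the predicted size
    print(n, len(base64.b64encode(data)), encoded_length(n))
```

For 300 bytes of input, both numbers come out as 400, which is the roughly one-third overhead described above.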

Have you ever seen part of a raw email file like the one below (most likely from an email that could not be delivered)? If so, then you've seen Base64 encoding in action! (If you notice = at the end, you can tell this is Base64 encoding, because the equals sign is used for padding.)

Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: base64

2KfZhNiz2YTYp9mFINi52YTZitmD2YUg2YjYsdit2YXYqSDYp9mE2YTZhyDZiNio2LHZg9in2KrZ
h9iMDQoNCtij2YjYryDZgdmC2Lcg2KfZhNin2LPYqtmB2LPYp9ixINi52YYg2KfZhNmF2YLYsdix
2KfYqiDYp9mE2K/Ysdin2LPZitipINin2YTYqtmKINiq2YbYtdit2YjZhiDYqNmH2Kcg2YTZhdmG
INmK2LHZitivINin2YTYqtmI2LPYuSDZgdmKDQrYt9mE2Kgg2KfZhNi52YTZhSDYp9mE2LTYsdi5
2YrYjCDYudmE2YXYpyDYqNij2YbZiiDYutmK2LEg2YXYqtiu2LXYtSDYqNin2YTYudmE2YUg2KfZ
hNi02LHYudmKINmI2KPZgdiq2YLYryDZhNmE2YXZhtmH2Kwg2KfZhNi52YTZhdmKDQrZhNiw2YTZ
gy4NCg0K2KzYstin2YPZhSDYp9mE2YTZhyDYrtmK2LHYpyDYudmE2Ykg2YbYtdit2YPZhSDZgdmK
INmH2LDYpyDYp9mE2LTYo9mGLg0KDQrYudio2K/Yp9mE2LHYrdmF2YYNCg==
--089e0141aa264e929a0514593016
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: base64

Base64 encoding is performed in multiple steps, as follows:

  • The text to be encoded is converted to its decimal values, i.e. to the corresponding ASCII values (a: 97, b: 98, etc.).
  • The decimal values obtained in the previous step are converted to their binary equivalents (i.e. 97: 01100001).
  • All binary equivalents are concatenated to obtain one long string of binary digits.
  • This long string of binary digits is divided into equal parts, each containing only 6 bits.
  • Each 6-bit group is converted to its decimal equivalent.
  • Finally, each decimal equivalent is converted to its Base64 character using the Base64 alphabet (i.e. 4: E).
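The steps above can be sketched in plain Python. This is an illustration only, not how the base64 module is implemented; the names ALPHABET and manual_b64encode are made up for the example, and the zero-bit and = padding handles input whose length is not a multiple of 3 bytes.

```python
import string

# Standard Base64 alphabet: 0-25 -> A-Z, 26-51 -> a-z, 52-61 -> 0-9, 62 -> +, 63 -> /
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + '+/'

def manual_b64encode(data: bytes) -> str:
    # Steps 1-3: write every byte as 8 binary digits and concatenate them
    bits = ''.join(format(byte, '08b') for byte in data)
    # Pad with zero bits so the total length is a multiple of 6
    bits += '0' * (-len(bits) % 6)
    # Steps 4-6: split into 6-bit groups, convert each group to a number,
    # and look up the matching character in the alphabet
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    encoded = ''.join(chars)
    # Add '=' so the output length is a multiple of 4
    return encoded + '=' * (-len(encoded) % 4)

print(manual_b64encode(b'Hey'))  # SGV5
```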

Base64 decoding

Base64 decoding is the inverse of Base64 encoding. In other words, it is performed by reversing the steps described in the previous section.

So the steps of Base64 decoding can be described as follows:

  • Each character in the string is changed to its Base64 decimal value.
  • The obtained decimal values are converted to their binary equivalents.
  • The first two bits are dropped from each 8-bit binary number, and the remaining 6-bit groups are combined into one long string of binary digits.
  • The long string of binary digits obtained in the previous step is divided into groups of 8 bits.
  • Each 8-bit binary number is converted to its decimal equivalent.
  • Finally, each decimal value is converted to the corresponding ASCII character.
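The decoding steps can be sketched the same way. Again, this is only an illustration under the assumption of well-formed input; ALPHABET and manual_b64decode are made-up names, and real code should use base64.b64decode instead.

```python
import string

# Standard Base64 alphabet: 0-25 -> A-Z, 26-51 -> a-z, 52-61 -> 0-9, 62 -> +, 63 -> /
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + '+/'

def manual_b64decode(encoded: str) -> bytes:
    # The '=' padding characters carry no data, so drop them first
    stripped = encoded.rstrip('=')
    # Steps 1-3: map each character to its index and write it as exactly 6 bits
    bits = ''.join(format(ALPHABET.index(ch), '06b') for ch in stripped)
    # Discard trailing filler bits that do not complete a full byte
    bits = bits[:len(bits) - len(bits) % 8]
    # Steps 4-6: split into 8-bit groups and convert each group to a byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

print(manual_b64decode('SGV5'))  # b'Hey'
```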

Base64 encoding and decoding of strings

Once you understand what's going on behind the scenes, it will be easier to understand how it all works. Let's try to encode and decode a simple three-letter word, Hey.

We first convert each letter of the word to its ASCII equivalent, and then convert the ASCII equivalent to binary. This gives us the following values:

Letter  ASCII index value  8-bit binary value
H       72                 01001000
e       101                01100101
y       121                01111001

In other words, we can write Hey in binary like this:

01001000 01100101 01111001

That's 24 bits in total, which yields four values when split into groups of 6 bits:

010010 000110 010101 111001

In the Base64 table, the characters A to Z are represented by the values 0 to 25. The characters a to z are represented by the values 26 to 51. The digits 0 to 9 are represented by the values 52 to 61. The characters + and / are represented by 62 and 63. The character = is used for padding when the bits cannot be evenly divided into groups of 6.

We now convert the rearranged bits into numeric values, and then get the characters representing those numeric values.

6-bit binary value  Base64 index value  Letter
010010              18                  S
000110              6                   G
010101              21                  V
111001              57                  5

According to our calculation above, the letters Hey become SGV5 when Base64 encoded. We can test that this is correct with the following code:

from base64 import b64encode

text_binary = b'Hey'

# b'SGV5'
print(b64encode(text_binary))

The whole process is reversed, and our original data is obtained after Base64 decoding.
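A minimal sketch of that reverse step, using b64decode from the standard library (the variable names are only for illustration):

```python
from base64 import b64decode

encoded = b'SGV5'
decoded = b64decode(encoded)

print(decoded)  # b'Hey'

# b64decode also accepts an ASCII str instead of bytes
print(b64decode('SGV5').decode('ascii'))  # Hey
```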

Now I'll quickly show you the encoding of another word, Heyo, to explain the occurrence of = in encoded strings.

Letter  ASCII index value  8-bit binary value
H       72                 01001000
e       101                01100101
y       121                01111001
o       111                01101111

There are 32 bits in total. This gives us five complete 6-bit groups with two remaining bits: 11. We pad them with 0000 to get a full group of 6 bits. Grouping the bits in this way gives the following result:

010010 000110 010101 111001 011011 110000

The rearranged bits will return the following characters based on the Base64 index value.

6-bit binary value  Base64 index value  Letter
010010              18                  S
000110              6                   G
010101              21                  V
111001              57                  5
011011              27                  b
110000              48                  w

This means the Base64 encoded value of Heyo is SGV5bw==. Each = represents a pair of 00 bits that we added to fill out the original bit sequence.

from base64 import b64encode

text_binary = b'Heyo'

# b'SGV5bw=='
print(b64encode(text_binary))
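To see how the amount of padding depends on the input length, here is a small sketch that encodes the same word one letter at a time. The results dictionary is just a convenience for this example.

```python
from base64 import b64encode

# Encode inputs of 1, 2, 3 and 4 bytes and compare the padding
results = {word: b64encode(word) for word in (b'H', b'He', b'Hey', b'Heyo')}

for word, encoded in results.items():
    print(word, encoded)

# b'H' b'SA=='
# b'He' b'SGU='
# b'Hey' b'SGV5'
# b'Heyo' b'SGV5bw=='
```

One leftover byte needs four filler bits and therefore two = characters, two leftover bytes need two filler bits and one =, and a multiple of three bytes needs no padding at all.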

Base64 encode the image

Now let's get down to the main points of this article. In this section, I'll show you how to easily Base64-encode an image using Python.

I will use the following binary image. Go ahead and download it, and let's start using Python! (I assume the name of the image is deer.gif.)


In order to use Base64 in Python, the first thing we need to do is import the base64 module:

import base64

In order to encode an image, we simply use the function base64.b64encode(s). Python describes this function as follows:

Encode the bytes-like object s using Base64 and return the encoded bytes.

So we can do the following to Base64 encode an image:

import base64

# Open the binary file in read mode and read its contents
with open('deer.gif', 'rb') as image:
    image_read = image.read()

image_64_encode = base64.b64encode(image_read)

If you want to see the output of the encoding process, type the following:

print(image_64_encode)

Base64 decode the image

To decode an image using Python, we simply use the function base64.b64decode(s). Python says the following about this function:

Decodes a Base64-encoded bytes-like object or ASCII string and returns the decoded bytes.

So, to decode the image we encoded in the previous section, we do the following:

image_64_decode = base64.b64decode(image_64_encode)

Putting it all together

Let's put together a program for Base64 encoding and decoding images. A Python script to do this should look like this:

import base64

# Read the original image and encode it
with open('deer.gif', 'rb') as image:
    image_read = image.read()
image_64_encode = base64.b64encode(image_read)

# Decode the result and write it to a new image file
image_64_decode = base64.b64decode(image_64_encode)
with open('deer_decode.gif', 'wb') as image_result:
    image_result.write(image_64_decode)

If you open deer_decode.gif on your desktop, you'll find that you have the original image deer.gif we encoded in step one.
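If you would rather not rely on your eyes, a byte-for-byte comparison confirms that the round trip was lossless. This is a small sketch; the helper name files_identical is made up for this example.

```python
def files_identical(path_a, path_b):
    """Return True if both files contain exactly the same bytes."""
    with open(path_a, 'rb') as file_a, open(path_b, 'rb') as file_b:
        return file_a.read() == file_b.read()

# Usage, assuming both files from the script above are in the current directory:
# print(files_identical('deer.gif', 'deer_decode.gif'))
```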

As we have seen from this article, Python makes performing seemingly complex tasks very easy.

URL safe encoding and decoding

As I mentioned earlier in this tutorial, Base64 encoding uses the characters + and / in addition to regular alphanumeric values. However, these characters have special meaning in URLs. This means that Base64-encoded values using these characters may cause unexpected behavior if used inside URLs.

One solution to this problem is to use the urlsafe_b64encode() and urlsafe_b64decode() functions to encode and decode any data. These functions replace + with - and / with _ during encoding.

Here's a Python example that shows the difference:

import base64

# Read the image file; the with block closes it automatically
with open('dot.jpg', 'rb') as image:
    image_data = image.read()

unsafe_encode = base64.b64encode(image_data)
safe_encode = base64.urlsafe_b64encode(image_data)

# b'/9j/4QAYRXhpZgAASUkqAAgAAAAAAAAAAAAAAP/sABFEdWNr....
print(unsafe_encode)

# b'_9j_4QAYRXhpZgAASUkqAAgAAAAAAAAAAAAAAP_sABFEdWNr....
print(safe_encode)
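If you don't have an image at hand, the same difference shows up with just two bytes. In this sketch, the input b'\xfb\xff' was picked only because its standard encoding happens to contain both + and /.

```python
import base64

data = b'\xfb\xff'  # chosen so the standard encoding contains both + and /

standard = base64.b64encode(data)
url_safe = base64.urlsafe_b64encode(data)

print(standard)  # b'+/8='
print(url_safe)  # b'-_8='

# urlsafe_b64decode reverses the URL-safe variant
print(base64.urlsafe_b64decode(url_safe) == data)  # True
```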

This article has been updated with contributions from Nitish Kumar. Nitish is a web developer with experience creating e-commerce websites on various platforms. He spends his spare time working on personal projects that make his day-to-day life easier, or taking evening walks with friends.

Origin blog.csdn.net/lmrylll/article/details/132638849