A detailed explanation of Base64 encoding

In front-end development, one commonly cited optimization is to embed small images directly in the page as Base64 strings instead of loading them from separate files, which reduces the number of HTTP requests.
This advice always comes with a caveat: the image must be small, typically no more than a few KB.
So, what exactly is Base64?

Preliminary Understanding

You have probably seen a string like the one below. In this fixed format it represents an image that the browser can recognize and render in full:

......

What is shown here is an image in SVG format, but the same approach works for any image format the browser supports.

This string is Base64-encoded: the long run of characters after base64, is the Base64 string itself.

How Base64 was born

In the early days of the Internet, e-mail was one of the most important applications.
At that time, SMTP, the mail transfer protocol, could only carry 7-bit ASCII, and ASCII was designed around English, so content such as text in other languages could not be sent.
To solve this problem, MIME (Multipurpose Internet Mail Extensions) was introduced later. It extended the structure of mail messages and defined transfer encodings for non-ASCII content, one of which is Base64.
For background on character encoding, see "Character encoding knowledge that needs to be understood in front-end development".

Basic definition

Base64 is an encoding scheme that represents binary data using 64 printable characters.
Because anyone can decode it without a key, it provides no security; its main purpose is to let content pass between gateways without being corrupted.

These 64 printable characters are the uppercase letters A-Z, the lowercase letters a-z, and the digits 0-9, for a total of 62 characters, plus the two symbols + and /.
Base64 is an index-based encoding, and each character corresponds to an index: A-Z map to 0-25, a-z to 26-51, 0-9 to 52-61, + to 62, and / to 63.

This is also the origin of the 64 in the name.
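
The index table can be written down as a single 64-character string, where the position of each character is its index. This is only a small sketch for illustration; the constant name BASE64_ALPHABET is made up here and is not part of any API.

```javascript
// The standard Base64 alphabet; a character's position in the string is its index.
// BASE64_ALPHABET is an illustrative name, not a built-in.
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZ' + // indexes 0-25
  'abcdefghijklmnopqrstuvwxyz' + // indexes 26-51
  '0123456789' +                 // indexes 52-61
  '+/';                          // indexes 62 and 63

console.log(BASE64_ALPHABET.length);       // 64
console.log(BASE64_ALPHABET[0]);           // 'A'
console.log(BASE64_ALPHABET.indexOf('a')); // 26
console.log(BASE64_ALPHABET[63]);          // '/'
```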

Encoding

Since 64 equals 2 to the 6th power, one Base64 character represents 6 bits.
One byte of binary data, however, is 8 bits. Therefore, 3 bytes (3 x 8 = 24 bits) of data can be converted into exactly 4 Base64 characters (4 x 6 = 24 bits).
Why groups of 3 bytes? Because the least common multiple of 6 and 8 is 24, and 24 bits are exactly 3 bytes.

The specific encoding method:

  1. Take the input 3 bytes at a time; each group has 24 bits in total
  2. Split these 24 bits into 4 groups of 6 bits each
  3. Prepend two 0 bits to each 6-bit group, expanding the 24 bits to 32 bits, that is, four bytes
  4. Each of these bytes now holds a number less than 64, which is the character index
  5. Look up each index in the character index table to obtain the corresponding Base64 character

For example, following these steps, the string 'you' is encoded as 'eW91'.
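
The steps above can be sketched by hand for a single 3-byte group. This is a minimal illustration rather than a full encoder: it assumes exactly three input bytes, and the names BASE64_ALPHABET and encodeTriple are invented for this example.

```javascript
// Illustrative names; neither is a built-in.
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Encode exactly three bytes (one 24-bit group) into four Base64 characters.
function encodeTriple(b0, b1, b2) {
  // Step 1: join the three bytes into one 24-bit number.
  const bits24 = (b0 << 16) | (b1 << 8) | b2;
  // Steps 2-4: cut the 24 bits into four 6-bit values (each is 0-63).
  const indexes = [
    (bits24 >> 18) & 0x3f,
    (bits24 >> 12) & 0x3f,
    (bits24 >> 6) & 0x3f,
    bits24 & 0x3f,
  ];
  // Step 5: look up each index in the alphabet.
  return indexes.map((index) => BASE64_ALPHABET[index]).join('');
}

// 'y' = 121, 'o' = 111, 'u' = 117
const [y, o, u] = [...'you'].map((ch) => ch.charCodeAt(0));
console.log(encodeTriple(y, o, u)); // 'eW91'
```

Masking with 0x3f keeps only the low 6 bits, which has the same effect as prepending two 0 bits to each group.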

Increase in size

As we can see, 3 bytes become 4 characters after Base64 encoding, because every 6 bits of input are padded with two 0 bits into an 8-bit unit, that is, 1 byte.
That is exactly one-third more, so Base64-encoded data is normally about one-third larger than the original data.
This is why, when using Base64 to optimize images, we stress that it is only for small icons. If every image were inlined this way, the static files would grow considerably, which defeats the purpose.
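
The growth can be checked with a little arithmetic: every started group of 3 input bytes produces 4 output characters. The helper name base64Length below is hypothetical, and the sketch assumes standard padded output such as btoa() produces.

```javascript
// Length of the Base64 output for a given number of input bytes.
// base64Length is an illustrative helper, not a built-in.
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

console.log(base64Length(3));    // 4
console.log(base64Length(3000)); // 4000, one-third more than 3000

// Cross-check against the built-in encoder with 3000 ASCII bytes.
console.log(btoa('a'.repeat(3000)).length); // 4000
```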

The = equal sign

Every 3 bytes are converted into 4 Base64 characters. So what happens when the input length in bytes is not a multiple of 3?
The rule is simple. In practice, Base64 output often contains a 65th character, the '=' sign, which exists to handle exactly this case.
A final group of fewer than 3 bytes is padded with 0 bits at the end until there are 24 bits; any 6-bit group that consists entirely of padding is then written as = rather than as a regular character.
The number of equal signs depends on the remainder of the byte length divided by 3: if the remainder is 1, two = are appended; if the remainder is 2, one = is appended.
Therefore, the encoded string ends with zero, one, or two equal signs. For example, 'yo' (2 bytes) is encoded as 'eW8=' and 'd' (1 byte) as 'ZA=='.

The single-character example 'd' shows how padding differs from index 0 in the character table. Its encoded form 'ZA==' contains an A, which stands for a 6-bit group with index 0 that still carries data bits, whereas the two '=' characters are appended directly as padding.
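
A quick way to see the padding rule is to compare inputs of 3, 2, and 1 bytes with the built-in btoa(). The helper paddingCount is a made-up name for this sketch; it only restates the rule that a remainder of 1 byte yields two = and a remainder of 2 bytes yields one =.

```javascript
// Number of '=' characters appended for an input of the given byte length.
// paddingCount is an illustrative helper, not a built-in.
function paddingCount(byteLength) {
  const remainder = byteLength % 3;
  return remainder === 0 ? 0 : 3 - remainder;
}

console.log(btoa('you'), paddingCount(3)); // 'eW91' 0
console.log(btoa('yo'), paddingCount(2));  // 'eW8=' 1
console.log(btoa('d'), paddingCount(1));   // 'ZA==' 2
```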

Non-ASCII characters

Base64 operates on bytes, and the native browser functions only accept single-byte characters, so text containing non-ASCII characters such as Chinese must first be converted into single-byte form before it is encoded.

Encoding and decoding methods

btoa and atob

JavaScript provides two native functions for Base64: btoa() and atob().

  • btoa(): Encodes a binary string (a string in which each character stands for one byte) as a Base64 string.
    Note: btoa() can only process characters in the Latin1 range (code points 0 to 255), which includes ASCII; any character outside that range causes an error.
  • atob(): Decodes a Base64-encoded string.
    Note: atob() throws an error if its argument is not valid Base64, for example if it contains characters outside the Base64 alphabet or has an impossible length such as a single character.
btoa('you') // 'eW91'
atob('eW91') // 'you'
btoa('中') // Uncaught DOMException: The string to be encoded contains characters outside of the Latin1 range.
atob('y') // Uncaught DOMException: The string to be decoded is not correctly encoded.

Handle Chinese characters

btoa() and atob() only handle single-byte characters, whereas Chinese characters occupy several bytes (usually 3 in UTF-8).
Therefore, you can first convert the Chinese text to its UTF-8 encoding and express that encoding with single-byte characters only, which btoa() can then process.

For Chinese text you can use these two functions: encodeURIComponent() and decodeURIComponent().

  • encodeURIComponent(): percent-encodes the UTF-8 bytes of non-ASCII characters, producing an ASCII-only string
  • decodeURIComponent(): decodes those percent-encoded sequences back into the original characters

Chinese text can then be encoded and decoded as follows:

window.btoa(encodeURIComponent('中国'))
// 'JUU0JUI4JUFEJUU1JTlCJUJE'
decodeURIComponent(window.atob('JUU0JUI4JUFEJUU1JTlCJUJE'))
// '中国'
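
Note that the approach above Base64-encodes the percent-escaped text ('%E4%B8%AD...'), not the raw UTF-8 bytes, so the result is longer and will not match what other Base64 tools produce for the same text. As an alternative sketch, assuming an environment that provides TextEncoder and TextDecoder (modern browsers and recent Node.js versions do), the UTF-8 bytes can be encoded directly. The function names utf8ToBase64 and base64ToUtf8 are invented for this example.

```javascript
// Illustrative helpers; the names are not part of any standard API.

// Encode text as the Base64 of its UTF-8 bytes.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte); // one Latin1 character per byte
  }
  return btoa(binary);
}

// Decode Base64 back to text, interpreting the bytes as UTF-8.
function base64ToUtf8(base64) {
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

console.log(utf8ToBase64('中国'));     // '5Lit5Zu9'
console.log(base64ToUtf8('5Lit5Zu9')); // '中国'
```

The output here is 8 characters for 6 bytes of UTF-8, compared with 24 characters for the percent-encoded variant.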

Third-party library

  • js-base64

Front-end common applications

Next, let's look at some common uses of Base64 encoding in front-end development.
Most front-end uses of Base64 involve images and are generally based on Data URLs.

A Data URL consists of four parts: the data: prefix, the MIME type (indicating the type of data), the base64 flag (optional for text), and the data itself.
Specific format: data:[<mime type>][;base64],<data>.
When the base64 flag is present, the fourth part, <data>, is a Base64 string.
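
To make the four parts concrete, here is a small sketch that assembles a Data URL and takes one apart again. Both helper names, toDataUrl and splitDataUrl, are hypothetical, and the parsing is deliberately simplified: it assumes a well-formed input that starts with data: and carries no extra parameters such as a charset.

```javascript
// Illustrative helpers; neither is a built-in.

// Build a Base64 Data URL from a MIME type and a Latin1-only string.
function toDataUrl(mimeType, text) {
  return `data:${mimeType};base64,${btoa(text)}`;
}

// Split a Data URL into MIME type, base64 flag, and data.
// Simplified: assumes a well-formed "data:" URL.
function splitDataUrl(dataUrl) {
  const commaIndex = dataUrl.indexOf(',');
  const header = dataUrl.slice('data:'.length, commaIndex);
  const isBase64 = header.endsWith(';base64');
  const mimeType = isBase64 ? header.slice(0, -';base64'.length) : header;
  return { mimeType, isBase64, data: dataUrl.slice(commaIndex + 1) };
}

const url = toDataUrl('text/plain', 'you');
console.log(url);               // 'data:text/plain;base64,eW91'
console.log(splitDataUrl(url)); // { mimeType: 'text/plain', isBase64: true, data: 'eW91' }
```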

Small picture transcoding

As mentioned at the beginning, inlining a small image as Base64 reduces the number of requests. The Data URL can be used in an img tag or in CSS:

<img src="......Ii8+PC9nPjwvc3ZnPg==">
.icon {
  background: url(......Ii8+PC9nPjwvc3ZnPg==);
}

When using a framework such as Vue or React, the same thing can be configured through url-loader, which inlines images below a size limit (given in bytes) as Base64:

  .loader('url-loader')
  .tap(options => {
    options.limit = 10240 // 10kb
    return options
  })

Reading files

In the browser, the FileReader API reads the contents of a file. Its readAsDataURL() method reads the file as a Data URL containing the Base64-encoded data:

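  // `file` is a File or Blob, for example one selected through an <input type="file">
  let reader = new FileReader()
  reader.onload = () => {
    // reader.result is a Data URL string of the form data:<mime type>;base64,<data>
    let base64Img = reader.result
  };
  reader.readAsDataURL(file)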

This method is commonly used in image uploading.

Canvas generates images

A canvas is essentially a bitmap, and its toDataURL() method exports the canvas content as an image in the form of a Base64-encoded Data URL (PNG by default).

const dataUrl = canvasEl.toDataURL()
// ......

Other

Besides image display, you will also see Base64-encoded strings in special data transmission, in simple encoding that is sometimes mistaken for encryption (Base64 hides nothing from anyone who decodes it), in code obfuscation, and in some certificates.

Summary

Finally, let's summarize the characteristics of Base64:

  • It converts binary data into an ASCII string, which makes the data easier to transmit.
  • Browsers can display Base64-encoded images directly, which reduces requests.
  • After encoding, the data is about one-third larger, and extra steps are needed to encode and decode it.

Origin blog.csdn.net/jh035/article/details/128128084