Base64 encoding knowledge record

Table of contents

Coding instructions

Encoding

increase in size

= equal sign

demo


Coding instructions

Base64 is a way of representing binary data using 64 printable characters. Since 2^6 = 64, every 6 bits form one unit, and each unit corresponds to one printable character.

Base64 is often used to represent, transmit, and store binary data in contexts that normally handle text, such as MIME email and complex data embedded in XML.

When a project converts between byte arrays and strings without specifying a suitable character set, arbitrary bytes may not survive the conversion: the byte array obtained back from the string can have a different length, as the demo later shows.

Base64 encoding converts three 8-bit bytes (3*8=24 bits) into four 6-bit groups (4*6=24 bits), and then puts two 0 bits in front of each 6-bit group so that it forms an 8-bit byte. If fewer than 3 bytes remain at the end, the missing bits are filled with 0 and the missing output characters are written as =, so one or two = may appear at the end of the encoded text.

To ensure that every output value is a readable character, Base64 defines a code table for uniform conversion. The table has 2^6 = 64 entries, which is also the origin of the name Base64.

The printable characters in Base64 include the letters A-Z and a-z and the digits 0-9, 62 characters in total. The remaining two printable symbols differ between systems (the standard table uses + and /), and = is used in addition as the padding character.
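To illustrate how the last two symbols vary, here is a minimal sketch that assumes the JDK's java.util.Base64 (Java 8 or later). The class name Base64Variants and its helper methods are made up for this example. The two input bytes are chosen so that the first two 6-bit groups are 62 and 63.

```java
import java.util.Base64;

class Base64Variants {
    /** Encodes with the standard table, where indexes 62 and 63 are + and /. */
    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Encodes with the URL-safe table, where indexes 62 and 63 are - and _. */
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // Bits: 11111011 11111111 -> 6-bit groups 111110 (62), 111111 (63), 1111+00 (60)
        byte[] data = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(standard(data)); // +/8=
        System.out.println(urlSafe(data));  // -_8=
    }
}
```

Only the two symbols change between the variants; the letters, digits, and the = padding stay the same.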

Base64 is an index-based encoding: each character corresponds to an index. In the standard table, indexes 0-25 map to A-Z, 26-51 to a-z, 52-61 to 0-9, 62 to +, and 63 to /.
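The index relationship can be written down as a single 64-character string in which the position of a character is its index. This is a sketch for illustration only: the class Base64Table and its methods are hypothetical names, and the table is the standard one with + and /.

```java
class Base64Table {
    // Standard Base64 table: the position of a character in this string is its index.
    static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            + "abcdefghijklmnopqrstuvwxyz"
            + "0123456789+/";

    /** Returns the character for a 6-bit index (0..63). */
    static char charAt(int index) {
        return ALPHABET.charAt(index);
    }

    /** Returns the index of a Base64 character, or -1 if it is not in the table. */
    static int indexOf(char c) {
        return ALPHABET.indexOf(c);
    }

    public static void main(String[] args) {
        System.out.println(ALPHABET.length()); // 64
        System.out.println(charAt(0));         // A
        System.out.println(charAt(26));        // a
        System.out.println(charAt(52));        // 0
        System.out.println(indexOf('/'));      // 63
        System.out.println(indexOf('='));      // -1, because = is padding and has no index
    }
}
```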

Encoding

Since 64 equals 2 to the 6th power, one Base64 character represents 6 bits.
One byte of binary data, however, is 8 bits. Therefore, 3 bytes (3 x 8 = 24 bits) of text or binary data convert into 4 Base64 characters (4 x 6 = 24 bits).
Why groups of 3 bytes? Because the least common multiple of 6 and 8 is 24, and 24 bits are exactly 3 bytes.

The specific encoding method:

  1. Treat every 3 bytes as one group; 3 bytes contain 24 binary bits in total

  2. Divide these 24 bits into 4 groups of 6 bits each

  3. Add two 0 bits in front of each 6-bit group, which expands the 24 bits to 32 bits, that is, four bytes

  4. Each of these bytes holds a number less than 64, which is the character index

  5. Look up each index in the character index table to obtain the Base64 encoded characters

 

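For example, the string 'you' is encoded as 'eW91': its bytes 0x79 0x6F 0x75 give the 6-bit groups 011110 010110 111101 110101, that is, the indexes 30, 22, 61, and 53, which correspond to e, W, 9, and 1.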
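The five steps can be sketched by hand for a single 3-byte group. This is an illustrative sketch, not a full encoder: the class Base64GroupEncoder and the method encodeGroup are hypothetical names, the standard table is assumed, and padding is not handled here.

```java
import java.nio.charset.StandardCharsets;

class Base64GroupEncoder {
    // Standard Base64 table (assumed); the position of a character is its index.
    static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /** Encodes exactly three bytes into four Base64 characters. */
    static String encodeGroup(byte b0, byte b1, byte b2) {
        // Step 1: join the three bytes into one 24-bit value.
        int bits = ((b0 & 0xFF) << 16) | ((b1 & 0xFF) << 8) | (b2 & 0xFF);
        StringBuilder out = new StringBuilder(4);
        // Steps 2-4: take the four 6-bit groups, most significant first.
        // Masking with 0x3F keeps 6 bits, so the two high bits are 0 and the value is below 64.
        for (int shift = 18; shift >= 0; shift -= 6) {
            int index = (bits >> shift) & 0x3F;
            // Step 5: look up the character for this index.
            out.append(ALPHABET.charAt(index));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] you = "you".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encodeGroup(you[0], you[1], you[2])); // eW91
    }
}
```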

increase in size

As seen above, 3 bytes become 4 characters after Base64 encoding, because each 6-bit group is filled with two 0 bits and becomes an 8-bit byte.
That is exactly one-third more, so Base64-encoded data is normally about one-third larger than the original data.
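The size increase can be checked with a small calculation: with padding, every started group of 3 bytes yields 4 characters. The sketch below assumes the JDK's java.util.Base64 for comparison; the class Base64Size and the method encodedLength are hypothetical names.

```java
import java.util.Base64;

class Base64Size {
    /** Length of padded Base64 output for n input bytes: 4 characters per started 3-byte group. */
    static int encodedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 16, 300}) {
            int actual = Base64.getEncoder().encodeToString(new byte[n]).length();
            // Prints the input size, the computed size, and the size the JDK encoder produced.
            System.out.println(n + " bytes -> " + encodedLength(n) + " chars (JDK: " + actual + ")");
        }
    }
}
```

For 16 bytes this gives 24 characters, which matches the length of the encoded random value in the demo output.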

= equal sign

3 English characters (3 bytes) convert into 4 Base64 characters. So what rule applies when the length is not a multiple of 3?
It is simple. In practice, Base64 output often contains a 65th character, the '=' symbol. The equal sign is how this special case is handled.
When the last group has fewer than 3 bytes, 0 bits are appended at the end so that the last 6-bit group is complete.
The number of equal signs follows from dividing the total length in bytes by 3: if the remainder is 1, two = are appended; if the remainder is 2, one = is appended.
The encoded string therefore ends with zero, one, or two equal signs, which keeps its length a multiple of 4.

Take the single character 'd' as an example, which shows how padding differs from index 0 in the character table. Its encoding is 'ZA==': the A is a real character for index 0, produced from the appended 0 bits, while the two '=' are added directly as padding and correspond to no index.
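The padding rule can be observed directly. This is a minimal sketch assuming the JDK's java.util.Base64; the class Base64Padding and its methods are hypothetical names used only here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64Padding {
    /** Number of = characters appended for an input of n bytes. */
    static int paddingCount(int n) {
        return (3 - n % 3) % 3;
    }

    /** Encodes the UTF-8 bytes of the text with the standard, padded encoder. */
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(encode("d"));   // ZA==  (1 byte, remainder 1, two =)
        System.out.println(encode("yo"));  // eW8=  (2 bytes, remainder 2, one =)
        System.out.println(encode("you")); // eW91  (3 bytes, remainder 0, no =)
    }
}
```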

demo

package com.cjian.security;

import java.security.SecureRandom;
import java.util.Base64;

/**
 * @Author: cjian
 * @Date: 2022/11/9 17:09
 * @Des:
 */
public class Base64Demo {

    public static void main(String[] args) {
        // Encode the three bytes of "you" with the JDK's public Base64 API
        String youEncoded = Base64.getEncoder().encodeToString("you".getBytes());
        System.out.println("Base64 of you: " + youEncoded);

        SecureRandom secureRandom = new SecureRandom();
        byte[] randomBytes = new byte[16];
        secureRandom.nextBytes(randomBytes);

        // Converting random bytes to a String with the default charset is lossy
        String str = new String(randomBytes);
        System.out.println("Original value: " + str);
        // Here is the problem: the length has changed.
        // Specifying ISO-8859-1 both when creating the String and when getting the bytes avoids it.
        System.out.println("Original value to byte length: " + str.getBytes().length);

        String r = Base64.getEncoder().encodeToString(randomBytes);
        System.out.println("After Base64: " + r);

        // Decoding the Base64 text restores the original 16 bytes
        byte[] decoded = Base64.getDecoder().decode(r);
        String str2 = new String(decoded);
        System.out.println("Base64 decoded: " + str2);
        System.out.println("Byte length after Base64 decoding: " + decoded.length);
    }

}

output (the random bytes differ on every run):

Base64 of you: eW91
Original value: �;Ķp�K�n�ώ�|/
Original value to byte length: 26
After Base64: 1DvEtnCSS55uFMPPjqp8Lw==
Base64 decoded: �;Ķp�K�n�ώ�|/
Byte length after Base64 decoding: 16
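
The remark about ISO-8859-1 in the demo can be checked with a small sketch. It assumes only standard JDK classes; the class CharsetRoundTrip and the method roundTrip are hypothetical names. ISO-8859-1 maps every byte value to exactly one character, so the round trip is lossless, whereas UTF-8 replaces invalid byte sequences and the byte count can grow.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetRoundTrip {
    /** Converts bytes to a String and back using the same charset. */
    static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // 0xFF never occurs in valid UTF-8, so it cannot be decoded as a character.
        byte[] data = {(byte) 0xFF, 0x3B, 0x70};

        byte[] latin1 = roundTrip(data, StandardCharsets.ISO_8859_1);
        byte[] utf8 = roundTrip(data, StandardCharsets.UTF_8);

        System.out.println(Arrays.equals(data, latin1)); // true: every byte is preserved
        System.out.println(utf8.length);                 // 5: 0xFF became a 3-byte replacement character
    }
}
```

When binary data has to travel as text, encoding it with Base64 avoids relying on the charset at all, which is what the last two lines of the demo output show.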

Origin blog.csdn.net/cj_eryue/article/details/127774474