Analysis of Base64 encoding

Overview of Base64 encoding

Baidu Encyclopedia gives a good definition of Base64: "Base64 is one of the most common encodings for transmitting 8-bit byte data over the Internet. Base64 is a method of representing binary data using 64 printable characters." In other words, it is a "binary-to-text" encoding.

What are "printable characters", and why use them to transmit 8-bit bytes? Before answering, it helps to ask when Base64 is needed in the first place. A typical case is embedding binary data in a text-based protocol or format, such as HTTP headers, URLs, JSON, or email. These channels are designed for text, so binary data has to be turned into characters before it can travel through them. It cannot simply be passed through byte for byte, because such channels can only be relied on to carry printable characters; other byte values may be treated as control codes, altered, or dropped along the way. So what are printable characters? In the ASCII table, the 33 characters 0-31 and 127 are control characters, and the 95 characters 32-126 are printable. A text channel is only guaranteed to pass these 95 characters through unchanged, and bytes outside this range cannot be sent safely as-is. How, then, can the other bytes be transmitted? One way is Base64.
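The 95/33 split above is easy to check by counting the ASCII codes that fall in the 32-126 range. This is a minimal sketch; the class name `PrintableAscii` and its methods are hypothetical helpers for illustration.

```java
public class PrintableAscii {

    // Printable ASCII: 32 (space) through 126 ('~'); everything else in 0-127 is a control code.
    static boolean isPrintable(int c) {
        return c >= 32 && c <= 126;
    }

    // Count the printable codes among the 128 ASCII values.
    static int countPrintable() {
        int n = 0;
        for (int c = 0; c < 128; c++) {
            if (isPrintable(c)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int printable = countPrintable();
        System.out.println("printable: " + printable);         // 95
        System.out.println("control:   " + (128 - printable)); // 33 (0-31 and 127)
    }
}
```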

(1) Base64 encoding: converts binary data into characters

(2) Base64 decoding: converts those characters back into binary data

In short, Base64 maps arbitrary bytes, including the non-printable ones in the ASCII table, onto printable characters so they can be sent as text. The receiver decodes them with the same rules and gets back exactly the original bytes, so non-printable data can be transmitted as well.
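The round trip can be demonstrated with the JDK's own `java.util.Base64` (Java 8+). In this sketch, three ASCII control bytes and one byte outside ASCII are encoded into printable text and then decoded back. The class name `RoundTrip` is hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

public class RoundTrip {

    // 0x00-0x02 are ASCII control characters; 0xFF is not ASCII at all.
    static final byte[] RAW = {0x00, 0x01, 0x02, (byte) 0xFF};

    static String encode() {
        return Base64.getEncoder().encodeToString(RAW);
    }

    public static void main(String[] args) {
        String text = encode();
        System.out.println(text); // AAEC/w== (printable characters only)
        byte[] back = Base64.getDecoder().decode(text);
        System.out.println(Arrays.equals(RAW, back)); // true
    }
}
```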

Principle of Base64


Different implementations may use different sets of 64 characters. The usual approach, however, is to pick 64 common printable characters, chosen so that none of them will be modified by the systems the data passes through.
The standard Base64 index table is shown in the figure below: the 64 printable characters "A-Z, a-z, 0-9, +, /" are used, and each number is the index of its character. This mapping is fixed by the Base64 standard (RFC 4648) and cannot be changed. Since 2^6 = 64, every index fits in 6 bits. Note that each Base64 character is still stored as an 8-bit byte but carries only 6 bits of data: when an index is held in a byte, only the low 6 bits are used and the two high bits are always 0.
[Figure: standard Base64 index table]
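The index table described above can be built in a few lines, which also gives a reverse lookup from character to 6-bit index. This is a sketch; `Alphabet`, `buildAlphabet` and `indexOf` are hypothetical names, and the linear search in `indexOf` favors clarity over speed.

```java
public class Alphabet {

    // Standard Base64 alphabet: 0-25 -> 'A'-'Z', 26-51 -> 'a'-'z',
    // 52-61 -> '0'-'9', 62 -> '+', 63 -> '/'.
    static char[] buildAlphabet() {
        char[] table = new char[64];
        for (int i = 0; i < 26; i++) {
            table[i] = (char) ('A' + i);
            table[26 + i] = (char) ('a' + i);
        }
        for (int i = 0; i < 10; i++) {
            table[52 + i] = (char) ('0' + i);
        }
        table[62] = '+';
        table[63] = '/';
        return table;
    }

    // Reverse lookup: character -> 6-bit index, or -1 if it is not in the alphabet.
    static int indexOf(char c) {
        char[] table = buildAlphabet();
        for (int i = 0; i < table.length; i++) {
            if (table[i] == c) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(new String(buildAlphabet()));
        System.out.println(indexOf('T') + " " + indexOf('u')); // 19 46
    }
}
```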

Standard Base64 is not suitable for direct use in URLs. A URL encoder turns the "/" and "+" characters into "%XX" form, and those "%" signs must be converted again before the value is stored in a database, because "%" is a wildcard in ANSI SQL. To avoid this, there is an improved, URL-safe variant of Base64: it typically drops the '=' padding at the end and replaces the standard "+" and "/" with "-" and "_" respectively. This removes the need for conversion during URL encoding/decoding and database storage, keeps the encoded value from growing in the process, and gives object identifiers a uniform format across databases, forms, and so on.
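In Java, both alphabets are available through `java.util.Base64`. The two input bytes in this sketch were chosen so that their 6-bit indices are 62, 63 and 60, which hits exactly the characters that differ between the alphabets. Note that `getUrlEncoder()` keeps the '=' padding by default, and `withoutPadding()` drops it. The class name `UrlSafe` is hypothetical.

```java
import java.util.Base64;

public class UrlSafe {

    // 0xFB 0xFF -> bits 111110 111111 1111(00) -> indices 62, 63, 60
    static final byte[] DATA = {(byte) 0xFB, (byte) 0xFF};

    static String standard() {
        return Base64.getEncoder().encodeToString(DATA);
    }

    static String urlSafe() {
        return Base64.getUrlEncoder().encodeToString(DATA);
    }

    static String urlSafeNoPad() {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(DATA);
    }

    public static void main(String[] args) {
        System.out.println(standard());     // +/8=
        System.out.println(urlSafe());      // -_8=
        System.out.println(urlSafeNoPad()); // -_8
    }
}
```

The URL decoder (`Base64.getUrlDecoder()`) accepts the unpadded form as well, so dropping '=' costs nothing when decoding.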

How to convert?

So how can 6 useful bits represent traditional 8-bit bytes? The least common multiple of 8 and 6 is 24, so 3 bytes (24 bits) can be represented exactly by 4 Base64 characters (4 × 6 = 24 bits). The output therefore grows by one third to make up for each Base64 character carrying only 6 useful bits. You could instead spend two Base64 characters on each byte, but grouping by the least common multiple wastes the least space. The diagram below makes this easier to see. "Man" is three characters, 24 bits in total, so it takes 4 Base64 characters to hold those 24 bits. Each red box marks one 6-bit group; each group is converted to its index value and looked up in the Base64 table, giving "TWFu" for "Man". You may have noticed the underlying rule: the smallest unit of Base64 conversion is three bytes. A string is converted three bytes at a time, and each group of three bytes becomes four Base64 characters. That covers the main idea.
[Figure: encoding "Man" into four 6-bit groups]
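The "Man" walk-through can be reproduced with plain bit operations: pack the three bytes into one 24-bit integer, then cut it into four 6-bit indices. In this sketch, `ManExample`, `sixBitIndices` and `encodeGroup` are hypothetical names. The `& 0xFF` masks treat each byte as unsigned, which matters for bytes above 127 in Java.

```java
import java.nio.charset.StandardCharsets;

public class ManExample {

    static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Pack three bytes into one 24-bit int and split it into four 6-bit indices.
    static int[] sixBitIndices(byte[] three) {
        int group = ((three[0] & 0xFF) << 16)
                  | ((three[1] & 0xFF) << 8)
                  |  (three[2] & 0xFF);
        return new int[] {
            (group >> 18) & 0x3F,
            (group >> 12) & 0x3F,
            (group >> 6) & 0x3F,
            group & 0x3F
        };
    }

    // Map each index to its character in the standard alphabet.
    static String encodeGroup(byte[] three) {
        StringBuilder sb = new StringBuilder();
        for (int idx : sixBitIndices(three)) {
            sb.append(ALPHABET.charAt(idx));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] man = "Man".getBytes(StandardCharsets.US_ASCII);
        int[] idx = sixBitIndices(man);
        System.out.println(idx[0] + " " + idx[1] + " " + idx[2] + " " + idx[3]); // 19 22 5 46
        System.out.println(encodeGroup(man)); // TWFu
    }
}
```

Note the lowercase "u" at the end: index 46 falls in the a-z range (46 - 26 = 20, the 21st lowercase letter).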
But what if fewer than three bytes remain at the end of the input? Then we use two Base64 characters for one leftover byte, or three Base64 characters for two leftover bytes. In the figure below, the second Base64 character for "A" receives only two data bits, so its last four bits are filled with 0, and "A" becomes "QQ". As mentioned above, Base64 output always comes in groups of four characters, so these two characters are followed by two "=" signs, giving "QQ==". In fact, decoding works fine without the "=" signs. The padding is probably there so that several separately encoded Base64 strings cannot be confused with one another when combined. It follows that "=" can appear only at the end of a Base64 string, once or twice, and never in the middle. The string "BC" in the figure below is encoded the same way, yielding "QkM=".
[Figure: padding when encoding "A" and "BC"]
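Both padding cases, and the claim that decoding does not need "=", can be checked against `java.util.Base64`. Its basic decoder accepts '=' padding but does not require it. The class name `PaddingExample` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PaddingExample {

    static String encode(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.US_ASCII));
    }

    // The basic decoder accepts input with or without the trailing '=' padding.
    static String decode(String b64) {
        return new String(Base64.getDecoder().decode(b64), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(encode("A"));    // QQ==  (1 leftover byte -> 2 chars + "==")
        System.out.println(encode("BC"));   // QkM=  (2 leftover bytes -> 3 chars + "=")
        System.out.println(decode("QQ==")); // A
        System.out.println(decode("QQ"));   // A, even without padding
    }
}
```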
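This conversion leads to a simple conclusion: Base64 output is about 4/3 the size of its input. More precisely, for an input of n bytes (1 byte = 8 bits), the padded output is 4 * ceil(n / 3) characters. This equals (4/3) * n exactly when n is a multiple of 3 and is slightly larger otherwise, because the last partial group is padded to four characters.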
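The length formula can be written as integer arithmetic, since ceil(n / 3) = (n + 2) / 3 for non-negative integers. This sketch compares it with the length produced by `java.util.Base64` for small inputs. The class name `EncodedLength` is hypothetical.

```java
import java.util.Base64;

public class EncodedLength {

    // 4 output characters per started group of 3 input bytes: 4 * ceil(n / 3).
    static int encodedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 7; n++) {
            int actual = Base64.getEncoder().encode(new byte[n]).length;
            System.out.println(n + " bytes -> " + encodedLength(n) + " chars (JDK encoder: " + actual + ")");
        }
    }
}
```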

Java practice

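import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Test {
    public static void main(String[] args) {
        // Encode the UTF-8 bytes of "Son" with the JDK's standard Base64 encoder
        String encode = Base64.getEncoder().encodeToString("Son".getBytes(StandardCharsets.UTF_8));
        System.out.println(encode); // U29u
        // Decode back to bytes and rebuild the string
        byte[] decode = Base64.getDecoder().decode(encode);
        System.out.println(new String(decode, StandardCharsets.UTF_8)); // Son
    }
}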

Implementing Base64 by hand in Java

import java.io.UnsupportedEncodingException;
/**
 * @PROJECT_NAME: demo
 * @DESCRIPTION: hand-written Base64 encoder/decoder (standard alphabet, '=' padding)
 */
public class base64 {
    static private final int BASELENGTH = 255;
    static private final int LOOKUPLENGTH = 64;
    static private final int TWENTYFOURBITGROUP = 24;
    static private final int EIGHTBIT = 8;
    static private final int SIXTEENBIT = 16;
    static private final int SIXBIT = 6;
    static private final int FOURBYTE = 4;
    static private final int SIGN = -128; // 0x80 as a signed byte: used to test the sign bit
    static private final byte PAD = (byte) '=';
    // reverse table: ASCII code -> 6-bit index (-1 if not a Base64 character)
    static private byte[] base64Alphabet = new byte[BASELENGTH];
    // forward table: 6-bit index -> ASCII code of the Base64 character
    static private byte[] lookUpBase64Alphabet = new byte[LOOKUPLENGTH];

    static {
        // mark every byte value as "not Base64" first
        for (int i = 0; i < BASELENGTH; i++) {
            base64Alphabet[i] = -1;
        }
        // 'A'-'Z' -> 0-25
        for (int i = 'Z'; i >= 'A'; i--) {
            base64Alphabet[i] = (byte) (i - 'A');
        }
        // 'a'-'z' -> 26-51
        for (int i = 'z'; i >= 'a'; i--) {
            base64Alphabet[i] = (byte) (i - 'a' + 26);
        }
        // '0'-'9' -> 52-61
        for (int i = '9'; i >= '0'; i--) {
            base64Alphabet[i] = (byte) (i - '0' + 52);
        }

        base64Alphabet['+'] = 62;
        base64Alphabet['/'] = 63;

        // forward table: index -> character
        for (int i = 0; i <= 25; i++)
            lookUpBase64Alphabet[i] = (byte) ('A' + i);

        for (int i = 26, j = 0; i <= 51; i++, j++)
            lookUpBase64Alphabet[i] = (byte) ('a' + j);

        for (int i = 52, j = 0; i <= 61; i++, j++)
            lookUpBase64Alphabet[i] = (byte) ('0' + j);

        lookUpBase64Alphabet[62] = (byte) '+';
        lookUpBase64Alphabet[63] = (byte) '/';
    }

    public static boolean isBase64(String isValidString) {
        return isArrayByteBase64(isValidString.getBytes());
    }

    public static boolean isBase64(byte octect) {
        // negative bytes (non-ASCII) are never Base64 and would index out of bounds
        return octect == PAD || (octect >= 0 && base64Alphabet[octect] != -1);
    }

    public static boolean isArrayByteBase64(byte[] arrayOctect) {
        int length = arrayOctect.length;
        if (length == 0) {
            // an empty array is treated as valid Base64 data
            return true;
        }
        for (int i = 0; i < length; i++) {
            if (!base64.isBase64(arrayOctect[i])) {
                return false;
            }
        }
        return true;
    }

    public static String encode(String str) {
        if (str == null)
            return "";
        try {
            byte[] b = str.getBytes("UTF-8");
            return new String(encode(b), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            return "";
        }
    }

    public static byte[] encodeStr2Byte(String str) {
        if (str == null)
            return null;
        try {
            byte[] b = str.getBytes("UTF-8");
            return encode(b);
        } catch (UnsupportedEncodingException e) {
            return null;
        }
    }

    public static String encodeByte2Str(byte[] bytes) {
        if (bytes == null)
            return "";
        try {
            return new String(encode(bytes), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            return null;
        }
    }


    /**
     * Encodes raw octets into Base64.
     *
     * @param binaryData Array containing binary data to encode.
     * @return Base64-encoded data.
     */
    public static byte[] encode(byte[] binaryData) {
        int lengthDataBits = binaryData.length * EIGHTBIT;
        int fewerThan24bits = lengthDataBits % TWENTYFOURBITGROUP;
        int numberTriplets = lengthDataBits / TWENTYFOURBITGROUP;
        byte encodedData[] = null;

        if (fewerThan24bits != 0) {
            // data not divisible by 24 bits: reserve one extra, padded group of 4
            encodedData = new byte[(numberTriplets + 1) * 4];
        } else {
            // whole number of 24-bit groups
            encodedData = new byte[numberTriplets * 4];
        }

        byte k = 0, l = 0, b1 = 0, b2 = 0, b3 = 0;

        int encodedIndex = 0;
        int dataIndex = 0;
        int i = 0;

        for (i = 0; i < numberTriplets; i++) {
            dataIndex = i * 3;
            b1 = binaryData[dataIndex];
            b2 = binaryData[dataIndex + 1];
            b3 = binaryData[dataIndex + 2];

            l = (byte) (b2 & 0x0f); // low 4 bits of b2
            k = (byte) (b1 & 0x03); // low 2 bits of b1

            encodedIndex = i * 4;
            // Java bytes are signed and ">>" copies the sign bit, so for negative
            // bytes the XOR clears the high bits that the shift filled with 1s
            byte val1 =
                    ((b1 & SIGN) == 0)
                            ? (byte) (b1 >> 2)
                            : (byte) ((b1) >> 2 ^ 0xc0);
            byte val2 =
                    ((b2 & SIGN) == 0)
                            ? (byte) (b2 >> 4)
                            : (byte) ((b2) >> 4 ^ 0xf0);
            byte val3 =
                    ((b3 & SIGN) == 0)
                            ? (byte) (b3 >> 6)
                            : (byte) ((b3) >> 6 ^ 0xfc);

            encodedData[encodedIndex] = lookUpBase64Alphabet[val1];
            encodedData[encodedIndex + 1] =
                    lookUpBase64Alphabet[val2 | (k << 4)];
            encodedData[encodedIndex + 2] =
                    lookUpBase64Alphabet[(l << 2) | val3];
            encodedData[encodedIndex + 3] = lookUpBase64Alphabet[b3 & 0x3f];
        }

        // form integral number of 6-bit groups
        dataIndex = i * 3;
        encodedIndex = i * 4;
        if (fewerThan24bits == EIGHTBIT) {
            // one leftover byte -> 2 characters + "=="
            b1 = binaryData[dataIndex];
            k = (byte) (b1 & 0x03);
            byte val1 =
                    ((b1 & SIGN) == 0)
                            ? (byte) (b1 >> 2)
                            : (byte) ((b1) >> 2 ^ 0xc0);
            encodedData[encodedIndex] = lookUpBase64Alphabet[val1];
            encodedData[encodedIndex + 1] = lookUpBase64Alphabet[k << 4];
            encodedData[encodedIndex + 2] = PAD;
            encodedData[encodedIndex + 3] = PAD;
        } else if (fewerThan24bits == SIXTEENBIT) {
            // two leftover bytes -> 3 characters + "="
            b1 = binaryData[dataIndex];
            b2 = binaryData[dataIndex + 1];
            l = (byte) (b2 & 0x0f);
            k = (byte) (b1 & 0x03);

            byte val1 =
                    ((b1 & SIGN) == 0)
                            ? (byte) (b1 >> 2)
                            : (byte) ((b1) >> 2 ^ 0xc0);
            byte val2 =
                    ((b2 & SIGN) == 0)
                            ? (byte) (b2 >> 4)
                            : (byte) ((b2) >> 4 ^ 0xf0);

            encodedData[encodedIndex] = lookUpBase64Alphabet[val1];
            encodedData[encodedIndex + 1] =
                    lookUpBase64Alphabet[val2 | (k << 4)];
            encodedData[encodedIndex + 2] = lookUpBase64Alphabet[l << 2];
            encodedData[encodedIndex + 3] = PAD;
        }

        return encodedData;
    }

    public static String decode(String str) {
        if (str == null)
            return "";
        try {
            byte[] b = str.getBytes("UTF-8");
            return new String(decode(b), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            return "";
        }
    }

    public static byte[] decodeStr2Byte(String str) {
        if (str == null)
            return null;
        try {
            byte[] b = str.getBytes("UTF-8");
            return decode(b);
        } catch (UnsupportedEncodingException e) {
            return null;
        }
    }

    /**
     * Decodes Base64 data into octets.
     *
     * @param base64Data Byte array containing Base64 data
     * @return Array containing decoded data.
     */
    public static byte[] decode(byte[] base64Data) {
        // handle the edge case, so we don't have to worry about it later
        if (base64Data.length == 0) {
            return new byte[0];
        }

        int numberQuadruple = base64Data.length / FOURBYTE;
        byte decodedData[] = null;
        byte b1 = 0, b2 = 0, b3 = 0, b4 = 0, marker0 = 0, marker1 = 0;

        // Note: characters outside the alphabet are NOT filtered out here;
        // the input is assumed to contain only Base64 characters and '=' padding.

        int encodedIndex = 0;
        int dataIndex = 0;
        {
            // size the output array: 3 bytes per group of 4, minus one per '='
            int lastData = base64Data.length;
            // ignore the '=' padding
            while (base64Data[lastData - 1] == PAD) {
                if (--lastData == 0) {
                    return new byte[0];
                }
            }
            decodedData = new byte[lastData - numberQuadruple];
        }

        for (int i = 0; i < numberQuadruple; i++) {
            dataIndex = i * 4;
            marker0 = base64Data[dataIndex + 2];
            marker1 = base64Data[dataIndex + 3];

            b1 = base64Alphabet[base64Data[dataIndex]];
            b2 = base64Alphabet[base64Data[dataIndex + 1]];

            if (marker0 != PAD && marker1 != PAD) {
                // no padding, e.g. 3cQl
                b3 = base64Alphabet[marker0];
                b4 = base64Alphabet[marker1];
                decodedData[encodedIndex] = (byte) (b1 << 2 | b2 >> 4);
                decodedData[encodedIndex + 1] =
                        (byte) (((b2 & 0xf) << 4) | ((b3 >> 2) & 0xf));
                decodedData[encodedIndex + 2] = (byte) (b3 << 6 | b4);
            } else if (marker0 == PAD) {
                // two padding characters, e.g. 3c==
                decodedData[encodedIndex] = (byte) (b1 << 2 | b2 >> 4);
            } else if (marker1 == PAD) {
                // one padding character, e.g. 3cQ=
                b3 = base64Alphabet[marker0];
                decodedData[encodedIndex] = (byte) (b1 << 2 | b2 >> 4);
                decodedData[encodedIndex + 1] =
                        (byte) (((b2 & 0xf) << 4) | ((b3 >> 2) & 0xf));
            }
            encodedIndex += 3;
        }
        return decodedData;
    }

    public static void main(String[] args) {
        String s = "Son";
        System.out.println("Original string: ");
        System.out.println(s);
        System.out.println("--------------------------------------------------");
        String r = encode(s);
        System.out.println("After Base64 encoding: ");
        System.out.println(r);
        System.out.println("--------------------------------------------------");
        String decode = decode(r);
        System.out.println("After Base64 decoding:");
        System.out.println(decode);
        System.out.println("--------------------------------------------------");
    }

}
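The sign checks and XOR masks in the `encode` method above exist only because Java's `byte` is signed. As an alternative sketch (not the original author's code), masking each byte with `& 0xFF` avoids those fix-ups, and a small bit buffer makes decoding symmetric. Assumptions: the class name `CompactBase64` is hypothetical, only the standard alphabet is supported, and any character outside it (including line breaks) is rejected rather than skipped.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class CompactBase64 {

    private static final char[] ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();
    // Reverse table: ASCII code -> 6-bit index, -1 for characters not in the alphabet.
    private static final int[] INDEX = new int[128];

    static {
        Arrays.fill(INDEX, -1);
        for (int i = 0; i < ALPHABET.length; i++) {
            INDEX[ALPHABET[i]] = i;
        }
    }

    static String encode(byte[] data) {
        StringBuilder sb = new StringBuilder(4 * ((data.length + 2) / 3));
        for (int i = 0; i < data.length; i += 3) {
            int remaining = Math.min(3, data.length - i);
            // "& 0xFF" reads each byte as unsigned (0-255), so no sign-bit fix-ups are needed.
            int group = (data[i] & 0xFF) << 16;
            if (remaining > 1) group |= (data[i + 1] & 0xFF) << 8;
            if (remaining > 2) group |= data[i + 2] & 0xFF;
            sb.append(ALPHABET[(group >> 18) & 0x3F]);
            sb.append(ALPHABET[(group >> 12) & 0x3F]);
            // Missing input bytes become '=' padding.
            sb.append(remaining > 1 ? ALPHABET[(group >> 6) & 0x3F] : '=');
            sb.append(remaining > 2 ? ALPHABET[group & 0x3F] : '=');
        }
        return sb.toString();
    }

    static byte[] decode(String text) {
        // Trailing '=' only marks a short final group, so it can be dropped up front.
        int end = text.length();
        while (end > 0 && text.charAt(end - 1) == '=') end--;
        byte[] out = new byte[end * 6 / 8];
        int buffer = 0, bits = 0, pos = 0;
        for (int i = 0; i < end; i++) {
            char c = text.charAt(i);
            int v = c < 128 ? INDEX[c] : -1;
            if (v < 0) throw new IllegalArgumentException("Illegal Base64 character: " + c);
            buffer = (buffer << 6) | v; // append 6 bits
            bits += 6;
            if (bits >= 8) {            // a full byte is available
                bits -= 8;
                out[pos++] = (byte) (buffer >> bits);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] input = "Son".getBytes(StandardCharsets.UTF_8);
        String mine = encode(input);
        String jdk = Base64.getEncoder().encodeToString(input);
        System.out.println(mine + " " + mine.equals(jdk)); // U29u true
        System.out.println(new String(decode(mine), StandardCharsets.UTF_8)); // Son
    }
}
```

Because '=' is not in the reverse table, a padding character in the middle of the input is rejected, which matches the rule that "=" may only appear at the end.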

Origin blog.csdn.net/doublepg13/article/details/128383734