Three common content transfer encodings for e-mail

1. ASCII code

       ASCII uses 7-bit or 8-bit binary numbers to represent 128 or 256 possible characters. Standard ASCII, also called basic ASCII, uses 7 bits (the remaining high-order bit is 0) to represent all uppercase and lowercase letters, the digits 0 to 9, punctuation marks, and the special control characters used in American English. Among them:

0 to 31 and 127 (33 in total) are control characters or special communication characters (the rest are displayable characters), such as:

Control characters: LF (line feed), CR (carriage return), FF (form feed), DEL (delete), BS (backspace), BEL (bell), etc.;

Special characters for communication: SOH (start of heading), EOT (end of transmission), ACK (acknowledgment), etc.;

The ASCII values 8, 9, 10, and 13 correspond to the backspace, tab, line feed, and carriage return characters, respectively. They have no graphic representation of their own, but they affect how text is displayed in different ways depending on the application.

32 to 126 (95 in total) are displayable characters (32 is the space), of which 48 to 57 are the ten Arabic numerals 0 to 9.

65 to 90 are the 26 uppercase English letters, 97 to 122 are the 26 lowercase English letters, and the rest are punctuation marks, operators, and other symbols.
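The ranges above can be checked with a few lines of Python. This is only an illustrative sketch; the function name `classify` and the category labels are made up for this example and are not part of any standard.

```python
def classify(code: int) -> str:
    """Classify a 7-bit ASCII code using the ranges listed above."""
    if not 0 <= code <= 127:
        raise ValueError("standard ASCII covers 0..127 only")
    if code <= 31 or code == 127:
        return "control"
    if code == 32:
        return "space"
    if 48 <= code <= 57:
        return "digit"
    if 65 <= code <= 90:
        return "uppercase"
    if 97 <= code <= 122:
        return "lowercase"
    return "punctuation/symbol"


# Count how many codes fall into each category.
counts = {}
for c in range(128):
    kind = classify(c)
    counts[kind] = counts.get(kind, 0) + 1

print(counts["control"])                  # 33 control characters
print(128 - counts["control"])            # 95 displayable characters
print(classify(ord("A")), classify(10))   # uppercase control
```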

       Also note that when standard ASCII is transmitted, the most significant bit (b7) can be used as a parity bit. A parity check is a method of detecting errors introduced during transmission, and comes in two forms: odd parity and even parity. Odd parity: the number of 1s in each byte of a correct code must be odd; if the 7 data bits contain an even number of 1s, the highest bit b7 is set to 1. Even parity: the number of 1s in each byte of a correct code must be even; if the 7 data bits contain an odd number of 1s, the highest bit b7 is set to 1.

       The upper 128 codes (128 to 255) are called extended ASCII. Many x86-based systems support extended (or "high") ASCII, which uses the 8th bit of each byte to define an additional 128 special symbols, foreign-language letters, and graphic symbols.
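As a rough sketch of the parity rule, the Python below sets b7 on a 7-bit code and then verifies it. The helper names `add_parity` and `check_parity` are hypothetical and exist only for this illustration.

```python
def add_parity(code: int, odd: bool = False) -> int:
    """Return the 7-bit ASCII code with a parity bit placed in b7."""
    if not 0 <= code <= 0x7F:
        raise ValueError("expected a 7-bit ASCII code")
    ones = bin(code).count("1")
    # Even parity: set b7 when the data bits hold an odd number of 1s.
    # Odd parity:  set b7 when the data bits hold an even number of 1s.
    need_bit = (ones % 2 == 1) != odd
    return code | 0x80 if need_bit else code


def check_parity(byte: int, odd: bool = False) -> bool:
    """Return True if the 8-bit value has the expected parity."""
    return bin(byte & 0xFF).count("1") % 2 == (1 if odd else 0)


a = ord("A")  # 1000001, two 1s
c = ord("C")  # 1000011, three 1s
print(f"{add_parity(a):08b}")            # 01000001 (even parity, b7 stays 0)
print(f"{add_parity(a, odd=True):08b}")  # 11000001 (odd parity, b7 set)
print(f"{add_parity(c):08b}")            # 11000011 (even parity, b7 set)

# Flipping any single bit makes the check fail.
print(check_parity(add_parity(c) ^ 0x01))  # False
```

A single parity bit detects any odd number of flipped bits, but it cannot tell which bit changed and it misses an even number of flips.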

2. quoted-printable

       This encoding is suitable when the transmitted data contains only a small amount of non-ASCII content, such as a few Chinese characters. The key point is that every printable ASCII character is left unchanged, except for the special character "=" (the equals sign). The equals sign, non-printable ASCII characters, and non-ASCII data are encoded by writing each byte as two hexadecimal digits and placing an equals sign "=" in front of them. For example:

The binary code of the Chinese word 系统 ("system") in the GB2312 encoding is: 11001111 10110101 11001101 10110011 (32 bits in total; none of these four bytes is an ASCII code)

Its hexadecimal number representation is: CF B5 CD B3

The quoted-printable encoding is: =CF=B5=CD=B3. These 12 characters are all printable ASCII characters; their binary encoding requires 96 bits, an overhead of 200% compared with the original 32 bits.

The binary code of the equals sign "=" is 00111101, which is 3D in hexadecimal, so the quoted-printable encoding of "=" is "=3D".

3. base64 encoding
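The rule described above can be sketched in a few lines of Python. This is a simplified illustration, not a complete implementation: the function name `qp_encode` is made up for this example, and the sketch leaves out the line-length limit, soft line breaks, and the special handling of trailing whitespace that real quoted-printable encoders apply.

```python
def qp_encode(data: bytes) -> str:
    """Simplified quoted-printable: keep printable ASCII, escape the rest."""
    out = []
    for b in data:
        if 32 <= b <= 126 and b != ord("="):
            out.append(chr(b))        # printable ASCII passes through
        else:
            out.append(f"={b:02X}")   # "=" followed by two hex digits
    return "".join(out)


raw = bytes.fromhex("CFB5CDB3")       # the four bytes from the example
encoded = qp_encode(raw)
print(encoded)                        # =CF=B5=CD=B3

overhead = (len(encoded) - len(raw)) / len(raw)
print(f"{overhead:.0%}")              # 200%

print(qp_encode(b"="))                # =3D
```

For real mail handling, Python's standard `quopri` module is the better choice, since it covers the cases this sketch omits.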

       This encoding first divides the binary data into 24-bit units, then splits each 24-bit unit into four 6-bit groups. Each 6-bit group is converted to an ASCII character as follows.

       A 6-bit binary code has 64 possible values, from 0 to 63. A stands for 0, B for 1, and so on. After the 26 uppercase letters come the 26 lowercase letters, followed by the 10 digits, and finally "+" stands for 62 and "/" for 63. Two consecutive equals signs "==" indicate that the last group contains only 8 bits, and a single equals sign "=" indicates that it contains only 16 bits. Carriage returns and line feeds are ignored, so they can be inserted anywhere.

       Here is an example of base64 encoding:

       24-bit binary code           01001001  00110001  01111001

       Four 6-bit groups            010010    010011    000101    111001

       Corresponding base64 code    S         T         F         5

       Sent as ASCII                01010011  01010100  01000110  00110101

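After base64 encoding, the 24-bit binary code becomes 32 bits. The 8 added bits are an overhead of about 33% relative to the original 24 bits (equivalently, 25% of the encoded output).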
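The same example can be reproduced with a short Python sketch. The function `b64_encode` is a hypothetical hand-written version for illustration only; it does not insert line breaks, and in practice the standard `base64` module would be used directly.

```python
import base64

ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_encode(data: bytes) -> str:
    """Encode bytes as base64, one 24-bit unit at a time."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        pad = 3 - len(chunk)
        # Pad the last chunk with zero bytes to a full 24-bit unit.
        n = int.from_bytes(chunk + b"\x00" * pad, "big")
        # Split the 24 bits into four 6-bit groups and look each one up.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 8 leftover bits -> "==", 16 leftover bits -> "=".
        if pad:
            chars[4 - pad:] = ["="] * pad
        out.extend(chars)
    return "".join(out)


raw = bytes([0b01001001, 0b00110001, 0b01111001])
print(b64_encode(raw))                    # STF5
print(base64.b64encode(raw).decode())     # STF5 (standard library, for comparison)
print(b64_encode(raw[:1]), b64_encode(raw[:2]))  # SQ== STE=
```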

