Base64 online conversion tool


1. Why did you do this project? (Background of the project)

A byte can take 256 different values. The ASCII table defines only the values 0 to 127, and some of those are non-printable control characters; the values 128 to 255 lie outside ASCII altogether. Some protocols, such as the mail transfer protocol (SMTP), were designed to carry only printable characters, so binary files such as pictures and videos cannot be sent over them as raw bytes. This is why the Base64 encoding format exists: it converts arbitrary binary data into printable characters.

  1. When data is exchanged on the network, for example from A to B, it often passes through multiple routing devices. Different devices handle characters differently, so non-printable characters may be processed incorrectly, which makes transmission unreliable.
  2. HTTP uses key-value fields whose contents must be URL-encoded; otherwise an equal sign or a space in the data may cause parsing to fail.
  3. Some text protocols, such as Simple Mail Transfer Protocol (SMTP), do not support the transmission of non-printable characters.
  4. Small pictures can be embedded directly in web pages.

2. What kind of functions can this project achieve? (Project Objectives)

Implement an online Base64 conversion tool that supports reversible Base64 conversion of text as well as Base64 conversion of pictures.

3. What framework is used to realize the project? (Project Framework)

[Figure: project framework diagram]

4. How is it realized? (Implementation)

1. Base64 algorithm + implementation
2. Simple front-end knowledge: html + CSS + javascript + ajax
3. Page build
4. Build the HTTP server with the httplib library, which is header-only: you only need to include its header file
5. Front-end and back-end transfer

1. Encoding/decoding algorithm module

Base64 is so called because it uses 64 characters to encode binary data:

A~Z, a~z, 0~9, + and /, 64 characters in total.
Indexed from 0 to 63:
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
= is used as padding when the input falls short of a full group.

1.1 Encoding process

The binary byte stream supplied by the user is encoded in groups of three bytes:
3 bytes --------> base64 --------> 4 displayable characters
24 bits --------> base64 --------> the 24 bits are divided equally into 4 groups of 6 bits each --------> each group becomes one character
[Figure: three bytes split into four 6-bit groups]
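
The 24-bit regrouping described above can be sketched as follows. This is only an illustration: the helper name SplitGroup is hypothetical and does not appear in the project code, which works with shifts on individual bytes instead of one packed value.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not part of the original project):
// split three input bytes into four 6-bit values in the range 0..63.
std::array<std::uint8_t, 4> SplitGroup(std::uint8_t b0, std::uint8_t b1, std::uint8_t b2)
{
    // Pack the three bytes into one 24-bit value, first byte in the top bits.
    const std::uint32_t bits = (std::uint32_t(b0) << 16) | (std::uint32_t(b1) << 8) | std::uint32_t(b2);

    // Cut the 24 bits into four groups of 6 bits, from high to low.
    return {
        std::uint8_t((bits >> 18) & 0x3F),
        std::uint8_t((bits >> 12) & 0x3F),
        std::uint8_t((bits >> 6) & 0x3F),
        std::uint8_t(bits & 0x3F),
    };
}
```

Each returned value is an index into the 64-character table; for the bytes M, a, n the four indices select T, W, F and u.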

For example, when the string length is divisible by 3, Base64-encode: Man
[Figure: Base64 encoding of Man]

When the string length is not divisible by 3, Base64-encode: Manm
[Figure: Base64 encoding of Manm]

When the number of bytes to be encoded is not divisible by 3, one or two bytes are left over at the end. Zero bytes are first appended so that the length becomes divisible by 3, the data is then Base64-encoded, and finally one or two = signs are placed at the end of the result; the number of = signs is the number of bytes that were added.
For example, if 4 bytes are transmitted, one byte is left over after the first group of three, so two zero bytes (16 zero bits) are added before encoding, and two = signs are placed at the end of the encoded result.

Note: In the MIME variant of Base64, a line of encoded output holds at most 76 characters.
Drawback: Base64 converts every three bytes into four characters, so the Base64-encoded text is about one third larger than the original.
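
The padding rule and the size growth can be checked with a little arithmetic. The sketch below assumes two hypothetical helpers, PaddingCount and EncodedLength, that are not in the original code; EncodedLength ignores the \r\n line breaks inserted every 76 characters.

```cpp
#include <cstddef>

// Hypothetical helper: number of '=' signs appended for an input of n bytes.
// 0 when n is a multiple of 3, 2 when one byte is left over, 1 when two are.
std::size_t PaddingCount(std::size_t n)
{
    return (3 - n % 3) % 3;
}

// Hypothetical helper: length of the encoded text, not counting line breaks.
// Every started group of 3 input bytes produces 4 output characters.
std::size_t EncodedLength(std::size_t n)
{
    return (n + 2) / 3 * 4;
}
```

For a 4-byte input this gives two = signs and 8 output characters, which matches the example in the text.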

Detailed encoding steps

1. Split the characters into groups of three: ch1, ch2, ch3
2. Key operations

When the number of bytes is divisible by 3:
ch1: take the high 6 bits directly -----> ch1 >> 2 -----> for example, M (01001101) shifted right by two bits -----> 00010011 -----> decimal 19 -----> Base64 table: T
ch1 has its low 2 bits left over; they are spliced with the high 4 bits of ch2 to form 6 bits -----> ((ch1 << 4) | (ch2 >> 4)) & 0x3F. The bitwise AND with 0x3F clears the top two bits, that is, it sets them to 0.
M (01001101) shifted left by 4 bits -----> 11010000 (low 8 bits shown)
a (01100001) shifted right by 4 bits -----> 00000110
after splicing with | -----> 11010110
after & 0x3F -----> 00010110 -----> decimal 22 -----> Base64 table: W
ch2: its low 4 bits are left over; they are spliced with the high 2 bits of ch3 to form 6 bits: ((ch2 << 2) | (ch3 >> 6)) & 0x3F
a (01100001) shifted left by 2 bits -----> 10000100 (low 8 bits shown)
n (01101110) shifted right by 6 bits -----> 00000001
after splicing with | -----> 10000101
after & 0x3F -----> 00000101 -----> decimal 5 -----> Base64 table: F
ch3: its low 6 bits are left over and form a group of their own, with two 0s in front: ch3 & 0x3F
n (01101110) & 0x3F -----> 00101110 -----> decimal 46 -----> Base64 table: u
The final Base64 encoding is: TWFu
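
The four shift-and-mask expressions from this walkthrough can be tried in isolation. The function below is a minimal sketch; EncodeGroup and kTable are names chosen for illustration, not names from the project.

```cpp
#include <string>

// Hypothetical helper: encode exactly three bytes into four Base64 characters
// using the shift-and-mask expressions from the walkthrough.
std::string EncodeGroup(unsigned char ch1, unsigned char ch2, unsigned char ch3)
{
    static const char kTable[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::string out;
    out += kTable[ch1 >> 2];                          // high 6 bits of ch1
    out += kTable[((ch1 << 4) | (ch2 >> 4)) & 0x3F];  // low 2 of ch1 + high 4 of ch2
    out += kTable[((ch2 << 2) | (ch3 >> 6)) & 0x3F];  // low 4 of ch2 + high 2 of ch3
    out += kTable[ch3 & 0x3F];                        // low 6 bits of ch3
    return out;
}
```

Calling EncodeGroup('M', 'a', 'n') reproduces the TWFu result derived by hand.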

When the number of bytes is not divisible by 3:
Pad with as many zero bytes as are needed. For example, 4 bytes need 2 more bytes to become divisible by 3, so 2 zero bytes are added. There are two cases, one leftover byte and two leftover bytes, and the basic operations are the same as above.
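
The two leftover cases can be sketched on their own. EncodeTail is a hypothetical helper introduced only for this illustration; it expects the one or two bytes that remain after all complete groups of three have been encoded.

```cpp
#include <string>

// Hypothetical helper: encode the 1 or 2 bytes left over at the end of the input.
// Returns an empty string for any other length.
std::string EncodeTail(const std::string& tail)
{
    static const char kTable[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::string out;
    if (tail.size() == 1)
    {
        const unsigned char b0 = static_cast<unsigned char>(tail[0]);
        out += kTable[b0 >> 2];            // high 6 bits
        out += kTable[(b0 & 0x03) << 4];   // low 2 bits followed by four 0 bits
        out += "==";                       // two bytes were missing
    }
    else if (tail.size() == 2)
    {
        const unsigned char b0 = static_cast<unsigned char>(tail[0]);
        const unsigned char b1 = static_cast<unsigned char>(tail[1]);
        out += kTable[b0 >> 2];                         // high 6 bits of b0
        out += kTable[((b0 << 4) | (b1 >> 4)) & 0x3F];  // low 2 of b0 + high 4 of b1
        out += kTable[(b1 & 0x0F) << 2];                // low 4 of b1 followed by two 0 bits
        out += "=";                                     // one byte was missing
    }
    return out;
}
```

A single leftover byte m becomes bQ==, and the two leftover bytes lo become bG8=.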

Encoding module code

// Encoding module
std::string Base64::Encode(const std::string& strData)
{
	std::string strEncode;
	// Base64 encoding table
	const std::string strEncodeTable("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/");
	unsigned char temp[3];
	size_t index = 0;
	size_t lineLength = 0; // number of characters on the current output line
	for (size_t i = 0; i < strData.size() / 3; ++i)
	{
		// Convert the input in groups of 3 bytes
		temp[0] = strData[index++];
		temp[1] = strData[index++];
		temp[2] = strData[index++];

		// Each output character is selected by two leading 0 bits + 6 spliced bits
		// High 6 bits of the first byte
		strEncode += strEncodeTable[temp[0] >> 2];

		// Low 2 bits of the first byte + high 4 bits of the second byte
		strEncode += strEncodeTable[((temp[0] << 4) | (temp[1] >> 4)) & 0x3F];

		// Low 4 bits of the second byte + high 2 bits of the third byte
		strEncode += strEncodeTable[((temp[1] << 2) | (temp[2] >> 6)) & 0x3F];

		// Low 6 bits of the third byte
		strEncode += strEncodeTable[temp[2] & 0x3F];

		lineLength += 4;
		if (lineLength == 76) // a Base64 line holds at most 76 characters
		{
			strEncode += "\r\n";
			lineLength = 0;
		}
	}

	size_t mod = strData.size() % 3; // bytes left over beyond a multiple of 3
	if (mod == 1) // one leftover byte: append two = signs
	{
		temp[0] = strData[index++];
		// High 6 bits of the leftover byte
		strEncode += strEncodeTable[temp[0] >> 2];
		// Low 2 bits of the leftover byte followed by four 0 bits
		strEncode += strEncodeTable[(temp[0] & 0x03) << 4];
		// Finally append two = signs
		strEncode += "==";
	}
	else if (mod == 2) // two leftover bytes: append one = sign
	{
		temp[0] = strData[index++];
		temp[1] = strData[index++];
		// High 6 bits of the first leftover byte
		strEncode += strEncodeTable[temp[0] >> 2];
		// Low 2 bits of the first leftover byte + high 4 bits of the second
		strEncode += strEncodeTable[((temp[0] << 4) | (temp[1] >> 4)) & 0x3F];
		// Low 4 bits of the second leftover byte followed by two 0 bits
		strEncode += strEncodeTable[(temp[1] & 0x0F) << 2];
		// Finally append one = sign
		strEncode += "=";
	}
	return strEncode;
}

1.2 Decoding process

1. Take each character ch of the Base64-encoded text.
2. Look up the index of ch in the encoding table.
3. That index is the 6 bits of original data that ch stands for.

After 4 characters have been parsed, 24 bits have been collected; these 24 bits are then split into 3 bytes. Decoding is simply the reverse of encoding. Taking aGVsbG8= as an example, the characters are decoded four at a time. The decoding of the first four characters is shown in the figure, and the complete result is hello.
[Figure: decoding of the first four characters of aGVsbG8=]

// Fast decoding table
const char DecodeTable[] =
{
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
62, // '+'
0, 0, 0,
63, // '/'
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // '0'-'9'
0, 0, 0, 0, 0, 0, 0,
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 'A'-'Z'
0, 0, 0, 0, 0, 0,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, // 'a'-'z'
};
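
A table like the one above does not have to be typed by hand; it can be derived from the encoding alphabet. The sketch below is an alternative construction, not the author's code: BuildDecodeTable is a hypothetical name, and the table is given 256 entries so that any byte value is a valid index.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: build a reverse lookup table from the Base64 alphabet.
// table[c] is the 6-bit value of character c; all other entries stay 0.
std::array<unsigned char, 256> BuildDecodeTable()
{
    const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::array<unsigned char, 256> table{}; // value-initialized: every entry is 0
    for (std::size_t i = 0; i < alphabet.size(); ++i)
    {
        table[static_cast<unsigned char>(alphabet[i])] = static_cast<unsigned char>(i);
    }
    return table;
}
```

As in the hand-written table, characters outside the alphabet map to 0, so they cannot be told apart from A; a stricter decoder would need a separate marker value for invalid input.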

Detailed decoding steps

For example, decode aGVsbG8=
int nvalue = 0 -------> nvalue: 00000000 00000000 00000000 00000000
Decode four characters at a time, starting with aGVs:
DecodeTable['a'] ----> DecodeTable[97] = 26 --------> 26 (011010)
--------> 26 (011010) is stored in bits 18~23 of the 32-bit int: (26 << 18) | nvalue -------> 00000000 01101000 00000000 00000000
DecodeTable['G'] ----> DecodeTable[71] = 6 --------> 6 (000110)
--------> 6 (000110) is stored in bits 12~17 of the 32-bit int: (6 << 12) | nvalue -------> 00000000 01101000 01100000 00000000
DecodeTable['V'] ----> DecodeTable[86] = 21 --------> 21 (010101)
--------> 21 (010101) is stored in bits 06~11 of the 32-bit int: (21 << 6) | nvalue -------> 00000000 01101000 01100101 01000000
DecodeTable['s'] ----> DecodeTable[115] = 44 --------> 44 (101100)
--------> 44 (101100) is stored in bits 00~05 of the 32-bit int: (44) | nvalue -------> 00000000 01101000 01100101 01101100
Then, starting from bit 23 of nvalue, read eight bits at a time, three bytes in total, to obtain the decoded data: hel
00000000 01101000 01100101 01101100 -------> h (01101000) e (01100101) l (01101100)
bG8= is decoded in the same way, except that decoding stops when '=' is encountered; the decoding result is: lo
The decoding result of aGVsbG8= is: hello
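
The bit placement in this walkthrough can be reproduced with a few lines. DecodeGroup is a hypothetical helper written for this illustration; it takes the four table indices directly, so the lookup step is left out.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: combine four 6-bit table indices into three decoded bytes.
std::string DecodeGroup(std::uint32_t i0, std::uint32_t i1, std::uint32_t i2, std::uint32_t i3)
{
    // Place the four 6-bit values at bits 18~23, 12~17, 06~11 and 00~05.
    const std::uint32_t value = (i0 << 18) | (i1 << 12) | (i2 << 6) | i3;

    // Read the 24 bits back out eight at a time, starting from bit 23.
    std::string out;
    out += static_cast<char>((value >> 16) & 0xFF);
    out += static_cast<char>((value >> 8) & 0xFF);
    out += static_cast<char>(value & 0xFF);
    return out;
}
```

With the indices 26, 6, 21 and 44 from the example (the characters a, G, V and s), the function returns hel.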

Decoding module code

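// Decoding module
std::string Base64::Decode(const std::string& strData)
{
	std::string strDecode;
	// Fast decoding table; entries not listed (indices 123-255) are zero-initialized
	const unsigned char DecodeTable[256] =
	{
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	62, // '+'
	0, 0, 0,
	63, // '/'
	52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // '0'-'9'
	0, 0, 0, 0, 0, 0, 0,
	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
	13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 'A'-'Z'
	0, 0, 0, 0, 0, 0,
	26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
	39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, // 'a'-'z'
	};
	size_t value = 0; // holds the 4 decoded groups of 6 bits (24 bits in total)
	size_t index = 0;
	while (index < strData.size())
	{
		// The encoder puts at most 76 characters on a line, then starts a new line
		if (!(strData[index] == '\r' && strData[index + 1] == '\n')) // still inside a line
		{
			if (strData.size() - index < 4)
			{
				break; // incomplete group: stop instead of reading past the end
			}
			// Cast to unsigned char so bytes >= 128 cannot become negative indices
			// Parse the first encoded character
			value = DecodeTable[(unsigned char)strData[index++]] << 18;
			// Parse the second encoded character
			value = (DecodeTable[(unsigned char)strData[index++]] << 12) | value;
			strDecode += static_cast<char>((value >> 16) & 0xFF); // first decoded byte, bits 16-23

			if (strData[index] != '=')
			{
				// Parse the third encoded character
				value = (DecodeTable[(unsigned char)strData[index++]] << 6) | value;
				strDecode += static_cast<char>((value >> 8) & 0xFF); // second decoded byte, bits 08-15

				if (strData[index] != '=')
				{
					// Parse the fourth encoded character
					value = (DecodeTable[(unsigned char)strData[index++]]) | value;
					strDecode += static_cast<char>(value & 0xFF); // third decoded byte, bits 00-07
				}
				else
				{
					break; // "xxx=": padding reached, decoding is finished
				}
			}
			else
			{
				break; // "xx==": padding reached, decoding is finished
			}
		}
		else
		{
			// Reached the end of this line
			index += 2; // skip \r\n
		}
	}
	return strDecode;
}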

(To be continued)

Origin blog.csdn.net/NanlinW/article/details/113480369