File basics

This study note covers my understanding of character encoding in text files and value encoding in binary files. I didn't understand the difference in class, so I looked things up on my own and came away with a basic understanding, hahaha.


Preface

1. To keep data persistent and easy to modify, it is usually stored as files on external storage media such as disks.
2. From simple plain text to complex Word documents, from static images to multimedia video, from desktop databases to complex networked databases, information is stored on disk in the form of files.


1. Classification

Whatever their type, files are stored in memory or on disk as binary data, so at the physical level they are all the same. The classification is therefore based on how the content is encoded at the logical level.

1. Text files

Text files are based on character encoding, such as ASCII or Unicode. A text file stores ordinary strings, which can be displayed and edited directly with a text editor such as Notepad.
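
To make "based on character encoding" concrete, here is a small sketch in Python (my own choice of language, the note itself doesn't use one). It only shows how the same characters turn into different bytes depending on the encoding. The sample strings are just illustrations.

```python
# Encode the same text under different character encodings.
text = "Hi"
ascii_bytes = text.encode("ascii")
print(list(ascii_bytes))  # [72, 105]: one byte per character in ASCII

han = "中"
utf8_bytes = han.encode("utf-8")
gbk_bytes = han.encode("gbk")
print(utf8_bytes.hex(), gbk_bytes.hex())  # e4b8ad d6d0

# Decoding with the matching encoding recovers the original character.
print(utf8_bytes.decode("utf-8") == han)  # True
```

The same character takes three bytes in UTF-8 and two in GBK, which is why a text editor has to know (or guess) the encoding before it can show the text correctly.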

2. Binary files

Binary files are encoded based on values and stored as byte sequences. The length of each encoded value varies with the kind and size of the value, and is usually defined in the relevant fields of the file header. Binary files, such as sound and image files, cannot be meaningfully displayed or edited with a text editor.
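
As a rough sketch of "the length is defined in the file header", the Python below invents a tiny format where the first byte says how many bytes each value takes. The layout and the names `pack_values` and `unpack_values` are made up for illustration and are not a real file format.

```python
import struct

# Hypothetical layout (not a real file format):
#   byte 0   : width of each value in bytes (1, 2 or 4)
#   bytes 1+ : unsigned little-endian integers of that width
FORMATS = {1: "B", 2: "H", 4: "I"}

def pack_values(values, width):
    """Write the width as a one-byte header, then the values."""
    fmt = "<" + FORMATS[width] * len(values)
    return bytes([width]) + struct.pack(fmt, *values)

def unpack_values(data):
    """Read the header first, then use it to split the remaining bytes."""
    width = data[0]
    count = (len(data) - 1) // width
    fmt = "<" + FORMATS[width] * count
    return list(struct.unpack(fmt, data[1:1 + count * width]))

blob = pack_values([1, 258, 65535], width=2)
print(blob.hex())           # 0201000201ffff
print(unpack_values(blob))  # [1, 258, 65535]
```

Without the header byte the reader could not tell whether the six bytes after it are three 2-byte values or six 1-byte values.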

3. The difference between the two (my understanding)

1. For a binary file, the data in memory is saved directly, byte by byte. When it is read back into memory, the bytes alone are not enough: because the value-based encoding mentioned earlier has no fixed length, a specification is needed to tell the program how many bytes are combined to represent each number or letter.
2. For a text file, each character is converted to its character code (for example its ASCII code) and saved byte by byte. When reading, each byte is converted back to a character according to the ASCII table, or the table of whichever encoding is in use.
3. The difference lies in how the program that opens the file interprets its content.
Put simply, if a file can be opened in a text editor and shows content we can read, it can be regarded as a text file (txt, html, and so on). The encoding of such files conforms to some text encoding specification. If opening it shows garbled characters, you can regard it as a binary file. For example, when you open a file with the txt extension, or force a file with another extension open in a text editor, the editor assumes it is a text file and uses the text-file rules to "translate" the byte sequence. If it really is a text file, we can read the result. If not, the result is garbled.
4. Common image formats such as JPEG, as well as audio and video formats, either cannot be opened in a text editor or show up garbled. They are binary files: they are encoded according to their own format specifications, and specific software is needed to process and parse them before they can be displayed.
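
The points above can be tried out with a short Python sketch. It saves the same number once as text and once as a 4-byte integer, then interprets the binary bytes in two ways. The file names and the choice of a little-endian 4-byte layout are my own assumptions for the example.

```python
import os
import struct
import tempfile

number = 123456789

with tempfile.TemporaryDirectory() as folder:
    text_path = os.path.join(folder, "number.txt")
    bin_path = os.path.join(folder, "number.bin")

    # Text file: the digits are stored as characters, one byte each.
    with open(text_path, "w", encoding="ascii") as f:
        f.write(str(number))

    # Binary file: the value is stored as a 4-byte little-endian integer.
    with open(bin_path, "wb") as f:
        f.write(struct.pack("<I", number))

    with open(text_path, "rb") as f:
        text_bytes = f.read()
    with open(bin_path, "rb") as f:
        bin_bytes = f.read()

print(len(text_bytes), text_bytes)      # 9 b'123456789'
print(len(bin_bytes), bin_bytes.hex())  # 4 15cd5b07

# Same four bytes, two interpretations.
as_number = struct.unpack("<I", bin_bytes)[0]  # follows the agreed layout
as_text = bin_bytes.decode("latin-1")          # what a text editor might do
print(as_number)      # 123456789
print(repr(as_text))  # not the digits we saved: this is the "garbled" view
```

The bytes on disk never change. Only the rule used to interpret them does, which is the whole difference between the two kinds of file.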


Summary

We can read the content of a text file because it uses a character encoding such as ASCII, UTF-8, or GBK. The text editor recognizes these encodings and converts the encoded values into characters for display. For a binary file, the text editor does not recognize the file's format and can only decode the bytes blindly as if they were characters, so what you see in the end is a pile of garbled text.
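
One rough way to check this idea in code is to try decoding the bytes and see whether it works. The helper `looks_like_text` below is a name I made up, and the rule it uses (no NUL byte, decodes cleanly) is only a heuristic, not how any particular editor decides.

```python
def looks_like_text(data, encoding="utf-8"):
    """Rough guess: True if the bytes decode cleanly and contain no NUL byte."""
    if b"\x00" in data:
        return False
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True

utf8_sample = "hello, 世界".encode("utf-8")
jpeg_like = b"\xff\xd8\xff\xe0\x00\x10JFIF"  # bytes typical of the start of a JPEG file

print(looks_like_text(utf8_sample))  # True
print(looks_like_text(jpeg_like))    # False

# An editor that pushes on anyway ends up showing replacement characters.
forced = jpeg_like[:4].decode("utf-8", errors="replace")
print(forced)
```

Note that the answer depends on the encoding you try: the two GBK bytes for "中" fail as UTF-8 but pass with `encoding="gbk"`, so garbled output can also mean a text file opened with the wrong encoding.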

Actually, now that I've written it all down, it turns out to be easy to understand!
It's not difficult at all, but yesterday I was really confused, hahaha.

Origin blog.csdn.net/m0_50316716/article/details/109219314