Several common encodings

Encoding

Encoding is the process of converting information from one form or format into another.
Decoding is the reverse process of encoding.

Common encodings are: ASCII, GB2312, GBK, Unicode, and UTF-8.

ASCII code

All information in a computer is ultimately stored in binary form, i.e., as the binary digits 0 and 1. Eight bits make up one byte, so a byte can take 2 ** 8 = 256 different states, from 00000000 to 11111111.
ASCII defines codes for 128 characters in total; for example, the capital letter A is 65 (binary 01000001) and the lowercase letter a is 97. These 128 symbols (including 33 control characters that cannot be printed: codes 0-31 and 127) use only the low 7 bits of a byte; the leading bit is uniformly set to 0.
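
The numbers above can be checked with a short Python sketch (variable names are illustrative); `ord` returns the code of a character and `format(..., "08b")` shows its 8-bit pattern:

```python
# One byte is 8 bits, so it can hold 2 ** 8 distinct values.
byte_values = 2 ** 8

# ASCII assigns the numbers 0..127; ord() gives the number for a character.
upper_a = ord("A")
lower_a = ord("a")

# Show the 8-bit pattern: the leading (most significant) bit is 0.
upper_a_bits = format(upper_a, "08b")
lower_a_bits = format(lower_a, "08b")

print(byte_values)            # 256
print(upper_a, upper_a_bits)  # 65 01000001
print(lower_a, lower_a_bits)  # 97 01100001
```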

GB2312

GB2312 (1980) contains 7,445 characters in total: 6,763 Chinese characters and 682 other symbols. In the Chinese character area the high byte ranges from B0 to F7 and the low byte from A1 to FE, which gives 72 * 94 = 6,768 code points; five of them (D7FA-D7FE) are unassigned. Each character is encoded in two bytes. The GB2312-80 encoding is referred to as the GB code (national standard code). In the GB code itself the most significant bit of each byte is 0; the byte ranges quoted above are the commonly used form in which that bit is set to 1, so that the bytes do not collide with ASCII.
GB2312 covers too few Chinese characters. The 1995 Chinese character extension specification GBK 1.0 contains 21,886 symbols, divided into a Chinese character area and a graphic symbol area; the Chinese character area includes 21,003 characters. GBK thus arrived as an extension of GB2312.
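
The code point arithmetic for the Chinese character area can be reproduced as a small Python sketch. It assumes Python's built-in `gb2312` codec and uses "啊" (the first Chinese character in the table, at B0A1) as an illustrative sample:

```python
# Chinese character area: high byte 0xB0..0xF7, low byte 0xA1..0xFE.
high_bytes = 0xF7 - 0xB0 + 1   # 72 possible high bytes
low_bytes = 0xFE - 0xA1 + 1    # 94 possible low bytes
code_points = high_bytes * low_bytes
hanzi_count = code_points - 5  # D7FA-D7FE are unassigned

# The first Chinese character in the table, "啊", sits at B0A1.
first_hanzi = "啊".encode("gb2312")

print(code_points, hanzi_count)  # 6768 6763
print(first_hanzi.hex())         # b0a1
```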

GBK

The full name is "Chinese Internal Code Specification". It is a Chinese character internal code specification drawn up by the State Bureau of Technical Supervision for Windows 95. It was introduced to extend GB2312 with more characters. Its encoding range is 8140 to FEFE (excluding XX7F), a total of 23,940 code points, and it can represent 21,003 Chinese characters. The encoding is compatible with GB2312: text encoded as GB2312 can be decoded as GBK without garbling.
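
A hedged Python sketch of the code point count and of the compatibility claim follows. It relies on Python's built-in `gb2312` and `gbk` codecs; the sample strings are illustrative, and the traditional character "龍" is chosen here as an example of a character that GBK covers but GB2312 does not:

```python
# Lead byte 0x81..0xFE, trail byte 0x40..0xFE with 0x7F excluded.
lead_bytes = 0xFE - 0x81 + 1        # 126
trail_bytes = 0xFE - 0x40 + 1 - 1   # 190 (one removed for 0x7F)
gbk_code_points = lead_bytes * trail_bytes

# Compatibility: bytes produced by GB2312 decode correctly as GBK.
gb2312_bytes = "中文".encode("gb2312")
round_trip = gb2312_bytes.decode("gbk")

# A character outside GB2312 fails there but encodes fine as GBK.
try:
    "龍".encode("gb2312")
    in_gb2312 = True
except UnicodeEncodeError:
    in_gb2312 = False
gbk_bytes = "龍".encode("gbk")

print(gbk_code_points)            # 23940
print(round_trip)                 # 中文
print(in_gb2312, len(gbk_bytes))  # False 2
```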

Unicode

For the Unicode character set, ISO set up the ISO/IEC JTC1/SC2/WG2 working group in April 1984 to create a unified encoding for the scripts and symbols of all countries. In 1991 a group of American companies founded the Unicode Consortium, which in October 1991 reached an agreement with WG2 to use the same code set. The Unicode described here uses a 16-bit coding system, and its content is the same as the BMP (Basic Multilingual Plane) of ISO 10646. Unicode passed the DIS (Draft International Standard) stage in June 1992. Version 2.0, published in 1996, contains 6,811 symbols, 20,902 Chinese characters, 11,172 Korean Hangul characters, a 6,400-position user-defined area, and 20,249 reserved positions, 65,534 in total.
In this 16-bit form every character takes the same amount of space. For example, the English letter "a" and the Chinese character "好" (good) occupy the same space: two bytes each!
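
The total and the two-byte claim can be checked with a short Python sketch. The dictionary keys are illustrative labels for the figures quoted in the text, and `utf-16-be` is used here as a stand-in for the fixed 16-bit form (an assumption for illustration; for BMP characters it yields the same two bytes):

```python
# Character counts quoted for Unicode 2.0 in the text.
counts = {
    "symbols": 6811,
    "chinese_characters": 20902,
    "korean_hangul": 11172,
    "user_defined": 6400,
    "reserved": 20249,
}
total = sum(counts.values())

# In a 16-bit form, "a" and "好" each take exactly two bytes.
a_bytes = "a".encode("utf-16-be")
hao_bytes = "好".encode("utf-16-be")

print(total)                          # 65534
print(len(a_bytes), len(hao_bytes))   # 2 2
print(a_bytes.hex(), hao_bytes.hex()) # 0061 597d
```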

UTF-8

The spread of the Internet created a strong demand for a unified encoding. UTF-8 is the most widely used implementation of Unicode on the Internet. Other implementations include UTF-16 and UTF-32, but they are rarely used on the Internet. To repeat the relationship: UTF-8 is one implementation of Unicode.
The biggest feature of UTF-8 is that it is a variable-length encoding. It uses 1 to 4 bytes to represent a symbol, and the number of bytes varies with the symbol.
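
A minimal Python sketch of the variable length follows; the four sample characters are illustrative choices (not from the original article), picked so that each needs a different number of bytes:

```python
# One sample per byte length; escapes are used to pin the exact code points.
samples = [
    "a",           # U+0061, ASCII letter
    "\u00e9",      # U+00E9, Latin small letter e with acute
    "\u597d",      # U+597D, the Chinese character 好
    "\U0001F600",  # U+1F600, an emoji outside the BMP
]

# Map each character to the number of bytes its UTF-8 encoding needs.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in utf8_lengths.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s)")

print("\u597d".encode("utf-8").hex())  # e5a5bd
```

Running it shows lengths of 1, 2, 3, and 4 bytes for the four samples, in that order.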

Origin blog.csdn.net/qq_45309297/article/details/102992470