Character Encoding in Software Development, Explained

Foreword 
For software developers, "encoding" is certainly no unfamiliar concept; we run into it all the time. Yet in the course of writing code, "encoding problems" still leave programmers feeling helpless and give them headaches.

An "encoding problem" is not hard to solve, but I suspect many programmers have only a hazy idea of the principles behind it, so let's take a closer look.

Computer encoding
A computer encoding is a scheme for representing characters, such as letters and digits, as the data a computer stores internally.

Why do encodings exist? A computer stores data in electronic components, and because of the limits of industrial technology these components can reliably record only two stable states, "on" and "off", which are written as the digits 1 and 0. In essence, then, a computer can record only two digits, 0 and 1. Each 0 or 1 is called a bit, the smallest unit of data in a computer. A number system that uses only the digits 0 and 1 is called binary.

But obviously we need to record a great many things, and long runs of 0s and 1s are unwieldy to read and write. Grouping three bits into one digit gives octal; grouping four bits into one digit gives hexadecimal.
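As a small illustration of the same value written in binary, octal, and hexadecimal, here is a minimal sketch. The class name RadixDemo and the method render are made up for this example; the conversion methods themselves are standard methods of java.lang.Integer.

```java
class RadixDemo {
    // Returns the given value written in binary, octal, and hexadecimal.
    // "render" is a hypothetical helper name used only in this sketch.
    static String[] render(int value) {
        return new String[] {
            Integer.toBinaryString(value), // one digit per bit
            Integer.toOctalString(value),  // one digit per three bits
            Integer.toHexString(value)     // one digit per four bits
        };
    }

    public static void main(String[] args) {
        String[] forms = render(97);
        System.out.println("binary: " + forms[0]); // binary: 1100001
        System.out.println("octal:  " + forms[1]); // octal:  141
        System.out.println("hex:    " + forms[2]); // hex:    61
    }
}
```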

That takes care of numbers. But what if you want to store the character 'a'? A computer cannot store it directly. To solve this, people came up with a scheme: give every commonly used character a number. For example, 'a' is number 97. When we need to store 'a', we do not store 'a' itself but the number 97, and when 97 is read back it is turned into 'a' again. That solves the problem neatly.

What we usually call a "code" is the number assigned to a character in this way.

The table that maps every character to its number is called a "code table".
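The character-to-number mapping can be observed directly in Java, where a char converts to and from its numeric code. This is only a sketch; the class and method names are hypothetical.

```java
class CodeTableDemo {
    // Looks up the number that stands for a character.
    static int codeOf(char c) {
        return (int) c;
    }

    // Turns a number back into the character it stands for.
    static char charOf(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('a')); // 97
        System.out.println(charOf(97));  // a
    }
}
```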

Common code tables include ASCII, GB2312 (Simplified Chinese), GBK, BIG5 (Traditional Chinese), and UTF-8.

ASCII encoding:
When computers were first created, they were popular mainly in the "Western world", or the "English-speaking countries". Broadly speaking, the written language there amounts to 26 letters plus some symbols. Even counting upper and lower case separately, the total never exceeds 128 characters, so one byte per character is enough. This earliest encoding, which uses one byte to represent one character, is ASCII.

Note 1: A byte is a basic unit made up of 8 bits. Read as a signed value, it covers the range -128 to 127.

Note 2: Character codes are never negative.

Note 3: ASCII does not support Chinese.
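The notes above can be checked with a short sketch. It assumes Java's standard String.getBytes(Charset) behavior, which substitutes the charset's replacement byte (for US-ASCII, the code of '?') for characters it cannot represent. The class and method names are hypothetical, and the Chinese character is written as a Unicode escape so the example does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiDemo {
    // Encodes text with the ASCII code table.
    static byte[] toAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // 'a' fits in ASCII and is stored as one byte holding 97.
        System.out.println(Arrays.toString(toAscii("a")));      // [97]
        // \u4e2d is the Chinese character "zhong"; ASCII has no code for it,
        // so Java stores 63, the code of '?', and the character is lost.
        System.out.println(Arrays.toString(toAscii("\u4e2d"))); // [63]
        // Java's byte type is signed, which gives the range from Note 1.
        System.out.println(Byte.MIN_VALUE + " to " + Byte.MAX_VALUE); // -128 to 127
    }
}
```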

GBK encoding:
Later, as computers became widespread, the whole world needed to store data on them. ASCII alone was clearly not enough: it cannot store characters outside the Latin alphabet, and its rule of one byte per character obviously cannot serve the entire world (there are at least several thousand Chinese characters). So various countries extended ASCII, moving from one byte per character to multiple bytes per character.

For example, China's GB2312 and GBK encodings both use two bytes to represent a Chinese character. Two bytes can represent a large number of values, enough to include almost all Chinese characters.

Note: In all of the encodings discussed here, characters in the range 0-127 are represented exactly as they are in ASCII.
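A minimal sketch of both points, assuming the Java runtime ships the GBK charset (it is part of the extended charsets in a standard JDK, but Charset.forName throws UnsupportedCharsetException if it is missing). The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class GbkDemo {
    // Assumption: the GBK charset is available in this runtime.
    static final Charset GBK = Charset.forName("GBK");

    // Encodes text with the GBK code table.
    static byte[] toGbk(String text) {
        return text.getBytes(GBK);
    }

    public static void main(String[] args) {
        // \u4e2d is the Chinese character "zhong"; GBK stores it in two bytes
        // (0xD6 0xD0, shown here as signed Java bytes).
        System.out.println(Arrays.toString(toGbk("\u4e2d"))); // [-42, -48]
        // Characters in the ASCII range keep their single-byte ASCII code.
        System.out.println(Arrays.toString(toGbk("a")));      // [97]
    }
}
```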

UTF-8 encoding:
Of course, if every country uses its own encoding, exchanging data between countries becomes troublesome. Text that is a compliment on your side might, because of the different encoding, be decoded as an insult on the other side, and that would not do. To solve this problem, an organization called the Unicode Consortium developed a set of encoding rules: Unicode. It supports more than 650 languages and serves as a worldwide character standard.

UTF-8 is an international code table introduced under this rule. It supports Chinese, and a Chinese character generally occupies 3 bytes in it.

Note 1: Unicode itself does not say how characters are stored as bytes; it assigns each character a number (a code point). UTF-8 is a concrete encoding defined under that rule, and it specifies how those numbers are stored as bytes.

Note 2: Not every Chinese character takes three bytes in UTF-8. Most do, but some very rare characters take four bytes, which is the maximum length of a character in UTF-8.
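The byte counts can be checked with a short sketch. The class and method names are hypothetical, and U+20000 is simply one example of a rare character outside the range that fits in three bytes.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthDemo {
    // Returns how many bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // A character in the ASCII range takes one byte.
        System.out.println(utf8Length("a"));      // 1
        // \u4e2d, a common Chinese character, takes three bytes.
        System.out.println(utf8Length("\u4e2d")); // 3
        // U+20000, a rare Chinese character, takes four bytes.
        String rare = new String(Character.toChars(0x20000));
        System.out.println(utf8Length(rare));     // 4
    }
}
```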

 

Encoding problems:
In development, the so-called "encoding problem" is really just Chinese text coming out garbled. Why does this happen?

Those of us in China generally use a Chinese-language operating system, whose default encoding is GBK. Internationally, so that text can be understood everywhere, UTF-8 is generally used (international websites usually use UTF-8).

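In GBK, a Chinese character generally occupies 2 bytes.

In UTF-8, a Chinese character generally occupies 3 bytes.

If UTF-8 bytes are decoded as GBK: every 3 bytes that should form one Chinese character are regrouped 2 at a time, so the wrong characters come out.

If GBK bytes are decoded as UTF-8: the 2-byte sequences are generally not valid UTF-8, so the decoder typically produces replacement characters or other unreadable text.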
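Both directions of the mismatch can be reproduced in a few lines. This is a sketch under the assumption that the runtime provides the GBK charset; the class and method names are hypothetical. The sample text "\u4f60\u597d" is a two-character Chinese greeting, written with Unicode escapes so the example does not depend on the source file's own encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Assumption: the GBK charset is available in this runtime.
    static final Charset GBK = Charset.forName("GBK");

    // Encodes the text as UTF-8, then wrongly decodes those bytes as GBK.
    static String utf8ReadAsGbk(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), GBK);
    }

    // Encodes the text as GBK, then wrongly decodes those bytes as UTF-8.
    static String gbkReadAsUtf8(String text) {
        return new String(text.getBytes(GBK), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4f60\u597d";

        // Six UTF-8 bytes are regrouped in pairs: different characters appear.
        String garbledAsGbk = utf8ReadAsGbk(original);
        System.out.println(garbledAsGbk.equals(original)); // false

        // The GBK bytes are not valid UTF-8, so Java substitutes U+FFFD,
        // the Unicode replacement character.
        String garbledAsUtf8 = gbkReadAsUtf8(original);
        System.out.println(garbledAsUtf8.contains("\uFFFD")); // true
    }
}
```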

 

 

Solving encoding problems:
An "encoding problem" is nothing more than decoding someone else's characters with the wrong encoding. If what we receive is GBK, decode it as GBK; if it is UTF-8, decode it as UTF-8. Doesn't that solve it?

So, if you run into a garbled Chinese string:

1. Break the garbled string back down into bytes, using the encoding it was wrongly decoded with.

2. Reassemble the string with the String constructor String(byte[] bytes, String charsetName), passing the correct encoding.

For example, decoding as UTF-8:
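The two steps can be sketched as follows. This is an illustration, not the author's original listing: the class name, the repair method, and the sample text are hypothetical, and the example assumes the runtime provides the GBK charset. It simulates UTF-8 data that was wrongly decoded as GBK and then recovers it.

```java
import java.io.UnsupportedEncodingException;

class GarbledRepairDemo {
    // Undoes a wrong decoding: turns the garbled string back into bytes with
    // the charset that was wrongly used, then decodes with the right one.
    static String repair(String garbled, String wrongCharset, String rightCharset) {
        try {
            byte[] bytes = garbled.getBytes(wrongCharset); // step 1: back to bytes
            return new String(bytes, rightCharset);        // step 2: reassemble
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset name", e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String original = "\u4f60\u597d"; // a two-character Chinese greeting
        // Simulate the bug: UTF-8 bytes decoded as GBK.
        String garbled = new String(original.getBytes("UTF-8"), "GBK");
        System.out.println(garbled.equals(original));  // false

        String repaired = repair(garbled, "GBK", "UTF-8");
        System.out.println(repaired.equals(original)); // true
    }
}
```

This round trip only works when the wrong decoding kept every byte intact. If the decoder had to substitute replacement characters, as usually happens when GBK bytes are decoded as UTF-8, the original bytes are gone and the data has to be read again from its source with the correct encoding.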

 

 

Note: Chinese text comes out garbled only because the bytes were put together incorrectly when they were decoded. It is like placing building blocks in the wrong positions: the bytes themselves have not changed.

Origin blog.51cto.com/14473726/2440994