[Java-based character problem] The difference and connection between UTF-8 and GBK

Basic concepts:

1 Character set
A character set is a collection of characters, each assigned a numeric code. Unicode, GBK, and GB2312 are all character sets.

2 Encoding
An encoding is the way the characters of a character set are represented as bytes. For example, the Unicode character set can be encoded as UTF-8, UTF-16, or UTF-32.
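
To make the distinction concrete, the following minimal sketch encodes a single Unicode character, U+4E2D, in three encodings and prints the resulting bytes as hex. The class name EncodingLengths and the helper toHex are hypothetical names chosen for illustration; the big-endian variants are used so that no byte order mark is added.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingLengths {
    // Returns the bytes of `text` in the given encoding as upper-case hex pairs.
    static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One Unicode code point, U+4E2D, written as an escape so the
        // result does not depend on the encoding of this source file.
        String zhong = "\u4e2d";

        // Same character, three different byte representations.
        System.out.println("UTF-8:    " + toHex(zhong, StandardCharsets.UTF_8));       // E4B8AD
        System.out.println("UTF-16BE: " + toHex(zhong, StandardCharsets.UTF_16BE));    // 4E2D
        System.out.println("UTF-32BE: " + toHex(zhong, Charset.forName("UTF-32BE")));  // 00004E2D
    }
}
```

The code point stays the same in all three cases; only the byte representation changes.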

The core problem of converting between GBK and UTF-8
GBK is a character set and UTF-8 is an encoding, so the GBK/UTF-8 conversion problem we usually study is really a conversion between the GBK and Unicode character sets. Because the codes that GBK and Unicode assign to a character (Chinese characters are the main concern here) have no necessary relationship to each other, the conversion between them is usually implemented with a lookup table. Once the GBK code has been converted to a Unicode code point, the remaining work is to express that code point in UTF-8.
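
In Java, the two steps described above map onto the String type, which holds Unicode text. The sketch below is one possible illustration, not the only approach: it decodes GBK bytes into a String (the table lookup happens inside the JDK's GBK charset) and then encodes that String as UTF-8. The class and method names are hypothetical, and it assumes the runtime provides the GBK charset, which standard JDK distributions normally do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GbkToUtf8 {
    // Assumption: the JDK in use ships the GBK charset; otherwise
    // Charset.forName throws UnsupportedCharsetException.
    private static final Charset GBK = Charset.forName("GBK");

    // Step 1: GBK bytes -> Unicode (a Java String).
    // Step 2: Unicode -> UTF-8 bytes.
    static byte[] gbkToUtf8(byte[] gbkBytes) {
        String unicodeText = new String(gbkBytes, GBK);
        return unicodeText.getBytes(StandardCharsets.UTF_8);
    }

    // The reverse direction: UTF-8 bytes -> Unicode -> GBK bytes.
    static byte[] utf8ToGbk(byte[] utf8Bytes) {
        String unicodeText = new String(utf8Bytes, StandardCharsets.UTF_8);
        return unicodeText.getBytes(GBK);
    }

    public static void main(String[] args) {
        // 0xD6 0xD0 is the GBK code for the character U+4E2D.
        byte[] gbk = {(byte) 0xD6, (byte) 0xD0};
        byte[] utf8 = gbkToUtf8(gbk);

        StringBuilder hex = new StringBuilder();
        for (byte b : utf8) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // E4 B8 AD
    }
}
```

Note that the two-byte GBK code D6 D0 and the code point U+4E2D share no arithmetic relationship, which is why a lookup table is needed for the first step.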

With the encoding question settled, the next step is how to perform the conversion. On Linux, the iconv() function is available. What about Windows? There are several options, such as the Windows API or IBM's ICU4C. The author recommends using the iconv library on Windows as well: compared with the Windows API, iconv is easily portable across platforms, and compared with ICU4C, it is much smaller.

Here are some sources for the iconv library on Windows:
(1) A precompiled library for MinGW, which can be used directly.
      http://sourceforge.net/projects/mingw/files/MinGW/Base/libiconv/libiconv-1.14-2/

(2) The GNU source code, which must be compiled into a dynamic or static library.
      http://www.gnu.org/software/libiconv/
      The process of compiling it with MinGW and MSYS on Windows is given below.
      (a) Install the autoconf tool.
      (b) Execute the following commands in sequence (this builds a static library):
            ./configure --prefix=/home --enable-static --disable-shared
            make
            make install
          After the build, the compiled output appears under the /home directory.
