What is eucCN?

2009-05-13 01:56:01

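In FreeBSD 4.11, one detail of the internationalization and localization support is setting the locale. For a Simplified Chinese environment, many references recommend the zh_CN.eucCN locale, but none of them explain why. I was curious, and I wanted to understand what kind of encoding eucCN really is and how the system handles conversion between eucCN and UTF-8.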

First, a few words about EUC

I found a document in English (very few Chinese documents describe EUC):

EUC
EUC stands for Extended Unix Code. It is a multibyte encoding standard developed by AT&T and supported on all System V implementations, used to represent large Asian character sets. There are several variants; two of them are for Chinese.
It defines both a fixed-length and a variable-length encoding. It is an 8-bit coding method.
If codeset 0 is ASCII, then the EUC codeset is ASCII-transparent. Often this is the local version of ASCII. The rules describing a legal EUC codeset are the following:
1) Each character of an EUC multibyte string is chosen from among four distinct multibyte codesets (0,1,2,and 3).
2) Codeset 0 must be a 7bit codeset.
3) No multibyte character of Codeset 1 will use either SS2 or SS3 as its first byte.
4) Characters from codeset 2 will be preceded by the byte SS2.
5) Characters from codeset 3 will be preceded by the byte SS3.
6) For codesets 1, 2, and 3, every byte of every character must have the eighth bit set.
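
The six rules above can be sketched as a small lead-byte classifier. This is only an illustration: the function name euc_codeset is hypothetical, and the values 0x8E for SS2 and 0x8F for SS3 are the usual single-shift byte values, which the quoted text does not state.

```python
SS2 = 0x8E  # single shift 2 (assumed value, not given in the quoted text)
SS3 = 0x8F  # single shift 3 (assumed value, not given in the quoted text)


def euc_codeset(lead_byte: int) -> int:
    """Return the EUC codeset (0-3) selected by the first byte of a character."""
    if not 0 <= lead_byte <= 0xFF:
        raise ValueError("lead_byte must be a single byte value")
    if lead_byte < 0x80:
        return 0  # rule 2: codeset 0 is a 7-bit codeset
    if lead_byte == SS2:
        return 2  # rule 4: codeset 2 characters are preceded by SS2
    if lead_byte == SS3:
        return 3  # rule 5: codeset 3 characters are preceded by SS3
    return 1      # rules 3 and 6: eighth bit set, and neither SS2 nor SS3


for b in (0x41, 0xD6, SS2, SS3):
    print(hex(b), "-> codeset", euc_codeset(b))
```

How many bytes follow the lead byte depends on the particular EUC variant, so the sketch deliberately stops at identifying the codeset.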

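EUC is an extended encoding scheme for Unix environments. It is built on 8-bit bytes, supports both fixed-length and variable-length encodings, and is widely used on Unix for multilingual support. There are currently two codesets for Chinese. When EUC was being promoted, Unix borrowed heavily from the national character sets already used in countries that need multibyte encodings, so the two Chinese variants essentially wrap the existing Simplified and Traditional Chinese character sets. The key point is that EUC is not UTF-8. A Unix system (FreeBSD) has to use its NLS mechanism to convert between the multibyte form and the wide characters used by the system libraries. Strictly speaking, that conversion is between EUC bytes and wide characters (wchar_t), and wide characters are not UTF-8 either; turning EUC text into UTF-8 is a separate transcoding step.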
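
To make the EUC versus UTF-8 distinction concrete, here is a minimal sketch that transcodes between the two. It uses Python's built-in codecs as a stand-in and says nothing about how FreeBSD's C library performs the conversion; the helper names euccn_to_utf8 and utf8_to_euccn are hypothetical.

```python
def euccn_to_utf8(data: bytes) -> bytes:
    """Decode EUC-CN bytes to a Unicode string, then re-encode as UTF-8."""
    text = data.decode("euc-cn")
    return text.encode("utf-8")


def utf8_to_euccn(data: bytes) -> bytes:
    """Decode UTF-8 bytes to a Unicode string, then re-encode as EUC-CN."""
    text = data.decode("utf-8")
    return text.encode("euc-cn")


euc_bytes = b"\xd6\xd0\xce\xc4"        # the two characters "中文" in EUC-CN
utf8_bytes = euccn_to_utf8(euc_bytes)

print(euc_bytes.hex(), "->", utf8_bytes.hex())
print(utf8_to_euccn(utf8_bytes) == euc_bytes)
```

The same two characters take four bytes in EUC-CN and six bytes in UTF-8, which is why text in one encoding cannot simply be read as the other.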

So what is eucCN?

EUC-TW

codeset 0 : ASCII
codeset 1 : CNS 11643-1992 plane 1
codeset 2 : CNS 11643-1992 plane 2 - 16
codeset 3 : [not used]

EUC-CN

codeset 0 : ASCII
codeset 1 : GB 2312-80
codeset 2 : [not used]
codeset 3 : [not used]

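From the listing above, eucCN is simply GB2312. FreeBSD 4.11 has no locale named GB2312; eucCN is the EUC form of GB 2312-80, in which ASCII characters take one byte and every GB2312 character takes two bytes, each with the eighth bit set.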
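
A short sketch of that relationship follows. It assumes the standard mapping in which a GB 2312-80 character at a given row and cell (each 1 to 94) is stored in EUC-CN as the two bytes row + 0xA0 and cell + 0xA0; that detail is not spelled out above, and the function name gb2312_to_euccn is hypothetical. It also checks that Python treats "euc-cn" as an alias of its gb2312 codec.

```python
import codecs


def gb2312_to_euccn(row: int, cell: int) -> bytes:
    """Map a GB 2312-80 row/cell position to its two-byte EUC-CN form."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in the range 1..94")
    # Adding 0xA0 sets the eighth bit on both bytes, as EUC requires
    # for codeset 1.
    return bytes([row + 0xA0, cell + 0xA0])


encoded = "中".encode("gb2312")
row, cell = encoded[0] - 0xA0, encoded[1] - 0xA0

print(encoded.hex(), "row", row, "cell", cell)
print(gb2312_to_euccn(row, cell) == encoded)
print(codecs.lookup("euc-cn").name)
```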

This article comes from a ChinaUnix blog. To read the original, see: http://blog.chinaunix.net/u/12258/showart_64289.html

zh_CN.EUC is one of the names used on UNIX for GB2312.
