file.encoding in Java

While running a Java program on Windows Server 2008, I found that its default character set was "Cp1252":

/* java.net. */ Socket Sock = ...;
InputStreamReader is = new InputStreamReader(Sock.getInputStream());
System.out.println("Character encoding = " + is.getEncoding());
// Prints "Character encoding = Cp1252"
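The reader above is created without an explicit charset, so it falls back to the JVM default. A minimal sketch of how to confirm this is shown here; the class name DefaultEncodingCheck is hypothetical and a ByteArrayInputStream stands in for the socket stream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original program.
class DefaultEncodingCheck {

    /** Encoding name reported by a reader created without an explicit charset. */
    static String readerEncoding() {
        InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(new byte[0]));
        return reader.getEncoding();
    }

    public static void main(String[] args) {
        // All three values describe the same JVM-wide default.
        System.out.println("file.encoding property   = " + System.getProperty("file.encoding"));
        System.out.println("Charset.defaultCharset() = " + Charset.defaultCharset());
        System.out.println("reader.getEncoding()     = " + readerEncoding());
    }
}
```

getEncoding() returns the historical name of the charset (for example Cp1252 for windows-1252), so the strings may differ even though they name the same charset.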

The chcp command, however, reports that the active code page of the system is 936:

C:\>chcp
Active code page: 936

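The application receives the byte 0x81, whose meaning depends on the code page: in code page 850, for example, it is ü, and in code page 936 it is the lead byte of a two-byte character. The default character set of the Java program, however, is Cp1252, in which 0x81 is not assigned to any character.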
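To make the difference concrete, the following sketch decodes the same byte under several charsets and prints the resulting code points. The class Byte81Demo and its helpers are hypothetical, and it assumes the JDK ships the extended charsets IBM850 and GBK, which a trimmed-down runtime may not.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; assumes IBM850 and GBK are available in the runtime.
class Byte81Demo {

    /** Decodes the bytes with the named charset. */
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    /** Formats a string as a space-separated list of code points, e.g. "U+00FC". */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] single = {(byte) 0x81};
        byte[] pair = {(byte) 0x81, (byte) 0x40};

        // Code page 850: 0x81 is u with diaeresis.
        System.out.println("IBM850       0x81      -> " + codePoints(decode(single, "IBM850"))); // U+00FC
        // Code page 936 (GBK): 0x81 only makes sense together with a trail byte.
        System.out.println("GBK          0x81 0x40 -> " + codePoints(decode(pair, "GBK")));      // U+4E02
        // Cp1252 has no character at 0x81; print whatever the decoder substitutes.
        System.out.println("windows-1252 0x81      -> " + codePoints(decode(single, "windows-1252")));
    }
}
```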

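As a temporary workaround, the character set can be set in the Java startup arguments (Cp850 in this example; for code page 936 the Java charset name is GBK):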

java.exe -Dfile.encoding=Cp850 ...
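The startup flag changes the default for the whole JVM. A narrower alternative, sketched below, is to pass the charset to the InputStreamReader directly so that decoding no longer depends on how the process was started. The class ExplicitCharsetReader is hypothetical, GBK is assumed to be the encoding the peer actually sends, and a ByteArrayInputStream stands in for the socket stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper class; GBK is an assumption about the data on the wire.
class ExplicitCharsetReader {

    /** Reads the first line from the stream, decoding with the given charset. */
    static String readFirstLine(InputStream in, Charset charset) {
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two GBK-encoded characters (D6 D0, CE C4) followed by a newline.
        byte[] data = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4, '\n'};
        String line = readFirstLine(new ByteArrayInputStream(data), Charset.forName("GBK"));
        System.out.println("Decoded " + line.length() + " characters: " + line);
    }
}
```

With the original program, the equivalent change would be to add the charset as the second argument of the InputStreamReader constructor.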
But why does the default character set in Java differ from the active code page in Windows? Windows defines its code pages as follows:
Based on the usage, the codepage supported in Windows can be categorized in the following:

    ANSI codepage

    ANSI codepages are codepages for which non-ASCII values (values greater than 127) represent international characters.

    Windows codepages are also sometimes referred to as active codepages or system active codepages. Windows always has one currently active Windows codepage. All ANSI Windows functions use the currently active codepage.

    The usual ANSI codepage ID for US English is codepage 1252.

    Windows codepage 1252, the codepage commonly used for English and other Western European languages, was based on an American National Standards Institute (ANSI) draft. That draft eventually became ISO 8859-1, but Windows codepage 1252 was implemented before the standard became final, and is not exactly the same as ISO 8859-1.

    OEM codepage

    Original equipment manufacturer (OEM) codepages are codepages for which non-ASCII values represent line drawing and punctuation characters. These codepages are still used for console applications. They are also used for the non-extended file names in the FAT12, FAT16, and FAT32 file systems. The usual OEM codepage ID for US English is codepage 437.

    Extended codepage

    These codepages cannot be used as ANSI codepages, or OEM codepages. Windows can support conversions between Unicode and these codepages. These codepages are generally used for information exchange purpose with international/national standard or legacy systems. Examples are UTF-8, UTF-7, EBCDIC, and Macintosh codepages.
 
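Could it be that Java reads the ANSI codepage rather than the OEM codepage?
When the same Java program is started from cmd, the encoding is cp936. When it is started as a Windows service, the encoding is Cp1252. This suggests that console applications use the OEM codepage, while Windows services use the ANSI codepage.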
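One way to compare the two launch modes is to run the same small diagnostic from cmd and from the service and compare the output. The sketch below is only an illustration: the class EncodingReport is hypothetical, and apart from file.encoding the property names are assumptions about the JDK in use (sun.jnu.encoding and sun.stdout.encoding are internal, native.encoding exists only on newer JDKs), so any of them may print null.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic class for comparing launch environments.
class EncodingReport {

    /** Collects the default charset and encoding-related system properties. */
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("defaultCharset", Charset.defaultCharset().name());
        String[] keys = {"file.encoding", "sun.jnu.encoding", "native.encoding", "sun.stdout.encoding"};
        for (String key : keys) {
            // String.valueOf turns a missing property into the text "null".
            report.put(key, String.valueOf(System.getProperty(key)));
        }
        return report;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : collect().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

When running as a service there is no console, so the output would need to be redirected to a log file.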
Where is the ANSI codepage configured?
Control Panel > Region and Language > Administrative > Language for non-Unicode programs
 
Registry:
HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\Nls\CodePage\ACP

References:

http://blog.csdn.net/is2120/article/details/26708895

http://stackoverflow.com/questions/1336930/how-do-you-specify-a-java-file-encoding-value-consistent-with-the-underlying-wind

http://stackoverflow.com/questions/1826771/encoding-cp1252

https://en.wikipedia.org/wiki/Windows_code_page#ANSI_code_page

http://www.360doc.com/content/11/0316/14/5482098_101636415.shtml

Reposted from weifly.iteye.com/blog/2299196