Fixing Garbled Chinese Text Caused by ftp4j

Source: http://www.vktone.com/articles/ftp4j_chaos.html

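ftp4j is a Java FTP client library that implements most of the features an FTP client needs. It can transfer files (upload and download), browse directories and files on a remote FTP server, and create, delete, rename, and move remote directories and files. ftp4j can connect to the remote FTP server in several ways: directly over TCP/IP, through an FTP proxy, HTTP proxy, SOCKS4/4a proxy, or SOCKS5 proxy, or over an SSL-secured connection.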

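The admin back end of this site's blog system uses ftp4j as the component that uploads the blog to the hosting space. When I first started using it, I found that static pages uploaded to the space came out garbled in the browser (painful to look at) and displayed correctly only after I manually switched the browser's encoding to UTF-8, even though all the pages are actually GBK-encoded.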

The beginning of the home page's HTML looks like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh-CN" lang="zh-CN">
<head>
    <title>移动互联网技术及应用 - V客小站 - V客小站</title>

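Here is the local file viewed with the Linux hexdump tool. Note the bytes of the title text at offsets 0xd3 through 0xfa (highlighted in yellow in the original post):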

[root@localhost blog_wwwroot]# hexdump -C index.html | head -100
00000000  3c 21 44 4f 43 54 59 50  45 20 68 74 6d 6c 20 50  |<!DOCTYPE html P|
00000010  55 42 4c 49 43 20 22 2d  2f 2f 57 33 43 2f 2f 44  |UBLIC "-//W3C//D|
00000020  54 44 20 58 48 54 4d 4c  20 31 2e 30 20 54 72 61  |TD XHTML 1.0 Tra|
00000030  6e 73 69 74 69 6f 6e 61  6c 2f 2f 45 4e 22 20 22  |nsitional//EN" "|
00000040  68 74 74 70 3a 2f 2f 77  77 77 2e 77 33 2e 6f 72  |http://www.w3.or|
00000050  67 2f 54 52 2f 78 68 74  6d 6c 31 2f 44 54 44 2f  |g/TR/xhtml1/DTD/|
00000060  78 68 74 6d 6c 31 2d 74  72 61 6e 73 69 74 69 6f  |xhtml1-transitio|
00000070  6e 61 6c 2e 64 74 64 22  3e 0a 3c 68 74 6d 6c 20  |nal.dtd">.<html |
00000080  78 6d 6c 6e 73 3d 22 68  74 74 70 3a 2f 2f 77 77  |xmlns="http://ww|
00000090  77 2e 77 33 2e 6f 72 67  2f 31 39 39 39 2f 78 68  |w.w3.org/1999/xh|
000000a0  74 6d 6c 22 20 78 6d 6c  3a 6c 61 6e 67 3d 22 7a  |tml" xml:lang="z|
000000b0  68 2d 43 4e 22 20 6c 61  6e 67 3d 22 7a 68 2d 43  |h-CN" lang="zh-C|
000000c0  4e 22 3e 0a 3c 68 65 61  64 3e 0a 09 3c 74 69 74  |N">.<head>..<tit|
000000d0  6c 65 3e d2 c6 b6 af bb  a5 c1 aa cd f8 bc bc ca  |le>.............|
000000e0  f5 bc b0 d3 a6 d3 c3 20  2d 20 56 bf cd d0 a1 d5  |....... - V.....|
000000f0  be 20 2d 20 56 bf cd d0  a1 d5 be 3c 2f 74 69 74  |. - V......</tit|
00000100  6c 65 3e 0a 09 3c 6d 65  74 61 20 68 74 74 70 2d  |le>..<meta http-|

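Below is the content of the file downloaded from the hosting space. The corresponding title bytes, starting at offset 0xd3 (highlighted in yellow in the original post), are clearly different: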

[root@localhost ~]# curl -s http://www.vktone.com/ | hexdump -C index.html | head -100
00000000  3c 21 44 4f 43 54 59 50  45 20 68 74 6d 6c 20 50  |<!DOCTYPE html P|
00000010  55 42 4c 49 43 20 22 2d  2f 2f 57 33 43 2f 2f 44  |UBLIC "-//W3C//D|
00000020  54 44 20 58 48 54 4d 4c  20 31 2e 30 20 54 72 61  |TD XHTML 1.0 Tra|
00000030  6e 73 69 74 69 6f 6e 61  6c 2f 2f 45 4e 22 20 22  |nsitional//EN" "|
00000040  68 74 74 70 3a 2f 2f 77  77 77 2e 77 33 2e 6f 72  |http://www.w3.or|
00000050  67 2f 54 52 2f 78 68 74  6d 6c 31 2f 44 54 44 2f  |g/TR/xhtml1/DTD/|
00000060  78 68 74 6d 6c 31 2d 74  72 61 6e 73 69 74 69 6f  |xhtml1-transitio|
00000070  6e 61 6c 2e 64 74 64 22  3e 0a 3c 68 74 6d 6c 20  |nal.dtd">.<html |
00000080  78 6d 6c 6e 73 3d 22 68  74 74 70 3a 2f 2f 77 77  |xmlns="http://ww|
00000090  77 2e 77 33 2e 6f 72 67  2f 31 39 39 39 2f 78 68  |w.w3.org/1999/xh|
000000a0  74 6d 6c 22 20 78 6d 6c  3a 6c 61 6e 67 3d 22 7a  |tml" xml:lang="z|
000000b0  68 2d 43 4e 22 20 6c 61  6e 67 3d 22 7a 68 2d 43  |h-CN" lang="zh-C|
000000c0  4e 22 3e 0a 3c 68 65 61  64 3e 0a 09 3c 74 69 74  |N">.<head>..<tit|
000000d0  6c 65 3e e7 bb 89 e8 af  b2 e5 a7 a9 e6 b5 9c e6  |le>.............|
000000e0  8e 95 e4 bb 88 e7 bc 83  e6 88 9e e5 a6 a7 e9 8f  |................|
000000f0  88 ee 88 9a e5 bc b7 e6  90 b4 e6 97 82 e6 95 a4  |................|
00000100  20 2d 20 56 e7 80 b9 e3  88 a0 e7 9a ac e7 bb 94  | - V............|
00000110  ef bf bd 20 2d 20 56 e7  80 b9 e3 88 a0 e7 9a ac  |... - V.........|
00000120  e7 bb 94 ef bf bd 2f 74  69 74 6c 65 3e 0a 09 3c  |....../title>..<|
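To make the difference between the two dumps concrete, here is a minimal Java sketch that works on the first two characters of the title only. It shows one conversion chain that reproduces the bytes in the second dump: the text is encoded as UTF-8, those bytes are misread as GBK, and the result is written out as UTF-8 again. The class and method names are hypothetical, and the sketch does not establish which side (client or server) performed which step; it only shows that the observed bytes are consistent with a charset conversion rather than a byte-for-byte copy.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ftp4j.
final class CharsetRoundTrip {
    // GBK ships with standard JDK builds; forName throws if it is unavailable.
    static final Charset GBK = Charset.forName("GBK");

    /** Formats bytes like the hex column of hexdump -C: two hex digits, space separated. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** The bytes as they are stored in the local GBK file. */
    static byte[] gbkBytes(String text) {
        return text.getBytes(GBK);
    }

    /** The bytes a correct GBK-to-UTF-8 conversion would produce. */
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** UTF-8 bytes misread as GBK, then written out as UTF-8 again. */
    static byte[] doubleConverted(String text) {
        String misread = new String(utf8Bytes(text), GBK);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The first two title characters, written as escapes so the
        // source file's own encoding does not matter.
        String text = "\u79fb\u52a8";
        System.out.println(hex(gbkBytes(text)));        // d2 c6 b6 af (local dump, offset 0xd3)
        System.out.println(hex(utf8Bytes(text)));       // e7 a7 bb e5 8a a8
        System.out.println(hex(doubleConverted(text))); // e7 bb 89 e8 af b2 e5 a7 a9 (downloaded dump, offset 0xd3)
    }
}
```

The first line of output matches the local dump and the third matches the downloaded dump, so the uploaded file went through at least one charset conversion on the way.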

Thinking it over: my local files are GBK-encoded, yet the browser has to be switched to UTF-8 to display the files on the hosting space correctly. So ftp4j must have performed a charset conversion while uploading (or it treated the files as UTF-8 by default). To keep ftp4j from converting automatically, the files should be sent in binary mode, which only requires setting the transfer type before uploading:

client.setType(FTPClient.TYPE_BINARY);
// upload() takes a java.io.File, so wrap the file name
client.upload(new java.io.File(filename));

The following text is excerpted from the official ftp4j documentation and explains how ftp4j handles textual and binary transfers:

http://www.sauronsoftware.it/projects/ftp4j/manual.php?PHPSESSID=l7o2bb276feu51v4p8cih0sq81#16

Another data transfer key concept concerns the binary and the textual types. When a transfer is binary the file is treated as a binary stream, and it is stored by the target machine as it is received from the source. A textual data transfer, instead, treats the transferred file as a character stream, performing charset transformation.

Suppose your client is running on a Windows platform, while the server runs on UNIX, whose default charsets are usually different. The client send a file to the server selecting textual type. The client assumes that the file is encoded with the machine standard charset, so it decodes every character and encodes it in an intermediate charset before sending. The server receives the stream, decode the intermediate charset and encodes the file with its machine default charset before storing. Bytes has been changed, but contents are the same.

You can choose your transfer type calling:

client.setType(FTPClient.TYPE_TEXTUAL);
client.setType(FTPClient.TYPE_BINARY);
client.setType(FTPClient.TYPE_AUTO);

The TYPE_AUTO constant, which is also the default one, let the client pick the type automatically: a textual transfer will be performed if the extension of the file is between the ones the client recognizes as textual type markers. File extensions are sniffed through a FTPTextualExtensionRecognizer (it.sauronsoftware.ftp4j.FTPTextualExtensionRecognizer) instance. The default extension recognizer, which is an instance of it.sauronsoftware.ftp4j.recognizers.DefaultTextualExtensionRecognizer, recognizes these extensions as textual ones:

abc acgi aip asm asp c c cc cc com conf cpp
csh css cxx def el etx f f f77 f90 f90 flx
for for g h h hh hh hlb htc htm html htmls
htt htx idc jav jav java java js ksh list
log lsp lst lsx m m mar mcf p pas php pl pl
pm py rexx rt rt rtf rtx s scm scm sdml sgm
sgm sgml sgml sh shtml shtml spc ssi talk
tcl tcsh text tsv txt uil uni unis uri uris
uu uue vcs wml wmls wsc xml zsh

You can build your own recognizer implementing the FTPTextualExtensionRecognizer interface, but maybe you'll like more to instance the convenience class ParametricTextualExtensionRecognizer (it.sauronsoftware.ftp4j.recognizers.ParametricTextualExtensionRecognizer). Anyway, don't forget to plug your recognizer in the client:

client.setTextualExtensionRecognizer(myRecognizer);
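The excerpt also explains why the problem appeared without any explicit configuration: with the default TYPE_AUTO, a file whose extension is in the list above (html among them) is sent as text. The following Java sketch illustrates that kind of extension-based decision using only the standard library. It is not ftp4j's implementation; the class name, the shortened extension set, and the lowercase normalization are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of an extension-based textual/binary decision.
final class ExtensionTypeSketch {
    // A few entries copied from the documentation's list, not the full set.
    static final Set<String> TEXTUAL =
            new HashSet<>(Arrays.asList("htm", "html", "css", "js", "txt", "xml"));

    /** Returns true if the file name ends in one of the extensions treated as textual. */
    static boolean looksTextual(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all
        }
        // Lowercasing is an assumption of this sketch, not documented ftp4j behavior.
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TEXTUAL.contains(ext);
    }

    public static void main(String[] args) {
        System.out.println(looksTextual("index.html")); // true: would be sent as text
        System.out.println(looksTextual("logo.png"));   // false: would be sent as binary
    }
}
```

Under this kind of rule, index.html falls on the textual side, which matches the conversion seen in the dumps; calling client.setType(FTPClient.TYPE_BINARY) bypasses the decision entirely.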


Reposted from coding1688.iteye.com/blog/1711572