Using a Filter to fix garbled request parameters under Tomcat's default encoding configuration

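I recently picked up JSP + servlets again for a project and ran into garbled characters (mojibake). There are many ways to deal with this; this time I decided to use a filter to fix garbled parameter values. The precondition is that the URIEncoding attribute of the connector has not been changed in Tomcat's configuration, i.e. Tomcat is running with its default encoding settings, which for the Tomcat version I am using means the query string is decoded as ISO-8859-1.
In a form, the "method" attribute specifies which HTTP request method is used on submit. The default is GET.
With GET, the submitted parameters are appended to the request URL, and the browser percent-escapes them. So on the server side I need the following code to get the correct parameter value, where "utf-8" is the character encoding of my pages.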

String str = new String(request.getParameter("str").getBytes("iso-8859-1"),"utf-8");
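To see why that one-liner works, here is a minimal standalone sketch that imitates the round trip without a servlet container. The class name GetDecodeDemo and its two helper methods are made up for illustration, and it assumes the browser sent UTF-8 bytes and the container decoded them as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class GetDecodeDemo {
    // Imitates the container: the browser's UTF-8 bytes get decoded as ISO-8859-1.
    public static String decodedByContainer(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // The repair step: recover the raw bytes, then decode them as UTF-8.
    public static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        String garbled = decodedByContainer(original);
        System.out.println(garbled.length());                 // 6, one char per UTF-8 byte
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

ISO-8859-1 maps every byte value to exactly one character, so no information is lost in the wrong decoding step, which is why the bytes can be recovered afterwards.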

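With POST, the submitted parameters are placed in the request body. In that case it is enough to set the character encoding on the request, as long as this is done before any parameter is read.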

request.setCharacterEncoding(encode);

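OK, here is my example of using a filter to fix garbled parameter values.
First, write a class that extends HttpServletRequestWrapper to extend the request object that the servlet API provides.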

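import java.io.UnsupportedEncodingException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;

// Wraps the original request so that parameters are re-decoded when they are read.
public class Request extends HttpServletRequestWrapper {
	private final String encode;

	public Request(HttpServletRequest request, String encode) {
		super(request);
		this.encode = encode;
	}

	// Recover the raw bytes that Tomcat decoded as ISO-8859-1, then decode them with the target charset.
	public String toChi(String para) {
		if (para == null) {
			return null; // the parameter was not sent
		}
		try {
			return new String(para.getBytes("iso-8859-1"), encode);
		} catch (UnsupportedEncodingException ex) {
			return para; // unknown charset name: fall back to the value as received
		}
	}

	private HttpServletRequest getHttpServletRequest() {
		return (HttpServletRequest) super.getRequest();
	}

	@Override
	public String getParameter(String name) {
		return toChi(getHttpServletRequest().getParameter(name));
	}

	@Override
	public String[] getParameterValues(String name) {
		String[] values = getHttpServletRequest().getParameterValues(name);
		if (values == null) {
			return null;
		}
		// Decode into a copy so the container's array is never modified
		// and a repeated call cannot decode the same values twice.
		String[] decoded = new String[values.length];
		for (int i = 0; i < values.length; i++) {
			decoded[i] = toChi(values[i]);
		}
		return decoded;
	}
}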

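The class above really does just one thing: before getParameter() or getParameterValues() hands a parameter back to us, it applies the first operation described earlier, i.e. it takes the ISO-8859-1 bytes of the value and decodes them again with our page encoding.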

Next, write a filter.

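import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

public class EncodeFilter implements Filter {
	// Target charset name, read from the "encode" init-param in web.xml.
	private String encode;

	public void init(FilterConfig filterConfig) throws ServletException {
		encode = filterConfig.getInitParameter("encode");
	}

	public void doFilter(ServletRequest request, ServletResponse response,
			FilterChain chain) throws IOException, ServletException {
		HttpServletRequest httpreq = (HttpServletRequest) request;
		if ("POST".equalsIgnoreCase(httpreq.getMethod())) {
			// POST: parameters are in the body, so setting the encoding is enough.
			request.setCharacterEncoding(encode);
		} else {
			// GET and others: wrap the request so parameters are re-decoded on read.
			request = new Request(httpreq, encode);
		}
		chain.doFilter(request, response);
	}

	public void destroy() {
	}
}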

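Here the filter takes one of two actions. If the request is a POST, it calls request.setCharacterEncoding(encode). If the request is a GET, it replaces the original request with our extended Request, so that parameters are re-decoded before their values are returned.
Next, the filter has to be configured in web.xml.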

	<filter>
		<filter-name>encodeFilter</filter-name>
		<filter-class>com.ieread.search.filter.EncodeFilter</filter-class>
		<init-param>
			<param-name>encode</param-name>
			<param-value>utf-8</param-value>
		</init-param>
	</filter>
	<filter-mapping>
		<filter-name>encodeFilter</filter-name>
		<url-pattern>/*</url-pattern>
	</filter-mapping>

The filter takes an init parameter named encode, which specifies the character encoding we want to use.

--------------------------------------------There is a follow-up----------------------------------------

I thought the problem was fully solved at this point. Then I found another one. How to describe it? Let me give an example.

http://www.baidu.com/s?wd=
This is a Baidu search URL. Whatever you type after wd, Baidu decodes it correctly.
http://localhost:8015/search/search?userLevel=41001&keyword=
This is mine. Whatever I type after keyword is decoded correctly when the request comes from Chrome, but not from IE or Firefox.

I struggled with it for a whole afternoon and went back and forth with a friend. We finally found the cause.

Quote:

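When the browser sends a request, whether or not each byte has been written in %xx form is not what causes the garbling.
The problem is this: if the text was not first encoded to bytes with a specific charset and those bytes then turned into %xx form, does the browser send a Chinese character as 2 bytes of GBK or as 3 bytes of UTF-8 when it issues the request?
Browsers do not agree on this.
Once the server receives the bytes, if the charset it decodes with differs from the one the client used, the result is garbled.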
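The 2-byte versus 3-byte difference is easy to check by hand. The sketch below is only an illustration: the class name PercentEncodeDemo and the escape helper are made up, it uses the JDK's URLEncoder to stand in for what a browser might put on the wire, and it assumes the GBK charset is available in the running JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PercentEncodeDemo {
    // Percent-escapes text using the named charset, as a browser would for a query string.
    public static String escape(String text, String charset) {
        try {
            return URLEncoder.encode(text, charset);
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalArgumentException("charset not available: " + charset, ex);
        }
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // a single Chinese character
        System.out.println(escape(zhong, "GBK"));   // %D6%D0
        System.out.println(escape(zhong, "UTF-8")); // %E4%B8%AD
    }
}
```

The server only ever sees the escaped bytes, so nothing in the request itself says which of the two forms it received.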

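Good, now that the cause is known, the cure can match it. If the backend can tell which encoding a parameter was sent in before decoding it, it can decode it correctly. Below, a regular expression is used to decide which encoding the parameter is in: it checks whether the value, still in its ISO-8859-1 form, has the byte layout of valid UTF-8.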

Pattern p = Pattern.compile("^(?:[\\x00-\\x7f]|[\\xfc-\\xff]" +
		"[\\x80-\\xbf]{5}|[\\xf8-\\xfb][\\x80-\\xbf]{4}|" +
		"[\\xf0-\\xf7][\\x80-\\xbf]{3}|[\\xe0-\\xef][\\x80-\\xbf]" +
		"{2}|[\\xc0-\\xdf][\\x80-\\xbf])+$");
Matcher ma = p.matcher(para);
if(!ma.find()){
	encode = "gbk";
}else{
	encode = "utf-8";
}

Before decoding the string, run this regular expression check first to decide the value of encode.
OK, problem solved~
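To show how the check could slot into the decoding step, here is a minimal standalone sketch. The class name EncodingGuess and its methods are made up for illustration, the regular expression is the one above, and asContainerSeesIt only imitates a container that decodes the raw bytes as ISO-8859-1. It also assumes the GBK charset is available in the running JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class EncodingGuess {
    // Same byte-layout check as above: does the value look like well-formed UTF-8?
    private static final Pattern UTF8_SHAPE = Pattern.compile(
            "^(?:[\\x00-\\x7f]|[\\xfc-\\xff][\\x80-\\xbf]{5}|[\\xf8-\\xfb][\\x80-\\xbf]{4}|"
            + "[\\xf0-\\xf7][\\x80-\\xbf]{3}|[\\xe0-\\xef][\\x80-\\xbf]{2}|"
            + "[\\xc0-\\xdf][\\x80-\\xbf])+$");

    // Guesses the charset of a value that was decoded as ISO-8859-1.
    public static String guess(String para) {
        return UTF8_SHAPE.matcher(para).matches() ? "utf-8" : "gbk";
    }

    // Recovers the raw bytes and decodes them with the guessed charset.
    public static String decode(String para) {
        if (para == null) {
            return null;
        }
        byte[] raw = para.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, Charset.forName(guess(para)));
    }

    // Imitates the container: bytes sent in the given charset, decoded as ISO-8859-1.
    public static String asContainerSeesIt(String text, String charset) {
        return new String(text.getBytes(Charset.forName(charset)), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";
        System.out.println(decode(asContainerSeesIt(original, "UTF-8")).equals(original)); // true
        System.out.println(decode(asContainerSeesIt(original, "GBK")).equals(original));   // true
    }
}
```

Keep in mind this is a heuristic: a GBK byte sequence can happen to have a valid UTF-8 layout, in which case the guess would be wrong, so it is worth testing with the kind of input you actually expect.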

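That is my own example of using a filter to fix garbled parameter values under Tomcat's default encoding configuration. If anything here is wrong or misleading, please point it out in the comments.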
