URLDecoder exception: Illegal hex characters in escape (%)

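While doing secondary development on a company system recently, I ran into this error: URLDecoder: Illegal hex characters in escape (%) .... First, the setup. There is a component system that provides services, call it system A, and an externally accessible system, call it system B. The feature I was building calls system A's component service from system B, so that a Word document can be downloaded through the browser that is talking to system B.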

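The exception was thrown when the request was sent. Here is why. Suppose a parameter of the HTTP request (in the query string of a GET URL) carries a string such as "10%是黄段子" that was never encoded. Calling URLDecoder.decode on it on the server side then throws this exception. A URL may only contain English letters, Arabic digits, and certain punctuation marks; other characters and symbols are not allowed, so content that contains Chinese has to be encoded by the sender and decoded by the receiver. Encoded, "10%是黄段子" becomes "10%25%E6%98%AF%E9%BB%84%E6%AE%B5%E5%AD%90". The "%" itself turns into "%25" because "%" is used as the escape character. Here is the implementation of URLDecoder.decode: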

public static String decode(String s, String enc)  throws UnsupportedEncodingException{

    boolean needToChange = false;
    int numChars = s.length();
    StringBuffer sb = new StringBuffer(numChars > 500 ? numChars / 2 : numChars);
    int i = 0;

    if (enc.length() == 0) {
        throw new UnsupportedEncodingException ("URLDecoder: empty string enc parameter");
    }

    char c;
    byte[] bytes = null;
    while (i < numChars) {
        c = s.charAt(i);
        switch (c) {
        case '+':
            sb.append(' ');
            i++;
            needToChange = true;
            break;
        case '%':
            /*
             * Starting with this instance of %, process all
             * consecutive substrings of the form %xy. Each
             * substring %xy will yield a byte. Convert all
             * consecutive  bytes obtained this way to whatever
             * character(s) they represent in the provided
             * encoding.
             */

            try {

                // (numChars-i)/3 is an upper bound for the number
                // of remaining bytes
                if (bytes == null)
                    bytes = new byte[(numChars-i)/3];
                int pos = 0;

                while ( ((i+2) < numChars) &&
                        (c=='%')) {
                    int v = Integer.parseInt(s.substring(i+1,i+3),16);
                    if (v < 0)
                        throw new IllegalArgumentException("URLDecoder: Illegal hex characters in escape (%) pattern - negative value");
                    bytes[pos++] = (byte) v;
                    i+= 3;
                    if (i < numChars)
                        c = s.charAt(i);
                }

                // A trailing, incomplete byte encoding such as
                // "%x" will cause an exception to be thrown

                if ((i < numChars) && (c=='%'))
                    throw new IllegalArgumentException(
                     "URLDecoder: Incomplete trailing escape (%) pattern");

                sb.append(new String(bytes, 0, pos, enc));
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                "URLDecoder: Illegal hex characters in escape (%) pattern - "
                + e.getMessage());
            }
            needToChange = true;
            break;
        default:
            sb.append(c);
            i++;
            break;
        }
    }

    return (needToChange? sb.toString() : s);
}
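The step that fails can be isolated from the listing above. What follows is only a minimal sketch, not JDK code: the class HexPairDemo and the method parsePair are names made up for illustration, and the sample string is written with Unicode escapes so the file compiles under any source encoding.

```java
class HexPairDemo {
    /**
     * Mimics the step in decode() that takes the two characters after a '%'
     * and parses them as one hexadecimal number.
     */
    static String parsePair(String s, int percentIndex) {
        String pair = s.substring(percentIndex + 1, percentIndex + 3);
        try {
            return "byte value " + Integer.parseInt(pair, 16);
        } catch (NumberFormatException e) {
            // decode() rethrows this as IllegalArgumentException.
            return "NumberFormatException";
        }
    }

    public static void main(String[] args) {
        // "%25" is a valid escape: 0x25 is 37, the code of '%'.
        System.out.println(parsePair("10%25", 2));
        // '%' followed by two Chinese characters: not hex digits.
        System.out.println(parsePair("10%\u662F\u9EC4", 2));
    }
}
```

The first call reports byte value 37, the second reports the NumberFormatException.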

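In the sample string, the '%' is followed by the Chinese character '是', while the decoder takes the two characters after each '%' and parses them together as one hexadecimal number. Parsing "是黄" as a hexadecimal number cannot succeed, so Integer.parseInt throws a NumberFormatException, which decode rethrows as the IllegalArgumentException shown in the title.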
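To reproduce the behaviour end to end, a small sketch along the following lines may help. RawPercentDemo, encode, and tryDecode are hypothetical names used only here. The sample value is the string from this article, written with Unicode escapes, and the exact wording after "pattern - " in the exception message can differ between JDK versions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class RawPercentDemo {
    /** Percent-encodes a value as UTF-8, the way the sender should. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Returns the decoded value, or "failed: <message>" if decoding throws. */
    static String tryDecode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (IllegalArgumentException e) {
            return "failed: " + e.getMessage();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "10%" followed by the four Chinese characters of the sample string.
        String raw = "10%\u662F\u9EC4\u6BB5\u5B50";
        String encoded = encode(raw);
        System.out.println(encoded);            // 10%25%E6%98%AF%E9%BB%84%E6%AE%B5%E5%AD%90
        System.out.println(tryDecode(encoded)); // round-trips to the raw string
        System.out.println(tryDecode(raw));     // failed: URLDecoder: Illegal hex characters ...
    }
}
```

Decoding the properly encoded value succeeds, and decoding the raw value is what triggers the exception.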

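Similarly, a '+' in the request string also causes trouble, because the decoder treats '+' as a space. One fix is to replace each stray % with %25, and each + with %2B, before decoding: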

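import java.net.URLDecoder;

public static String replacer(StringBuffer outBuffer) {
  String data = outBuffer.toString();
  try {
     // Escape every '%' that is not followed by two hex digits,
     // so that it decodes to a literal '%'.
     data = data.replaceAll("%(?![0-9a-fA-F]{2})", "%25");
     // Escape '+' so that it is not decoded as a space.
     data = data.replaceAll("\\+", "%2B");
     data = URLDecoder.decode(data, "utf-8");
  } catch (Exception e) {
     // On failure, log the error and return the escaped, undecoded string.
     e.printStackTrace();
  }
  return data;
}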
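The '+' replacement in the method above is easier to see in isolation. This is a rough sketch only: PlusSignDemo, decode, and escapePlus are illustrative names, and the input "1+1=2" is an invented example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class PlusSignDemo {
    /** Decodes as UTF-8, wrapping the checked exception. */
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Replaces each literal '+' with its escape "%2B". */
    static String escapePlus(String s) {
        return s.replace("+", "%2B");
    }

    public static void main(String[] args) {
        System.out.println(decode("1+1=2"));             // 1 1=2
        System.out.println(decode(escapePlus("1+1=2"))); // 1+1=2
    }
}
```

One caveat with this approach: if the sender really did form-encode spaces as '+', escaping every '+' first turns those spaces into literal plus signs, so it only fits input where '+' is meant literally.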

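This uses a special regular-expression construct, the zero-width negative lookahead assertion. Written (?!pattern), it matches a position in the string where the characters that immediately follow do not match pattern. So %(?![0-9a-fA-F]{2}) matches a '%' that is not followed by two hexadecimal digits (0-9, a-f, A-F).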
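A few sample inputs show what the lookahead does and does not match. The sketch below is illustrative: LookaheadDemo and escapeStrayPercent are made-up names, and the inputs are invented test strings.

```java
class LookaheadDemo {
    /** Escapes every '%' that is not followed by two hex digits. */
    static String escapeStrayPercent(String s) {
        return s.replaceAll("%(?![0-9a-fA-F]{2})", "%25");
    }

    public static void main(String[] args) {
        // Valid escapes are left alone.
        System.out.println(escapeStrayPercent("%E6%98%AF")); // %E6%98%AF
        System.out.println(escapeStrayPercent("10%25"));     // 10%25
        // A '%' at the end, or before non-hex characters, is escaped.
        System.out.println(escapeStrayPercent("100%"));      // 100%25
        System.out.println(escapeStrayPercent("%zz"));       // %25zz
        System.out.println(escapeStrayPercent("%4"));        // %254
    }
}
```

The pattern cannot tell a genuine escape from a literal '%' that happens to be followed by two hex digits. An unencoded "10%abc", for instance, is left unchanged and its "%ab" is later decoded as a byte, so this is a workaround rather than a substitute for encoding on the sending side.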

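Finally, this problem only arose because the legacy system requires submitting via GET. Where possible, submitting via an AJAX POST is the better choice.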
