The x-www-form-urlencoded trap

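Recently, while verifying the signature on a URL, I found that verification sometimes succeeded and sometimes failed. I've always held that "programs are science", so I kept digging for the cause of this seemingly random behavior. At first I suspected the signature-verification code, but after checking it from several angles I was confident it was correct.

Next I looked for a pattern that separated the passing URLs from the failing ones. On closer inspection, the failing URLs contained a plus sign (+) and the passing ones did not. The two sets of URLs of course differed in many other characters as well, but by character class the differences came down to a few special characters such as + and /, and every URL containing a + failed. A quick Google search showed that plenty of people had run into the same thing, and URL-encoding the value made the problem go away. So a problem this simple only needed one encode call.

But why? Why did encoding on my side fix things when the other side never called a matching decode? There had to be an unwritten rule at work. With that question in mind, I searched the HTTP RFCs and eventually ended up in the HTML specification, which is where the trouble turns out to come from. The spec says: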

application/x-www-form-urlencoded  

This is the default content type. Forms submitted with this content type must be encoded as follows:

   1. Control names and values are escaped. Space characters are replaced by `+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').
   2. The control names/values are listed in the order they appear in the document. The name is separated from the value by `=' and name/value pairs are separated from each other by `&'.
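A literal reading of these two rules fits in a few lines. The sketch below uses Python for illustration; `form_urlencode` is a hypothetical helper, not a library function, and encoding non-ASCII text as UTF-8 is an assumption, since the quoted rule only speaks of ASCII codes. Note that a `+` in the input is not alphanumeric, so it is escaped as `%2B`; that is what keeps it distinguishable from an encoded space.

```python
from urllib.parse import parse_qsl


def form_urlencode(pairs):
    """Hypothetical helper: encode (name, value) pairs per the HTML 4.01 rules above."""

    def escape(text: str) -> str:
        # Normalize every line break to CR LF so it is emitted as %0D%0A.
        text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")
        out = []
        for byte in text.encode("utf-8"):  # assumption: UTF-8 for non-ASCII text
            ch = chr(byte)
            if ch == " ":
                out.append("+")  # rule 1: space becomes '+'
            elif ch.isascii() and ch.isalnum():
                out.append(ch)  # ASCII letters and digits pass through
            else:
                out.append(f"%{byte:02X}")  # everything else becomes %HH
        return "".join(out)

    # Rule 2: name=value, pairs joined by '&', in the order given.
    return "&".join(f"{escape(name)}={escape(value)}" for name, value in pairs)


encoded = form_urlencode([("q", "1+1 = 2")])
print(encoded)             # q=1%2B1+%3D+2
print(parse_qsl(encoded))  # [('q', '1+1 = 2')]
```

Because it follows "non-alphanumeric" literally, this sketch also escapes characters such as `'` and `,`, which the book's example below leaves as-is; decoders accept either form. Python's built-in `urllib.parse.quote_plus` is slightly more lenient (it leaves `_.-~` unescaped) but treats space and `+` the same way.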

HTML: The Definitive Guide describes the encoding as follows:

The application/x-www-form-urlencoded encoding

The standard encoding-- application/x-www-form-urlencoded--converts any spaces in the form values to a plus sign (+), nonalphanumeric characters into a percent sign (%) followed by two hexadecimal digits that are the ASCII code of the character, and the line breaks in multiline form data into %0D%0A.

The standard encoding also includes a name for each field in the form. (A "field" is a discrete element in the form, whose value can be nearly anything, from a single number to several lines of text--the user's address, for example.) If there is more than one value in the field, the values are separated by ampersands.

For example, here's what the browser sends to the server after the user fills out a form with two input fields labeled name and address; the former field has just one line of text, while the latter field has several lines of input:

name=O'Reilly+and+Associates&address=103+Morris+Street%0D%0A
Sebastopol,%0D%0ACA+95472

We've broken the value into two lines for clarity in this book, but in reality, the browser sends the data in an unbroken string. The name field is "O'Reilly and Associates" and the value of the address field, complete with embedded newline characters, is:

103 Morris Street
Sebastopol,
CA 95472
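Decoding simply runs the rules in reverse. As a quick check, Python's standard `urllib.parse.parse_qs` (used here only as a convenient form decoder) recovers exactly the values the book describes from its example, rejoined into a single line:

```python
from urllib.parse import parse_qs

# The book's example, rejoined into the single unbroken string the browser sends.
body = ("name=O'Reilly+and+Associates"
        "&address=103+Morris+Street%0D%0ASebastopol,%0D%0ACA+95472")

fields = parse_qs(body)
print(fields["name"][0])           # O'Reilly and Associates
print(repr(fields["address"][0]))  # '103 Morris Street\r\nSebastopol,\r\nCA 95472'
```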

 

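Spaces are encoded as +, so conversely a + is decoded back into a space. The receiving side's form parser typically applies this decoding automatically when it reads the query string, and that is the unwritten rule: a literal + must be sent percent-encoded as %2B. Once it is, the receiver gets the original value without calling any decode function itself.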
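To tie this back to the signature failure, here is a minimal Python sketch. The scheme is hypothetical: the client signs the raw value of a `data` parameter with HMAC-SHA256 and sends the hex signature as `sig`; `SECRET`, `sign` and `server_verify` are invented names, not the original system. The server parses the query with `urllib.parse.parse_qs`, which, like most server-side form parsers, turns `+` into a space.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, quote_plus, unquote_plus

# Form decoding: '+' means space, '%2B' means a literal plus sign.
print(unquote_plus("a+b"))    # a b
print(unquote_plus("a%2Bb"))  # a+b

SECRET = b"hypothetical-shared-secret"  # assumption: HMAC key known to both sides


def sign(value: str) -> str:
    """Hypothetical signer: HMAC-SHA256 over the raw parameter value, hex-encoded."""
    return hmac.new(SECRET, value.encode("utf-8"), hashlib.sha256).hexdigest()


def server_verify(query: str) -> bool:
    """Hypothetical server: form-decode the query, then re-sign and compare."""
    params = parse_qs(query)
    return hmac.compare_digest(sign(params["data"][0]), params["sig"][0])


value = "a+b/c=="  # e.g. a Base64 fragment containing '+'
sig = sign(value)

raw_query = f"data={value}&sig={sig}"                  # '+' sent as-is
encoded_query = f"data={quote_plus(value)}&sig={sig}"  # '+' sent as %2B

print(server_verify(raw_query))      # False: the server signs "a b/c=="
print(server_verify(encoded_query))  # True
```

The fix is the one described above: encode the value (here with `quote_plus`) before it goes into the URL, so the receiver's form decoder restores the original `+`.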

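References:

1: HTML 4.01 Specification, Forms, application/x-www-form-urlencoded: http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1

2: HTML: The Definitive Guide, chapter 10: http://docstore.mik.ua/orelly/web/html/ch10_01.html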


Reposted from dikar.iteye.com/blog/759823