URL percent-encoding

What percent-encoding is

Percent-encoding, also known as URL encoding, is an encoding mechanism used mainly for URIs. Since URIs include both URLs and URNs, it applies to both. It is also used for content of the MIME type "application/x-www-form-urlencoded".
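As an illustration of the "application/x-www-form-urlencoded" case, the following minimal sketch uses Python's standard `urllib.parse` module. The form field names and values are made up for the example. Note that in this form encoding a space is written as `+` rather than `%20`.

```python
from urllib.parse import urlencode, parse_qs

# Illustrative form data, as an HTML form might submit it
form = {"name": "Zhang San", "city": "Beijing & Tianjin"}

# urlencode() builds an application/x-www-form-urlencoded body:
# spaces become '+', and reserved characters such as '&' become %XX
body = urlencode(form)
print(body)             # name=Zhang+San&city=Beijing+%26+Tianjin

# parse_qs() reverses the process; each value comes back as a list
print(parse_qs(body))   # {'name': ['Zhang San'], 'city': ['Beijing & Tianjin']}
```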

URL encoding, also called percent-encoding, is the encoding used for Uniform Resource Locators (URLs). Inside a URL, a common set of characters (digits and letters) can be used directly. Another set of characters with special roles (/, :, @, etc.) can also be used directly for those roles. Every remaining character must be encoded in the %XX form. This has become a standard, and practically every programming language provides it. For example, JavaScript has encodeURI and encodeURIComponent, and PHP has urlencode and urldecode.

The method itself is simple: write a % followed by the two-digit hexadecimal value of each byte. For example, the space character has ASCII code 32, which is 20 in hexadecimal, so the URL-encoded result is %20.
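The byte-to-%XX rule can be sketched in a few lines of Python. The helper name `percent_encode_byte` is hypothetical and used only for illustration. Python's standard `urllib.parse.quote` plays a role similar to the JavaScript and PHP functions named above.

```python
from urllib.parse import quote

def percent_encode_byte(b: int) -> str:
    """Hypothetical helper: '%' followed by the byte as two uppercase hex digits."""
    return f"%{b:02X}"

# A space is ASCII 32, which is 0x20 in hexadecimal
print(ord(" "), hex(ord(" ")))         # 32 0x20
print(percent_encode_byte(ord(" ")))   # %20

# The standard library produces the same result
print(quote(" "))                      # %20
```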

  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |             |           |        |
scheme     authority       path        query  fragment

URI stands for Uniform Resource Identifier. What we usually call a URL is one kind of URI, and the typical format of a URL is shown above. Strictly speaking, the "URL encoding" discussed below refers to URI encoding.
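The five components in the diagram can be pulled apart with Python's `urllib.parse.urlsplit`. This is a small sketch using the example URI from the diagram. Here `netloc` corresponds to the authority.

```python
from urllib.parse import urlsplit

parts = urlsplit("foo://example.com:8042/over/there?name=ferret#nose")

print(parts.scheme)    # foo
print(parts.netloc)    # example.com:8042  (the authority)
print(parts.hostname)  # example.com
print(parts.port)      # 8042
print(parts.path)      # /over/there
print(parts.query)     # name=ferret
print(parts.fragment)  # nose
```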

Why URLs need encoding

Generally, if something needs to be encoded, it is not suitable for transmission as-is. The reasons vary: the data may be too large, or it may contain private information. For URLs, the reason is that some characters would make the URL ambiguous.

For example, URL parameters are passed as key=value pairs, with the pairs separated by &, as in /s?q=abc&ie=utf-8. If a value itself contains & or =, the server will inevitably parse the URL incorrectly. The ambiguous & and = characters must therefore be escaped, that is, encoded.
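The ambiguity can be reproduced with Python's standard `urllib.parse`. This minimal sketch uses a made-up value `a&b=c` that contains both delimiters. `parse_qs` stands in for the server-side parser.

```python
from urllib.parse import quote, parse_qs

value = "a&b=c"  # illustrative parameter value containing '&' and '='

# Naive concatenation: the parser sees three pairs instead of two
naive = "q=" + value + "&ie=utf-8"
print(parse_qs(naive))    # {'q': ['a'], 'b': ['c'], 'ie': ['utf-8']}

# Encoding the value first keeps '&' and '=' out of the query structure
encoded = "q=" + quote(value, safe="") + "&ie=utf-8"
print(encoded)            # q=a%26b%3Dc&ie=utf-8
print(parse_qs(encoded))  # {'q': ['a&b=c'], 'ie': ['utf-8']}
```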

In addition, URL encoding is based on ASCII rather than Unicode. This means a URL cannot directly contain non-ASCII characters such as Chinese. Otherwise, if the client browser and the server use different character sets, the Chinese characters can be garbled.
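In practice, non-ASCII text is first converted to bytes with some character set, and each byte is then percent-encoded. The Python sketch below assumes UTF-8, which is the default of `urllib.parse.quote`. The GBK line shows why the charset matters: the same two characters produce different bytes, so a server expecting the other charset would decode them wrongly.

```python
from urllib.parse import quote, unquote

text = "中文"  # "Chinese": two non-ASCII characters

# Step 1: turn the characters into bytes with an explicit charset (UTF-8 here)
raw = text.encode("utf-8")
print(raw)                                # b'\xe4\xb8\xad\xe6\x96\x87'

# Step 2: write every byte as %XX, so only ASCII remains
print("".join(f"%{b:02X}" for b in raw))  # %E4%B8%AD%E6%96%87
print(quote(text))                        # %E4%B8%AD%E6%96%87

# The same text in another charset yields different bytes
print(quote(text, encoding="gbk"))        # %D6%D0%CE%C4

# Decoding with the matching charset restores the original text
print(unquote("%E4%B8%AD%E6%96%87"))      # 中文
```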

The principle of URL encoding is to represent unsafe characters using safe characters, meaning printable characters that have no special purpose or meaning.
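Because every unsafe byte is replaced by a fixed three-character pattern, the process is reversible. Below is a minimal decoding sketch. The function name `percent_decode` is hypothetical, and the function assumes well-formed input in which every % is followed by two hex digits. It performs plain percent-decoding only and does not turn + into a space the way form decoding does.

```python
def percent_decode(s: str, encoding: str = "utf-8") -> str:
    """Hypothetical decoder: turn each %XX back into its byte, keep other ASCII as-is.

    Assumes well-formed input (every '%' is followed by two hex digits) and that
    all unencoded characters are ASCII.
    """
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] == "%":
            out.append(int(s[i + 1:i + 3], 16))  # two hex digits -> one byte
            i += 3
        else:
            out.extend(s[i].encode("ascii"))      # safe characters stand for themselves
            i += 1
    return out.decode(encoding)

print(percent_decode("a%20b%26c"))           # a b&c
print(percent_decode("%E4%B8%AD%E6%96%87"))  # 中文
```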

Which characters need to be encoded

RFC 3986 specifies that a URL may contain only English letters (a-zA-Z), digits (0-9), the four special characters - _ . ~, and the reserved characters.
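A hand-written encoder makes the rule concrete. In this sketch, letters, digits and - _ . ~ pass through unchanged, and every other byte of the UTF-8 form becomes %XX. The function name `percent_encode` and its `keep` parameter are hypothetical. `keep` lets a caller leave reserved delimiters such as / unencoded when they are used for their reserved purpose. The result should match the standard library's `urllib.parse.quote(s, safe=keep)`.

```python
import string

# Unreserved characters per RFC 3986: letters, digits, and the four marks - _ . ~
UNRESERVED = string.ascii_letters + string.digits + "-_.~"

def percent_encode(text: str, keep: str = "") -> str:
    """Hypothetical encoder: percent-encode `text` as UTF-8, leaving unreserved bytes as-is.

    `keep` lists extra ASCII characters (e.g. reserved delimiters such as '/')
    that the caller wants to leave unencoded.
    """
    safe_bytes = {ord(c) for c in UNRESERVED + keep if ord(c) < 128}
    return "".join(
        chr(b) if b in safe_bytes else f"%{b:02X}"
        for b in text.encode("utf-8")
    )

print(percent_encode("a-b_c.d~e"))        # a-b_c.d~e
print(percent_encode("a b/c"))            # a%20b%2Fc
print(percent_encode("a b/c", keep="/"))  # a%20b/c
```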

Characters with no printable representation in US-ASCII

A URL may contain only printable characters. In US-ASCII, the bytes 0x00-0x1F and 0x7F are all control characters, and these cannot appear directly in a URL. Likewise, bytes 0x80-0xFF (ISO-8859-1) lie outside the range defined by US-ASCII, so they cannot be placed in a URL either.
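Such bytes can still be carried in a URL, but only in %XX form. The sketch below uses Python's `urllib.parse.quote_from_bytes` on a few arbitrary control and high bytes.

```python
from urllib.parse import quote_from_bytes

# Control characters (0x00-0x1F, 0x7F) and bytes 0x80-0xFF have no printable
# US-ASCII form, so they can only appear in a URL as %XX
data = bytes([0x00, 0x0A, 0x1F, 0x7F, 0x80, 0xE9, 0xFF])
print(quote_from_bytes(data))  # %00%0A%1F%7F%80%E9%FF
```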

 

Unsafe characters

Space: while a URL is being transmitted, published by a user, or processed by text-processing programs, insignificant spaces may be introduced, or meaningful spaces may be stripped.
Quotes and <>: quotes and angle brackets are commonly used to delimit URLs in ordinary text.
#: usually used to mark a bookmark or anchor.
%: the percent sign is itself the escape character used when encoding unsafe characters, so it must be encoded as well.
{}|\^[]`~: some gateways or transport agents may tamper with these characters.
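The unsafe characters listed above can be checked against Python's `urllib.parse.quote` with `safe=""`. One caveat: since Python 3.7, `quote` follows RFC 3986 and treats ~ as unreserved, so it leaves ~ unencoded. The sketch therefore handles ~ separately.

```python
from urllib.parse import quote

# Unsafe characters from the list above; '~' is handled separately below
unsafe = ' "<>#%{}|\\^[]`'

encoded = {ch: quote(ch, safe="") for ch in unsafe}
for ch, enc in encoded.items():
    print(repr(ch), enc)    # e.g. ' ' %20, '#' %23, '%' %25

# Python 3.7+ treats '~' as unreserved (RFC 3986), so it is left as-is
print(quote("~", safe=""))  # ~
```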

Note that for legal characters, the encoded and unencoded forms of a URL are equivalent. For the characters listed above, however, leaving them unencoded may change the meaning of the URL. Therefore only plain English letters and digits, the special characters $-_.+!*'() and reserved characters may appear unencoded in a URL (this list follows the older RFC 1738). All other characters must be encoded before they can appear in a URL.
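The equivalence rule can be expressed as a normalization step. In the sketch below, a %XX that stands for an unreserved character is decoded, while an encoded reserved character such as %26 (&) is kept, because decoding it would change how the URL is parsed. The function name `decode_unreserved` is hypothetical. It uses the RFC 3986 unreserved set (letters, digits, - _ . ~) rather than the RFC 1738 list above, and it also upper-cases the hex digits it keeps.

```python
import re
import string

# RFC 3986 unreserved set (assumption: used instead of the RFC 1738 list)
UNRESERVED = set(string.ascii_letters + string.digits + "-_.~")

def decode_unreserved(url: str) -> str:
    """Hypothetical normalizer: decode %XX only when it encodes an unreserved character."""
    def repl(m: re.Match) -> str:
        ch = chr(int(m.group(1), 16))
        # Unreserved: encoded and literal forms are equivalent, so use the literal.
        # Anything else: keep it encoded (hex digits upper-cased).
        return ch if ch in UNRESERVED else m.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, url)

print(decode_unreserved("/%7Euser/a%2db?q=x%26y"))  # /~user/a-b?q=x%26y
```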

Percent-encodings of common characters:

Character   !    *    "    '    (    )    ;    :    @    &
Encoding    %21  %2A  %22  %27  %28  %29  %3B  %3A  %40  %26

Character   =    +    $    ,    /    ?    %    #    [    ]
Encoding    %3D  %2B  %24  %2C  %2F  %3F  %25  %23  %5B  %5D
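Every entry in the table follows the same rule: % plus the character's ASCII code in hexadecimal. The short Python sketch below regenerates the table, which is a quick way to check an entry.

```python
# The characters from the table above, in the same order
chars = "!*\"'();:@&=+$,/?%#[]"

# Each code is '%' followed by the character's ASCII value as two uppercase hex digits
table = {ch: f"%{ord(ch):02X}" for ch in chars}
for ch, code in table.items():
    print(ch, code)
```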

Reference: https://www.cnblogs.com/leaven/archive/2012/07/12/2588746.html

 

 


Origin www.cnblogs.com/niuyaomin/p/11788732.html