URL encoding and parsing

1. What is a URL?

A URL (Uniform Resource Locator) is the standard address of a resource on the Internet. Every file (i.e., resource) on the Internet has a unique URL, which contains information such as where the file is located and how the browser should handle it.

URL standard format
The general format of a URL is:

scheme://host[:port]/path/.../[;url-params][?query-string][#anchor]

scheme: the protocol, such as the familiar http, https and ftp, or less common ones such as ed2k and Thunder's thunder
host: the IP address or domain name of the HTTP server
port: the default port for HTTP is 80 (443 for HTTPS), and in that case the port number can be omitted. Any other port must be specified explicitly; for example, Tomcat listens on port 8080 by default, so its address is http://localhost:8080/
path: the path of the resource on the server
url-params: parameters attached to the path with a semicolon (rarely used)
query-string: the data sent to the HTTP server, as key=value pairs separated by &
anchor: the fragment identifier, used to jump to a position within the page; it is not sent to the server
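To see these parts concretely, the sketch below (not part of the original article) splits a made-up URL with the WHATWG URL class that is built into modern browsers and Node.js. Note that this parser does not separate ;url-params from the path, and that its port property is an empty string when the scheme's default port is used.

```javascript
// Split a made-up URL into the parts listed above with the built-in URL class.
const u = new URL("http://localhost:8080/docs/index.html;v=2?lang=en&page=3#intro");

console.log(u.protocol); // "http:"  (the scheme, with a trailing colon)
console.log(u.hostname); // "localhost"
console.log(u.port);     // "8080"
console.log(u.pathname); // "/docs/index.html;v=2"  (;url-params stay part of the path)
console.log(u.search);   // "?lang=en&page=3"
console.log(u.hash);     // "#intro"
```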

2. URI and URL

Many people confuse these two terms.

URL: short for Uniform Resource Locator.
URI: short for Uniform Resource Identifier.
Relationship:
A URI is the more general concept: a standard string syntax for identifying a resource.
In other words, URI is the parent class and URL is a subclass of it: every URL is a URI, but not every URI is a URL.
The difference is that a URI only identifies a resource, while a URL also describes how to locate and access it, for example through the http:// scheme.

3. URL encoding and decoding

Why encode URLs? In general, when something has to be encoded, it is because it cannot be transmitted safely as it is. For URLs there are two main reasons:

1. Ambiguity: in a URL query string, parameters are passed as key=value pairs separated by &, for example ?postid=5038412&t=1450591802326, and the server parses the parameters by splitting on & and =. If a value itself contains = or &, for example P&G (the abbreviation of Procter & Gamble), the query string becomes ?name=P&G&t=1450591802326. The extra & inside the value makes the server receiving the URL parse the parameters incorrectly, so ambiguous characters such as & and = must be escaped, that is, encoded.
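The P&G problem is easy to reproduce. This sketch is not from the original article; it assumes the built-in URLSearchParams class (browsers and Node.js) as a stand-in for the server's query parser, which splits on & and = in the same way.

```javascript
// Unencoded: the & inside the value is read as a separator.
const raw = new URLSearchParams("name=P&G&t=1450591802326");
console.log(raw.get("name")); // "P"   (the value was cut at the &)
console.log(raw.has("G"));    // true  ("G" became a key of its own)

// Encoded: & becomes %26, so the value survives parsing intact.
const encoded = "name=" + encodeURIComponent("P&G") + "&t=1450591802326";
console.log(encoded);           // "name=P%26G&t=1450591802326"
const fixed = new URLSearchParams(encoded);
console.log(fixed.get("name")); // "P&G"
```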

2. Illegal characters: a URL may only contain ASCII characters, not arbitrary Unicode, so non-ASCII characters such as Chinese cannot be placed in a URL directly. If the client and the server use different character sets, such characters may be garbled. Instead, the text is converted to bytes (UTF-8 in the JavaScript functions below) and each byte is written as a percent sign followed by two hex digits, %XX.
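As a minimal sketch of what percent-encoding does to non-ASCII text (not from the original article; it assumes a runtime with the standard TextEncoder, i.e. a modern browser or Node.js), the code below encodes 北京 by hand from its UTF-8 bytes and compares the result with encodeURIComponent():

```javascript
const city = "北京";
// TextEncoder always produces UTF-8 bytes.
const bytes = new TextEncoder().encode(city);
// Write each byte as % followed by two uppercase hex digits.
const manual = Array.from(bytes, b => "%" + b.toString(16).toUpperCase().padStart(2, "0")).join("");

console.log(manual);                     // "%E5%8C%97%E4%BA%AC"
console.log(encodeURIComponent(city));   // "%E5%8C%97%E4%BA%AC"
console.log(decodeURIComponent(manual)); // "北京"
```

Each Chinese character here takes three UTF-8 bytes, so two characters become six %XX escapes.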

The rest of this section introduces the JavaScript functions for encoding URLs and their decoding counterparts.

encodeURI() and decodeURI()

encodeURI() is the JavaScript function meant for encoding an entire URL:

encodeURI("https://blog.csdn.net/CYL_2021/some other thing")
//https://blog.csdn.net/CYL_2021/some%20other%20thing

As the result shows, spaces are replaced with %20. This function does not encode ASCII letters, digits, or the characters - _ . ~ ! @ # $ & * ( ) = : / , ; ? + '.
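The character list above can be checked directly. This sketch (not from the original article; it only uses the standard global encodeURI()) loops over the printable ASCII punctuation characters and keeps those that encodeURI() returns unchanged:

```javascript
// Collect printable ASCII punctuation that encodeURI() leaves unescaped.
const punct = [];
for (let code = 0x21; code <= 0x7e; code++) {
  const ch = String.fromCharCode(code);
  if (/[A-Za-z0-9]/.test(ch)) continue; // letters and digits are never escaped
  if (encodeURI(ch) === ch) punct.push(ch);
}
console.log(punct.join(" ")); // ! # $ & ' ( ) * + , - . / : ; = ? @ _ ~
```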

decodeURI() decodes it again:

decodeURI("https://blog.csdn.net/CYL_2021/some%20other%20thing")
//https://blog.csdn.net/CYL_2021/some other thing
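One detail the example does not show: decodeURI() is the counterpart of encodeURI(), so it deliberately leaves the escape sequences of reserved characters such as & (%26) and = (%3D) undecoded, while decodeURIComponent() decodes every escape. A small sketch, not from the original article:

```javascript
const s = "a%20b%26c%3Dd";
// decodeURI() decodes %20 but keeps %26 and %3D, since & and = are URL delimiters.
console.log(decodeURI(s));          // "a b%26c%3Dd"
// decodeURIComponent() decodes all escape sequences.
console.log(decodeURIComponent(s)); // "a b&c=d"
```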

encodeURIComponent() and decodeURIComponent()

Now suppose a URL carries another URL as the value of a query parameter:

var URL = "http://www.a.com?foo=http://www.b.com?t=123&s=456";

Calling encodeURI() on it directly does not help. encodeURI() does not escape characters such as :, /, ?, & and =, so the nested URL's own ? and & still reach the server unescaped, which causes exactly the parsing ambiguity described above:

encodeURI(URL)
// "http://www.a.com?foo=http://www.b.com?t=123&s=456"

In this case, use encodeURIComponent(). It is meant for encoding the parameters in a URL; remember to encode each parameter, not the entire URL, because it leaves only ASCII letters, digits and - _ . ! ~ * ' ( ) unescaped.
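The characters that matter here are the ones encodeURI() keeps but encodeURIComponent() escapes. A small sketch, not from the original article, that lists them using only the two standard functions:

```javascript
// Printable ASCII characters left alone by encodeURI() but escaped by encodeURIComponent().
const onlyComponentEscapes = [];
for (let code = 0x21; code <= 0x7e; code++) {
  const ch = String.fromCharCode(code);
  if (encodeURI(ch) === ch && encodeURIComponent(ch) !== ch) onlyComponentEscapes.push(ch);
}
console.log(onlyComponentEscapes.join(" ")); // # $ & + , / : ; = ? @
```

These are the URL delimiters, which is why encodeURIComponent() suits a single parameter value and encodeURI() suits a whole URL.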
Incorrect usage:

var URL = "http://www.a.com?foo=http://www.b.com?t=123&s=456";
encodeURIComponent(URL);
// "http%3A%2F%2Fwww.a.com%3Ffoo%3Dhttp%3A%2F%2Fwww.b.com%3Ft%3D123%26s%3D456"

This is wrong: the colon and slashes of the outer http:// (as well as its ? and =) are encoded too, so the result is no longer a usable URL.
Correct usage: apply encodeURIComponent() to the single parameter value only:

var param = "http://www.b.com?t=123&s=456"; // the parameter to be encoded
URL = "http://www.a.com?foo="+encodeURIComponent(param);
//"http://www.a.com?foo=http%3A%2F%2Fwww.b.com%3Ft%3D123%26s%3D456"
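If you would rather not concatenate strings by hand, the built-in URL class can add the parameter and encode it for you. This sketch is not from the original article and assumes a modern browser or Node.js. Its serializer follows the application/x-www-form-urlencoded rules, so it writes a space as + (encodeURIComponent() writes %20) and also escapes ! ' ( ) ~; it also normalizes http://www.a.com to http://www.a.com/.

```javascript
const param = "http://www.b.com?t=123&s=456";
const url = new URL("http://www.a.com");
url.searchParams.set("foo", param);    // encoded automatically
url.searchParams.set("q", "P&G rocks"); // the space is written as +
console.log(url.href);
// "http://www.a.com/?foo=http%3A%2F%2Fwww.b.com%3Ft%3D123%26s%3D456&q=P%26G+rocks"
```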

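decodeURIComponent() reverses the encoding. (In practice, split the query string first and decode each value separately; decoding the whole URL, as below, brings the ambiguous ? and & back.)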

decodeURIComponent(URL)
//http://www.a.com?foo=http://www.b.com?t=123&s=456
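On the receiving side, the safer order is to split the query string first and decode each value afterwards. The sketch below is not from the original article; it uses the built-in URL class as a stand-in for whatever parser the server actually uses, and compares the two orders:

```javascript
const param = "http://www.b.com?t=123&s=456";
const url = "http://www.a.com?foo=" + encodeURIComponent(param);

// Right: the parser splits the query first; searchParams.get() then decodes the value.
console.log(new URL(url).searchParams.get("foo")); // "http://www.b.com?t=123&s=456"

// Wrong: decoding the whole URL first restores the unescaped ? and &.
const decodedFirst = new URL(decodeURIComponent(url));
console.log(decodedFirst.searchParams.get("foo")); // "http://www.b.com?t=123"
console.log(decodedFirst.searchParams.get("s"));   // "456"
```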

4. Application: parsing URL query parameters into an object

let url = 'http://www.domain.com/?user=anonymous&id=123&id=456&city=%E5%8C%97%E4%BA%AC&enabled';
parseParam(url)
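/* Result
{ user: 'anonymous',
id: [ 123, 456 ], // repeated keys are collected into an array; values that can be converted to numbers become numbers
city: '北京', // Chinese text must be decoded
enabled: true, // a key without a value is treated as true by convention
}
*/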
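function parseParam(url) {
    const match = /\?([^#]*)/.exec(url); // take the string after ? (stopping at any #fragment)
    const paramsObj = {}; // the params are stored in this object
    if (!match) {
        return paramsObj; // no query string: return an empty object
    }
    const paramsArr = match[1].split('&'); // split the string on & into an array
    paramsArr.forEach(param => {
        if (param === '') {
            return; // skip empty pieces, e.g. from "a=1&&b=2" or a trailing &
        }
        if (/=/.test(param)) {
            // handle parameters that have a value
            const i = param.indexOf('='); // split key and value at the first = only
            const key = decodeURIComponent(param.slice(0, i));
            let val = decodeURIComponent(param.slice(i + 1)); // decode
            val = /^\d+$/.test(val) ? parseFloat(val) : val; // convert to a number if it is all digits
            if (Object.prototype.hasOwnProperty.call(paramsObj, key)) {
                // if the object already has the key, add another value
                paramsObj[key] = [].concat(paramsObj[key], val);
            } else {
                // if the object does not have the key yet, create it and set the value
                paramsObj[key] = val;
            }
        } else {
            // handle parameters without a value
            paramsObj[decodeURIComponent(param)] = true;
        }
    });
    return paramsObj;
}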
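An alternative sketch lets the built-in URL / URLSearchParams parser split and decode the query instead of a regular expression. It is not from the original article, and parseParamWithURL is a hypothetical name. It differs from parseParam in a few ways: + is decoded as a space, key= (empty value) is treated like a bare key and becomes true, and new URL() throws a TypeError for relative URLs.

```javascript
// Hypothetical helper: same output shape as parseParam, built on URLSearchParams.
function parseParamWithURL(url) {
  const result = {};
  for (const [key, raw] of new URL(url).searchParams) {
    let val = raw === "" ? true : raw;                                  // bare keys -> true
    if (typeof val === "string" && /^\d+$/.test(val)) val = Number(val); // digit strings -> numbers
    result[key] = Object.prototype.hasOwnProperty.call(result, key)
      ? [].concat(result[key], val)                                     // repeated key -> array
      : val;
  }
  return result;
}

const example = "http://www.domain.com/?user=anonymous&id=123&id=456&city=%E5%8C%97%E4%BA%AC&enabled";
console.log(parseParamWithURL(example));
// { user: 'anonymous', id: [ 123, 456 ], city: '北京', enabled: true }
```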

Reference article: https://www.cnblogs.com/coco1s/p/5038412.html

Origin blog.csdn.net/CYL_2021/article/details/130042085