URL encoding guidelines

Introduction to URLs

URL stands for Uniform Resource Locator. A URL is a string that identifies a resource on the Internet and tells a client how to reach it.

URLs usually consist of the following components:

  1. Protocol (scheme): Indicates the protocol to be used, such as HTTP, HTTPS, or FTP.
  2. Host: Specifies the name or IP address of the server where the resource is located.
  3. Port: Optional. Specifies the port number on the server that serves the resource. If omitted, the protocol's default port is used (for example, 80 for HTTP and 443 for HTTPS).
  4. Path: Indicates the location of the resource on the server, which can be a file path or folder path.
  5. Query parameters: Optional. Pass additional information to the server that affects how the resource is rendered or processed.
  6. Fragment: Optional. Identifies a specific part of the resource.

Here is a typical URL:
https://www.example.com:8080/myfolder/mypage.html?param1=value1&param2=value2#section2

In the above example, the protocol is HTTPS, the hostname is www.example.com, the port number is 8080, the path is /myfolder/mypage.html, the query parameters are param1=value1 and param2=value2, and the fragment is section2.
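
The breakdown above can be checked with a short sketch using Python's standard urllib.parse module. The URL is the example one, with & separating the two query parameters; the variable names are only illustrative.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.example.com:8080/myfolder/mypage.html?param1=value1&param2=value2#section2"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /myfolder/mypage.html
print(parts.query)     # param1=value1&param2=value2
print(parts.fragment)  # section2

# parse_qs splits the query string into a dict of name -> list of values.
params = parse_qs(parts.query)
print(params)          # {'param1': ['value1'], 'param2': ['value2']}
```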

URLs allow us to easily access and locate various resources on the Internet, such as web pages, images, videos, APIs, etc.

What is URL encoding?

URL encoding is the process of converting special and non-ASCII characters into a specific character encoding for transmission and processing in URLs.

In URL encoding, special characters and non-ASCII characters are converted into a format called percent-encoding. Each encoded byte is written as a percent sign (%) followed by two hexadecimal digits that give the byte's value.

The purpose of URL encoding is to ensure that a URL contains no characters that are disallowed or ambiguous, so that every character is transmitted and parsed correctly. Some characters, such as slashes and question marks, have a structural meaning in URLs, and others, such as spaces, are not allowed at all. To use any of them as ordinary data, they must be URL encoded.

For example, spaces are not allowed in URLs, so spaces need to be encoded as %20. Similarly, other special characters also have corresponding encoding methods, such as slash (%2F), question mark (%3F), equal sign (%3D), plus sign (%2B), etc.
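
As a minimal sketch, the encodings listed above can be reproduced with quote from Python's standard library. Passing safe="" is a choice made for this illustration: by default quote leaves / unencoded, and an empty safe set makes it encode every reserved character.

```python
from urllib.parse import quote, unquote

# safe="" means no reserved character is left unencoded.
print(quote(" ", safe=""))   # %20
print(quote("/", safe=""))   # %2F
print(quote("?", safe=""))   # %3F
print(quote("=", safe=""))   # %3D
print(quote("+", safe=""))   # %2B

# Encoding a whole string and decoding it again returns the original.
encoded = quote("a b/c?d=e+f", safe="")
print(encoded)               # a%20b%2Fc%3Fd%3De%2Bf
print(unquote(encoded))      # a b/c?d=e+f
```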

URL encoding enables URLs to be transmitted and parsed correctly, ensuring reliability and consistency on the Internet. It is widely used in web development, website analysis and other fields.

Why use URL encoding?

The main purpose of URL encoding is to ensure reliability and consistency in transmitting and parsing various characters in URLs. Here are a few main reasons to use URL encoding:

  1. Semantic issues with special characters: Some characters in the URL have special semantic meanings, such as question mark (?), equal sign (=), slash (/), etc. In order to represent these characters as normal characters rather than playing a special role, they need to be URL encoded.

  2. Security: Encoding user-supplied data before placing it in a URL keeps that data from changing the URL's structure, which helps against parameter injection. It is not a complete defense on its own: attacks such as XSS (cross-site scripting) also require validating input and escaping output for the context in which it is rendered.

  3. Conflict of special characters: Some characters in the URL may conflict with the URL structure, especially for user input that contains special characters, such as file names, paths, etc. Through URL encoding, these special characters can be converted into a safe representation to avoid causing conflicts.

  4. Support for non-ASCII characters: URLs are limited to a subset of ASCII, so non-ASCII characters (such as Chinese characters or accented letters) cannot appear in a URL directly. URL encoding converts them into a URL-safe form that uses ASCII characters only.

To sum up, URL encoding is a standardized character conversion method that ensures the accuracy and consistency of transmitting and parsing various characters in URLs, while improving the security and reliability of URL transmission.
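
The structural conflict described in the reasons above can be seen in a small sketch. The value of user_input is a hypothetical piece of user-supplied text containing & and =; urlencode and parse_qs are from Python's standard library.

```python
from urllib.parse import urlencode, parse_qs

user_input = "rock&roll=fun"   # hypothetical user-supplied value

# Naive concatenation: the & and = inside the value are read as delimiters,
# so one parameter is silently split into two.
naive = "q=" + user_input
print(parse_qs(naive))   # {'q': ['rock'], 'roll': ['fun']}

# Encoding the value first keeps it intact through parsing.
encoded_query = urlencode({"q": user_input})
print(encoded_query)             # q=rock%26roll%3Dfun
print(parse_qs(encoded_query))   # {'q': ['rock&roll=fun']}
```

Without encoding, the server receives a different set of parameters than the one the user intended; with encoding, the value survives unchanged.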

Basic rules for URL encoding

The basic rules for URL encoding are as follows:

  1. Letters, numbers, and some special characters such as -_.~ remain unchanged.

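  2. Spaces are encoded as "%20". In query strings submitted as HTML form data (application/x-www-form-urlencoded), a space may also be written as "+".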

  3. Other ASCII characters are encoded as the percent sign (%) followed by the two-digit hexadecimal representation of their ASCII value.

  4. For non-ASCII characters, such as Unicode characters, use "UTF-8" encoding, convert the characters into a sequence of bytes, and then encode the value of each byte as a percent sign (%) plus its hexadecimal value.

  5. Reserved characters such as slash (/, %2F) and question mark (?, %3F) act as delimiters in a URL. When they appear as part of the data, for example inside a path segment or a parameter value, encode them so that they are not mistaken for delimiters.

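  6. The hexadecimal digits in a percent-encoding are case-insensitive: %2f and %2F represent the same byte, although uppercase is the recommended form. The letters of the data itself (A-Z and a-z) are different characters and are left unchanged by encoding.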
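
The rules above can be sketched as a small hand-written encoder. The helper percent_encode and the constant UNRESERVED are hypothetical names introduced for this illustration; in practice the standard library's quote and quote_plus do the same job, and they are used here only to cross-check the result.

```python
from urllib.parse import quote, quote_plus, unquote

# Rule 1: letters, digits, and -_.~ are left unchanged.
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)

def percent_encode(text: str) -> str:
    """Hypothetical helper: encode every byte that is not unreserved."""
    out = []
    # Rule 4: convert to UTF-8 bytes first, then encode byte by byte.
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # Rules 2 and 3: percent sign plus two uppercase hex digits.
            out.append(f"%{byte:02X}")
    return "".join(out)

print(percent_encode("café au lait"))   # caf%C3%A9%20au%20lait
print(percent_encode("你好"))            # %E4%BD%A0%E5%A5%BD
print(percent_encode("a/b?c"))          # a%2Fb%3Fc

# The standard library agrees when no reserved character is marked safe.
print(percent_encode("café au lait") == quote("café au lait", safe=""))  # True

# Form-style encoding writes a space as "+" instead of "%20".
print(quote_plus("a b"))                # a+b

# Rule 6: hex digits decode the same in either case.
print(unquote("%2f") == unquote("%2F"))  # True
```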

Summary

URL encoding converts characters that are reserved, disallowed, or outside ASCII into a percent sign followed by two hexadecimal digits per byte. This prevents data characters from being mistaken for URL delimiters, so URLs that contain such characters are transmitted and parsed correctly and arrive intact.

Origin blog.csdn.net/weixin_44369049/article/details/132378152