Design and implementation of the Zhuanzhuan short link platform

1 Background introduction

Zhuanzhuan is China's leading second-hand trading platform. Links, as an important medium for users to interact and transmit information on the platform, play an indispensable role.

Traditional long links usually contain many characters and special symbols, which makes them hard to remember and share. Their length also limits their use in text messages, QR codes, and posts on social platforms.

2 Working principle

2.1 Short link generation and storage

After receiving a long link from the business party, the short link platform first computes a hash (MD5) of the long link and uses it to check whether a short link mapping already exists. If it does, the existing short link is returned. If not, the platform generates a unique ID (number segment mode) and converts it into a short link with the short link generation algorithm (Base62). The mapping between the generated short link and the original long link must be persisted so that the original long link can be found quickly when a user visits the short link.
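The generation flow described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the platform's actual code: the class name `ShortLinkService`, the two maps standing in for the persistent store, the `AtomicLong` standing in for the number segment allocator, and the `https://s.example/` prefix are all assumptions made for the example.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: in-memory stand-ins replace the real database and ID allocator.
class ShortLinkService {
    private static final String ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    private static final String PREFIX = "https://s.example/"; // assumed short domain

    private final Map<String, String> codeByMd5 = new HashMap<>();     // MD5(long URL) -> code
    private final Map<String, String> longUrlByCode = new HashMap<>(); // code -> long URL
    private final AtomicLong nextId = new AtomicLong(1);               // stand-in for segment mode

    synchronized String shorten(String longUrl) {
        String md5 = md5Hex(longUrl);
        String existing = codeByMd5.get(md5);
        if (existing != null) {
            return PREFIX + existing;                   // mapping already exists: return it
        }
        String code = encode(nextId.getAndIncrement()); // unique ID -> Base62 code
        codeByMd5.put(md5, code);                       // "persist" both directions
        longUrlByCode.put(code, longUrl);
        return PREFIX + code;
    }

    // Returns the original long URL, or null when no mapping exists.
    synchronized String resolve(String code) {
        return longUrlByCode.get(code);
    }

    static String encode(long num) {
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(ALPHABET.charAt((int) (num % 62)));
            num /= 62;
        } while (num != 0);
        return sb.reverse().toString();
    }

    static String md5Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, d));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is available on every standard JDK
        }
    }
}
```

Calling `shorten` twice with the same long link returns the same short link, while a different long link receives the next ID. A production version would also compare the stored long link on an MD5 match to rule out a hash collision.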

2.2 Short link return and propagation

Once the short link is successfully generated, the short link platform returns it to the business party. Business parties can distribute short links to users in a variety of ways, such as embedding them in web pages, sending text messages, and sharing them on social media. After obtaining the short link, a user can click it to access the corresponding resource.

2.3 User clicks and jumps

When a user clicks on a short link, the browser sends a request to the short link platform. The short link platform looks up the mapping for the short link and then redirects the user to the business system behind the original long link. This step requires efficient data retrieval and an efficient redirect mechanism.

HTTP status codes 301 and 302 both represent redirects, but with different trade-offs. A 301 (permanent) redirect is cached by the browser, so repeat visits bypass the short link platform and visit statistics become inaccurate. A 302 (temporary) redirect sends every visit through the short link platform, which keeps statistics accurate but increases the load on the service.
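As a rough illustration of the jump step, the sketch below looks up a short code and builds a redirect response. It assumes that accurate visit statistics matter more than the extra load and therefore answers with 302; the text above does not state which status code the platform uses. The `RedirectHandler` and `Response` types and the map-based lookup are hypothetical.

```java
import java.util.Map;

// Hypothetical sketch of the jump step: look up the short code and build a redirect.
class RedirectHandler {
    static final int FOUND = 302;     // temporary redirect: every click reaches the platform
    static final int NOT_FOUND = 404; // no mapping for this short code

    // Minimal response holder: status code plus the Location header value (or null).
    static final class Response {
        final int status;
        final String location;

        Response(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    private final Map<String, String> longUrlByCode; // stand-in for the cache/database lookup

    RedirectHandler(Map<String, String> longUrlByCode) {
        this.longUrlByCode = longUrlByCode;
    }

    Response handle(String shortCode) {
        String longUrl = longUrlByCode.get(shortCode);
        if (longUrl == null) {
            return new Response(NOT_FOUND, null);
        }
        // A 301 here would let browsers cache the jump and skip the platform on later visits.
        return new Response(FOUND, longUrl);
    }
}
```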

3 Core algorithms

The conversion of long links to short links is the core function of the short link platform, which requires an efficient and unique algorithm to ensure that each long link can be mapped to a corresponding short link.

3.1 Hash algorithm

3.1.1 MD5

MD5 is a widely used hash algorithm that converts input data into a 128-bit hash value, which can be used to generate the basic hash value of short links in short link platforms.

3.1.2 SHA-256

SHA-256 is a more secure hash algorithm that generates a 256-bit hash value. Although SHA-256 is more secure than MD5, it is also longer, affecting the length of the short link.
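To make the length difference concrete, here is a small sketch that computes both digests with the JDK's `MessageDigest`. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestLengths {
    // Returns the lowercase hex digest of the input using the named JDK algorithm.
    static String hexDigest(String algorithm, String input) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm)
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff)); // mask to treat the byte as unsigned
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

`hexDigest("MD5", url)` yields 32 hex characters (128 bits) and `hexDigest("SHA-256", url)` yields 64 (256 bits). Either is far longer than a typical short code, which is one reason a raw hash is not used as the short link itself.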

3.2 Distributed ID

When hash results are used directly as short links, hash collisions and link length both become problems. A short link platform therefore needs a way to avoid collisions, such as basing each short link on a unique identifier.

3.2.1 Global increment

A global auto-increment ID is a common way to generate distributed unique IDs: a single auto-increment counter issues every ID. Examples include MySQL's auto-increment primary key and Redis's INCR command. This method is simple and efficient and suits many scenarios.

3.2.2 Number segment mode

The number segment mode allocates a different range of numbers (a segment) to each node. Each node generates unique IDs locally from its segment and requests a new segment once the current one is used up, which ensures global uniqueness.
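A minimal sketch of this idea follows. The `Dispatcher` class stands in for whatever central component hands out segments (for example, a database row updated atomically); its name, the segment size, and the starting value are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of number segment mode; names and segment size are assumptions.
class SegmentIdGenerator {
    // Stand-in for the central dispatcher shared by all nodes.
    static final class Dispatcher {
        private final AtomicLong nextStart = new AtomicLong(1);
        private final int step;

        Dispatcher(int step) {
            this.step = step;
        }

        // Hands out the half-open range [start, start + step) and returns its start.
        long allocateSegmentStart() {
            return nextStart.getAndAdd(step);
        }

        int step() {
            return step;
        }
    }

    private final Dispatcher dispatcher;
    private long current; // next ID to hand out
    private long end;     // exclusive upper bound of the local segment

    SegmentIdGenerator(Dispatcher dispatcher) {
        this.dispatcher = dispatcher; // current == end == 0 means "no segment yet"
    }

    synchronized long nextId() {
        if (current >= end) {
            // Local segment exhausted (or never fetched): ask the dispatcher for a new one.
            current = dispatcher.allocateSegmentStart();
            end = current + dispatcher.step();
        }
        return current++;
    }
}
```

With a segment size of 3, a first node receives 1 to 3 and a second node 4 to 6; when the first node runs out it receives 7 to 9. IDs stay globally unique, although they are no longer strictly ordered across nodes.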

3.2.3 SnowFlake

SnowFlake (the Snowflake algorithm) is a commonly used distributed unique ID generation algorithm. It splits a large integer ID into multiple parts, including a timestamp, a machine ID, a data center ID, and a sequence number, which ensures that generated IDs are unique and increase over time.

However, although the Snowflake algorithm works well for generating unique IDs in a distributed environment, it is vulnerable to clock rollback. If a machine's clock moves backwards, newly generated IDs may be smaller than earlier ones or may even duplicate them.
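The sketch below shows one common way to lay out such an ID and to detect clock rollback. The 41/5/5/12-bit split is the widely used layout rather than anything stated in this article, and the epoch, class name, and injectable clock are assumptions made so that the example is self-contained.

```java
import java.util.function.LongSupplier;

// Sketch of the common 41/5/5/12-bit layout; the epoch and names are assumptions.
class SnowflakeIdGenerator {
    private static final long EPOCH = 1577836800000L; // 2020-01-01T00:00:00Z, chosen arbitrarily
    private static final int SEQUENCE_BITS = 12;
    private static final int WORKER_BITS = 5;
    private static final int DATACENTER_BITS = 5;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long datacenterId;
    private final long workerId;
    private final LongSupplier clock; // injectable so that rollback can be simulated
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long datacenterId, long workerId, LongSupplier clock) {
        if (datacenterId < 0 || datacenterId >= (1L << DATACENTER_BITS)
                || workerId < 0 || workerId >= (1L << WORKER_BITS)) {
            throw new IllegalArgumentException("datacenterId and workerId must fit in 5 bits");
        }
        this.datacenterId = datacenterId;
        this.workerId = workerId;
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            // Clock moved backwards: refuse rather than risk duplicate or out-of-order IDs.
            throw new IllegalStateException(
                    "clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted within this millisecond: spin until the next one.
                while ((now = clock.getAsLong()) <= lastTimestamp) {
                    // busy-wait
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << (SEQUENCE_BITS + WORKER_BITS + DATACENTER_BITS))
                | (datacenterId << (SEQUENCE_BITS + WORKER_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

In normal use the clock would be `System::currentTimeMillis`. Throwing on rollback is only one possible policy; waiting until the clock catches up is another.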

3.3 Base62 encoding

Base62 encoding converts data into a string that contains only digits and letters. It uses 62 characters (0-9, a-z, A-Z), so its output can be used directly in short URLs, file names, and similar contexts. Compared with other encodings such as Base64 or hexadecimal, Base62 offers better readability and stability: it is more compact than hexadecimal and, unlike standard Base64, contains no symbols such as + and / that need escaping in a URL.

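public class Base62Encoder {

    // The position of each character is its digit value: 0-9, then A-Z, then a-z.
    private static final String BASE62_CHARACTERS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    public static String encode(long num) {
        // A negative input would produce a negative remainder and an invalid index.
        if (num < 0) {
            throw new IllegalArgumentException("num must be non-negative");
        }
        StringBuilder sb = new StringBuilder();
        do {
            int remainder = (int) (num % 62);
            sb.append(BASE62_CHARACTERS.charAt(remainder));
            num /= 62;
        } while (num != 0);
        // Digits were appended least significant first, so reverse them.
        return sb.reverse().toString();
    }
}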

A Base62 code of only 6 characters can represent about 56.8 billion (62 to the 6th power) numbers.
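The capacity figure can be checked directly, and the same alphabet gives the inverse mapping from a short code back to its numeric ID. The article does not show a decoder, so the `Base62Decoder` class below is a hypothetical companion to the encoder above, using the same character order.

```java
// Hypothetical companion to the encoder: same alphabet, inverse direction.
class Base62Decoder {
    private static final String ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Interprets the code as a base-62 number, most significant digit first.
    static long decode(String code) {
        if (code.isEmpty()) {
            throw new IllegalArgumentException("code must not be empty");
        }
        long value = 0;
        for (int i = 0; i < code.length(); i++) {
            int digit = ALPHABET.indexOf(code.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("not a Base62 character: " + code.charAt(i));
            }
            // The *Exact methods throw ArithmeticException instead of silently overflowing.
            value = Math.addExact(Math.multiplyExact(value, 62L), digit);
        }
        return value;
    }

    // Number of distinct IDs that fit in at most `length` characters: 62^length.
    static long capacity(int length) {
        long n = 1;
        for (int i = 0; i < length; i++) {
            n = Math.multiplyExact(n, 62L);
        }
        return n;
    }
}
```

`capacity(6)` evaluates to 56,800,235,584, which matches the figure of roughly 56.8 billion.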

4 Security and protection

In the design and implementation process of the Zhuanzhuan short-link platform, ensuring the security of user data and the stability of the platform is the top priority. To this end, we have adopted a series of security and protection strategies to deal with potential risks and threats, and to ensure user privacy and the normal operation of the system.

4.1 Long link legality verification

Before a short link is generated, the original long link provided by the user must be verified to ensure that it points to a legitimate and trustworthy target resource.

Legitimacy verification usually covers the following aspects:

  1. Legitimacy of the main domain name: The platform first parses the original long link and extracts its domain name. The domain name is then compared against a predefined list of legitimate domain names to confirm that the link points to an expected domain. This effectively blocks malicious links and links that point to unsafe websites.
  2. Legitimacy of query parameter domain names: Domain names that appear in the link's query parameters can also affect user security. The platform verifies that these domain names are legitimate as well, to avoid potential security risks.
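The two checks above might look roughly like this. The sketch is built on assumptions: the allowlist contents, exact host matching (no subdomain wildcards), and the rule that any query value starting with `http://` or `https://` is treated as an embedded link are all illustrative choices, not the platform's documented policy.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Set;

// Hypothetical allowlist validator for long links and the links embedded in their query strings.
class LinkValidator {
    private final Set<String> allowedHosts; // lowercase host names considered legitimate

    LinkValidator(Set<String> allowedHosts) {
        this.allowedHosts = allowedHosts;
    }

    boolean isLegitimate(String longUrl) {
        URI uri = parse(longUrl);
        if (uri == null || !isAllowedHost(uri)) {
            return false; // check 1: main domain name
        }
        String query = uri.getRawQuery();
        if (query == null) {
            return true;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String value;
            try {
                value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            } catch (IllegalArgumentException e) {
                return false; // malformed percent-encoding
            }
            if (value.startsWith("http://") || value.startsWith("https://")) {
                URI nested = parse(value);
                if (nested == null || !isAllowedHost(nested)) {
                    return false; // check 2: domain name inside a query parameter
                }
            }
        }
        return true;
    }

    private boolean isAllowedHost(URI uri) {
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return false;
        }
        boolean web = scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https");
        return web && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
    }

    private static URI parse(String s) {
        try {
            return new URI(s);
        } catch (URISyntaxException e) {
            return null;
        }
    }
}
```

With an allowlist containing only `m.example.com`, a link to that host passes, while the same link carrying `?target=https%3A%2F%2Fevil.example.net%2F` is rejected because of the embedded domain.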

4.2 Protection against repeated short link generation

Protection against repeated generation is an important part of short link platform design. It prevents the resource waste and confusion that would result from generating multiple short links for the same long link.

The short link platform can adopt an idempotent design based on the MD5 value of the long link to ensure that the processing results of multiple identical requests are consistent and no additional short links are generated.

4.3 Short link validity verification

After a user clicks or enters a short link, the short link platform needs to quickly and accurately determine whether the link is valid, thereby deciding whether to redirect the user to the original long link or provide corresponding error information.

The short link platform will verify the validity of the short link by querying the database. If the short link has a valid mapping relationship, the platform will confirm that the link is valid, otherwise it will determine that the link is invalid.

5 System performance optimization

Optimizing system performance is the key to keeping the Zhuanzhuan short link platform efficient and stable. By adopting a series of strategies and technologies, we continuously improve the platform's response speed, concurrent processing capability, and resource utilization to meet user needs and provide an excellent user experience.

5.1 Database index

The database is the core storage component of the short link platform, so optimizing its design and access patterns is very important. The unique ID of each link serves as the primary key, and the MD5 value of the long link has an ordinary (secondary) index. The primary key supports fast validity checks and redirects, and the MD5 index supports the check for an existing mapping described in section 2.1.

5.2 Caching applications

The use of caching technology can significantly reduce the number of database accesses, thus improving the response speed of the system. We use distributed cache Redis to asynchronously store short link mapping relationships in the cache to reduce the pressure on the database. In this way, link mapping information can be quickly obtained under high concurrency conditions and the efficiency of user access can be improved.
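A rough sketch of this read path follows, with in-memory maps standing in for Redis and the database. The class name, the injected `Executor` used for the asynchronous cache write, and the read counter are assumptions made to keep the example self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: maps stand in for Redis and the database.
class CachedLinkStore {
    private final Map<String, String> cache = new ConcurrentHashMap<>();    // stand-in for Redis
    private final Map<String, String> database = new ConcurrentHashMap<>(); // stand-in for the DB
    private final Executor cacheWriter;                                     // runs cache writes
    final AtomicInteger databaseReads = new AtomicInteger();                // only for illustration

    CachedLinkStore(Executor cacheWriter) {
        this.cacheWriter = cacheWriter;
    }

    void save(String code, String longUrl) {
        database.put(code, longUrl);                         // durable write first
        cacheWriter.execute(() -> cache.put(code, longUrl)); // cache filled off the request path
    }

    // Returns the long URL for the code, or null when no mapping exists.
    String find(String code) {
        String cached = cache.get(code);
        if (cached != null) {
            return cached; // cache hit: no database access
        }
        databaseReads.incrementAndGet();
        String fromDb = database.get(code);
        if (fromDb != null) {
            cache.put(code, fromDb); // backfill so the next lookup hits the cache
        }
        return fromDb;
    }
}
```

Passing a thread pool as the `Executor` makes the cache write asynchronous. If that write has not completed yet, or is lost, the first lookup falls back to the database and backfills the cache.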

5.3 Segment mode optimization

The traditional number segment mode requests a new segment from the dispatcher only after a node has used up its current segment, which can cause a short-lived performance bottleneck while the node waits. We introduce an independent monitoring thread that regularly checks segment usage and requests a new segment as soon as the number of used IDs exceeds a threshold. Because the next segment is pre-allocated, the node can switch segments smoothly under high concurrency without blocking business requests, which improves system performance and stability.
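The pre-allocation idea can be sketched as follows. Note one deliberate simplification: instead of the separate monitoring thread described above, this sketch checks the threshold inline on each call so that its behavior is deterministic. The class name, the shared `AtomicLong` standing in for the dispatcher, and the percentage threshold parameter are assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: threshold-triggered prefetch of the next segment (checked inline).
class PrefetchingSegmentGenerator {
    private final AtomicLong dispatcher;  // stand-in for the central dispatcher
    private final int step;               // segment size
    private final int thresholdPercent;   // prefetch once this share of the segment is used

    private long current;                 // next ID to hand out
    private long end;                     // exclusive upper bound of the active segment
    private long nextStart = -1;          // start of the prefetched segment, -1 if none

    PrefetchingSegmentGenerator(AtomicLong dispatcher, int step, int thresholdPercent) {
        this.dispatcher = dispatcher;
        this.step = step;
        this.thresholdPercent = thresholdPercent;
        this.current = allocate();
        this.end = current + step;
    }

    private long allocate() {
        return dispatcher.getAndAdd(step);
    }

    synchronized long nextId() {
        if (current >= end) {
            // Switch to the prefetched segment; allocate on the spot only if none is ready.
            long start = (nextStart >= 0) ? nextStart : allocate();
            nextStart = -1;
            current = start;
            end = start + step;
        }
        long id = current++;
        long used = step - (end - current);
        if (nextStart < 0 && used * 100 >= (long) step * thresholdPercent) {
            nextStart = allocate(); // in the described design a monitoring thread does this
        }
        return id;
    }

    synchronized boolean hasPrefetchedSegment() {
        return nextStart >= 0;
    }
}
```

With a segment size of 10 and a threshold of 80 percent, the next segment is reserved when the eighth ID is issued, so the switch at the eleventh ID needs no call to the dispatcher.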

5.4 Table splitting strategy

As the number of users and the volume of link data grow, a single database table may become a performance bottleneck. To address this, we adopted a table splitting strategy: link data is distributed evenly across 64 tables according to the unique ID modulo 64. This reduces the pressure on any single table and improves the scalability and performance of the database.
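The routing rule is simple enough to show directly. The table name prefix `short_link_` is hypothetical; only the modulo 64 rule comes from the text above.

```java
// Sketch of modulo-64 table routing; the table name prefix is hypothetical.
class TableRouter {
    private static final int TABLE_COUNT = 64;

    // Table index for a unique ID: the remainder after dividing by 64.
    static int tableIndex(long id) {
        return (int) Math.floorMod(id, (long) TABLE_COUNT);
    }

    static String tableName(long id) {
        return "short_link_" + tableIndex(id);
    }
}
```

For example, ID 130 lands in table index 2 because 130 = 2 × 64 + 2. Since sequential IDs cycle through all 64 remainders, rows are spread evenly across the tables.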

5.5 Business monitoring

Business monitoring is one of the key aspects of the system. It tracks performance and operating status in real time to ensure high availability and high performance. With the help of Prometheus, Zhuanzhuan's monitoring system, we collect and display key indicators such as the frequency of requests to generate short links, the frequency of requests to look up long links, and the results of link security verification. This lets us see the state of the system at a glance and make better decisions about optimization.

6 Summary

Through in-depth research and practice, Zhuanzhuan's short link platform provides users with efficient and secure link services. In the ever-evolving Internet environment, short link platforms will continue to innovate to meet the changing needs of users.

About the author:

Cao Jiantao, Zhuanzhuan C2C & Consignment Business R&D Engineer



Origin blog.csdn.net/zhuanzhuantech/article/details/132195316