Short URL Service: Design and Implementation


Foreword

You have surely received plenty of spam text messages. The links in them are usually short links, like the following:

[Screenshot: a spam SMS containing a short link]

Why do short URLs exist? What are they good for? And how are they built?

Short URLs have several benefits:

  1. Short. SMS and many platforms (such as Weibo) have character limits; a long link would use up the space left for the actual text.
  2. Good-looking. Compared with a long URL full of parameters, a short link is cleaner and friendlier.
  3. Easy statistics. Every click goes through the short-link service, so clicks can be recorded and analyzed.
  4. Security. Access parameters are not exposed.

That is why the links in the spam messages we receive are mostly short URLs.

So how do short URLs work?

Short URL Fundamentals

From generation to use, a short URL goes through the following steps:

  1. A service maps the long URL you send it to a short URL, for example www.baidu.com -> www.t.cn/1.
  2. The short URL is embedded in the text message or other content.
  3. The user clicks the short URL; the service answers with a 301/302 redirect, and the browser follows it to the corresponding long URL.
  4. The corresponding content is displayed.
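Step 3 is the only part the user sees. As a minimal sketch, assuming the JDK's built-in `com.sun.net.httpserver` package and a hypothetical in-memory `RedirectServer` class (not part of the original code), a redirect endpoint only has to look up the code in the request path and reply with a `Location` header:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal redirect service: GET /{code} -> 302 to the stored long URL.
class RedirectServer {
    private final Map<String, String> codeToLongUrl = new ConcurrentHashMap<>();
    private HttpServer server;

    void register(String code, String longUrl) {
        codeToLongUrl.put(code, longUrl);
    }

    // Starts on the given port (0 = any free port) and returns the actual port.
    int start(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/", exchange -> {
            String code = exchange.getRequestURI().getPath().substring(1); // strip leading '/'
            String target = codeToLongUrl.get(code);
            if (target == null) {
                exchange.sendResponseHeaders(404, -1); // unknown code, no body
            } else {
                exchange.getResponseHeaders().set("Location", target);
                exchange.sendResponseHeaders(302, -1); // temporary redirect, no body
            }
            exchange.close();
        });
        server.start();
        return server.getAddress().getPort();
    }

    void stop() {
        if (server != null) {
            server.stop(0);
        }
    }
}
```

Choosing 302 over 301 matters for statistics: a 301 is a permanent redirect that browsers may cache, so repeat clicks can skip the short-link service entirely, while a 302 sends every click back through it.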

This article focuses on the first step, that is, how to map a long URL to a short URL.

Service Design

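Trying to compute the short URL directly from the long URL, with a true one-to-one correspondence between them, is a dead end.

Ideally, we would use an algorithm that converts each long URL into exactly one short URL and can also convert it back.

But this is impossible. There are far more possible long URLs than short ones, so any rule that always shortens must map some different long URLs to the same short one and cannot be reversed. If such an algorithm existed, every compression algorithm in the world could be retired.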
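To make the counting concrete, take an alphabet of $a \ge 2$ characters. The number of strings of length at most $k$ is

$$\sum_{i=0}^{k} a^{i} = \frac{a^{k+1}-1}{a-1} < a^{k+1},$$

which is fewer than the number of strings of length exactly $k+1$. So any rule that maps every $(k+1)$-character URL to a code of at most $k$ characters must send two different URLs to the same code.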

The correct approach is to build a number issuer (an ID generator): every time a new long URL comes in, we increment a counter by one and return the new value. The first URL gets "www.x.cn/0", the second gets "www.x.cn/1".
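A minimal in-memory sketch of the number issuer, assuming a single process and an `AtomicLong` counter; the class name `NumberIssuer` is hypothetical and `www.x.cn/` is the example domain from the text:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-process number issuer.
class NumberIssuer {
    private final AtomicLong counter = new AtomicLong(0); // next number to hand out
    private final Map<Long, String> idToLongUrl = new ConcurrentHashMap<>();
    private final String prefix;

    NumberIssuer(String prefix) {
        this.prefix = prefix;
    }

    // Every call takes a fresh number, even for a long URL seen before.
    String shorten(String longUrl) {
        long id = counter.getAndIncrement();
        idToLongUrl.put(id, longUrl);
        return prefix + id;
    }

    // Reverse lookup used when a short URL is clicked; null if the number is unknown.
    String resolve(long id) {
        return idToLongUrl.get(id);
    }
}
```

Shortening the same long URL twice takes two numbers, which is exactly the one-to-one problem raised in the Q&A below.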

Next, a few smaller questions in Q&A form:

How is the mapping stored?

The mapping must be persisted to disk, since it cannot be rebuilt every time the system restarts; a database such as MySQL works. If the data volume and QPS are low, the database's auto-increment primary key can serve directly as the issuer.

How do we guarantee a one-to-one mapping between long and short links?

The issuing strategy above does not guarantee it: if you submit the same URL twice in a row, you get two different values.

A strict one-to-one mapping costs a lot of space. To respond quickly we would probably have to keep the whole long-to-short mapping cached in memory, which is too wasteful.

But we can implement a variant that gives a partial one-to-one mapping, for example by keeping only the recent or most popular mappings in a KV store. This saves space while still speeding up responses.
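One way to keep only the recent mappings is a bounded LRU map. The sketch below is a hypothetical in-process stand-in for the KV store, assuming Java's `LinkedHashMap` in access order; tracking the most popular URLs instead would need per-URL hit counts, which this sketch does not do:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded map of long URL -> short code that keeps only the most recently used entries.
class RecentMappings extends LinkedHashMap<String, String> {
    private final int capacity;

    RecentMappings(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most-recent end
        this.capacity = capacity;
    }

    // Called after each put(); evicts the least recently used entry once over capacity.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > capacity;
    }
}
```

`LinkedHashMap` is not thread-safe; under concurrent use it would need to be wrapped with `Collections.synchronizedMap` or replaced by a dedicated cache.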

Storing Short URLs

The short URL we return is usually a number converted to base 32, which shortens the URL more effectively. But to the computer a base-32 code is just a string, so how should it be stored? Storing the string directly works for equality lookups but is unfriendly to range queries and similar operations.

In fact, we can store the plain decimal number instead: it takes less space, supports lookups better, and can easily be converted to a larger (or smaller) base to shorten the URL further.
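A sketch of converting between the stored decimal number and the short code, assuming the same 62-character digit table (0-9, A-Z, a-z) used in the implementation at the end of this article; `BaseCodec` is a hypothetical name:

```java
// Hypothetical codec between a stored non-negative number and its short code.
final class BaseCodec {
    private static final String DIGITS =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    private BaseCodec() {}

    // Number -> code in the given base (2..62).
    static String encode(long n, int base) {
        if (n < 0) throw new IllegalArgumentException("n must be non-negative");
        checkBase(base);
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(DIGITS.charAt((int) (n % base))); // least significant digit first
            n /= base;
        } while (n > 0);
        return sb.reverse().toString();
    }

    // Code -> number; throws on invalid digits or overflow.
    static long decode(String code, int base) {
        checkBase(base);
        long n = 0;
        for (int i = 0; i < code.length(); i++) {
            int d = DIGITS.indexOf(code.charAt(i));
            if (d < 0 || d >= base) throw new IllegalArgumentException("bad digit: " + code.charAt(i));
            n = Math.addExact(Math.multiplyExact(n, base), d);
        }
        return n;
    }

    private static void checkBase(int base) {
        if (base < 2 || base > DIGITS.length()) throw new IllegalArgumentException("unsupported base: " + base);
    }
}
```

With base 62, four characters already cover 62^4 = 14,776,336 numbers, so millions of links still fit in very short codes.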

High Concurrency

If everything goes directly to MySQL, the database comes under heavy pressure as concurrent requests grow and may become a bottleneck. A few optimizations help here.

Cache

The cache already came up in the discussion of one-to-one mapping. To speed up processing, we can cache popular long URLs (which requires counting how often each long URL comes in) and recent long URLs (for example, those from the last hour, which Redis can hold with an expiry) in memory or in a store such as Redis. If a requested long URL hits the cache, its existing short URL is returned directly and no new number is issued.
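The cache-hit path can be sketched as a cache-aside lookup with expiry. This is a hypothetical in-process stand-in for Redis `SETEX`/`GET` (the original implementation uses Redis itself); the clock is injected so expiry can be tested, and `issuer` is assumed to hand out new numbers:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical long URL -> issued number cache whose entries expire after ttlMillis.
class ShortUrlCache {
    private static final class Entry {
        final long id;
        final long expiresAt; // absolute time in milliseconds

        Entry(long id, long expiresAt) {
            this.id = id;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    ShortUrlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Cache-aside: reuse a fresh cached number, otherwise ask the issuer and cache the result.
    long idFor(String longUrl, LongSupplier issuer) {
        long now = clock.getAsLong();
        return entries.compute(longUrl, (url, cached) ->
                cached != null && cached.expiresAt > now
                        ? cached
                        : new Entry(issuer.getAsLong(), now + ttlMillis)).id;
    }
}
```

`ConcurrentHashMap.compute` runs atomically per key, so two simultaneous requests for the same uncached URL receive the same number. Expired entries are only replaced on access and never purged, so a real deployment would also need eviction.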

Batch Issuing

Issuing each number requires a trip to MySQL to read the current maximum number and then update it, which puts considerable pressure on the database.

Instead, we can fetch 10,000 numbers from the database at a time and hand them out from memory; when fewer than 1,000 remain, we request another 10,000 from MySQL. Writes can likewise be batched: once a batch of numbers has been handed out, the results are written back in bulk.

This moves the constant database traffic into application code and makes both the fetching and the writing asynchronous, so the service keeps up under high concurrency.
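A sketch of the batch (segment) issuer, assuming `reserve` is a hypothetical callback that atomically advances the stored maximum by `step` in the database and returns the old maximum (for example, an `UPDATE ... SET max_id = max_id + ?` followed by a read inside one transaction). For simplicity the next segment is prefetched synchronously rather than on a background thread:

```java
import java.util.function.LongUnaryOperator;

// Hypothetical segment issuer: reserves `step` numbers per database round trip.
class SegmentIssuer {
    private final LongUnaryOperator reserve; // step -> start of a freshly reserved range
    private final long step;                 // numbers per round trip, e.g. 10,000
    private final long threshold;            // prefetch when fewer than this remain, e.g. 1,000
    private long next;                       // next number in the current segment
    private long end;                        // exclusive end of the current segment
    private long prefetched = -1;            // start of the reserved next segment, or -1

    SegmentIssuer(LongUnaryOperator reserve, long step, long threshold) {
        this.reserve = reserve;
        this.step = step;
        this.threshold = threshold;
    }

    synchronized long nextId() {
        if (next == end) {
            // Current segment used up: switch to the prefetched one, or fetch now.
            long start = prefetched >= 0 ? prefetched : reserve.applyAsLong(step);
            prefetched = -1;
            next = start;
            end = start + step;
        }
        long id = next++;
        if (end - next < threshold && prefetched < 0) {
            // Running low: reserve the next segment before this one runs out.
            // A production issuer would do this asynchronously.
            prefetched = reserve.applyAsLong(step);
        }
        return id;
    }
}
```

With `step = 10_000` and `threshold = 1_000`, handing out 25,000 consecutive numbers costs three round trips to the database instead of 25,000.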

Distributed

The design above has a single point of failure: the issuer is a single node and can easily go down.

We could turn the issuer into a distributed service, but if every number issued by one node had to be synchronized to all the other nodes, that would be a lot of trouble.

Another idea: run two issuers, one issuing odd numbers and the other even numbers, each incrementing by 2 instead of 1.

By analogy, we can run 1,000 issuers, each responsible for one value of the last three digits (0 to 999) and incrementing by 1,000. This makes the issuers very simple: they never need to communicate with each other, and each does its own work independently.
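A sketch of the striped scheme, assuming each issuer instance is configured with its own index and the total number of instances; `StripedIssuer` is a hypothetical name:

```java
// Hypothetical striped issuer: instance `index` of `count` hands out
// index, index + count, index + 2 * count, ...
class StripedIssuer {
    private final long stride;
    private long next;

    StripedIssuer(int index, int count) {
        if (count <= 0 || index < 0 || index >= count) {
            throw new IllegalArgumentException("need 0 <= index < count");
        }
        this.stride = count;
        this.next = index;
    }

    // Numbers from different instances differ modulo `count`, so they never collide.
    synchronized long nextId() {
        long id = next;
        next += stride;
        return id;
    }
}
```

The numbers are unique but not globally ordered: a busier issuer runs ahead of the others. Changing the number of issuers later also changes the stride, so that count needs to be planned up front.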

Implementation

Because I was too lazy to write JDBC code and too lazy to set up MyBatis, the code below uses Redis wherever MySQL would be used.

```java
package util;

import redis.clients.jedis.Jedis;

/**
 * Created by pfliu on 2019/06/23.
 */
public class ShortUrlUtil implements AutoCloseable {

    private static final String SHORT_URL_KEY = "SHORT_URL_KEY";
    private static final String LOCALHOST = "http://localhost:4444/";
    private static final String SHORT_LONG_PREFIX = "short_long_prefix_";
    private static final String CACHE_KEY_PREFIX = "cache_key_prefix_";
    private static final int CACHE_SECONDS = 60 * 60;

    /**
     * Characters used as digits: 0-9, A-Z, a-z (62 symbols, so bases up to 62)
     */
    private static final char[] DIGITS = {'0', '1', '2', '3', '4', '5', '6', '7', '8',
            '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L',
            'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y',
            'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
            'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'};

    private final Jedis jedis;

    public ShortUrlUtil(String redisHost) {
        this.jedis = new Jedis(redisHost);
    }

    public String getShortUrl(String longUrl, Decimal decimal) {
        // Check the cache: a long URL shortened within the last hour reuses its number
        String cache = jedis.get(CACHE_KEY_PREFIX + longUrl);
        if (cache != null) {
            return LOCALHOST + toOtherBaseString(Long.parseLong(cache), decimal.x);
        }

        // Issue the next number; Redis INCR is atomic, so concurrent callers get distinct values
        long num = jedis.incr(SHORT_URL_KEY);
        // Save the short -> long mapping keyed by the decimal number (could be stored in MySQL instead)
        jedis.set(SHORT_LONG_PREFIX + num, longUrl);
        // Cache long URL -> number for CACHE_SECONDS
        jedis.setex(CACHE_KEY_PREFIX + longUrl, CACHE_SECONDS, String.valueOf(num));
        return LOCALHOST + toOtherBaseString(num, decimal.x);
    }

    /**
     * Converts a non-negative decimal number to a string in the given base (2..62)
     */
    private static String toOtherBaseString(long n, int base) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative: " + n);
        }
        if (base < 2 || base > DIGITS.length) {
            throw new IllegalArgumentException("unsupported base: " + base);
        }
        char[] buf = new char[64]; // enough for any long, even in base 2
        int charPos = buf.length;
        do {
            buf[--charPos] = DIGITS[(int) (n % base)];
            n /= base;
        } while (n > 0);
        return new String(buf, charPos, buf.length - charPos);
    }

    enum Decimal {
        D32(32),
        // 62, not 64: DIGITS has only 62 characters, so base 64 would index past the table
        D62(62);

        final int x;

        Decimal(int x) {
            this.x = x;
        }
    }

    @Override
    public void close() {
        jedis.close();
    }

    public static void main(String[] args) {
        // Reuse one connection instead of opening (and leaking) a new one per call
        try (ShortUrlUtil util = new ShortUrlUtil("localhost")) {
            for (int i = 0; i < 100; i++) {
                System.out.println(util.getShortUrl("www.baidudu.com", Decimal.D32));
                System.out.println(util.getShortUrl("www.baidu.com", Decimal.D62));
            }
        }
    }
}
```


That's all.



ChangeLog

2019-06-24 completed

All of the above is my personal understanding; if anything is wrong, corrections in the comments are welcome.

Reprints are welcome; please credit the author and keep the original link.


For more study notes, see my personal blog: Huyan ten


Origin juejin.im/post/5d10ecab518825795a4d380e