Probably the most down-to-earth short link system design in the Eastern Hemisphere

This afternoon, Brother Yan (that's me) and some co-workers were queuing for the toilet (too many people, too few stalls). Picture the scene: I was standing in line, flirting with a girl on my phone. The colleague in front of me turned around, phone in hand, and started chatting with me about a text message.
So we began discussing how the short link in it is implemented (yes, even in the toilet we don't forget to learn!).

Tapping the short link in the message jumps to the following address:
http://h5.dangdang.com/mix_20191015_or4x
In this article, let's talk about how that works!

Main text

Where the need comes from

Why do we need short links at all? It's simple. Take posting on Weibo (microblogging), for example.

If the URL is too long, there is obviously less room left for your actual content!

Another example is SMS: if the message is too long, it gets split into two messages, which is a waste of money!

So short links not only save resources, they also look much cleaner!

Request Process

First, let's look at the short link http://dwz.win/nXR.
It consists of two parts:
http://dwz.win: the domain name of the short link system
nXR: the request parameter
A request to http://dwz.win/nXR returns the status shown in the figure below.

From that we can infer what happens after you visit http://dwz.win/nXR: the short link system looks up the long address for nXR and answers with a redirect, and the browser then requests the long address.

Let me add a side note here. As the figure above shows, a short link system can return either status 301 or 302; the example here uses 301.
For background: in the HTTP protocol, the 30X status codes indicate redirection.
What does 301 mean?
301 is a permanent redirect. What does that imply?
For GET requests, browsers cache a 301 redirect by default. In other words, if the server returns 301 the first time a user visits a short link, then on later visits to the same short link the browser jumps straight to the target address without going through the short link system again!

The advantage is obvious: less pressure on the server. The drawback is that clicks on the short address can no longer be counted.

What does 302 mean?
302 is a temporary redirect. What does that imply?
For GET requests, browsers do not cache a 302 redirect by default, unless the HTTP response explicitly allows caching through Cache-Control or Expires. So every time a user visits the same short link, the browser goes through the short link system again.

The advantage is that clicks on the short address can be counted. The drawback is greater pressure on the server.
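To make the trade-off concrete, here is a minimal sketch of how a short link system might pick the status code. The class RedirectResponse, the countClicks flag, and the choice of sending Cache-Control: no-store with the 302 are all assumptions for illustration, not something the systems above are known to do.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: status code and headers for a short link redirect. */
final class RedirectResponse {
    final int status;
    final Map<String, String> headers;

    private RedirectResponse(int status, Map<String, String> headers) {
        this.status = status;
        this.headers = headers;
    }

    /**
     * @param longUrl     the original address to jump to
     * @param countClicks true if every click must reach the short link system
     */
    static RedirectResponse forLongUrl(String longUrl, boolean countClicks) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Location", longUrl);
        if (countClicks) {
            // 302: temporary redirect. The no-store header (an assumption here)
            // asks the browser not to cache it, so each click can be counted.
            headers.put("Cache-Control", "no-store");
            return new RedirectResponse(302, headers);
        }
        // 301: permanent redirect. Browsers may cache it and skip the server.
        return new RedirectResponse(301, headers);
    }
}
```

If click statistics matter to you, call forLongUrl(url, true); if you only care about server load, pass false.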

Now for the most critical part: how do we compress http://h5.dangdang.com/mix_20191015_or4x into the three characters nXR?

Algorithm principle

First of all, we need a table that stores the mapping between long and short links. The table structure is as follows:

Column Name    Explanation
id             BIGINT, auto-increment primary key
url            Long address, i.e. the original address to jump to

Now suppose the table contains the following data:

id    url
1     http://h5.dangdang.com/mix_20191015_or4x
2     http://h5.dangdang.com/mix_20191102_ad3x

Here we use the auto-increment id directly as the short link key. Assuming http://dwz.win is the domain name of the short link system, this means:
(1) a request to http://dwz.win/1 jumps to http://h5.dangdang.com/mix_20191015_or4x;
(2) a request to http://dwz.win/2 jumps to http://h5.dangdang.com/mix_20191102_ad3x.

This is not impossible, but it has two drawbacks, and you need to judge whether you can accept them!

  • (1) If the data volume is large, say tens of billions of rows, the URL is still too long.
  • (2) The data follows an obvious pattern, so anyone can walk through all your jump addresses with a simple script!

To address these two drawbacks, we add a column that stores the key value. The table structure becomes:

Column Name    Explanation
id             BIGINT, auto-increment primary key
key            Short string; needs a unique index
url            Long address, i.e. the original address to jump to

First, how do we shorten the id? The usual approach is this: our short links are built from a-z, A-Z and 0-9, 62 characters in total. So we can convert the decimal id into a base-62 number; for example, 201314 converts to Qn0.
The algorithm is as follows:

private static final String BASE = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Converts a non-negative decimal id to its base-62 string, e.g. 201314 -> "Qn0".
public static String toBase62(long num) {
    StringBuilder sb = new StringBuilder();
    do {
        int i = (int) (num % 62);      // lowest base-62 digit
        sb.append(BASE.charAt(i));
        num /= 62;
    } while (num > 0);
    return sb.reverse().toString();    // digits were collected lowest first
}
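For completeness, here is a sketch of the opposite direction, turning a base-62 key back into its decimal id. It uses the same alphabet as toBase62; the class name Base62Decoder and the method name fromBase62 are made up for this example.

```java
/** Hypothetical counterpart to toBase62, using the same 62-character alphabet. */
final class Base62Decoder {
    private static final String BASE =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /** Converts a base-62 key such as "Qn0" back to its decimal id. */
    static long fromBase62(String key) {
        if (key == null || key.isEmpty()) {
            throw new IllegalArgumentException("empty key");
        }
        long num = 0;
        for (int i = 0; i < key.length(); i++) {
            int digit = BASE.indexOf(key.charAt(i));   // position in the alphabet
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character: " + key.charAt(i));
            }
            // num = num * 62 + digit, failing loudly instead of overflowing
            num = Math.addExact(Math.multiplyExact(num, 62L), digit);
        }
        return num;
    }
}
```

Since 62^6 = 56,800,235,584, ids up to roughly 56.8 billion fit into six characters, which is why base 62 takes care of the length problem.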

In addition, we need to introduce a global id generator that issues globally auto-incrementing ids. In other words, each request to create a short link first obtains an auto-increment id from this generator, and then converts that id to a base-62 string to use as the key.
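As a rough sketch of that flow, the class below issues an id and converts it to a key in one step. Note the assumption: a single in-process AtomicLong stands in for the global id generator, which in a real deployment would be a separate service or database sequence. KeyIssuer and its method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical single-process stand-in for "global id generator + base-62 conversion". */
final class KeyIssuer {
    private static final String BASE =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private final AtomicLong nextId;   // assumption: stands in for the global generator

    KeyIssuer(long firstId) {
        if (firstId < 0) {
            throw new IllegalArgumentException("firstId must be non-negative");
        }
        this.nextId = new AtomicLong(firstId);
    }

    /** Issues the next auto-increment id; safe to call from several threads. */
    long nextId() {
        return nextId.getAndIncrement();
    }

    /** Same conversion as toBase62 in the article. */
    static String toBase62(long num) {
        if (num < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(BASE.charAt((int) (num % 62)));
            num /= 62;
        } while (num > 0);
        return sb.reverse().toString();
    }

    /** Issues an id and returns it as a base-62 key. */
    String nextKey() {
        return toBase62(nextId());
    }
}
```

Starting the generator at 201314, the first key issued is Qn0, the same value as in the conversion example.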

Next, the second problem: the data follows a pattern. Converting to base 62 only solves the length problem; the pattern is still there.
So we need to introduce some randomness. The point to consider is whether you want to be able to derive the global id back from the key. The answer determines which randomization algorithm to choose!
(1) You don't need to derive the global id back
OK, then use a shuffle algorithm to scramble the computed value. For example, decimal 201314 converts to Qn0; a shuffle can then return one of n0Q, Q0n, and so on. There is a chance of collision, though; when that happens, just shuffle again a few times.
(2) You do want to derive the global id back
OK, then after getting Qn0, convert it to a binary number, and insert a random value at fixed positions, say the fifth bit, the tenth bit, and so on.
Deriving the id back is also simple: take the key of the short link, remove the bits at the fixed positions, and convert the result to decimal.
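Both options can be sketched in a few lines. This is only one way to read the description above, under these assumptions: the random bit goes into every fifth position counted from the lowest bit, ids stay below 2^48 so the result still fits in a long, and a Set stands in for the unique index on key. The class KeyScrambler and all of its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class KeyScrambler {
    private static final int STEP = 5;   // assumed spacing: 5th, 10th, 15th ... bit

    /** Option (1): returns a random permutation of the key's characters. */
    static String shuffle(String key, Random rnd) {
        List<Character> chars = new ArrayList<>();
        for (char c : key.toCharArray()) {
            chars.add(c);
        }
        Collections.shuffle(chars, rnd);
        StringBuilder sb = new StringBuilder();
        for (char c : chars) {
            sb.append(c);
        }
        return sb.toString();
    }

    /** Shuffles again on collision; 'used' stands in for the unique index on key. */
    static String shuffleUntilUnused(String key, Set<String> used, Random rnd, int maxTries) {
        for (int i = 0; i < maxTries; i++) {
            String candidate = shuffle(key, rnd);
            if (used.add(candidate)) {   // add() returns false if already taken
                return candidate;
            }
        }
        throw new IllegalStateException("no free permutation of " + key);
    }

    /** Option (2): inserts a random bit at every STEP-th position of the id. */
    static long insertRandomBits(long id, Random rnd) {
        if (id < 0 || id >= (1L << 48)) {
            throw new IllegalArgumentException("id out of range");
        }
        long out = 0;
        int pos = 0;                                   // bit position in the output
        while (id != 0) {
            if ((pos + 1) % STEP == 0) {
                if (rnd.nextBoolean()) {
                    out |= 1L << pos;                  // random filler bit
                }
            } else {
                out |= (id & 1L) << pos;               // next data bit
                id >>>= 1;
            }
            pos++;
        }
        return out;
    }

    /** Reverses insertRandomBits by dropping every STEP-th bit. */
    static long removeRandomBits(long scrambled) {
        long id = 0;
        int dataPos = 0;
        for (int pos = 0; pos < 63; pos++) {
            if ((pos + 1) % STEP == 0) {
                continue;                              // skip the filler bit
            }
            id |= ((scrambled >>> pos) & 1L) << dataPos;
            dataPos++;
        }
        return id;
    }
}
```

With option (2), the scrambled number is what gets converted to base 62 and stored as the key. Two different ids can never produce the same key, because removing the filler bits always gives back the original id.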

At this point the key generation logic should be basically clear. When a user clicks a short link such as http://dwz.win/nXR, the system parses out the key nXR and uses the unique index on the table to return the url that corresponds to nXR.
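The lookup step can be sketched as follows. An in-memory map stands in for the table and its unique index on key; ShortLinkResolver, register, extractKey and resolve are hypothetical names used only for this example.

```java
import java.net.URI;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class ShortLinkResolver {
    // Assumption: stands in for the table with a unique index on the key column.
    private final Map<String, String> urlByKey = new ConcurrentHashMap<>();

    /** Returns false if the key is already taken, like a unique index violation. */
    boolean register(String key, String longUrl) {
        return urlByKey.putIfAbsent(key, longUrl) == null;
    }

    /** Parses the key out of a short link, e.g. "http://dwz.win/nXR" gives "nXR". */
    static String extractKey(String shortUrl) {
        String path = URI.create(shortUrl).getPath();   // "/nXR"
        if (path == null || path.length() < 2) {
            throw new IllegalArgumentException("no key in " + shortUrl);
        }
        return path.substring(1);
    }

    /** Looks up the long address; empty if the key is unknown. Keys are case-sensitive. */
    Optional<String> resolve(String shortUrl) {
        return Optional.ofNullable(urlByKey.get(extractKey(shortUrl)));
    }
}
```

Note that nXR and NXR are different keys, since the 62-character alphabet distinguishes upper and lower case.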

Optimization details

(1) Sharding across databases and tables
If the system is on the public network for everyone to use, sharding is recommended from the start, since the data volume can easily exceed 10 million rows. This raises a question: should the shard key be the auto-increment id issued by the global id generator, or the converted key?

Obviously, using the converted key as the shard key is easier. Using the id as the shard key causes two problems!
(1) The user's request carries the key, so you first have to derive the id back from it and then locate the table by id, which adds to the response time.
(2) Depending on the randomization algorithm chosen, the id may not be derivable from the key at all. In that case you can only query every table, which is even slower.

With the key as the shard key it is straightforward: take the key from the user's request, locate the table directly, and fetch the corresponding url.
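Routing by key might look something like this. Hashing the key and taking it modulo the table count is just one common choice, and the table name prefix short_link_ is invented for the example.

```java
/** Hypothetical router: picks a table from the key alone, without touching the id. */
final class KeyShardRouter {
    private final int tableCount;

    KeyShardRouter(int tableCount) {
        if (tableCount <= 0) {
            throw new IllegalArgumentException("tableCount must be positive");
        }
        this.tableCount = tableCount;
    }

    /** String.hashCode is fixed by the Java specification, so routing is stable. */
    int shardIndex(String key) {
        return Math.floorMod(key.hashCode(), tableCount);   // always in [0, tableCount)
    }

    /** Assumed naming scheme: short_link_0, short_link_1, ... */
    String tableName(String key) {
        return "short_link_" + shardIndex(key);
    }
}
```

The same key always lands in the same table, so a request only ever needs one query.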

(2) Read/write separation
A system like this clearly has far more reads than writes, so read/write separation should be considered.

(3) Introducing a cache
Suppose that at some point we push SMS messages containing the link to mobile phones. In the period that follows, requests for that short link will obviously surge. There is no need to query the database every time; a redis cache can be introduced.
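The idea is a read-through cache: check the cache first, and only fall back to the database on a miss. In the sketch below a ConcurrentHashMap stands in for redis and a Function stands in for the database query, so no real redis client API is shown; CachedLookup and its members are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class CachedLookup {
    // Assumption: an in-memory map stands in for the redis cache.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Assumption: stands in for the database query; returns null for unknown keys.
    private final Function<String, String> database;
    private final AtomicInteger databaseHits = new AtomicInteger();

    CachedLookup(Function<String, String> database) {
        this.database = database;
    }

    /** Returns the long url for a key, or null if the key does not exist. */
    String find(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;                 // served from cache, database untouched
        }
        databaseHits.incrementAndGet();
        String url = database.apply(key);
        if (url != null) {
            cache.put(key, url);           // remember it for the next request
        }
        return url;
    }

    /** How many lookups actually reached the database. */
    int databaseHits() {
        return databaseHits.get();
    }
}
```

A real cache would also expire entries and probably remember unknown keys for a short while; both are left out here to keep the sketch small.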

(4) Can the global id generator use another algorithm?
Yes. All that is needed here is a globally unique id. You have to evaluate for yourself the performance impact of the other algorithm, and whether the ids it generates follow too obvious a pattern.

(5) Attack prevention
Be prepared for malicious attacks, so that the auto-increment id values do not all get used up.
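The article does not say how to defend against this. One possible measure, shown purely as an assumption, is to limit how many short links each client may create per time window. CreateRateLimiter is a hypothetical name, and the caller passes the current time in so the behaviour is easy to check.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical fixed-window limiter for "create short link" requests. */
final class CreateRateLimiter {
    private final int maxPerWindow;
    private final long windowMillis;
    // client id -> {window start in millis, requests seen in that window}
    private final Map<String, long[]> state = new HashMap<>();

    CreateRateLimiter(int maxPerWindow, long windowMillis) {
        if (maxPerWindow <= 0 || windowMillis <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
    }

    /** Returns true if this client may consume another id right now. */
    synchronized boolean allow(String clientId, long nowMillis) {
        long[] s = state.computeIfAbsent(clientId, k -> new long[] {nowMillis, 0});
        if (nowMillis - s[0] >= windowMillis) {
            s[0] = nowMillis;              // start a new window
            s[1] = 0;
        }
        if (s[1] >= maxPerWindow) {
            return false;                  // over the limit: do not issue an id
        }
        s[1]++;
        return true;
    }
}
```

Only when allow returns true would the system go on to request a new id from the generator.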

Origin www.cnblogs.com/liliuguang/p/11841446.html