Short link algorithm principles

One of the services we run into most often on the Internet is the short link service. For example, when we read an article in WeChat and choose to open it in a browser, we see a very long URL; when we share it, we see a short URL instead. That is one application of short links.
Long link example: https://mp.weixin.qq.com/s?__biz=MzAxNzMwOTQ0NA==&mid=2653355437&idx=1&sn=5901826ea638462ff71b7f2d06c6331d&chksm=8035d7c6b7425ed06661866af60657414bb71765d2ce915d14726736fa1e72ea8a529331c947&scene=0&key=34df968fd24033237ff036c7de8b6745e1968de9564cf2a8db689025dd0c3682848381771dab960824f506e6f9d484614746f9c0eecb48b884ce4320bb86470a77ce811cc5b401a8800b6fd6b36be097&ascene=0&uin=ODU5NDQ1NjI1&devicetype=iMac+MacBookAir6%2C2+OSX+OSX+10.12.6+build(16G29)&version=12020810&nettype=WIFI&fontScale=100&pass_ticket=IvPqxUmCJqZg9%2B3GfAIQSbQ4IGRIHx796D0UwlCyUCu4b5P4bSsjlN89A0eRzSfL
The short link I generated for it (it will expire over time): https://0x9.me/FAKcm

How do we convert a long link into a short link?

Put simply, a short link is just a very short link. Our goal is to convert long links into short ones. So the question is: how do we do that?
1. Compression
Implement an algorithm that converts the long address into a short one, plus a reverse operation that converts the short address back into the original address. On reflection, this is impossible: there are far more possible long URLs than there are short strings of a fixed length, so no reversible compression can map every long URL to a short one.
2. Hash algorithm
This is probably what most people think of first: hash the long URL and use the hash value as the unique identifier of that link. But the idea that comes to mind most easily is not necessarily the best one. Hashes can collide, and handling hash collisions adds complexity to the short-link system.
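To make the collision problem concrete, here is a minimal sketch. It assumes a hypothetical scheme in which the short code is the first few hexadecimal characters of the URL's MD5 digest; the helper names and the example.com URLs are made up for illustration and are not part of any real service.

```python
import hashlib


def hash_code(long_url: str, length: int = 6) -> str:
    """Hypothetical scheme: first `length` hex characters of the MD5 digest."""
    return hashlib.md5(long_url.encode("utf-8")).hexdigest()[:length]


def find_collision(length: int):
    """Hash synthetic URLs until two different URLs share the same code."""
    seen = {}  # code -> first URL that produced it
    i = 0
    while True:
        url = f"https://example.com/page/{i}"
        code = hash_code(url, length)
        if code in seen:
            return seen[code], url, code
        seen[code] = url
        i += 1


# With 3 hex characters there are only 16**3 = 4096 codes,
# so a collision must appear within 4097 distinct URLs.
first, second, code = find_collision(length=3)
print(first, second, code)
```

A longer code makes collisions rarer but never impossible, so a hash-based system still needs a lookup and a retry rule whenever a code is already taken.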

Optimal algorithm

The number issuer principle

As the name suggests, the system issues a number for each request as it arrives. Taking Weibo as an example, the first request gets the short link t.cn/0, the second gets t.cn/1, and so on.
The implementation is very simple:
1. For a small system, a MySQL auto-increment index should be enough.
2. For a large system, consider a distributed key-value store.
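The auto-increment approach can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and the table name `links` and the functions `issue_number` and `lookup` are assumptions, not something the article specifies.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE links ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "long_url TEXT NOT NULL)"
)


def issue_number(long_url: str) -> int:
    """Store the long URL and return the auto-increment id issued for it."""
    cur = conn.execute("INSERT INTO links (long_url) VALUES (?)", (long_url,))
    conn.commit()
    return cur.lastrowid


def lookup(number: int):
    """Return the long URL stored under `number`, or None if it is unknown."""
    row = conn.execute(
        "SELECT long_url FROM links WHERE id = ?", (number,)
    ).fetchone()
    return row[0] if row else None


# Note: SQLite ids start at 1, whereas the article's example starts at 0.
first_id = issue_number("https://example.com/a/very/long/path?x=1")
second_id = issue_number("https://example.com/another/long/path?y=2")
print(first_id, second_id, lookup(first_id))
```

The database does the counting, so the application only has to turn the returned id into the path of the short link.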

Storage principle

The number issuer's strategy is as follows: when a new link arrives, the issuer assigns it a number, and every later link simply gets the next number. For example, the first incoming link receives number 0 and its short link is xx.xxx/0; the second receives number 1 and its short link is xx.xxx/1, and so on.
The decimal numbers that the issuer produces should be converted to base 62, which greatly shortens the resulting string. Take the number 10,000,000,000: appended to the domain without conversion, it gives the link xx.xxx/10000000000. Converted to base 62, it becomes AOYKUa, only 6 characters long, giving xx.xxx/AOYKUa. The converted short link is clearly shorter. A 6-digit base-62 number covers a space of 62^6, about 56.8 billion values, so there is little reason to worry about the issuer running out of numbers.
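The base-62 conversion can be sketched like this. The alphabet order (digits, then lowercase, then uppercase) is an assumption, since the article does not state which ordering it uses; the function names are also illustrative.

```python
# Assumed alphabet order: 0-9, a-z, A-Z (62 characters).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)


def encode(number: int) -> str:
    """Convert a non-negative integer to base 62, most significant digit first."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode(code: str) -> int:
    """Convert a base-62 string produced by encode() back to an integer."""
    number = 0
    for char in code:
        number = number * BASE + ALPHABET.index(char)
    return number


print(encode(10_000_000_000))   # aUKYOA
print(decode("aUKYOA"))         # 10000000000
print(62 ** 6)                  # 56800235584
```

With this alphabet and digit order, 10,000,000,000 encodes to aUKYOA; the AOYKUa shown above is the same six digits written least significant first. Either order works as long as encoding and decoding agree.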

High concurrency scenarios

The design above has a single point: the number issuer. If we make it distributed and let several nodes each add 1, the nodes have to stay synchronized while writing at the same time, so a single-point performance bottleneck is hard to avoid. Instead, we can turn the single point into multiple points. With several machines, machine A issues only numbers congruent to 0 modulo 100 (100n), machine B issues only numbers congruent to 1 modulo 100 (100n + 1), and so on. The machines work independently without interfering with one another, and we can add machines at any time (up to 100 with this modulus).
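A minimal sketch of the modulo scheme follows. The class name `ShardedIssuer` and its parameters are hypothetical; each instance stands for one machine, and the lock only protects threads inside that one process.

```python
import threading


class ShardedIssuer:
    """Issues shard_id, shard_id + step, shard_id + 2 * step, ..."""

    def __init__(self, shard_id: int, step: int = 100):
        if not 0 <= shard_id < step:
            raise ValueError("shard_id must be in [0, step)")
        self._next = shard_id
        self._step = step
        self._lock = threading.Lock()

    def next_number(self) -> int:
        # The lock keeps concurrent callers on this machine from
        # receiving the same number.
        with self._lock:
            number = self._next
            self._next += self._step
            return number


machine_a = ShardedIssuer(shard_id=0)   # issues 0, 100, 200, ...
machine_b = ShardedIssuer(shard_id=1)   # issues 1, 101, 201, ...
print([machine_a.next_number() for _ in range(3)])
print([machine_b.next_number() for _ in range(3)])
```

Because the residues differ, two machines can never issue the same number, so no coordination between them is needed. In a real deployment each machine would also have to persist its counter across restarts, which this sketch leaves out.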

Does the same long link always map to the same short link?

Not necessarily. The system first looks up the long link in a cache; on a miss, the number issuer assigns the link a new number. The cache should hold the popular links that are converted frequently. Suppose the cache expiration time is set to one hour: if a link stays active, each cache hit refreshes the link's time to live and restarts the timer, so the link stays in the cache long term.
We can also introduce an LRU algorithm, so that links that are rarely used get evicted.
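The caching behaviour described above can be sketched as a small in-process cache. The class `LinkCache`, its capacity, and the injectable `clock` parameter are assumptions made for illustration; a production system would more likely use an external cache.

```python
import time
from collections import OrderedDict


class LinkCache:
    """Maps long URLs to short codes with a sliding TTL and LRU eviction."""

    def __init__(self, capacity=1000, ttl_seconds=3600, clock=time.monotonic):
        self._entries = OrderedDict()  # long_url -> (short_code, expires_at)
        self._capacity = capacity
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, long_url):
        entry = self._entries.get(long_url)
        if entry is None:
            return None
        short_code, expires_at = entry
        now = self._clock()
        if now >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._entries[long_url]
            return None
        # Hit: restart the timer and mark the entry as most recently used.
        self._entries[long_url] = (short_code, now + self._ttl)
        self._entries.move_to_end(long_url)
        return short_code

    def put(self, long_url, short_code):
        self._entries[long_url] = (short_code, self._clock() + self._ttl)
        self._entries.move_to_end(long_url)
        while len(self._entries) > self._capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)


cache = LinkCache(capacity=2, ttl_seconds=3600)
cache.put("https://example.com/popular", "0")
print(cache.get("https://example.com/popular"))
print(cache.get("https://example.com/never-seen"))
```

On a miss, the caller would ask the number issuer for a new number and then call `put`, which is why a long link that has fallen out of the cache can end up with a second short link.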

Redirection

301 or 302?
301 is a permanent redirect; 302 is a temporary redirect.

If you choose 301: a short address does not change once it has been generated, so 301 matches HTTP semantics. It also reduces the load on the server, because browsers may cache the redirect. As a result, however, we cannot accurately count how many times the short address is clicked.
If you choose 302: the load on the server increases, but we can count how many times the short address is clicked, and the click data can later feed big-data processing, machine learning, and recommendation algorithms.
By now, readers probably have their own answer on whether to choose 301 or 302.
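The trade-off can be illustrated with a small handler sketch. The `LINKS` table, the `CLICKS` counter, and the `redirect` function are hypothetical; the function only computes the status code and headers that a web server would send, rather than running a server.

```python
from collections import Counter
from http import HTTPStatus

# Hypothetical mapping from short code to long URL.
LINKS = {"abc123": "https://example.com/some/very/long/path?with=parameters"}
CLICKS = Counter()


def redirect(short_code: str, permanent: bool = False):
    """Return (status, headers) for a request to the given short code."""
    target = LINKS.get(short_code)
    if target is None:
        return HTTPStatus.NOT_FOUND, {}
    # Only requests that actually reach the server are counted here.
    CLICKS[short_code] += 1
    status = HTTPStatus.MOVED_PERMANENTLY if permanent else HTTPStatus.FOUND
    return status, {"Location": target}


status, headers = redirect("abc123")
print(int(status), headers["Location"], CLICKS["abc123"])
```

With `permanent=True` the response is a 301, which a browser may cache; later clicks from that browser may then skip the server entirely, so the counter would undercount. With the default 302, every click comes back to the server and is counted.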

Other

The following is a summary of some algorithms found on the Internet:
Algorithm one:

The most obvious approach is to use an MD5-style hash algorithm and then process the resulting digest string.

1) Compute the MD5 of the long URL to get a 32-character hexadecimal signature, and split it into four segments of 8 characters each;
2) Loop over the four segments: treat each 8-character segment as a hexadecimal number and AND it with 0x3FFFFFFF (30 one-bits), i.e. discard everything above 30 bits;
3) Split those 30 bits into 6 groups of 5 bits, and use each 5-bit number as an index into a fixed alphabet to pick one character, giving a 6-character string;
4) The MD5 digest therefore yields four 6-character strings, and any one of them can be used as the short URL for the long URL.
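The four steps above can be sketched as follows. The 32-character alphabet is an assumption (a 5-bit index can address exactly 32 characters, and the article does not say which ones), and the function name is illustrative.

```python
import hashlib

# Assumed 32-character alphabet, one character per possible 5-bit index.
ALPHABET = "abcdefghijklmnopqrstuvwxyz012345"


def md5_candidates(long_url: str):
    """Return four 6-character candidate codes derived from the URL's MD5."""
    digest = hashlib.md5(long_url.encode("utf-8")).hexdigest()  # 32 hex chars
    candidates = []
    for start in range(0, 32, 8):
        # Steps 1-2: take an 8-character segment and keep its low 30 bits.
        value = int(digest[start:start + 8], 16) & 0x3FFFFFFF
        chars = []
        for _ in range(6):
            # Step 3: the low 5 bits select one character, then shift them out.
            chars.append(ALPHABET[value & 0x1F])
            value >>= 5
        candidates.append("".join(chars))
    # Step 4: any of the four strings can serve as the short code.
    return candidates


print(md5_candidates("https://example.com/some/very/long/path"))
```

Having four candidates gives a simple fallback: if the first code is already taken by another URL, the system can try the next one.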

Algorithm two:
Taking 6 characters from the 62 characters a-z, A-Z, 0-9 produces 62^6, about 56.8 billion, combinations. Map numbers to these character combinations and each number yields a unique string: for example, the 62nd combination is aaaaa9 and the 63rd is aaaaba. Then apply a shuffling algorithm to scramble the original alphabet before storing it, so that the string at each position no longer follows a predictable order.
Store the long URL in the database and take the returned id to find the corresponding string. For example, if the returned ID is 1, the corresponding string is BBB; likewise, if the ID is 2, the string is BBA, and so on. Repeats become possible only after all of the combinations have been used.
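A minimal sketch of this idea, assuming the 62-character alphabet a-z, A-Z, 0-9 and fixed-length 6-character codes. The seeded shuffle and the function names are illustrative choices, not something the article specifies.

```python
import random
import string

# Unshuffled alphabet in the order a-z, A-Z, 0-9 (62 characters).
BASE_ALPHABET = string.ascii_letters + string.digits


def shuffled_alphabet(seed: int) -> str:
    """Return the alphabet in a scrambled but reproducible order."""
    chars = list(BASE_ALPHABET)
    random.Random(seed).shuffle(chars)
    return "".join(chars)


def id_to_code(number: int, alphabet: str, length: int = 6) -> str:
    """Map a database id to a fixed-length string over the given alphabet."""
    if not 0 <= number < len(alphabet) ** length:
        raise ValueError("id is outside the available combinations")
    chars = []
    for _ in range(length):
        number, remainder = divmod(number, len(alphabet))
        chars.append(alphabet[remainder])
    return "".join(reversed(chars))


# Unshuffled, counting ids from 0: the 62nd and 63rd combinations.
print(id_to_code(61, BASE_ALPHABET))   # aaaaa9
print(id_to_code(62, BASE_ALPHABET))   # aaaaba

# Shuffled: the same ids map to strings that do not reveal the order.
scrambled = shuffled_alphabet(seed=2024)
print(id_to_code(61, scrambled), id_to_code(62, scrambled))
```

The scrambled alphabet has to be stored and kept fixed, because decoding a short code back to its id depends on the same character order.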

Origin blog.csdn.net/u013474436/article/details/104764524