1. Why you need short links
In content marketing, the most common way to push marketing messages to users is SMS. For example, the three major carriers (China Mobile, China Unicom, and China Telecom) regularly send text messages about plan management, usage inquiries, and phone bill top-ups, and banks, cloud service providers, and others send all kinds of messages that link to query services.
The length of a single SMS is limited. If a message needs to contain a URL and the URL is too long, it hurts readability and uses up characters that could carry content.
In that case we convert the long URL into a short URL, which is the short link (short domain + short request path) discussed below.
2. How short link redirection works
The principle is simple: the backend stores the mapping between short links and long links, and answers a request for a short link with a redirect so that the browser jumps to the corresponding long link.
For example, the original long link is https://www.baidu.com, and I generated the short link https://suo.nz/378IQe through a certain platform.
When accessing https://suo.nz/378IQe, the backend returns 302 together with a Location response header whose value is the original link, https://www.baidu.com.
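The lookup-and-redirect step can be sketched roughly as follows. This is a minimal in-memory illustration, not how any particular platform implements it: the class `ShortLinkRedirector`, the `Response` holder, and the 404 fallback for unknown short paths are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory sketch of the short link -> long link lookup behind a redirect. */
class ShortLinkRedirector {

    /** Minimal stand-in for an HTTP response: status code plus optional Location header. */
    static final class Response {
        final int status;
        final String location; // null when there is nothing to redirect to

        Response(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    // Mapping saved by the backend: short request path -> original long link.
    private final Map<String, String> mappings = new ConcurrentHashMap<>();

    void register(String shortPath, String longLink) {
        mappings.put(shortPath, longLink);
    }

    /** Answers a short path with 302 + Location, or 404 if it is unknown (assumed fallback). */
    Response handle(String shortPath) {
        String longLink = mappings.get(shortPath);
        if (longLink == null) {
            return new Response(404, null);
        }
        return new Response(302, longLink);
    }

    public static void main(String[] args) {
        ShortLinkRedirector redirector = new ShortLinkRedirector();
        redirector.register("378IQe", "https://www.baidu.com");
        Response response = redirector.handle("378IQe");
        System.out.println(response.status + " Location: " + response.location);
    }
}
```

In a real service the map would be replaced by a database or cache lookup, and the `Response` would be written to the actual HTTP response.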
That raises a small question: should the redirect use 301 or 302?
Code | Meaning | Remarks |
---|---|---|
301 | Moved Permanently | Permanent redirect. The original URL is no longer used and the new URL should be preferred. Search engines update the URL associated with the resource directly. Generally used when a website is restructured. |
302 | Found | Temporary redirect. Search engines do not record the temporary URL for the resource. Generally used when a page is temporarily unavailable for unforeseen reasons. |
- 301 is actually more in line with the semantics of the HTTP protocol, but the browser caches the target URL and, on later visits, skips the short link and goes straight to the target. Statistics such as the number of visits to the short link can then no longer be collected.
- 302: every visit goes through the short link proxy service first and then the target service, so the load on the server is correspondingly higher, but statistics can be collected.
Remarks: Many short link generation platforms actually use 302 redirects.
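The statistics argument can be illustrated with a small sketch: as long as each visit reaches the short link service, the service can count it before answering with 302. The `VisitCounter` class and its method names are hypothetical and only serve the illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: per-short-path visit counting, which relies on visits reaching the service. */
class VisitCounter {

    private final Map<String, LongAdder> visits = new ConcurrentHashMap<>();

    /** Called by the redirect handler for each request; returns the status code to send. */
    int recordVisit(String shortPath) {
        visits.computeIfAbsent(shortPath, key -> new LongAdder()).increment();
        return 302; // temporary redirect, so later visits are expected to hit the service again
    }

    /** Number of visits recorded so far for the given short path. */
    long visitsOf(String shortPath) {
        LongAdder counter = visits.get(shortPath);
        return counter == null ? 0L : counter.sum();
    }

    public static void main(String[] args) {
        VisitCounter counter = new VisitCounter();
        counter.recordVisit("378IQe");
        counter.recordVisit("378IQe");
        System.out.println("visits: " + counter.visitsOf("378IQe"));
    }
}
```

With a 301, a browser that has cached the redirect would not call `recordVisit` again, so the counter would undercount.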
3. Approaches to generating short links
1. Auto-increment sequence algorithm
Commonly used auto-increment sequence algorithms include the Snowflake algorithm, Redis auto-increment, and MySQL auto-increment primary keys. Once a unique ID has been generated, it is converted to a base-62 string, and that string can be used as the short link.
Question: Why convert to a base-62 string?
Because the auto-increment ID keeps getting longer, and base-62 encoding makes it shorter: each base-62 character can take 62 values instead of the 10 values of a decimal digit, so fewer characters are needed for the same number.
Advantages and disadvantages of generating short links from auto-increment sequences:
- Advantages: The ID is unique, so the generated short links never repeat or conflict.
- Disadvantage: As the ID grows, the short link gets longer, so its length is not fixed.
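As a rough illustration of this approach, the sketch below uses an in-process `AtomicLong` as a stand-in for a shared sequence (Redis, MySQL, or Snowflake IDs in practice) and encodes each ID in base 62. The class name `SequenceShortLinkGenerator` and the alphabet order (A-Z, a-z, 0-9) are assumptions for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: short paths derived from an auto-increment sequence. */
class SequenceShortLinkGenerator {

    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Stand-in for a shared sequence such as Redis INCR or a MySQL auto-increment key.
    private final AtomicLong sequence;

    SequenceShortLinkGenerator(long firstId) {
        this.sequence = new AtomicLong(firstId);
    }

    /** Takes the next ID from the sequence and encodes it in base 62. */
    String next() {
        return encode(sequence.getAndIncrement());
    }

    /** Encodes a non-negative number in base 62, most significant digit first. */
    static String encode(long num) {
        if (num < 0) {
            throw new IllegalArgumentException("num must be non-negative");
        }
        if (num == 0) {
            return String.valueOf(ALPHABET.charAt(0));
        }
        StringBuilder sb = new StringBuilder();
        while (num > 0) {
            sb.append(ALPHABET.charAt((int) (num % 62)));
            num /= 62;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        SequenceShortLinkGenerator generator = new SequenceShortLinkGenerator(61);
        System.out.println(generator.next()); // ID 61 encodes to one character
        System.out.println(generator.next()); // ID 62 encodes to two characters
        long bigId = 1_000_000_000_000L;      // 13 decimal digits
        System.out.println(encode(bigId).length() + " base-62 characters");
    }
}
```

The jump from one to two characters between IDs 61 and 62 shows the disadvantage noted above: the codes are unique, but their length grows with the ID.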
Advantages and disadvantages of the individual auto-increment sequence algorithms:
Algorithm | Advantages | Disadvantages |
---|---|---|
Snowflake algorithm | High performance; does not depend on any middleware | Subject to the system clock rollback problem. The original Snowflake ID is 64 bits, so the generated ID is relatively long. |
Redis auto-increment | High performance, high concurrency | It is middleware, so it has maintenance costs, and persistence, disaster recovery, etc. must also be considered. |
MySQL auto-increment primary key | Simple to use and easy to extend | Performance bottlenecks under high concurrency. |
2. Hash algorithm
Simply put, the target long link is hashed, and the hash value is then base-62 encoded into a short link. The hash algorithms we are most familiar with are MD5, SHA, and the like.
These are cryptographic hash algorithms, and their performance is relatively low. Here we generally use MurmurHash, a non-cryptographic hash algorithm, in the implementation provided by Google Guava. Compared with MD5, its advantages are as follows:
- It is faster than MD5.
- The probability of hash collisions is low. The algorithm supports 32-bit and 128-bit hash values (MD5 is also 128 bits); with the 128-bit variant there is basically no need to worry about hash conflicts.
- The hash values are well dispersed and relatively uniform.
A MurmurHash example, using Guava's com.google.common.hash.Hashing:
String url = "https://www.baidu.com/";
// Output: e9ac4fbdc398e8c104d1b8415f42cbf8
System.out.println(Hashing.murmur3_128().hashString(url, StandardCharsets.UTF_8));
// Output: 06105412
System.out.println(Hashing.murmur3_32_fixed().hashString(url, StandardCharsets.UTF_8));
// Output: bf447182
System.out.println(Hashing.murmur3_32_fixed().hashLong(Long.MAX_VALUE));
// Convert to long
// Output: 307499014
System.out.println(Hashing.murmur3_32_fixed().hashString(url, StandardCharsets.UTF_8).padToLong());
// Output: 2188461247
System.out.println(Hashing.murmur3_32_fixed().hashLong(Long.MAX_VALUE).padToLong());
Advantages and disadvantages of generating short links with a hash algorithm:
- Advantages: Decentralized; the short links generated from hashes have an essentially fixed length.
- Disadvantages: Hash conflicts can occur. The main ways to resolve them are chaining (separate chaining) and rehashing. Since the short link is produced by converting the hash value to a base-62 string, a conflict means the short link has to be regenerated with a new hash. When the mappings are stored in a database, each generated short link also costs at least one query and one save operation, which hurts performance.
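The rehashing strategy can be sketched as below. This is an illustration under stated assumptions rather than a definitive implementation: the class `RehashingShortLinkStore` is hypothetical, an in-memory `HashMap` stands in for the database, the salt is a simple counter appended to the long link, and CRC32 from the JDK is used only as a stand-in for MurmurHash so that the example has no external dependency.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;
import java.util.zip.CRC32;

/** Hypothetical sketch: resolving hash conflicts by re-hashing with a salt. */
class RehashingShortLinkStore {

    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // In-memory stand-in for the database: short path -> long link.
    private final Map<String, String> longLinkByShortPath = new HashMap<>();
    private final ToLongFunction<String> hash;

    RehashingShortLinkStore(ToLongFunction<String> hash) {
        this.hash = hash;
    }

    /** CRC32 is only a JDK stand-in here; the article itself uses MurmurHash from Guava. */
    static long crc32(String text) {
        CRC32 crc = new CRC32();
        crc.update(text.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Returns the short path for a long link, re-hashing with an increasing salt on conflict. */
    String shorten(String longLink) {
        for (int salt = 0; ; salt++) {
            String input = salt == 0 ? longLink : longLink + "#" + salt;
            // Clear the sign bit so the value is always encodable.
            String candidate = encode(hash.applyAsLong(input) & Long.MAX_VALUE);
            String owner = longLinkByShortPath.putIfAbsent(candidate, longLink);
            if (owner == null || owner.equals(longLink)) {
                return candidate; // free slot, or this long link was shortened before
            }
            // Another long link owns this short path: try again with the next salt.
        }
    }

    /** Encodes a non-negative number in base 62, most significant digit first. */
    static String encode(long num) {
        if (num == 0) {
            return String.valueOf(ALPHABET.charAt(0));
        }
        StringBuilder sb = new StringBuilder();
        while (num > 0) {
            sb.append(ALPHABET.charAt((int) (num % 62)));
            num /= 62;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        RehashingShortLinkStore store = new RehashingShortLinkStore(RehashingShortLinkStore::crc32);
        System.out.println(store.shorten("https://www.baidu.com/"));
    }
}
```

Because the salt sequence is deterministic, shortening the same long link twice walks the same candidates and returns the same short path. The hash function is injectable so that a deliberately weak hash can be passed in to exercise the conflict branch.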
4. Code example
The following example generates short links using a hash algorithm plus Base62 encoding.
1. Table structure and index
# Short link table
create table `t_short_link`
(
    `id`             bigint primary key auto_increment comment 'Primary key ID',
    `short_link`     varchar(32)  not null default '' comment 'Short link',
    `long_link_hash` bigint       not null default 0 comment 'Hash value',
    `long_link`      varchar(128) not null default '' comment 'Long link',
    `status`         tinyint      not null default 1 comment 'Status: 1 = available, 0 = unavailable',
    `expiry_time`    datetime     null comment 'Expiry time',
    `create_time`    datetime     not null default current_timestamp comment 'Creation time'
) comment 'Short link table';
create index idx_sl_hash_long_link on t_short_link (long_link_hash, long_link);
create index idx_sl_short_link on t_short_link (short_link);
2. External dependencies
<!--Google Guava-->
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>31.1-jre</version>
</dependency>
3. Base62Utils
public abstract class Base62Utils {

    private static final int SCALE = 62;

    private static final char[] BASE_62_ARRAY = {
            'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
            'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
            'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
            'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
            '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
    };

    private static final String BASE_62_CHARACTERS = String.valueOf(BASE_62_ARRAY);

    /**
     * Encodes a non-negative long as a Base62 string.
     *
     * @param num the value to encode; must not be negative
     * @return the Base62 representation of num
     */
    public static String encodeToBase62String(long num) {
        if (num < 0) {
            throw new IllegalArgumentException("num must not be negative: " + num);
        }
        if (num == 0) {
            // Without this case the loop below would produce an empty string
            return String.valueOf(BASE_62_ARRAY[0]);
        }
        StringBuilder sb = new StringBuilder();
        while (num > 0) {
            sb.insert(0, BASE_62_ARRAY[(int) (num % SCALE)]);
            num /= SCALE;
        }
        return sb.toString();
    }

    /**
     * Decodes a Base62 string back into a long.
     *
     * @param base62Str a string made up of characters from BASE_62_ARRAY
     * @return the decoded value
     */
    public static long decodeToLong(String base62Str) {
        long num = 0, coefficient = 1;
        String reversedBase62Str = new StringBuilder(base62Str).reverse().toString();
        for (char base62Character : reversedBase62Str.toCharArray()) {
            int digit = BASE_62_CHARACTERS.indexOf(base62Character);
            if (digit < 0) {
                throw new IllegalArgumentException("Not a Base62 character: " + base62Character);
            }
            num += digit * coefficient;
            coefficient *= SCALE;
        }
        return num;
    }
}
Remarks: The characters in BASE_62_ARRAY do not have to be in order. Their order can be shuffled, and a shuffled order makes the generated short links harder to guess.
Question: Can Base64 encoding be used to generate short links?
The encoder returned by Base64.getEncoder() in JDK 8 emits characters such as '+' and '/', which are reserved characters in URLs. The encoder returned by Base64.getUrlEncoder(), however, replaces '+' and '/' with '-' and '_' respectively, so it is in fact possible.
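The difference between the two JDK encoders can be seen in a short sketch. The class name `Base64AlphabetDemo` and the choice to drop padding are assumptions for the illustration; the two input bytes were picked because they map to the last two entries of the Base64 alphabet.

```java
import java.util.Base64;

/** Sketch comparing the standard and URL-safe Base64 alphabets from the JDK. */
class Base64AlphabetDemo {

    /** Standard alphabet: may contain '+' and '/'. */
    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** URL-safe alphabet; padding is dropped because '=' adds nothing to a short path. */
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xfb, (byte) 0xff};
        System.out.println(standard(data)); // prints +/8=
        System.out.println(urlSafe(data));  // prints -_8
    }
}
```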
4. DAO layer
@Repository
public class ShortLinkManagerImpl implements ShortLinkManager {

    @Autowired
    private ShortLinkMapper shortLinkMapper;

    /** Saves a new short link together with the long link and its hash. */
    @Override
    public void saveShortLink(String shortLink, long longLinkHash, String longLink) {
        ShortLinkDO shortLinkDO = ShortLinkDO.builder()
                .shortLink(shortLink)
                .longLinkHash(longLinkHash)
                .longLink(longLink)
                .status(true)
                .build();
        shortLinkMapper.insert(shortLinkDO);
    }

    /** Returns the short link already stored for this long link, or null if there is none. */
    @Override
    public String getShortLink(long longLinkHash, String longLink) {
        Wrapper<ShortLinkDO> wrapper = Wrappers.lambdaQuery(ShortLinkDO.class)
                .select(ShortLinkDO::getShortLink)
                .eq(ShortLinkDO::getLongLinkHash, longLinkHash)
                .eq(ShortLinkDO::getLongLink, longLink)
                .last(CommonConst.LIMIT_SQL);
        ShortLinkDO shortLinkDO = shortLinkMapper.selectOne(wrapper);
        return Optional.ofNullable(shortLinkDO).map(ShortLinkDO::getShortLink).orElse(null);
    }

    /** Checks whether the given short link is already in use. */
    @Override
    public boolean isShortLinkRepeated(String shortLink) {
        Wrapper<ShortLinkDO> wrapper = Wrappers.lambdaQuery(ShortLinkDO.class).eq(ShortLinkDO::getShortLink, shortLink);
        return shortLinkMapper.selectCount(wrapper) > 0;
    }
}
5. Business layer
@Service
public class ShortLinkServiceImpl implements ShortLinkService {

    @Autowired
    private ShortLinkManager shortLinkManager;

    @Override
    public String generateShortLink(String longLink) {
        long longLinkHash = Hashing.murmur3_32_fixed().hashString(longLink, StandardCharsets.UTF_8).padToLong();
        // Look up an existing short link by the long link's hash and the long link itself
        String shortLink = shortLinkManager.getShortLink(longLinkHash, longLink);
        if (StringUtils.isNotBlank(shortLink)) {
            return shortLink;
        }
        // None stored yet: generate one, re-hashing with a salt if it conflicts
        return regenerateOnHashConflict(longLink, longLinkHash);
    }

    private String regenerateOnHashConflict(String longLink, long longLinkHash) {
        // Use an ID from the auto-increment sequence as the salt
        long uniqueIdHash = Hashing.murmur3_32_fixed().hashLong(SnowFlakeUtils.nextId()).padToLong();
        // Subtracting mainly keeps the resulting value small
        String shortLink = Base62Utils.encodeToBase62String(Math.abs(longLinkHash - uniqueIdHash));
        if (!shortLinkManager.isShortLinkRepeated(shortLink)) {
            shortLinkManager.saveShortLink(shortLink, longLinkHash, longLink);
            return shortLink;
        }
        // The short link is already taken: try again with a new salt
        return regenerateOnHashConflict(longLink, longLinkHash);
    }
}
6. Test cases
@SpringBootTest(classes = Application.class)
public class ApplicationTest {

    @Autowired
    private ShortLinkService shortLinkService;

    @Test
    public void generateShortLinkTest() {
        String shortLink = shortLinkService.generateShortLink("https://www.baidu.com/");
        System.err.println("Generated short link: " + shortLink);
    }
}
Console output:
Generated short link: D4PTSU
Remarks: The generated short link is generally a 6-character string. Also remember to choose a short domain for the short link proxy service.