How do short links work, and how do you implement them?

1. Why do you need short links

In content marketing, the most common way to push marketing messages to users is SMS. For example, the three major carriers (China Mobile, China Unicom, and China Telecom) regularly send text messages about plan management, usage inquiries, and phone bill top-ups. Banks, cloud service providers, and others also send all kinds of messages that link to query services.

The length of a single SMS is limited. If a message needs to include a URL and the URL is too long, it hurts readability and wastes characters.

So we convert the long URL into a short URL, which is the short link (short domain + short request path) discussed below.



2. How short link redirection works

The principle is simple: the backend stores the mapping between short links and long links, and responds with a redirect so that the browser jumps to the corresponding long link.

For example, the original long link is https://www.baidu.com, and I generated the short link https://suo.nz/378IQe on a short link platform.

When https://suo.nz/378IQe is accessed, the backend returns 302 together with a Location response header whose value is the original link https://www.baidu.com.

A small question: should the redirect use 301 or 302?

| Status code | Meaning | Remark |
| --- | --- | --- |
| 301 | Moved Permanently | Permanent redirect: the original URL is no longer in use and the new URL should be preferred. Search engines update the URL associated with the resource directly. Typically used when a website is restructured. |
| 302 | Found | Temporary redirect: search engines do not record the temporary URL for the resource. Typically used when a page is temporarily unavailable for unforeseen reasons. |
  • 301 fits the HTTP semantics better, but the browser caches the target URL and, on later visits, goes straight to it without contacting the short link service. That makes statistics such as the number of visits per short link impossible.

  • 302: on every visit the browser first requests the short link service and then the target service. The load on the short link server is correspondingly higher, but statistics can be collected.

Note: many short link platforms actually use 302 redirects.
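
To make the trade-off concrete, here is a minimal sketch of what a redirect service might do on each request. It assumes an in-memory map in place of the database, and the names `RedirectSketch`, `Response`, `register`, `resolve`, and `visits` are hypothetical, not taken from any platform mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: look up the long link and answer with a 302 redirect. */
final class RedirectSketch {

    /** Minimal stand-in for an HTTP response: status code plus headers. */
    static final class Response {
        final int status;
        final Map<String, String> headers = new HashMap<>();

        Response(int status) {
            this.status = status;
        }
    }

    // Assumption: in-memory maps replace the database table of short -> long links.
    private final Map<String, String> longLinkByCode = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> visitsByCode = new ConcurrentHashMap<>();

    void register(String code, String longLink) {
        longLinkByCode.put(code, longLink);
    }

    Response resolve(String code) {
        String longLink = longLinkByCode.get(code);
        if (longLink == null) {
            return new Response(404); // unknown short code
        }
        // With 302 every visit reaches this line, so visits can be counted.
        visitsByCode.computeIfAbsent(code, k -> new AtomicLong()).incrementAndGet();
        Response response = new Response(302);
        response.headers.put("Location", longLink);
        return response;
    }

    long visits(String code) {
        AtomicLong counter = visitsByCode.get(code);
        return counter == null ? 0 : counter.get();
    }

    public static void main(String[] args) {
        RedirectSketch service = new RedirectSketch();
        service.register("378IQe", "https://www.baidu.com");
        Response response = service.resolve("378IQe");
        System.out.println(response.status + " Location: " + response.headers.get("Location"));
        System.out.println("visits = " + service.visits("378IQe"));
    }
}
```

If the service answered with 301 instead, a browser that cached the redirect would skip `resolve` on later visits and the counter would stop growing.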



3. Approaches to generating short links

1. Auto-increment sequence algorithm

Commonly used auto-increment sequence algorithms include the Snowflake algorithm, Redis auto-increment, and MySQL primary key auto-increment. Once the unique ID is generated, it is converted to a base-62 string, and that string serves as the short link.

Question: why convert to a base-62 string?
Because the auto-increment ID keeps getting longer, and its base-62 representation is shorter than its decimal one.

Advantages and disadvantages of generating short links from an auto-increment sequence:

  • Advantages: the ID is unique, so the generated short links never repeat or conflict.
  • Disadvantage: as the ID grows, the short link grows too, so its length is not fixed.
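
The effect on length can be sketched as follows. The class `Base62LengthSketch`, its `encode` helper, and the digit-first alphabet are assumptions made for illustration; any fixed order of the 62 characters behaves the same way.

```java
/** Hypothetical sketch: Base62-encode auto-increment IDs and observe the length. */
final class Base62LengthSketch {

    // Assumption: digit-first alphabet; any fixed order of the 62 characters works.
    private static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        if (id == 0) {
            return String.valueOf(ALPHABET.charAt(0));
        }
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            // Take the lowest base-62 digit, then shift the number right by one digit
            sb.append(ALPHABET.charAt((int) (id % 62)));
            id /= 62;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        // 56_800_235_584 is 62^6, the first ID that needs 7 characters
        long[] ids = {1L, 61L, 62L, 1_000_000L, 56_800_235_583L, 56_800_235_584L, Long.MAX_VALUE};
        for (long id : ids) {
            String code = encode(id);
            System.out.println(id + " (" + Long.toString(id).length() + " digits) -> "
                + code + " (" + code.length() + " chars)");
        }
    }
}
```

IDs below 62^6 (about 56.8 billion) fit in six characters or fewer. Beyond that the short link gains a character, which is the "length is not fixed" drawback noted above.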

Advantages and disadvantages of the individual auto-increment sequence algorithms:

| Algorithm | Advantages | Disadvantages |
| --- | --- | --- |
| Snowflake algorithm | High performance; does not depend on any middleware | Suffers from the clock rollback problem. The original Snowflake ID is 64 bits, so the generated ID is relatively long. |
| Redis auto-increment | High performance, high concurrency | It is middleware, so it has maintenance costs, and persistence and disaster recovery must be considered. |
| MySQL primary key auto-increment | Simple to use and easy to extend | Becomes a performance bottleneck under high concurrency. |
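
For reference, a Snowflake-style generator can be sketched in a few lines. This is a simplified illustration, not the `SnowFlakeUtils` class used later in this article. The class name `SnowflakeSketch`, the custom epoch, and the common 41/10/12 bit layout are assumptions.

```java
/** Hypothetical Snowflake-style generator: timestamp | 10-bit worker ID | 12-bit sequence. */
final class SnowflakeSketch {

    private static final long EPOCH_MS = 1_600_000_000_000L; // assumption: arbitrary custom epoch
    private static final long WORKER_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId out of range: " + workerId);
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Clock rollback: refusing to issue IDs is the simplest safe reaction
            throw new IllegalStateException(
                "Clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence exhausted within this millisecond: wait for the next one
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
            | (workerId << SEQUENCE_BITS)
            | sequence;
    }

    public static void main(String[] args) {
        SnowflakeSketch generator = new SnowflakeSketch(1);
        long id = generator.nextId();
        System.out.println(id + " has " + Long.toString(id).length() + " decimal digits");
    }
}
```

The `now < lastTimestamp` branch is where the clock rollback problem from the table shows up: if the system clock is set back, the generator cannot safely issue IDs until the clock catches up.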

2. Hash algorithm

Simply put, we hash the target long link and then convert the hash value into a short link with base-62 encoding. The hash algorithms most people know are MD5, SHA, and similar.

These two are cryptographic hash algorithms, and their performance is relatively low. Here we generally use MurmurHash as implemented in Google Guava, which is a non-cryptographic hash algorithm. Compared with MD5, its advantages are:

  1. It is faster than MD5.
  2. The probability of hash collisions is low. The algorithm supports 32-bit and 128-bit hash values (MD5 is also 128 bits). With 128-bit values collisions are basically not a concern. With 32-bit values they become more likely as the number of links grows, so the code still has to handle them.
  3. The values are well dispersed, so the hash distribution is relatively uniform.

A MurmurHash example:

import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;

String url = "https://www.baidu.com/";

// Output: e9ac4fbdc398e8c104d1b8415f42cbf8
System.out.println(Hashing.murmur3_128().hashString(url, StandardCharsets.UTF_8));
// Output: 06105412
System.out.println(Hashing.murmur3_32_fixed().hashString(url, StandardCharsets.UTF_8));
// Output: bf447182
System.out.println(Hashing.murmur3_32_fixed().hashLong(Long.MAX_VALUE));

// Convert to long

// Output: 307499014
System.out.println(Hashing.murmur3_32_fixed().hashString(url, StandardCharsets.UTF_8).padToLong());
// Output: 2188461247
System.out.println(Hashing.murmur3_32_fixed().hashLong(Long.MAX_VALUE).padToLong());

Advantages and disadvantages of generating short links with a hash algorithm:

  • Advantages: it is decentralized, and the length of the short link generated after hashing is basically fixed.
  • Disadvantages: hash collisions can occur. The main ways to resolve them are separate chaining and rehashing. Since the short link is produced by converting the hash value to a base-62 string, a collision means the link must be hashed again. If the mappings are stored in a database, generating a short link costs at least one query and one save operation, which hurts performance.
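
The rehashing idea can be sketched with JDK classes only. Everything here is an assumption made for illustration: `RehashSketch` and its methods are hypothetical, CRC32 stands in for MurmurHash, radix 36 from `Long.toString` stands in for Base62, and a `HashMap` stands in for the database.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;
import java.util.zip.CRC32;

/** Hypothetical sketch: on a collision, salt the input and hash again. */
final class RehashSketch {

    // Assumption: an in-memory map replaces the database table.
    private final Map<String, String> longLinkByCode = new HashMap<>();
    private final ToLongFunction<String> hash;

    RehashSketch(ToLongFunction<String> hash) {
        this.hash = hash;
    }

    /** Assumption: CRC32 stands in for MurmurHash so that only the JDK is needed. */
    static long crc32(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    String shorten(String longLink) {
        String candidate = longLink;
        for (int attempt = 0; ; attempt++) {
            // Assumption: radix 36 stands in for Base62 to keep the sketch short
            String code = Long.toString(hash.applyAsLong(candidate), 36);
            String existing = longLinkByCode.putIfAbsent(code, longLink);
            if (existing == null || existing.equals(longLink)) {
                return code; // new mapping saved, or this link was shortened before
            }
            // The code belongs to a different link: salt the input and hash again
            candidate = longLink + "#" + attempt;
        }
    }

    public static void main(String[] args) {
        RehashSketch sketch = new RehashSketch(RehashSketch::crc32);
        System.out.println(sketch.shorten("https://www.baidu.com/"));
    }
}
```

Passing the hash function in makes it possible to force collisions with a deliberately weak hash. Each loop iteration corresponds to one lookup against the store, which is the performance cost mentioned above.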


4. Code example

In the following example, we mainly use a hash algorithm plus Base62 encoding to generate short links. The flow chart is as follows:
[Figure: flow chart of short link generation]

1. Table structure and index

# Short link table
create table `t_short_link`
(
    `id`             bigint primary key auto_increment comment 'Primary key ID',
    `short_link`     varchar(32)  not null default '' comment 'Short link',
    `long_link_hash` bigint       not null default 0 comment 'Hash of the long link',
    `long_link`      varchar(128) not null default '' comment 'Long link',
    `status`         tinyint      not null default 1 comment 'Status: 1 = available, 0 = unavailable',
    `expiry_time`    datetime     null comment 'Expiry time',
    `create_time`    datetime     not null default current_timestamp comment 'Creation time'
) comment 'Short link table';
create index idx_sl_hash_long_link on t_short_link (long_link_hash, long_link);
create index idx_sl_short_link on t_short_link (short_link);

2. External dependencies

<!--Google Guava-->
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>31.1-jre</version>
</dependency>

3. Base62Utils

public abstract class Base62Utils {

	private static final int SCALE = 62;

	private static final char[] BASE_62_ARRAY = {
		'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
		'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
		'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
		'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
		'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
	};

	private static final String BASE_62_CHARACTERS = String.valueOf(BASE_62_ARRAY);

	/**
	 * Encodes a long as a Base62 string.
	 * @param num the value to encode; must not be negative
	 * @return the Base62 string
	 */
	public static String encodeToBase62String(long num) {
		if (num < 0) {
			throw new IllegalArgumentException("num must not be negative: " + num);
		}
		if (num == 0) {
			// Without this branch the loop below would return an empty string
			return String.valueOf(BASE_62_ARRAY[0]);
		}
		StringBuilder sb = new StringBuilder();
		while (num > 0) {
			sb.insert(0, BASE_62_ARRAY[(int) (num % SCALE)]);
			num /= SCALE;
		}
		return sb.toString();
	}

	/**
	 * Decodes a Base62 string into a long.
	 * @param base62Str the Base62 string
	 * @return the decoded value
	 */
	public static long decodeToLong(String base62Str) {
		long num = 0, coefficient = 1;
		String reversedBase62Str = new StringBuilder(base62Str).reverse().toString();
		for (char base62Character : reversedBase62Str.toCharArray()) {
			int digit = BASE_62_CHARACTERS.indexOf(base62Character);
			if (digit < 0) {
				// indexOf returns -1 for characters outside the alphabet
				throw new IllegalArgumentException("Not a Base62 character: " + base62Character);
			}
			num += digit * coefficient;
			coefficient *= SCALE;
		}
		return num;
	}
}

Note: the characters in BASE_62_ARRAY do not have to be in order. They can be shuffled, which makes the generated short links harder to guess from one another.
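
One way to shuffle the alphabet while keeping codes decodable is sketched below. The class `ShuffledBase62Sketch` and the seeded shuffle are assumptions made for illustration. In a real service it would probably be safer to shuffle once and hard-code the resulting order, since the alphabet must never change after links have been issued.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: Base62 with an alphabet shuffled by a fixed seed. */
final class ShuffledBase62Sketch {

    private final String alphabet;

    ShuffledBase62Sketch(long seed) {
        List<Character> chars = new ArrayList<>();
        for (char c = 'A'; c <= 'Z'; c++) chars.add(c);
        for (char c = 'a'; c <= 'z'; c++) chars.add(c);
        for (char c = '0'; c <= '9'; c++) chars.add(c);
        // A fixed seed keeps the order reproducible, so old codes stay decodable
        Collections.shuffle(chars, new Random(seed));
        StringBuilder sb = new StringBuilder();
        for (char c : chars) sb.append(c);
        alphabet = sb.toString();
    }

    String alphabet() {
        return alphabet;
    }

    String encode(long num) {
        if (num < 0) {
            throw new IllegalArgumentException("num must not be negative");
        }
        if (num == 0) {
            return String.valueOf(alphabet.charAt(0));
        }
        StringBuilder sb = new StringBuilder();
        while (num > 0) {
            sb.append(alphabet.charAt((int) (num % 62)));
            num /= 62;
        }
        return sb.reverse().toString();
    }

    long decode(String code) {
        long num = 0;
        for (char c : code.toCharArray()) {
            int digit = alphabet.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("Not in alphabet: " + c);
            }
            num = num * 62 + digit; // most significant digit first
        }
        return num;
    }

    public static void main(String[] args) {
        ShuffledBase62Sketch codec = new ShuffledBase62Sketch(42L);
        String code = codec.encode(307499014L);
        System.out.println(code + " -> " + codec.decode(code));
    }
}
```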

Question: can Base64 encoding be used to generate short links?

In JDK 8, the encoder returned by Base64.getEncoder() can emit the characters '+' and '/', which have special meaning in URLs ('/' separates path segments, and '+' is often decoded as a space in query strings). The encoder returned by Base64.getUrlEncoder() replaces '+' and '/' with '-' and '_' respectively, so Base64 can in fact be used.
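
The difference between the two encoders is easy to see with input bytes chosen so that the standard encoder emits both special characters. The class `Base64UrlSketch` and its helper names are hypothetical. The `java.util.Base64` calls are the standard JDK API.

```java
import java.util.Base64;

/** Hypothetical sketch comparing the standard and URL-safe Base64 encoders. */
final class Base64UrlSketch {

    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static String urlSafe(byte[] data) {
        // withoutPadding() also drops the trailing '=' characters
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    public static void main(String[] args) {
        // These two bytes map to Base64 indexes 62, 63 and 60
        byte[] data = {(byte) 0xfb, (byte) 0xff};
        System.out.println(standard(data)); // +/8=
        System.out.println(urlSafe(data));  // -_8
    }
}
```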

4. DAO layer

@Repository
public class ShortLinkManagerImpl implements ShortLinkManager {

	@Autowired
	private ShortLinkMapper shortLinkMapper;

	@Override
	public void saveShortLink(String shortLink, long longLinkHash, String longLink) {
		ShortLinkDO shortLinkDO = ShortLinkDO.builder()
			.shortLink(shortLink)
			.longLinkHash(longLinkHash)
			.longLink(longLink)
			.status(true)
			.build();
		shortLinkMapper.insert(shortLinkDO);
	}

	@Override
	public String getShortLink(long longLinkHash, String longLink) {
		Wrapper<ShortLinkDO> wrapper = Wrappers.lambdaQuery(ShortLinkDO.class)
			.select(ShortLinkDO::getShortLink)
			.eq(ShortLinkDO::getLongLinkHash, longLinkHash)
			.eq(ShortLinkDO::getLongLink, longLink)
			.last(CommonConst.LIMIT_SQL);
		ShortLinkDO shortLinkDO = shortLinkMapper.selectOne(wrapper);
		return Optional.ofNullable(shortLinkDO).map(ShortLinkDO::getShortLink).orElse(null);
	}

	@Override
	public boolean isShortLinkRepeated(String shortLink) {
		Wrapper<ShortLinkDO> wrapper = Wrappers.lambdaQuery(ShortLinkDO.class).eq(ShortLinkDO::getShortLink, shortLink);
		return shortLinkMapper.selectCount(wrapper) > 0;
	}
}

5. Business layer

@Service
public class ShortLinkServiceImpl implements ShortLinkService {

	@Autowired
	private ShortLinkManager shortLinkManager;

	@Override
	public String generateShortLink(String longLink) {
		long longLinkHash = Hashing.murmur3_32_fixed().hashString(longLink, StandardCharsets.UTF_8).padToLong();
		// Look up an existing short link by the long link's hash and the long link itself
		String shortLink = shortLinkManager.getShortLink(longLinkHash, longLink);
		if (StringUtils.isNotBlank(shortLink)) {
			return shortLink;
		}
		// No existing record: generate a short link, salting the hash so that collisions can be retried
		return regenerateOnHashConflict(longLink, longLinkHash);
	}

	private String regenerateOnHashConflict(String longLink, long longLinkHash) {
		// Use an auto-increment sequence (Snowflake ID) as the random salt
		long uniqueIdHash = Hashing.murmur3_32_fixed().hashLong(SnowFlakeUtils.nextId()).padToLong();
		// Subtracting mainly keeps the resulting value small
		String shortLink = Base62Utils.encodeToBase62String(Math.abs(longLinkHash - uniqueIdHash));
		if (!shortLinkManager.isShortLinkRepeated(shortLink)) {
			shortLinkManager.saveShortLink(shortLink, longLinkHash, longLink);
			return shortLink;
		}
		// The short link is already taken: try again with a new salt
		return regenerateOnHashConflict(longLink, longLinkHash);
	}

}


6. Test cases

@SpringBootTest(classes = Application.class)
public class ApplicationTest {

	@Autowired
	private ShortLinkService shortLinkService;

	@Test
	public void generateShortLinkTest() {
		String shortLink = shortLinkService.generateShortLink("https://www.baidu.com/");
		System.err.println("Generated short link: " + shortLink);
	}
}

Console output:

Generated short link: D4PTSU

Note: the generated short link is at most 6 characters long (usually exactly 6), because the value being encoded is below 2^32, which is less than 62^6. Remember to choose a short domain for the short link redirect service.


Origin blog.csdn.net/lingbomanbu_lyl/article/details/128264414