IM Message ID Technology Topic (6): Deep Decryption of Didi's High-Performance ID Generator (Tinyid)

1. Introduction

In medium and large IM systems, the strategy for generating unique chat message IDs is a very important technical point. It is no exaggeration to say that the message ID runs through almost every algorithm, piece of logic and process in the entire life cycle of a chat. How good the ID generation strategy is may directly determine how hard certain parts of the system are to design.

In small and medium IM scenarios, message IDs can be handled simply: as long as they are unique, almost anything works. In medium and large scenarios, however, distributed performance and consistency must also be taken into account, and the problem becomes a different matter entirely.

In short, message ID generation in IM can be treated as shallowly or as deeply as you like: it looks simple, but the space left to explore is very large. That is why the Instant Messaging Network (52im.net) has compiled this series of articles on "IM Message ID Technology Topics". The more technical knowledge you accumulate, the more room you have to maneuver. I hope this series gives you useful inspiration when choosing an ID generation approach.

Also, although the Instant Messaging Network focuses on the development of instant messaging systems, this series is not limited to real-time communication systems such as IM or message push; it applies equally to any application that needs unique IDs.

This article covers the technical principles and usage of Tinyid, Didi's open-source distributed ID generator, in the hope of further broadening your technical horizons in this area.

Study and exchange:

  • Instant messaging/push technology development exchange group 5: 215477170 [recommended]
  • Introduction to mobile IM development: "One entry is enough for novices: Develop mobile IM from scratch"
  • Open source IM framework source code: https://github.com/JackJiang2011/MobileIMSDK

(This article was published synchronously at: http://www.52im.net/thread-3129-1-1.html )

2. Series contents

This article is the sixth in the "IM Message ID Technology Topics" series. The full list is as follows:

"IM Message ID Technology Topic (1): Practice of Generating Massive IM Chat Message Sequence Numbers in WeChat (Principles of Algorithms)"

"IM Message ID Technology Topic (2): Practice of Generating Massive IM Chat Message Sequence Numbers in WeChat (Disaster Recovery Solution)"

"IM Message ID Technology Topic (3): Decrypting the Chat Message ID Generation Strategy of Rongyun IM Products"

"IM Message ID Technology Topic (4): Deep Decryption of Meituan's Distributed ID Generation Algorithm"

"IM Message ID Technology Topic (5): Technical Implementation of Open Source Distributed ID Generator UidGenerator"

"IM Message ID Technology Topic (6): Deep Decryption of Didi's High-Performance ID Generator (Tinyid)" (* this article)

3. What is Tinyid?

Tinyid is a distributed ID generation system developed by Didi in Java, based on the database number segment algorithm.

Tinyid extends the number segment approach of Meituan's ID generation algorithm Leaf and adds support for multiple database masters. It offers two access methods, a REST API and a Java client, which makes it comparatively convenient to use. Unlike Meituan's Leaf, however, Tinyid only supports the number segment mode; Snowflake mode is not supported. (For Meituan's Leaf algorithm, see "IM Message ID Technology Topic (4): Deep Decryption of Meituan's Distributed ID Generation Algorithm".)

Tinyid is currently used by Didi's customer service department, accessed through tinyid-client, and generates 100 million IDs every day. In terms of performance, a single instance is said to reach 10 million QPS.

Its open source address is:

PS: The Tinyid project page contains the sentence "tinyid is not an official product of Didi, but code owned by Didi". My reading comprehension must be lacking: how should this sentence be understood?

4. The main technical characteristics of Tinyid

The main features are summarized as follows:

  • 1) Globally unique IDs of type long: a signed 64-bit integer, so at most 2^63 - 1 positive values;
  • 2) Trend-increasing IDs: IDs increase over time but are not necessarily contiguous, and a later ID is not guaranteed to be larger than an earlier one (similar to WeChat's ID generation strategy);
  • 3) Supports access over HTTP and through java-client;
  • 4) Supports fetching IDs in batches;
  • 5) Supports generating interleaved sequences such as 1, 3, 5, 7, 9, ...;
  • 6) Supports configuring multiple dbs.

Applicable scenarios: systems that only need numeric, trend-increasing IDs and can tolerate gaps and wasted IDs.

Inapplicable scenarios: businesses such as order IDs. Because most generated IDs are consecutive, outsiders can easily scan the database or infer information such as order volume.

In addition, WeChat's chat message ID generation algorithm is also based on number segments and trend-increasing IDs. If you are interested, see "IM Message ID Technology Topic (1): Practice of Generating Massive IM Chat Message Sequence Numbers in WeChat (Principles of Algorithms)".

5. Tinyid's technical advantages

Performance:

  • 1) HTTP mode: performance depends on the capacity of the HTTP server and on network transfer speed;
  • 2) java-client mode: IDs are generated locally, so the larger the step (segment length), the higher the QPS. With a sufficiently large segment, QPS can exceed 10 million.

Availability:

  • 1) If the db becomes unavailable, the server keeps working for a while because it caches segments;
  • 2) If multiple dbs are configured, the service stays available as long as one db survives;
  • 3) With tinyid-client, only one server needs to be alive; in theory, even if all servers go down, the client can keep issuing IDs for a while thanks to its cache.

6. Detailed explanation of Tinyid's technical principles

6.1 Technical points of ID generation system

In simple systems, we often use the db's auto-increment id to identify and store data. As systems grow more complex and data volume increases, splitting databases and tables (sharding) becomes a common solution, and db auto-increment can no longer meet the requirements.

This is where a globally unique ID generation system comes in handy, though this is only one of its application scenarios.

So, what capabilities should a mature id generation system have?

  • 1) Uniqueness: IDs must never repeat; global uniqueness is the most basic requirement;
  • 2) High performance: as a basic service, it should take as little time as possible, ideally generating IDs locally;
  • 3) High availability: 100% availability is hard to achieve, but it must come as close to 100% as possible;
  • 4) Ease of use: ready to use out of the box and easy to integrate, with a design and implementation kept as simple as possible.

6.2 The realization principle of Tinyid

Let's start with the most common ID generation method, the db's auto_increment, which everyone is surely familiar with.

I have also seen developers use this scheme to obtain IDs in practice. Its advantage is simplicity; its drawback is that each db access yields only one ID, so performance is relatively poor, the db is hit frequently, and db load is high.

Can this scheme be optimized? Can we fetch a batch of IDs from the db at once? The answer is, of course, yes.

A batch of IDs can be viewed as a range of IDs, for example (1000, 2000], which we call a "number segment". We request one number segment from the db at a time, load it into memory, and then generate IDs by incrementing in memory. When the segment is used up, we request a new one from the db. This greatly reduces the load on the db, and because IDs are generated directly in memory, performance improves substantially.

PS: A brief explanation of the number segment mode:

In number segment mode, self-incrementing IDs are fetched from the database in batches: each fetch retrieves a range, for example (1,1000], representing 1000 IDs, which is then loaded into memory.
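To make the idea concrete, here is a minimal Java sketch of a single in-memory number segment. The class name NumberSegment and the -1 "exhausted" return value are hypothetical choices for illustration, not Tinyid's API; the sketch only shows that once a segment (start, end] is loaded, IDs come from an atomic in-memory increment with no db access.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration: one number segment (start, end] held in memory.
// IDs are produced by an atomic in-memory increment; the db is only needed
// again once the segment has been used up.
final class NumberSegment {
    private final long endInclusive;  // last ID that belongs to this segment
    private final AtomicLong current; // last ID handed out so far

    NumberSegment(long startExclusive, long endInclusive) {
        this.endInclusive = endInclusive;
        this.current = new AtomicLong(startExclusive);
    }

    /** Returns the next ID, or -1 once the segment is exhausted. */
    long nextId() {
        long id = current.incrementAndGet();
        return id <= endInclusive ? id : -1;
    }
}
```

For the segment (1000, 2000], the first call returns 1001 and the 1000th call returns 2000; the next call returns -1, which is the caller's signal to request a new segment from the db.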

So how to design the table that saves the db number segment? Let's continue reading.

6.3 DB number segment algorithm description

As shown in the above table, the obvious idea is for the db to store a range (start_id, end_id) directly. When this batch of IDs is used up, we run an update setting start_id=2000 (the old end_id) and end_id=3000 (end_id+1000); if the update succeeds, the next ID range has been obtained. On closer inspection, however, start_id plays no role: the new segment is always (end_id, end_id+1000].

So we change the db design to the following:

As shown in the table above:

  • 1) biz_type identifies the business type; IDs of different businesses are isolated from each other;
  • 2) max_id is the end_id above and represents the current maximum available ID;
  • 3) step is the length of a number segment; a reasonable value can be chosen according to each business's QPS;
  • 4) version is an optimistic lock that is incremented on every update to keep concurrent updates correct.

Then we can get a usable number segment through the following steps:

A. Query the current max_id: select id, biz_type, max_id, step, version from tiny_id_info where biz_type='test';

B. Compute the new max_id: new_max_id = max_id + step;

C. Update max_id in the db: update tiny_id_info set max_id=#{new_max_id}, version=version+1 where id=#{id} and max_id=#{max_id} and version=#{version};

D. If the update succeeds, a usable segment has been obtained: (max_id, new_max_id];

E. If the update fails, another thread may have taken the segment; go back to step A and retry.
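Steps A to E form a classic optimistic-lock retry loop. The following Java sketch simulates it in memory: TinyIdInfoRow is a hypothetical stand-in for one tiny_id_info row, and its synchronized update method plays the role of the single-row conditional UPDATE, returning the number of affected rows. A real implementation would issue the SQL above through JDBC or an ORM instead.

```java
// Hypothetical stand-in for one tiny_id_info row. The synchronized methods
// mimic the atomicity the db gives a single SELECT or conditional UPDATE.
final class TinyIdInfoRow {
    final long id;
    final String bizType;
    final long step;
    private long maxId;
    private long version; // starts at 0

    TinyIdInfoRow(long id, String bizType, long maxId, long step) {
        this.id = id;
        this.bizType = bizType;
        this.maxId = maxId;
        this.step = step;
    }

    // Step A: select max_id, version ... where biz_type = ?
    synchronized long[] select() {
        return new long[] {maxId, version};
    }

    // Step C: update ... set max_id = ?, version = version + 1
    //         where id = ? and max_id = ? and version = ?
    // Returns the number of affected rows (0 or 1).
    synchronized int update(long rowId, long newMaxId, long expectedMaxId, long expectedVersion) {
        if (rowId != id || maxId != expectedMaxId || version != expectedVersion) {
            return 0;
        }
        maxId = newMaxId;
        version++;
        return 1;
    }
}

final class SegmentAllocator {
    /** Reserves the next segment and returns it as {startExclusive, endInclusive}. */
    static long[] nextSegment(TinyIdInfoRow row) {
        while (true) {
            long[] snapshot = row.select();                  // A: read max_id and version
            long maxId = snapshot[0];
            long version = snapshot[1];
            long newMaxId = maxId + row.step;                // B: new_max_id = max_id + step
            if (row.update(row.id, newMaxId, maxId, version) == 1) { // C: conditional update
                return new long[] {maxId, newMaxId};         // D: segment is (max_id, new_max_id]
            }
            // E: another thread updated the row first; retry from step A
        }
    }
}
```

Because the update only succeeds when both max_id and version still match what was read in step A, two threads that read the same row cannot both reserve the same segment; the loser sees 0 affected rows and retries.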

6.4 A simple architecture for the number segment scheme

With the steps above, the number segment generation logic is complete.

Our ID generation service architecture might then look like this:

As shown in the figure above, the ID generation system exposes an HTTP service. Each request passes through the load-balancing router to one of the tinyid-servers, which returns an ID from its preloaded segment.

If no segment has been loaded yet, or the current one is used up, the server requests a new one from the db. Because segment allocation is atomic across servers, the segments held by different servers never overlap, so generated IDs are never duplicated.

As can be seen:

  • 1) If a tinyid-server restarts, the unused part of its segment is lost, wasting some IDs;
  • 2) IDs are therefore not contiguous;
  • 3) Successive requests may hit different machines, so IDs are not monotonically increasing but trend-increasing, which is acceptable for most businesses.
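The following Java sketch illustrates these three points with two server nodes sharing one allocator. SharedAllocator is a hypothetical stand-in for the db row (an AtomicLong replaces the conditional update for brevity), and ServerNode is a hypothetical stand-in for a tinyid-server; both names are invented for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the shared db row: each reserve() atomically
// hands out the next segment (old max_id, old max_id + step].
final class SharedAllocator {
    private final AtomicLong maxId = new AtomicLong(0);
    private final long step;

    SharedAllocator(long step) { this.step = step; }

    long reserve() { return maxId.getAndAdd(step); } // returns the old max_id
    long step() { return step; }
}

// Hypothetical stand-in for one tinyid-server holding a segment in memory.
final class ServerNode {
    private final SharedAllocator allocator;
    private long current; // last ID handed out
    private long end;     // inclusive end of the current segment

    ServerNode(SharedAllocator allocator) {
        this.allocator = allocator;
        load();
    }

    private void load() {
        long start = allocator.reserve();
        current = start;
        end = start + allocator.step();
    }

    synchronized long nextId() {
        if (current >= end) {
            load(); // segment used up: fetch the next one
        }
        return ++current;
    }

    // Simulates a restart: whatever was left of the in-memory segment is lost.
    synchronized void restart() {
        load();
    }
}
```

With a step of 1000, node A holds (0, 1000] and node B holds (1000, 2000]. Alternating requests yield 1, 1001, 2, so IDs trend upward without being monotonic, and restarting A throws away 3 through 1000 before it continues from 2001.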

6.5 Problems with the simple architecture

At this point, a simple ID generation system is complete. Are there still problems?

Recall our initial id generation system requirements: high performance, high availability, and ease of use.

In the above architecture, there are at least the following problems:

  • 1) When a segment is used up, the server must access the db to load a new one, and the db update may hit version conflicts; at those moments ID generation latency rises noticeably;
  • 2) The db is a single point: high-availability setups such as master-slave help, but it is still a single point;
  • 3) Fetching IDs over HTTP incurs network overhead, so performance and availability are not ideal.

6.6 Optimizations and the final architecture

1) Double segment cache:

Accessing the db when a segment runs out causes latency spikes. The natural fix is to load the next segment asynchronously once the current one has been consumed to a certain point, so that a usable segment is always in memory and performance fluctuations are avoided.
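A double segment cache can be sketched in Java as follows. This is a simplified illustration under stated assumptions, not Tinyid's CachedIdGenerator: segments come from a hypothetical Supplier<long[]> returning {startExclusive, endInclusive}, the next segment starts loading once a configurable fraction of the current one remains (the 20% used in the example is an assumed value), and a single lock guards nextId.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical double segment ("double buffer") ID generator sketch.
// segmentSource returns {startExclusive, endInclusive} for a freshly
// reserved segment, e.g. by running steps A-E against the db.
final class DoubleSegmentIdGenerator {
    private final Supplier<long[]> segmentSource;
    private final double preloadRatio; // assumed: preload when this fraction is left
    private final ExecutorService loader = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "segment-loader");
        t.setDaemon(true); // do not keep the JVM alive
        return t;
    });

    private long current;   // last ID handed out
    private long end;       // inclusive end of the current segment
    private long preloadAt; // ID at which the next segment starts loading
    private CompletableFuture<long[]> next; // second buffer: loading or loaded

    DoubleSegmentIdGenerator(Supplier<long[]> segmentSource, double preloadRatio) {
        this.segmentSource = segmentSource;
        this.preloadRatio = preloadRatio;
        install(segmentSource.get()); // the first segment is loaded synchronously
    }

    private void install(long[] segment) {
        current = segment[0];
        end = segment[1];
        preloadAt = end - (long) ((end - current) * preloadRatio);
    }

    synchronized long nextId() {
        if (current >= end) {
            // Current buffer exhausted: switch to the preloaded one,
            // waiting only if it is still loading.
            long[] segment = (next != null) ? next.join() : segmentSource.get();
            next = null;
            install(segment);
        }
        long id = ++current;
        if (id >= preloadAt && next == null) {
            // Start fetching the next segment in the background.
            next = CompletableFuture.supplyAsync(segmentSource, loader);
        }
        return id;
    }
}
```

With segments of 10 IDs and a ratio of 0.2, the next segment starts loading when ID 8 is handed out, so by the time ID 10 is used the switch usually finds the second buffer ready. A single lock keeps the sketch easy to verify; a production generator would keep the fast path lock-free (for example with an AtomicLong) and lock only when switching segments.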

2) Multi-db support:

With a single db master, segments cannot be obtained when that db is unavailable (down, or with large master-slave lag). We can instead support multiple dbs, say A and B, and fetch segments from either one at random. But if A and B hand out the same segment, how do we avoid duplicate IDs? Tinyid's answer is to let A generate only even IDs and B only odd IDs. The db design gains two corresponding fields, as shown below:

delta is the increment between consecutive IDs, and remainder is the required remainder. For example, set delta to 2 for both A and B, and remainder to 0 for A and 1 for B; A then produces only even IDs and B only odd ones. With these two fields, the number of dbs can be chosen flexibly to suit the user's needs, and users can also be offered sequences such as odd-only IDs.
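The effect of delta and remainder can be checked with a small Java sketch. ModuloSegment is a hypothetical class: given a segment (begin, max] from one db and that db's delta and remainder, it aligns the first ID so that id % delta == remainder and then steps by delta. It illustrates the rule described above; it is not Tinyid's own code.

```java
// Hypothetical sketch of the delta/remainder rule: a db configured with
// (delta, remainder) only hands out IDs where id % delta == remainder.
// Not thread-safe; single-threaded illustration only.
final class ModuloSegment {
    private final long max;   // inclusive end of the segment
    private final long delta; // distance between consecutive IDs
    private long current;     // last ID handed out (or the aligned start)

    ModuloSegment(long begin, long max, long delta, long remainder) {
        if (delta <= 0 || remainder < 0 || remainder >= delta) {
            throw new IllegalArgumentException("need delta > 0 and 0 <= remainder < delta");
        }
        this.max = max;
        this.delta = delta;
        // Find the first ID after begin that has the required remainder,
        // then step back by delta so the first nextId() returns it.
        long first = begin + 1;
        while (Math.floorMod(first, delta) != remainder) {
            first++;
        }
        this.current = first - delta;
    }

    /** Returns the next ID of this segment, or -1 once it is exhausted. */
    long nextId() {
        if (current + delta > max) {
            return -1;
        }
        current += delta;
        return current;
    }
}
```

If dbs A and B both hand out the segment (0, 10], A yields only 2, 4, 6, 8, 10 and B only 1, 3, 5, 7, 9, so their IDs never collide. Note that each segment of length step then yields only about step / delta IDs per db.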

3) Add tinyid-client:

Fetching IDs over HTTP incurs network overhead. Could IDs be generated locally instead?

For this purpose, we provide tinyid-client. It requests usable segments from tinyid-server and builds the double segment cache locally, so ID generation becomes a purely local operation and performance improves greatly. Thanks to the local double segment cache, the client can also ride out tinyid-server downtime for a while, which greatly improves availability.

4) The final architecture of tinyid:

In the end our architecture may look like this:

A more detailed view of the code call logic:

The call logic shown in the figure above works as follows:

  • 1) nextId and getNextSegmentId are the two HTTP interfaces that tinyid-server exposes;
  • 2) nextId returns the next ID. Each call passes a bizType; the ID data of each bizType is isolated, and the ID is produced by that bizType's IdGenerator;
  • 3) getNextSegmentId returns the next usable segment; tinyid-client obtains its segments through this interface;
  • 4) IdGenerator is the interface for ID generation;
  • 5) IdGeneratorFactory is the factory that produces concrete IdGenerator instances, one per biz_type. Thanks to the factory, a new biz_type can be added to the db at any time without restarting the service;
  • 6) IdGeneratorFactory has two subclasses, IdGeneratorFactoryServer and IdGeneratorFactoryClient, which differ in how getNextSegmentId is implemented: one fetches from the db (DbGet), the other over HTTP (HttpGet);
  • 7) CachedIdGenerator is the concrete ID generator. It holds the currentSegmentId and nextSegmentId objects and carries out the core nextId flow; each ID is ultimately produced by AtomicLong.addAndGet(delta).
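The structure in the list above can be sketched in Java as follows. The names IdGenerator, IdGeneratorFactory, CachedIdGenerator, getNextSegmentId and nextId follow the description; the SegmentIdService interface and all signatures are assumptions made for this sketch and do not reproduce Tinyid's actual source. Instead of two factory subclasses, the sketch passes in the segment source, which is exactly where IdGeneratorFactoryServer (DbGet) and IdGeneratorFactoryClient (HttpGet) differ. For brevity it keeps only the current segment and uses a lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch; names follow the description, signatures are assumptions.
interface IdGenerator {
    long nextId();
}

// Hypothetical source of segments as {startExclusive, endInclusive}: backed by the
// db on the server side (DbGet) or by an HTTP call to tinyid-server on the client (HttpGet).
interface SegmentIdService {
    long[] getNextSegmentId(String bizType);
}

// Holds only the current segment; the CachedIdGenerator described above also keeps
// a preloaded nextSegmentId and uses AtomicLong.addAndGet(delta).
final class CachedIdGenerator implements IdGenerator {
    private final String bizType;
    private final SegmentIdService service;
    private long current; // last ID handed out
    private long end;     // inclusive end of the current segment

    CachedIdGenerator(String bizType, SegmentIdService service) {
        this.bizType = bizType;
        this.service = service;
        load();
    }

    private void load() {
        long[] segment = service.getNextSegmentId(bizType);
        current = segment[0];
        end = segment[1];
    }

    @Override
    public synchronized long nextId() {
        if (current >= end) {
            load(); // segment used up
        }
        return ++current;
    }
}

// One generator per biz_type, created lazily on first use, so a biz_type
// added to the db later works without restarting the service.
final class IdGeneratorFactory {
    private final Map<String, IdGenerator> generators = new ConcurrentHashMap<>();
    private final SegmentIdService segmentIdService;

    IdGeneratorFactory(SegmentIdService segmentIdService) {
        this.segmentIdService = segmentIdService;
    }

    IdGenerator getIdGenerator(String bizType) {
        return generators.computeIfAbsent(bizType,
                type -> new CachedIdGenerator(type, segmentIdService));
    }
}
```

Each biz_type gets its own lazily created generator, so two business types such as "test" and "order" draw from independent counters, and a biz_type added to the db later is picked up on first use.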

If you are interested in the concrete implementation, you can read the source code directly:

7. Tinyid's best practices

1) Deploy tinyid-server on multiple machines across multiple data centers (computer rooms):

Multi-data-center deployment gives higher availability; clients accessing over HTTP need to account for cross-data-center latency.

2) Using tinyid-client to obtain IDs is recommended, for the following benefits:

a. IDs are generated locally (by calling AtomicLong.addAndGet), which greatly increases performance;

b. The client accesses the server only infrequently, which reduces server load;

c. Because access is infrequent, latency is not a concern even if the client and server are in different data centers;

d. Even if all servers go down, the client can keep working for a while thanks to its preloaded segments.

Note: with tinyid-client, if client machines restart frequently, more IDs may be wasted; in that case, consider the HTTP method instead.

3) Configure two or more dbs:

When multiple dbs are configured, the service stays available as long as one db survives. Note that with two dbs configured, every time a new business is added, its data must be written to both dbs.
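Random selection with failover across several dbs can be sketched in Java as below. SegmentDb and MultiDbSegmentService are hypothetical classes; each simulated db keeps its own max_id plus its delta and remainder, and the service tries the dbs in random order until one answers, so allocation keeps working as long as at least one db is up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simulated db: keeps its own max_id and its delta/remainder.
final class SegmentDb {
    final String name;
    final long step;
    final long delta;
    final long remainder;
    volatile boolean up = true; // set to false to simulate an outage
    private long maxId;

    SegmentDb(String name, long step, long delta, long remainder) {
        this.name = name;
        this.step = step;
        this.delta = delta;
        this.remainder = remainder;
    }

    /** Returns {startExclusive, endInclusive, delta, remainder}. */
    synchronized long[] nextSegment() {
        if (!up) {
            throw new IllegalStateException(name + " is down");
        }
        long start = maxId;
        maxId += step;
        return new long[] {start, maxId, delta, remainder};
    }
}

// Hypothetical service: tries the configured dbs in random order and
// succeeds as long as at least one of them is reachable.
final class MultiDbSegmentService {
    private final List<SegmentDb> dbs;

    MultiDbSegmentService(List<SegmentDb> dbs) {
        this.dbs = new ArrayList<>(dbs);
    }

    long[] getNextSegmentId() {
        List<SegmentDb> order = new ArrayList<>(dbs);
        Collections.shuffle(order); // random choice between the configured dbs
        for (SegmentDb db : order) {
            try {
                return db.nextSegment();
            } catch (IllegalStateException e) {
                // this db is unavailable; try the next one
            }
        }
        throw new IllegalStateException("all dbs are down");
    }
}
```

The returned array carries delta and remainder along with the segment bounds, because the generator needs them to produce only that db's share of IDs (for example the odd IDs when db B answers).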

8. How to call Tinyid?

For reasons of space, usage is not covered in detail here. If you are interested, see "Tinyid: Didi Open Source Ten Million Level Concurrent Distributed ID Generator".

9. Reference materials

[1] Always asked about distributed IDs in interviews? Throw Tinyid at them

[2] Tinyid: Didi open source tens of millions of concurrent distributed ID generator

[3] Tinyid project Chinese README

[4] How does Didi's open source Tinyid generate billions of IDs every day?

Appendix: More popular technical articles on IM development

"One entry is enough for novices: develop mobile IM from scratch"

"Mobile IM developers must read (1): easy to understand, understand the 'weak' and 'slow' of mobile networks"

"A Must-Read for Mobile IM Developers (2): Summary of the Most Complete Mobile Weak Network Optimization Method in History"

"From the perspective of the client to talk about the message reliability and delivery mechanism of the mobile terminal IM"

"Summary of optimization methods for short connection of modern mobile network: request speed, weak network adaptation, security assurance"

"How to ensure the efficiency and real-time performance of large-scale group message push in mobile IM?"

"Technical issues that need to be faced in mobile IM development"

"Is it better to use byte stream or character stream for the development of IM?"

"Does anyone know the mainstream implementation of voice message chat?"

"Implementation of IM Message Delivery Guarantee Mechanism (1): Guarantee the reliable delivery of online real-time messages"

"Implementation of IM Message Delivery Guarantee Mechanism (2): Guaranteeing the Reliable Delivery of Offline Messages"

"How to ensure the 'sequence' and 'consistency' of IM real-time messages?"

"A low-cost method to ensure the timing of IM messages"

"Should I use 'push' or 'pull' for online status synchronization in IM single chat and group chat?"

"IM group chat messages are so complicated, how to ensure that they are not lost or repetitive?"

"Talk about the optimization of login request in the development of mobile terminal IM"

"How to save data by pulling data during IM login on the mobile terminal?"

"On the principle of multi-sign-in and message roaming on mobile IM"

"How to design a 'failure retry' mechanism for a completely self-developed IM?"

"Is it so difficult to develop IM yourself? Teach you how to make an Android version of simple IM (with source code)"

"IM Development Fundamentals Supplementary Lesson (6): Does the database use NoSQL or SQL? Enough to read this!"

"Suitable for novices: develop an IM server from scratch (based on Netty, with complete source code)"

"Pick up the keyboard and do it: work with me to develop a distributed IM system by hand"

"Suitable for novices: teach you to use Go to quickly build a high-performance and scalable IM system (source code)"

"What is the realization principle of 'Nearby' function in IM? How to implement it efficiently?"

"IM Development Fundamentals Supplementary Lesson (7): Principles and Design Ideas of the Mainstream Mobile Terminal Account Login Method"

"I need to scan and log in with a mobile phone for IM? Let's take a look at the technical principle of WeChat's scan code login function"

"IM Development Collection: The most complete in history, a summary of various function parameters and logic rules of WeChat"

"IM development and dry goods sharing: how do I solve the problem of client freezes caused by a large number of offline messages"

"IM development and dry goods sharing: how to elegantly realize the reliable delivery of a large number of offline messages"

"IM development dry goods sharing: Youzan mobile terminal IM componentized SDK architecture design practice"


This article has also been published on the "Instant Messaging Technology Circle" official account; you are welcome to follow it.



Origin blog.csdn.net/hellojackjiang2011/article/details/108478859