IM system features: unread messages (problems and solutions)

Most of us have seen an app show an "unread message" reminder, tapped it, and found nothing new. For users with an obsessive streak, this is really unacceptable. The opposite also happens: a new message arrives but the unread count is wrong, so the user is never reminded. That can make the user miss important messages and seriously hurts the user experience. This shows how important the unread count is along the whole path by which a message reaches the user.

Why messages and unread counts become inconsistent

So in instant messaging, what causes the message and the unread count to become inconsistent? To answer this, we first need to understand two concepts: "conversation unread" (also called "session unread" below) and "total unread". Let's look at each in turn.

1. Conversation unread: the number of unread messages between the current user and one chat peer. For example, user A receives 2 messages from user B. For user A, the conversation unread with user B is now 2. When user A opens the chat page with user B and views the two messages, that conversation unread becomes 0. The same logic applies to group chats and live broadcast rooms, except that the peer is a group or a room.

2. Total unread: the number of all unread messages of the current user, that is, the sum of all conversation unreads. For example, user A receives 3 messages from user C in addition to the 2 messages from user B. For user A, the total unread is then 5. If user A views the 2 messages from user B, the total unread becomes 3.
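The relationship between the two counts can be sketched in a few lines of Python. This is only an in-memory illustration of the definitions above. The function names and the dictionary layout are assumptions made for the example, not part of any real IM service.

```python
from collections import defaultdict

# Hypothetical in-memory model: unread count per (user, peer) conversation.
session_unread = defaultdict(int)

def receive(user, peer, count=1):
    """Record `count` new messages from `peer` for `user`."""
    session_unread[(user, peer)] += count

def read_conversation(user, peer):
    """The user opens the chat with `peer`, so that conversation unread is cleared."""
    session_unread[(user, peer)] = 0

def total_unread(user):
    """Total unread is the sum of all conversation unreads of the user."""
    return sum(n for (u, _), n in session_unread.items() if u == user)

receive("A", "B", 2)           # user B sends 2 messages to user A
receive("A", "C", 3)           # user C sends 3 messages to user A
before = total_unread("A")     # 2 + 3
read_conversation("A", "B")    # user A views the messages from user B
after = total_unread("A")      # only the 3 messages from user C remain
```

Here `before` is 5 and `after` is 3, matching the example with users A, B, and C.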

Maintaining session unread and total unread separately

Since total unread is just the sum of the session unreads, could we store only the session unreads and compute the total on demand? In theory, yes. In practice, however, many instant messaging implementations maintain session unread and total unread separately. The reason is that total unread is used very frequently in many business scenarios. For example, every message push has to carry the total unread so that it can be shown on the app icon badge.

In addition, some apps synchronize total unread between client and server by regular polling. For example, the total unread in Weibo's message bar includes not only instant messages but also the unread counts of other business notifications. Accumulating total unread on the client as message pushes arrive would therefore be inaccurate, so polling is used to synchronize it instead.

If the frequently used total unread is obtained by aggregating all session unreads on every request, performance is acceptable as long as the user has few active sessions. Once the number of sessions is large, however, the counts have to be fetched from storage in multiple reads, and some of those reads can fail because of timeouts or other reasons, which makes the computed total smaller than it should be.

Moreover, fetching and summing many counts on every request easily becomes a performance bottleneck. For these reasons, total unread and session unread are generally maintained separately.
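To make the cost difference concrete, here is a rough sketch that counts storage reads for both approaches. `CountingStore` and the key names are hypothetical stand-ins for whatever storage a real service would use.

```python
class CountingStore:
    """Hypothetical key-value store that counts how many reads it serves."""

    def __init__(self):
        self.data = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.data.get(key, 0)

    def incr(self, key, delta=1):
        self.data[key] = self.data.get(key, 0) + delta

store = CountingStore()
peers = [f"peer{i}" for i in range(200)]
for peer in peers:
    store.incr(f"session_unread:A:{peer}")   # one unread message per session
    store.incr("total_unread:A")             # separately maintained total

# Approach 1: aggregate every session unread on each request.
store.reads = 0
aggregated = sum(store.get(f"session_unread:A:{p}") for p in peers)
reads_for_aggregate = store.reads

# Approach 2: read the separately maintained counter.
store.reads = 0
maintained = store.get("total_unread:A")
reads_for_counter = store.reads
```

Both approaches return the same total, but aggregation needs one read per session (200 here) while the maintained counter needs a single read. Each of those 200 reads is also a chance for a timeout that would lower the computed total.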

The unread consistency problem

Maintaining total unread and session unread separately solves the performance problem of total unread being accessed at high frequency, but it introduces a new problem: unread consistency.

Unread consistency means that the maintained total unread must equal the sum of the session unreads. If the two diverge, the user sees either "a new message has arrived, but neither the app icon badge nor the message bar shows an unread reminder", or "there is an unread reminder, but after tapping it no conversation with a new message can be found".

Neither situation is something we want to see. So how do they happen? Let's look at the first case:

1. User A sends a message to user B. User B's initial unread state is: session unread with user A is 0, and total unread is also 0.

2. After the message arrives at the IM service, the service performs the add-unread operation in two steps: first it adds 1 to user B's session unread with user A, then it adds 1 to user B's total unread.

3. Suppose the first step succeeds and the second step fails. The IM service then pushes the message to user B. At this point user B's unread state is: session unread with user A is 1, and total unread is 0.

4. The consequence of the failed second step is that user B does not know a new message has arrived and may miss it.

In this case, the exception in the second step (adding to total unread) is what leaves the unread count inconsistent with the messages.
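The first case can be reproduced with a small simulation. `FlakyCounterStore`, its `failing_keys` switch, and the key names are hypothetical and exist only to inject the failure described above.

```python
class FlakyCounterStore:
    """Hypothetical store whose writes to selected keys can be made to fail."""

    def __init__(self):
        self.data = {}
        self.failing_keys = set()

    def incr(self, key):
        if key in self.failing_keys:
            raise TimeoutError(f"write to {key} timed out")
        self.data[key] = self.data.get(key, 0) + 1

    def get(self, key):
        return self.data.get(key, 0)

def add_unread(store, user, peer):
    """Two independent writes; nothing ties their success together."""
    store.incr(f"session_unread:{user}:{peer}")   # step 1
    store.incr(f"total_unread:{user}")            # step 2

store = FlakyCounterStore()
store.failing_keys.add("total_unread:B")          # make step 2 fail
try:
    add_unread(store, "B", "A")
except TimeoutError:
    pass  # the message is still pushed to user B

session = store.get("session_unread:B:A")
total = store.get("total_unread:B")
```

After the failed second write, `session` is 1 while `total` is 0, which is exactly the state in step 3.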

So is everything fine as long as every unread operation executes successfully? Let's look at the second case.

1. User A sends a message to user B. User B's initial unread state is: session unread with user A is 0, and total unread is also 0.

2. After the message arrives at the IM service, the service starts the add-unread operation and performs the first step: it adds 1 to user B's session unread with user A.

3. At this moment, the server performing the add-unread operation slows down for some reason. Meanwhile, user B taps the chat session with user A in the app, which triggers the clear-unread operation.

4. The first step of clearing sets user B's session unread with user A to 0, and the second step then clears user B's total unread.

5. After the clear-unread operation has finished, the server performing the add-unread operation resumes and executes its second step, adding 1 to user B's total unread. The two counts are now inconsistent: session unread is 0 but total unread is 1.

6. The consequence is that after leaving the session, user B sees an unread reminder, but after tapping it cannot find any chat session with unread messages.
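The interleaving in the second case can be replayed deterministically by calling the individual steps in the order described above. The step functions are hypothetical. The clamped decrement in `clear_step2` is an assumption about how a service might clear the total.

```python
# Hypothetical counters for user B, who has a single conversation with user A.
counters = {"session_unread:B:A": 0, "total_unread:B": 0}

def add_step1():
    counters["session_unread:B:A"] += 1

def add_step2():
    counters["total_unread:B"] += 1

def clear_step1():
    cleared = counters["session_unread:B:A"]
    counters["session_unread:B:A"] = 0
    return cleared

def clear_step2(cleared):
    # Assumption: the total is reduced by the cleared amount, never below zero.
    counters["total_unread:B"] = max(0, counters["total_unread:B"] - cleared)

add_step1()                # the add starts, then its server stalls
cleared = clear_step1()    # user B opens the chat: session unread -> 0
clear_step2(cleared)       # total unread stays at 0
add_step2()                # the stalled add resumes: total unread -> 1
```

Every individual step succeeds, yet the final state has a session unread of 0 and a total unread of 1.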

Let's analyze the causes of these two inconsistencies. In both cases the two unread updates are not atomic: in the first case one update succeeds and the other fails, and in the second case concurrent updates interleave and overwrite each other. To solve these problems, the two unread updates must be made atomic.

Solution:

Ensure the atomicity of unread updates

So how can we update the two unread counts atomically in a distributed scenario? A common approach is a distributed lock: acquire the lock before each modification and release it afterwards.

1. Distributed lock

There are many ways to implement a distributed lock. One relies on a unique constraint in a database: whether inserting a fixed record succeeds determines whether the lock was acquired. Another uses a distributed cache, for example Memcached's add command or Redis's SETNX command.

Note, however, that distributed locks have problems of their own. Because every update goes through an extra layer of resource access, locking reduces throughput. Lock management and exception handling are also prone to bugs: the resource that holds the lock can become a single point of failure, and the lock must still be released eventually if its holder crashes.
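As a rough sketch of the locking pattern, the example below uses an in-memory stand-in for a cache that offers "set if not exists" with an expiry. `FakeLockStore`, `add_unread_locked`, the 5-second TTL, and the key names are all assumptions for illustration. A real deployment would call the cache's own commands instead.

```python
import time
import uuid

class FakeLockStore:
    """In-memory stand-in for a cache offering 'set if not exists' with a TTL."""

    def __init__(self):
        self._locks = {}  # key -> (token, expires_at)

    def set_nx(self, key, token, ttl, now=None):
        now = time.monotonic() if now is None else now
        held = self._locks.get(key)
        if held is not None and held[1] > now:
            return False                      # someone else still holds the lock
        self._locks[key] = (token, now + ttl)
        return True

    def release(self, key, token):
        held = self._locks.get(key)
        if held is not None and held[0] == token:
            del self._locks[key]
            return True
        return False                          # lock expired and was taken over

def add_unread_locked(locks, counters, user, peer):
    """Update both counters while holding a per-user lock."""
    lock_key = f"lock:unread:{user}"
    token = uuid.uuid4().hex                  # identifies this lock holder
    if not locks.set_nx(lock_key, token, ttl=5.0):
        return False                          # caller may retry later
    try:
        s_key = f"session_unread:{user}:{peer}"
        t_key = f"total_unread:{user}"
        counters[s_key] = counters.get(s_key, 0) + 1
        counters[t_key] = counters.get(t_key, 0) + 1
        return True
    finally:
        locks.release(lock_key, token)        # only releases our own lock

locks = FakeLockStore()
counters = {}
first = add_unread_locked(locks, counters, "B", "A")
locks.set_nx("lock:unread:B", "another-holder", ttl=5.0)
blocked = add_unread_locked(locks, counters, "B", "A")
```

The TTL is what lets the lock be freed if its holder crashes, and the token check keeps a slow holder from releasing a lock that has already expired and been acquired by someone else. The second call returns False because another holder owns the lock, which is the throughput cost mentioned above.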

2. Resources that support transactions

Besides distributed locks, resources that support transactions can also be used to make the two unread updates atomic. A transaction provides a mechanism to "package multiple commands and execute them in order in one go", and it is not interrupted while it executes: the server processes other clients' commands only after it has executed all the commands in the transaction.

For example, Redis supports transactions through four commands: MULTI, DISCARD, EXEC and WATCH. Before each unread change, the keys to be modified are watched, and then a transaction performs both the session unread change and the total unread change. If the value of either watched key has been modified by the time the transaction is executed, the transaction fails, and the business layer can retry until the change succeeds.

If the unread counts are already stored in a resource that supports transactions, such as Redis, implementing an atomic change of the two counts is relatively simple. However, this solution still has performance problems. The watch operation is in effect an optimistic locking strategy, so in scenarios where unread counts change frequently (for example, a very popular group in which everyone speaks often), a change may need many retries before it finally succeeds. In that case execution efficiency is low and performance is relatively poor.
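The watch-and-retry loop can be sketched with an in-memory model that uses per-key version numbers to mimic the optimistic check. `VersionedStore`, its `watch` and `execute` methods, and the `before_exec` hook are hypothetical names for this illustration and are not the API of Redis or any client library.

```python
class VersionedStore:
    """In-memory stand-in for a store with optimistic (watch-style) transactions."""

    def __init__(self):
        self.values = {}
        self.versions = {}

    def get(self, key):
        return self.values.get(key, 0)

    def set(self, key, value):
        self.values[key] = value
        self.versions[key] = self.versions.get(key, 0) + 1

    def watch(self, keys):
        """Snapshot the current versions of the watched keys."""
        return {k: self.versions.get(k, 0) for k in keys}

    def execute(self, snapshot, writes):
        """Apply all writes, or none if any watched key changed since watch()."""
        if any(self.versions.get(k, 0) != v for k, v in snapshot.items()):
            return False
        for key, value in writes.items():
            self.set(key, value)
        return True

def add_unread(store, user, peer, max_retries=10, before_exec=None):
    s_key = f"session_unread:{user}:{peer}"
    t_key = f"total_unread:{user}"
    for attempt in range(1, max_retries + 1):
        snapshot = store.watch([s_key, t_key])
        writes = {s_key: store.get(s_key) + 1, t_key: store.get(t_key) + 1}
        if before_exec:
            before_exec(attempt)      # hook used below to simulate a concurrent writer
        if store.execute(snapshot, writes):
            return attempt            # number of attempts that were needed
    raise RuntimeError("too many conflicting updates")

store = VersionedStore()

def concurrent_message_from_c(attempt):
    # During the first attempt, a message from user C updates B's counters.
    if attempt == 1:
        store.set("session_unread:B:C", store.get("session_unread:B:C") + 1)
        store.set("total_unread:B", store.get("total_unread:B") + 1)

attempts = add_unread(store, "B", "A", before_exec=concurrent_message_from_c)
```

The first attempt is rejected because the watched total changed underneath it, and the retry succeeds, so `attempts` is 2 and the total (2) still equals the sum of the two session unreads. With many concurrent writers, the number of retries grows, which is the performance concern described above.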

3. Atomic embedded scripts

In fact, many storage resources support atomic embedded scripts to meet the business need for highly consistent changes across multiple records. Redis supports embedded Lua scripts, which execute multiple statements atomically. Using this feature, we can change total unread and session unread atomically inside a Lua script, and also implement more complex unread logic.

For example, we may not want unread counts to linger forever and bother users: if the user has not viewed and cleared an unread count for 7 days, it can expire and become invalid. This kind of business logic, "check for expiry while reading, and clear if expired", is convenient to implement in a Lua script.

Atomic embedded scripts provide atomicity while implementing complex business logic, and they also perform better than the distributed lock and watch-based transaction solutions above. Note, however, that Redis uses a single-threaded model on the server side, so Lua scripts should avoid remote access and other time-consuming operations. Otherwise a script can hang for a long time and make the entire resource unavailable.
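The sketch below shows the idea under stated assumptions. `ADD_UNREAD_LUA` is Lua source that a Redis-backed service might load; it is shown for reference only and is not executed here, and its key layout is an assumption. `ScriptedStore` is an in-memory Python model in which each method runs as one uninterruptible unit, including a hypothetical version of the 7-day expiry rule.

```python
import threading

# Lua source for reference only (not executed in this example).
# Assumed key layout: KEYS[1] is a hash of per-conversation unread counts for
# one user, KEYS[2] is that user's total unread, ARGV[1] is the conversation id.
ADD_UNREAD_LUA = """
redis.call('HINCRBY', KEYS[1], ARGV[1], 1)
return redis.call('INCR', KEYS[2])
"""

SEVEN_DAYS = 7 * 24 * 3600  # seconds

class ScriptedStore:
    """In-memory model: each method runs as one uninterruptible 'script'."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for single-threaded execution
        self.session = {}              # peer -> (count, time of oldest unread)
        self.total = 0

    def add_unread(self, peer, now):
        """Increase session unread and total unread together."""
        with self._lock:
            count, since = self.session.get(peer, (0, now))
            self.session[peer] = (count + 1, since)
            self.total += 1
            return self.total

    def read_total(self, now):
        """Expire conversations unread for more than 7 days, then return the total."""
        with self._lock:
            for peer, (count, since) in list(self.session.items()):
                if now - since > SEVEN_DAYS:
                    del self.session[peer]
                    self.total -= count
            return self.total

DAY = 24 * 3600
store = ScriptedStore()
store.add_unread("A", now=0)
store.add_unread("A", now=10)
store.add_unread("C", now=6 * DAY)
fresh = store.read_total(now=6 * DAY)   # nothing has expired yet
later = store.read_total(now=8 * DAY)   # the unreads from A are over 7 days old
```

Here `fresh` is 3 and `later` is 1: the expiry check and both counter changes happen in the same atomic step, so the total never drifts away from the sum of the session unreads.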

Summary:

In this lesson, we first looked at why unread counts matter in instant messaging, and then analyzed why unread counts and messages become inconsistent. The main reasons are that in most business scenarios total unread and session unread need to be maintained independently, while the two updates can differ in whether they succeed and can overwrite each other under concurrency. To make the two updates atomic, there are three common approaches:

1. Distributed locks are widely applicable, but execution efficiency is poor and lock management is more complicated. They suit small-scale instant messaging scenarios.

2. Resources that support transactions need no additional resources to maintain locks and are relatively simple to implement, but the watch mechanism, which is based on optimistic locking, fails more often under high concurrency, so execution efficiency is more likely to become a bottleneck.

3. Atomic embedded scripts need no additional resources to maintain locks and perform better under high concurrency, but developing them involves some additional learning cost.

Origin blog.csdn.net/madongyu1259892936/article/details/106102276