The technology and thinking behind customer service sending a message

1. Introduction

In an enterprise customer service scenario, a lot of technical support sits behind a single message sent by customer service: network communication, front-end display, back-end storage, security, and more. At the front-end level alone, we have to consider message display, status updates, stable transmission, and lag-free handling of messages under extreme operating conditions. Through continuous iteration, our IM system has moved from external procurement to in-house development and has grown into a one-stop, full-scenario workbench. Along the way we have clearly felt that customer service expectations of the IM experience keep rising, so the technology and thinking behind sending a message matter more and more. This article explores that technology and thinking, to help everyone understand how to deliver an efficient, secure, and reliable IM chat with a good user experience.

2. The Importance of IM Chat Messages

IM chat messages are one of the fastest, most intuitive and efficient two-way communication methods between customer service and users. The importance of IM chat is reflected in the following aspects:

  • Immediate response: answering user inquiries promptly serves users faster and improves user satisfaction.
  • Personalized interaction: responses can be tailored quickly to what each user actually needs.
  • Data processing and analysis: by processing and analyzing IM chat messages, we gain insight into user needs and behavior, which helps improve service quality.

In summary, IM chat messages matter because they improve both user satisfaction and customer service efficiency. This also means that the reliability, efficiency, and security of IM messages are particularly important. Next, this article examines, from a front-end perspective, the technology and thinking behind customer service sending a message.

3. Development history of customer service IM messages

The following is the development history of customer service IM messaging, listing the milestones of core technology projects.

Along the way we have accumulated experience and skills, and we have also run into various problems and challenges, such as message loss, send failures, duplicate messages, and out-of-order messages. We solved these problems one by one through dedicated technical projects and achieved the expected results. We believe that as technology keeps developing and innovating, we can provide more efficient and convenient services.

4. Details of technology and thinking

From the perspective of a user or a customer service agent, isn't sending a message just a matter of pressing Enter or clicking the Send button after typing? It looks very simple, but the path from typing the first character to the other party receiving the message is backed by efficient and stable technology. Our customer service IM message link involves three core endpoints: the sender, the IM gateway, and the receiver. The following briefly describes the technical points involved when the customer service side sends a message to the IM gateway. Sending in the opposite direction, from the user side, works similarly.

As the flow chart above shows, the journey of a message is quite rich. Some details are not fully listed, such as the IM gateway's timeout re-push mechanism and front-end exception handling (network errors, timeouts, retries that never succeed, and so on). When an agent starts typing, the other party is notified that typing is in progress. Once sending is triggered, the front end creates the message body, sorts it, checks for duplicates, checks the network, renders the chat list, pushes the message into the timeout-retry queue, and passes it through the message interceptor, which converts it to a uniform format and sends it. At this point only the front-end part of sending is done, and it is still unknown whether the message was actually delivered, so the front end also has to watch for the send result. If no response arrives within a certain period, the message is resent. The life cycle of the message ends when it is sent successfully or the maximum number of retries is reached. Once the response is received, the status of the message is updated (the message has already been sorted at this point, so no second sort is needed). That completes the first stage of processing; messages travelling from the IM gateway to the client go through similar handling.
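The retry life cycle described above can be sketched as a small state tracker. This is only an illustration under stated assumptions: the names `SendTracker`, `transmit`, and `maxRetries` are hypothetical and not taken from the actual SDK, and the timer that calls `onTimeout` is left to the caller so the logic stays easy to follow.

```typescript
type SendStatus = "sending" | "sent" | "failed";

interface PendingEntry {
  attempts: number; // number of transmissions so far
  status: SendStatus;
}

// Hypothetical tracker: resend on timeout until ACKed or retries are exhausted.
class SendTracker {
  private readonly pending = new Map<string, PendingEntry>();
  private readonly maxRetries: number;
  private readonly transmit: (msgId: string) => void;

  constructor(maxRetries: number, transmit: (msgId: string) => void) {
    this.maxRetries = maxRetries;
    this.transmit = transmit;
  }

  // First transmission of a message.
  send(msgId: string): void {
    this.pending.set(msgId, { attempts: 1, status: "sending" });
    this.transmit(msgId);
  }

  // Called by a timer when no ACK arrived within the timeout window.
  onTimeout(msgId: string): SendStatus | undefined {
    const entry = this.pending.get(msgId);
    if (!entry || entry.status !== "sending") return entry?.status;
    if (entry.attempts > this.maxRetries) {
      entry.status = "failed"; // retries exhausted: life cycle ends
      return entry.status;
    }
    entry.attempts += 1;
    this.transmit(msgId); // resend the same message ID
    return entry.status;
  }

  // Called when the gateway's ACK (success or failure) arrives.
  onAck(msgId: string, success: boolean): SendStatus | undefined {
    const entry = this.pending.get(msgId);
    if (!entry || entry.status !== "sending") return entry?.status;
    entry.status = success ? "sent" : "failed";
    return entry.status;
  }
}
```

With `maxRetries` set to 2, a message that is never ACKed is transmitted three times in total and then marked as failed. Resending under the same message ID is what later allows the receiving side to deduplicate.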

Across the whole send and receive link, a problem at any step will cause sending to fail, so very stable and reliable technical measures are needed to guard it. We explain this mainly from the following aspects.

Reliable delivery of messages

Reliable delivery keeps the information on both sides of a conversation consistent, which is why we discuss it first. Imagine a scenario where messages are frequently lost and customer service keeps reporting it. R&D resources have to be spent on troubleshooting every time, and that is only the secondary cost: lost messages can sharply degrade the user experience, so the loss far outweighs any gain. Reliable delivery of every message is therefore essential. So what does reliable delivery mean? At least three aspects must be satisfied:

1.1 Real-time nature of messages

What we want most from IM is that the other party receives our messages in real time and can reply, which matters a great deal for the user experience. If real-time delivery did not matter, we could use other channels such as email, letters, or even carrier pigeons.

When a message is sent to the IM gateway, the gateway generally needs to go through the following five steps:

  • Message verification: sensitive-word checks and submission to risk control for review (synchronous review).
  • Message storage: sorting, deduplication checks, and so on.
  • Replying to the sender with an ACK response (success or failure).
  • Delivering the message to the receiver. In multi-device login scenarios, the message must also be synchronized across all devices.
  • Timeout retries, handling the ACK returned by the receiver, and so on.

There is no such thing as absolute real time; latency can only be optimized as far as possible. The core processing logic lives in the IM gateway. On both the web front end and the client, processing is very fast, at the millisecond level. Our IM gateway is written in Go and handles concurrency very well, so the end-to-end latency of the whole round trip remains low.

1.2 Reliability of messages

As we all know, TCP itself is reliable, but it only guarantees reliability at the transport layer; reliability between applications is not guaranteed. We will publish dedicated articles on this topic later, so we will not go into detail here.

So how do we ensure reliability between applications? Reliability means letting the sender know that the receiver has received the message, that is, that the message was delivered successfully. Recall the message loss scenario described above. Message loss was one of the headaches we ran into while developing IM. Troubleshooting a single case consumes a huge amount of technical resources, involving H5, the IM gateway, the server, and the client, and the experience for both users and customer service is very poor. Take a very simple scenario: the user sends a message, customer service never receives it and so never replies, and the user assumes customer service is ignoring them on purpose, which hurts user satisfaction.

So how do we solve this problem? The article "Dewu Customer Service IM Messaging Communication SDK Self-Development Road" explains it. The core idea is to borrow the ACK mechanism of the TCP protocol and implement a set of ACK protocols at the business layer. One point needs special attention: for batches of messages (an agent refreshing sessions, new sessions coming in, and so on) we use a batch ACK mechanism, because replying with an ACK for every single message would be relatively expensive. Early on, through a dedicated IM architecture upgrade project coordinated across all endpoints, we achieved zero message loss end to end and guaranteed at-least-once delivery (a 100% delivery rate, verified through tracking-point data). After launch the results met expectations, and the effort spent on the corresponding troubleshooting dropped by at least 70%.
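The batch ACK idea can be illustrated with a short sketch. It assumes a hypothetical `sendAck` callback that transmits one ACK frame carrying several message IDs. The class name, the batch size, and the frame shape are all assumptions made for illustration, not the actual protocol.

```typescript
// Hypothetical batch ACK buffer: acknowledge many received messages with one frame.
class BatchAcker {
  private buffer: string[] = [];
  private readonly maxBatch: number;
  private readonly sendAck: (msgIds: string[]) => void;

  constructor(maxBatch: number, sendAck: (msgIds: string[]) => void) {
    this.maxBatch = maxBatch;
    this.sendAck = sendAck;
  }

  // Record a received message; flush automatically once the batch is full.
  onMessage(msgId: string): void {
    this.buffer.push(msgId);
    if (this.buffer.length >= this.maxBatch) this.flush();
  }

  // Send one ACK covering everything buffered so far.
  flush(): void {
    if (this.buffer.length === 0) return;
    const ids = this.buffer;
    this.buffer = [];
    this.sendAck(ids);
  }
}
```

In practice `flush` would also need to run on a short timer, so that a partially filled batch is acknowledged before the sender's retry timeout fires. Otherwise batching itself would trigger unnecessary resends.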

1.3 Orderliness of messages

A very common scenario in IM development: the user asks question A and then question B, but on the customer service side question B appears before question A, which confuses the agent's replies. This is only one example of out-of-order messages, and there are many causes. Take sending a file followed immediately by a text message. The front end has to upload the file to OSS and obtain its URL before the file message can be sent to the user. While the upload is in progress, both the user and the agent can keep sending messages. If this scenario is not handled well, messages easily end up out of order.

Unless you have worked on IM, you would hardly imagine how fast customer service agents operate. While dealing with out-of-order issues, I have seen an agent send two messages in a row only 300 milliseconds apart. This kind of high-frequency, intensive operation is routine in customer service work.

It looks like a simple ordering problem, but it cannot be solved thoroughly without carefully considering the user group, extreme scenarios, boundary values, and so on.

Back to our customer service IM: how do we handle message ordering? The path to the current design was quite tortuous. In the end we settled on the Seq maintained by the IM gateway as the single source of truth. The gateway returns it to the sender, and the sender orders messages by this sequence number, so that both sender and receiver see the same order. The front-end processing flow is as follows:
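As a rough sketch of the idea (not the article's actual implementation), the sender can keep un-ACKed messages at the end of the list and move each message into place once the gateway returns its Seq. The `ChatMessage` shape and the rule for placing pending messages are assumptions made for illustration.

```typescript
interface ChatMessage {
  msgId: string;
  seq?: number;      // assigned by the gateway; undefined until the ACK arrives
  localTime: number; // local creation time, used only while seq is unknown
  text: string;
}

// ACKed messages are ordered by gateway seq; pending ones follow, by local time.
function compareMessages(a: ChatMessage, b: ChatMessage): number {
  if (a.seq !== undefined && b.seq !== undefined) return a.seq - b.seq;
  if (a.seq !== undefined) return -1;
  if (b.seq !== undefined) return 1;
  return a.localTime - b.localTime;
}

// Apply the seq returned in the gateway ACK and restore the order.
function applyAck(list: ChatMessage[], msgId: string, seq: number): ChatMessage[] {
  return list
    .map((m) => (m.msgId === msgId ? { ...m, seq } : m))
    .sort(compareMessages);
}
```

If the gateway happens to process message B before message A, B receives the smaller Seq and is displayed first on both sides, regardless of the order in which the sender created them locally.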

1.4 Idempotence of messages

When it comes to message idempotence, we have to ask: why would we receive more than one copy of the same message? It must be because the sender sent it repeatedly. In what scenario does that happen? We just described the application-layer ACK mechanism: if no ACK arrives from the other party within the timeout, the message is resent repeatedly until the maximum number of retries is reached. The screenshot below makes this easier to understand. It is only a simulated retry; in real scenarios the retry interval is definitely longer than this.

Since reliability must be guaranteed, duplicate messages are unavoidable, which raises the idempotence problem. So how do we solve it? We deduplicate by the Message ID of each message. This brings a performance concern: sorting, deduplication, and risk control checks all carry a computational cost, and keeping the system from freezing while doing them is a core issue. If you want to know how our customer service IM handles this, please read on.
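Deduplication by Message ID can be sketched in a few lines. This is a minimal illustration and the class name is hypothetical. A `Set` is used so that each check costs roughly constant time instead of a scan over the whole message list.

```typescript
// Hypothetical deduplicator keyed by message ID.
class MessageDeduplicator {
  private readonly seen = new Set<string>();

  // Returns true the first time an ID is seen, false for every repeat.
  accept(msgId: string): boolean {
    if (this.seen.has(msgId)) return false;
    this.seen.add(msgId);
    return true;
  }
}
```

A retried message carries the same ID as the original, so the receiver processes it once, ignores the repeats, and can still reply with an ACK for each copy. A long-running session would also need some way to bound the size of the set, which this sketch leaves out.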

Lag optimization strategies for message processing

Why does lag occur, and what counts as lag? We usually say it happens when rendering cannot be completed within 16ms. So why 16ms? To answer that, we need to understand refresh rate (RefreshRate) and frame rate (FrameRate).

  • Refresh rate is the number of times the screen is refreshed per second and is a property of the hardware. Most displays refresh at 60Hz (60 times per second), and browsers typically render in step with that.
  • Frame rate is the number of frames drawn per second and is a property of the software. As long as the frame rate matches the refresh rate, the picture looks smooth, so at 60FPS we do not perceive any lag.

If the frame rate is 60 frames per second but the screen refresh rate is only 30Hz, the upper half of the screen can still show the previous frame while the lower half already shows the next one. This is called screen tearing. Conversely, if the frame rate is 30 frames per second and the refresh rate is 60Hz, two consecutive refreshes show the same picture, which is perceived as lag. Raising only the frame rate or only the refresh rate is therefore pointless; both have to improve together. Browsers typically target a 60Hz refresh rate, so reaching 60FPS requires each frame to be drawn within about 16.67ms (1000ms / 60 frames ≈ 16.67ms per frame).

Lag in IM message processing is very common, and beyond a certain message volume it is hard to avoid, just as those of us who use computers heavily, keep many browser tabs open, and go a long time without rebooting will also notice things slowing down. Still, there are many ways to optimize IM message processing, mainly the following strategies:

2.1 Asynchronous processing

As we all know, JS is single-threaded, so we can use its asynchronous mechanisms to push low-priority tasks into the asynchronous task queue and leave the main thread free for high-priority tasks. For example, an agent expects a message to appear on the chat page immediately after sending it; if it does not show up within a short time, the agent concludes that the system is lagging. Sending messages therefore has higher priority than receiving them. We have assigned task priorities for each scenario, and low-priority tasks are processed asynchronously.
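A minimal sketch of this priority split is shown below. The `TaskScheduler` name and the injected `defer` function are assumptions made for illustration. In a browser, `defer` could be something like `(cb) => setTimeout(cb, 0)`. High-priority work runs immediately, while low-priority work is drained one task per deferred tick, so the main thread is released between tasks.

```typescript
type Task = () => void;
type Defer = (cb: () => void) => void;

// Hypothetical scheduler: run urgent tasks now, drain the rest asynchronously.
class TaskScheduler {
  private readonly lowQueue: Task[] = [];
  private draining = false;
  private readonly defer: Defer;

  constructor(defer: Defer) {
    this.defer = defer;
  }

  // High priority (e.g. showing the agent's own outgoing message): run at once.
  runHigh(task: Task): void {
    task();
  }

  // Low priority (e.g. processing incoming messages): queue for a later tick.
  enqueueLow(task: Task): void {
    this.lowQueue.push(task);
    if (!this.draining) {
      this.draining = true;
      this.defer(() => this.drainOne());
    }
  }

  // Run a single queued task, then yield again if more work remains.
  private drainOne(): void {
    const task = this.lowQueue.shift();
    if (task) task();
    if (this.lowQueue.length > 0) {
      this.defer(() => this.drainOne());
    } else {
      this.draining = false;
    }
  }
}
```

Draining one task per tick is a deliberately conservative choice. A real implementation might process several tasks per tick as long as it stays within the frame budget.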

2.2 Partial loading

This mainly concerns the chat message list. For sessions with a large number of messages, rendering only the messages in the visible area reduces the load on the browser and improves responsiveness. There are several options for optimizing the list:

Option 1: Use the setTimeout timer to render in batches. We generally do not recommend this approach, because DOM changes made in a setTimeout callback only reach the screen at the next repaint. If the timer and the repaint cycle are out of step, the work of an intermediate frame may be skipped and the next frame's elements updated directly, resulting in dropped frames.

Option 2: Use requestAnimationFrame. In comparison, the advantages of requestAnimationFrame are still very obvious, mainly reflected in the following aspects:

  • requestAnimationFrame will concentrate all DOM operations in each frame and complete them in one redraw or reflow, and the time interval of redraw or reflow closely follows the refresh frequency of the browser.
  • In hidden or invisible elements, requestAnimationFrame will not redraw or reflow, which of course means less CPU, GPU and memory usage.
  • requestAnimationFrame is an API provided by the browser specifically for animation. The browser will automatically optimize the method call during runtime, and if the page is not active, the animation will automatically pause, effectively saving CPU overhead.
  • Compared with setTimeout, the biggest advantage of requestAnimationFrame is that the system determines the execution timing of the callback function.
  • The pace of requestAnimationFrame follows the refresh pace of the system. It can ensure that the callback function is only executed once during each refresh interval of the screen, so that it will not cause frame loss.
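Batch rendering paced by frames can be sketched as follows. To keep the example independent of the DOM, the frame scheduler is passed in as a parameter. In a browser it would be `(cb) => requestAnimationFrame(() => cb())`. The function name and the chunk size are assumptions made for illustration.

```typescript
type FrameScheduler = (cb: () => void) => void;

// Render a long list in fixed-size chunks, one chunk per scheduled frame.
function renderInChunks<T>(
  items: T[],
  chunkSize: number,
  renderChunk: (chunk: T[]) => void,
  nextFrame: FrameScheduler,
): void {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  let index = 0;
  const step = (): void => {
    renderChunk(items.slice(index, index + chunkSize));
    index += chunkSize;
    // Only ask for another frame while there is still something to render.
    if (index < items.length) nextFrame(step);
  };
  if (items.length > 0) nextFrame(step);
}
```

Each chunk should be small enough that rendering it fits within the frame budget of about 16.67ms. The right size depends on how expensive each message type is to render, so it needs to be measured rather than guessed.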

Option 3: Use IntersectionObserver. The IntersectionObserver interface (part of the Intersection Observer API) gives developers a way to asynchronously observe changes in the intersection of a target element with an ancestor element or the viewport. The ancestor element or viewport is called the root.

In other words, intersecting means that the element is inside the viewport and currently visible. This makes it a good replacement for scroll-event listeners when implementing load-on-scroll.

There are other solutions as well, and the right one depends on the actual business scenario. The difficulty in loading IM messages in segments is that message heights vary (there are many message types), so the calculation is still fairly costly. Any optimization therefore has to be verified against boundary values, because sometimes it does not pay off.
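To illustrate why variable heights make the calculation more involved, here is a minimal sketch that finds the visible message range from cumulative heights with a binary search. It assumes the heights are already known or estimated, which is itself the hard part in practice. All names here are hypothetical.

```typescript
// offsets[i] is the top edge of message i; offsets[n] is the total height.
function buildOffsets(heights: number[]): number[] {
  const offsets = [0];
  for (const h of heights) offsets.push(offsets[offsets.length - 1] + h);
  return offsets;
}

// Smallest index i with arr[i] > target, or arr.length if there is none.
function firstIndexGreater(arr: number[], target: number): number {
  let lo = 0;
  let hi = arr.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (arr[mid] > target) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// Inclusive [start, end] indices of messages overlapping the viewport.
// A message starting exactly at the bottom edge is included, which only
// over-renders by one item.
function visibleRange(
  offsets: number[],
  scrollTop: number,
  viewportHeight: number,
): [number, number] {
  const count = offsets.length - 1;
  if (count <= 0) return [0, -1];
  const clamp = (i: number): number => Math.max(0, Math.min(count - 1, i));
  const start = clamp(firstIndexGreater(offsets, scrollTop) - 1);
  const end = clamp(firstIndexGreater(offsets, scrollTop + viewportHeight) - 1);
  return [start, Math.max(start, end)];
}
```

With heights of 40, 100, 60, 200, and 40 pixels, a viewport 160 pixels tall scrolled to offset 50 overlaps messages 1 through 3. Whenever a message's real height turns out to differ from the estimate, the offsets after it have to be rebuilt, which is where much of the cost comes from.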

2.3 Message traversal

Above we discussed message sorting, deduplication, status updates, and so on. When there are a large number of chat messages across multiple sessions, lag is bound to occur if they are not handled properly. Consider how this was handled before our optimization: the third-party SDK we used relied on a pile of for loops, and with a large message volume the page would essentially freeze and become unresponsive.

So how did we deal with this problem? We rewrote the third-party SDK around our existing business scenarios and maintain each session as an independent instance, with binary search as the core algorithm. Interested readers can refer to the earlier article on the self-development road of the customer service IM messaging SDK, which describes this in more detail. After the IM SDK was rewritten, customer service no longer reported chat-related lag, and the first-call metric of the chat improved by 20%. The results are quite significant.
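The combination of per-session instances and binary search can be sketched as follows. This is an illustration under assumptions, not the rewritten SDK itself: the class and field names are hypothetical, and messages are assumed to be ordered by the gateway Seq.

```typescript
interface SeqMessage {
  msgId: string;
  seq: number; // gateway-assigned sequence number
}

// One instance per session keeps its own sorted message list.
class SessionMessages {
  private readonly messages: SeqMessage[] = [];

  // Binary search for the insert position instead of scanning or re-sorting.
  insert(msg: SeqMessage): number {
    let lo = 0;
    let hi = this.messages.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.messages[mid].seq <= msg.seq) lo = mid + 1;
      else hi = mid;
    }
    this.messages.splice(lo, 0, msg);
    return lo; // index at which the message was inserted
  }

  list(): readonly SeqMessage[] {
    return this.messages;
  }
}

// Sessions are looked up by ID, so work on one never touches the others.
const sessions = new Map<string, SessionMessages>();

function getSession(sessionId: string): SessionMessages {
  let session = sessions.get(sessionId);
  if (!session) {
    session = new SessionMessages();
    sessions.set(sessionId, session);
  }
  return session;
}
```

Finding the position takes O(log n) comparisons per message rather than a full pass over the list. Keeping sessions separate also means that a busy conversation does not slow down operations on the others.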

Message security considerations

In the IM system, message security is very important. Developers need to have strong security awareness and integrate security into the development process to enhance the security and robustness of the system. We have done a lot of things in terms of message security, so we won’t go into detail here.

Message sending and receiving delays

The delay in sending and receiving messages directly affects the user experience and communication efficiency. We have analyzed the journey of a message above, and the reasons for the delay are relatively easy to analyze. There are mainly four points:

  • Network delay: IM messages are sent and received over long-lived network connections, and network transmission always introduces some delay. When network latency is high, sending and receiving become slow.
  • System load: when an agent serves many users at once and many users are online at the same time, the system has to process a large number of messages and requests, which slows its response and affects the agent's experience.
  • Front-end delay: messages pass through the local message queue, cache, and other processing steps, which can delay them.
  • Message encoding and decoding: Some messages require encoding and decoding of data, which also consumes a certain amount of time, resulting in delays.

Now that we can analyze the cause, we can prescribe the right medicine and reduce the delay of sending and receiving through some optimization strategies. We currently plan to optimize from the following two aspects:

  • Front-end: the delay lies mainly in message processing and encoding. Our IM messages currently use JSON as the data format, which involves serialization and deserialization. We plan to replace JSON with ProtoBuf and have completed the relevant technical research and test verification. Here is a brief comparison of ProtoBuf (Protocol Buffers) and JSON processing time:

    Encoding time: ProtoBuf encodes much faster than JSON, because ProtoBuf's encoding is binary and avoids text encoding conversion and redundant type conversion. JSON encoding is comparatively slow.

    Decoding time: compared with encoding, ProtoBuf's advantage in decoding is somewhat smaller. Its advantages are most visible when the data volume is large and the structure is complex, so for small payloads the difference between the two may not be noticeable.

  • Network delay: network delay is hard to control, but it can be reduced by shrinking the amount of data transmitted. As mentioned above, we plan to replace JSON with ProtoBuf. ProtoBuf is a binary format that is more compact than JSON and can greatly reduce packet size, which lowers bandwidth usage and traffic costs during transmission. In an IM system with many users and frequent messages, data volume and network bandwidth are a major concern, and ProtoBuf can significantly cut bandwidth consumption and improve system performance. Message compression is another option, but the compression level and algorithm need to be chosen and verified carefully.
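To give a feel for why the binary format is more compact, the sketch below hand-encodes a single integer field in the ProtoBuf varint wire format and compares it with the JSON text for the same value. It is a hand-rolled illustration, not the ProtoBuf library, and it does not reflect our actual message schema. The field name `seq` and field number 1 are assumptions.

```typescript
// Encode a non-negative integer (below 2^31) as a base-128 varint:
// 7 payload bits per byte, with the high bit set on every byte except the last.
function encodeVarint(value: number): number[] {
  const bytes: number[] = [];
  let v = value;
  while (v > 0x7f) {
    bytes.push((v & 0x7f) | 0x80);
    v = v >>> 7;
  }
  bytes.push(v);
  return bytes;
}

// A varint field is a tag byte (fieldNumber << 3 | wireType 0) plus the value.
// Field numbers below 16 fit in a single tag byte.
function encodeVarintField(fieldNumber: number, value: number): number[] {
  const tag = (fieldNumber << 3) | 0;
  return [...encodeVarint(tag), ...encodeVarint(value)];
}

// Size of the same value as JSON text (ASCII only, so characters equal bytes).
function jsonSize(value: object): number {
  return JSON.stringify(value).length;
}
```

Encoding the value 150 as field 1 takes 3 bytes (`08 96 01`), while the JSON text `{"seq":150}` takes 11 bytes, mostly because JSON repeats the field name in every message. Real messages contain strings and nested structures, where the saving is smaller, so actual gains have to be measured on real traffic.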

Therefore, using ProtoBuf format instead of JSON format can basically solve most of the delay problems, which is also a direction for the next IM optimization.

Agent experience and interaction considerations

We have accumulated a lot of experience with agent experience and interaction. Experience and interaction are topics that no product can avoid, not only IM. Ever since we started building IM, experience has been the force driving us forward, and lag is the topic we have heard about most. What customer service means by lag is a little different from what we normally mean. In the early days we also assumed it meant the system freezing and becoming unusable, similar to dropped frames, but that was not the case. Slow interface requests, error prompts, a brief blank screen when switching pages, a message not appearing on the chat page immediately after being sent, loading indicators during image uploads, and so on are all classified as lag. We keep conducting on-site research, data analysis, and optimization around these points, and customer service satisfaction has increased by 18%. An 18% increase after all this time may not look like a good figure, but in the customer service domain it is a relatively hard figure to achieve. There are two main reasons. First, many agents joined within the last three months, so they either cannot perceive some of the optimizations we made or have no earlier usage to compare them with. Second, many front-line agents come from the customer service teams of first-tier companies. Looked at the other way around, this is also a positive driver: at least we collect new feedback with every survey, and we can see where our experience falls short of more mature and excellent products.

A good experience is not achieved overnight, and we should not expect to get it right in one step. Excellent user experience and interaction design must always be grounded in user needs and feedback and be improved continuously. During actual design and development, continuous testing and optimization are needed to ensure the quality and acceptability of the system. We also need to communicate actively with users and exchange feedback in order to understand their needs and opinions better. We have not done this well in the past, especially when rolling out new versions, where the ease of use of the system fell short of customer service expectations. This is an area we need to keep improving.

Experience should be based on the needs of the vast majority of users. We cannot sacrifice the experience of other users for the sake of a small number, and in particular we cannot make sweeping changes or harm other users' interests because of one user's feedback. Knowing when not to compromise is also an important strategy in experience optimization. Throughout the process we must stay rational and objective, and make reasonable trade-offs and decisions based on user research and data analysis to achieve the best user experience.

Optimizing small details can also achieve twice the result with half the effort. In an IM system these details include timely message notifications, clear message display, accurate message timestamps, and so on. Improving them directly raises customer service efficiency and experience, and with it customer service satisfaction. We will keep optimizing the IM experience. Where there is a will, there is a way.

5. Follow-up planning

The technology and thinking described above cover reliable message delivery, lag optimization, security, efficiency, and experience. In the coming period we will continue to focus on these areas to optimize and improve the relevant IM capabilities. Our planning mainly covers the following aspects:

  • Experience optimization: experience is something we will keep working on as always. We will continue to look for optimization points at the visual and interaction levels, starting with details such as color schemes and button choices, to provide a good agent experience.
  • ProtoBuf replaces JSON: reducing message encoding time, improving decoding efficiency, reducing data packet size, reducing network bandwidth consumption, and improving system performance.
  • Message compression: Especially for historical messages and batch messages, the use of compression technology can effectively reduce the size of data packets.
  • Function expansion: Continue to improve robot message types, especially for pre-sales shopping guides and agent assistance. Gradually support functions such as message references and tags.
  • Multi-language capability support: Although it has not yet accessed international services, it still needs to have the ability to rapidly expand at the design level.

Across these areas we will prioritize the technical work that is both important and urgent. We will not innovate or optimize blindly, and we will stay focused on the business and on the agent experience.

6. Summary

Sending a message may seem simple in an IM application, but many technical details have to be considered. First, there is the message delivery mechanism and its reliability. Even a simple message has to pass through encryption, encoding, transmission, security compliance checks, and more before it can be received successfully.

Most important are real-time data and operations under various extreme scenarios. Messages sent by customer service need to appear on the chat page and reach users in a timely manner. Agents working in one-to-many scenarios need assurance that no session will show inconsistent messages (lost or duplicated), and that problems such as message interception and abnormal conditions are handled.

Therefore, sending a message requires not only technical and data processing capabilities but also careful thought about agent experience and real-time data. During development, every kind of issue needs to be handled carefully and optimized continuously in order to give customer service a stable, smooth, secure, and friendly IM application.

Reference article:

Dewu Customer Service IM Messaging Communication SDK Self-Development Road

*Text/WWQ

This article is original to Dewu Technology. For more exciting articles, please see: Dewu Technology official website

Reprinting without the permission of Dewu Technology is strictly prohibited, otherwise legal liability will be pursued according to law!

Origin my.oschina.net/u/5783135/blog/10140300