Evolution and Upgrades of the Mafengwo IM System Architecture

Today, a growing number of users are drawn to Mafengwo by its steadily accumulating travel notes, guides, Wengweng ("buzz") posts, and other high-quality shared content. This content sparks their enthusiasm for travel and also drives growth in Mafengwo's transactions. In helping users make travel decisions and complete the transaction process, the IM system plays an important role.

The IM system gives users and merchants a direct communication channel. It helps answer users' questions when they buy travel products, which both drives orders and dispels users' concerns, helping them realize their travel plans. As the business has grown rapidly over the past few years, Mafengwo's IM system has gone through several important architectural evolutions and transformations.

IM 1.0 - early stages

In the early days, the priority was getting the business online quickly. Traffic was low and concurrency requirements were modest, so the technical architecture of the IM system aimed mainly to be simple and usable, and the functionality it implemented was also very basic.

IM 1.0 was developed in PHP and implemented the basic IM functions: user and customer service access, message sending and receiving, and consultation list management. When a user starts a consultation, an even-distribution policy assigns the user to a customer service agent, and the association between user and agent is recorded. When a user or agent sends a message, the message forwarding module is called and delivers the message to the other party's Redis blocking queue. The receiver calls the message polling module over an HTTP long-polling connection: if a message is available the call returns immediately, and if not the request blocks for a period of time before returning. The purpose of blocking is to reduce how often the client has to poll. The messaging model is shown below:
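As a rough illustration of the long-polling behaviour described above, here is a minimal Go sketch. It is not the original PHP implementation: a buffered channel stands in for each recipient's Redis blocking queue, and the names `Mailbox`, `Send`, and `Poll` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Mailbox holds one in-memory queue per recipient. It stands in for the
// per-user Redis blocking queue; all names here are hypothetical.
type Mailbox struct {
	mu     sync.Mutex
	queues map[string]chan string
}

func NewMailbox() *Mailbox {
	return &Mailbox{queues: make(map[string]chan string)}
}

// queue returns the recipient's queue, creating it on first use.
func (m *Mailbox) queue(uid string) chan string {
	m.mu.Lock()
	defer m.mu.Unlock()
	q, ok := m.queues[uid]
	if !ok {
		q = make(chan string, 64)
		m.queues[uid] = q
	}
	return q
}

// Send delivers msg to the recipient's queue (the forwarding step).
// It reports false if the queue is full.
func (m *Mailbox) Send(to, msg string) bool {
	select {
	case m.queue(to) <- msg:
		return true
	default:
		return false
	}
}

// Poll returns immediately if a message is waiting; otherwise it blocks
// for at most wait and then returns ok == false.
func (m *Mailbox) Poll(uid string, wait time.Duration) (msg string, ok bool) {
	timer := time.NewTimer(wait)
	defer timer.Stop()
	select {
	case msg = <-m.queue(uid):
		return msg, true
	case <-timer.C:
		return "", false
	}
}

func main() {
	mb := NewMailbox()
	mb.Send("agent-1", "hello")

	msg, ok := mb.Poll("agent-1", time.Second) // a message is waiting
	fmt.Println(msg, ok)

	_, ok = mb.Poll("agent-1", 50*time.Millisecond) // nothing queued: times out
	fmt.Println(ok)
}
```

The longer the `wait` passed to `Poll`, the fewer empty round trips an idle client makes, at the cost of holding a request open on the server for that long.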

Optimizing the message polling module

In the model above, the message polling module's long-connection requests are held by php-fpm while they block on the queue. As the number of requests grows, php-fpm processes that are not released in time consume more server resources and drive up the load.

To solve this problem, we optimized the message polling module with OpenResty, using Lua coroutines to stop long connections from tying up php-fpm. The Lua coroutine checks a flag on each request forwarded by Nginx to decide whether to intercept it. If a request is intercepted, its blocking operation is handled by the Lua coroutine and php-fpm is released promptly, which eases the drain on server performance. The optimized processing flow is shown below:

IM 2.0 - demand customization phase

As the business grew rapidly, the IM system faced a sharp rise in customization requests within a short period, and many new business modules were developed. Faced with a large volume of user consultations, customer service capacity was overwhelmed. IM 2.0 therefore focused on strengthening business functionality and improving the experience. For example, when a user starts a consultation, the earlier even distribution evolved into a choice of even distribution, weighted distribution, queuing, and other methods. To improve agent efficiency, more reply options were added for consultations, such as auto-reply and FAQs.
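The article does not say which algorithm backs weighted distribution. As one possible sketch, the Go code below uses smooth weighted round-robin, a common choice; with equal weights it reduces to plain even distribution. The `Agent` and `Assigner` types are hypothetical.

```go
package main

import "fmt"

// Agent is a hypothetical customer service agent with a static weight.
type Agent struct {
	Name    string
	Weight  int
	current int // running score used by smooth weighted round-robin
}

// Assigner picks agents in proportion to their weights.
type Assigner struct {
	agents []*Agent
	total  int
}

func NewAssigner(agents ...*Agent) *Assigner {
	a := &Assigner{agents: agents}
	for _, ag := range agents {
		a.total += ag.Weight
	}
	return a
}

// Next returns the agent for the next consultation, or nil if there are none.
// Each call raises every agent's score by its weight, picks the highest
// score, and lowers the winner's score by the total weight.
func (a *Assigner) Next() *Agent {
	var best *Agent
	for _, ag := range a.agents {
		ag.current += ag.Weight
		if best == nil || ag.current > best.current {
			best = ag
		}
	}
	if best != nil {
		best.current -= a.total
	}
	return best
}

func main() {
	as := NewAssigner(
		&Agent{Name: "A", Weight: 3},
		&Agent{Name: "B", Weight: 1},
	)
	for i := 0; i < 4; i++ {
		fmt.Print(as.Next().Name, " ")
	}
	fmt.Println()
}
```

Over every four assignments in this example, agent A receives three consultations and agent B one, and the picks are interleaved rather than sent to A in one burst.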

Take a typical user consultation as an example. When the user opens the App or a web page, a long connection is established through the connection layer. When the user then starts a consultation from a consultation entry point, the request carries the message thread information and initializes the message link, creating a message thread that is reusable and retrievable. When a message is sent, the message service stores it in the DB and uses the message thread to check whether the current consultation has been assigned to a customer service agent. The distribution service is called to complete the agent information for the current consultation, and finally the agent information is written back to the link relationship.

At this point a complete message link has been built, and a message sent by the user or the agent is delivered to the other party through the forwarding service. The processing flow is shown below:

IM 3.0 - Service splitting stage

As business volume kept accumulating and modules were added, the IM system's codebase expanded rapidly. For a range of reasons, including the lack of a unified coding standard, interfaces that did not follow single responsibility, and heavy coupling between modules, a change to one requirement could easily affect other modules, so both developing new requirements and maintaining the system were costly.

To address this, the IM system architecture had to be upgraded, and the first task was to split it into services. The IM system as a whole has now been split into four major services: Customer Service, User Service, IM Service, and Data Service, as shown below:

  • Customer Service: offers several ways to improve agent efficiency and user experience. Group management, member management, and quality services raise the operational and management level of the customer service team; assignment, reception, and transfer services make serving users more flexible and efficient; auto-reply, FAQ, and knowledge base services improve the efficiency of agents' replies.

  • User Service: analyzes user behavior to build user portraits and interest-based recommendations, and compiles statistics on user satisfaction with the customer service of Mafengwo merchants.

  • IM Service: supports one-to-one chat and group chat, and provides real-time message notification, offline message push, message history roaming, contacts, file upload and storage, and risk-control checks on message content.

  • Data Service: collects information such as the source entry point of a user's consultation, whether the consultation led to an order, whether an agent was on duty, when the user asked, and when the agent responded. It defines data metrics from this information, runs offline data analysis, and finally provides data statistics externally. The main metrics include the 30-second and 1-minute response rates, number of consultations, number of unanswered consultations, average response time, consultation sales, consultation conversion rate, recommendation conversion rate, reception pressure over time, on-duty status, and service score.

User state transitions

In the current IM system, the complete state flow of one user consultation is shown below:

When the user clicks the consult button, an event is triggered and the user enters the initial state. After the user sends a message, the system changes the user's status to "to be assigned". The distribution service is then called to assign a customer service agent, and the user's status changes to "assigned, unresolved". The status changes to "resolved" either when the agent resolves the user's issue, or when the user stays silent for a long time after the agent replies and the system's automatic resolution is triggered. At that point one consultation process is complete.
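The state flow above maps naturally onto a small transition table. The Go sketch below is illustrative only; the state and event names are hypothetical labels for the steps described in the text, not identifiers from the actual system.

```go
package main

import (
	"errors"
	"fmt"
)

type State string
type Event string

// Hypothetical names for the states described in the text.
const (
	Initial    State = "initial"
	Unassigned State = "to_be_assigned"
	Assigned   State = "assigned_unresolved"
	Resolved   State = "resolved"
)

// Hypothetical names for the events that move a consultation along.
const (
	SendMessage  Event = "send_message"  // user sends the first message
	AssignAgent  Event = "assign_agent"  // distribution service picks an agent
	AgentResolve Event = "agent_resolve" // agent marks the issue resolved
	AutoResolve  Event = "auto_resolve"  // user silent for too long after a reply
)

// transitions lists every allowed (state, event) -> next state move.
var transitions = map[State]map[Event]State{
	Initial:    {SendMessage: Unassigned},
	Unassigned: {AssignAgent: Assigned},
	Assigned:   {AgentResolve: Resolved, AutoResolve: Resolved},
}

var ErrInvalidTransition = errors.New("invalid state transition")

// Next returns the state after applying e to s, or the unchanged state
// and an error if the move is not allowed.
func Next(s State, e Event) (State, error) {
	if to, ok := transitions[s][e]; ok {
		return to, nil
	}
	return s, ErrInvalidTransition
}

func main() {
	s := Initial
	for _, e := range []Event{SendMessage, AssignAgent, AutoResolve} {
		next, err := Next(s, e)
		if err != nil {
			fmt.Println("rejected:", e)
			continue
		}
		fmt.Printf("%s -> %s\n", s, next)
		s = next
	}
}
```

Keeping the allowed moves in one table makes illegal jumps, such as resolving a consultation that was never assigned, fail loudly instead of silently corrupting the status.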

Refactoring the IM service

In splitting the services, we had to consider each service's generality, its availability, and its specific degradation strategy, and also to reduce dependencies between services as far as possible, so that a single unavailable service does not risk paralyzing the whole system. During this period, other business lines in the company needed the IM service more and more, and both the frequency and the volume of use began to rise. In its initial stage, when the number of connections grew, the IM service could only be scaled horizontally by modifying code. Onboarding a new business also required configuring an OpenResty environment and Lua coroutine code on that business's servers, so integration was very inconvenient and the IM service was not very general-purpose.

In view of this, we carried out a comprehensive refactoring of the IM service. The goal was to extract the IM service into an independent module that does not depend on other businesses and that exposes a unified way to integrate and call it. Considering the IM service's requirements for high concurrency and low overhead, we chose the Go language to develop this module. The new IM service design is shown below:

Among them, the more important Proxy layer and Exchange layer provide the following services:

1. Routing rules, for example ip-hash, round-robin, and least connections, which hash clients to different ChannelManager instances.

2. Management of client access: connection information is synchronized to the DispatcherTable module so that the Dispatcher can retrieve it easily.

3. The communication protocol between ChannelManager and the client, covering client requests to establish a connection, reconnection, active disconnect, heartbeat, notification, message sending and receiving, message QoS, and so on.

4. External REST interfaces for sending single and bulk messages. Whether to use them depends on the scenario. For example, a user consulting customer service needs to send messages through this interface, mainly for the following three reasons:

  • Sending a message involves business logic such as creating the message thread and assigning an agent. This logic is currently implemented in PHP, and the IM service needs to know the result of executing it. One option is to reimplement it in Go; the other is to get the result back by calling a PHP REST interface, which would add too many network interactions between the IM service and the PHP business and hurt performance.

  • When messages are forwarded, multiple ChannelManager instances need to communicate with each other. For example, user A on ChannelManager1 sends a message to agent B on ChannelManager2; without a communication mechanism between instances, the message cannot be forwarded. And when ChannelManager is scaled out again, each new instance has to set up communication with every existing instance, which makes scaling the system more complex.

  • If the client does not support the WebSocket protocol, the fallback is an HTTP long-polling connection, which can only be used to receive messages; sending has to be handled over a short connection. In other scenarios that do not require message forwarding, where a message only needs to be delivered to ChannelManager, it can be sent directly over WebSocket.
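Of the routing rules named in the first item of the list above, ip-hash is the simplest to show in code. The following Go sketch is an assumption about how such a rule could look, not the Proxy layer's actual implementation; `pickInstance` and the instance addresses are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickInstance maps a client IP to one ChannelManager instance using an
// FNV-1a hash. The same IP always lands on the same instance as long as
// the instance list does not change.
func pickInstance(clientIP string, instances []string) (string, bool) {
	if len(instances) == 0 {
		return "", false
	}
	h := fnv.New32a()
	h.Write([]byte(clientIP)) // Write on a hash.Hash never returns an error
	return instances[h.Sum32()%uint32(len(instances))], true
}

func main() {
	// Hypothetical ChannelManager addresses.
	instances := []string{"cm-1:9000", "cm-2:9000", "cm-3:9000"}
	for _, ip := range []string{"10.0.0.8", "10.0.0.9", "10.0.0.8"} {
		inst, _ := pickInstance(ip, instances)
		fmt.Println(ip, "->", inst)
	}
}
```

Plain modulo hashing like this remaps many clients whenever the instance list changes, which is one reason a real deployment might prefer consistent hashing or a least-connections rule.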

IM service call flow after the refactoring

Initializing the message thread and calling the distribution logic are handled by the PHP business. When a message needs to be forwarded, the PHP business calls the Dispatcher service's message-sending interface. The Dispatcher uses the shared DispatcherTable data to find the ChannelManager instance the recipient is connected to and sends the message to that instance via RPC, and ChannelManager pushes the message to the client over WebSocket. The IM service call flow is shown below:

When the number of connections exceeds the limit of the current ChannelManager cluster, we only need to add ChannelManager instances; the change is picked up dynamically through etcd watch notifications, which makes scaling smooth. A browser version of the JS-SDK has already been developed, and other business lines can easily integrate the IM service by following the integration documentation.

In designing the Exchange layer, there were three issues to consider:

1. Multi-device message synchronization

Clients now include the PC browser, the Windows client, H5, and iOS/Android. If a user is logged in on several devices at once, then when a message arrives we need to find all of that user's connections, and when one device disconnects we need to locate that specific connection.

As mentioned above, connection information is stored in the DispatcherTable module, so DispatcherTable must be able to retrieve connection information quickly by user. DispatcherTable is designed on Redis Hash storage. When a client establishes a connection with ChannelManager, the metadata to be synchronized includes uid (user information), uniquefield (a unique value, one per connection), wsid (connection identifier), clientip (client IP), serverip (server IP), and channel (channel). The structure is roughly as follows:

In this way, the key (uid) finds all of a user's connections across devices, and key + field locates a single connection. Connection information expires after two hours by default, so that when a client connection drops abnormally and the server fails to detect it, stale data does not pile up in DispatcherTable.
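The key/field layout just described can be modelled in memory to make the lookups concrete. This Go sketch is a stand-in for the Redis Hash, not the real DispatcherTable code: the `Table` and `Conn` types are hypothetical, timestamps are plain Unix seconds to keep the example deterministic, and the sketch is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"time"
)

// Conn mirrors the connection metadata listed in the text.
type Conn struct {
	WSID, ClientIP, ServerIP, Channel string
}

// entry is one hash: field (uniquefield) -> connection, plus an expiry.
type entry struct {
	conns     map[string]Conn
	expiresAt int64 // Unix seconds
}

// Table is an in-memory stand-in for the Redis Hash layout:
// key (uid) -> field (uniquefield) -> value (connection metadata).
type Table struct {
	ttl   int64 // seconds
	users map[string]*entry
}

func NewTable(ttlSeconds int64) *Table {
	return &Table{ttl: ttlSeconds, users: make(map[string]*entry)}
}

// Register stores one connection and refreshes the expiry of the whole
// key, similar to HSET followed by EXPIRE in Redis.
func (t *Table) Register(uid, uniqueField string, c Conn, now int64) {
	e, ok := t.users[uid]
	if !ok || now >= e.expiresAt {
		e = &entry{conns: make(map[string]Conn)}
		t.users[uid] = e
	}
	e.conns[uniqueField] = c
	e.expiresAt = now + t.ttl
}

// ByUser returns every live connection of a user (like HGETALL), or nil
// if the key is missing or expired. The returned map is the internal one.
func (t *Table) ByUser(uid string, now int64) map[string]Conn {
	e, ok := t.users[uid]
	if !ok || now >= e.expiresAt {
		delete(t.users, uid)
		return nil
	}
	return e.conns
}

// Remove drops a single connection (like HDEL) and deletes the key once
// its last field is gone.
func (t *Table) Remove(uid, uniqueField string) {
	if e, ok := t.users[uid]; ok {
		delete(e.conns, uniqueField)
		if len(e.conns) == 0 {
			delete(t.users, uid)
		}
	}
}

func main() {
	now := time.Now().Unix()
	t := NewTable(2 * 60 * 60) // two-hour default expiry, as in the text
	t.Register("42", "pc-1", Conn{WSID: "ws-a", ClientIP: "10.0.0.8", ServerIP: "10.1.0.1", Channel: "pc"}, now)
	t.Register("42", "ios-1", Conn{WSID: "ws-b", ClientIP: "10.0.0.9", ServerIP: "10.1.0.2", Channel: "ios"}, now)
	fmt.Println("connections:", len(t.ByUser("42", now)))
	t.Remove("42", "pc-1")
	fmt.Println("after disconnect:", len(t.ByUser("42", now)))
}
```

Because the expiry sits on the whole key in this sketch, every new connection for a user extends the lifetime of that user's other entries too; whether that matches the production behaviour is not stated in the article.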

2. User online status synchronization

For example, if a user has consulted four customer service agents, the user appears in the consultation list of each of those four agents. When the user comes online, all four agents must see that the user is online.

There are two ways to do this. One is for agents to fetch the user's status by polling; while the user stays online with no state change, this generates many useless requests. The other is to push an online notification to agents when the user comes online; this causes message fan-out, because every agent the user has consulted needs to be notified. We finally chose the second approach, and when pushing we only send the user's status to agents who are online.

3. No lost messages, no duplicate messages

To avoid losing messages in long-polling mode, the client includes the ID of the last message it has read when it issues a request, and the server computes the difference and returns the missing messages. In WebSocket mode, after the server pushes a message to the client it waits for the client's ACK; if the client does not ACK, the server retries the push several times.

The client then needs to de-duplicate messages by message ID, to handle the case where the client has already received a message but the ACK fails for some other reason and a retry is triggered, producing a duplicate.
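The three mechanisms above (catch-up by last-read ID, push with ACK and retry, and client-side de-duplication) can be sketched together. This Go code is a simplified, synchronous illustration under stated assumptions: message IDs increase over time, a lost ACK is simulated with a counter, and `Since`, `Push`, and `Client` are hypothetical names. A real server would wait for each ACK with a timeout rather than get an immediate answer.

```go
package main

import "fmt"

type Message struct {
	ID   int64
	Body string
}

// Since returns the messages a long-polling client has not read yet.
// It assumes message IDs increase over time.
func Since(log []Message, lastReadID int64) []Message {
	var out []Message
	for _, m := range log {
		if m.ID > lastReadID {
			out = append(out, m)
		}
	}
	return out
}

// Client remembers the IDs it has seen so a retried push is not shown twice.
type Client struct {
	seen     map[int64]bool
	Inbox    []Message
	DropACKs int // number of ACKs to lose, for simulation only
}

func NewClient() *Client {
	return &Client{seen: make(map[int64]bool)}
}

// Receive handles one pushed message and reports whether the ACK reached
// the server. Duplicates are ignored but still acknowledged.
func (c *Client) Receive(m Message) (acked bool) {
	if !c.seen[m.ID] {
		c.seen[m.ID] = true
		c.Inbox = append(c.Inbox, m)
	}
	if c.DropACKs > 0 {
		c.DropACKs--
		return false
	}
	return true
}

// Push delivers m and retries until the client ACKs or maxAttempts is
// reached. It returns the number of attempts made and whether an ACK arrived.
func Push(c *Client, m Message, maxAttempts int) (attempts int, ok bool) {
	for attempts = 1; attempts <= maxAttempts; attempts++ {
		if c.Receive(m) {
			return attempts, true
		}
	}
	return maxAttempts, false
}

func main() {
	c := NewClient()
	c.DropACKs = 2 // the first two ACKs are lost

	attempts, ok := Push(c, Message{ID: 1, Body: "hi"}, 5)
	fmt.Println("attempts:", attempts, "acked:", ok, "inbox size:", len(c.Inbox))

	log := []Message{{ID: 1, Body: "hi"}, {ID: 2, Body: "there"}}
	fmt.Println("unread:", len(Since(log, 1)))
}
```

Retrying gives at-least-once delivery, and de-duplicating on the client by message ID is what turns that into the "no loss, no duplicates" behaviour the text describes.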

IM service message flow

As mentioned above, the IM service needs to support multiple devices and is divided by role into the customer side and the merchant side. So that message notifications can output differentiated content dynamically based on domain name, device, and role, we introduced DDD (domain-driven design) modeling to process messages. The process is shown below:

Summary and Outlook

As Mafengwo's "content + transactions" model has deepened, the IM system architecture has gone through several stages of evolution and upgrade, moving from an early rough, disordered model toward unified, standardized management at scale.

We have made some progress, but there is still a long way to go. Going forward, in step with the company's business development and the team's technical capabilities, we will keep optimizing the IM system. We currently plan to replace the server-side code of the message polling module with Go so that it no longer depends on the PHP and OpenResty environment, achieving better decoupling. We will also explore intelligent customer service built on TensorFlow, using trained data models and data analysis to further improve the efficiency with which human agents resolve issues, improve the user experience, and better empower the business.

Author: Mafengwo business platform IM R&D team.

(Original content from Mafengwo Technology. When reproducing, be sure to credit the source and keep the QR code image at the end of the article. Thank you for your cooperation.)

Origin juejin.im/post/5d352610e51d45105d63a5f7