TOP100summit: [Shared Record-Feng Yu] 58 Daojia's Road to Multi-terminal Message Integration

This article is based on the case study shared by Feng Yu, architect at 58 Daojia, at TOP100summit 2016.

Editor: Cynthia

The 6th TOP100summit takes place on November 9-12, 2017, at the Beijing National Convention Center.

 

Feng Yu: architect at 58 Daojia. Mainly responsible for developing the company's strategic products, such as the Daojia messaging system and the H5 portal. Experienced in messaging system design, traffic growth, and related areas.

Introduction: After a period of rapid, unplanned growth, 58 Daojia ended up with many messaging scenarios built on different technologies. This case summarizes the difficulties and challenges of sending and receiving messages across multiple business scenarios, sorts out the characteristics of the various technologies, and builds a general message delivery solution based on actual business and R&D needs. The solution establishes unified client-to-client, client-to-server, and server-to-client message channels, shields the business side from the differences between technologies, and provides monitoring and statistics on core indicators such as message arrival rate, so that business lines can quickly adopt the various message services.

This article will introduce the specific process, steps and methods of this practice for reference by peers.

1. Problem statement

1.1 Daojia's business is complex

58 Daojia is a start-up in the local services O2O business, and it has grown rapidly since it was founded. The company operates three major businesses of its own: housekeeping, beauty, and express. Housekeeping covers cleaning, nannies, and postpartum care; beauty covers manicures and similar services; express covers moving and hauling goods. In addition to the three self-operated businesses, there is a very important open platform, where merchants publish services and users purchase them. The open platform covers almost anything you can think of, from locksmiths to changing light bulbs, from flower delivery to fitness.

1.2 Diverse message requirements

The large number of services and the variety of scenarios pose great challenges for the message system.

For example, in the express business, a user who needs to move takes out a phone to check drivers' locations and places an order; drivers then compete for the order, and the distance is calculated after delivery. All of these steps require timely and efficient delivery of order and latitude/longitude information.

As another example, when a user's assets change or a coupon is about to expire, the system needs to push a notification to the user. Users do not keep the 58 Daojia app open all the time, so we need to deliver the notification effectively and at low cost.

As a third example, on the open platform, users need to communicate with merchants to learn the details of the services or products on offer, and the system needs to ensure that users and merchants can communicate even when they are not online at the same time.

1.3 Serious duplicated development

To keep up with rapid business growth, start-ups tend to choose whatever method and framework is easiest to implement, and 58 Daojia was no exception. As a result, numerous message systems (as shown in Figure 1) were built and scattered across the business lines. Some use MQTT, some use HTTP, some use Getui, and some use MiTui. The message protocols are inconsistent, which hinders interconnection. R&D personnel need to be familiar with multiple message systems, so R&D efficiency is low and quality is difficult to guarantee.

Figure 1: Fragmented messaging systems

Therefore, there was an urgent need to build a unified messaging system that shields R&D personnel from these details and improves both development efficiency and development quality.

2. Solution

2.1 Unified Messaging Platform

As shown in Figure 2, the unified messaging platform consists of four parts: the TCP message system, the push channel, the strategy center, and the terminal.

Figure 2: Unified Messaging Platform Architecture

●TCP message system

A self-developed message system based on the TCP protocol. It supports client-to-client, client-to-server, and server-to-client message delivery, with high performance and low overhead. It is intended to gradually replace the many existing messaging systems.

●Push channel

Strengthens the ability to push messages by integrating push methods such as Getui, MiTui, APNS, WeChat, and SMS. The self-developed TCP message system is also one of the push methods.

●Strategy center

Message delivery strategies are configured manually and can be modified according to the message arrival rate or the needs of a business scenario.

●Terminal

Mainly refers to the mobile terminal. The unified messaging platform provides a unified SDK to support interaction between the mobile terminal and the messaging platform server. The terminal also includes WeChat, SMS, and other software that users commonly use to receive messages.

Under this architecture, business R&D personnel only need to work with the unified SDK on the terminal and the unified messaging interface on the server, and can spend the rest of their effort on business logic.

2.2 TCP message system

The overall structure of the TCP message system is shown in Figure 3.

Figure 3: TCP message system

The dashed box contains the functional components of the TCP message system. It includes four parts: the access layer (msg-gate), the logic layer (msg-logic), IP configuration (ipconfig), and the route cache (redis).

access layer

The msg-gate module in the figure is the access layer, and its main functions include:

●Connection consolidation: maintain a large number of long-lived TCP connections with clients, and consolidate them into a small number of long-lived TCP connections to the back-end msg-logic.

●Secure channel: establish a secure TCP channel, with encryption and decryption.

●Basic attack defense: implement preliminary anti-attack strategies, rate limiting, and message body size limits.

●Message delivery: send the messages handed down by msg-logic to the client.

logic layer

The msg-logic module is called the logic layer, and its main functions include:

●Connection verification (which can be understood as logging in to the message system).

●The interface through which the APP sends messages to the app-server, which can be understood as the C2S interface.

●The interface through which the app-server sends messages to the APP, including sending to a single user and sending to a group.

Redis cache

Caches the connection status of each business client: which msg-gate it is connected to and whether the connection is healthy. This is used for message routing when pushing messages to users.
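The routing role of this cache can be illustrated with a minimal sketch. It uses an in-memory dictionary as a stand-in for Redis, and the class and method names (RouteCache, on_login, on_disconnect, lookup) are hypothetical; the article does not describe the actual key layout or API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Route:
    gate: str     # which msg-gate instance holds the client's TCP connection
    online: bool  # whether that connection is currently considered healthy


class RouteCache:
    """In-memory stand-in for the Redis route cache (hypothetical API)."""

    def __init__(self) -> None:
        self._routes: Dict[str, Route] = {}

    def on_login(self, uid: str, gate: str) -> None:
        # Record the gate the user connected through.
        self._routes[uid] = Route(gate=gate, online=True)

    def on_disconnect(self, uid: str) -> None:
        # Keep the entry but mark the connection as unusable.
        route = self._routes.get(uid)
        if route is not None:
            route.online = False

    def lookup(self, uid: str) -> Optional[str]:
        """Return the msg-gate to deliver through, or None if unreachable over TCP."""
        route = self._routes.get(uid)
        if route is not None and route.online:
            return route.gate
        return None
```

When lookup returns None, the message cannot be routed over TCP and would have to be stored offline or handed to another push channel.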

ipconfig

Provides access-layer IP addresses to clients, and implements functions such as load balancing and business grouping.

2.2.1 Session management

The access layer maintains long-lived TCP connections with a large number of clients and needs to track the status of these connections in real time. The TCP message system keeps each client's connection information in the memory of the access layer; this is called a session. A session records the channel corresponding to the client (which can be understood as the socket id), the login status isLogin, the login time loginTime, and the last active time lastKeepAliveTime.

Sessions are stateful, and their information needs to be maintained in real time. The main occasions for updating a session are as follows:

●Login and logout: these modify the login status of the Peer.

●Keepalive: each heartbeat updates the last active time of the session.

●Kick requests: the logic layer forwards a request from the back end to force a client offline.

●Rate limiting: the access layer limits the rate of each client, and a client that sends messages too fast is treated as an attacker and forcibly disconnected.

●Socket errors and illegal messages: a connection whose socket is abnormal, or whose messages fail header verification, also needs to be disconnected.

There is also the case where a client sends no messages at all after connecting to the server. This may be caused by network problems, or it may be a suspected attack. We need to traverse all sessions periodically, find those that have been inactive for a long time, and clear them.

These many read and write scenarios can be summed up in three categories of session access:

●Locate session by business attribute user id;

●Locate session through channel;

● Traverse the session.

Figure 4: Session structure diagram

As shown in Figure 4, session management mainly includes three structures:

●The Map in the middle is the core data structure that stores each Peer; a session can be retrieved by channelid.

●The two-way map on the right stores the mapping between uid and channelid. It can retrieve a channelid from a uid, and also a uid from a channelid. The reason for using a bidirectional structure is explained below.

●The queue on the left stores the channelid of every client connected to the access server. The queue is implemented lock-free, so the scheduled task can traverse the sessions one by one without taking locks or affecting performance.

The scheduled task reads channelid1 from the queue and checks whether channel1 is healthy. If the channel has been inactive for a long time, the corresponding client is considered to be no longer connected to the access server. The session in the HashMap needs to be cleared, and the corresponding entry in the BiMap needs to be cleared at the same time. Clearing the BiMap entry requires locating it by channelid, which is the purpose of the two-way map.

Other requests, which locate and modify data by uid or channelid, do not take locks and do not affect performance.

One point to note: adding or deleting Peers requires appropriate concurrency control.
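The three structures and the idle sweep can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual msg-gate code: the two-way map is modeled as two plain dictionaries, the lock-free queue is replaced by a deque, a single lock guards additions and removals, and all class and method names are hypothetical.

```python
import threading
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass
class Session:
    channel_id: int
    is_login: bool = False
    login_time: float = 0.0
    last_keepalive_time: float = 0.0


class SessionManager:
    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout
        self.sessions: Dict[int, Session] = {}    # channel_id -> session
        self.uid_to_channel: Dict[str, int] = {}  # two-way map, forward
        self.channel_to_uid: Dict[int, str] = {}  # two-way map, reverse
        self.channels: Deque[int] = deque()       # traversal queue
        self._lock = threading.Lock()             # guards add/remove only

    def add(self, channel_id: int, now: float) -> None:
        with self._lock:
            self.sessions[channel_id] = Session(channel_id, last_keepalive_time=now)
            self.channels.append(channel_id)

    def login(self, channel_id: int, uid: str, now: float) -> None:
        session = self.sessions[channel_id]
        session.is_login, session.login_time = True, now
        session.last_keepalive_time = now
        with self._lock:
            old = self.uid_to_channel.get(uid)
            if old is not None:
                self.channel_to_uid.pop(old, None)  # drop a stale reverse entry
            self.uid_to_channel[uid] = channel_id
            self.channel_to_uid[channel_id] = uid

    def keepalive(self, channel_id: int, now: float) -> None:
        self.sessions[channel_id].last_keepalive_time = now

    def find_by_uid(self, uid: str) -> Optional[Session]:
        channel_id = self.uid_to_channel.get(uid)
        return self.sessions.get(channel_id) if channel_id is not None else None

    def remove(self, channel_id: int) -> None:
        with self._lock:
            self.sessions.pop(channel_id, None)
            # Reverse lookup by channel id: the reason for the two-way map.
            uid = self.channel_to_uid.pop(channel_id, None)
            if uid is not None:
                self.uid_to_channel.pop(uid, None)

    def sweep(self, now: float) -> List[int]:
        """Walk the queue once and drop sessions idle longer than idle_timeout."""
        removed = []
        for _ in range(len(self.channels)):
            channel_id = self.channels.popleft()
            session = self.sessions.get(channel_id)
            if session is None:
                continue  # removed elsewhere; discard the stale queue entry
            if now - session.last_keepalive_time > self.idle_timeout:
                self.remove(channel_id)
                removed.append(channel_id)
            else:
                self.channels.append(channel_id)  # still alive, re-queue
        return removed
```

Timestamps are passed in explicitly so the sweep logic is easy to exercise; a real access layer would read the clock itself and run sweep from a scheduled task.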

2.2.2 Offline messages

The offline message pulling method is shown in Figure 5.

Figure 5: Offline message logic

To avoid pulling too many offline messages at once, messages are pulled in pages of 10.

The APP pulls offline messages by passing three parameters, for example uid=123, msgid=100, size=10. uid identifies who is pulling the messages, and msgid is the largest message id among the messages the App already has. Message ids increase monotonically, so the largest message id identifies the last message the App received. If the App has not received any messages, it passes msgid=0.

●The message server receives the pull request. msgid=100 indicates that the App has already received msgid=100 and everything before it, so the server deletes those offline messages.

●The server retrieves the 10 messages after msgid=100, say msgid 101 to 110.

●The message server returns these 10 messages to the App, completing one page of the offline pull.

●If the number of offline messages returned is not 0, the APP requests another page with msgid=110 as the parameter, and repeats until the server returns no data, which ends the offline pull.
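The paging protocol above can be sketched as follows. The storage is an in-memory list per uid, and the names OfflineStore, pull, and pull_all are hypothetical; the sketch also assumes that the acknowledged msgid itself is included in the deletion and that messages are saved in increasing msgid order.

```python
from typing import Dict, List, Tuple

Message = Tuple[int, str]  # (msgid, body); msgid increases monotonically


class OfflineStore:
    """Server side: per-user offline messages, kept in msgid order."""

    def __init__(self) -> None:
        self._by_uid: Dict[str, List[Message]] = {}

    def save(self, uid: str, msgid: int, body: str) -> None:
        self._by_uid.setdefault(uid, []).append((msgid, body))

    def pull(self, uid: str, msgid: int, size: int = 10) -> List[Message]:
        # The client has everything up to msgid, so those messages are deleted.
        pending = [m for m in self._by_uid.get(uid, []) if m[0] > msgid]
        self._by_uid[uid] = pending
        # Return one page of the messages that follow msgid.
        return pending[:size]

    def pending_count(self, uid: str) -> int:
        return len(self._by_uid.get(uid, []))


def pull_all(store: OfflineStore, uid: str, last_msgid: int, size: int = 10) -> List[Message]:
    """Client side: request pages until the server returns no data."""
    received: List[Message] = []
    while True:
        page = store.pull(uid, last_msgid, size)
        if not page:
            return received
        received.extend(page)
        last_msgid = page[-1][0]  # largest msgid received so far
```

Because each request acknowledges the previous page, the final empty response also clears the last page from the server, so no separate acknowledgement call is needed in this sketch.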

2.3 Push channel

58 Daojia users do not open the app often, so the TCP message system alone may not be able to deliver messages to them in time. For activities such as limited-time flash sales, messages must reach users at a specific time, which the TCP message system cannot guarantee by itself.

The unified message push channel integrates TCP, Getui, MiTui, APNS, WeChat, SMS, and other push methods to ensure that messages reach users whenever possible. The structure of the unified push channel is shown in Figure 6.

Figure 6: Unified push channel structure diagram

The core job of the push channel is to push messages to the terminal. Different channels require different parameters, and the push channel is able to obtain the parameters each channel requires (as shown in Figure 7).

Figure 7: Push channel and parameters

2.4 Strategy Center

The strategy center supports manual configuration and automatic adjustment of push strategies.

Here are two examples.

The first example: suppose I am a fitness enthusiast and I use the App to discuss prices with a gym owner through the TCP message system, but the gym owner does not have the 58 Daojia App open and cannot receive my messages. In this case, the system can follow the strategy configured in the strategy center and send a message reminder to the gym owner through APNS, Getui, or MiTui.

The second example: someone books a manicurist. This information is very important to the manicurist, so one likely delivery strategy in the strategy center is to send the manicurist a text message at the same time as the push.

The strategy center structure is shown in Figure 8.

Figure 8: Strategy Center Structure Diagram

Strategy configuration: message push strategies are configured manually, which makes it easy to modify them according to the message arrival rate or the needs of a business scenario. For example, the push channel adjustments described in the examples above are configured through this module.

Strategy parsing: reads the configured message sending strategy and selects the push channel according to the type of phone. Xiaomi phones use MiTui, other Android phones use Getui, and Apple phones use APNS.

If a message is pushed through multiple channels, the strategy must specify whether the push is parallel (for example, an asset change is pushed through APNS and WeChat at the same time) or sequential, driven by ACKs (for example, an express order is pushed through the TCP channel first, and if no ACK is received within the specified time, it is pushed through Getui or MiTui).

Timing scheduler: periodically checks the message cache according to the push strategy to determine whether a message has been delivered. Depending on the strategy, it then pushes through other channels or reports whether the message was delivered.

ACK detection: determines whether a message was delivered, and through which channel.
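The channel selection and the parallel versus sequential modes can be sketched roughly as follows. Everything here is an illustrative assumption: the device labels, the channel names as strings, and the Sender callback, which stands in for "send and wait for an ACK within the configured time". The parallel mode is also simplified to trying every channel in a loop rather than sending concurrently.

```python
from typing import Callable, Dict, List

# (uid, text) -> True if an ACK arrived in time. Hypothetical callback type.
Sender = Callable[[str, str], bool]


def vendor_channel(device: str) -> str:
    """Pick the vendor push channel by phone type, as the strategy parser does."""
    if device == "xiaomi":
        return "mitui"
    if device == "ios":
        return "apns"
    return "getui"  # all other Android phones


def push(uid: str, text: str, channels: List[str], mode: str,
         senders: Dict[str, Sender]) -> List[str]:
    """Push through the given channels and return those that acknowledged."""
    if mode not in ("parallel", "sequential"):
        raise ValueError("mode must be 'parallel' or 'sequential'")
    acked: List[str] = []
    for name in channels:
        delivered = senders[name](uid, text)
        if delivered:
            acked.append(name)
            if mode == "sequential":
                break  # first ACK wins; later channels are fallbacks only
    return acked
```

For the express order example, the channel list would be ["tcp", vendor_channel(device)] in sequential mode, so the vendor channel is only used when the TCP push is not acknowledged.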

2.5 Terminal

The platform provides a unified mobile SDK that handles all message transmission on the mobile terminal. The SDK has four core concerns: keep-alive, message deduplication, random delay of TCP reconnection, and power control.

●Keep-alive: keeping the TCP connection available across the many phone models is the most critical factor in whether message transmission works.

●Message deduplication: an in-memory queue combined with SQLite ensures that the messages shown to users are not duplicated in complex network environments.

●Random delay of TCP reconnection: when a TCP access server goes down unexpectedly, a large number of clients would otherwise initiate connection requests to the other servers at the same moment and cause an avalanche. Delaying each reconnection by a random interval spreads these requests out.

●Power control: power consumption is a problem that mobile development must always pay attention to.
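Two of these points, message deduplication and the random reconnection delay, can be illustrated with a small sketch. It is written in Python for brevity rather than in a mobile language, and the details are assumptions: the table layout, the size of the in-memory queue, and the doubling of the delay window on each failed attempt are not described in the article.

```python
import random
import sqlite3
from collections import deque
from typing import Deque


class Deduplicator:
    """Recent ids in a bounded in-memory queue, full history in SQLite."""

    def __init__(self, db_path: str = ":memory:", recent_size: int = 256) -> None:
        self._recent: Deque[int] = deque(maxlen=recent_size)
        self._db = sqlite3.connect(db_path)
        self._db.execute("CREATE TABLE IF NOT EXISTS seen (msgid INTEGER PRIMARY KEY)")

    def is_new(self, msgid: int) -> bool:
        """Return True the first time a msgid is seen, False for duplicates."""
        if msgid in self._recent:
            return False  # fast path: duplicate caught in memory
        row = self._db.execute(
            "SELECT 1 FROM seen WHERE msgid = ?", (msgid,)).fetchone()
        self._recent.append(msgid)
        if row is not None:
            return False  # duplicate that had already left the memory queue
        self._db.execute("INSERT INTO seen (msgid) VALUES (?)", (msgid,))
        self._db.commit()
        return True


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Random delay in seconds before reconnection attempt number `attempt`.

    The window doubles with each failed attempt (an assumption of this sketch)
    and is capped, so clients spread out instead of reconnecting together.
    """
    window = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, window)
```

A client would call is_new before displaying a message, and sleep for reconnect_delay(attempt) before each reconnection attempt after the connection to an access server is lost.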

3. Practice process

3.1 Abstracting complex message scenarios

In the face of complex business, abstract modeling is required first. Figure 9 shows the division of message types.

Figure 9: Message Classification

The phone and laptop icons in the top row of the figure are called terminals or clients in the message system, and are labeled client. The cloud icon in the middle is our unified messaging platform. The server icon at the bottom is the business server, labeled server.

58 Daojia's various complex message requirements can be abstracted into three categories.

●C2S, client to server

For example, the express driver's phone needs to transmit the latitude and longitude of the driving track to the express back-end server in near real time, so that the server can calculate the fare from the track.

●S2C, server to client

For example, a user has a cleaning coupon that is about to expire, and the server needs to notify the user. This type of message is actively pushed by the server.

●C2C, client to client

For example, in the open platform business, users consult merchants by sending them questions, and the merchants reply.
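As a rough illustration of this abstraction, the three categories can be modeled as an enum, with a message classified by whether its sender and receiver are business servers. The Envelope fields and the classify function are hypothetical names for this sketch, not the platform's actual message format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class Kind(Enum):
    C2S = "client to server"
    S2C = "server to client"
    C2C = "client to client"


@dataclass(frozen=True)
class Envelope:
    sender: str    # id of the sending client or business server
    receiver: str  # id of the receiving client or business server
    body: str


def classify(env: Envelope, servers: Set[str]) -> Kind:
    """Classify a message by which of its endpoints are business servers."""
    from_server = env.sender in servers
    to_server = env.receiver in servers
    if from_server and to_server:
        raise ValueError("server-to-server traffic is outside the three categories")
    if to_server:
        return Kind.C2S
    if from_server:
        return Kind.S2C
    return Kind.C2C
```

With servers = {"express-server"}, a driver's track upload to "express-server" is classified as C2S, while a question from a user to a merchant is classified as C2C.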

3.2 Clear goals, step by step

●The goal that the system needs to achieve is clear.

From the start of planning, the unified messaging platform was designed to support all three types of messages, and the need for stronger push capabilities and flexible configuration was anticipated. The overall architecture includes four major parts (TCP message system, push channel, strategy center, and terminal) to ensure that the final goal can be achieved.

●Promote implementation step by step.

In implementation, the TCP message system was developed first to solve the pain point of high-volume message transmission, and it was gradually rolled out to the business lines. Then multiple push channels were integrated and push strategies were added. Every step of the implementation delivered visible results for the business lines, which also helped the messaging platform gain adoption quickly.

4. Effect evaluation and summary

Before the unified messaging platform, each team developed its own messaging functions, which meant duplicated investment, poor code quality, and no continuity in maintenance. After unification, the investment is made once, and continuous improvement and maintenance greatly improve R&D efficiency and system quality. This shows in the following aspects.

●A mobile SDK provides a unified development interface on the terminal.

●A server-side interface provides a unified development interface on the server.

●Stronger push capability allows messages to reach users without the app being open.

●Message push strategies were added to accommodate changing business needs.
