JD.com architect: architecture practice for a hundred-million-scale push messaging platform, explained in 9 slides


Source: Jingdong Mall architecture team

Every app or service needs to push information to its users' clients. As a shared middle-platform service, the push platform has to offer simultaneous, stable push service to many different apps inside the company, and that is how our push messaging platform came into being.

Push Platform Architecture

Glossary:

dt: short for deviceToken, the unique identifier of a device.

appId: the code that identifies an application on the push platform.

token: the key the platform issues for each application.

msgId: the unique identifier the platform generates for each push call.

Platform service side:

send-pservice: the push platform's external website, where users can apply for applications, create pushes, view data, and perform other operations.

send-api: provides the external interfaces in two forms, JSF and HTTP. The interfaces include push, unbinding, and deregistration.

send-worker: mainly responsible for pulling scheduled pushes and sending them.

Channel service side:

habitat: responsible for device-database operations, including binding, unbinding, reporting, and deregistration.

mutate: responsible for receiving push requests, resolving the correspondence between pin and dt, and adapting the request to the downstream channel.

report: the reporting service, responsible for receiving the dt that a device generates and reports when it first logs in.

channel: the final stage, responsible for sending the push request to the respective downstream channel.

slark: the receipt service, responsible for receiving the asynchronous receipts that some vendor channels return after a push, and for processing them further.

Other:

Self-built channel: the push channel service built by the platform itself, responsible for maintaining long connections and pushing messages to clients.

Vendor channel: a system-level push service provided by an external vendor.

DataAnalysis: the platform's statistics service, responsible for push statistics along the whole pipeline.

Business parties can push in two ways: by creating a push on the push platform website, or by calling the API service we provide. Pushes are divided into broadcasts and pushes to specified users. For the latter, the caller can specify devices or a population of users, and can also create scheduled pushes.

When the platform receives a push request that targets devices directly, the system combines the push content with the dt values and pushes them to the downstream channel. When the request targets a population by pin, the system first finds all valid devices for each pin and then pushes the content by dt in the same way. Push channels are of two types: the self-built channel and vendor channels. A vendor channel is a system-level push channel provided by a phone manufacturer or operating system; it is maintained by the phone system and is relatively stable. The self-built channel is our self-developed push channel service; it pushes to online devices over the long connection maintained between the self-built channel service and the client.

Key data preparation

In the application dimension, the push platform serves many different apps, so it generates an appId and a corresponding token for each app. Business callers use these to authenticate with the platform when they push.

In the device dimension, the platform stores the correspondence between dt, pin, and appId, and uses it to build and maintain the data needed to push a message to a specific device.

In the message dimension, the platform generates a msgId for every push request and returns it to the caller. The msgId is carried through the entire push process: it is generated by the push platform, passed to the vendor or self-built channel, delivered to the client, and finally returned to the platform in the receipt. This is the prerequisite for tracing the whole push process, and it simplifies troubleshooting and statistics.

Key techniques

1. Processing push requests

Business parties can push either by creating a push on the platform's web page or by calling the send-api service interface; the latter is used more frequently. The platform provides both JSF and HTTP interfaces. By timing, push requests fall into two classes: immediate pushes and scheduled pushes.

For an immediate push, once the API service receives a valid call, it immediately calls the channel side's push interface. For a scheduled push, the API service writes the task into a Redis sorted set. send-worker periodically pulls each application's set of push tasks, takes out the tasks whose push time has arrived, wraps them, and calls the channel-side push service. Once the call is acknowledged as successful, the task is removed.
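The scheduled-push flow above can be sketched as follows. This is a minimal illustration, not the platform's code: the `SortedSet` class is an in-memory stand-in for a Redis sorted set whose score is the push time, and `poll_due_tasks`, the task ids, and the `send` callback are hypothetical names.

```python
import bisect


class SortedSet:
    """In-memory stand-in for a Redis sorted set (score = scheduled push time)."""

    def __init__(self):
        self._items = []  # list of (score, member), kept sorted by score

    def add(self, member, score):
        bisect.insort(self._items, (score, member))

    def range_by_score(self, max_score):
        # Members whose score is <= max_score, in score order.
        return [member for score, member in self._items if score <= max_score]

    def remove(self, member):
        self._items = [(s, m) for s, m in self._items if m != member]


def poll_due_tasks(tasks, now, send):
    """Send every task whose time has arrived; remove it only after success."""
    sent = []
    for task_id in tasks.range_by_score(now):
        if send(task_id):
            tasks.remove(task_id)
            sent.append(task_id)
    return sent


tasks = SortedSet()
tasks.add("msg-a", 100)  # due at t=100
tasks.add("msg-b", 200)  # due at t=200

# One polling round of the worker at t=150: only msg-a is due.
due = poll_due_tasks(tasks, now=150, send=lambda task_id: True)
remaining = tasks.range_by_score(float("inf"))
print(due, remaining)
```

Removing a task only after the downstream call succeeds means a failed call leaves the task in the set, so the next polling round picks it up again.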

To improve interface performance, everything other than basic parameter validation is handled asynchronously. Each push request is processed on a thread pool, which shortens the response time. This reduces the JSF timeouts and thread-pool exhaustion that slow responses would otherwise cause under a large number of concurrent requests.
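A minimal sketch of this pattern, assuming a handler that validates synchronously and hands the rest to a thread pool. The names `handle_push` and `process_push`, the required fields, and the use of a UUID as msgId are assumptions for illustration; the article does not describe how msgId is generated.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)
processed = []  # records what the background workers handled


def process_push(msg_id, request):
    # Placeholder for the slow part: resolving devices, calling the channel side.
    processed.append((msg_id, request["appId"]))


def handle_push(request):
    """Validate synchronously, then return at once and process in the background."""
    for field in ("appId", "token", "content"):
        if not request.get(field):
            return {"ok": False, "error": f"missing {field}"}, None
    msg_id = uuid.uuid4().hex  # hypothetical msgId scheme
    future = executor.submit(process_push, msg_id, request)
    return {"ok": True, "msgId": msg_id}, future


response, future = handle_push({"appId": "app-1", "token": "secret", "content": "hello"})
future.result(timeout=5)  # only for the demo: wait so the result can be inspected
rejected, _ = handle_push({"appId": "app-1", "content": "hello"})
print(response["ok"], rejected)
```

The caller receives its msgId as soon as validation passes, so the interface's response time no longer depends on how long the push itself takes.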

2. Requesting the vendor channel

After the channel side receives the push request wrapped by the platform side, it filters the dt values, combines the message content with the dt values into a message body, and sends the request to the vendor channel. If the business side passes pins, the system locates the corresponding set of dt values from the stored correspondence. This set is updated dynamically in real time, both when a vendor receipt triggers dt cleanup and when the business side calls the unbinding or deregistration interface. When a single push covers a large number of dt values, as in a population push or broadcast, the system sends the dt values in batches to control the size of the request body.
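The pin lookup and batching described above can be sketched like this. The mapping contents, the function names, and the batch size of 2 are illustrative assumptions; a real vendor limit would be far larger.

```python
def resolve_dts(pins, pin_to_dts):
    """Expand pins into a de-duplicated, order-preserving list of device tokens."""
    seen = set()
    dts = []
    for pin in pins:
        for dt in pin_to_dts.get(pin, ()):
            if dt not in seen:
                seen.add(dt)
                dts.append(dt)
    return dts


def batch(items, size):
    """Split items into consecutive groups of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical pin -> dt correspondence kept by the platform.
pin_to_dts = {
    "alice": ["dt1", "dt2"],
    "bob": ["dt3"],
    "carol": ["dt4", "dt5"],
}

dts = resolve_dts(["alice", "bob", "carol", "nobody"], pin_to_dts)
batches = batch(dts, size=2)
print(batches)  # each inner list would become one vendor request body
```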

Under highly concurrent requests, calls to a vendor channel are likely to fail because of the vendor's rate limiting. When a request fails, the system retries. If the retry interval is too short, the retry is pointless, because the vendor channel may still be rate limiting. If the interval is too long, request efficiency drops and the pending requests consume system memory.

For this case the platform uses a configurable retry strategy: the number of retries and the retry interval can be adjusted dynamically, so the retry mechanism can be hot-updated to suit each vendor channel's situation.
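A minimal sketch of such a configurable retry, assuming the configuration lives in a mutable object that is re-read on every attempt so that updates take effect without a restart. The key names `max_retries` and `interval_seconds` and the `send` callback are hypothetical.

```python
import time

# Hypothetical per-vendor retry settings; mutating this dict acts as a hot update.
retry_config = {"max_retries": 3, "interval_seconds": 0.01}


def send_with_retry(send, config, sleep=time.sleep):
    """Call send() until it succeeds or the configured retries are used up.

    Returns (succeeded, attempts). The config is read on every loop iteration,
    so a change made while retrying applies to the remaining attempts.
    """
    attempts = 0
    while True:
        attempts += 1
        if send():
            return True, attempts
        if attempts > config["max_retries"]:
            return False, attempts
        sleep(config["interval_seconds"])


# Simulated vendor call that is rate limited twice, then succeeds.
outcomes = iter([False, False, True])
result = send_with_retry(lambda: next(outcomes), retry_config, sleep=lambda s: None)
print(result)
```

Passing `sleep` as a parameter is only a convenience for the demo, so the example runs without actually waiting.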

3. Establishing and maintaining long connections

The self-built channel is a long-connection push channel service built by the push platform itself. It aims to provide an effective push path for devices that cannot use a vendor channel.

Establishing the long connection:

The self-built channel's long connection is implemented on TCP + TLS. When the client comes online, it sends a request to the server. After the server receives the request, it returns a heartbeat parameter to the client, which is used as the heartbeat interval when the two sides are not exchanging data. From then on, under normal conditions, whenever the client has no data to exchange with the server it periodically sends a heartbeat to the server. At this point the long connection is established.

Based on the actual network and server conditions, we dynamically adapt the heartbeatConfig parameters, which include the disconnect interval Idle and the heartbeat sending interval heartbeat. Both can be adjusted dynamically.

Maintaining the long connection:

After the client's first request triggers the onConnect action, the server stores the client's connection relationship in the self-built channel's ctx cluster. When, within the Idle time, there is neither data exchange nor a heartbeat on the link, the server judges the client to be offline, disconnects the corresponding long connection, and automatically releases all connection records for that client.

During a long connection, the client may drop and reconnect because of network problems. That is, within the Idle time the same client may request a second connection to the server, so that one client briefly has two long connections to the server. To handle this, the server disconnects the old connection and accepts the new one, which guarantees that each device has a unique long connection in the cluster.
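The two maintenance rules above (drop a client after Idle seconds of silence, and keep only the newest connection per device) can be sketched with a small registry. This is an illustration under assumptions: the class and method names are hypothetical, timestamps are plain numbers, and a real server would close sockets rather than record ids in a list.

```python
class ConnectionRegistry:
    """Tracks at most one live connection per device token (dt)."""

    def __init__(self, idle_seconds):
        self.idle_seconds = idle_seconds
        self._conns = {}   # dt -> (conn_id, last_seen)
        self.closed = []   # connection ids the server has disconnected

    def on_connect(self, dt, conn_id, now):
        old = self._conns.get(dt)
        if old is not None and old[0] != conn_id:
            self.closed.append(old[0])  # disconnect the old one, accept the new
        self._conns[dt] = (conn_id, now)

    def on_activity(self, dt, now):
        """Called for any data exchange or heartbeat from the device."""
        if dt in self._conns:
            conn_id, _ = self._conns[dt]
            self._conns[dt] = (conn_id, now)

    def evict_idle(self, now):
        for dt, (conn_id, last_seen) in list(self._conns.items()):
            if now - last_seen > self.idle_seconds:
                self.closed.append(conn_id)
                del self._conns[dt]

    def is_online(self, dt):
        return dt in self._conns


registry = ConnectionRegistry(idle_seconds=60)
registry.on_connect("dt1", "c1", now=0)
registry.on_connect("dt1", "c2", now=10)   # reconnect: c1 is dropped
registry.on_activity("dt1", now=50)        # heartbeat
registry.evict_idle(now=100)               # 50s of silence: still online
online_at_100 = registry.is_online("dt1")
registry.evict_idle(now=120)               # 70s of silence: judged offline
print(online_at_100, registry.is_online("dt1"), registry.closed)
```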

Pushing over the long connection:

The channel service on the channel side sends the final push request: it sends the message body and dt, in their final wrapped form, to the target channel. For a device that uses the self-built channel, the channel service automatically identifies the IP of the single server the device is connected to and sends the push request to that machine. If no server can be identified, the device is judged to be offline and cannot be pushed to.

Security of the self-built long connection:

For the self-built channel's long-connection service, security is very important. We safeguard it in two main ways. First, the self-built channel's server domain uses TLS encryption. Second, the server and the client SDK use a custom codec: both sides apply the same pre-agreed encoding and decoding methods, which raises the level of data security.

4. Processing push receipts

The receipt parameters of the channels, including the self-built channel and the vendor channels, are not standardized. The platform's slark service receives the push receipts from the channels; the vendor receipt services currently connected are Huawei, Meizu, Xiaomi, and OPPO. The platform handles each channel's receipts separately, according to that channel's parameter standard. Examples include cleaning up invalid dt values according to the return code and passing on the client's push-open records (for statistics).
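One way to structure per-channel receipt handling is a table of parsers that map each channel's format onto a common shape. The sketch below uses two made-up channels, `vendor_a` and `vendor_b`, with invented field names and codes; these are not the real Huawei, Meizu, Xiaomi, or OPPO receipt formats.

```python
def parse_vendor_a(raw):
    # Hypothetical format: string error codes.
    return {
        "msgId": raw["msg_id"],
        "dt": raw["token"],
        "invalid_dt": raw["code"] == "INVALID_TOKEN",
    }


def parse_vendor_b(raw):
    # Hypothetical format: numeric status, where 2 means the token is invalid.
    return {
        "msgId": raw["taskId"],
        "dt": raw["deviceToken"],
        "invalid_dt": raw["status"] == 2,
    }


PARSERS = {"vendor_a": parse_vendor_a, "vendor_b": parse_vendor_b}


def handle_receipt(channel, raw, valid_dts):
    """Normalize one receipt and clean up the dt if the channel reports it invalid."""
    receipt = PARSERS[channel](raw)
    if receipt["invalid_dt"]:
        valid_dts.discard(receipt["dt"])
    return receipt


valid_dts = {"dt1", "dt2"}
r1 = handle_receipt("vendor_a", {"msg_id": "m1", "token": "dt1", "code": "OK"}, valid_dts)
r2 = handle_receipt("vendor_b", {"taskId": "m1", "deviceToken": "dt2", "status": 2}, valid_dts)
print(sorted(valid_dts))
```

Adding a channel then means adding one parser, while the cleanup and statistics logic downstream sees a single format.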

Platform data statistics

1. Message-dimension data

Push data is essential to a push platform. Business parties use it to see the status and effect of each of their pushes so they can formulate better push strategies. It also lets every push be tracked, which makes it easier to troubleshoot the whole push process.

Push states include processed, sent, delivered, and opened. The processed and sent states are produced to Kafka by the channel-side processing nodes. The delivered and opened states are reported to the vendor SDK or the self-built channel, then passed as receipts through the channel service to slark, which produces them to Kafka.

The platform provides multi-dimensional push statistics, including push records for each device and each user, as well as overall counts at every stage for population pushes and broadcasts, which reflect the delivery rate and the open rate.

It also provides push-status statistics by push channel, along with charts of recent pushes, so business parties can observe the result of each of their pushes clearly and accurately from multiple angles.

2. Device-dimension data

For device data, besides maintaining the detailed correspondence for each dt, the platform also keeps separate statistics for each application: total devices, daily new devices, cleaned-up devices, and turned-off devices. It also provides statistics by phone brand (Android) and by device model (iOS).

3. Implementing the statistics

In terms of implementation, the statistics are computed by a Flink job. It uses the msgId carried through the whole process, the message type, the push state reported by the client, and the device type field to compute push data in each dimension. For per-device push data, the necessary fields are persisted in Elasticsearch, which lets users query by pin for the push records of all the corresponding devices.
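The per-message aggregation can be illustrated in plain Python. This stands in for the Flink job and is not Flink code. The event fields and the rate definitions (delivered over sent, opened over delivered) are assumptions, since the article does not define them.

```python
from collections import defaultdict


def aggregate(events):
    """Count distinct devices per state for each msgId and derive the rates."""
    by_msg = defaultdict(lambda: defaultdict(set))
    for event in events:
        by_msg[event["msgId"]][event["state"]].add(event["dt"])

    stats = {}
    for msg_id, by_state in by_msg.items():
        sent = len(by_state["sent"])
        delivered = len(by_state["delivered"])
        opened = len(by_state["opened"])
        stats[msg_id] = {
            "sent": sent,
            "delivered": delivered,
            "opened": opened,
            # Assumed definitions of the two rates.
            "delivery_rate": delivered / sent if sent else 0.0,
            "open_rate": opened / delivered if delivered else 0.0,
        }
    return stats


events = [
    {"msgId": "m1", "dt": "dt1", "state": "sent"},
    {"msgId": "m1", "dt": "dt2", "state": "sent"},
    {"msgId": "m1", "dt": "dt3", "state": "sent"},
    {"msgId": "m1", "dt": "dt4", "state": "sent"},
    {"msgId": "m1", "dt": "dt1", "state": "delivered"},
    {"msgId": "m1", "dt": "dt2", "state": "delivered"},
    {"msgId": "m1", "dt": "dt2", "state": "delivered"},  # duplicate report
    {"msgId": "m1", "dt": "dt1", "state": "opened"},
]
stats = aggregate(events)
print(stats["m1"])
```

Collecting devices in sets means a receipt reported twice for the same device is counted once.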

Monitoring and alerting

For a mature product platform, monitoring and alerting are essential services. The push platform sets detailed monitoring points and reliable alert rules from two perspectives: the system side and the user side.

Within the system, every newly onboarded application has monitoring points added for its appId on the platform interfaces. When the call volume, performance, or availability of an interface changes, the platform administrator uses the alert to find the person responsible for the corresponding application, which helps to understand the situation promptly and to locate and solve the problem. Memory and CPU alert rules are also set on the machines of critical clusters to monitor server cluster performance in real time.

In addition, the platform checks every application's uploaded certificate daily. A scheduled task pulls each application's certificate every day and parses it to check whether it is still valid. For certificates that have already expired or will expire within 7 days, the platform warns the person responsible for the application by email, text message, and other means, urging them to replace the certificate as soon as possible to avoid problems.
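The daily certificate check can be sketched as below. For simplicity the expiry time of each certificate is given directly. A real task would parse it from the uploaded certificate. The function name, the status labels, and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def certificates_to_warn(certs, now, warn_days=7):
    """Return (appId, status) for certificates that are expired or expire soon."""
    deadline = now + timedelta(days=warn_days)
    warnings = []
    for app_id, not_after in sorted(certs.items()):
        if not_after <= now:
            warnings.append((app_id, "expired"))
        elif not_after <= deadline:
            warnings.append((app_id, "expiring"))
    return warnings


now = datetime(2020, 3, 1, tzinfo=timezone.utc)
certs = {
    "app-a": datetime(2020, 2, 20, tzinfo=timezone.utc),  # already expired
    "app-b": datetime(2020, 3, 5, tzinfo=timezone.utc),   # expires in 4 days
    "app-c": datetime(2020, 6, 1, tzinfo=timezone.utc),   # fine
}
warnings = certificates_to_warn(certs, now)
print(warnings)  # these owners would be notified by email or text message
```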

Future plans for the platform

Improve system performance and expand the scope of service:

The push platform currently provides push service to applications such as JD Finance, Jingmai, and Jingxi. The next goal is to carry the full push volume of the main JD.com site. We will also keep optimizing performance and improving the stability and availability of the system, to build a solid large-scale distributed message push platform.

At the same time, the push platform will advance its external offering. The work of moving to the cloud and componentizing the platform will be completed as soon as possible, so that the platform can serve more users outside the company and the group.


Origin blog.csdn.net/yellowzf3/article/details/105154293