TOP100summit: [Talk Record: Meituan-Dianping] The evolution of the system architecture behind rapid business upgrade and development

The content of this article comes from a case study shared at the 2016 TOP100summit by Xu Guanfei, senior technical expert at Meituan-Dianping and head of the eHome team in the hotel backend R&D team.
Editor: Cynthia

Xu Guanfei: Senior technical expert at Meituan-Dianping and head of the eHome team in the hotel backend R&D team. He joined Meituan in 2012 and led the technical transformation of the hotel business from a general group-buying model to professional hotel reservation. He was responsible for the Meituan hotel reservation business system and platform system, and currently leads the technical team for incubating hotel businesses.

Introduction: Meituan-Dianping started from group buying. To better connect users and merchants, since 2012 it has been investing deeply in vertical fields such as hotels, takeaway, and movies. By 2016, the transaction volume of Meituan Hotel exceeded 100 million in a single day, and its number of room nights ranked first in the industry.
During the transformation and rapid development of the hotel business, each stage placed different requirements on the system architecture. This talk walks through those stages and analyzes how the overall system evolved in each, along with the challenges and opportunities each stage brought to the system. I hope it offers some ideas to readers who are at one of these stages.

1. Case Background

● From 2012, Meituan Hotel began to refine the hotel business, starting from group buying;
● Between 2012 and 2016, it explored and completed the upgrade from group buying to reservation, with the reservation business accounting for more than 90%;
● During this period the hotel business developed rapidly. In 2016, the consumption volume exceeded 150 million per day and the number of room nights per day exceeded 800,000, ranking first in the industry;
● Throughout the business transformation, upgrade, and rapid development, each stage posed different challenges for the system architecture design.

2. Case Interpretation

We divide the development of the hotel system, from the start of focused hotel work to the present, into four major stages: decision-making, implementation, support, and optimization.

● The decision-making stage means that we need to think through and determine how the hotel system should develop in the future, so that it can better support and promote the business.
● The implementation stage means that, once a decision is made, its content needs to be put into practice step by step.
● The support stage means that the system needs to be able to support the rapid development of the business.
● The optimization stage means that, while the system supports the business, we need to keep optimizing both the business and the system to better promote business development.

Decision-making stage

First we enter the decision-making stage.

At that time, as the group-buying business developed, a complete group-buying system had taken shape. It included a full set of supply, data, and sales processes with toB, toE, and toC functions, together with the related infrastructure such as DB, cache, and RPC. This system supported more than 20 major categories such as catering, movies, hotels, and tourism.

 

 

The refined development of hotel business requires us to further expand and upgrade in supply, product, promotion, transaction and other links.

 

 

In front of us was a series of very common problems: the old system was difficult to extend, a new system would be costly, and the business feasibility was unconfirmed.

● The old system is difficult to extend: the original group-buying system is a full-category architecture. It is highly generic, hard to modify and customize, and carries a heavy historical "burden";
● A new system is costly: building a new system involves long links, a wide scope, a large amount of change, and a long cycle;
● The business feasibility is unconfirmed: whichever approach is taken, the cost will not be low, and at this point we are still uncertain whether this business direction can truly meet users' needs.

This is a very common scenario and a very difficult decision to make. For things that are hard to decide, we need to gather more information:

First, verify business feasibility at low cost.

A simplified version of the business was implemented in a way that is loosely coupled to the existing process, so that we could observe how users actually use this kind of feature while keeping the cost as low as possible. The verification showed that the business direction was OK.

 

 

Second, verify that typical requirements are "difficult to support".

We measured the cost of customizing the group-buying system using a typical requirement, the "price plan". In practice, one typical requirement took about 2 months and affected many parties, and many similar requirements were expected in the future. The cost was therefore basically unacceptable.

 

 

Third, verify through an MVP whether a new build is "high cost".

We built an MVP of the hotel system by connecting a hotel group: a third party served as the supply source and the result was exposed on the PC site, implementing the simplest version of the logic in order to confirm the scope and cost of a new system. In practice, the cycle was about 2 months. The cost was not small, but it was basically controllable.

 

 

After verification on these three fronts, the conclusion was: the business direction is OK, the cost of reuse is too high, and the cost of a new build is basically controllable. The plan was therefore to build a hotel system (build our own wheel).

 

 

Implementation stage

After the direction is determined, we enter the implementation stage.

There are three main questions to focus on during implementation:

● Which wheels to build ourselves (build or reuse): the full link involves many systems. Rebuilding all of them is expensive and some can be reused, so how do we choose?
● How big the wheels should be (split or merge): this is the question of service granularity. Early business requirements change frequently; small services have high coordination costs, while large services have poor stability and isolation.
● How to change the wheels without stopping the car (keep the business developing): building the services takes time and can have a significant impact on the business in the meantime. How do we reduce that impact and keep the business developing?

Response: "build" or "reuse" → cut off the head and tail, keep the core competitiveness
Building a hotel system does not mean rebuilding every wheel. Several principles were followed in the process:
● Cut off the head: reuse infrastructure such as the cache and RPC framework to ensure infrastructure stability; reuse basic services such as user, payment, and risk control to make future cooperation between businesses possible.
● Cut off the tail: reuse the unified external exits such as main search, recommendation, and orders to keep the downstream user experience uniform and simple.
● Keep the core competitiveness: for the business logic in the middle, focus on the parts tied to the core competitiveness of the business. Non-core parts are taken as-is from existing systems, parts that are core to several businesses are delegated with an agreed result, and only the parts that are core to the hotel business are built in-house.

 

 

Response: "split" or "merge" → split internally, unified externally

For the split-or-merge question, service orientation is the trend, but in the early stage of a business we should limit the impact that frequent changes and large clusters of requirements have on the system. We adopted the approach of splitting internally while staying unified externally:

● Split internally: keep the internal business logic extensible. For example, the product service is divided internally into sales rules, prices, inventory, and so on, following the expected direction of the business, so that these can be separated later.
● Unified externally: keep each service logically complete from the outside. For example, the interface provided by the product service covers every operation on a product and all retrieval of product information, which avoids the high collaboration cost caused by spreading logic across overly fine-grained services.
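
As an illustration only, the idea of keeping rules, prices, and inventory separate on the inside while exposing one complete product interface on the outside can be sketched as a facade. All class and method names below (`ProductService`, `PriceStore`, `InventoryStore`, `SalesRules`) are hypothetical and do not come from the Meituan system.

```python
from dataclasses import dataclass, field


@dataclass
class PriceStore:
    """Internal component: nightly price per room type (hypothetical)."""
    prices: dict = field(default_factory=dict)

    def price_of(self, room_type: str) -> int:
        return self.prices[room_type]


@dataclass
class InventoryStore:
    """Internal component: remaining rooms per room type (hypothetical)."""
    rooms: dict = field(default_factory=dict)

    def available(self, room_type: str) -> int:
        return self.rooms.get(room_type, 0)

    def reserve(self, room_type: str, count: int) -> None:
        if self.available(room_type) < count:
            raise ValueError("not enough inventory")
        self.rooms[room_type] -= count


@dataclass
class SalesRules:
    """Internal component: a simple per-order limit (hypothetical)."""
    max_rooms_per_order: int = 3

    def allows(self, count: int) -> bool:
        return 0 < count <= self.max_rooms_per_order


class ProductService:
    """The only interface callers see; the internals can become services later."""

    def __init__(self, prices: PriceStore, inventory: InventoryStore, rules: SalesRules):
        self._prices = prices
        self._inventory = inventory
        self._rules = rules

    def get_product(self, room_type: str) -> dict:
        # One call returns the complete product view.
        return {
            "room_type": room_type,
            "price": self._prices.price_of(room_type),
            "available": self._inventory.available(room_type),
            "max_rooms_per_order": self._rules.max_rooms_per_order,
        }

    def book(self, room_type: str, count: int) -> int:
        # One call performs the complete operation and returns the total price.
        if not self._rules.allows(count):
            raise ValueError("order violates sales rules")
        self._inventory.reserve(room_type, count)
        return self._prices.price_of(room_type) * count
```

Callers depend only on `ProductService`, so moving `PriceStore` or `InventoryStore` behind their own services later would not change the external contract.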

 

 

Response: keep the business developing → implement in stages, set priorities, and ensure benefits at each stage

Building the whole system involves a wide scope. Completing it in one go would take a long time, which is not conducive to rapid verification and optimization, so the project was divided into steps. The steps had to meet two requirements:
● Solve business pain points first;
● Deliver a tangible output at every stage.

First, identify the core goals of the project: efficient supply → good user experience.

Then, confirm the priority of the core goals:

● User experience depends on the abundant, high-quality products that efficient supply produces, and those take time to accumulate after the supply system is built;
● Therefore: solve the supply problem first, and then improve the user experience.
Finally, clarify the business benefits of each stage:
● Supply stage: efficiency improvement
● User experience stage: refund rate

Step 1: Solve the efficiency problem
By building the supply system and the product system, supply efficiency improved by 300%. This step also completed the first half of the hotel system: from supply to products.

 

 

Step 2: Solve the experience problem
By building the prepaid system and upgrading the business, the user refund rate fell by 60%. This step also completed the second half of the hotel system: from sales to transactions.

 

 

Step 3: System integration
Integrate the hotel business that remained in the group-buying system into the hotel system, so that every business form shares the gains in efficiency and experience and all hotel systems are unified.

 

 

After the above steps, the initial construction of the hotel system was complete.
In the figure below, the gray part is the basic components and the pink part is the basic services (the reused "head"), the green part is the unified exits (the reused "tail"), and the remaining large dark cyan part is the set of systems tied to the core of the hotel business.

 

 

Support stage: support the rapid growth of business demand and business volume

After the hotel system was built, it had to take over the business volume of the original group-buying system. Combined with explosive user demand, the system needed further improvement to better support the business.

In the support stage, we mainly look at the following aspects:
● Quality: the wheels keep breaking, and each repair takes half a day;
● Stability: adjust one small wheel inside and the whole car goes down; a small requirement goes live and the entire system collapses;
● Performance: the car is too heavy for the wheels; the business volume keeps growing, the system responds more and more slowly, and eventually it cannot cope;
● Service orientation: small wheels have become big wheels; services that were once small have grown with the business, and many people are afraid to touch them.

Response: Quality
When it comes to quality, the first thought is that every link (requirements, design, development, release, maintenance, and so on) must be strictly controlled to guarantee the final quality.

A general rule is that the earlier a problem is discovered, the smaller its impact and the lower the cost of recovery.
But this rule has another side: finding problems in the earlier links usually requires a higher investment. When we are busy fighting fires every day, we do not have the energy to go straight to unit test coverage, so we adjusted the priority of these quality control measures accordingly (of course, all of them still need to move forward in parallel).

Focus on online monitoring and problem solving: ensure problems are found and resolved quickly
● Monitoring level: improve basic monitoring, service monitoring, and business monitoring, and fill the gaps so that the whole process is monitored and problems are found quickly.
● Troubleshooting level: use system call traces and business traces so that the cause can be located quickly from the symptoms.
● Handling level: internal problems are handled quickly with operations tools and data repair tools; external problems are answered quickly with measures such as degradation and rate limiting.
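
As a minimal sketch of the last point, limiting traffic and falling back to a degraded result can be combined in a small wrapper. The token bucket below is a standard technique chosen for illustration; the talk does not say which algorithm was used, and the names `TokenBucket` and `call_with_protection` are hypothetical.

```python
import time


class TokenBucket:
    """Allows about `rate` calls per second, with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill according to the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def call_with_protection(bucket: TokenBucket, primary, fallback):
    """Run `primary` if under the limit and healthy; otherwise degrade."""
    if not bucket.try_acquire():
        return fallback()  # over the limit: serve the degraded result
    try:
        return primary()
    except Exception:
        return fallback()  # dependency failed: serve the degraded result
```

A typical fallback might return cached prices or a simplified page, which keeps the main flow usable while the external dependency recovers.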

 


Once online monitoring, troubleshooting, and handling were mostly in place, we could finally free up some energy from firefighting and keep moving forward.

 

Control the release process: keep the impact of a problem as small as possible
● Intranet environment: released to the intranet only, for internal verification.
● Grayscale environment: released to a subset of users, for verification with low user traffic.
● Full environment: released to all users.
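
One common way to implement such a staged rollout is to hash the user ID into a fixed bucket so that the same user always sees the same version. The sketch below assumes this approach; the stage names mirror the three environments above, while `in_grayscale`, `pick_version`, and the default 5% share are hypothetical.

```python
import hashlib


def in_grayscale(user_id: str, percent: int) -> bool:
    """Deterministically place a user in the first `percent` of 100 buckets."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def pick_version(user_id: str, is_internal: bool, stage: str, gray_percent: int = 5) -> str:
    """Return "new" or "old" for a request, given the current rollout stage."""
    if stage == "intranet":
        # Only internal users verify the new version.
        return "new" if is_internal else "old"
    if stage == "grayscale":
        # Internal users plus a small, stable share of real users.
        return "new" if is_internal or in_grayscale(user_id, gray_percent) else "old"
    if stage == "full":
        return "new"
    raise ValueError(f"unknown stage: {stage}")
```

Because the bucket depends only on the user ID, raising `gray_percent` widens the audience without moving users who already had the new version back to the old one.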

 

 

Automate tests for the main flows: keep the severity of a problem as low as possible
● Mock external services through MockServer to keep the test environment stable;
● Classify business functions by priority;
● Cover business functions with automated tests in priority order, so that key flows are never missed.
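
The effect of mocking an external service can be shown in miniature. The sketch below does not use MockServer itself; it swaps in Python's standard `unittest.mock` to stand in for an external supplier, and `book_room` with its supplier methods is a hypothetical main flow invented for the example.

```python
from unittest.mock import Mock


def book_room(supplier, hotel_id: str, nights: int) -> dict:
    """Main booking flow: check availability with the supplier, then confirm."""
    if not supplier.check_availability(hotel_id, nights):
        return {"status": "sold_out"}
    order_id = supplier.create_order(hotel_id, nights)
    return {"status": "confirmed", "order_id": order_id}


def test_booking_confirmed():
    # The external supplier is replaced by a mock with fixed answers.
    supplier = Mock()
    supplier.check_availability.return_value = True
    supplier.create_order.return_value = "ORDER-1"
    assert book_room(supplier, "H100", 2) == {"status": "confirmed", "order_id": "ORDER-1"}
    supplier.create_order.assert_called_once_with("H100", 2)


def test_booking_sold_out():
    supplier = Mock()
    supplier.check_availability.return_value = False
    assert book_room(supplier, "H100", 2) == {"status": "sold_out"}
    # No order may be created when the room is sold out.
    supplier.create_order.assert_not_called()
```

Since the mock always answers the same way, a failure in these tests points at the booking flow rather than at an unstable third-party environment.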

 

 

Strengthen design, development, and self-testing: avoid problems and detect them as early as possible

● Design stage: provide standard design examples to ensure the quality of design output; confirm design results through design review;
● Development stage: follow best practices to avoid the cost of relearning and of falling into known pitfalls; customize static analysis rules to eliminate common problems; keep improving the code through code review and regular code reading;
● Self-test stage: provide unit test specifications, set coverage goals, and keep checking over the long term through continuous integration.

 

 

Response: Stability
There are many dimensions to consider in terms of system stability. Here we focus on the isolation problem.

Layering: separate what changes from what does not
Layer the services into a data layer, service layer, application layer, and API layer, in decreasing order of stability. Keep frequently changing logic in the upper layers as far as possible and avoid modifying the lower layers.

 

 

Slimming: separate core flows from branch flows
Separate out the main-line logic and strip away useless and non-main-line logic, to keep the core flow stable and clear.

 

 

Isolation: separate internal from external
Isolate and check the parts that interact with external systems, and fail fast once a problem is found, so that a problem at one point does not drag down the entire system.
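
A circuit breaker is one widely used way to fail fast around an external dependency. The sketch below is an assumption about how such isolation could look, not the implementation described in the talk; `CircuitBreaker`, `CircuitOpenError`, and the thresholds are hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised immediately while the breaker is open."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast: do not touch the unhealthy dependency at all.
                raise CircuitOpenError("dependency unavailable, failing fast")
            self.opened_at = None  # cool-down is over: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or reopen) the breaker
            raise
        self.failures = 0  # a success closes the breaker again
        return result
```

While the breaker is open, callers get an immediate error instead of waiting on timeouts, so threads and connections are not tied up by one slow external party.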

 

 

Response: Performance

The most important thing for performance work is data. Every optimization should be justified with data; otherwise it is guesswork.
Several commonly used ideas for optimizing performance (optimization at the level of basic components, the network, and so on is not discussed here): simplify, make asynchronous, parallelize, and use what is nearest.
● Simplify: does it really need to be that complicated?
● Asynchronous: is it really that urgent?
● Parallel: is it really necessary to wait for someone else?

 


● Proximity: if access across data centers is slow, can the local data center be used? If network access is slow, can local data be used?
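
The "parallel" idea can be illustrated with two independent lookups issued at the same time instead of one after the other. This is only a sketch: `fetch` simulates a remote call with a sleep, and the names `load_detail_page` and `run_demo` are hypothetical.

```python
import asyncio


async def fetch(name: str, delay: float, log: list) -> str:
    """Stand-in for a remote call; `delay` simulates network latency."""
    log.append(f"start {name}")
    await asyncio.sleep(delay)
    log.append(f"end {name}")
    return f"{name}-data"


async def load_detail_page(log: list) -> dict:
    # Price and inventory do not depend on each other, so request both at once.
    price, inventory = await asyncio.gather(
        fetch("price", 0.02, log),
        fetch("inventory", 0.01, log),
    )
    return {"price": price, "inventory": inventory}


def run_demo():
    log = []
    page = asyncio.run(load_detail_page(log))
    return page, log
```

In the log, both calls start before either one ends, so the total wait is close to the slower call rather than the sum of the two. Whether this helps in practice still has to be confirmed with measurements, as the section stresses.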

 

 

 

Response: Service orientation

As the business developed, services that were once small grew larger and larger, and splitting them into services had to be considered. Service splitting is generally considered along the following dimensions:

● Business level: independence, integrity, atomicity, etc.
● Service level: heavy vs. light, fast vs. slow, read vs. write, etc.
● Organization level: the division of the organization shapes the system architecture

In addition, parts that can be shared by multiple businesses can be built as platforms, such as the customer service system and the settlement system.

 

 

After a period of reworking, the system looked as shown in the figure:
● The bottom layer, infrastructure and basic business services, supports every business group of the company;
● The platform layer, second from the bottom, provides platform business services for hotel and travel scenarios and supports business forms such as hotels, tickets, and transportation;
● The product and transaction layers in the middle, as the service layer of the business, remain basically stable;
● The second layer (search, transactions, and so on) serves as the application layer: it is closer to the business, lighter to change, and flexible.

 

 

At this stage, our system runs more and more smoothly, and the optimization effect is shown in the figure.

 

 

Optimization stage: technology optimization to promote further business development

With the further development of the business and the system, the system enters the optimization stage, and another topic is raised at this stage: how to further promote the business development through technical optimization.

Two core principles:
● Grasp the pain points: keep thinking until you know what you actually want;
● Talk with data: as with performance, without data you are just guessing.

Grasp the pain points
The original marketing process was: operations staff manually collect data from N parties, set marketing strategies based on their own experience, launch them, and then collect data to verify the results.

 

 

At this point we may hear feedback from our operations colleagues: "I'm exhausted every day. I'd like to take two days off, but nobody can cover for me."
Following the usual requirements route, we would improve the tools operations staff use to obtain data, and improve the user experience of the system in which they set strategies.

But we should also think more deeply:
● Why is it so tiring: operations staff are the core of the whole process. They have to obtain information from multiple parties, integrate it, make decisions, and launch them, so they carry too heavy a role.
● Why can no one stand in: strategy setting depends too heavily on individual operational ability. Experience has a huge effect on the results, yet it is hard to pass on or accumulate.
In response, we further optimized the marketing system:
● Accumulate experience: feed the operations rules into the analysis engine as codified experience, so that it accumulates.
● Lighten the role: most of the data integration and analysis is done by machines, and operations staff mainly review and observe.

 

 

Let the data speak

A typical direct-connection scenario is to obtain inventory and price information from hotel groups or third parties and then sell it through Meituan (MT). In this process, how current the data is has a great impact on the business.

 

 

The points to consider include:
● Trigger of the problem: low payment conversion rate
● Narrow the scope: analyze the data of each link and confirm that part of the impact comes from room status
● Define the indicator: room status accuracy

Specific steps:
First: analyze how far ahead users book, confirm what share of bookings falls within the next N days, and set different synchronization strategies according to how many days away the date is. Accuracy increased by 10%;
Second: analyze the actual cases in which stale data caused problems, and trigger data synchronization when events such as verification and transactions occur. Accuracy increased by 5%;
Third: analyze user behavior data, predict hot data, and increase its synchronization frequency. Accuracy increased by 8%.
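
The three steps can be read as a small policy: refresh near dates more often, refresh immediately on certain events, and refresh predicted hot data more often still. The sketch below only illustrates that shape. The thresholds, the intervals in minutes, and the function names are all assumptions, since the talk gives no concrete values.

```python
def sync_interval_minutes(days_ahead: int, is_hot: bool = False) -> int:
    """Pick how often to refresh room status for one check-in date."""
    if days_ahead < 0:
        raise ValueError("check-in date is in the past")
    if days_ahead <= 1:
        interval = 5      # dates with the most bookings: refresh often (assumed value)
    elif days_ahead <= 7:
        interval = 30     # assumed value
    else:
        interval = 180    # rarely booked this far ahead: refresh seldom (assumed value)
    if is_hot:
        interval = max(1, interval // 5)  # predicted hot data syncs more frequently
    return interval


def should_sync_now(event: str) -> bool:
    """Event-triggered sync: some business events force an immediate refresh."""
    return event in {"verification", "transaction"}
```

For example, `sync_interval_minutes(3)` returns 30 and `sync_interval_minutes(3, is_hot=True)` returns 6 under these assumed values. Real thresholds would come from the booking-time and behavior data that the steps describe.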

 

 

Summary

 

 

From November 9th to 12th, the 6th TOP100 Global Software Case Study Summit will be held at the Beijing National Convention Center. Hao Jinghua, Meituan food delivery algorithm architect, will share "Meituan Delivery Intelligent Scheduling System Evolution"; Wang Peng, head of the tool chain for the Meituan-Dianping hotel and travel quality team, will share "Practice of Automated Testing and Continuous Integration Tool Chain under Microservice Architecture".

The TOP100 Global Software Case Study Summit has been held for six sessions. It selects outstanding software R&D cases from around the world and draws 2,000 attendees every year. It includes special tracks on product, team, architecture, operations, big data, artificial intelligence, and more, where attendees can learn the latest R&D practices of leading Internet companies such as Google, Microsoft, Tencent, Alibaba, and Baidu.
