1. Background
The racing list is a ranking list based on JD.com's real-time sales data, provided by various purchasing and sales groups during the promotion period. It is designed for the traffic peaks of big promotions and uses the rankings to encourage brands to increase their resource investment in JD.com. The list is calculated in real time according to user-configured rules, so its rankings change continuously during the promotion. Because the ranking data is widely shared on Weibo and WeChat Moments, the accuracy of the calculations and rankings is crucial.
The configuration rules differ from list to list. To ensure that the list data is calculated accurately, the real-time ranking data must be checked before the big promotion starts. The main verification method is to take the previous day's real-time ranking data, calculate the corresponding offline data from the list's rule configuration, and compare the two to verify that they are consistent.
A single list rule has 20+ configuration items, each independent of the others, so data verification has to be performed separately for each rule.
2. Evolution of the data reconciliation scheme
2.1. Purely manual - high cost and incomplete coverage
In the initial stage, data reconciliation was purely manual: the real-time data and the offline data for a racing list were obtained separately and compared by hand.
1) Real-time data: Read the list data interface at 23:59 every day and record the corresponding list data.
2) Offline data: Manually write offline SQL scripts according to the list rules, then execute the SQL through data query to obtain the list ranking data.
The whole process is slow: writing one SQL script takes about 1 hour, and executing it takes about 0.5 hours. Covering all rules means completing rule configuration, SQL writing, and data verification for more than 100 rules at a time. Even if the rules do not change, one complete round of testing is estimated at 20 man-days. Script writing also requires an in-depth understanding of the business rules and strong SQL skills from the testers.
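The estimate above follows from simple arithmetic. A minimal sketch, assuming an 8-hour working day and that writing and execution time simply add up per rule (both are assumptions, not stated in the original):

```python
# Figures from the text: about 1 hour to write and 0.5 hours to execute one SQL script.
HOURS_TO_WRITE_SQL = 1.0
HOURS_TO_RUN_SQL = 0.5
HOURS_PER_MAN_DAY = 8.0  # assumption: one man-day is 8 working hours


def estimate_man_days(rule_count: int) -> float:
    """Estimate the manual verification effort, in man-days, for rule_count rules."""
    total_hours = rule_count * (HOURS_TO_WRITE_SQL + HOURS_TO_RUN_SQL)
    return total_hours / HOURS_PER_MAN_DAY


print(estimate_man_days(100))  # 18.75, in line with the roughly 20 man-days quoted above
```

Because the cost grows linearly with the number of rules, the manual approach stops being practical as soon as the rule count increases by an order of magnitude.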
2.2. Semi-automation - still a continuous drain on manpower
The racing list is mainly used during big promotions. Beyond the rules covered by functional testing, the rules configured by the business side must also be verified before each big promotion to ensure that they are calculated accurately. Taking the 618 promotion in 2023 as an example, there were 5000+ list rules in total; purely manual data verification would have taken 900+ days, which is completely infeasible. A semi-automatic reconciliation scheme was therefore implemented. Compared with the manual scheme, it automates both the generation of offline SQL and the acquisition of real-time data.
The specific plan is as follows:
1. Real-time data acquisition: Using the list snapshot function, the daily snapshot data of each list is automatically recorded and written to the database.
2. Offline SQL generation and data calculation:
2.1. Rule configuration storage: The list rules are exported to Excel with the system's built-in rule export function and then imported into a Hive table; other configuration data that the list rules depend on is also imported into Hive.
2.2. Rule-based SQL generation: Based on the list rule configuration, use CASE WHEN expressions to generate the SQL fragment for each situation, then combine the fragments manually.
2.3. Merge SQL and execute computing tasks: Merge the SQL generated for the different combinations into one script, configure an offline scheduled task, and calculate the offline data of the different lists by running the task.
2.4. Push data to the reconciliation MySQL: Push the generated offline list data to the MySQL database that stores the real-time data.
3. Real-time vs. offline data comparison: Once all real-time and offline data is in the database, query the database directly, compare the two data sets, and highlight the entries whose difference exceeds the threshold.
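The comparison in step 3 can be sketched as follows. This is only an illustration: it uses in-memory dictionaries in place of the MySQL tables, and the function name, the relative-difference criterion, and the 1% default threshold are all assumptions rather than the authors' actual implementation.

```python
from typing import Dict, List, Optional, Tuple

# (key, real-time value, offline value); None means the entry is missing on that side.
Mismatch = Tuple[str, Optional[float], Optional[float]]


def find_mismatches(realtime: Dict[str, float],
                    offline: Dict[str, float],
                    threshold: float = 0.01) -> List[Mismatch]:
    """Return entries whose real-time and offline values differ by more than threshold.

    The difference is measured relative to the offline value. An entry that is
    missing from either side is always reported.
    """
    mismatches: List[Mismatch] = []
    for key in sorted(set(realtime) | set(offline)):
        rt, off = realtime.get(key), offline.get(key)
        if rt is None or off is None:
            mismatches.append((key, rt, off))
            continue
        baseline = abs(off) if off != 0 else 1.0  # avoid dividing by zero
        if abs(rt - off) / baseline > threshold:
            mismatches.append((key, rt, off))
    return mismatches


# Hypothetical sample data: sales amount per brand from each pipeline.
realtime_snapshot = {"brand_a": 1000.0, "brand_b": 800.0, "brand_c": 500.0}
offline_result = {"brand_a": 1002.0, "brand_b": 900.0, "brand_d": 450.0}

for key, rt, off in find_mismatches(realtime_snapshot, offline_result):
    print(key, rt, off)
```

In this sample, brand_a passes because the two values differ by about 0.2%, while brand_b (an 11% gap) and the two brands present on only one side are reported.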
This approach achieves semi-automatic real-time/offline reconciliation and removes the most labor-intensive step of the manual scheme, writing SQL by hand. However, it still has the following problem:
During 618 and Double 11 in 2022, the scheme was mainly used by R&D engineers, who adjusted the relevant SQL and verified the data. This required 3 developers for 3 weeks, or 45 man-days in total.
2.3. Full automation - freeing up manpower
To reduce manpower consumption further and upgrade data reconciliation from semi-automated to fully automated, the following needs to be achieved.
The complete automated reconciliation scheme is shown in the figure below:
Optimization point details:
1. Automatically update and store SQL every day: Instead of being exported manually from the page, the list rule data is now extracted to Hive automatically every day; the target SQL is then regenerated daily and stored in a Hive table.
2. Automatically obtain and execute the target SQL: Fetch the target SQL to be executed from Hive and then run it (this uses a special way of invoking the hive command: obtain the SQL first, then execute it).
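# HiveTask adds a run_shell_cmd_out function that returns only the content of the standard output stream.
# Run the following Python script in the standard client.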
from HiveTask import HiveTask
ht = HiveTask()
ht.run_shell_cmd_out(shellcmd='hive -e "select * from table;"')
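The two optimization points above (regenerating the target SQL from rule data, then executing it) could look roughly like the sketch below. It is not the authors' implementation: HiveTask is not used, and the plain `hive -e` command line is called through Python's subprocess module instead. The rule fields (list_id, table, dimension, metric, dt, category_id, exclude_presale, top_n) and all table and column names are hypothetical, and the step of storing the SQL in a Hive table is skipped.

```python
import subprocess
from typing import Any, Callable, Dict, List


def build_ranking_sql(rule: Dict[str, Any]) -> str:
    """Turn one list rule (hypothetical fields) into a ranking query."""
    conditions: List[str] = [f"dt = '{rule['dt']}'"]
    if rule.get("category_id") is not None:
        conditions.append(f"category_id = {int(rule['category_id'])}")
    if rule.get("exclude_presale"):
        conditions.append("is_presale = 0")
    return (
        f"SELECT {rule['dimension']}, SUM({rule['metric']}) AS total "
        f"FROM {rule['table']} "
        f"WHERE {' AND '.join(conditions)} "
        f"GROUP BY {rule['dimension']} "
        f"ORDER BY total DESC LIMIT {int(rule['top_n'])}"
    )


def run_with_hive_cli(sql: str) -> str:
    """Run the SQL through `hive -e` and return only its standard output."""
    result = subprocess.run(["hive", "-e", sql],
                            capture_output=True, text=True, check=True)
    return result.stdout


def run_daily_check(rules: List[Dict[str, Any]],
                    runner: Callable[[str], str] = run_with_hive_cli) -> Dict[str, str]:
    """Generate and execute the target SQL for every rule, keyed by list_id."""
    return {rule["list_id"]: runner(build_ranking_sql(rule)) for rule in rules}


# Hypothetical rule; printing the SQL does not require a Hive installation.
example_rule = {
    "list_id": "list_1", "table": "sales_detail", "dimension": "brand_id",
    "metric": "sale_amount", "dt": "2023-06-17", "category_id": 652,
    "exclude_presale": True, "top_n": 10,
}
print(build_ranking_sql(example_rule))
```

Passing the runner as a parameter keeps SQL generation testable without a cluster: a stub function can stand in for `run_with_hive_cli` when checking the generated SQL.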
The solution was put into use during the 618 promotion in 2023, which coincided with a handover of the R&D team. The new team had no reconciliation experience and was working on other business at the same time, so it could not commit full manpower. Fully automated reconciliation freed up the R&D manpower and greatly improved the efficiency of big-promotion preparation. The remaining manpower is mainly test engineers maintaining the scheduled tasks across the whole pipeline.
Author: Jingdong Retail Wang Henglei, Qi Qi
Source: JD Cloud Developer Community