Implementation practice of a Diff testing tool for main/backup link delivery in a real-time data warehouse

1. Background

The priority of the real-time delivery metrics produced by the real-time data warehouse keeps rising. In particular, the warehouse data consumed by the downstream rules engine directly drives ad delivery for advertising operations, so data delays or anomalies can cause direct or indirect financial losses. Looking at the full link diagram of the delivery management platform, delivery today is a closed-loop process in which the real-time data warehouse is a key node: its data directly supports both the automated operation of the rules engine and manual control through the management platform. An incident at a real-time node can therefore bring the entire delivery link to a halt. To reach 99.9% stability for the delivery link, the stability and priority of the link's tasks must be raised.


After a joint evaluation by R&D and QA, the plan is to add a backup link alongside the live link. Delivery requirements are iterated on the backup link; once a change is complete, a Diff between the main and backup links is run, and the change may go live only when the Diff pass rate reaches 99.9%.

2. Implementation plan

  • Data preparation: the data produced by the main and backup links is written to ODPS in real time.

  • Data collection: the test tool service collects data slices from both links at the same time, keeping two copies of the data for the same time window.

  • Data denoising & Diff: after collection, the tool performs a first round of noise reduction; it then compares the main and backup data while applying a second round of noise reduction.

  • Diff result: the comparison result is processed to determine the difference per field and then the difference in the overall data, from which a pass/fail verdict is produced.


3. Building the main and backup links


Real-time link explanation: source data is written to Kafka; Flink consumes the Kafka data as its Source, applies operator processing combined with the attribute fields (Transformation), and writes the results back to Kafka (Sink) for the next stage. The stream flows through each Flink task node in turn until it reaches the application database.
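The Source → Transform → Sink chain can be pictured with a minimal stand-in in plain Java. All class and method names here are illustrative; the real link uses Flink's DataStream API with Kafka connectors rather than in-memory lists.

```java
import java.util.List;
import java.util.stream.Collectors;

// Minimal stand-in for the Source -> Transform -> Sink chain described above.
// Names (RawEvent, Enriched, enrich) are illustrative, not from the real link.
public class LinkSketch {
    record RawEvent(String bizId, double value) {}
    record Enriched(String bizId, double value, String scene) {}

    // The "Transform" step: operator processing combined with attribute fields.
    static Enriched enrich(RawEvent e) {
        return new Enriched(e.bizId(), e.value(), "realMetric");
    }

    // Source (input list) -> Transform (map) -> Sink (output list).
    public static List<Enriched> run(List<RawEvent> source) {
        return source.stream()
                     .map(LinkSketch::enrich)
                     .collect(Collectors.toList());
    }
}
```

In the real topology each arrow is a Kafka topic and each step is a Flink task node; the sketch only shows the data flow shape.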

4. Data preparation: data slicing

Time-window slicing

Slice at the chosen test time: freeze the data from 00:00 of the day up to the execution time, so that the slice is no longer updated.

Business-scenario slicing

Data is also sliced by the business scenario under iteration, since the stream feeds a variety of downstream scenarios. The slice is frozen for the scenario being iterated, e.g. fields_a='b'.

5. Main/backup link data Diff: denoising

Data drift problem

Symptom: the stream is continuously updated, and the same business record is updated to its latest value on both links; the main link's record may land in the current day's partition while the backup link's record lands in the next day's partition.

Denoising plan: take only the latest record from the data stream.


Data update frequency issue

Symptom: while the same business record is being updated, the main link may write 10 updates (the last five of which do not change the data) while the backup link writes only 5.

Denoising solution: take the last N records per business record from the data stream.


Data update timeliness issue

Symptom: while the same business record is being updated, the main link writes the values 11.68, 12.9 and 13.05 while the backup link writes 11.68, 12.9 and 13.1; the most recent values on the two links differ.

Denoising solution: merge each business record's stream into a list, and have the main and backup sides each check whether their latest value exists in the list intercepted by the other side.

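The three denoising rules above (keep the latest record, keep the last N records per business key, accept a pair when either side's latest value appears in the other side's list) can be sketched as follows. Record and method names are ours, not the tool's.

```java
import java.util.*;
import java.util.stream.Collectors;

// Illustrative sketch of the stream denoising rules; the real tool does this
// in ODPS SQL with ROW_NUMBER() and COLLECT_SET().
public class Denoise {
    record Record(String bizKey, long offset, String value) {}

    // Last N values per business key, newest first (the "latest record" rule
    // is simply the N = 1 case).
    static Map<String, List<String>> lastN(List<Record> stream, int n) {
        return stream.stream()
            .collect(Collectors.groupingBy(Record::bizKey,
                Collectors.collectingAndThen(Collectors.toList(), rs -> rs.stream()
                    .sorted(Comparator.comparingLong(Record::offset).reversed())
                    .limit(n)
                    .map(Record::value)
                    .collect(Collectors.toList()))));
    }

    // Timeliness rule: the pair matches if the latest values are equal, or if
    // either side's latest value appears in the other side's intercepted list.
    static boolean matches(List<String> main, List<String> backup) {
        return main.get(0).equals(backup.get(0))
            || backup.contains(main.get(0))
            || main.contains(backup.get(0));
    }
}
```

For example, if the backup link is one update behind (main has 11.68, 12.9, 13.1; backup has 11.68, 12.9), the backup's latest value 12.9 is found in the main's list, so the pair is not flagged as a difference.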

Inconsistent attribute field values

Symptom: values such as the empty string, null, 0 and 0.0 cause the Diff to fail even though they are equivalent in business terms.

Denoising solution: convert these values to a unified form before the Diff.
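A minimal sketch of the unified conversion. The sentinel strings '---' and '0-0-0' mirror the CASE expression used in the denoising SQL later in this article; treating null/empty and 0/0.0 as equivalent is the assumption being encoded.

```java
// Unified conversion before comparison: null/empty collapse to one sentinel,
// "0" and "0.0" to another, so business-equivalent values Diff as equal.
public class Normalize {
    static String normalize(String v) {
        if (v == null || v.isEmpty()) return "---";
        if (v.equals("0") || v.equals("0.0")) return "0-0-0";
        return v;
    }
}
```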

Inconsistent attribute fields parsed from the message field on the two links

Symptom: the message field stores data in JSON format. For the same business record, the attribute fields parsed out of the JSON by the main and backup links are not fully consistent; each side has fields the other lacks.

Denoising solution: parse out all attribute fields in code so that the Diff covers the complete set.

message template:

{"fields_a":"20230628","fields_b":"2023-06-22 19:48:24","fields_c":"2","fields_d":"plan","fields_e":"3******","fields_f":"0.0","fields_g":"2","fields_h":"4*****","fields_i":"ext","fields_j":"binlog+odps","fields_k":"2","fields_l":"STATUS_*****","fields_m":"1********","fields_n":"孙**","fields_o":"2023-06-28T22:19:43.872"}

Converted JSON:

{
        "fields_a": "20230717",
        "fields_d": "plan",
        "fields_e": "3******",
        "fields_aj": "33761.125",
        "fields_p": "37934.0",
        "fields_r": "1250.412",
        "fields_s": "1250.412",
        "fields_t": "33761.125",
        "fields_w": "33761.125",
        "fields_m": "1*********",
        "fields_v": "33761.125",
        "fields_y": "33761.125",
        "fields_n": "孙**",
        "fields_z": "1250.412",
        "fields_ai": "27",
        "fields_ak": "",
        "fields_aa": "33761.125",
        "fields_ab": "33761.125",
        "fields_ac": "33761.0",
        "fields_al": "0.1002",
        "fields_i": "***",
        "fields_j": "***",
        "fields_k": "2",
        "fields_ad": "1.0",
        "fields_ak": "37934.0",
        "fields_x": "1250.412",
        "fields_y": "0.0",
        "fields_ag": "27",
        "fields_af": "27",
        "fields_ah": "0.0",
        "fields_al": "0.0",
        "fields_am": "0.0",
        "fields_ao": "37934.0",
        "fields_ap": "37934.0",
        "fields_an": "33761.125",
        "fields_aq": "1*********",
        "fields_ae": "27",
        "fields_o": "2023-07-17T23:59:00.103",
        "fields_ar": "0.1002"
}
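One way to make the Diff complete is to compare the union of the attribute keys seen on both links. A hedged sketch in plain Java: the real tool parses the message with a JSON library, while this regex version only handles flat, string-valued objects like the templates above.

```java
import java.util.*;
import java.util.regex.*;

// Illustrative only: collect every attribute key from a flat JSON message so
// the Diff covers the union of fields from both links. Assumes flat objects
// with string values (no nesting, no escaped quotes).
public class FieldUnion {
    private static final Pattern KEY = Pattern.compile("\"([^\"]+)\"\\s*:");

    static Set<String> keys(String flatJson) {
        Set<String> out = new LinkedHashSet<>();
        Matcher m = KEY.matcher(flatJson);
        while (m.find()) out.add(m.group(1));
        return out;
    }

    // Fields to diff = union of the keys seen on the main and backup links.
    static Set<String> diffColumns(String mainJson, String backupJson) {
        Set<String> u = new LinkedHashSet<>(keys(mainJson));
        u.addAll(keys(backupJson));
        return u;
    }
}
```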

The five issues above can all be denoised in SQL. The overall denoising SQL template is as follows:

SET odps.sql.mapper.split.size = 64;
SET odps.stage.joiner.num = 4000;
SET odps.stage.reducer.num = 1999;
CREATE TABLE table_diff AS
SELECT  a.fields_as AS fields_as_main
        ,b.fields_as AS fields_as_branch
        ,a.fields_at AS fields_at_main
        ,b.fields_at AS fields_at_branch
        ,a.fields_d AS fields_d_main
        ,b.fields_d AS fields_d_branch
        ,a.fields_i AS fields_i_main
        ,b.fields_i AS fields_i_branch
        ,a.fields_j AS fields_j_main
        ,b.fields_j AS fields_j_branch
        ,a.fields_aw AS fields_aw_main
        ,b.fields_aw AS fields_aw_branch
        ,a.fields_k_json_key AS fields_k_json_key_main
        ,b.fields_k_json_key AS fields_k_json_key_branch
        ,a.fields_k_json_key_list AS fields_k_json_key_list_main
        ,b.fields_k_json_key_list AS fields_k_json_key_list_branch
        ,CASE   WHEN a.fields_k_json_key = b.fields_k_json_key THEN 0
                WHEN b.fields_k_json_key_list RLIKE a.fields_k_json_key THEN 0
                WHEN a.fields_k_json_key_list RLIKE b.fields_k_json_key THEN 0
                ELSE 1
        END AS fields_k_json_key_diff_flag
FROM    (
            SELECT  fields_as
                    ,fields_at
                    ,fields_d
                    ,fields_i
                    ,fields_j
                    ,fields_aw
                    ,MAX(CASE WHEN rn = 1 THEN fields_k_json_key END) AS fields_k_json_key
                    ,CONCAT_WS(',',COLLECT_SET(fields_k_json_key)) AS fields_k_json_key_list
            FROM    (
                        SELECT  *
                                ,CASE   WHEN NVL(GET_JSON_OBJECT(message,'$.fields_k'),'') = '' THEN '---'
                                        WHEN GET_JSON_OBJECT(message,'$.fields_k') IN ('0','0.0') THEN '0-0-0'
                                        ELSE GET_JSON_OBJECT(message,'$.fields_k')
                                END AS fields_k_json_key
                                ,ROW_NUMBER() OVER (PARTITION BY fields_as,fields_at,fields_d,fields_i,fields_j,fields_aw ORDER BY offset DESC ) AS rn
                        FROM    table_main
                        WHERE   pt = 20230628
                        -- AND     fields_i = 'realMetric'
                    ) 
            WHERE   rn < 6
            GROUP BY fields_as
                     ,fields_at
                     ,fields_d
                     ,fields_i
                     ,fields_j
                     ,fields_aw
        ) a
LEFT JOIN   (
                SELECT  fields_as
                        ,fields_at
                        ,fields_d
                        ,fields_i
                        ,fields_j
                        ,fields_aw
                        ,MAX(CASE WHEN rn = 1 THEN fields_k_json_key END) AS fields_k_json_key
                        ,CONCAT_WS(',',COLLECT_SET(fields_k_json_key)) AS fields_k_json_key_list
                FROM    (
                            SELECT  *
                                    ,CASE   WHEN NVL(GET_JSON_OBJECT(message,'$.fields_k'),'') = '' THEN '---'
                                            WHEN GET_JSON_OBJECT(message,'$.fields_k') IN ('0','0.0') THEN '0-0-0'
                                            ELSE GET_JSON_OBJECT(message,'$.fields_k')
                                    END AS fields_k_json_key
                                    ,ROW_NUMBER() OVER (PARTITION BY fields_as,fields_at,fields_d,fields_i,fields_j,fields_aw ORDER BY offset DESC ) AS rn
                            FROM    table_branch
                            WHERE   pt = 20230628
                            -- AND     fields_i = 'realMetric'
AND     fields_d != 'group'
                        ) 
                WHERE   rn < 6
                GROUP BY fields_as
                         ,fields_at
                         ,fields_d
                         ,fields_i
                         ,fields_j
                         ,fields_aw
            ) b
ON      NVL(a.fields_as,'-00') = NVL(b.fields_as,'-00')
AND     NVL(a.fields_at,'-00') = NVL(b.fields_at,'-00')
AND     NVL(a.fields_d,'-00') = NVL(b.fields_d,'-00')
AND     NVL(a.fields_i,'-00') = NVL(b.fields_i,'-00')
AND     NVL(a.fields_j,'-00') = NVL(b.fields_j,'-00')
AND     NVL(a.fields_aw,'-00') = NVL(b.fields_aw,'-00')
;

Field-level denoising for iterated fields

Symptom: when a requirement intentionally changes a field's logic, that field's Diff necessarily fails and pollutes the overall result.

Denoising solution: drop the fields whose logic was intentionally changed from the comparison; which fields to drop is controlled flexibly in Java.

// Build the list of fields to diff, skipping fields whose logic
// was intentionally changed in this iteration.
String[] jsonColumnListStrings = jsonColumnList.split(",");
List<String> jsonColumnLists = new ArrayList<String>();
String[] iterationColumnStrings = iterationColumn.split(",");
List<String> iterationColumnLists = Arrays.asList(iterationColumnStrings);
for (String s : jsonColumnListStrings) {
    if (!iterationColumnLists.contains(s)) { // skip denoised (iterated) fields
        jsonColumnLists.add(s);
    }
}


6. Diff result analysis

The main/backup Diff SQL synthesized above produces a comparison result table; analyzing that table determines whether the run passed.

Analysis logic 1: compute the pass rate of each compared field

This tells R&D which parsed fields have a low pass rate.

Analysis logic 2: compute the proportion of records for which all fields pass

This metric decides whether the Diff passes: a rate of at least 99.9% counts as a pass.
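Analysis logic 2 can be sketched as follows. The row layout and the 99.9% threshold follow the description above; the field names and data are illustrative.

```java
import java.util.*;

// A record passes only when every compared field matches (diff flag 0);
// the run passes when the all-fields pass rate is at least 99.9%.
public class DiffReport {
    // Each row: field name -> diff flag (0 = match, 1 = mismatch).
    static double totalPassRate(List<Map<String, Integer>> rows) {
        long pass = rows.stream()
            .filter(r -> r.values().stream().allMatch(f -> f == 0))
            .count();
        return 100.0 * pass / rows.size();
    }

    static boolean passed(double ratePercent) {
        return ratePercent >= 99.9;
    }
}
```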

Sample analysis SQL:

SELECT  round(SUM(CASE WHEN fields_k_json_key_diff_flag = 0 THEN 1 ELSE 0 END) / COUNT(1) * 100,4) AS fields_k_ratio
        ,round(SUM(CASE WHEN fields_m_json_key_diff_flag = 0 THEN 1 ELSE 0 END) / COUNT(1) * 100,4) AS fields_m_ratio
        ,round(SUM(CASE WHEN fields_e_json_key_diff_flag = 0 THEN 1 ELSE 0 END) / COUNT(1) * 100,4) AS fields_e_ratio
        ,round(SUM(CASE WHEN fields_a_json_key_diff_flag = 0 THEN 1 ELSE 0 END) / COUNT(1) * 100,4) AS fields_a_ratio
        ,round(SUM(CASE WHEN fields_n_json_key_diff_flag = 0 THEN 1 ELSE 0 END) / COUNT(1) * 100,4) AS fields_n_ratio
        ,round(SUM(CASE WHEN fields_p_json_key_diff_flag = 0 THEN 1 ELSE 0 END) / COUNT(1) * 100,4) AS fields_p_ratio
        ,round(SUM(CASE WHEN fields_ac_json_key_diff_flag = 0 THEN 1 ELSE 0 END) / COUNT(1) * 100,4) AS fields_ac_ratio
        ,round(SUM(CASE WHEN fields_ar_json_key_diff_flag = 0 THEN 1 ELSE 0 END) / COUNT(1) * 100,4) AS fields_ar_ratio
        ,round(SUM(CASE WHEN fields_k_json_key_diff_flag = 0 AND fields_m_json_key_diff_flag = 0 AND fields_e_json_key_diff_flag = 0 AND fields_a_json_key_diff_flag = 0 AND fields_n_json_key_diff_flag = 0 AND fields_p_json_key_diff_flag = 0 AND fields_ac_json_key_diff_flag = 0 AND fields_ar_json_key_diff_flag = 0 THEN 1 ELSE 0 END) / COUNT(1) * 100,4) AS total_ratio
        ,COUNT(1) AS total_cnt
FROM    table_diff
;

7. Turning the tool into a service

Backend service processing logic

Synthesizing the main/backup comparison SQL

The Diff SQL is embedded in the code; data slicing, denoising and the other scenarios are controlled in code to synthesize the final test SQL.

for(String s:jsonColumnLists){
    selectSql1 = selectSql1 + " case when NVL(GET_JSON_OBJECT(message,'$." + s + "'),'')='' then '---' when get_json_object(message,'$." + s + "') in ('0','0.0') then '0-0-0' else get_json_object(message,'$." + s + "') end  AS " + s + "_json_key,";
    selectSql2 = selectSql2 + " max(case when rn =1 then " + s + "_json_key end) as " + s + "_json_key,concat_ws(',',collect_set(" + s + "_json_key)) as " + s + "_json_key_list,";
    mergeSql = mergeSql + " a." + s + "_json_key as " + s + "_json_key_main,b." + s + "_json_key as " + s + "_json_key_branch,a." + s + "_json_key_list as " + s + "_json_key_list_main,b." + s + "_json_key_list as " + s + "_json_key_list_branch,case when a." + s + "_json_key = b." + s + "_json_key then 0 when b." + s + "_json_key_list rlike a." + s + "_json_key then 0 when a." + s + "_json_key_list rlike b." + s + "_json_key then 0 else 1 end as " + s + "_json_key_diff_flag,";
}
rowNumberSql ="ROW_NUMBER() OVER (PARTITION BY fields_as,fields_at,fields_d,fields_i,fields_j,fields_aw ORDER BY offset DESC ) AS rn ";
selectSql1 = selectSql1 + rowNumberSql;
whereSql1 = whereSql1 + bizdate + " AND fields_i = 'realMetric' ";
String pretreatmentSqlMain = "";
String pretreatmentSqlBranch = "";
pretreatmentSqlBranch = selectSql2.substring(0,selectSql2.length()-1) + " from(" + selectSql1 + " from " + branchLinkTableName + whereSql1 + ")" + whereSql2 + groupSql.substring(0,groupSql.length()-1);
pretreatmentSqlMain = selectSql2.substring(0,selectSql2.length()-1) + " from(" + selectSql1 + " from " + masterLinkTableName + whereSql1 + ")" + whereSql2 + groupSql.substring(0,groupSql.length()-1);
mergeSql = mergeSql.substring(0,mergeSql.length()-1) + " from (" + pretreatmentSqlMain + ")a left join (" + pretreatmentSqlBranch + ")b " + joinSql.substring(0,joinSql.length()-3) + ";";
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyyMMddHHmmss");
String dateStr = simpleDateFormat.format(new Date());
this.resultDataCreateSql = "set odps.sql.mapper.split.size=64;set odps.stage.joiner.num=4000;set odps.stage.reducer.num=1999; create table du_temp.diff_main_branch_" + dateStr  + "_test as " + mergeSql;
log.info(resultDataCreateSql);
this.resultDataTable = "du_temp.diff_main_branch_" + dateStr  + "_test";
log.info(resultDataTable);
// Synthesize the SQL that filters the result data
String resultSql = " select ";
String totalResultSql = "round(sum(case when ";
for(String s:jsonColumnLists){
    resultSql = resultSql + " round(sum(case when " + s + "_json_key_diff_flag = 0 then 1 else 0 end)/count(1)*100,4) as " + s + "_ratio,";
    totalResultSql = totalResultSql + " " + s + "_json_key_diff_flag = 0 and";
}
this.resultDataFiltrate = resultSql + totalResultSql.substring(0,totalResultSql.length()-3) + " then 1 else 0 end)/count(1)*100,4) as total_ratio , count(1) as total_cnt from " + this.resultDataTable + ";";
log.info(resultDataFiltrate);

Diff result report analysis

...}
else if (testType.equals("主备diff")) { // "main/backup diff"
    for (Map.Entry entry : testResultRecord.entrySet()) {
        List<String> listValue = (List<String>) entry.getValue();
        this.resultData.put(entry.getKey().toString(), listValue.get(0));
        // A field fails when its pass rate is below the 99.9% threshold.
        if (Double.parseDouble(listValue.get(0)) < 99.9 && !entry.getKey().toString().equals("total_cnt")) {
            this.failDetail.put(entry.getKey().toString(), listValue.get(0));
        }
    }
    if (failDetail.size() > 0) {
        this.testStatus = "失败"; // "failed"
    } else {
        this.testStatus = "成功"; // "succeeded"
    }
}

Platform visualization

  • Create task

27.png

  • Execution list

84.png

  • Result Report-Platform Display

As shown below: a failed execution, whose pass rate of 99.8471% does not reach the 99.9% threshold.

  • Result Report-Feishu Notification

Here's an example:

Execution requirement name: Active-backup Diff-521
Executor: ***
Execution type: Active-backup Diff
Execution number: 20230628204636
Backup link table name: table_main
Main link table name: table_branch
Backup link table partition: 20230628
Result details table: table_diff
Result details summary: fields_am_ratio:99.9958 fields_z_ratio:99.9826 fields_af_ratio:99.9856 fields_ba_ratio:99.9964 fields_al_ratio:99.9915 fields_ad_ratio:99.9873 fields_r_ratio:99.9826 fields_aa_ratio:99.9906 fields_ai_ratio:99.9856 fields_v_ratio:99.9917 fields_ak_ratio:99.9909 fields_m_ratio:99.9969 fields_ak_ratio:99.9945 fields_bb_ratio:99.9964 fields_bc_ratio:99.9957 fields_bd_ratio:99.9954 fields_ae_ratio:99.9856 fields_be_ratio:99.9952 fields_bf_ratio:99.9955 fields_t_ratio:99.9917 fields_ag_ratio:99.9856 fields_p_ratio:99.9909 fields_bg_ratio:99.9948 fields_a_ratio:99.9969 fields_d_ratio:99.9969 fields_x_ratio:99.9826 fields_an_ratio:99.9917 fields_ap_ratio:99.9909 fields_ar_ratio:99.9915 fields_y_ratio:99.9917 fields_bh_ratio:99.9955 fields_aj_ratio:99.9916 fields_bi_ratio:99.987 fields_ac_ratio:99.9908 fields_s_ratio:99.9826 fields_ab_ratio:99.9906 fields_i_ratio:99.9969 fields_bj_ratio:99.9951 fields_ah_ratio:99.9959 fields_k_ratio:99.9969 fields_e_ratio:99.9969 fields_bk_ratio:99.9962 fields_bl_ratio:99.8748 fields_al_ratio:99.9958 fields_j_ratio:99.9969 fields_bm_ratio:99.9951 fields_n_ratio:99.9969 fields_ao_ratio:99.9909 fields_w_ratio:99.9906 fields_bn_ratio:99.9965 fields_bo_ratio:99.9912 fields_bcrate_ratio:99.987 fields_y_ratio:99.9958
Result summary: total_ratio:99.8471 total_cnt:714259
Failure details: fields_bl_ratio:99.8748 total_ratio:99.8471%
Execution status: failed

8. Adoption and release process of the main/backup Diff tool

Once the newly built backup link passes the main/backup Diff tool's tests, it goes live and becomes, in effect, a second production link.

For subsequent version iterations, a requirement that is verified by the Diff tool before release is considered to meet the release bar.


9. Summary

Real-time computing differs from an offline data warehouse: the stability and accuracy of the data are hard to control, and the overall data quality of a complex link cannot be guaranteed by simple testing. Dual-link Diff is a better way to guarantee real-time data quality across iterations.

As for implementing the main/backup Diff: the biggest pain point is usually that the data is very noisy, so technical denoising is required to make the comparison results accurate and reliable.

Text: Shiyu

This article is original content from Dewu Technology. For more articles, see the Dewu Technology official website.

Reprinting without the permission of Dewu Technology is strictly prohibited, otherwise legal liability will be pursued according to law!
