Comparing an Excel Table with JSON Data in Python

Note: I don't do data processing professionally and my knowledge is limited, so the Python code here is simple and easy to follow. If you spot bugs or know a better approach, please point them out!

A while ago, as a novice tester, I was given a new task by my boss: compare a MySQL database against its Excel source table. I had never done data testing before, but on the principle that we are all bricks to be moved wherever we are needed, I got started right away.

My original plan was to connect to the database with Python, read the Excel table, and compare the two. However, I have no access to the database itself, only export permission in Metabase, a database visualization tool. After some fiddling I found that Metabase can export data as JSON without trouble, so the task became a comparison between an Excel table and JSON data.

With the direction set, how should the comparison be done? (deep in thought...)

Since the comparison is done in Python, both sources must first be converted into the same data type. My first solution was to convert both the Excel and the JSON data into a list of dictionaries, that is, data in the form [{}, {}, {}...].
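A small sketch of why this shape is convenient, using made-up sample rows (the values are hypothetical; the field names are borrowed from the comparison code later in this post). Dictionaries compare equal regardless of key order, while lists compare position by position, which is why rows have to be matched by key fields rather than by position.

```python
# Hypothetical sample rows; only the field names come from this post.
excel_rows = [
    {'年份': 2020, '地区': 'Beijing', 'gdp': 100.0},
    {'年份': 2021, '地区': 'Beijing', 'gdp': 110.0},
]
json_rows = [
    {'gdp': 110.0, '地区': 'Beijing', '年份': 2021},
    {'gdp': 100.0, '地区': 'Beijing', '年份': 2020},
]

# Two dictionaries are equal when they hold the same key/value pairs,
# no matter in which order the keys were written.
same_row = excel_rows[0] == json_rows[1]

# Two lists are equal only when their elements match position by position,
# so the same rows in a different order do not compare equal.
same_list = excel_rows == json_rows

print(same_row)   # True
print(same_list)  # False
```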

The first step is to read the Excel data and convert it into [{}, {}, {}...] format. The figure below shows the output after implementation.

The implementation is as follows:

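"""
    @Purpose: read the data in an Excel file and convert it into [{},{},{}...]
"""

# Read the sheet with the given name from the Excel file at the given path
# and convert its contents into a list of dictionaries
def read_excel_into_list(excel_file_path, sheet_name):
    # 1. Load the Excel file into a workbook (in memory).
    #    file_read_write_utils is a project helper module; read_from_excel is not shown in this post
    work_book = file_read_write_utils.read_from_excel(excel_file_path)
    # 2. Convert the contents of the requested sheet
    sheet = work_book.sheet_by_name(sheet_name)
    return convert_sheet_context_into_list(sheet, sheet_name)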


# Convert the contents of the sheet into a list of dictionaries, one per data row
def convert_sheet_context_into_list(sheet, sheet_name):
    new_val_list = []
    new_val_list2 = []
    # The first row holds the field names
    fld_key_list = [sheet.cell_value(0, col_index) for col_index in range(sheet.ncols)]
    # Each row from the second one onward becomes one dictionary
    for row_index in range(1, sheet.nrows):
        fld_val_list = [sheet.cell_value(row_index, col_index) for col_index in range(sheet.ncols)]
        new_val_list.append(dict(zip(fld_key_list, fld_val_list)))

    if sheet_name == 'XXXX':
        i = 0
        for table in new_val_list:
            new_val_list2.append(table)
            # table_none_format is a helper that is not shown in this post;
            # its return value is used as the row count
            i = table_none_format(new_val_list2)
        print(f'{i} rows in total')
    # print("List read from the Excel sheet: \n" + str(new_val_list2))
    return new_val_list2
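The core of the conversion above is pairing the header row with each data row. Here is a minimal sketch of that step on plain Python lists, so it can be tried without an Excel file. The function name rows_to_dicts and the sample values are hypothetical. The years are written as floats on the assumption that the reader returns numeric cells as floats, which is how xlrd-style readers behave as far as I know.

```python
def rows_to_dicts(rows):
    """Turn [[header...], [values...], ...] into a list of dictionaries."""
    if not rows:
        return []
    header = rows[0]
    # zip pairs each field name with the value in the same column
    return [dict(zip(header, values)) for values in rows[1:]]


# Hypothetical sheet contents: the first row is the header, the rest are data rows.
sample_rows = [
    ['年份', '地区', 'gdp'],
    [2020.0, 'Beijing', 100.0],
    [2021.0, 'Beijing', 110.0],
]

records = rows_to_dicts(sample_rows)
print(records)
```

A sheet that contains only a header row produces an empty list, because there are no data rows to pair with it.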

The next step is to read the JSON data and convert it into [{}, {}, {}...] format. The figure below shows the output after implementation.

The implementation is as follows:


import json
import os

import xlrd2
import xlwt
from xlutils.copy import copy


# Load the JSON file and return the list stored under its 'data' key
def read_from_json(json_file_path, model, char_set):
    # Fall back to defaults when the encoding or the open mode is left empty
    if '' == char_set.strip():
        char_set = "UTF-8"
    if '' == model.strip():
        model = "r"

    with open(json_file_path, model, encoding=char_set) as f:
        json_data = json.load(f)
    # print("List read from the JSON file:\n" + str(json_data['data']))
    print(f"{len(json_data['data'])} rows in total")
    return json_data['data']
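To make the expected input concrete, here is a small sketch that parses JSON text instead of a file. The only thing taken from the code above is that the rows sit under a top-level 'data' key. The sample text, the helper name rows_from_json_text, and the choice to return an empty list when the key is missing are assumptions for illustration.

```python
import json

# Hypothetical export text; only the top-level 'data' key comes from the code above.
raw = (
    '{"data": ['
    '{"年份": 2020, "地区": "Beijing", "gdp": 100.0}, '
    '{"年份": 2021, "地区": "Beijing", "gdp": 110.0}'
    ']}'
)


def rows_from_json_text(text):
    """Parse JSON text and return the list under 'data' (empty list if the key is absent)."""
    payload = json.loads(text)
    return payload.get('data', [])


rows = rows_from_json_text(raw)
print(len(rows))  # 2
print(rows[0])
```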

With both data sets read, the next step is to compare them in their common format. My two data sets have no single unique primary key, so during the comparison I identify each row by two or more key fields of its dictionary (because of how the data is modeled, two or more key fields together identify a row uniquely; which fields to use depends on the actual project data~~). The figure below shows the output after implementation.

The implementation is as follows:
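As a side sketch of the composite-key idea, one option is to build a dictionary keyed by a tuple of the key-field values. This also checks that the chosen fields really are unique, and it lets each row be looked up directly instead of scanning the whole list. The function name index_by_key and the sample rows are hypothetical, and this is an alternative to the list scan in my own code, not a description of it.

```python
def index_by_key(rows, key_fields):
    """Map the tuple of key-field values to its row; reject duplicate keys."""
    index = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key in index:
            # Two rows share the same key, so these fields do not identify a row uniquely
            raise ValueError(f"duplicate key {key!r}")
        index[key] = row
    return index


# Hypothetical rows: the year alone is not unique, but year plus region is.
rows = [
    {'年份': 2020, '地区': 'Beijing', 'gdp': 100.0},
    {'年份': 2020, '地区': 'Shanghai', 'gdp': 120.0},
]

index = index_by_key(rows, ['年份', '地区'])
print(index[(2020, 'Shanghai')]['gdp'])  # 120.0
```

Calling index_by_key(rows, ['年份']) on the same rows raises ValueError, which is a quick way to find out that one field is not enough.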

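"""
    @Purpose: compare the dictionaries in two lists and report the differences
    Note: 1. The MySQL table and the Excel sheet may store numbers with different precision,
             so the data needs to be normalized before the comparison
"""


def compare_all_different(list1, list2):
    diff = []
    i = 0  # number of rows that differ
    a = 0  # number of rows actually compared
    print("Comparing list1 (read from Excel) with list2 (read from JSON):")
    # First check that the Excel and JSON data hold the same number of rows
    if len(list1) != len(list2):
        print("The Excel row count and the JSON row count do not match!")
    else:
        # Take one dictionary from list1, find its counterpart in list2, and compare the two
        for dict1 in list1:
            year = dict1['年份']  # year
            province_name = dict1['地区']  # region
            gdp = dict1['gdp']  # gdp
            gdp_person = dict1['gdp_person']  # gdp_person
            # Find the matching dictionary in list2 (the lookup matches on year and region)
            dict2 = get_dict_wih_same_key(new_list1=[year, province_name, gdp, gdp_person], list2=list2)
            if dict2 is None:
                # No row in list2 has the same key fields, so there is nothing to compare
                continue
            # Symmetric difference: the (field, value) pairs present in only one of the two rows
            differ = set(dict1.items()) ^ set(dict2.items())
            a += 1
            if len(differ) != 0:
                i += 1
                print(f"【{i}】--\ndict1 (from Excel):\n{dict1}\ndict2 (from JSON):\n{dict2}\nFields differ for the same key; the differences are: {differ}")
                diff.extend(differ)
        print(a)
        return diff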


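def get_dict_wih_same_key(new_list1, list2):
    # new_list1 holds [year, region, gdp, gdp_person]; only year and region are used as the key
    for dict2 in list2:
        if dict2['年份'] == new_list1[0] and dict2['地区'] == new_list1[1]:
            return dict2
    # No matching row was found
    return None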
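The note in the docstring above says the data may need to be normalized first because the two sources can differ in precision. One possible way to do that, sketched under the assumption that rounding every float to a fixed number of decimals is acceptable for the project, is shown below. The function name normalize_row, the default of two decimals, and the sample values are all hypothetical.

```python
def normalize_row(row, ndigits=2):
    """Return a copy of row with every float rounded to ndigits decimal places."""
    return {
        key: round(value, ndigits) if isinstance(value, float) else value
        for key, value in row.items()
    }


# Hypothetical rows: the same record, stored with different precision in each source.
excel_row = {'年份': 2020.0, '地区': 'Beijing', 'gdp': 100.126}
json_row = {'年份': 2020, '地区': 'Beijing', 'gdp': 100.13}

# Without normalization the gdp values are reported as a difference.
raw_diff = set(excel_row.items()) ^ set(json_row.items())

# After rounding both sides to two decimals the rows agree.
clean_diff = set(normalize_row(excel_row).items()) ^ set(normalize_row(json_row).items())

print(len(raw_diff))    # 2
print(len(clean_diff))  # 0
```

Note that 2020.0 and 2020 already compare equal in Python, so the year is not what causes the raw difference here. Rounding can still separate two values that sit on either side of a rounding boundary, so a tolerance check such as math.isclose may suit some data better.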

That is my humble take as a testing beginner. I hope everything goes well for everyone in the new year~~

Origin blog.csdn.net/xiaolu_z/article/details/128533231