Python batch data processing: converting between underscore and camel case naming

Background:

Data read by Python (for example, rows from a database) often arrives as lists. When that data is reformatted (for example, into Kafka messages or JSON), the coding style usually requires the keys to follow one naming convention, either camel case or underscore. Converting from one style to the other is often done by hand with assignments such as data_dict['xxx'] = 'yyyy', which has several disadvantages:

  • 1. Poor scalability
  • 2. Hard-coded key names
  • 3. Lots of repetitive code
  • 4. Hard to read
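
For illustration, the hand-written style described above might look like the sketch below. The rows and the function name row_to_dict_hardcoded are made up for this example; the point is only that every key and every index has to be spelled out by hand.

```python
# Hypothetical rows, as they might come back from a database query.
rows = [
    ['name1', 'source1', 20, 90.5],
    ['name2', 'source2', 30, 75.0],
]


def row_to_dict_hardcoded(row):
    """Build a camel-case dictionary one key at a time."""
    data_dict = {}
    data_dict['userName'] = row[0]
    data_dict['userSource'] = row[1]
    data_dict['userAge'] = row[2]
    data_dict['userScore'] = row[3]
    return data_dict


for row in rows:
    print(row_to_dict_hardcoded(row))
```

Adding a column or switching to underscore keys means editing every assignment, which is where the repetition and poor scalability come from.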

To address these shortcomings, the following tool class was written.

Method:

1. Define a data format class: each attribute name is a key, and its value is the corresponding index in the list

2. Define the tool class: it automatically picks up the attribute names of the data format class

3. Obtain each key dynamically and convert it according to the chosen rule (camel case -> underscore, or underscore -> camel case)

4. Convert the result to a dictionary or JSON
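
The two conversion rules in step 3 can be sketched on their own, outside any class. The function names snake_to_lower_camel and camel_to_snake are illustrative; the regular expressions follow the same idea as the implementation further down.

```python
import re


def snake_to_lower_camel(name: str) -> str:
    """Underscore -> lower camel case, e.g. USER_NAME -> userName."""
    # Lowercase first, then uppercase the letter after each underscore.
    return re.sub(r'_([a-z])', lambda m: m.group(1).upper(), name.lower())


def camel_to_snake(name: str) -> str:
    """Camel case -> underscore, e.g. employeeName -> employee_name."""
    # Insert an underscore at each lower-to-upper boundary, then lowercase.
    return re.sub(r'([a-z])([A-Z])', r'\1_\2', name).lower()


print(snake_to_lower_camel('USER_NAME'))  # userName
print(camel_to_snake('employeeName'))     # employee_name
print(camel_to_snake('HTTPServer'))       # httpserver
```

The last call shows a limitation of this simple rule: a run of capital letters has no lower-to-upper boundary, so no underscore is inserted.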

Key techniques:

  • 1. __dict__
  • 2. setattr
  • 3. Class initialization (__init__)
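
As a minimal sketch of how these pieces fit together, the snippet below reads the attributes of a class through __dict__ and copies them onto an instance with setattr. The classes Point and Holder are hypothetical and exist only for this demonstration.

```python
class Point:
    # Hypothetical data format class: key name -> list index
    X_POS = 0
    Y_POS = 1


# A class-level __dict__ also holds special entries such as __module__
# and __doc__, so those are filtered out by their leading double underscore.
fields = {k: v for k, v in Point.__dict__.items() if not k.startswith('__')}
print(fields)  # {'X_POS': 0, 'Y_POS': 1}


class Holder:
    pass


holder = Holder()
for key, index in fields.items():
    # Equivalent to writing holder.x_pos = 0 and holder.y_pos = 1 by hand
    setattr(holder, key.lower(), index)

# Attributes set on an instance end up in the instance's own __dict__.
print(holder.__dict__)  # {'x_pos': 0, 'y_pos': 1}
```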

The specific implementation is as follows:

import re

class User(object):
    USER_NAME = 0
    USER_SOURCE = 1
    USER_AGE = 2
    USER_SCORE = 3


class Employee:
    employeeName = 0
    employeeId = 1
    employeeDepartment = 2
    employeeTitle = 3


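class FormatDictTool(object):
    """
    Convert list data into a dictionary whose keys are in camel case
    (default) or underscore style, based on a data format class.
    """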
    def __init__(self, class_name, trans_type='camel'):
        func = self._to_lower_camel
        if trans_type == 'snake':
            func = self._to_snake
        for key, index in class_name.__dict__.items():
            # Skip special attributes such as __module__ and __doc__
            if key.startswith('__'):
                continue
            new_key = func(key)
            setattr(self, new_key, index)

    @staticmethod
    def _to_lower_camel(name: str) -> str:
        """Convert an underscore name to lower camel case"""
        return re.sub('_([a-zA-Z])', lambda m: (m.group(1).upper()), name.lower())

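    @staticmethod
    def _to_snake(name: str) -> str:
        """Convert a camel-case name to underscore style"""
        if '_' in name:
            raise ValueError(f'{name} contains an underscore and cannot be converted')
        # Insert an underscore at each lower-to-upper boundary, then lowercase
        name = re.sub(r'([a-z])([A-Z])', r'\1_\2', name)
        return name.lower()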

    def list2dict(self, arrays):
        """Map each converted key to the list element at its stored index"""
        return {key: arrays[index] for key, index in self.__dict__.items()}


if __name__ == '__main__':
    tool = FormatDictTool(class_name=User)
    user_data_list = [['name1', 'source1', 'user_source1', 20.3], ['name2', 'source2', 'user_source2', 50]]
    print('User:')
    for data in user_data_list:
        print(tool.list2dict(data))
    
    tool = FormatDictTool(class_name=Employee, trans_type='snake')
    employee_data_list = [['name1', 'id1', 'department1', 'title1'], ['name2', 'id2', 'department2', 'title2']]
    print('Employee:')
    for data in employee_data_list:
        print(tool.list2dict(data))

Result:

User:
{'userName': 'name1', 'userSource': 'source1', 'userAge': 'user_source1', 'userScore': 20.3}
{'userName': 'name2', 'userSource': 'source2', 'userAge': 'user_source2', 'userScore': 50}
Employee:
{'employee_name': 'name1', 'employee_id': 'id1', 'employee_department': 'department1', 'employee_title': 'title1'}
{'employee_name': 'name2', 'employee_id': 'id2', 'employee_department': 'department2', 'employee_title': 'title2'}
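
The output above covers the dictionary case. For the JSON case mentioned in step 4, one possible sketch passes the resulting dictionary to json.dumps from the standard library. The index_map dictionary and the list_to_json helper are assumptions for this example; index_map stands in for the key-to-index mapping that the tool builds from the User class.

```python
import json

# Hypothetical key -> index mapping, equivalent to what the tool derives
# from the User class in camel-case mode.
index_map = {'userName': 0, 'userSource': 1, 'userAge': 2, 'userScore': 3}


def list_to_json(row, mapping):
    """Build the dictionary from one row, then serialize it to a JSON string."""
    record = {key: row[index] for key, index in mapping.items()}
    # ensure_ascii=False keeps non-ASCII text readable in the output
    return json.dumps(record, ensure_ascii=False)


print(list_to_json(['name1', 'source1', 20, 90.5], index_map))
# {"userName": "name1", "userSource": "source1", "userAge": 20, "userScore": 90.5}
```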

Origin blog.csdn.net/qq_19446965/article/details/124359240