Reading XML with Python and changing the attribute values of tag nodes

Reason for publication

A friend who works in testing recently asked me how to manipulate XML files. I am a front-end developer and not very familiar with Python, but a quick search showed that the xml.etree.ElementTree library can operate on XML, and that XPath-style paths make it easy to locate the nodes we want and change them. With a method in hand, I tried writing the code directly, and ran into several unexpected problems.

A small test (remember to specify the encoding when opening the file; otherwise, if the XML contains Chinese, the text may come out garbled)

import xml.etree.ElementTree as ET

# Parse the XML file
with open('input.xml', 'r', encoding='utf-8') as f:
    tree = ET.parse(f)

root = tree.getroot()

# Find all nodes that match the condition and change the attribute value
for id_node in root.findall(".//id[@age='10']"):
    id_node.set('age', '18')

# Write the modified XML back to a file
tree.write('output.xml', encoding='utf-8', xml_declaration=True)

This code shows how simple Python is. So what went wrong? As soon as the XML file has a namespace, this XPath can no longer locate the node we want: findall returns nothing, even though the tag node and the matching attribute value do exist in the file. Where is the problem?

Fixing the XPath

When the XML file has a namespace, the XPath has to include it (the XML namespace is the value of the xmlns attribute on the root node). For example, suppose our XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<a xmlns="urn:test-org:v1">
  <id name="Anna" age="10"  />
  <id name="Bob" age="12"  />
</a>

# The XPath is actually written like this: it selects id tag nodes whose age attribute equals 10
".//{urn:test-org:v1}id[@age='10']"


# Change the failing code above to the following, which updates age from 10 to 18
for id_node in root.findall(".//{urn:test-org:v1}id[@age='10']"):
    id_node.set('age', '18')
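
The difference between the two paths can be checked on an in-memory copy of the sample document. The following is a minimal sketch, assuming the sample XML above is held in a bytes string (the name SAMPLE_XML and the prefix ns are made up for illustration). It also shows the optional namespaces mapping that findall accepts, which is an alternative to writing the URI in braces.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory copy of the sample document (bytes, so the
# encoding declaration is honored by the parser).
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<a xmlns="urn:test-org:v1">
  <id name="Anna" age="10" />
  <id name="Bob" age="12" />
</a>"""

root = ET.fromstring(SAMPLE_XML)

# ElementTree stores every tag together with its namespace URI.
print(root.tag)  # {urn:test-org:v1}a

# A bare tag name does not match the namespaced elements.
bare = root.findall(".//id[@age='10']")

# Writing the URI in braces matches them.
qualified = root.findall(".//{urn:test-org:v1}id[@age='10']")

# Alternative: map a prefix of our own choosing ("ns" is an arbitrary name)
# to the URI and pass the mapping as the second argument.
prefixed = root.findall(".//ns:id[@age='10']", {"ns": "urn:test-org:v1"})

print(len(bare), len(qualified), len(prefixed))  # 0 1 1
```

The prefix used in the mapping only exists inside the query; it does not have to match anything written in the XML file.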

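In the updated output XML file, however, every node now carries the prefix ns0, for example <ns0:id name="Anna" age="18" /> instead of <id name="Anna" age="18" />.

This is not what we want. Why does it happen? ElementTree has no prefix registered for our namespace, so it generates one (ns0, ns1, and so on) when it serializes the tree.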

Solve the prefix problem

# Register the namespace used in our XML with an empty prefix (""), so that the updated XML file is written without prefixes
ET.register_namespace("", 'urn:test-org:v1')
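
To see the effect of the registration without touching any files, here is a small sketch that serializes the same element twice, once before and once after calling ET.register_namespace. The SAMPLE_XML string and the variable names are assumptions for illustration. Note that the registration is global to the process and has to happen before the document is serialized.

```python
import xml.etree.ElementTree as ET

NS = "urn:test-org:v1"
# Hypothetical one-line version of the sample document.
SAMPLE_XML = b'<a xmlns="urn:test-org:v1"><id name="Anna" age="10" /></a>'

root = ET.fromstring(SAMPLE_XML)

# No prefix is registered for the namespace yet, so ElementTree
# generates one (ns0) while serializing.
before = ET.tostring(root, encoding="unicode")

# Register the namespace as the default (empty-prefix) namespace.
ET.register_namespace("", NS)
after = ET.tostring(root, encoding="unicode")

print(before)
print(after)
```

The first line printed uses ns0: on every tag; the second uses plain tag names with a single xmlns attribute on the root.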

After this change, the output file is written without the ns0 prefix.

Complete code example

import xml.etree.ElementTree as ET

ET.register_namespace("", 'urn:test-org:v1')

# Parse the XML file
with open('input.xml', 'r', encoding='utf-8') as f:
    tree = ET.parse(f)

root = tree.getroot()

# Find all nodes that match the condition and change the attribute value
for id_node in root.findall(".//{urn:test-org:v1}id[@age='10']"):
    id_node.set('age', '18')

# Write the modified XML back to a file
tree.write('output.xml', encoding='utf-8', xml_declaration=True)

Since I like to encapsulate things, I looked up how to obtain the XML namespace, how to iterate over a dictionary in Python, how to interpolate variables into strings, and how to work with file paths. Below is the function I wrapped up according to my friend's needs. We only need to supply the XML file path and the dictionary of changes.

The extracted and optimized function

import xml.etree.ElementTree as ET
import os


def update_xml_data_by_xpath(xml_file_path, attr_dic):
    """Partially update an XML file, based on the given node attributes.

    Args:
        xml_file_path (string): path to the XML file
        attr_dic (dictionary): the nodes to update, in the format
        {tag name: {"attr": name of the attribute to change, "old_val": current value of the attribute, "new_val": value after the update}}
        Example: {"id": {"attr": "extension", "old_val": "00000001", "new_val": "00000002"}}

    Returns:
        None
    """
    if not xml_file_path:
        print("xml file path can't be empty")
        return
    if not attr_dic:
        print("attr_dic can't be empty")
        return

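    # Read the XML file; the explicit encoding avoids garbled text when the file contains Chinese
    with open(xml_file_path, 'r', encoding='utf-8') as f:
        # Convert the XML document into an element tree
        tree = ET.parse(f)
        # Get the root element
        root = tree.getroot()
        namespace = ""
        # Guard against XML files without a namespace, which would otherwise raise an IndexError
        try:
            namespace = root.tag.split('}')[0].split("{")[1]
            # Register it as the default namespace (empty prefix)
            ET.register_namespace("", namespace)
        except IndexError:
            print('This XML file has no namespace, so no namespace handling is needed')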

    for key, value in attr_dic.items():
        attr = value.get('attr')
        old_val = value.get('old_val')
        new_val = value.get('new_val')
        if namespace:
            xpath = f".//{{{namespace}}}{key}[@{attr}='{old_val}']"
        else:
            xpath = f".//{key}[@{attr}='{old_val}']"

        # Use the XPath to find the nodes with the matching attribute value, and update the attribute
        for id_node in root.findall(xpath):
            id_node.set(attr, new_val)

    # Get the file name
    file_name = os.path.basename(xml_file_path)

    # Get the directory containing the file
    dir_path = os.path.dirname(xml_file_path)

    # Path of the updated file
    update_file_path = os.path.join(dir_path, f'update_{file_name}')

    # Write out the modified XML file
    tree.write(update_file_path, encoding='utf-8',
               xml_declaration=True, method="xml")
    print(
        f'The xml file is updated successfully and the file is output to {update_file_path}')


update_xml_data_by_xpath(
    './input.xml', {"id": {"attr": "age", "old_val": "18", "new_val": "20"}})

The function also handles the case where the XML file has no namespace. Comments and suggestions are welcome.
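
The namespace check relies on how ElementTree writes tags: a namespaced tag looks like {uri}name, while a plain tag has no braces. As a minimal sketch, the same check can be written without try/except. Here split_tag is a hypothetical helper name, not part of ElementTree.

```python
def split_tag(tag):
    """Split an ElementTree tag into (namespace, local_name).

    Hypothetical helper for illustration; namespace is "" when the tag
    has no "{uri}" part.
    """
    if tag.startswith("{"):
        namespace, _, local_name = tag[1:].partition("}")
        return namespace, local_name
    return "", tag


print(split_tag("{urn:test-org:v1}id"))  # ('urn:test-org:v1', 'id')
print(split_tag("id"))                   # ('', 'id')
```

An empty namespace string then plays the same role as the IndexError branch in the function: no registration is needed and the XPath is built from the bare tag name.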

Origin blog.csdn.net/weixin_39370315/article/details/130232445