A detailed explanation of the Python 3 XML processing module

Table of contents

One: XML file format

Two: ElementTree parses XML files

Three: Search for Element

Four: Modification of Element

Five: Deletion of Element

Six: Addition of Element


XML is an inherently hierarchical data format, and the most natural way to represent it is as a tree. Many programs store their configuration in XML, so day-to-day maintenance often involves adding, deleting, modifying, and querying nodes in XML files. Parsing XML by hand from scratch is tedious and error-prone.

Python 3 ships with a built-in XML module, xml.etree.ElementTree, which parses XML and supports adding, deleting, modifying, and querying nodes. The sections below explore the module from these four angles.

One: XML file format

An XML document has a tree-like hierarchical structure. The sample file below, saved as eg.xml, is used for all the add, delete, modify, and query operations that follow:

<?xml version="1.0" encoding="utf-8"?>
<addr_info id="中国">
   <R1 type="上海">
       <device_type>黄埔区</device_type>
       <username>admin</username>
       <people_num>一百万</people_num>
       <company>zte.com.cn</company>
   </R1>
   <SW3 type="南京">
       <device_type>江宁区</device_type>
       <username>admin</username>
       <people_num>两百万</people_num>
       <company>baidu.com.cn</company>
   </SW3>
</addr_info>

Two: ElementTree parses XML files

We use ElementTree to parse the above xml file. The specific usage is as follows:

import xml.etree.ElementTree as ET


tree = ET.parse('eg.xml')  # read the XML file directly into an ElementTree
root = tree.getroot()  # get the root element
print('tag:', root.tag)  # print the root's tag
print('attrib:', root.attrib)  # print the root's attributes
# Elements can be indexed like lists: root[0] is the R1 element, root[0][0] is its
# first child device_type, and .text is that element's content, here 黄埔区
print(root[0][0].text)

for child in root:  # print the tag and attributes of each direct child of root
    print(child.tag, child.attrib)

Output:

tag: addr_info
attrib: {'id': '中国'}
黄埔区
R1 {'type': '上海'}
SW3 {'type': '南京'}

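We can list the attributes and methods that root supports with dir(root). The output below comes from an older Python 3 release; getchildren and getiterator were removed in Python 3.9.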

['__class__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__',
 '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
 '__getitem__', '__getstate__', '__gt__', '__hash__', '__init__', 
'__init_subclass__', '__le__', '__len__', '__lt__', '__ne__', '__new__', 
'__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', 
'__setstate__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'attrib',
 'clear', 'extend', 'find', 'findall', 'findtext', 'get', 'getchildren', 
'getiterator', 'insert', 'items', 'iter', 'iterfind', 'itertext', 'keys', 
'makeelement', 'remove', 'set', 'tag', 'tail', 'text']

The commonly used attributes of Element are as follows:

1. tag

tag is a str object holding the name of the XML tag, for example device_type in <device_type>...</device_type>.

2. attrib

attrib is a dict object holding the element's XML attributes. In the sample, type="上海" on R1 gives {'type': '上海'}.

3. text

text is the content enclosed between an element's opening and closing tags, for example admin or 一百万 in the sample.

4. child elements

Child elements are the elements nested inside a pair of XML tags, such as the elements enclosed by the R1 and SW3 tags in the sample.
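
The four attributes above can be checked side by side in a short sketch. It parses a trimmed copy of the sample from a string with ET.fromstring instead of reading eg.xml; the SAMPLE constant and the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the sample document, parsed from a string instead of a file.
SAMPLE = """<addr_info id="中国">
   <R1 type="上海">
       <device_type>黄埔区</device_type>
       <username>admin</username>
   </R1>
</addr_info>"""

root = ET.fromstring(SAMPLE)
r1 = root[0]                      # first direct child of the root

tag = r1.tag                      # element name as a str
attrib = r1.attrib                # attributes as a dict
first_text = r1[0].text           # text inside <device_type>
child_tags = [c.tag for c in r1]  # iterating an Element yields its direct children

print(tag, attrib, first_text, child_tags)
```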

Three: Search for Element

Element offers several search methods, summarized as follows:

iter(tag=None): iterates over the element and all of its descendants in document order; pass a tag to visit only elements with that tag
findall(match): returns a list of all subelements matching the given tag name or path (a bare tag name matches direct children only)
find(match): returns the first subelement matching the given tag name or path, or None if nothing matches
get(key, default=None): returns the value of the element's attribute key, or default if the attribute does not exist
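
The difference between these methods is easiest to see on a small tree. The following is a minimal sketch on a trimmed in-memory copy of the sample; the SAMPLE constant, the variable names, and the attribute name no_such_attr are illustrative assumptions, not part of the original file.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<addr_info id="中国">
   <R1 type="上海"><people_num>一百万</people_num></R1>
   <SW3 type="南京"><people_num>两百万</people_num></SW3>
</addr_info>"""
root = ET.fromstring(SAMPLE)

# iter() walks the whole subtree, so nested tags are found at any depth.
via_iter = [e.text for e in root.iter('people_num')]

# findall() with a bare tag name only looks at direct children of root ...
direct = root.findall('people_num')
# ... while a path such as './/people_num' searches all descendants.
via_path = [e.text for e in root.findall('.//people_num')]

# find() returns the first match (or None); get() reads an attribute.
first = root.find('R1')
city = first.get('type')
missing = first.get('no_such_attr', 'n/a')  # hypothetical attribute, falls back to the default

# findtext() returns the text of the first match directly.
sw3_people = root.findtext('SW3/people_num')
```

Because people_num is not a direct child of the root, findall('people_num') on the root returns an empty list, which is why the article's own example first selects R1 and SW3.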

We use iter and findall to look up the population (people_num) in the sample above:

import xml.etree.ElementTree as ET


tree = ET.parse('eg.xml')
root = tree.getroot()

# search with iter: pass the tag so that only people_num elements are visited
for addr in root.iter('people_num'):
    print("people_num=", addr.text)

# search with findall
for people in root.findall('R1'):
    peopleNum = people.find('people_num').text
    print("people_num=",peopleNum)

for people in root.findall('SW3'):
    peopleNum = people.find('people_num').text
    print("people_num=",peopleNum)

Output:

people_num= 一百万
people_num= 两百万
people_num= 一百万
people_num= 两百万

Four: Modification of Element

The ways to modify an Element are as follows:

Element.text: assign to it to change the element's content
Element.remove(): delete a child element
Element.set(): add or modify an attribute
Element.append(): add a new child element

Let's change the population of Huangpu District in Shanghai from 一百万 (one million) to 一千万 (ten million). We first find the R1 node, then find its people_num child using the search methods described above, and finally assign the new value to that child's text attribute.

import xml.etree.ElementTree as ET


tree = ET.parse('eg.xml')
root = tree.getroot()


for addr in root.iter('R1'):
    addr.find('people_num').text = '一千万'

tree.write('./eg2.xml',encoding='utf-8')

Once the modification is done, call ElementTree.write() to save the tree to a file, as in the last line of the code above.
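
Besides assigning to text, set() and append() from the list in this section can be sketched on an in-memory tree. The attribute name updated and the note element are made up for illustration and are not part of the sample file; ET.tostring is used here only to inspect the result without writing a file.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<addr_info><R1 type="上海"><people_num>一百万</people_num></R1></addr_info>'
)
r1 = root.find('R1')

# Change the content of an existing child by assigning to .text.
r1.find('people_num').text = '一千万'

# set() adds the attribute if it is missing and overwrites it otherwise.
r1.set('updated', 'yes')          # hypothetical attribute, for illustration only

# append() attaches an already created Element as the last child.
note = ET.Element('note')         # hypothetical element, for illustration only
note.text = 'demo'
r1.append(note)

# Serialize to a str to inspect the result.
xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)
```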

Five: Deletion of Element

Elements are deleted with the following method:

remove: removes a child node

Continuing with the sample, let's delete the entire SW3 node.

import xml.etree.ElementTree as ET


tree = ET.parse('eg.xml')
root = tree.getroot()

for addr in root.findall('SW3'):
    root.remove(addr)

tree.write('./eg2.xml',encoding='utf-8')

So what if I want to delete the child node username of the R1 node? I haven’t figured it out yet. I will update it later.
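
One possible approach, offered as a sketch rather than as the author's method: remove() only deletes direct children of the element it is called on, so it has to be called on R1 (the parent of username) and not on the root. Calling root.remove() with the username element would raise ValueError. The example uses a trimmed in-memory copy of the sample.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<addr_info><R1 type="上海"><device_type>黄埔区</device_type>'
    '<username>admin</username></R1></addr_info>'
)

r1 = root.find('R1')                 # the direct parent of <username>
for user in r1.findall('username'):  # findall returns a list, so removing inside the loop is safe
    r1.remove(user)                  # remove() must be called on the direct parent

remaining = [child.tag for child in r1]
print(remaining)
```

With a parsed file, the same loop would be followed by tree.write(...) exactly as in the SW3 example above.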

Six: Addition of Element

Nodes are added with the following function:

ET.SubElement(parent, tag): creates a new element and appends it to parent

We add a new R3 node for Huanggang (type="黄冈") together with its child nodes, as follows:

import xml.etree.ElementTree as ET


tree = ET.parse('eg.xml')
root = tree.getroot()

# add the R3 node; the attribute dict is passed directly to SubElement
hubeiNode = ET.SubElement(root, 'R3', {'type': '黄冈'})
# add the child nodes of R3

huanggang = ET.SubElement(hubeiNode,'device_type')
huanggang.text = '黄梅县'

huanggang2 = ET.SubElement(hubeiNode,'username')
huanggang2.text = 'admin'

huanggang3 = ET.SubElement(hubeiNode,'people_num')
huanggang3.text = '五十万'

huanggang4 = ET.SubElement(hubeiNode,'company')
huanggang4.text = 'feidadun.com'

tree.write('./eg2.xml', encoding='utf-8', xml_declaration=True, short_empty_elements=True)

Although this achieves the goal, the newly added node ends up squeezed onto a single line in the output file, which is hard to read. The following function can be used to pretty-print the tree before writing it.

def pretty_xml(element, indent, newline, level=0):
    # element is the Element to format in place; indent is the indentation unit, newline the line separator
    if len(element):  # the element has child elements
        if (element.text is None) or element.text.isspace():  # the element has no meaningful text
            element.text = newline + indent * (level + 1)
        else:
            element.text = newline + indent * (level + 1) + element.text.strip() + newline + indent * (level + 1)
    # else:  # uncomment these two lines to put the text of leaf elements on its own line as well
    #     element.text = newline + indent * (level + 1) + element.text.strip() + newline + indent * level
    children = list(element)  # snapshot of the direct children
    for i, subelement in enumerate(children):
        if i < len(children) - 1:  # not the last child: the next line starts a sibling, so keep the same indentation
            subelement.tail = newline + indent * (level + 1)
        else:  # last child: the next line closes the parent, so indent one level less
            subelement.tail = newline + indent * level
        pretty_xml(subelement, indent, newline, level=level + 1)  # recurse into the child
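
On Python 3.9 or later, the standard library also provides ET.indent(), which may make a hand-written helper unnecessary. A minimal sketch, using a small in-memory tree rather than eg.xml:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<addr_info><R3 type="黄冈"><device_type>黄梅县</device_type></R3></addr_info>'
)

# indent() rewrites the whitespace of the tree in place (Python 3.9+).
ET.indent(root, space='  ')

pretty = ET.tostring(root, encoding='unicode')
print(pretty)
```

ET.indent() also accepts an ElementTree, so with a parsed file it can be called on the tree right before tree.write(...).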

The complete code, including the call to pretty_xml, is as follows:

import xml.etree.ElementTree as ET


def pretty_xml(element, indent, newline, level=0):
    # element is the Element to format in place; indent is the indentation unit, newline the line separator
    if len(element):  # the element has child elements
        if (element.text is None) or element.text.isspace():  # the element has no meaningful text
            element.text = newline + indent * (level + 1)
        else:
            element.text = newline + indent * (level + 1) + element.text.strip() + newline + indent * (level + 1)
    # else:  # uncomment these two lines to put the text of leaf elements on its own line as well
    #     element.text = newline + indent * (level + 1) + element.text.strip() + newline + indent * level
    children = list(element)  # snapshot of the direct children
    for i, subelement in enumerate(children):
        if i < len(children) - 1:  # not the last child: the next line starts a sibling, so keep the same indentation
            subelement.tail = newline + indent * (level + 1)
        else:  # last child: the next line closes the parent, so indent one level less
            subelement.tail = newline + indent * level
        pretty_xml(subelement, indent, newline, level=level + 1)  # recurse into the child

tree = ET.parse('eg.xml')
root = tree.getroot()

# add the R3 node; the attribute dict is passed directly to SubElement
hubeiNode = ET.SubElement(root, 'R3', {'type': '黄冈'})
# add the child nodes of R3

huanggang = ET.SubElement(hubeiNode,'device_type')
huanggang.text = '黄梅县'

huanggang2 = ET.SubElement(hubeiNode,'username')
huanggang2.text = 'admin'

huanggang3 = ET.SubElement(hubeiNode,'people_num')
huanggang3.text = '五十万'

huanggang4 = ET.SubElement(hubeiNode,'company')
huanggang4.text = 'feidadun.com'

pretty_xml(root, '  ', '\n')  # pretty-print with a two-space indent and '\n' as the line separator
tree.write('./eg2.xml', encoding='utf-8', xml_declaration=True, short_empty_elements=True)

Origin blog.csdn.net/qq_27071221/article/details/132838534