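1. Import the module → read the file → get the root node → get the root node's tag and attributes
2. Iterate over the first-level child nodes and get their tags and attributes
3. Access data by index
4. Element.findall(), Element.find() - find child nodes by tag
5. Element.iter() - recursively find nodes with a given tag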

Suppose the XML file test.xml has the following content:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
    <neighbor name="xxxx" direction="W"/>
</data>

1. Import the module → read the file → get the root node → get the root node's tag and attributes

import xml.etree.ElementTree as ET

# Read the file
tree = ET.parse('test.xml', parser=None)

# Get the root
root = tree.getroot()
print(root)   # <Element 'data' at 0x10c8b2b30>

# Tag and attributes of the root
print(root.tag)   # data
print(root.attrib)   # {}
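As a small sketch of what these two calls return: ET.parse() gives back an ElementTree that wraps the whole document, and getroot() hands over its top-level Element. The example below uses a trimmed, inline stand-in for test.xml (an assumption so it runs on its own) and feeds it through io.StringIO, since parse() also accepts file-like objects. The version="2" attribute on <data> is hypothetical, added only so that root.attrib is not empty.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed stand-in for test.xml; version="2" on <data> is a hypothetical attribute
# added only so that root.attrib is not empty
xml_text = '<data version="2"><country name="Singapore"/></data>'

# ET.parse() accepts a file name or a file-like object; StringIO plays the file here
tree = ET.parse(io.StringIO(xml_text))
root = tree.getroot()

print(type(tree).__name__, type(root).__name__)   # ElementTree Element
print(root.tag)   # data
print(root.attrib)   # {'version': '2'}
```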

Another approach is to read the file contents into a str with open() and read(),
then obtain the root node with ET.fromstring(). Once you have the root node, everything else works the same way.

import xml.etree.ElementTree as ET

with open('test.xml', encoding='utf-8') as f:
    data_str = f.read()

root = ET.fromstring(data_str)
print(root.tag)   # data
print(root.attrib)   # {}
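To illustrate that both routes end in the same place, here is a minimal sketch comparing them on a shortened stand-in for test.xml (the inline string is an assumption, not the full file). ET.fromstring() skips the ElementTree wrapper and returns the root Element directly; if a tree object is needed later, the Element can be wrapped with ET.ElementTree().

```python
import io
import xml.etree.ElementTree as ET

# Shortened stand-in for the contents of test.xml (assumption, for a self-contained demo)
data_str = '<data><country name="Panama"><rank>68</rank></country></data>'

# Route 1: parse a file-like object, then ask the tree for its root
root_from_parse = ET.parse(io.StringIO(data_str)).getroot()

# Route 2: fromstring() returns the root Element directly, with no ElementTree wrapper
root_from_string = ET.fromstring(data_str)

# Both roots serialize to identical bytes, so later code can treat them the same way
print(ET.tostring(root_from_parse) == ET.tostring(root_from_string))   # True

# If an ElementTree object is needed (for example to call .write()), wrap the Element
tree = ET.ElementTree(root_from_string)
print(tree.getroot() is root_from_string)   # True
```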

2. Iterate over the first-level child nodes and get their tags and attributes

for child in root:
    print(child.tag, child.attrib)
# country {'name': 'Liechtenstein'}
# country {'name': 'Singapore'}
# country {'name': 'Panama'}
# neighbor {'name': 'xxxx', 'direction': 'W'}
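Iterating an Element yields only its direct children, in document order, whatever their tag; in the sample that includes the stray top-level <neighbor>. A minimal sketch on a trimmed version of the sample (inlined as a string, an assumption for self-containment) shows how to keep just the <country> children by checking child.tag:

```python
import xml.etree.ElementTree as ET

# Trimmed version of the sample: two countries plus the stray top-level <neighbor>
root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank></country>'
    '<country name="Singapore"><rank>4</rank></country>'
    '<neighbor name="xxxx" direction="W"/>'
    '</data>'
)

# Iterating an Element yields only its direct children, in document order
print([child.tag for child in root])   # ['country', 'country', 'neighbor']
print(len(root))   # 3

# Keep just the <country> children by checking each child's tag
names = [child.get('name') for child in root if child.tag == 'country']
print(names)   # ['Liechtenstein', 'Singapore']
```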

3. Access data by index

root is the root node.
root[0] is the first of its first-level children.
root[0][1] is the second child of root[0], that is, a second-level node.

print(root[0][1].tag)   # year
print(root[0][1].attrib)   # {}
print(root[0][1].text)   # 2008
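Because an Element behaves like a list of its children, the usual list operations apply. The sketch below inlines only the first <country> of the sample (an assumption for brevity). It lists the children of root[0], uses a negative index, and shows that .text is always a str that must be converted explicitly when a number is needed.

```python
import xml.etree.ElementTree as ET

# Only the first <country> of the sample, inlined for brevity (assumption)
root = ET.fromstring(
    '<data><country name="Liechtenstein">'
    '<rank>1</rank><year>2008</year><gdppc>141100</gdppc>'
    '<neighbor name="Austria" direction="E"/>'
    '</country></data>'
)

country = root[0]   # first child of the root
print(len(country))   # 4
print([child.tag for child in country])   # ['rank', 'year', 'gdppc', 'neighbor']

# country[1] is the same node as root[0][1]; its .text is a str
year = country[1].text
print(repr(year))   # '2008'
print(int(year) + 1)   # 2009

# Negative indices count from the last child, as with a list
print(country[-1].attrib)   # {'name': 'Austria', 'direction': 'E'}

# Indexing past the last child raises IndexError
try:
    country[10]
except IndexError:
    print('no such child')   # no such child
```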

4. Element.findall(), Element.find() - find child nodes by tag


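'''
Element.findall('xxx'): finds all direct children of the current Element whose tag is 'xxx' and returns them in a list
Element.find('xxx'): finds the first direct child of the current Element whose tag is 'xxx' (None if there is none)
Element.get(key): returns the value of the element's attribute key (attributes are stored in a dict)
'''
for country in root.findall('country'):
    rank = country.find('rank').text   # text of the <rank> child
    name = country.get('name')   # value of the name attribute
    print(name, rank)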
# Liechtenstein 1
# Singapore 4
# Panama 68
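Since find() returns None when no child matches, country.find('rank').text would raise AttributeError for a country without a <rank>. A defensive variant is sketched below. The second country, "Atlantis", and the code attribute are hypothetical, added only to trigger the missing-data paths. The sketch relies on Element.findtext(), which returns a default when nothing matches, on the default argument of get(), and on the [@name='...'] predicate from ElementTree's limited XPath support.

```python
import xml.etree.ElementTree as ET

# Shortened sample; "Atlantis" (with no <rank>) is a hypothetical entry for the demo
root = ET.fromstring(
    '<data>'
    '<country name="Singapore"><rank>4</rank></country>'
    '<country name="Atlantis"/>'
    '</data>'
)

for country in root.findall('country'):
    # findtext() returns the child's text, or the default when no child matches
    rank = country.findtext('rank', default='n/a')
    # get() also accepts a default; 'code' is a hypothetical, absent attribute
    code = country.get('code', 'unknown')
    print(country.get('name'), rank, code)
# Singapore 4 unknown
# Atlantis n/a unknown

# An attribute predicate selects a specific country by name
print(root.find("country[@name='Atlantis']") is not None)   # True
```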

5. Element.iter() - recursively find nodes with a given tag

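Here, "recursive iteration" means the search covers descendants at every level below the node. find() and findall() are different: given a plain tag name, they only search the node's direct (first-level) children.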

# iter() returns an iterator
print(root.iter('neighbor'))   # <_elementtree._element_iterator object at 0x101a6d630>

# Under root, recursively find every node whose tag is 'neighbor'
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)
# {'name': 'Austria', 'direction': 'E'}
# {'name': 'Switzerland', 'direction': 'W'}
# {'name': 'Malaysia', 'direction': 'N'}
# {'name': 'Costa Rica', 'direction': 'W'}
# {'name': 'Colombia', 'direction': 'E'}
# {'name': 'xxxx', 'direction': 'W'}
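To make the difference concrete, the sketch below runs findall() and iter() on a trimmed sample (inlined as a string, an assumption) that has one <neighbor> nested inside a <country> and one directly under <data>. It also shows the './/' XPath prefix, which makes findall() search all descendants as well.

```python
import xml.etree.ElementTree as ET

# Trimmed sample: one <neighbor> nested in a <country>, one directly under <data>
root = ET.fromstring(
    '<data>'
    '<country name="Singapore"><neighbor name="Malaysia" direction="N"/></country>'
    '<neighbor name="xxxx" direction="W"/>'
    '</data>'
)

# findall() with a plain tag only looks at the direct children of root
print([n.get('name') for n in root.findall('neighbor')])   # ['xxxx']

# iter() walks every descendant (and root itself, if its tag matches)
print([n.get('name') for n in root.iter('neighbor')])   # ['Malaysia', 'xxxx']

# The './/' prefix makes findall() search all descendants as well
print([n.get('name') for n in root.findall('.//neighbor')])   # ['Malaysia', 'xxxx']

# iter() with no argument visits every element, starting with root
print([e.tag for e in root.iter()])   # ['data', 'country', 'neighbor', 'neighbor']
```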

Parsing XML files - xml.etree ElementTree
