Python: Parsing XML Data

Scenario

When writing a Python crawler, you may run into data in XML format that XPath fails to parse correctly. What can you do in that case?

Test environment

  • Python 3.9.13

Test data

<?xml version="1.0" encoding="UTF-8"?>

<tradeproperty>
    <INDEX>
        <TRADING_DAY>20221012</TRADING_DAY>
        <PRODUCT_ID>IC</PRODUCT_ID>
        <INSTRUMENT_ID>IC2210</INSTRUMENT_ID>
        <INSTRUMENT_MONTH>2210</INSTRUMENT_MONTH>
        <BASIS_PRICE>6361.6</BASIS_PRICE>
        <OPEN_DATE>20220822</OPEN_DATE>
        <END_TRADING_DAY>20221021</END_TRADING_DAY>
        <UPPER_VALUE>0.1</UPPER_VALUE>
        <LOWER_VALUE>0.1</LOWER_VALUE>
        <UPPERLIMITPRICE>6256</UPPERLIMITPRICE>
        <LOWERLIMITPRICE>5118.8</LOWERLIMITPRICE>
        <LONG_LIMIT>1200</LONG_LIMIT>
    </INDEX>
    <INDEX>
        <TRADING_DAY>20221012</TRADING_DAY>
        <PRODUCT_ID>IC</PRODUCT_ID>
        <INSTRUMENT_ID>IC2211</INSTRUMENT_ID>
        <INSTRUMENT_MONTH>2211</INSTRUMENT_MONTH>
        <BASIS_PRICE>5929.6</BASIS_PRICE>
        <OPEN_DATE>20220919</OPEN_DATE>
        <END_TRADING_DAY>20221118</END_TRADING_DAY>
        <UPPER_VALUE>0.1</UPPER_VALUE>
        <LOWER_VALUE>0.1</LOWER_VALUE>
        <UPPERLIMITPRICE>6231.6</UPPERLIMITPRICE>
        <LOWERLIMITPRICE>5098.8</LOWERLIMITPRICE>
        <LONG_LIMIT>1200</LONG_LIMIT>
    </INDEX>
</tradeproperty>

Parsing code

import xml.etree.ElementTree as ET

test_xml = '''<?xml version="1.0" encoding="UTF-8"?>

<tradeproperty>
    <INDEX>
        <TRADING_DAY>20221012</TRADING_DAY>
        <PRODUCT_ID>IC</PRODUCT_ID>
        <INSTRUMENT_ID>IC2210</INSTRUMENT_ID>
        <INSTRUMENT_MONTH>2210</INSTRUMENT_MONTH>
        <BASIS_PRICE>6361.6</BASIS_PRICE>
        <OPEN_DATE>20220822</OPEN_DATE>
        <END_TRADING_DAY>20221021</END_TRADING_DAY>
        <UPPER_VALUE>0.1</UPPER_VALUE>
        <LOWER_VALUE>0.1</LOWER_VALUE>
        <UPPERLIMITPRICE>6256</UPPERLIMITPRICE>
        <LOWERLIMITPRICE>5118.8</LOWERLIMITPRICE>
        <LONG_LIMIT>1200</LONG_LIMIT>
    </INDEX>
    <INDEX>
        <TRADING_DAY>20221012</TRADING_DAY>
        <PRODUCT_ID>IC</PRODUCT_ID>
        <INSTRUMENT_ID>IC2211</INSTRUMENT_ID>
        <INSTRUMENT_MONTH>2211</INSTRUMENT_MONTH>
        <BASIS_PRICE>5929.6</BASIS_PRICE>
        <OPEN_DATE>20220919</OPEN_DATE>
        <END_TRADING_DAY>20221118</END_TRADING_DAY>
        <UPPER_VALUE>0.1</UPPER_VALUE>
        <LOWER_VALUE>0.1</LOWER_VALUE>
        <UPPERLIMITPRICE>6231.6</UPPERLIMITPRICE>
        <LOWERLIMITPRICE>5098.8</LOWERLIMITPRICE>
        <LONG_LIMIT>1200</LONG_LIMIT>
    </INDEX>
</tradeproperty>
'''
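# Load the data from an XML-formatted string
root = ET.fromstring(test_xml)
# Iterate over each record: INDEX
for child in root:
    print('=' * 60)
    # Iterate over the fields inside each record
    ## Option 1: iterate over the child elements directly
    for item in child:
        print(f'{item.tag}:{item.text}')
    ## Option 2: access the child elements by index
    # for i in range(len(child)):
    #     print(f'{child[i].tag}:{child[i].text}')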

Sample output:

============================================================
TRADING_DAY:20221012
PRODUCT_ID:IC
INSTRUMENT_ID:IC2210
INSTRUMENT_MONTH:2210
BASIS_PRICE:6361.6
OPEN_DATE:20220822
END_TRADING_DAY:20221021
UPPER_VALUE:0.1
LOWER_VALUE:0.1
UPPERLIMITPRICE:6256
LOWERLIMITPRICE:5118.8
LONG_LIMIT:1200
============================================================
TRADING_DAY:20221012
PRODUCT_ID:IC
INSTRUMENT_ID:IC2211
INSTRUMENT_MONTH:2211
BASIS_PRICE:5929.6
OPEN_DATE:20220919
END_TRADING_DAY:20221118
UPPER_VALUE:0.1
LOWER_VALUE:0.1
UPPERLIMITPRICE:6231.6
LOWERLIMITPRICE:5098.8
LONG_LIMIT:1200
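
Printing every field is handy for inspection, but a crawler usually needs the values in a structure it can work with. The sketch below is one possible follow-up, not part of the original post: it turns each INDEX record into a dict and uses the limited XPath subset that xml.etree.ElementTree supports to pick out specific fields. The helper name index_to_dict and the shortened sample_xml are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the test data above (only a few fields per record).
sample_xml = '''<?xml version="1.0" encoding="UTF-8"?>
<tradeproperty>
    <INDEX>
        <INSTRUMENT_ID>IC2210</INSTRUMENT_ID>
        <BASIS_PRICE>6361.6</BASIS_PRICE>
        <LONG_LIMIT>1200</LONG_LIMIT>
    </INDEX>
    <INDEX>
        <INSTRUMENT_ID>IC2211</INSTRUMENT_ID>
        <BASIS_PRICE>5929.6</BASIS_PRICE>
        <LONG_LIMIT>1200</LONG_LIMIT>
    </INDEX>
</tradeproperty>
'''


def index_to_dict(index_elem):
    """Map each child tag of one <INDEX> element to its text (hypothetical helper)."""
    return {item.tag: item.text for item in index_elem}


root = ET.fromstring(sample_xml)

# One dict per <INDEX> record; all values are still strings at this point.
records = [index_to_dict(elem) for elem in root.findall('INDEX')]

# Path expression: collect the INSTRUMENT_ID text of every record.
ids = [elem.text for elem in root.findall('./INDEX/INSTRUMENT_ID')]

# Predicate: find the record whose INSTRUMENT_ID child has the text 'IC2211',
# then convert one of its fields to a number.
match = root.find("./INDEX[INSTRUMENT_ID='IC2211']")
basis_price = float(match.findtext('BASIS_PRICE'))

print(records[0])
print(ids)
print(basis_price)
```

ElementTree only understands a subset of XPath (child paths, `.//`, and simple predicates such as `[tag='text']`), so more complex expressions may still need a full XPath engine.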

Reposted from blog.csdn.net/qq_34562959/article/details/127277858