Converting XML annotations to TXT with Python

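Training a deep learning model usually means collecting a large amount of annotated data and unifying annotations that come in different formats. Reading the data from TXT files is often the most convenient option, but in practice annotations frequently arrive as XML. This post walks through an example in two steps: 1. read the XML annotation data; 2. write it to a TXT file.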

The XML annotation data looks like this:

<annotation verified="no">
  <folder>suE</folder>
  <filename>Drivingrecord_001</filename>
  <path>C:\Desktop\Drivingrecord_001.jpg</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>1920</width>
    <height>1080</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>苏E*****-蓝-1-白,灰-大众-上海大众-桑塔纳-尚纳</name>
    <flag>polygon</flag>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <leftTopx>170</leftTopx>
      <leftTopy>704</leftTopy>
      <rightTopx>167</rightTopx>
      <rightTopy>729</rightTopy>
      <rightBottomx>242</rightBottomx>
      <rightBottomy>735</rightBottomy>
      <leftBottomx>243</leftBottomx>
      <leftBottomy>710</leftBottomy>
    </bndbox>
  </object>
  <object>
    <name>苏E*****-蓝-1-黄-雷克萨斯-雷克萨斯(进口)-雷克萨斯RX</name>
    <flag>polygon</flag>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <leftTopx>733</leftTopx>
      <leftTopy>721</leftTopy>
      <rightTopx>733</rightTopx>
      <rightTopy>759</rightTopy>
      <rightBottomx>881</rightBottomx>
      <rightBottomy>760</rightBottomy>
      <leftBottomx>882</leftBottomx>
      <leftBottomy>722</leftBottomy>
    </bndbox>
  </object>
  <object>
    <name>苏*****-蓝-1-黑-宝马-宝马(进口)-宝马7系</name>
    <flag>polygon</flag>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <leftTopx>1274</leftTopx>
      <leftTopy>657</leftTopy>
      <rightTopx>1274</rightTopx>
      <rightTopy>671</rightTopy>
      <rightBottomx>1325</rightBottomx>
      <rightBottomy>670</rightBottomy>
      <leftBottomx>1326</leftBottomx>
      <leftBottomy>656</leftBottomy>
    </bndbox>
  </object>
  <object>
    <name>苏*****-蓝-1-灰-标致-东风标致-标致307</name>
    <flag>polygon</flag>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <leftTopx>1609</leftTopx>
      <leftTopy>658</leftTopy>
      <rightTopx>1611</rightTopx>
      <rightTopy>671</rightTopy>
      <rightBottomx>1659</rightBottomx>
      <rightBottomy>669</rightBottomy>
      <leftBottomx>1657</leftBottomx>
      <leftBottomy>656</leftBottomy>
    </bndbox>
  </object>
</annotation>

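Here we only need the image name (filename) and the coordinates of each object, that is, its four corner points. The target TXT line looks like this: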

Drivingrecord_001.jpg 170 704 167 729 242 735 243 710 733 721 733 759 881 760 882 722 1274 657 1274 671 1325 670 1326 656 1609 658 1611 671 1659 669 1657 656
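To illustrate how a line in this format might be consumed later, the sketch below splits it back into the image name and a list of quadrilaterals, each with four (x, y) points. The function name `parse_landmark_line` is hypothetical and not part of the original post; it assumes whitespace-separated fields and exactly eight integers per object.

```python
def parse_landmark_line(line):
    """Split 'name x1 y1 ...' into (name, [[(x, y), (x, y), (x, y), (x, y)], ...])."""
    fields = line.split()  # split() with no argument also drops trailing spaces/tabs
    name = fields[0]
    values = [int(v) for v in fields[1:]]
    if len(values) % 8 != 0:
        raise ValueError("expected 8 coordinates per object")
    quads = []
    for i in range(0, len(values), 8):
        chunk = values[i:i + 8]
        # Pair consecutive values: (x1, y1), (x2, y2), (x3, y3), (x4, y4)
        quads.append(list(zip(chunk[0::2], chunk[1::2])))
    return name, quads


# The first two objects of the sample line above.
name, quads = parse_landmark_line(
    "Drivingrecord_001.jpg 170 704 167 729 242 735 243 710 "
    "733 721 733 759 881 760 882 722"
)
print(name, len(quads), quads[0])
```

The corner order inside each quadrilateral simply follows the order in which the values were written (leftTop, rightTop, rightBottom, leftBottom).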

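We use the xml.dom.* modules. A DOM (Document Object Model) parser reads the entire XML file in one pass and stores all of its data in a tree structure, after which the DOM functions can be used to pull out the target data. Here we parse the XML file with xml.dom.minidom and write the target data to a TXT file.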
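As a small warm-up, the sketch below shows the three minidom calls this approach relies on: `documentElement`, `getElementsByTagName`, and reading text via `childNodes[0].data`. The XML string `SAMPLE` is a made-up, trimmed-down annotation used only for illustration, and it is parsed from memory with `parseString` rather than from a file.

```python
import xml.dom.minidom

# A trimmed-down annotation, invented for this illustration only.
SAMPLE = """<annotation>
  <filename>demo_001</filename>
  <object><bndbox><leftTopx>170</leftTopx><leftTopy>704</leftTopy></bndbox></object>
</annotation>"""

dom = xml.dom.minidom.parseString(SAMPLE)  # builds the whole tree in memory
root = dom.documentElement                 # the <annotation> element

# getElementsByTagName searches the entire subtree and returns a NodeList.
filename_node = root.getElementsByTagName("filename")[0]
# Element text lives in a child text node; .data holds it as a string.
image_name = filename_node.childNodes[0].data + ".jpg"

left_top_x = root.getElementsByTagName("leftTopx")[0].childNodes[0].data
print(image_name, left_top_x)
```

Note that the coordinate comes back as a string, so it can be written to a text file directly or converted with `int()` when numbers are needed.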

# -*- coding: utf-8 -*-
"""
Created on Fri Mar  2 15:36:44 2018

@author: gg
"""

import os
import xml.dom.minidom

# Raw strings keep the backslashes in Windows paths from being read as escapes.
save_dir = r'D:\plate_train'
os.makedirs(save_dir, exist_ok=True)

DOMTree = xml.dom.minidom.parse(r'D:\plate_train\label\Drivingrecord_001.xml')
annotation = DOMTree.documentElement

# <filename> holds the image name without its extension.
filename = annotation.getElementsByTagName("filename")[0]
imgname = filename.childNodes[0].data + '.jpg'
print(imgname)

objects = annotation.getElementsByTagName("object")

loc = [imgname]  # output format: file name followed by the coordinates

# Corner tags inside <bndbox>, in the order they are written out.
corner_tags = ["leftTopx", "leftTopy", "rightTopx", "rightTopy",
               "rightBottomx", "rightBottomy", "leftBottomx", "leftBottomy"]

for obj in objects:  # "obj" avoids shadowing the built-in "object"
    bbox = obj.getElementsByTagName("bndbox")[0]
    for tag in corner_tags:
        value = bbox.getElementsByTagName(tag)[0].childNodes[0].data
        print(value)
        loc.append(value)

# Write everything as one space-separated line; "with" closes the file for us.
with open(os.path.join(save_dir, 'landmark.txt'), 'w') as f:
    f.write(' '.join(loc) + '\n')
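A training set normally contains many annotation files rather than one. The sketch below is one possible way to extend the same idea to a whole directory, writing one line per XML file. The helper names `xml_to_line` and `convert_dir` are hypothetical, and the code assumes every file follows the layout shown earlier (a `filename` element without extension, and eight corner tags inside each `bndbox`).

```python
import glob
import os
import xml.dom.minidom

# Corner tags inside <bndbox>, in output order.
CORNER_TAGS = ["leftTopx", "leftTopy", "rightTopx", "rightTopy",
               "rightBottomx", "rightBottomy", "leftBottomx", "leftBottomy"]


def xml_to_line(xml_path):
    """Return 'image.jpg x1 y1 ... x4 y4 ...' for one annotation file."""
    root = xml.dom.minidom.parse(xml_path).documentElement
    name = root.getElementsByTagName("filename")[0].childNodes[0].data
    fields = [name.strip() + ".jpg"]  # assumes the images are .jpg files
    for obj in root.getElementsByTagName("object"):
        bbox = obj.getElementsByTagName("bndbox")[0]
        for tag in CORNER_TAGS:
            node = bbox.getElementsByTagName(tag)[0]
            fields.append(node.childNodes[0].data.strip())
    return " ".join(fields)


def convert_dir(label_dir, out_path):
    """Write one line per *.xml file in label_dir; return how many were converted."""
    xml_paths = sorted(glob.glob(os.path.join(label_dir, "*.xml")))
    with open(out_path, "w", encoding="utf-8") as out:
        for xml_path in xml_paths:
            out.write(xml_to_line(xml_path) + "\n")
    return len(xml_paths)
```

With the paths used in this post, a call might look like `convert_dir(r'D:\plate_train\label', r'D:\plate_train\landmark.txt')`. Files that lack one of the expected tags would raise an `IndexError`, so real data may need extra checks.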


Reposted from blog.csdn.net/qq_33485434/article/details/80420880