1. The structure of the XML file
Key terms: Node, Element, Attribute, Content
Example XML file:
<bookstore>
  <book category="CHILDREN">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
(1) Node: Roughly speaking, each <...> piece can be thought of as a node. An XML node represents a single XML fragment, for example a start element and its attributes, an end element, text, or "typed" text content such as an integer or a byte array.
(2) Element: The part from a start tag to its end tag, such as <book category="CHILDREN"> through </book> in the example above.
(3) Attribute: The category="CHILDREN" part inside the <book category="CHILDREN"> start tag is an attribute.
(4) Content: The text between a start tag and its end tag, for example 2005 inside <year> or 29.99 inside <price>.
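To make the four terms concrete, here is a minimal sketch that pulls each one out of a shortened copy of the example above. The XML is embedded as a string only to keep the snippet self-contained, and the variable names are illustrative choices of mine.

```csharp
using System;
using System.Xml;

// A shortened copy of the first <book> from the example, embedded as a string.
string xml =
    "<bookstore>" +
    "<book category=\"CHILDREN\">" +
    "<title>Harry Potter</title>" +
    "<year>2005</year>" +
    "</book>" +
    "</bookstore>";

XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);

// Element: everything from <book ...> to </book>.
XmlElement book = (XmlElement)doc.SelectSingleNode("/bookstore/book");
string elementName = book.Name;

// Attribute: category="CHILDREN" on the start tag.
string category = book.GetAttribute("category");

// Content: the text between <year> and </year>.
XmlNode year = book.SelectSingleNode("year");
string yearText = year.InnerText;

// Node: the text inside <year> is itself a node, of type Text.
XmlNodeType textNodeType = year.FirstChild.NodeType;

// prints: book / CHILDREN / 2005 / Text
Console.WriteLine($"{elementName} / {category} / {yearText} / {textNodeType}");
```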
2. Parse the XML file
This post covers two ways to parse XML files in C#: XmlDocument and XmlReader.
(1) XmlDocument
See: XmlDocument Class (System.Xml) | Microsoft Learn
① Features
Advantages: nodes in the XML file can be added, deleted, queried, and modified.
Disadvantages: the entire file must be loaded into memory, so parsing large XML files is slow.
② Code implementation (a fragment, not runnable as-is)
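using System.Xml;
XmlDocument xmlDocument = new XmlDocument(); // create the document instance
xmlDocument.Load(filepath); // load the XML file; filepath is an absolute path
XmlNode xmlNode = xmlDocument.SelectSingleNode(xpath); // select a node by its position in the hierarchy (XPath); returns null if nothing matches
XmlNodeList xmlNodeList = xmlNode.ChildNodes; // collect the node's child nodes into a list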
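Since the fragment above depends on an external filepath and xpath, here is a self-contained sketch of the same calls. It assumes a shortened copy of the bookstore XML held in a string (so it uses LoadXml instead of Load), and the XPath expression is just one example of mine. It also shows the "modify" side of XmlDocument.

```csharp
using System;
using System.Xml;

// Shortened bookstore example, embedded as a string for the sketch.
string xml = @"<bookstore>
  <book category=""CHILDREN""><title>Harry Potter</title><price>29.99</price></book>
  <book category=""WEB""><title>Learning XML</title><price>39.95</price></book>
</bookstore>";

XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml(xml); // LoadXml parses a string; Load(filepath) would read a file instead

// Select one <book> by its attribute value.
XmlNode webBook = xmlDocument.SelectSingleNode("/bookstore/book[@category='WEB']");
if (webBook == null)
{
    throw new InvalidOperationException("No matching <book> element found.");
}

// Walk the child nodes of the selected element.
foreach (XmlNode child in webBook.ChildNodes)
{
    Console.WriteLine($"{child.Name} = {child.InnerText}");
}
int childCount = webBook.ChildNodes.Count;

// Because the whole tree is in memory, it can also be modified in place.
XmlNode price = webBook.SelectSingleNode("price");
price.InnerText = "35.00";
string updatedPrice =
    xmlDocument.SelectSingleNode("/bookstore/book[@category='WEB']/price").InnerText;
```

Checking the result of SelectSingleNode for null before touching ChildNodes avoids a NullReferenceException when the XPath matches nothing.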
(2) XmlReader
See: xml::XmlReader class | Microsoft Learn
① Features
Advantages: fast, read-only streaming access.
Disadvantages: it cannot add, delete, or modify nodes in the XML file, and it cannot jump around to query them, because reading is forward-only (one-way).
② Code implementation (a fragment, not runnable as-is)
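using System.Xml;
// An XmlReader instance is obtained through the Create method.
// Create takes an XmlReaderSettings object that specifies which features the created XmlReader should support.
XmlReaderSettings xmlReaderSettings = new XmlReaderSettings();
xmlReaderSettings.IgnoreWhitespace = true; // skip insignificant whitespace
xmlReaderSettings.IgnoreComments = true; // skip comments
using (XmlReader xmlReader = XmlReader.Create(filepath, xmlReaderSettings))
{
    while (xmlReader.Read())
    {
        // do whatever processing is needed for the current node
    }
}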
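As an illustration of what might go inside the Read loop, the sketch below collects every category attribute and every title from a shortened copy of the bookstore XML. Reading from a StringReader instead of a file, and the names titles and categories, are assumptions made to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Shortened bookstore example, embedded as a string for the sketch.
string xml = @"<bookstore>
  <!-- comments are skipped because of IgnoreComments -->
  <book category=""CHILDREN""><title>Harry Potter</title><price>29.99</price></book>
  <book category=""WEB""><title>Learning XML</title><price>39.95</price></book>
</bookstore>";

XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;
settings.IgnoreComments = true;

List<string> categories = new List<string>();
List<string> titles = new List<string>();

using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
{
    // Read() moves forward one node at a time; there is no way to go back.
    while (reader.Read())
    {
        if (reader.NodeType != XmlNodeType.Element)
        {
            continue;
        }

        if (reader.Name == "book")
        {
            // Attributes are available while positioned on the start element.
            categories.Add(reader.GetAttribute("category"));
        }
        else if (reader.Name == "title")
        {
            // Step onto the text node inside <title> and take its value.
            if (reader.Read() && reader.NodeType == XmlNodeType.Text)
            {
                titles.Add(reader.Value);
            }
        }
    }
}

Console.WriteLine(string.Join(", ", titles));
```

Only the current node is held at any moment, which is why this style stays fast on large files but cannot revisit earlier nodes.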
3. Idea development
(1) XmlDocument only: parsing large files became unacceptably slow (about three minutes per run on my own computer).
(2) XmlReader only: faster than XmlDocument alone, but still somewhat slow, so I started looking for shortcuts based on my understanding of the XML file's structure.
(3) Current thinking
① Read the XML file into memory with XmlDocument, and use the SelectNodes method to gather all the frame nodes into an XmlNodeList.
② Use an XmlNodeReader to read each node in the list as needed.
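The two steps above might look roughly like the sketch below. My real file's frame elements are not shown in this post, so the <book> elements of the bookstore example stand in for them here; the XPath and the summaries list are likewise only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Shortened bookstore example; <book> stands in for the "frame" elements.
string xml = @"<bookstore>
  <book category=""CHILDREN""><title>Harry Potter</title><price>29.99</price></book>
  <book category=""WEB""><title>Learning XML</title><price>39.95</price></book>
</bookstore>";

XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);

// Step 1: gather the nodes of interest into an XmlNodeList.
XmlNodeList books = doc.SelectNodes("/bookstore/book");

List<string> summaries = new List<string>();
foreach (XmlNode book in books)
{
    // Step 2: read just this node's subtree with an XmlNodeReader.
    using (XmlNodeReader reader = new XmlNodeReader(book))
    {
        string category = "";
        string title = "";
        while (reader.Read())
        {
            if (reader.NodeType != XmlNodeType.Element)
            {
                continue;
            }

            if (reader.Name == "book")
            {
                category = reader.GetAttribute("category") ?? "";
            }
            else if (reader.Name == "title")
            {
                // Step onto the text node inside <title> and take its value.
                if (reader.Read() && reader.NodeType == XmlNodeType.Text)
                {
                    title = reader.Value;
                }
            }
        }
        summaries.Add($"{category}: {title}");
    }
}

foreach (string summary in summaries)
{
    Console.WriteLine(summary);
}
```

Each XmlNodeReader only walks the subtree of the node it was created on, so nodes that are never needed are never re-read.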