Python-based XML file parsing (0)

Introduction to XML

What is XML

XML stands for Extensible Markup Language and was designed primarily to store and transport data. XML has no predefined tags; the author of a document defines their own.
An XML document example is shown below.

<?xml version='1.0' encoding="UTF-8"?>
<note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
</note> 

An XML document forms a tree structure that starts at the "root" and branches out to the "leaves". In the example above, the following line defines the root element of the document:

<note>

Next come its four child elements:

<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>

The last line defines the end of the root element:

</note>

Every XML document must contain a root element, which is the parent of all other elements.
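Since this series is about parsing XML with Python, here is a minimal sketch of walking that tree, assuming the standard-library module xml.etree.ElementTree (other parsers would work too). The names NOTE_XML and describe_note are illustrative only. It loads the note document, reports the root element, and lists its direct children.

```python
import xml.etree.ElementTree as ET

# The note document from the example above, held in a string.
NOTE_XML = """<?xml version='1.0' encoding="UTF-8"?>
<note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
</note>"""


def describe_note(xml_text: str) -> tuple[str, list[str]]:
    """Return the root element's tag and the tags of its direct children."""
    root = ET.fromstring(xml_text)              # parse; returns the root Element (<note>)
    child_tags = [child.tag for child in root]  # iterating an Element yields its direct children
    return root.tag, child_tags


root_tag, child_tags = describe_note(NOTE_XML)
print(root_tag)    # note
print(child_tags)  # ['to', 'from', 'heading', 'body']

# findtext() returns the text of the first matching child (a "leaf").
print(ET.fromstring(NOTE_XML).findtext("to"))  # Tove
```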
When writing XML documents, pay attention to the following rules:
1. Every element must have a closing tag;
2. XML tags are case-sensitive;
3. XML elements must be properly nested;
4. XML attribute values must be quoted;
In the following example, the first document is wrong and the second is correct:

<note date=12/11/2007>
<to>Tove</to>
<from>Jani</from>
</note>

<note date="12/11/2007">
<to>Tove</to>
<from>Jani</from>
</note>

5. Special characters must be written as entity references;
Some characters have a special meaning in XML. For example, placing the character < in an element's text causes an error, because the parser treats it as the beginning of a new element. To avoid such errors, write it as the entity reference &lt; instead. XML predefines five entity references: &lt; (<), &gt; (>), &amp; (&), &apos; (') and &quot; (").
6. Comments in XML use the same syntax as HTML comments;

<!-- This is a comment -->

7. In XML documents, whitespace is preserved (HTML, by contrast, collapses runs of spaces into one);
8. In XML, newlines are stored as LF (line feed).
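Rules 1 to 7 can be observed directly, because a conforming parser refuses any document that is not well-formed. Below is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree; the helper is_well_formed and the sample documents are illustrative, not part of any library. It simply treats an ET.ParseError as "not well-formed".

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False if it raises a ParseError."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Illustrative documents, each breaking one of the rules above.
BROKEN = {
    "missing closing tag": "<note><to>Tove</note>",
    "case mismatch": "<Message>Hello</message>",
    "improper nesting": "<b><i>bold and italic</b></i>",
    "unquoted attribute": "<note date=12/11/2007><to>Tove</to></note>",
    "raw < in text": "<message>if salary < 1000 then</message>",
}

for rule, doc in BROKEN.items():
    print(f"{rule}: well-formed = {is_well_formed(doc)}")  # False for every entry

# Rule 5: the entity reference &lt; parses, and the parser decodes it back to "<".
print(ET.fromstring("<message>if salary &lt; 1000 then</message>").text)
# if salary < 1000 then

# Rule 6: ElementTree's default parser skips comments, so only <b> remains as a child.
print(len(ET.fromstring("<a><!-- This is a comment --><b/></a>")))  # 1

# Rule 7: runs of spaces inside text content are kept as-is.
print(repr(ET.fromstring("<p>Hello   Tove</p>").text))  # 'Hello   Tove'
```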

XML element

XML documents contain XML elements. An XML element is everything from (and including) the element's opening tag up to (and including) its closing tag. An element can contain:

  • other elements
  • text
  • attributes
  • or a mix of all of the above

<bookstore>
    <book category="CHILDREN">
        <title>Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
    <book category="WEB">
        <title>Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
    </book>
</bookstore>

In the above example, both <bookstore> and <book> have element content because they contain other elements. The <book> element also has an attribute (category="CHILDREN"). <title>, <author>, <year>, and <price> have text content because they contain text.
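To see the difference between element content, text content and attributes in code, here is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree. The helper list_books is hypothetical, and converting year and price to int and float is this example's own choice; the XML itself stores everything as text.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore>
    <book category="CHILDREN">
        <title>Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
    <book category="WEB">
        <title>Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
    </book>
</bookstore>"""


def list_books(xml_text: str) -> list[dict]:
    """Collect each <book>'s attribute and the text of its child elements."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):          # element content: the <book> children of <bookstore>
        books.append({
            "category": book.get("category"),  # attribute value
            "title": book.findtext("title"),   # text content of a child element
            "author": book.findtext("author"),
            "year": int(book.findtext("year")),
            "price": float(book.findtext("price")),
        })
    return books


for entry in list_books(BOOKSTORE_XML):
    print(entry)
```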

XML Naming Rules

XML element names must follow these naming rules:

  • Names can contain letters, numbers, and other characters
  • Names cannot start with a number or a punctuation character
  • Names cannot start with the letters xml (or XML, Xml, etc.)
  • Names cannot contain spaces
  • Any name can be used; there are no other reserved words
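The naming rules above can be approximated with a regular expression. This is a simplified, ASCII-only sketch, assuming a name may use letters, digits, hyphens, underscores and periods; the real XML 1.0 Name production is broader (it also permits ":" and many non-ASCII letters) and, as here, lets a name begin with an underscore. is_valid_name is a hypothetical helper, not a library function.

```python
import re

# Simplified, ASCII-only approximation of the naming rules (an assumption,
# not the full XML 1.0 Name production): first character is a letter or "_",
# the rest are letters, digits, ".", "_" or "-". No spaces are allowed.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_name(name: str) -> bool:
    """Check a candidate element name against the simplified rules."""
    if not _NAME_PATTERN.fullmatch(name):
        return False  # starts with a digit or punctuation, or contains a space
    if name.lower().startswith("xml"):
        return False  # names beginning with xml, XML, Xml, ... are reserved
    return True


for candidate in ["title", "first_name", "book-2", "2nd", "first name", "XMLdata"]:
    print(candidate, is_valid_name(candidate))
```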

XML attributes

XML elements can have attributes that provide additional information about the element. Attributes usually carry information that is not part of the data itself.

<file type='gif'>computer.gif</file>

In the example above, the type attribute records the type of the file; it is not part of the data itself, but it makes that information easy for software to pick out later.
When using attributes in XML, be aware of the following issues:
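As a small illustration of how a parser separates the data from the information about it, here is a sketch assuming Python's standard-library xml.etree.ElementTree. The attribute name "size" in the last line is hypothetical; it is used only to show what happens when an attribute is absent.

```python
import xml.etree.ElementTree as ET

# The <file> element from the example above.
file_elem = ET.fromstring("<file type='gif'>computer.gif</file>")

print(file_elem.text)         # computer.gif     -> the data itself
print(file_elem.get("type"))  # gif              -> information about the data
print(file_elem.attrib)       # {'type': 'gif'}  -> all attributes as a dict

# get() returns None (or a supplied default) for an attribute that is absent;
# "size" is a hypothetical attribute name that this element does not have.
print(file_elem.get("size", "unknown"))  # unknown
```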

  • Attributes cannot contain multiple values (elements can)
  • Attributes cannot contain a tree structure (elements can)
  • Attributes are not easily extensible (elements can)

Attributes are harder to read and maintain. Use elements to describe data wherever possible, and use attributes only for information that is not related to the data.
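The first limitation above (multiple values) can be sketched with a hypothetical book that has two authors, stored once as an attribute and once as repeated elements. This assumes Python's standard-library xml.etree.ElementTree; the ";" separator is purely this example's convention.

```python
import xml.etree.ElementTree as ET

# Hypothetical book with two authors, modeled two ways.
AS_ATTRIBUTE = '<book authors="Author A;Author B"/>'
AS_ELEMENTS = """<book>
    <author>Author A</author>
    <author>Author B</author>
</book>"""

# Attribute: the parser sees one plain string; splitting it on ";" is a
# convention of this example that the XML itself knows nothing about.
authors_from_attribute = ET.fromstring(AS_ATTRIBUTE).get("authors").split(";")

# Elements: each value is its own node, and findall() returns every match.
authors_from_elements = [a.text for a in ET.fromstring(AS_ELEMENTS).findall("author")]

print(authors_from_attribute)  # ['Author A', 'Author B']
print(authors_from_elements)   # ['Author A', 'Author B']
```

Both give the same list here, but only the element form can grow later: an <author> element could gain child elements of its own, which an attribute value cannot hold.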

This part is mainly based on http://www.runoob.com/xml/xml-display.html
