Introduction to XML and Parsing XML with Dom4j

1. Introduction to XML

First, let's briefly introduce what XML is. If you already know this material, you can skip Part 1. Readers in a
hurry can jump straight to Part 2, Dom4j parses XML.

1. What is XML

  • XML stands for eXtensible Markup Language
  • XML is a markup language, very similar to HTML
  • XML is designed to transmit data, not display data
  • XML tags are not predefined. We need to define the tags ourselves.
  • XML is designed to be self-describing
  • XML is a W3C Recommendation

2. The main role of XML

  • It is used to store data, and that data is self-describing
  • It can also be used as a configuration file for projects or modules
  • It can also be used as a format for network transmission (although JSON is now the dominant format).

3. XML versus HTML

  • XML is not a substitute for HTML.
  • XML and HTML are designed for different purposes:
    XML is designed to transmit and store data, and its focus is on the content of the data.
    HTML is designed to display data, and its focus is on the appearance of the data.
  • XML tags are case sensitive, while HTML tags are not. When parsing HTML, the browser automatically converts tag names to lowercase.
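
To see the case-sensitivity rule in action, you can hand a parser a tag pair that differs only in case. The following is a minimal sketch that uses the SAX parser bundled with the JDK (not dom4j); the class name WellFormedDemo and the helper isWellFormed are made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class WellFormedDemo {

    // Returns true if the JDK parser accepts the string as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Well-formedness errors are reported as a SAXException.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<name>Mr.Yu</name>")); // true
        System.out.println(isWellFormed("<Name>Mr.Yu</name>")); // false: Name and name are different tags
    }
}
```

An HTML parser would treat the second input as a matching pair, which is exactly the difference described above.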

4. XML attributes

XML tag attributes are very similar to HTML attributes. Attributes provide additional information about an element. They are
written in the start tag, and one tag can carry multiple attributes. Every attribute value must be enclosed in quotation marks (single or double).
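
As a small illustration of the paragraph above, the sketch below parses one element that carries two attributes and reads them back. It uses the DOM parser bundled with the JDK rather than dom4j, and the class AttributeDemo, the helper readAttributes, and the grade attribute are assumptions made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class AttributeDemo {

    // Parses the XML string and returns the root element's attributes, sorted by name.
    static Map<String, String> readAttributes(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            NamedNodeMap attrs = root.getAttributes();
            // NamedNodeMap has no guaranteed order, so a TreeMap keeps the result stable.
            Map<String, String> result = new TreeMap<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                result.put(attr.getNodeName(), attr.getNodeValue());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // One value uses double quotes and the other single quotes; both are valid.
        System.out.println(readAttributes("<student id=\"001\" grade='A'>Mr.Yu</student>"));
        // prints {grade=A, id=001}
    }
}
```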

5. XML grammar rules

  • All XML elements must have a closing tag (that is, closed)
  • XML tags are case sensitive
  • XML elements must be nested correctly
  • An XML document must have exactly one root element.
    The root element is the top-level element, that is, the element with no parent tag.
  • XML attribute values must be quoted
  • Special characters in XML
    Symbol   Representation in XML   Meaning
    <        &lt;                    less than
    >        &gt;                    greater than
    &        &amp;                   ampersand
    '        &apos;                  apostrophe
    "        &quot;                  quotation mark
  • Text area (CDATA section)
    CDATA syntax tells the XML parser that the content inside the CDATA section is plain text and should not be parsed as XML markup.
    CDATA format: <![CDATA[ characters written here are kept exactly as entered and are not parsed as XML ]]>
    Example:
<?xml version="1.0" encoding="UTF-8"?>
<!-- XML declaration: version is the XML version, encoding is the character encoding -->
<students>
    <student id="001">
        <name>Mr.Yu</name>
        <age>21</age>
        <gender><![CDATA[<男>]]></gender>
    </student>

    <student id="002">
        <name>小明</name>
        <age>20</age>
        <gender><![CDATA[<男>]]></gender>
    </student>
</students>
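
Escaped special characters and a CDATA section are two ways of writing the same text. The sketch below, which assumes the DOM parser bundled with the JDK (not dom4j) and uses a made-up class EscapingDemo with a made-up expr element, parses both forms and shows that the parser hands back identical text.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class EscapingDemo {

    // Parses the XML string and returns the text content of its root element.
    static String textOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Same text, once with entity references and once inside a CDATA section.
        String escaped = "<expr>1 &lt; 2 &amp; 3 &gt; 2</expr>";
        String cdata = "<expr><![CDATA[1 < 2 & 3 > 2]]></expr>";

        System.out.println(textOf(escaped)); // 1 < 2 & 3 > 2
        System.out.println(textOf(cdata));   // 1 < 2 & 3 > 2
        System.out.println(textOf(escaped).equals(textOf(cdata))); // true
    }
}
```

CDATA is usually the more readable choice when the text contains many special characters; for one or two, entity references are enough.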


2. Dom4j parses XML

1. Tree structure and XML parsing technologies

1.1 Tree structure

Whether it is an HTML file or an XML file, it is a markup document and can be parsed using the DOM technology defined by the W3C.
[Figure: tree structure of the bookstore document]
The XML file corresponding to the tree structure in the figure above:

<bookstore>
    <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
    </book>
    <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
    <book category="WEB">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
    </book>
</bookstore>

The Document object represents the entire document (which can be an HTML document or an XML document).
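
To make the tree idea concrete, the sketch below parses a shortened bookstore document and prints its element tree with indentation. It is only an illustration: it uses the W3C DOM implementation bundled with the JDK (not dom4j), and the class DomTreeDemo and its methods are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class DomTreeDemo {

    // Shortened bookstore document: one book with two child elements.
    static final String XML =
            "<bookstore>"
          + "<book category=\"COOKING\">"
          + "<title lang=\"en\">Everyday Italian</title>"
          + "<price>30.00</price>"
          + "</book>"
          + "</bookstore>";

    // Parses the XML and returns one line per element, indented by depth.
    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Depth-first walk over the tree, starting from the given node.
    private static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // skip text, comments, and other non-element nodes
        }
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName()).append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(XML));
        // prints:
        // bookstore
        //   book
        //     title
        //     price
    }
}
```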

1.2 XML parsing technologies

  • Early versions of the JDK provided two XML parsing technologies, DOM and SAX (rarely used directly today, but still worth knowing)
  • DOM parsing is defined by the W3C, and each programming language implements it using its own language features. Java also implements DOM for parsing markup documents.
  • SAX (Simple API for XML) is the other parsing technology that ships with the JDK. It is a different approach rather than an upgrade of DOM:
    • SAX parsing differs from the approach defined by the W3C. It uses an event-like mechanism and tells the user what is currently being parsed through callbacks. It reads the XML file sequentially and does not create a large number of DOM objects.
    • As a result, when parsing XML, SAX generally does better than DOM parsing in both memory usage and speed.
  • Third-party parsers:
    • jdom offers a more Java-friendly API than the W3C DOM.
    • dom4j started out as a fork of jdom and has been developed independently since. It is the library used in the rest of this article.
    • Pull is mainly used in Android development. It is very similar to SAX in that it also parses XML files through an event mechanism.
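
The callback style of SAX described above can be sketched as follows. This is an illustration using the SAX parser bundled with the JDK; the class SaxEventsDemo and the helper events are made up for this example, and the handler simply records each callback in a list instead of building a tree.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class SaxEventsDemo {

    // Parses the XML and returns the sequence of events the parser reported.
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                log.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one text run in several chunks.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text " + text);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : events("<student id=\"001\"><name>Mr.Yu</name><age>21</age></student>")) {
            System.out.println(event);
        }
    }
}
```

No Document object is ever created here: the parser moves through the input once and the handler keeps only what it chooses to, which is why SAX needs less memory than DOM.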

2. Dom4j parses XML

  • From the overview of XML parsing technologies above, we know that Dom4j is a third-party parsing technology. We need to use the class library provided by that third party to parse XML files.
  • Because dom4j is not part of the JDK but a third-party library, we need to download the dom4j jar from the official dom4j website before we can use it. I have also uploaded the file to CSDN, where you can download it directly: https://download.csdn.net/download/MrYushiwen/14934949

After unpacking the downloaded file, here is a brief introduction to its directory layout:

  • docs is the documentation directory, containing the documentation shipped with the library.
  • The lib directory contains the other third-party libraries that dom4j depends on.
  • The src directory contains the dom4j source code.

We need dom4j-1.6.1.jar, so import this jar into the project.
After importing the jar, parsing XML with Dom4j takes the following steps:

  1. To get a Document object, first create a SAXReader object
  2. Use the SAXReader object to read the XML file and obtain the Document object
  3. From the Document object, get the root element of the XML
  4. From the root element, get all the student tag objects. Element.elements(tag name) returns the collection of child elements with the given name under the current element
  5. Iterate over the student tag objects and read each child element inside every student tag.

The specific code is as follows:

import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class TestMain {

    public static void main(String[] args) {
        try {
            parseXml();
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    public static void parseXml() throws DocumentException {
        // To get a Document object, first create a SAXReader
        SAXReader reader = new SAXReader();
        // Use the SAXReader to read the XML file and obtain the Document
        Document document = reader.read("05_xml/xml/students.xml");
        // From the Document, get the root element of the XML
        Element root = document.getRootElement();
        // From the root element, get all student elements.
        // Element.elements(name) returns the child elements with the given name.
        // (dom4j 1.6.1 returns a raw List here, so expect an unchecked warning.)
        List<Element> students = root.elements("student");
        // Iterate over the student elements and read each child element
        for (Element student : students) {
            // Read the id attribute of the student
            String id = student.attributeValue("id");
            // Get the name, age, and gender child elements of the student
            Element nameElement = student.element("name");
            Element ageElement = student.element("age");
            Element genderElement = student.element("gender");
            // getText() returns the text between the start tag and the end tag
            System.out.println("Student ID: " + id);
            System.out.println("Name: " + nameElement.getText());
            System.out.println("Age: " + ageElement.getText());
            System.out.println("Gender: " + genderElement.getText());
            System.out.println("*****************************");
        }
    }
}

The XML file being parsed:

<?xml version="1.0" encoding="UTF-8"?>
<!-- XML declaration: version is the XML version, encoding is the character encoding -->
<students>
    <student id="001">
        <name>Mr.Yu</name>
        <age>21</age>
        <gender><![CDATA[<男>]]></gender>
    </student>

    <student id="002">
        <name>小明</name>
        <age>20</age>
        <gender><![CDATA[<男>]]></gender>
    </student>
</students>


Output of the parse:
[Figure: console output listing the ID, name, age, and gender of each student]

Origin blog.csdn.net/MrYushiwen/article/details/113182481