Detailed Explanation of XML Parsing

IDE: IntelliJ IDEA 2019.2.4 x64
DOM4J package version: dom4j-1.6.1



1. What is XML?

XML (Extensible Markup Language) is a markup language used to describe the structure and content of data. It can represent many kinds of data, such as text, numbers, and images.

XML is designed to make data exchange and sharing easier, and it can also be used to store and transmit data. An XML document consists of tags, attributes, and text content, and can be validated against a DTD (Document Type Definition) or an XML Schema. Its syntax is simple and flexible, and it is widely used in web services, data exchange, and configuration files.
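To make the three building blocks concrete, here is a minimal sketch that parses a one-line document and prints its tag name, attribute value, and text content. It uses the DOM parser bundled with the JDK rather than DOM4J so that it runs without extra dependencies; the `book` sample and the class name `XmlParts` are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class XmlParts {
    // Hypothetical sample: one tag (book), one attribute (id), one piece of text.
    static final String SAMPLE = "<book id=\"b1\">Learning XML</book>";

    /** Returns "tag|id attribute|text" for the root element of the given XML. */
    static String describe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            return root.getTagName() + "|" + root.getAttribute("id") + "|" + root.getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(SAMPLE)); // prints: book|b1|Learning XML
    }
}
```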


2. In what sense is XML extensible?

What does "extensible" mean? As the name suggests, it usually means that new features can be added on top of an existing base. For XML the idea is similar, but a more precise formulation is: XML lets us define our own tags and document format freely, yet we cannot write just anything; a document must follow the specific XML constraints that apply to it.

These constraints are defined by different organizations, and different organizations define different constraints.

For example, in Java EE, the familiar web.xml file describes the deployment information and configuration of a Java web application, which lets developers focus on implementing business logic. Its format is defined through the Java Community Process (JCP). To edit web.xml, you must follow the constraint files published for it.

The same goes for third-party tools: each defines its own constraints on top of the basic XML syntax to dictate what may and may not appear in its configuration files.


3. What is XML mainly used for?

Uses:

  • ① A medium for transferring data between different systems (JSON has now largely taken over this role)

    What does this mean? If you need to pass data between, say, a Java program, a Python program, and a C++ program, XML can serve as the carrier that holds the data in transit.

  • ② A configuration file format

    What is a configuration file? It is a file in a specific format that supplies configuration parameters and initialization settings to an application. For example, the Druid connection pool used with JDBC is commonly configured through a properties file; XML files such as web.xml serve the same purpose.
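As a small illustration of XML used as a configuration file, the JDK's `java.util.Properties` class can read settings from an XML document through `loadFromXML`. The sketch below is only an example under assumptions: the keys `url` and `maxActive` and their values are invented, and the class name `XmlConfigDemo` is hypothetical. The DOCTYPE line is the one the `Properties` documentation prescribes; it identifies the DTD and is not fetched over the network.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class XmlConfigDemo {
    // Hypothetical settings, written in the XML format that Properties understands.
    static final String CONFIG =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
          + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
          + "<properties>\n"
          + "  <entry key=\"url\">jdbc:mysql://localhost:3306/demo</entry>\n"
          + "  <entry key=\"maxActive\">10</entry>\n"
          + "</properties>\n";

    /** Loads key/value settings from an XML properties document. */
    static Properties load(String xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(CONFIG);
        System.out.println("url = " + props.getProperty("url"));
        System.out.println("maxActive = " + props.getProperty("maxActive"));
    }
}
```

In a real project the XML would live in a file and be opened as a stream; the inline string is used here only to keep the example self-contained.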


4. Basic syntax of XML

The basic syntax of XML consists of the syntax rules for tags, elements, attributes, and text content; the document declaration; and optional constraint files (for example, DTD or XSD).

4.1 Syntax rules

  • Root tag
    • There can be only one root tag.
  • Tag closing
    • Paired tags: the start tag and end tag must appear together.
    • Single tag: a single (self-closing) tag is closed within the tag itself.
  • Tag nesting
    • Tags can be nested, but must not overlap (no cross-nesting).
  • Comments cannot be nested
  • Lowercase letters are recommended for tag names and attribute names
  • Attributes
    • An attribute must have a value
    • Attribute values must be enclosed in quotation marks, either single or double

ps: These rules look much like HTML's, so they are easy to pick up. XML is stricter, though: HTML tolerates attributes without values and unquoted attribute values, and XML tag names are case-sensitive.
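The rules above can be checked mechanically: a parser rejects any document that breaks them. The following sketch feeds a few small strings to the DOM parser bundled with the JDK (not DOM4J, to avoid a dependency) and reports whether each one is well-formed. The class name `WellFormedCheck` and the sample strings are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    /** Returns true if the string parses as a well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<a><b>text</b><br/></a>",           // paired tags and a self-closing tag
            "<a/><b/>",                          // two root tags
            "<a><b></a></b>",                    // cross-nested tags
            "<a><!-- x <!-- y --> --></a>",      // nested comment
            "<a id/>",                           // attribute without a value
            "<a id=1/>",                         // attribute value without quotes
            "<A></a>"                            // tag names differ in case
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```

Only the first sample should be reported as well-formed; each of the others violates one rule from the list.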

4.2 Document declaration

The document declaration specifies the XML version and character encoding. If it is present, it must be the very first thing in the XML document.

For example, the following declaration sets the XML version to 1.0 and the character encoding to UTF-8:

<?xml version="1.0" encoding="UTF-8"?>
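A parser exposes what the declaration says. As a minimal sketch, the code below reads the declared version and encoding back through the JDK's DOM API (`getXmlVersion` and `getXmlEncoding`), and also shows that a document with anything before the declaration is rejected. The class name `DeclarationDemo` and the `root` element are assumptions for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

class DeclarationDemo {
    /** Returns "version / encoding" from the declaration, or "not well-formed". */
    static String describeDeclaration(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // keep parse errors off stderr
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getXmlVersion() + " / " + doc.getXmlEncoding();
        } catch (Exception e) {
            return "not well-formed";
        }
    }

    public static void main(String[] args) {
        String declaration = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        // Declaration on the first line: accepted.
        System.out.println(describeDeclaration(declaration + "<root/>"));
        // A comment placed before the declaration: rejected by the parser.
        System.out.println(describeDeclaration("<!-- note -->" + declaration + "<root/>"));
    }
}
```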

4.3 Constraint files

Constraint files define the structure and rules of an XML document to ensure its validity and consistency. The two main kinds are DTD and XML Schema (XSD).

Note that although constraint files are an important part of XML, they are not required by the basic syntax. Developers can choose not to use one and instead rely on the structure of the XML document itself, checking and processing the data in their own code.
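To show what a constraint actually enforces, here is a hedged sketch that validates two small documents against an inline DTD using the JDK's validating DOM parser. The DTD itself is an assumption written for this example (it is not a constraint file from the article): it requires every `employee` to carry an `id` attribute and to contain exactly a `name` followed by an `age`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidationDemo {
    // Hypothetical DTD, embedded in the document as an internal subset.
    static final String DTD =
            "<!DOCTYPE employees [\n"
          + "  <!ELEMENT employees (employee*)>\n"
          + "  <!ELEMENT employee (name, age)>\n"
          + "  <!ATTLIST employee id CDATA #REQUIRED>\n"
          + "  <!ELEMENT name (#PCDATA)>\n"
          + "  <!ELEMENT age (#PCDATA)>\n"
          + "]>\n";

    /** Returns true if the given document body is valid against the DTD above. */
    static boolean isValid(String body) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // turn on DTD validation
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // treat validity errors as failures, not just fatal errors
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String ok = "<employees><employee id=\"101\"><name>Zhang San</name><age>18</age></employee></employees>";
        // Well-formed, but breaks the DTD: no id attribute and an undeclared <salary> tag.
        String bad = "<employees><employee><name>Zhang San</name><salary>1</salary></employee></employees>";
        System.out.println("follows the DTD: " + isValid(ok));
        System.out.println("breaks the DTD:  " + isValid(bad));
    }
}
```

Both documents are well-formed XML; only the constraint distinguishes them, which is the difference between syntax rules and constraint files.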


5. How to parse XML (using Java as an example)

Purpose:

Read the data in an XML file with Java code.

Steps:

① Prepare an XML file to parse; its content is up to you.

The sample file (employees.xml) is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!--  There is no constraint file, so tag names and attribute names can be chosen freely  -->
<employees>
    <employee id="101">
        <name>张三</name>
        <age>18</age>
        <address>北京</address>
    </employee>
    <employee id="102">
        <name>李思思</name>
        <age>22</age>
        <address>武汉</address>
    </employee>
    <employee id="103">
        <name>王五</name>
        <age>32</age>
        <address>上海</address>
    </employee>

</employees>

② Import the DOM4J package into the project (DOM4J is an XML parsing library for Java), then write the code in IDEA.

ps: Importing the DOM4J package works the same way as importing the JUnit package, so the steps are not repeated here. If you have questions, see the blog post "Java SE: JUnit Quick Start Guide".

a. Create the XML parser object

```java
// 1. Create the parser object
SAXReader reader = new SAXReader();
```

b. Have the parser object parse the XML file

```java
// Parse the XML into a Document object; read() takes a byte input stream of the XML file to parse
Document document = reader.read(domTest.class.getClassLoader().getResourceAsStream("employees.xml"));
```
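One detail worth guarding against in this step: `getResourceAsStream` returns `null` when the file is not on the classpath, and passing that `null` on to the parser produces a confusing failure. The sketch below is one possible way to fail early with a clear message; the helper name `openResource` and the class name `ResourceCheck` are hypothetical, and only JDK classes are used.

```java
import java.io.IOException;
import java.io.InputStream;

class ResourceCheck {
    /** Opens a classpath resource, or fails with a clear message if it is missing. */
    static InputStream openResource(String name) {
        InputStream in = ResourceCheck.class.getClassLoader().getResourceAsStream(name);
        if (in == null) {
            throw new IllegalArgumentException("Not found on the classpath: " + name);
        }
        return in;
    }

    public static void main(String[] args) {
        // employees.xml is expected in the source/resources root, as in the article.
        try (InputStream in = openResource("employees.xml")) {
            System.out.println("Found employees.xml, ready to pass the stream to the parser");
        } catch (IllegalArgumentException | IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The returned stream could then be handed to `reader.read(...)` exactly as in step b.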

c. Read the content

```java
// Get the root element
Element rootElement = document.getRootElement();

// If the root has several child elements with the same name, element() returns the first one;
// here that is the first employee
Element employee = rootElement.element("employee");

// Find the child tag called name under the employee element
Element name = employee.element("name");

// Print the text content of the name tag
System.out.println("Content of name: " + name.getText());

// Print the text of the name tag under every child element of the root
List<Element> elements = rootElement.elements();
for (Element element : elements) {
    System.out.println(element.element("name").getText());
}

// Get the value of the id attribute of the first employee element
Element element1 = rootElement.element("employee");

// Get the Attribute object for id
Attribute id = element1.attribute("id");

// Read the attribute's value and print it
String value = id.getValue();
System.out.println("id:" + value);
```

Case: using the employees.xml file created above, use Java to obtain the text of the name tag in the first employee child element, the text of the name tag in every employee child element, and the value of the id attribute of the first employee child element.

The complete code of the case is as follows (example):

import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

import java.io.File;            // needed only for option 2 below
import java.io.FileInputStream; // needed only for option 2 below
import java.util.List;

public class domTest {

    public static void main(String[] args) {

        // 1. Create the parser object
        SAXReader reader = new SAXReader();

        try {
            // Option 1: load employees.xml from the classpath and parse it into a Document
            Document document = reader.read(domTest.class.getClassLoader().getResourceAsStream("employees.xml"));

            /*
            // Option 2: open the file by its path and parse it into a Document
            File file = new File("E:\\javaApp\\day04_xml\\src\\employees.xml");
            FileInputStream fis = new FileInputStream(file);
            Document document = reader.read(fis);
            */

            // Get the root element
            Element rootElement = document.getRootElement();
            // With several same-named children, element() returns the first one: the first employee
            Element employee = rootElement.element("employee");
            // Find the name tag under that employee and print its text
            Element name = employee.element("name");
            System.out.println("Content of name: " + name.getText());

            System.out.println("-------------------------");
            // Print the text of the name tag under every child element of the root
            List<Element> elements = rootElement.elements();
            for (Element element : elements) {
                System.out.println(element.element("name").getText());
            }

            System.out.println("---------------------------");
            // Get the first employee, then its id attribute, then the attribute's value
            Element element1 = rootElement.element("employee");
            Attribute id = element1.attribute("id");
            String value = id.getValue();
            System.out.println("id:" + value);

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
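For comparison only, the same three reads can be done without DOM4J, using the DOM parser that ships with the JDK. This is a sketch of an alternative API, not the approach this article uses. To keep it self-contained it parses a shortened inline copy of the data with romanized names instead of reading employees.xml; the class name `JdkDomEmployees` is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class JdkDomEmployees {
    // Shortened stand-in for employees.xml (two employees, ASCII names).
    static final String XML =
            "<employees>"
          + "<employee id=\"101\"><name>Zhang San</name><age>18</age></employee>"
          + "<employee id=\"102\"><name>Li Sisi</name><age>22</age></employee>"
          + "</employees>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Text of the name child of every employee, in document order. */
    static List<String> names(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList employees = doc.getDocumentElement().getElementsByTagName("employee");
        for (int i = 0; i < employees.getLength(); i++) {
            Element employee = (Element) employees.item(i);
            result.add(employee.getElementsByTagName("name").item(0).getTextContent());
        }
        return result;
    }

    /** Value of the id attribute on the first employee. */
    static String firstId(Document doc) {
        Element first = (Element) doc.getDocumentElement()
                .getElementsByTagName("employee").item(0);
        return first.getAttribute("id");
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println("first name: " + names(doc).get(0));
        System.out.println("all names: " + names(doc));
        System.out.println("id: " + firstId(doc));
    }
}
```

The JDK API is more verbose than DOM4J's `element(...)` and `attribute(...)` calls, which is one reason a library such as DOM4J is convenient for this kind of task.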


Origin blog.csdn.net/siaok/article/details/130034863