Detailed explanation of XML file principles

1. Introduction

1. XML definition

XML (Extensible Markup Language) is a standard for encoding documents and data. The World Wide Web Consortium (W3C) published it as a Recommendation in 1998 to make information easier to share, process and transmit. XML is a markup language, which means it uses tags to mark up elements that describe data or content. The tag names are defined by the document author, so XML can describe almost any type of data. For example, you could create an XML document that uses "book" and "author" tags to describe a library of books. Important features of XML include:

  • Extensibility: XML lets users define their own markup. This means it can be extended and adapted to a variety of applications and information models.
  • Self-descriptive: Because tags are user-defined, XML documents usually describe the information they contain.
  • Machine-readable and human-readable: Another advantage of XML is that it is easy for machines to process and humans to read at the same time.
  • Open Standard: XML is an open standard developed by the W3C, which means it is widely accepted and used around the world.

XML is used in many areas, including web development, scientific data exchange, audio and video processing, and e-commerce. Because it is extensible and self-describing, it adapts to a wide range of data description and exchange needs. Its most important uses are data transmission, configuration files, and data storage (for small amounts of data, an XML file can act as a small database).

2. Test

First, write a simple XML file:

<user>jack</user>
<msg>super handsome guy</msg>

Then open the file with a browser to check if there are any errors.

The browser reports an error, because every XML document must have exactly one root element. Wrapping the content in a root element fixes it:

<root>
    <user>jack</user>
    <msg>super handsome guy</msg>
</root>
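
The same check can be done in code instead of a browser. The following is a minimal sketch that uses the SAX parser bundled with the JDK (not DOM4J); the class name WellFormedCheck and the helper isWellFormed are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only
class WellFormedCheck {
    // Returns true if the text parses as a well-formed XML document
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // The parser reports well-formedness violations as a SAXException
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected for in-memory text
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two top-level elements: not well-formed
        System.out.println(isWellFormed("<user>jack</user><msg>hello</msg>"));              // false
        // The same content wrapped in a single root element
        System.out.println(isWellFormed("<root><user>jack</user><msg>hello</msg></root>")); // true
    }
}
```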

3. The difference between HTML and XML

  • HTML tags are predefined; XML tags are defined by the author
  • HTML syntax is lenient; XML syntax is strict, and every element must be closed (with a matching end tag or as a self-closing tag)
  • XML is used to transmit or store data; HTML is used to display data

2. XML basic syntax

1. Grammar rules

(1) An XML document must have exactly one root element (the root is the ancestor of all other elements)
(2) XML declaration: optional, but recommended; if present, it must be the very first thing in the file

<?xml version="1.0" encoding="utf-8" ?>
  • version: xml version
  • encoding: encoding type

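(3) Every XML element must be closed, either with a matching end tag or as a self-closing tag such as <properties />
(4) Tag names are case-sensitive
(5) XML comments use the same syntax as HTML comments: <!-- comment -->
(6) Special characters are written as entity references (for example, &lt; for <)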

Characters that need to be escaped in XML

&lt; < less than
&gt; > greater than
&amp; & ampersand
&apos; ' apostrophe
&quot; " quotation mark
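
The table above maps directly to a small escaping routine. This is only a sketch: the class XmlEscape and the method escape are hypothetical names, and real projects normally let their XML library do the escaping when writing text nodes.

```java
// Hypothetical helper class, for illustration only
class XmlEscape {
    // Replaces the five special characters with their predefined entity references
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("<msg>" + escape("if 2<4, but 4>5, 7<3") + "</msg>");
        // prints <msg>if 2&lt;4, but 4&gt;5, 7&lt;3</msg>
    }
}
```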

2. Attributes of elements

Attributes attach additional information to an element. For example, in <name age="38">太白</name> the element name carries an attribute age with the value 38.

An element can have multiple attributes, and every attribute value must be enclosed in quotation marks (single or double). Attribute names may contain letters, digits, underscores, hyphens and periods, and must begin with a letter or an underscore.
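
As a rough illustration of an element with more than one attribute, the sketch below reads attributes with the DOM parser that ships with the JDK. The class AttributeDemo, the helper rootAttribute and the second attribute city are assumptions made for this example, not part of the original files.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only
class AttributeDemo {
    // Parses the XML text and returns the named attribute of the root element
    // (an empty string if the attribute is absent)
    static String rootAttribute(String xml, String attributeName) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return root.getAttribute(attributeName);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "city" is a made-up second attribute, added only for illustration
        String xml = "<name age=\"38\" city=\"Beijing\">Taibai</name>";
        System.out.println(rootAttribute(xml, "age"));  // 38
        System.out.println(rootAttribute(xml, "city")); // Beijing
    }
}
```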

3. CDATA

The msg element in the following document contains several characters that would need to be escaped. Escaping each one by hand is tedious, and this is where CDATA comes in handy:

<?xml version="1.0" encoding="utf-8" ?>
<root>
     <man>
         <name>张杰</name>
         <msg>What is the best university in the world: if 2<4, but 4>5, 7<3</msg>
     </man>
    <man>
        <name age="38">太白</name>
    </man>
</root>

The parser rejects this file because of the unescaped < characters in the text. Wrapping the text in a CDATA section fixes it:

<?xml version="1.0" encoding="utf-8" ?>
<root>
     <man>
         <name>张杰</name>
         <msg><![CDATA[What is the best university in the world: if 2<4, but 4>5, 7<3]]></msg>
     </man>
    <man>
        <name age="38">太白</name>
    </man>
</root>

The text between <![CDATA[ and ]]> is not parsed as markup; it is passed through as literal character data. The keyword CDATA must be written in uppercase, and the section itself cannot contain the sequence ]]>.
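
To see that a parser hands back the CDATA text unchanged, here is a small sketch using the JDK's built-in DOM parser. The class CdataDemo and the helper rootText are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only
class CdataDemo {
    // Parses the XML text and returns the text content of the root element
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The < and > characters inside the CDATA section need no escaping
        String xml = "<msg><![CDATA[if 2<4, but 4>5, 7<3]]></msg>";
        System.out.println(rootText(xml)); // if 2<4, but 4>5, 7<3
    }
}
```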

4. DTD files

As mentioned earlier, XML tags are all user-defined. This freedom has a cost: without agreed rules, the content of XML files can become inconsistent and messy. To avoid this, the allowed tags and their structure are defined in advance (as MyBatis does for its configuration file), and one kind of file that holds such a specification is the DTD (Document Type Definition) file. For example, the MyBatis configuration file references its DTD in the DOCTYPE declaration:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE configuration PUBLIC "-//mybatis.org//DTD Config 3.0//EN"
"http://mybatis.org/dtd/mybatis-3-config.dtd">
<configuration><!-- configuration -->
    <properties /><!-- properties -->
    <settings /><!-- settings -->
    <typeAliases /><!-- type aliases -->
    <typeHandlers /><!-- type handlers -->
    <objectFactory /><!-- object factory -->
    <plugins /><!-- plugins -->
    <environments><!-- environments -->
        <environment><!-- environment -->
            <transactionManager /><!-- transaction manager -->
            <dataSource /><!-- data source -->
        </environment>
    </environments>
    <databaseIdProvider /><!-- database vendor identifier -->
    <mappers /><!-- mappers -->
</configuration>

The following is a simple DTD file:

<!ELEMENT students (student*)>
<!ELEMENT student (name,age)>
<!ELEMENT  age (#PCDATA)>
<!ELEMENT  name (#PCDATA)>

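The DTD above says that the students element may contain only student elements (student* means zero or more), and that each student element must contain exactly one name element followed by one age element. Both name and age are declared as #PCDATA (parsed character data), meaning they hold plain text; a DTD has no richer data types such as numbers. Next, an XML file references this DTD: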

<?xml version="1.0" encoding="utf-8" ?>
<!-- Reference the DTD file -->
<!DOCTYPE students SYSTEM "test1.dtd">
<students>
    <student>
        <name></name>
        <age></age>
    </student>
</students>
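
Declaring a DTD does not by itself make every parser enforce it; validation usually has to be switched on. The sketch below shows one way to do that with the JDK's DOM parser. To stay self-contained it embeds the DTD as an internal subset instead of reading test1.dtd, and the class DtdValidation and the method validate are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, for illustration only
class DtdValidation {
    // The DTD from above, embedded as an internal subset so the sketch needs no files
    static final String DOCTYPE =
            "<!DOCTYPE students [\n"
          + "  <!ELEMENT students (student*)>\n"
          + "  <!ELEMENT student (name,age)>\n"
          + "  <!ELEMENT age (#PCDATA)>\n"
          + "  <!ELEMENT name (#PCDATA)>\n"
          + "]>\n";

    // Parses DOCTYPE + body with DTD validation on and returns the validation errors
    static List<String> validate(String body) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String valid = "<students><student><name>kobe</name><age>23</age></student></students>";
        String missingAge = "<students><student><name>kobe</name></student></students>";
        System.out.println(validate(valid).isEmpty());      // true
        System.out.println(validate(missingAge).isEmpty()); // false
    }
}
```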

5. XSD file

XSD stands for XML Schema Definition. It is the successor to DTD: it serves the same purpose but is more expressive, since it is itself written in XML and supports data types and namespaces. An XSD describes and validates the structure of XML documents, specifying how a document must look (which elements can exist, their order, how many child elements an element can have, and so on). Specifically, you can use XSD to define:

  • The position and number of times elements and attributes can appear
  • Data types for elements and attributes
  • Default and fixed values for elements and attributes

The following is some basic syntax of XSD:

Declaring elements: xsd:element is used to declare an element.

<xsd:element name="elementName" type="dataType"/>

Declaring attributes: xsd:attribute is used to declare an attribute.

<xsd:attribute name="attributeName" type="dataType"/>

Declaring complex types: xsd:complexType defines a type for elements that contain other elements and/or attributes.

<xsd:complexType name="complexTypeName">
    <!-- definitions of elements and/or attributes -->
</xsd:complexType>

Declaring simple types: xsd:simpleType defines a type for elements or attributes that contain only text, usually by restricting a built-in type.

<xsd:simpleType name="simpleTypeName">
    <!-- definition of text type -->
</xsd:simpleType>

Now let's look at a complete example, assuming we have the following XML document:

<?xml version="1.0" encoding="utf-8" ?>
<note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
</note>

A corresponding XSD might look like this:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<xsd:element name="note">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="to" type="xsd:string"/>
      <xsd:element name="from" type="xsd:string"/>
      <xsd:element name="heading" type="xsd:string"/>
      <xsd:element name="body" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

</xsd:schema>

In this XSD, we define an element called "note" with a complex type that contains four child elements: to, from, heading, and body. All these elements are of type string, and they must appear in the order specified in xsd:sequence.

To link an XML document to its XSD file, declare the xsi namespace (xmlns:xsi) on the root element and add either the xsi:noNamespaceSchemaLocation or the xsi:schemaLocation attribute. If the schema has no target namespace, as in the XSD above, use xsi:noNamespaceSchemaLocation:

<?xml version="1.0" encoding="UTF-8"?>
<note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="note.xsd">
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
</note>

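In the above example, the XSD file is named "note.xsd" and is located in the same directory as the XML document. If the schema declares a target namespace, use the xsi:schemaLocation attribute instead, whose value is a pair of namespace and schema location, and put the document's elements in that namespace. For this to validate, note.xsd itself must also declare targetNamespace="http://www.example.com" (and elementFormDefault="qualified" so that the child elements belong to the namespace too):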

<?xml version="1.0" encoding="UTF-8"?>
<note xmlns="http://www.example.com" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.example.com note.xsd">
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
</note>

In the above example, we declared the namespace "http://www.example.com" and associated it with the XSD file "note.xsd". Note that the namespace URI does not need to point to a real resource; it is only an identifier.
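
Validation against an XSD can also be run from Java. The following sketch uses the javax.xml.validation API from the JDK with the note schema from this section (the version without a target namespace). The schema is passed in from code, so the document needs no xsi attributes here. The class XsdValidation and the method isValid are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only
class XsdValidation {
    // The note schema from above, kept in a string so the sketch needs no files
    static final String NOTE_XSD =
            "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xsd:element name=\"note\"><xsd:complexType><xsd:sequence>"
          + "<xsd:element name=\"to\" type=\"xsd:string\"/>"
          + "<xsd:element name=\"from\" type=\"xsd:string\"/>"
          + "<xsd:element name=\"heading\" type=\"xsd:string\"/>"
          + "<xsd:element name=\"body\" type=\"xsd:string\"/>"
          + "</xsd:sequence></xsd:complexType></xsd:element>"
          + "</xsd:schema>";

    // Returns true if the document conforms to NOTE_XSD
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(NOTE_XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Schema violation or malformed XML
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<note><to>Tove</to><from>Jani</from>"
                  + "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";
        // Same elements, but "from" comes before "to", which breaks xsd:sequence
        String wrongOrder = "<note><from>Jani</from><to>Tove</to>"
                  + "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";
        System.out.println(isValid(ok));         // true
        System.out.println(isValid(wrongOrder)); // false
    }
}
```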

3. Parsing XML with Java

1. Introduction

There are four common ways to parse XML files in Java:

  • DOM parsing: The whole document is built into a tree structure in memory, following the hierarchy in which the elements appear. The advantage is that nodes can be traversed and modified freely; the cost is that memory use grows with document size and parsing is comparatively slow, so it suits small and medium documents.
  • SAX parsing: An event-based parser that reads the XML document from beginning to end. While reading, it triggers a series of events (such as start document, start element, end element, end document) and calls predefined methods to handle them. Since it does not load the entire document into memory, it uses little memory, is usually faster than DOM, and is suitable for large documents. The trade-off is that access is forward-only and node content cannot be modified.
  • JDOM parsing: A Java-specific library that loads the document into memory and provides a tree-based way to access and modify data. It uses concrete classes instead of interfaces, which makes the API simpler and more intuitive than the DOM but also less flexible. As with DOM, large documents may cause excessive memory usage.
  • DOM4J parsing: An open source Java library that started as a branch of JDOM and adds many functions beyond basic XML document handling. It provides a flexible API and is generally regarded as the best performing of the four; Hibernate, for example, uses DOM4J. It can also switch to event-driven processing, which lets it handle very large documents without consuming large amounts of memory.

DOM and SAX are the basic approaches: they are language-independent standards and are supported by the JDK out of the box. JDOM and DOM4J are third-party libraries built on top of these basics and exist only for the Java platform.
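
To make the event-driven model of SAX concrete, here is a minimal sketch with the SAX parser from the JDK. The handler simply records each element name when its start tag is reported; no tree is ever built. The class SaxDemo and the method elementNames are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only
class SaxDemo {
    // Returns the element names in the order their start tags are encountered
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                // Called by the parser once for every start tag
                names.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<students><student><id>1</id><name>kobe</name><age>23</age></student></students>";
        System.out.println(elementNames(xml)); // [students, student, id, name, age]
    }
}
```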

2. Parse XML files

Define XML file

<?xml version="1.0" encoding="UTF-8" ?>
<students>
    <student>
        <id>1</id>
        <name>kobe</name>
        <age>23</age>
    </student>
    <student>
        <id>2</id>
        <name>james</name>
        <age>24</age>
    </student>
</students>

Use DOM4J to parse the XML (DOM4J is a third-party library, so add the dom4j jar to the project first, for example from the Maven repository)

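import java.io.InputStream;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class TestXML_1 {
    public static void main(String[] args) throws DocumentException {
        // 1. Load the XML file from the classpath as an input stream
        InputStream resourceAsStream = TestXML_1.class.getClassLoader().getResourceAsStream("xml/4.xml");
        // 2. Create the reader object
        SAXReader saxReader = new SAXReader();
        // 3. Parse the stream into a Document object (the whole XML file)
        Document document = saxReader.read(resourceAsStream);
        // 4. Read the root element first
        Element rootElement = document.getRootElement();
        // 5. Get all child elements under the root element
        List<Element> elements = rootElement.elements();
        elements.forEach(s -> System.out.println(s));
        // 6. Print the name and content of every child of each student
        for (Element element : elements) {
            List<Element> children = element.elements();
            for (Element child : children) {
                System.out.println(child.getName() + ":" + child.getData());
            }
        }
    }
}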

Parsing attributes

<students>
    <student type="usa" color="black">
        <id>1</id>
        <name>kobe</name>
        <age>23</age>
    </student>
    <student type="china" color="yellow">
        <id>2</id>
        <name>guoailun</name>
        <age>24</age>
    </student>
</students>
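
The following program reads the type attribute of each student:

import java.io.InputStream;
import java.util.List;

import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class TestXML_1 {
    public static void main(String[] args) throws DocumentException {
        // 1. Load the XML file from the classpath as an input stream
        InputStream resourceAsStream = TestXML_1.class.getClassLoader().getResourceAsStream("xml/4.xml");
        // 2. Create the reader object
        SAXReader saxReader = new SAXReader();
        // 3. Parse the stream into a Document object (the whole XML file)
        Document document = saxReader.read(resourceAsStream);
        // 4. Read the root element first
        Element rootElement = document.getRootElement();
        // 5. Get all child elements under the root element
        List<Element> elements = rootElement.elements();
        for (Element element : elements) {
            System.out.println(element);
            // 6. Look up the attribute by name and print its value
            Attribute type = element.attribute("type");
            System.out.println("type:" + type.getValue());
        }
    }
}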

Adding XML elements from a Java program

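import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;

public class TestXML_1 {
    public static void main(String[] args) throws DocumentException, IOException {
        // 1. Load the XML file from the classpath as an input stream
        InputStream resourceAsStream = TestXML_1.class.getClassLoader().getResourceAsStream("xml/4.xml");
        // 2. Create the reader object
        SAXReader saxReader = new SAXReader();
        // 3. Parse the stream into a Document object (the whole XML file)
        Document document = saxReader.read(resourceAsStream);
        // 4. Read the root element first
        Element rootElement = document.getRootElement();
        // 5. Create the new element nodes and set their text
        Element student = rootElement.addElement("student");
        Element id = student.addElement("id");
        Element name = student.addElement("name");
        Element age = student.addElement("age");
        id.setText("3");
        name.setText("curry");
        age.setText("36");
        // 6. Write the document back to the XML file (tab indentation, one element per line, UTF-8)
        FileOutputStream out = new FileOutputStream(new File("/Users/jackchai/Desktop/自学笔记/java项目/leetcode/leetcodetest/src/xml/4.xml"));
        OutputFormat format = new OutputFormat("\t", true, "UTF-8");
        XMLWriter writer = new XMLWriter(out, format);
        writer.write(document);
        writer.close();
    }
}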
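
For comparison, the same "add a student and write the result" idea can be sketched with only the JDK: the DOM API builds the new elements and a Transformer serializes the tree. This is not the DOM4J approach shown above; the class DomWriteDemo and its methods are hypothetical names, and the result is returned as a string instead of being written to a file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only
class DomWriteDemo {
    // Appends a new student to the root element and returns the serialized document
    static String addStudent(String xml, String id, String name, String age) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element student = doc.createElement("student");
            doc.getDocumentElement().appendChild(student);
            appendText(doc, student, "id", id);
            appendText(doc, student, "name", name);
            appendText(doc, student, "age", age);

            // Serialize the in-memory tree back to text
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates <tag>text</tag> and appends it to the parent element
    private static void appendText(Document doc, Element parent, String tag, String text) {
        Element child = doc.createElement(tag);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    public static void main(String[] args) {
        String xml = "<students><student><id>1</id><name>kobe</name><age>23</age></student></students>";
        System.out.println(addStudent(xml, "3", "curry", "36"));
    }
}
```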

4. XPath

1. Introduction

XPath is a query language for locating nodes in XML documents. With plain DOM4J calls you have to walk down the tree one level at a time; with XPath a single path expression can reach nodes at any depth.

2. Using XPath

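Import dependency packages

DOM4J's XPath support (selectNodes and selectSingleNode) relies on the Jaxen library, so the jaxen jar must be on the classpath in addition to dom4j.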

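Parsing method

import java.io.InputStream;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;

public class TestXML_1 {
    public static void main(String[] args) throws DocumentException {
        // 1. Load the XML file from the classpath as an input stream
        InputStream resourceAsStream = TestXML_1.class.getClassLoader().getResourceAsStream("xml/4.xml");
        // 2. Create the reader object
        SAXReader saxReader = new SAXReader();
        // 3. Parse the stream into a Document object (the whole XML file)
        Document document = saxReader.read(resourceAsStream);
        // 4. Read the root element first
        Element rootElement = document.getRootElement();
        // 5. Get all student elements
        List<Node> students = rootElement.selectNodes("student");
        students.forEach(s -> System.out.println(s));
        // 6. Get the names of all students
        // ("//name" also works: it ignores level and position and matches every name element)
        List<Node> names = rootElement.selectNodes("student/name");
        names.forEach(s -> System.out.println(((Element) s).getData()));
        // 7. Get the first student
        List<Node> firstStudent = rootElement.selectNodes("student[1]");
        System.out.println(firstStudent);
        // 8. Get the names of all students that have a type attribute
        List<Node> withType = rootElement.selectNodes("student[@type]/name");
        // 9. Get the names of students whose type attribute has a given value
        List<Node> usaStudents = rootElement.selectNodes("student[@type=\"usa\"]/name");
        // 10. Get the names of students older than 22
        List<Node> olderThan22 = rootElement.selectNodes("student[age>22]/name");
        System.out.println(olderThan22);
    }
}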
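
The same path expressions can be tried without DOM4J or Jaxen, since the JDK ships its own XPath engine in javax.xml.xpath. The sketch below evaluates them against the students document with attributes from the previous section, using the root element as the context node. The class XPathDemo and the method select are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only
class XPathDemo {
    static final String XML =
            "<students>"
          + "<student type=\"usa\" color=\"black\"><id>1</id><name>kobe</name><age>23</age></student>"
          + "<student type=\"china\" color=\"yellow\"><id>2</id><name>guoailun</name><age>24</age></student>"
          + "</students>";

    // Evaluates the expression relative to the root element and returns the text of each match
    static List<String> select(String expression) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, root, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("student/name"));                 // [kobe, guoailun]
        System.out.println(select("student[1]/name"));              // [kobe]
        System.out.println(select("student[@type=\"usa\"]/name"));  // [kobe]
        System.out.println(select("student[age>22]/name"));         // [kobe, guoailun]
    }
}
```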


Origin blog.csdn.net/qq_43456605/article/details/131153970