XML - basic grammar and usage rules

Table of contents

1. Introduction to XML 

1.1. What is XML?

1.2. What is the difference between XML and HTML?

1.3. The role of XML

1.4. XML custom tags

2. The syntax structure of XML

2.1. XML naming rules

2.2. XML declaration

2.3. XML elements

2.4. XML comments

2.5. XML attributes

2.6. CDATA section

3. XML constraints

3.1. DTD constraints

3.2. DTD constraint extraction

3.3. DTD attributes

3.4. Schema constraints

4. XML parsing and common parsing methods

4.1. DOM method

4.2. SAX method

4.3. JDOM method

4.4. DOM4J


1. Introduction to XML 

1.1. What is XML?

XML stands for eXtensible Markup Language.

XML is a markup language much like HTML.

XML is designed to transport data, not display it.

XML tags are not predefined. You need to define tags yourself.

XML is designed to be self-describing.

XML is a W3C Recommendation.

1.2. What is the difference between XML and HTML?

XML is not a replacement for HTML. XML is designed to transmit and store data, with an emphasis on the content of the data.

HTML, by contrast, is designed to display data, with an emphasis on its appearance.

XML is a tool for transferring information that is independent of software and hardware.

1.3. The role of XML

XML is most often used for configuration files, for example in frameworks such as Spring, Struts, Hibernate, Spring MVC, and MyBatis.

1.4. XML custom tags

Tags such as <to> and <from> are not defined in any XML standard; they are invented by the author of the XML document. The XML language has no predefined tags. The tags used in HTML, by contrast, are all predefined, and HTML documents can only use tags defined in the HTML standard (such as <p>, <h1>, etc.). XML allows authors to define their own tags and their own document structure.

2. The syntax structure of XML

2.1. XML naming rules

XML element names must follow these rules:

1. The name can contain letters, numbers and other characters.

2. The name cannot start with numbers or punctuation marks.

3. The name cannot begin with the letters xml (or XML, Xml, etc.).

4. The name cannot contain spaces.

5. Any name can be used, and there are no reserved words.

Naming best practices

Make names descriptive. Names with underscores are fine: <first_name>, <last_name>.

Keep names short and simple, e.g. <book_title>, not <the_title_of_the_book>.

Avoid "-" characters. If you use a name like "first-name", some software may think you want to subtract name from first.

Avoid "." characters. If you use a name like "first.name", some software may think that "name" is a property of the object "first".

Avoid ":" characters. Colons are reserved for namespaces (described later).

XML documents often have a corresponding database with fields that correspond to elements in the XML document. A practical rule of thumb is to use the database's naming conventions to name elements in XML documents.

Non-English letters such as éòá are perfectly legal in XML, but be aware of potential problems if your software vendor doesn't support these characters.

2.2. XML declaration

The XML declaration is optional. If present, it must be placed on the first line of the document, as follows:

<?xml version="1.0" encoding="utf-8"?>
<!-- Declares the XML file and sets its version and encoding -->

2.3. XML elements

An XML element refers to the section from (and including) the opening tag up to (and including) the closing tag.

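Syntax: <tagName>
As in HTML, XML distinguishes between single tags and paired tags.
Single tag: <tagName />
Paired tag: <tagName> content (text or other tags) </tagName>

Tag names are defined by the author. It is recommended to follow identifier naming rules when naming a tag (letters, digits, and underscores, not starting with a digit).

Example:
Describe a book: its title, author, and unit price.


Notes on writing tags:
1. All tags in XML must be closed.
2. Tag names are strictly case-sensitive: <User> and <user> are different tags.
3. Do not put spaces, colons, commas, or similar symbols inside a tag name.
	Tag names must not contain spaces or other special characters.
4. Tag names must not start with a digit. (Name tags the way you would name identifiers.)

5. Tags must not overlap each other. The following is invalid:
<age>23<name>zhangsan</age></name>
6. Every XML file can have only one root tag.

7. Opening the XML file in a browser is a quick way to check whether its format is correct.

The book example as a complete document: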
<?xml version="1.0" encoding="utf-8"?>
<books>
    <book>
        <name>三国演义</name>
        <author>罗贯中</author>
        <price>39.9</price>
        <version>1.0</version>
    </book>
</books>
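
The rules above can also be checked programmatically rather than in a browser. The following is a minimal sketch, assuming the JDK's built-in JAXP parser is acceptable for the check; the class name WellFormedCheck and the sample strings are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original article.
final class WellFormedCheck {

    /** Returns true if the given text parses as a well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without logging them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // Any parse failure means the document breaks a well-formedness rule.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<books><book><name>Three Kingdoms</name></book></books>", // well-formed
            "<book><name>Three Kingdoms</book>",                       // rule 1: <name> is never closed
            "<User>zhangsan</user>",                                   // rule 2: case mismatch
            "<1book/>",                                                // rule 4: name starts with a digit
            "<age>23<name>zhangsan</age></name>",                      // rule 5: overlapping tags
            "<book/><book/>"                                           // rule 6: two root tags
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```

Only the first sample is reported as well-formed; each of the others breaks the rule named in its comment.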

2.4. XML comments

Comments in xml are written in the same way as comments in html.

<!-- content --> : a comment may span multiple lines.

Note: when an XML file is opened directly in a browser, its comments are typically displayed along with the rest of the document.

Do not use "--" inside a comment; the XML specification does not allow it.

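<!--
    1. Every XML file has exactly one root tag, and all other tags must be written inside it.
    2. Every tag must be closed.
    3. To check whether an XML file is correctly formatted, open it in a browser.
    4. XML is case-sensitive.
    5. Do not use spaces or other special characters in tag names.
    6. Do not start tag names with a digit.
    7. Do not nest tags carelessly or let them overlap:  <name><age></name></age>
-->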

2.5. XML attributes

Attribute: written inside a tag. It extends the tag's data and describes the tag further.

Syntax: <tagName attrName1="value" attrName2="value"> </tagName>. Attribute names are also defined by the author.

Points to note:

1. For a paired tag, attributes are written in the start tag.

2. Attribute names must not contain spaces, and special characters such as ";" and ":" should not appear in them.

3. Attribute values must be enclosed in single or double quotes.

2.6. CDATA section

CDATA section: text written inside <![CDATA[ ... ]]> is not parsed as markup, so special characters in it are output exactly as written.

Alternatively, predefined entities can be used to output individual special characters.

Note the entity syntax: &entityName;

&lt;    <   less than
&gt;    >   greater than
&amp;   &   ampersand
&apos;  '   single quote
&quot;  "   double quote

The following example uses attributes and the &lt; and &gt; entities:
<books>
    <book>
        <name>西游记</name>
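        <!--
          Add extra information to author, such as name, age, and so on.
           1. Separate multiple attributes with spaces.
           2. Write attributes inside the start tag.
           3. Attribute values must be enclosed in double or single quotes.
           4. Attribute names must follow the naming rules.
      -->
        <author sex="男" address="郑州">&lt;吴承恩&gt;</author>
        <price>50</price>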
        <version>1.2</version>
    </book>
</books>
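
To see that a CDATA section and predefined entities are two ways of writing the same text, the content can be read back with a parser. This is a small sketch using the JDK's built-in DOM parser; the class name CdataDemo and the sample strings are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article.
final class CdataDemo {

    /** Parses the XML text and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // Special characters escaped one by one with predefined entities.
        String withEntities = "<author>&lt;Wu Cheng'en&gt; &amp; friends</author>";
        // The same text written literally inside a CDATA section.
        String withCdata = "<author><![CDATA[<Wu Cheng'en> & friends]]></author>";

        System.out.println(rootText(withEntities)); // <Wu Cheng'en> & friends
        System.out.println(rootText(withCdata));    // <Wu Cheng'en> & friends
    }
}
```

Entities are convenient for a single character; a CDATA section is easier to read when a block of text contains many special characters.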

3. XML constraints

What are XML constraints? They are rules that restrict what content may appear in an XML document. Because XML is commonly used for configuration files, you need to be able to read and understand the constraints attached to those files.

Goal for DTD: be able to read a DTD file and, based on it, write an XML file that satisfies its constraints.

There are two types of XML constraints: DTD constraints and Schema constraints.

3.1. DTD constraints

Element definitions

Syntax: <!ELEMENT elementName (#PCDATA)>
Keyword: #PCDATA
Meaning: a text element; its content must be text.
Example: <!ELEMENT name (#PCDATA)>

Syntax: <!ELEMENT elementName EMPTY>
Keyword: EMPTY
Meaning: the element cannot contain child elements or text; it can only carry attributes.
Example: <!ELEMENT version EMPTY>

Syntax: <!ELEMENT elementName (e1,e2)>
Keyword: (e1,e2)
Meaning: the element contains the listed child elements, in the order given.
Example: <!ELEMENT person (name,age,contact,br*)>

Occurrence indicators

How many times an element can appear:
?  0 or 1 times
*  0 to N times
+  1 to N times

The count refers to how many times a child element may appear inside its parent; for example, how many times the tag book may appear inside the root tag books in the following case.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE books [
        <!-- Define the root tag books -->
        <!ELEMENT books (book*)>
        <!-- Define the child tags of book -->
        <!ELEMENT book (name,author,price,version)>
        <!-- Child tag name: #PCDATA, i.e. text content -->
        <!ELEMENT name (#PCDATA)>
        <!-- Child tag author -->
        <!ELEMENT author (#PCDATA)>
        <!-- Child tag price -->
        <!ELEMENT price (#PCDATA)>
        <!-- version: declared EMPTY, so it has no content -->
        <!ELEMENT version EMPTY>
        ]>
<books>
    <!-- book is declared with *, so it may appear zero or more times -->
    <book>
        <name></name>
        <author></author>
        <price></price>
        <version/>
    </book>
</books>

Once the <!DOCTYPE> constraint has been added to the XML file, an editor that understands DTDs can fill in the child tags automatically when you type the <books> tag.
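
A DTD only has an effect if the parser is asked to validate against it. The sketch below shows one way to do that with the JDK's built-in JAXP parser, using the same internal DTD as above; the class name DtdValidationDemo and the sample documents are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of the original article.
final class DtdValidationDemo {

    // The internal DTD from the example above.
    static final String DTD =
          "<!DOCTYPE books [\n"
        + "  <!ELEMENT books (book*)>\n"
        + "  <!ELEMENT book (name,author,price,version)>\n"
        + "  <!ELEMENT name (#PCDATA)>\n"
        + "  <!ELEMENT author (#PCDATA)>\n"
        + "  <!ELEMENT price (#PCDATA)>\n"
        + "  <!ELEMENT version EMPTY>\n"
        + "]>\n";

    /** Parses DTD + body with validation switched on and returns the error messages. */
    static List<String> validate(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                // Validity violations are reported here; collect them and keep parsing.
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                // Well-formedness violations are fatal.
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            errors.add(e.getMessage());
        }
        return errors;
    }

    public static void main(String[] args) {
        String valid = "<books><book><name>a</name><author>b</author>"
                     + "<price>1</price><version/></book></books>";
        String missingAuthor = "<books><book><name>a</name>"
                             + "<price>1</price><version/></book></books>";
        System.out.println(validate(valid));         // []
        System.out.println(validate("<books/>"));    // [] because book* allows zero books
        System.out.println(validate(missingAuthor)); // a non-empty list of messages
    }
}
```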

3.2. DTD constraint extraction

Writing DTD constraints inside an XML file has one disadvantage: only that file can use them. So that several files can share the same DTD constraints, we write the constraints in a separate .dtd file.

book.dtd file

<!-- Define the root tag books -->
        <!ELEMENT books (book*)>
        <!-- Define the child tags of book -->
        <!ELEMENT book (name,author,price,version)>
        <!-- Child tag name: #PCDATA, i.e. text content -->
        <!ELEMENT name (#PCDATA)>
        <!-- Child tag author -->
        <!ELEMENT author (#PCDATA)>
        <!-- Child tag price -->
        <!ELEMENT price (#PCDATA)>
        <!-- version: declared EMPTY, so it has no content -->
        <!ELEMENT version EMPTY>

books.xml file

<?xml version="1.0" encoding="UTF-8" ?>
<!-- Reference the external DTD file -->
<!DOCTYPE books SYSTEM "book.dtd">
<books>
    <book>
        <name>西游记</name>
        <author>吴承恩</author>
        <price>29.9</price>
        <version/>
    </book>
    <book>
        <name>水浒传</name>
        <author>施耐庵</author>
        <price>39.9</price>
        <version/>
    </book>
</books>

3.3. DTD attributes

An attribute list declares the attributes of a tag. A tag can have several attributes, and one attribute list can declare all of them.
tagName: the tag the attributes belong to.
<!ATTLIST tagName
                attrName attrType constraint>

attrType is either CDATA or an enumeration such as (EN1|EN2); constraint is a keyword such as #REQUIRED.

For readability, put each attribute on its own line:
<!ATTLIST tagName
                attrName1 attrType constraint
                attrName2 attrType constraint>

Value       Meaning
CDATA       the attribute value is text
#REQUIRED   the attribute is required
#IMPLIED    the attribute is optional
#FIXED      the attribute has a fixed string value

book.dtd file

<!-- Define the root tag books -->
        <!ELEMENT books (book*)>
        <!-- Define the child tags of book -->
        <!ELEMENT book (name,author,price,version)>
        <!-- Child tag name: #PCDATA, i.e. text content -->
        <!ELEMENT name (#PCDATA)>
        <!-- Child tag author -->
        <!ELEMENT author (#PCDATA)>
        <!-- Child tag price -->
        <!ELEMENT price (#PCDATA)>
        <!-- version: declared EMPTY, so it has no content -->
        <!ELEMENT version EMPTY>

        <!-- The name attribute is required -->
        <!-- The age attribute is required -->
        <!-- The sex attribute is optional -->
        <!-- The value of sco is fixed at 100 -->
<!ATTLIST author
                name CDATA #REQUIRED
                age  CDATA #REQUIRED
                sex  CDATA #IMPLIED
                sco  CDATA #FIXED "100">

book.xml file

<?xml version="1.0" encoding="UTF-8" ?>
<!-- Reference the external DTD file -->
<!DOCTYPE books SYSTEM "book.dtd">
<books>
    <book>
        <name>西游记</name>
        <author name="吴承恩" age="48" sex="男" sco="100">吴承恩</author>
        <price>29.9</price>
        <version/>
    </book>
    <book>
        <name>水浒传</name>
        <author name="施耐庵" age="58" sex="男" sco="100">施耐庵</author>
        <price>39.9</price>
        <version/>
    </book>
</books>
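
The attribute constraints can be observed from code as well. The sketch below, again assuming the JDK's built-in JAXP parser, uses a reduced internal DTD with only the author tag (the age attribute is left out for brevity) to show that a #FIXED value is supplied by the parser when omitted and that a missing #REQUIRED attribute is reported. The class name AttlistDemo is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of the original article.
final class AttlistDemo {

    // Reduced DTD: only the author tag, with three of the attributes above.
    static final String DTD =
          "<!DOCTYPE author [\n"
        + "  <!ELEMENT author (#PCDATA)>\n"
        + "  <!ATTLIST author\n"
        + "            name CDATA #REQUIRED\n"
        + "            sex  CDATA #IMPLIED\n"
        + "            sco  CDATA #FIXED \"100\">\n"
        + "]>\n";

    /** Parses DTD + authorXml with validation on; validity errors are added to errors. */
    private static Element parse(String authorXml, List<String> errors) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            return builder.parse(new InputSource(new StringReader(DTD + authorXml)))
                          .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
    }

    /** Returns the validity errors reported for the given author element. */
    static List<String> validationErrors(String authorXml) {
        List<String> errors = new ArrayList<>();
        parse(authorXml, errors);
        return errors;
    }

    /** Returns the value of one attribute as seen by the parser ("" if absent). */
    static String attribute(String authorXml, String attrName) {
        return parse(authorXml, new ArrayList<>()).getAttribute(attrName);
    }

    public static void main(String[] args) {
        String ok = "<author name=\"Wu Cheng'en\">Wu Cheng'en</author>";
        System.out.println(validationErrors(ok));   // [] : sex and sco may be omitted
        System.out.println(attribute(ok, "sco"));   // 100 : the #FIXED value is filled in
        // name is #REQUIRED, so leaving it out is reported as an error:
        System.out.println(validationErrors("<author>anonymous</author>"));
        // sco is #FIXED, so any other value is reported as an error:
        System.out.println(validationErrors("<author name=\"x\" sco=\"99\">x</author>"));
    }
}
```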

3.4. Schema constraints

1. XML Schema is an XML-based alternative to DTD.

2. An XML Schema is itself written in XML and is extensible; its file suffix is .xsd (XML Schema Definition).

3. XML Schema describes the allowed document content and its constraints more precisely than a DTD (for example with data types), and it supports namespaces.

The book.xsd file

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://www.example.org/bookSchema"
           targetNamespace="http://www.example.org/bookSchema"
           elementFormDefault="qualified">
    <!--
        xmlns:xs="http://www.w3.org/2001/XMLSchema"  elements, attributes, and types written with the xs: prefix belong to the XML Schema namespace.
        xmlns="http://www.example.org/bookSchema"  the default namespace, i.e. the namespace of names written without a prefix.
        targetNamespace="http://www.example.org/bookSchema"  the elements defined by this schema belong to this namespace.
        elementFormDefault="qualified"  locally declared elements must also be namespace-qualified in instance documents.
    -->
    <xs:element name='books'>
        <xs:complexType>
            <!-- minOccurs/maxOccurs: how many times an element may appear -->
            <!-- minOccurs: minimum number of occurrences; 0 makes the element optional -->
            <!-- maxOccurs: maximum number of occurrences; unbounded means no upper limit -->
            <xs:sequence maxOccurs='unbounded'>
                <xs:element name='book'>
                    <xs:complexType>
                        <xs:sequence>
                            <!-- Define the name tag -->
                            <xs:element name='name' type="xs:string"/>
                            <!-- Define the author tag -->
                            <xs:element name="author" >
                                <!-- Define the attributes of the author tag -->
                                <xs:complexType>
                                    <xs:simpleContent>
                                        <xs:extension base="xs:string">
                                            <xs:attribute name="name" type="xs:string"/>
                                        </xs:extension>
                                    </xs:simpleContent>
                                </xs:complexType>
                            </xs:element>
                            <!-- Define the price tag -->
                            <xs:element name='price' type="xs:double"/>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

Referencing the xsd file from an XML document

<?xml version="1.0" encoding="UTF-8" ?>
<books xmlns="http://www.example.org/bookSchema"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.example.org/bookSchema book.xsd">
    <book>
        <name>红楼梦</name>
        <author>曹雪芹</author>
        <price>29.9</price>
    </book>
    <book>
        <name>三国演义</name>
        <author>罗贯中</author>
        <price>39.9</price>
    </book>
</books>
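
As with a DTD, the schema is only enforced when a validator is run. The following sketch uses the JDK's javax.xml.validation API with a trimmed-down version of the schema above (the author tag is omitted to keep it short); the class name SchemaValidationDemo and the sample documents are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original article.
final class SchemaValidationDemo {

    static final String NS = "http://www.example.org/bookSchema";

    // Trimmed-down schema: books contains any number of book(name, price).
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'\n"
        + "           targetNamespace='" + NS + "'\n"
        + "           elementFormDefault='qualified'>\n"
        + "  <xs:element name='books'>\n"
        + "    <xs:complexType>\n"
        + "      <xs:sequence>\n"
        + "        <xs:element name='book' minOccurs='0' maxOccurs='unbounded'>\n"
        + "          <xs:complexType>\n"
        + "            <xs:sequence>\n"
        + "              <xs:element name='name' type='xs:string'/>\n"
        + "              <xs:element name='price' type='xs:double'/>\n"
        + "            </xs:sequence>\n"
        + "          </xs:complexType>\n"
        + "        </xs:element>\n"
        + "      </xs:sequence>\n"
        + "    </xs:complexType>\n"
        + "  </xs:element>\n"
        + "</xs:schema>";

    /** Returns true if the document is valid against XSD. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no error handler set, validate() throws on the first violation.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<books xmlns='" + NS + "'><book><name>Dream of the Red Chamber</name>"
                    + "<price>29.9</price></book></books>";
        String badPrice = "<books xmlns='" + NS + "'><book><name>Dream of the Red Chamber</name>"
                        + "<price>cheap</price></book></books>";
        String noNamespace = "<books><book><name>a</name><price>1</price></book></books>";
        System.out.println(isValid(good));        // true
        System.out.println(isValid(badPrice));    // false: "cheap" is not an xs:double
        System.out.println(isValid(noNamespace)); // false: elements are not in the target namespace
    }
}
```

The second case shows what a schema adds over a DTD: the price is checked as a number, not just as text.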

4. XML parsing and common parsing methods

4.1. DOM method

DOM (Document Object Model) is an official W3C standard that represents XML documents in a platform- and language-independent way. It loads the whole markup document into memory at once and builds a DOM tree there.

Advantages

Easy to work with; all CRUD operations can be performed on the document.

Disadvantages

The entire XML document usually has to be loaded to build the tree structure, which consumes a lot of resources.
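
A short sketch of the DOM approach, assuming the JDK's built-in org.w3c.dom implementation: the whole document is loaded into a tree, which can then be read and changed freely. The class name DomCrudDemo, its helper methods, and the sample data are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article.
final class DomCrudDemo {

    static final String SAMPLE =
          "<books>"
        + "<book><name>Journey to the West</name><price>29.9</price></book>"
        + "<book><name>Water Margin</name><price>39.9</price></book>"
        + "</books>";

    /** Loads the whole document into an in-memory DOM tree. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
    }

    /** Read: collects the text of every <name> element in document order. */
    static List<String> bookNames(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("name");
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getTextContent());
        }
        return names;
    }

    /** Create: appends a new <book> under the root element. */
    static void addBook(Document doc, String name, String price) {
        Element book = doc.createElement("book");
        Element nameEl = doc.createElement("name");
        nameEl.setTextContent(name);
        Element priceEl = doc.createElement("price");
        priceEl.setTextContent(price);
        book.appendChild(nameEl);
        book.appendChild(priceEl);
        doc.getDocumentElement().appendChild(book);
    }

    /** Delete: removes every <book> whose <name> equals the given name. */
    static void removeBook(Document doc, String name) {
        NodeList books = doc.getElementsByTagName("book");
        // Walk backwards: the NodeList is live and shrinks as nodes are removed.
        for (int i = books.getLength() - 1; i >= 0; i--) {
            Element book = (Element) books.item(i);
            Node nameNode = book.getElementsByTagName("name").item(0);
            if (nameNode != null && name.equals(nameNode.getTextContent())) {
                book.getParentNode().removeChild(book);
            }
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(bookNames(doc)); // [Journey to the West, Water Margin]
        addBook(doc, "Dream of the Red Chamber", "29.9");
        removeBook(doc, "Water Margin");
        System.out.println(bookNames(doc)); // [Journey to the West, Dream of the Red Chamber]
    }
}
```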

4.2. SAX method

SAX (Simple API for XML) processes a document as a stream, and its advantages are those of streaming in general: parsing can begin immediately instead of waiting for all the data to be loaded. The document is read sequentially and the parser is event-driven.

What does event-driven mean? It is a callback-based way of running a program: the parser works through the document from the outside in and calls your handler methods as it meets each start tag, piece of text, and end tag.

Advantages

① Parsing can start immediately, without waiting for all the data to be loaded.
② Data is only inspected as it is read; it does not need to be kept in memory.
③ Parsing can stop as soon as a condition is met, without reading the whole document.
④ High efficiency and performance; it can parse documents larger than system memory.

Disadvantages

Read-only: it cannot add, delete, or modify content.

It is difficult to access different parts of the same document at the same time, and XPath is not supported.
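
The callback style described above can be sketched with the SAX parser that ships with the JDK. The handler below keeps only the book names and nothing else from the document; the class names SaxDemo and NameCollector and the sample data are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original article.
final class SaxDemo {

    /** Collects the text of every <name> element as the parse events arrive. */
    static final class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();
        private StringBuilder text; // non-null only while inside a <name> element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("name".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one text node, so append.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("name".equals(qName)) {
                names.add(text.toString());
                text = null;
            }
        }
    }

    /** Streams through the document once and returns the book names. */
    static List<String> bookNames(String xml) {
        try {
            NameCollector handler = new NameCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.names;
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books>"
                   + "<book id=\"1\"><name>Journey to the West</name><price>29.9</price></book>"
                   + "<book id=\"2\"><name>Water Margin</name><price>39.9</price></book>"
                   + "</books>";
        System.out.println(bookNames(xml)); // [Journey to the West, Water Margin]
    }
}
```

No tree is built here: once a callback returns, the parser keeps nothing about that part of the document, which is why memory use stays low and why the document cannot be modified.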

4.3. JDOM method

JDOM (Java-based Document Object Model) aims to be a Java-specific document model that simplifies working with XML and is faster than using DOM.

Advantages

① Uses concrete classes instead of interfaces, which simplifies the DOM API.
② Makes heavy use of the Java collection classes, which is convenient for Java developers.

Disadvantages

① Less flexible.
② Poor performance.

4.4. DOM4J

dom4j is an open-source Java XML API, similar to JDOM, for reading and writing XML files. It performs well, is powerful, and is easy to use.

Advantages

① Makes heavy use of the Java collection classes, which is convenient for Java developers, and provides some alternative methods to improve performance.
② Supports XPath.
③ Good performance.

Disadvantages

Makes heavy use of interfaces, so the API is more complicated.

Next, I will demonstrate using DOM4J to read the contents of an XML file and print them to the console.

First, create a lib directory in the project, put the jar from the link below into it, and manually add it to the module.

The DOM4J jar can be downloaded here:

Link: https://pan.baidu.com/s/1gwt_vNjALoae1ZsfKh-t_Q?pwd=6ly7
Extraction code: 6ly7

Book.xml file

<?xml version="1.0" encoding="UTF-8" ?>
<books>
    <book id="1">
        <name>西游记</name>
        <author>吴承恩</author>
        <price>29.9</price>
        <version/>
    </book>
    <book id="2">
        <name>水浒传</name>
        <author>施耐庵</author>
        <price>39.9</price>
        <version/>
    </book>
</books>

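Test class

import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class DOMTest {
    public static void main(String[] args) throws DocumentException {
        SAXReader reader = new SAXReader();
        Document document = reader.read("Book.xml");
        // Get the root element (<books>)
        Element root = document.getRootElement();
        // Get all child elements of the root (the <book> elements)
        List<Element> books = root.elements();
        for (Element book : books) {
            // Print the value of the book's id attribute
            System.out.println(book.attributeValue("id"));
            // Print the text of the name child element
            System.out.println(book.elementText("name"));
            // Print the text of the author child element
            System.out.println(book.elementText("author"));
            // Print the text of the price child element
            System.out.println(book.elementText("price"));
            System.out.println("----------------");
        }
    }
}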


Origin blog.csdn.net/select_myname/article/details/126262534#comments_26978724