[XML] A Brief Introduction to XML

Table of Contents

One, XML introduction

Two, XML syntax

     1. Document declaration

     2. Tags (elements)

     3. Attributes

     4. Comments

     5. CDATA section

Three, XML constraints

     1. DTD constraints

          1.1 Document format

          1.2 Understanding the element declarations

          1.3 Element content types

          1.4 Quantifiers

          1.5 Attribute declarations

     2. Schema constraints

          2.1 Document format

Four, XML parsing

     1. Three parsing approaches

     2. Parsing tools

     3. Parsing principle and structural model

     4. Using dom4j and XPath

          4.1 Common methods

          4.2 Using XPath


One, XML introduction

XML stands for "Extensible Markup Language". It is a language in its own right, with its own syntax rules, and an XML document is essentially a plain-text file. XML is designed to carry data, not to display it. XML tags are not predefined: you define your own tags.

Two, XML syntax

The main components of XML syntax are the document declaration, tags, attributes, comments, escape characters, and CDATA sections.

     1. Document declaration

Format (the version can be 1.0 or 1.1):

<?xml version="1.0" encoding="UTF-8"?>

Purpose:

The document declaration is read by XML editors and parsers. It identifies the XML version and the character encoding of the document.

Notes:

1. If present, the declaration must be the very first thing in the document (line 0, column 0). Nothing may precede it, not even whitespace or a comment.

2. The document declaration is optional in XML 1.0, although including it is recommended.

     2. Tags (elements)

Tags, also called elements, are the main building blocks of an XML document.

Format:

1) Paired tag: <name>James</name>

2) Self-closing tag (no tag body): <student attribute="attribute value" age="18"/>

Notes:

1) A tag name can only contain letters (including Chinese characters), digits, and four symbols (_, -, :, .)

2) A tag name cannot start with a digit, "-", or ".", and it cannot contain spaces [In practice, prefer names made of plain English letters and avoid Chinese characters, digits, and symbols]

3) XML tags are case-sensitive

4) Tags can be nested

Example:

<students>
    <student>
        <name>张三</name>
    </student>
</students>

     3. Attributes

Format:

Attributes can be defined on any tag. On a paired tag they can only be written in the start tag. On a self-closing tag they must be written before the closing "/".

Notes:

1) An attribute value must be enclosed in a pair of double quotes or single quotes

2) A tag can have multiple attributes, but no two of them may have the same name

3) Multiple attributes on one tag are separated by whitespace

<student id="it001" name="张三" age="18" sex="男"/>

     4. Comments

Format:

<!--Comment content-->

Notes:

1) A comment cannot be written inside a tag (for example, within the tag name)

2) Comments cannot be nested

<!-- student information -->
<students>
    <student>
        <!-- student information -->
        <name>张三</name>
        <age>19</age>
    </student>
</students>

     5. CDATA section

XML reserves a few characters for its own syntax, such as <, > and &. Written literally inside text, they can break the parsing of the document. One solution is to replace them with escape sequences (the predefined entity references):

Character   Escape sequence
<           &lt;
>           &gt;
&           &amp;
'           &apos;
"           &quot;

Another solution is to put the content in a CDATA section:

Everything inside a CDATA section is treated as plain text, so special characters need no escaping.

<![CDATA[
        String str = "fjdsEFeafeEW1432";
        int count = 0;
        for(int i = 0;i < str.length() ; i++){
            char c = str.charAt(i);
            if(c >= '0' && c <= '9'){
                count++;
            }
        }
        System.out.println("count = " + count);
]]>
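
To see that the two solutions are equivalent from the parser's point of view, here is a minimal sketch that uses only the DOM parser bundled with the JDK. The class name EscapeDemo, the helper rootText, and the sample strings are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EscapeDemo {
    /** Parses an XML string and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Same text, once with escape sequences and once inside a CDATA section
        String escaped = "<code>if (a &lt; b &amp;&amp; b &gt; 0) {}</code>";
        String cdata = "<code><![CDATA[if (a < b && b > 0) {}]]></code>";

        System.out.println(rootText(escaped));
        System.out.println(rootText(cdata));
        System.out.println(rootText(escaped).equals(rootText(cdata)));
    }
}
```

Both calls return the same text, if (a < b && b > 0) {}, so the last line prints true. The choice between escaping and CDATA is mostly about readability: CDATA is convenient when the text contains many special characters, such as a code fragment.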

Three, XML constraints

XML constraints: when XML is used in an application, the documents usually have to follow agreed rules, and a constraint document defines those rules. There are two kinds of constraints: DTD constraints and Schema constraints.

DTD constraints: the older technology. The syntax is concise but the features are limited, so it suits small, simple documents.

Schema constraints: the newer technology. The syntax is more complex but also more powerful, so it suits large, complex documents.

Constraints can restrict:

1) Which tags can appear in the document

2) Which tags can be nested inside which

3) The order in which tags appear

4) How many times a tag can appear, and so on

      1. DTD constraints

          1.1 Document format

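<?xml version="1.0" encoding="UTF-8" ?>
<!--
    Reference this DTD from any XML document that uses it:
    <!DOCTYPE 书架 SYSTEM "book.dtd">
    Element names: 书架 = bookshelf, 书 = book, 书名 = title, 作者 = author, 售价 = price
-->
<!ELEMENT 书架 (书+)>
<!ELEMENT 书 (书名,作者,售价)><!-- the children of 书 must be 书名, 作者, 售价, in that order -->
<!ELEMENT 书名 (#PCDATA)>
<!ELEMENT 作者 (#PCDATA)>
<!ELEMENT 售价 (#PCDATA)>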

Reference the DTD from the XML document with the following declaration:

<!DOCTYPE 书架 SYSTEM "book.dtd">

Then write the content of the XML file:

<?xml version="1.0" encoding="utf-8" ?><!-- document declaration -->
<!DOCTYPE 书架 SYSTEM "book.dtd"><!-- reference the DTD constraint -->
<书架>
    <书>
        <书名></书名>
        <作者></作者>
        <售价></售价>
    </书>
</书架>
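
Declaring a DOCTYPE does not by itself make a parser check the document; validation has to be switched on. The following is a minimal sketch using the JAXP parser bundled with the JDK. To stay self-contained it embeds a DTD as an internal subset instead of loading book.dtd, and it uses English element names (bookshelf, book, title, author, price) as stand-ins for the Chinese ones above. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationDemo {
    // English stand-in for book.dtd, embedded as an internal DTD subset
    static final String DOCTYPE =
            "<!DOCTYPE bookshelf [\n"
          + "  <!ELEMENT bookshelf (book+)>\n"
          + "  <!ELEMENT book (title,author,price)>\n"
          + "  <!ELEMENT title (#PCDATA)>\n"
          + "  <!ELEMENT author (#PCDATA)>\n"
          + "  <!ELEMENT price (#PCDATA)>\n"
          + "]>\n";

    /** Parses the body against the DTD and returns the validation errors found. */
    static List<String> validate(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (Exception e) {
            errors.add(e.getMessage()); // not well-formed, or an I/O problem
        }
        return errors;
    }

    public static void main(String[] args) {
        String valid = "<bookshelf><book><title>T</title><author>A</author>"
                     + "<price>1</price></book></bookshelf>";
        String invalid = "<bookshelf><book><title>T</title></book></bookshelf>";
        System.out.println(validate(valid).isEmpty());   // no violations expected
        System.out.println(validate(invalid).isEmpty()); // author and price are missing
    }
}
```

Validation errors are reported through the ErrorHandler rather than thrown, which is why the sketch collects them in a list.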

          1.2 Understanding the element declarations

<?xml version="1.0" encoding="UTF-8" ?>
<!--
    Reference this DTD from any XML document that uses it:
    <!DOCTYPE 书架 SYSTEM "book.dtd">
-->
<!ELEMENT 书架 (书+)><!-- the root element is 书架; its children are 书 elements; "+" is a quantifier meaning one or more -->
<!ELEMENT 书 (书名,作者,售价)><!-- the children of 书 must be 书名, 作者, 售价, in that order -->
<!ELEMENT 书名 (#PCDATA)>
<!ELEMENT 作者 (#PCDATA)>
<!ELEMENT 售价 (#PCDATA)>

          1.3 Element content types

Element content types
Type      Syntax       Description
PCDATA    (#PCDATA)    Parsed character data (text)
EMPTY     EMPTY        Empty element
ANY       ANY          Any content
<!ELEMENT 售价 (#PCDATA)> <!-- the body of 售价 (price) is text -->
<!ELEMENT 出版日期 ANY> <!-- the body of 出版日期 (publication date) can be any content -->
<!ELEMENT 版本号 EMPTY> <!-- 版本号 (version number) is an empty element -->

          1.4 Quantifiers

Quantifiers
Symbol   Meaning
*        The element can appear zero or more times
+        The element can appear one or more times
?        The element can appear zero or one time
,        The elements must appear in the listed order
|        Exactly one of the listed elements must appear

          1.5 Attribute declarations

<!ATTLIST element-name
       attribute-name  attribute-type    default-declaration>

Attribute types
Type                   Meaning
CDATA                  The attribute value is a text string
ID                     The attribute value must be unique in the document and cannot start with a digit
Enumerated (a|b|c)     The attribute value must be one of the listed options

Default declarations
Declaration    Meaning
#REQUIRED      The attribute must be present
#IMPLIED       The attribute is optional
#FIXED         The attribute has a fixed value
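<!--
    Attribute list of the 书 element (comments cannot appear inside the declaration itself):
    id     : type ID, required
    编号   : text string, optional (编号 = serial number)
    出版社 : enumerated, default value "传智播客" (出版社 = publisher)
    type   : text string, fixed value "IT"
-->
<!ATTLIST 书
        id ID #REQUIRED
        编号 CDATA #IMPLIED
        出版社 (清华|北大|传智播客) "传智播客"
        type CDATA #FIXED "IT"
>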

     2. Schema constraints

Only the basic syntax is shown here; the details are left for later.

          2.1 Document format

<?xml version="1.0" encoding="UTF-8" ?>
<!--
    Copy the root start tag below into the XML document you are writing,
    right after its document declaration:
    <书架 xmlns="http://www.itcast.cn"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.itcast.cn bookshelf.xsd"
    >
 -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.itcast.cn"
           elementFormDefault="qualified">
    <xs:element name='书架' >
        <xs:complexType>
            <xs:sequence maxOccurs='unbounded' >
                <xs:element name='书' >
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name='书名' type='xs:string' />
                            <xs:element name='作者' type='xs:string' />
                            <xs:element name='售价' type='xs:double' />
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

Schema constraints can also constrain data types: in the schema above, 售价 is declared as xs:double, so its content must be a number.
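
The effect of a typed element can be checked with the javax.xml.validation API that ships with the JDK. This is only a sketch under simplifying assumptions: the schema is a reduced, namespace-free variant of the one above with English element names (book, title, price), embedded as a string, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaValidationDemo {
    // Reduced stand-in schema: a book has a string title and a numeric price
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='book'><xs:complexType><xs:sequence>"
          + "<xs:element name='title' type='xs:string'/>"
          + "<xs:element name='price' type='xs:double'/>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "</xs:schema>";

    /** Returns true if the XML string conforms to the schema above. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            try {
                validator.validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException violation) {
                return false; // the document violates the schema
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("could not load schema or read input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<book><title>XML</title><price>19.9</price></book>"));
        System.out.println(isValid("<book><title>XML</title><price>cheap</price></book>"));
    }
}
```

The second document has the same structure as the first, so a DTD with (#PCDATA) content would accept it. The schema rejects it because "cheap" is not a valid xs:double.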

Four, XML parsing

      1. Three parsing approaches

1. DOM parsing: reads the whole document into memory at once and builds a DOM tree model there.

[Advantages: elements can be added, modified, or deleted. Disadvantages: slower and memory-hungry, so it is best suited to small files]

2. SAX parsing: reads the document sequentially and parses each piece as it is read (read-only)

[Advantages: fast, with a very small memory footprint. Disadvantages: no document structure is kept in memory, so the content of the document cannot be added to or deleted]

3. PULL parsing: the parsing approach used on Android
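
To make the difference between DOM and SAX more concrete, here is a minimal SAX sketch using the parser bundled with the JDK. Instead of returning a tree, the parser calls back into a handler as it moves through the document, and the handler keeps only what it needs. The class name, the helper textsOf, and the sample document are made up for illustration, and the sketch assumes the target elements are not nested inside each other.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxDemo {
    /** Streams through the XML and collects the text of every element named tagName. */
    static List<String> textsOf(String xml, String tagName) {
        List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside a matching element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(tagName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) {
                    current.append(ch, start, length); // text may arrive in several chunks
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(tagName) && current != null) {
                    result.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<books><book><name>Java</name></book>"
                   + "<book><name>XML</name></book></books>";
        System.out.println(textsOf(xml, "name"));
    }
}
```

Nothing but the collected strings stays in memory after parsing, which is why SAX scales to large files but cannot modify the document.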

      2. Parsing tools

1) JAXP: the parsing API that ships with the JDK. It is fairly low-level, so it is not very convenient to use

2) jsoup: parses HTML (often used for web crawlers)

3) JDOM: a sibling of dom4j with fewer features, rarely used

4) dom4j: one of the most widely used XML parsing tools, internally combining the DOM and SAX approaches

     3. Parsing principle and structural model

Parsing principle: the XML DOM works like the HTML DOM. The XML file is loaded into memory, a DOM tree is built, and a Document object is returned. The whole tree can then be manipulated through that Document object.

Structure model:
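
One way to look at the structure model is to print the tree that a DOM parser builds. The sketch below uses the JDK's own DOM API (org.w3c.dom) rather than dom4j, and the class name, method names, and sample document are hypothetical. It walks the tree recursively and indents each node by its depth.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomTreeDemo {
    /** Parses the XML into a DOM tree and returns an indented outline of it. */
    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            append(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Appends one node, then recurses into its children one level deeper. */
    private static void append(Node node, int depth, StringBuilder out) {
        StringBuilder pad = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            pad.append("  ");
        }
        String indent = pad.toString();
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(indent).append("Element: ").append(node.getNodeName()).append('\n');
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                append(children.item(i), depth + 1, out);
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) { // skip whitespace-only text between tags
                out.append(indent).append("Text: ").append(text).append('\n');
            }
        }
    }

    public static void main(String[] args) {
        System.out.print(outline("<books><book><name>Java</name></book></books>"));
    }
}
```

For the sample document the outline has four levels: the root element books, its child book, the element name, and finally the text node "Java". Text is a node of its own in the tree, not a property of the element.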

                        

     4. Using dom4j and XPath

Steps:

    1) dom4j is a third-party library, so download its jar first

    Link: https://pan.baidu.com/s/1SPxnd7yP_SsgygNKee9Qsw 
    Extraction code: kcv6

    2) Copy the dom4j jar into the module directory, then right-click it and choose "Add as Library"

          4.1 Common methods

SAXReader
Method                                    Purpose
new SAXReader()                           Create a SAXReader object (constructor)
Document read(String url)                 Load and parse the XML document

Document object
Method                                    Purpose
Element getRootElement()                  Get the root element

Element object
Method                                    Purpose
List<Element> elements([String name])     Get the child elements of the current element (optionally only those with the given name)
Element element(String name)              Get the first child element with the given name
String getName()                          Get the name of the current element
String attributeValue(String attrName)    Get the value of the attribute with the given name
String elementText(String name)           Get the text of the child element with the given name
String getText()                          Get the text content of the current element
import java.io.InputStream;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class Test {
    public static void main(String[] args) throws DocumentException {
        // 1. Create a SAXReader object
        SAXReader saxReader = new SAXReader();
        // 2. Use the class loader to open books.xml as an InputStream
        InputStream in = Test.class.getClassLoader().getResourceAsStream("books.xml");
        // 3. Read the XML file and get the Document object
        Document document = saxReader.read(in);
        // 4. Get the root element of the XML document
        Element rootElement = document.getRootElement();

        // 5. Get the child elements of the root element
        List<Element> listElements = rootElement.elements();
        // 6. Iterate over the child elements
        for (Element listElement : listElements) {
            // Option 1: get the value of the child's "id" attribute
            /*Attribute id = listElement.attribute("id");
            String value = id.getValue();
            System.out.println(value);
            */
            // Option 2: get the children of each child element and print their text
            // List<Element> elements = listElement.elements();
            // for (Element childElement : elements) {
            //     System.out.println(childElement.getText());
            //     // or: System.out.println(childElement.getStringValue());
            // }
            // Option 3: get a child element by name, then read its text
            Element name = listElement.element("name");
            String text = name.getText();
            System.out.println(text);
            // Option 4: read the text of the named child in a single call
            // System.out.println(listElement.elementText("name"));
        }
    }
}

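           4.2 Using XPath

On top of dom4j, also add the jaxen-1.1-beta-6.jar package to the project.

Link: https://pan.baidu.com/s/1Q5QgiVsdz-v1U-ThpC-gag 
Extraction code: 0lix

You can write the path to the text you want directly and then fetch it with selectSingleNode: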

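import java.io.InputStream;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;

public class Test {
    public static void main(String[] args) throws DocumentException {
        // 1. Create a SAXReader object
        SAXReader saxReader = new SAXReader();
        // 2. Use the class loader to open books.xml as an InputStream
        InputStream in = Test.class.getClassLoader().getResourceAsStream("books.xml");
        // 3. Read the XML file and get the Document object
        Document document = saxReader.read(in);
        // 4. Get the root element of the XML document
        Element rootElement = document.getRootElement();
        // 5. Select the first node that matches the XPath expression
        Node node = rootElement.selectSingleNode("/books/book/name");
        // 6. Print the text of that node
        System.out.println(node.getText());
    }
}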
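
The path syntax itself is not specific to dom4j. As a way to experiment with expressions without any extra jars, here is a sketch that evaluates XPath with the javax.xml.xpath API bundled with the JDK. It is an analogue of the dom4j example, not the dom4j API, and the class name, helper name, and sample document are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDemo {
    /** Evaluates an XPath expression and returns the text of every matching node. */
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books>"
                   + "<book id='1'><name>Java</name></book>"
                   + "<book id='2'><name>XML</name></book>"
                   + "</books>";
        System.out.println(select(xml, "/books/book/name"));           // absolute path: all names
        System.out.println(select(xml, "/books/book[@id='2']/name"));  // filter on an attribute
        System.out.println(select(xml, "//book/@id"));                 // all id attributes
    }
}
```

With the sample document, the three expressions select both names, only the name of the book whose id is 2, and the two id values respectively.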

 


Origin blog.csdn.net/weixin_43267344/article/details/108268887