XML notes (syntax, format, parsing XML in Java)

Introduction to XML

What is XML?

XML stands for Extensible Markup Language.

What is XML used for?

The main uses of XML are:

  1. Storing data in a self-describing form

  2. Serving as a configuration file for projects or modules

  3. Serving as a data format for network transmission (JSON is now the more common choice)

XML syntax

  1. Document declaration

  2. Elements (tags)

  3. XML attributes

  4. XML comments

  5. Text section (CDATA section)

xml format

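<?xml version="1.0" encoding="UTF-8"?>   This line is the XML declaration.
<!-- In the XML declaration, version is the version number and encoding is the character encoding -->
The characters <?xml must be written together with no space in between, otherwise the parser reports an error.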

Attributes

version is the XML version number
encoding is the character encoding of the XML file
standalone="yes/no" indicates whether the XML file is standalone, that is, whether it can be read without external declarations such as an external DTD
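
The declaration attributes above can be read back after parsing. The following is a minimal sketch, assuming the JDK's built-in DOM parser (javax.xml.parsers) rather than any third-party library; the class name XmlDeclarationDemo and the sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of the original notes.
class XmlDeclarationDemo {

    // Parses XML text with the JDK's built-in DOM parser.
    static Document parse(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><books/>");
        // Each getter reports one attribute of the XML declaration.
        System.out.println("version    = " + doc.getXmlVersion());
        System.out.println("encoding   = " + doc.getXmlEncoding());
        System.out.println("standalone = " + doc.getXmlStandalone());
    }
}
```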

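Code example

<?xml version="1.0" encoding="UTF-8"?>
<!-- In the XML declaration, version is the version number and encoding is the character encoding -->
<books> <!-- This is an XML comment -->
  <book id="SN123123413241"> <!-- The book tag describes one book; the id attribute is the book's serial number -->
    <name>java 编程思想</name> <!-- The name tag holds the book's title -->
    <author>华仔</author> <!-- The author tag holds the book's author -->
    <price>9.9</price> <!-- The price tag holds the book's price -->
  </book>
  <book id="SN12341235123"> <!-- The book tag describes one book; the id attribute is the book's serial number -->
    <name>葵花宝典</name> <!-- The name tag holds the book's title -->
    <author>班长</author> <!-- The author tag holds the book's author -->
    <price>5.5</price> <!-- The price tag holds the book's price -->
  </book>
</books>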

XML comments

XML comments use the same syntax as HTML comments: <!-- comment text -->

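Elements (tags)

Let's first recall HTML tags:

Format: <tagname>content</tagname>
Single tags: <tagname />, for example <br /> for a line break and <hr /> for a horizontal rule
Paired tags: <tagname>content</tagname>
Tag names are not case sensitive
Tags have attributes: basic attributes and event attributes
Tags should be closed (HTML does not report an error for an unclosed tag, but closing every tag is a good habit)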

What are XML elements?

An XML element is everything from (and including) the start tag to (and including) the end tag.

Elements can contain other elements, text, or a mixture of both. Elements can also have attributes.

<bookstore>
<book category="CHILDREN">
  <title>Harry Potter</title> 
  <author>J K. Rowling</author> 
  <year>2005</year> 
  <price>29.99</price> 
</book>
<book category="WEB">
  <title>Learning XML</title> 
  <author>Erik T. Ray</author> 
  <year>2003</year> 
  <price>39.95</price> 
</book>
</bookstore> 

In the example above, <bookstore> and <book> have element content, because they contain other elements. <author> has text content only, because it contains only text.

In the example above, only the <book> elements have an attribute (for example category="CHILDREN").
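
To make the distinction between element content, text content, and attributes concrete, here is a small sketch that walks a shortened copy of the bookstore example. It assumes the JDK's built-in DOM parser; the class name BookstoreDemo and the method describeBooks are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original notes.
class BookstoreDemo {

    // Shortened copy of the bookstore example above.
    static final String XML =
            "<bookstore>"
          + "<book category=\"CHILDREN\"><title>Harry Potter</title></book>"
          + "<book category=\"WEB\"><title>Learning XML</title></book>"
          + "</bookstore>";

    // Returns one "category: title" line per <book> element.
    static List<String> describeBooks(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList books = doc.getElementsByTagName("book");
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                // Attribute of the <book> element
                String category = book.getAttribute("category");
                // Text content of the nested <title> element
                String title = book.getElementsByTagName("title").item(0).getTextContent();
                lines.add(category + ": " + title);
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        describeBooks(XML).forEach(System.out::println);
    }
}
```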

XML naming rules

XML elements must follow the following naming rules:

  • The name can contain letters, numbers and other characters
  • The name cannot start with a number or a punctuation character
  • The name should not start with the characters "xml" (in any case, such as XML or Xml); such names are reserved
  • The name cannot contain spaces

Apart from these rules, any name can be used; there are no reserved words.

Elements (tags) in XML are also divided into single tags and paired tags:

Single tag
format: <tagname attribute="value" attribute="value" ... />
Paired tag
format: <tagname attribute="value" attribute="value" ...>text data or child tags</tagname>

xml attribute

Tag attributes in XML are very similar to those in HTML. Attributes provide additional information about an element.
Attributes are written in the start tag.
A tag can have multiple attributes. The value of each attribute must be enclosed in quotation marks.
Attribute names follow the same rules as tag names.

Grammar rules:

  1. All XML elements must have closing tags (that is, closed)

  2. XML tags are case sensitive

  3. XML must be nested correctly

  4. An XML document must have a root element.
    The root element is the top-level element, that is, the element
    without a parent tag. There is exactly one root element.

  5. XML attribute values must be quoted

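  6. Special characters in XML
    Characters that have a special meaning in markup are written as entity references:
    &lt; for <, &gt; for >, &amp; for &, &apos; for ', and &quot; for "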

  7. Text section (CDATA section)
    The CDATA syntax tells the XML parser that the content inside the CDATA section is plain text and must not be parsed as XML.
    CDATA format:

<![CDATA[ characters entered here are kept exactly as written and are not parsed as XML ]]>
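
The grammar rules above can be checked by handing small snippets to a parser. The sketch below assumes the JDK's built-in DOM parser; the class GrammarRulesDemo and its methods isWellFormed and rootText are hypothetical helpers written for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original notes.
class GrammarRulesDemo {

    // Parses the text; a SAXException means the text broke a grammar rule.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // True if the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            parse(xml);
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Text content of the root element, after entities and CDATA are resolved.
    static String rootText(String xml) {
        try {
            return parse(xml).getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b></a>"));       // rule 1: <b> is never closed
        System.out.println(isWellFormed("<Name>x</name>"));   // rule 2: case mismatch
        System.out.println(isWellFormed("<a><b></a></b>"));   // rule 3: wrong nesting
        System.out.println(isWellFormed("<a/><b/>"));         // rule 4: two root elements
        System.out.println(isWellFormed("<a id=1/>"));        // rule 5: unquoted attribute value
        System.out.println(rootText("<a>1 &lt; 2</a>"));      // rule 6: entity reference
        System.out.println(rootText("<a><![CDATA[1 < 2 && <b>]]></a>")); // rule 7: CDATA
    }
}
```

The first five calls are expected to report false, since each snippet breaks exactly one rule; the last two show that an entity reference and a CDATA section both end up as ordinary text.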

Introduction to xml parsing technology

XML stands for Extensible Markup Language.
HTML files and XML files are both markup documents, so both can be parsed with the DOM technology defined by the W3C.
The document object represents the entire document (either an HTML document or an XML document).
Early JDK versions provided two XML parsing technologies, DOM and SAX (both are dated, but they are still worth knowing).

DOM parsing is specified by the W3C, and each programming language implements it using its own language features. Java also implements DOM parsing for markup documents.
Sun later added a second parsing technology to the JDK: SAX (Simple API for XML).
SAX parsing differs from the parsing defined by the W3C. It uses an event-like mechanism: as it reads the XML file sequentially, it tells the user what is currently being parsed through callbacks. It does not create a large number of DOM objects. (DOM parsing, by contrast, builds objects for the whole document before returning them to the user.)
As a result, SAX uses less memory and performs better than DOM parsing.
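
The callback mechanism described above can be sketched with the SAX parser that ships with the JDK. The class SaxEventsDemo and the method events are hypothetical; the handler simply records each callback in the order the parser fires it, without building a document tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original notes.
class SaxEventsDemo {

    // Returns the parsing events in the order the SAX parser reports them.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            // Records buffered text once the next tag is reached.
            private void flushText() {
                String t = text.toString().trim();
                if (!t.isEmpty()) {
                    log.add("text " + t);
                }
                text.setLength(0);
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                flushText();
                log.add("start " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one run of text, so buffer it.
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flushText();
                log.add("end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return log;
    }

    public static void main(String[] args) {
        events("<book><price>9.9</price></book>").forEach(System.out::println);
    }
}
```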

Third-party parsers:

  1. JDOM is a Java-specific library that builds on the ideas of DOM.

  2. dom4j in turn grew out of JDOM. (dom4j is a third-party parsing library; to parse XML files with it, we need the class library provided by that third party.)

  3. Pull is mainly used in Android development. It is very similar to SAX: it also parses XML files through an event mechanism.

dom4j parsing (key topic *****)

dom4j is not part of the JDK from Sun but a third-party library, so to use it we need to download the dom4j jar package from the official dom4j website.

Using the dom4j library

Directory structure
docs is the documentation directory. It contains:

  1. The dom4j API documentation
  2. The dom4j quick start guide

lib directory
(The lib directory holds the other third-party libraries that dom4j depends on)

The src directory is the source code directory of the library

dom4j programming steps:

Step 1: Load the XML file and create the Document object.
Step 2: Get the root element object from the Document object.
Step 3: Call elements(tag name) on the root element. It returns a collection containing all the child element objects with the tag name you specify.
Step 4: Find the child element you want to modify or delete, and perform the corresponding operation.
Step 5: Save the result to the hard disk.

Case

<?xml version="1.0" encoding="UTF-8"?>
<books>
<book sn="SN12341232">
<name>辟邪剑谱</name>
<price>9.9</price>
<author>班主任</author>
</book>
<book sn="SN12341231">
<name>葵花宝典</name>
<price>99.99</price>
<author>班长</author>
</book>
</books>
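
The test below reads the file above (saved as src/books.xml) with dom4j:

import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;
import org.junit.Test; // JUnit 4 is assumed here

// The class name is a placeholder; the original notes show only the method.
public class Dom4jTest {

    /*
     * Read the contents of the XML file.
     */
    @Test
    public void readXML() throws DocumentException {
        // Step 1: create a SAXReader, read the XML file, and get the Document object
        SAXReader reader = new SAXReader();
        Document document = reader.read("src/books.xml");
        // Step 2: get the root element object from the Document object
        Element root = document.getRootElement();
        // Debug output: Element.asXML() converts the current element to a String
        // System.out.println(root.asXML());
        // Step 3: get all book elements from the root element
        // Element.elements(tag name) returns the child elements with that name
        List<Element> books = root.elements("book");
        // Step 4: iterate over the book elements and read each child element
        for (Element book : books) {
            // Debug output
            // System.out.println(book.asXML());
            // Get the name, price and author elements under book
            Element nameElement = book.element("name");
            Element priceElement = book.element("price");
            Element authorElement = book.element("author");
            // getText() returns the text between the start tag and the end tag
            System.out.println("Title: " + nameElement.getText() + ", price: "
                    + priceElement.getText() + ", author: " + authorElement.getText());
        }
    }
}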
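
The case above covers steps 1 to 3 and reading values. Steps 4 and 5 (modify, then save) can be sketched as follows. To keep the example runnable without extra jars, it uses the JDK's built-in DOM and Transformer APIs as a stand-in for dom4j, so the calls differ from the dom4j ones; the class ModifyAndSaveDemo and the method updatePrice are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; JDK DOM is used here instead of dom4j.
class ModifyAndSaveDemo {

    // Sets a new price on the book with the given sn and returns the updated XML text.
    static String updatePrice(String xml, String sn, String newPrice) {
        try {
            // Step 1: load the XML and create the Document object
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Step 2: get the root element
            Element root = doc.getDocumentElement();
            // Step 3: get all book elements
            NodeList books = root.getElementsByTagName("book");
            // Step 4: find the element to change and modify it
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                if (sn.equals(book.getAttribute("sn"))) {
                    book.getElementsByTagName("price").item(0).setTextContent(newPrice);
                }
            }
            // Step 5: serialize the document (a String here; see the note below for a file)
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not update XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books>"
                + "<book sn=\"SN12341232\"><price>9.9</price></book>"
                + "<book sn=\"SN12341231\"><price>99.99</price></book>"
                + "</books>";
        System.out.println(updatePrice(xml, "SN12341231", "5.5"));
    }
}
```

To save to the hard disk instead of a String, the StreamResult can be constructed from a java.io.File.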

Origin blog.csdn.net/weixin_46168350/article/details/111876833