1. XML concept
XML: Extensible Markup Language
Extensible: tags are user-defined. You can name a tag anything you like, as long as the name follows the tag naming rules.
2. XML function: storing data
- As a configuration file
- As a format for saving data and transmitting it over a network
(XML is plain text, so it is independent of programming language and platform.)
3. Differences between XML and HTML
Background: the W3C (World Wide Web Consortium), founded in 1994, has published many influential web technology standards and implementation guidelines, including XML and HTML.
Differences between XML and HTML:
- XML tags are user-defined; HTML tags are predefined
- XML syntax is strict; HTML syntax is lenient
- XML is for storing data; HTML is for displaying data
4. XML syntax
4.1. Simple XML code
Create a file on the desktop with the ".xml" extension. While you are getting started, a plain text editor such as Notepad is enough for writing XML code:
<?xml version='1.0'?>
<users>
    <user id="1">
        <name>zhangsan</name>
        <age>23</age>
        <gender>male</gender>
    </user>
    <user id="2">
        <name>lisi</name>
        <age>22</age>
        <gender>female</gender>
    </user>
</users>
How can you check that the XML code is written correctly? XML documents can be parsed by all major browsers, because each browser has a built-in XML parsing engine. Open the XML file in a browser: if no error is reported, the XML code is well-formed.
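4.2. The basic syntax
- An XML document uses the ".xml" file extension
- The document declaration goes on the very first line of the document, with no blank lines or spaces in front of it (XML 1.0 parsers accept a document without one, but including it is recommended)
- An XML document has exactly one root tag
- Attribute values must be enclosed in quotation marks; single or double quotation marks are both allowed
- Tags must be closed correctly: a tag is either self-closing, or has a start tag and an end tag that match each other
- XML tag names are case-sensitive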
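The rules above can also be checked in code instead of in a browser. The following is a minimal sketch, assuming the JDK's built-in JAXP parser; the class name WellFormedCheck and the sample strings are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of any library.
class WellFormedCheck {
    // Returns true when the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps them off stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Declaration first, one root, quoted attribute, closed tags: accepted.
        System.out.println(isWellFormed("<?xml version='1.0'?><users><user id=\"1\"/></users>"));
        // Each of the following breaks one rule from the list and is rejected.
        System.out.println(isWellFormed(" <?xml version='1.0'?><users/>")); // space before declaration
        System.out.println(isWellFormed("<users/><users/>"));               // two root tags
        System.out.println(isWellFormed("<user id=1/>"));                   // unquoted attribute value
        System.out.println(isWellFormed("<user><name></user>"));            // tag not closed
        System.out.println(isWellFormed("<User></user>"));                  // case mismatch
    }
}
```

Only the first call is expected to print true; the file extension rule is not something a parser checks.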
4.3. Components
4.3.1. Document declaration
4.3.1.1. Format
<?xml attribute-list ?>: note that there must be no space between the question mark and xml
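4.3.1.2. Attribute list
- version: the version number; write 1.0. Omitting version from the declaration is an error
- encoding: the character encoding; tells the parsing engine which character set the document uses. When it is omitted, XML parsers assume UTF-8 (or UTF-16 if a byte order mark is present), not ISO-8859-1
- standalone: whether the document is standalone. It takes two values, yes (does not depend on other files) and no (depends on other files); in practice it is rarely set explicitly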
4.3.2. Processing instructions
Used together with CSS to style the document (the xml-stylesheet instruction)
4.3.3. Tags
Tag names are user-defined, subject to the following rules:
- A name can contain letters, digits, and other characters
- A name cannot start with a digit or a punctuation mark
- A name cannot start with the letters xml (or XML, Xml, etc.)
- A name cannot contain spaces
4.3.4. Attributes
The value of an id attribute must be unique
4.3.5. Text
Special characters such as the less-than sign (<), the greater-than sign (>), and the ampersand (&) must be escaped. To make this easier, XML provides the CDATA section, whose content is taken literally. The format is:
<![CDATA[
data to display
]]>
An example:
<code>
<!-- Code to write: if(a < b && a > c){} -->
<!-- Escaped -->
if(a &lt; b &amp;&amp; a &gt; c){}
<!-- CDATA section -->
<![CDATA[
if(a < b && a > c){}
]]>
</code>
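To see that the escaped form and the CDATA form carry the same text, both can be parsed and compared. This is a small sketch using the JDK's DOM parser; the class name CdataSketch and the helper rootText are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class CdataSketch {
    // Parses a small XML string and returns the text inside its root element.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String escaped = "<code>if(a &lt; b &amp;&amp; a &gt; c){}</code>";
        String cdata = "<code><![CDATA[if(a < b && a > c){}]]></code>";
        System.out.println(rootText(escaped));                         // if(a < b && a > c){}
        System.out.println(rootText(escaped).equals(rootText(cdata))); // true
    }
}
```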
4.3.6. Comments
<!-- comment text -->
5. XML constraints
5.1. Basic concepts of constraint documents
Who writes the XML? The users of the software.
Who parses the XML? The software.
As users, we need to be able to:
- Reference a constraint document from an XML file
- Read a constraint document at a basic level (many development environments provide auto-completion hints based on the constraint document, so being able to read it is enough)
5.2. Constraint document technologies
Constraint documents fall into two categories:
- DTD: a simple constraint technology
- Schema: a more complex constraint technology
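5.2.1. DTD constraints
A simple DTD constraint document:
5.2.1.1. Introducing a DTD
Internal DTD (rarely used)
The constraint rules are defined inside the XML document itself.
External DTD
The constraint rules are defined in an external DTD file. An external DTD can be referenced in two ways:
- Local file: <!DOCTYPE root-tag-name SYSTEM "location of the DTD file">
- Network: <!DOCTYPE root-tag-name PUBLIC "name of the DTD" "URL of the DTD file">
5.2.1.2. Disadvantages of DTD
The constraints are not strict enough.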
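As an illustration of both the internal DTD form and its limits, the sketch below validates small documents against a DTD embedded in the document itself. It assumes the JDK's validating DOM parser; the DTD rules, the class name DtdValidationSketch, and the sample data are made up for illustration and are not the constraint document these notes refer to.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, not part of any library.
class DtdValidationSketch {
    // Illustrative internal DTD: users holds user tags; each user has name, age, gender.
    static final String INTERNAL_DTD =
            "<!DOCTYPE users [\n"
            + "  <!ELEMENT users (user*)>\n"
            + "  <!ELEMENT user (name, age, gender)>\n"
            + "  <!ATTLIST user id CDATA #REQUIRED>\n"
            + "  <!ELEMENT name (#PCDATA)>\n"
            + "  <!ELEMENT age (#PCDATA)>\n"
            + "  <!ELEMENT gender (#PCDATA)>\n"
            + "]>\n";

    // Returns true when the body satisfies the DTD above.
    static boolean isValid(String body) {
        final boolean[] valid = {true};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { valid[0] = false; } // constraint violated
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(INTERNAL_DTD + body)));
        } catch (SAXException e) {
            return false; // not even well-formed
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return valid[0];
    }

    public static void main(String[] args) {
        String ok = "<users><user id=\"1\"><name>zhangsan</name><age>23</age><gender>male</gender></user></users>";
        String missingGender = "<users><user id=\"1\"><name>zhangsan</name><age>23</age></user></users>";
        String oddAge = "<users><user id=\"1\"><name>zhangsan</name><age>abc</age><gender>male</gender></user></users>";
        System.out.println(isValid(ok));            // true
        System.out.println(isValid(missingGender)); // false: structure is checked
        System.out.println(isValid(oddAge));        // true: the DTD cannot require a number
    }
}
```

The last case shows the weakness named above: a DTD can describe structure, but not that age must be numeric.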
5.2.2. Schema constraints
A simple Schema constraint document:
Each custom type is defined in more detail.
5.2.2.1. Introducing a Schema constraint
- Write the root element of the XML document
- Declare the xsi prefix: xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
- Reference the xsd file and its namespace with xsi:schemaLocation
- Declare a prefix for each xsd constraint to identify it
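The sketch below shows the kind of detailed type definition a Schema allows, here an age restricted to a number range. It assumes the JDK's javax.xml.validation API and, for brevity, loads a made-up schema from a string instead of referencing an xsd file through xsi:schemaLocation; the schema and the class name SchemaValidationSketch are illustrative only.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of any library.
class SchemaValidationSketch {
    // Illustrative schema: a user has a name and an age between 0 and 150.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='user'><xs:complexType><xs:sequence>"
            + "<xs:element name='name' type='xs:string'/>"
            + "<xs:element name='age'><xs:simpleType>"
            + "<xs:restriction base='xs:integer'>"
            + "<xs:minInclusive value='0'/><xs:maxInclusive value='150'/>"
            + "</xs:restriction></xs:simpleType></xs:element>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";

    // Returns true when the document satisfies the schema above.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // with no custom error handler, a validation error is thrown
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<user><name>zhangsan</name><age>23</age></user>"));  // true
        System.out.println(isValid("<user><name>zhangsan</name><age>abc</age></user>")); // false: not an integer
        System.out.println(isValid("<user><name>zhangsan</name><age>200</age></user>")); // false: out of range
    }
}
```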
6. XML parsing
Parsing: operating on an XML document by reading its data into memory
Operations on XML documents:
- Parsing (reading): read the data in the document into memory
- Writing: save in-memory data to an XML document for persistent storage
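The writing direction can be sketched with the JDK alone: build a small tree in memory and serialize it. The class name WriteXmlSketch and the sample data are made up; writing to a file rather than a string would pass a File to StreamResult instead of a StringWriter.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not part of any library.
class WriteXmlSketch {
    // Builds a one-user document in memory and returns it as XML text.
    static String buildUsersXml() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element users = doc.createElement("users");
            doc.appendChild(users);
            Element user = doc.createElement("user");
            user.setAttribute("id", "1");
            users.appendChild(user);
            Element name = doc.createElement("name");
            name.setTextContent("zhangsan");
            user.appendChild(name);

            // The identity transformer copies the DOM tree to the output unchanged.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUsersXml());
    }
}
```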
6.1. XML parsing approaches
6.1.1. DOM
DOM: loads the whole markup document into memory at once and builds a DOM tree in memory
DOM advantages:
- Easy to work with; supports all CRUD operations on the document
DOM disadvantages:
- Because everything is loaded at once, the tree built for a very large file takes up a lot of memory
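A minimal sketch of the DOM approach, assuming the JDK's built-in DOM parser: the whole document becomes a tree that can be both read and changed. The class name DomSketch, its helper methods, and the sample data are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class DomSketch {
    static final String XML =
            "<users>"
            + "<user id=\"1\"><name>zhangsan</name><age>23</age></user>"
            + "<user id=\"2\"><name>lisi</name><age>22</age></user>"
            + "</users>";

    // Loads the whole document into an in-memory tree.
    static Document load(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Read: collects the text of every name tag in document order.
    static List<String> names(Document doc) {
        NodeList nodes = doc.getElementsByTagName("name");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    // Delete: removes the first user tag from the tree.
    static void removeFirstUser(Document doc) {
        Node first = doc.getElementsByTagName("user").item(0);
        first.getParentNode().removeChild(first);
    }

    public static void main(String[] args) {
        Document doc = load(XML);
        System.out.println(names(doc)); // [zhangsan, lisi]
        removeFirstUser(doc);
        System.out.println(names(doc)); // [lisi]
    }
}
```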
6.1.2. SAX
SAX: reads the document sequentially (line by line) and is event-driven
SAX advantages: only the part currently being read is held in memory, so it uses very little memory; suitable for devices with limited memory
SAX disadvantages: read-only; it cannot add, delete, or modify content
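A minimal sketch of the event-driven style, assuming the JDK's built-in SAX parser: the parser calls back as it meets each start tag, text run, and end tag, and the handler keeps only what it needs. The class name SaxSketch and the sample data are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of any library.
class SaxSketch {
    // Collects the text of every name tag without building a tree.
    static List<String> names(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inName;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("name".equals(qName)) {
                    inName = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text may arrive in several chunks, so append rather than assign.
                if (inName) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("name".equals(qName)) {
                    names.add(text.toString());
                    inName = false;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<users><user id=\"1\"><name>zhangsan</name></user>"
                + "<user id=\"2\"><name>lisi</name></user></users>";
        System.out.println(names(xml)); // [zhangsan, lisi]
    }
}
```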
6.2. Common XML parsers
6.2.1. JAXP (rarely used)
Provided by Sun; supports both the DOM and SAX approaches
6.2.2. DOM4J (excellent)
An excellent parser based on the DOM approach
6.2.3. jsoup
jsoup is a Java HTML parser that can parse a URL or HTML text directly. It provides a very convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods.
6.2.3.1. jsoup Quick Start
Steps:
- Import the jar package
- Get the Document object, which represents the whole DOM tree
- Get the corresponding tag as an Element object
- Retrieve the data
What jsoup can do:
- Parse from a URL, file, or string
- Locate and extract data using DOM traversal or CSS selectors
- Manipulate elements, attributes, and text
6.2.3.2. jsoup objects
Jsoup: a utility class that parses an HTML or XML document and returns a Document; the main thing to learn is the parse method
parse: parses an HTML or XML document and returns a Document
- parse(File in, String charsetName): parses an XML or HTML file
- parse(String html): parses an XML or HTML string
- parse(URL url, int timeoutMillis): fetches the HTML or XML document at the given network path; this form is the more common one and is used when writing web crawlers
Document: the document object; it extends Element and represents the DOM tree in memory
Mainly used to get Element objects:
- getElementById(String id): gets the single Element whose id attribute has the given value (used very often)
- getElementsByTag(String tagName): gets a collection of elements by tag name
- getElementsByAttribute(String key): gets a collection of elements that have the given attribute name
- getElementsByAttributeValue(String key, String value): gets a collection of elements whose attribute has the given name and value
Elements: a collection of Element objects; it can be used like an ArrayList<Element>
Element: an element object; from it you can get child elements, attribute values, text, and so on
1. Get child element objects:
- getElementById(String id): gets the single Element whose id attribute has the given value (used very often)
- getElementsByTag(String tagName): gets a collection of elements by tag name
- getElementsByAttribute(String key): gets a collection of elements that have the given attribute name
- getElementsByAttributeValue(String key, String value): gets a collection of elements whose attribute has the given name and value
2. Get an attribute value:
- String attr(String key): gets the attribute value for the given attribute name
3. Get text:
- String text(): gets the plain text content
- String html(): gets the entire content of the tag body, including child tags as a string
Node: the node object; the parent class of Document and Element
6.2.3.3. Shortcut query methods in jsoup
selector: CSS selectors; when the document hierarchy is known, they make it more convenient to query deeply nested content
1. Method used: Elements select(String cssQuery)
2. Syntax: see the syntax defined in the Selector class
3. A more complex example:
XPath: the XML Path Language, a language for locating parts of an XML document
To use XPath with jsoup, an additional jar package must be imported, because although XPath is used to query XML, it is independent of XML itself
Consult the W3CSchool reference manual for the XPath syntax needed to write queries
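To give a feel for XPath syntax without any extra jar, the sketch below uses the JDK's own javax.xml.xpath package rather than jsoup with an XPath add-on, so it is a stand-in for the approach described above and not a jsoup example. The class name XPathSketch and the sample data are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class XPathSketch {
    static final String XML =
            "<users>"
            + "<user id=\"1\"><name>zhangsan</name><age>23</age></user>"
            + "<user id=\"2\"><name>lisi</name><age>22</age></user>"
            + "</users>";

    // Evaluates an XPath expression against the XML text and returns the result as a string.
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The name of the user whose id attribute is 2.
        System.out.println(evaluate(XML, "/users/user[@id='2']/name")); // lisi
        // The id of the user whose name is zhangsan.
        System.out.println(evaluate(XML, "//user[name='zhangsan']/@id")); // 1
    }
}
```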
6.2.4. PULL
The built-in parser of the Android operating system; based on the SAX approach