XML:
Concept: XML stands for Extensible Markup Language.
- Extensible: tags are user-defined.
- Uses
- Storing data
- Configuration files
- Transmitting data over a network
- Differences between XML and HTML
- XML tags are user-defined; HTML tags are predefined.
- XML syntax is strict; HTML syntax is lenient.
- XML stores data; HTML displays data.
- W3C: World Wide Web Consortium
Syntax:
- Basic syntax:
- XML documents use the .xml file extension
- The document declaration, when present, must be at the very start of the first line
- An XML document has exactly one root tag
- Attribute values must be enclosed in quotation marks (single or double)
- Tags must be properly closed
- XML tag names are case-sensitive
Getting Started:
<?xml version='1.0' ?>
<users>
    <user id='1'>
        <name>zhangsan</name>
        <age>23</age>
        <gender>male</gender>
        <br/>
    </user>
    <user id='2'>
        <name>lisi</name>
        <age>24</age>
        <gender>female</gender>
    </user>
</users>
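The syntax rules can be checked mechanically with any XML parser. The following is a minimal sketch that uses the DOM parser bundled with the JDK (JAXP) rather than a third-party library; the class name WellFormedDemo and its helper methods are made up for this illustration. It parses the users document and then feeds the parser a few fragments that each break one rule.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedDemo {

    // The users document from the notes, written as one Java string.
    static final String USERS =
        "<?xml version='1.0' ?>"
        + "<users>"
        + "<user id='1'><name>zhangsan</name><age>23</age><gender>male</gender><br/></user>"
        + "<user id='2'><name>lisi</name><age>24</age><gender>female</gender></user>"
        + "</users>";

    /** Returns the parsed DOM tree, or null when the text is not well-formed XML. */
    static Document parseOrNull(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            return null;
        }
    }

    /** Text of the first name element under the given user element. */
    static String nameOf(Element user) {
        return user.getElementsByTagName("name").item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parseOrNull(USERS);
        NodeList users = doc.getElementsByTagName("user");
        System.out.println(doc.getDocumentElement().getTagName()); // users
        System.out.println(users.getLength());                     // 2
        System.out.println(nameOf((Element) users.item(0)));       // zhangsan

        // Each fragment breaks one rule, so parseOrNull returns null.
        System.out.println(parseOrNull("<a><b></a>") == null);     // tag not closed
        System.out.println(parseOrNull("<Name>x</name>") == null); // case mismatch
        System.out.println(parseOrNull("<a/><b/>") == null);       // two root tags
        System.out.println(parseOrNull("<a id=1/>") == null);      // unquoted attribute value
    }
}
```

Returning null keeps the sketch short; real code would normally report the parser's error message.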
Components:
- Document declaration
- Format:
<?xml attribute-list ?>
- Attribute list:
- version: the version number; required
- encoding: the character set of the current document, which tells the parsing engine how to decode it. Default: UTF-8
- standalone: whether the document is self-contained
- Values:
- yes: does not depend on other files
- no: depends on other files
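The declaration attributes can be read back after parsing. The sketch below assumes the JDK's built-in DOM parser; the class name DeclarationDemo is hypothetical. It parses a document whose declaration sets all three attributes and prints them through the standard org.w3c.dom.Document accessors.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DeclarationDemo {

    /** Parses XML from bytes, the way a parser would read a file. */
    static Document parse(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><users/>");
        System.out.println("version:    " + doc.getXmlVersion());
        System.out.println("encoding:   " + doc.getXmlEncoding());
        System.out.println("standalone: " + doc.getXmlStandalone());

        // A document without a declaration is still parsed; standalone is then false.
        Document bare = parse("<users/>");
        System.out.println("standalone (no declaration): " + bare.getXmlStandalone());
    }
}
```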
- Processing instruction (for awareness only): associates a CSS stylesheet with the document to control how its data is displayed
<?xml-stylesheet type="text/css" href="a.css" ?>
- Tags: tag names are user-defined
- Rules:
- Names can contain letters, digits, and other characters
- Names cannot start with a digit or a punctuation character
- Names cannot start with the letters xml (or XML, Xml, etc.)
- Names cannot contain spaces
- Attributes:
- The value of an id attribute must be unique
- Text:
- CDATA section: the data inside is treated as literal text, so special characters such as '<' do not need to be escaped as &lt;
- Format:
<![CDATA[ data content ]]>
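To illustrate the CDATA rule, here is a small sketch (JDK DOM parser; the class name CdataDemo and the sample text are assumptions for this example). It reads the same text twice, once wrapped in a CDATA section and once with every special character escaped, and shows that the parser hands back identical content.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataDemo {

    /** Parses the XML text and returns the text content of its root element. */
    static String textOfRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // Inside CDATA the characters '<', '>' and '&' are written as-is.
        String withCdata = "<code><![CDATA[if (a < b && b > c) {}]]></code>";
        // Without CDATA each special character needs an entity reference.
        String escaped = "<code>if (a &lt; b &amp;&amp; b &gt; c) {}</code>";

        System.out.println(textOfRoot(withCdata)); // if (a < b && b > c) {}
        System.out.println(textOfRoot(escaped));   // if (a < b && b > c) {}
    }
}
```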
Constraints: rules that specify how an XML document must be written
- As a user of a framework (programmer), you should be able to:
- Import a constraint into an XML document
- Read simple constraint documents
- Categories:
- DTD: a simple constraint technology with limitations; for example, it has no data types, so it cannot fully restrict attribute and text values
- Schema: a more complex and more expressive constraint technology
- DTD:
- Importing a DTD into an XML document
- Internal DTD: the constraint rules are defined inside the XML document
- External DTD: the constraint rules are defined in a separate dtd file
- Local:
<!DOCTYPE root-tag-name SYSTEM "location of the dtd file">
- Network:
<!DOCTYPE root-tag-name PUBLIC "name of the dtd file" "URL of the dtd file">
Contents of the dtd file (student.dtd):
<!ELEMENT students (student*) >
<!ELEMENT student (name,age,sex)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT sex (#PCDATA)>
<!ATTLIST student number ID #REQUIRED>
XML document that imports the external dtd:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE students SYSTEM "student.dtd">
<students>
    <student number="s0001">
        <name>tom</name>
        <age>18</age>
        <sex>male</sex>
    </student>
</students>
Note: a value of type ID must be a valid XML name, so it cannot begin with a digit.
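DTD validation can be tried without any files by using an internal DTD. This is a sketch under that assumption, using the JDK's validating DOM parser; the class name DtdValidationDemo and the sample documents are hypothetical. It applies the student rules to one valid document and two invalid ones, including an ID value that starts with a digit, which a validating parser rejects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationDemo {

    // The student rules, declared as an internal DTD.
    static final String DTD =
        "<!DOCTYPE students [\n"
        + "<!ELEMENT students (student*)>\n"
        + "<!ELEMENT student (name,age,sex)>\n"
        + "<!ELEMENT name (#PCDATA)>\n"
        + "<!ELEMENT age (#PCDATA)>\n"
        + "<!ELEMENT sex (#PCDATA)>\n"
        + "<!ATTLIST student number ID #REQUIRED>\n"
        + "]>\n";

    static final String VALID =
        "<students><student number='s0001'>"
        + "<name>tom</name><age>18</age><sex>male</sex></student></students>";
    static final String MISSING_AGE =
        "<students><student number='s0001'>"
        + "<name>tom</name><sex>male</sex></student></students>";
    static final String BAD_ID =
        "<students><student number='0001'>"
        + "<name>tom</name><age>18</age><sex>male</sex></student></students>";

    /** Validates the body against DTD and returns the validation messages (empty when valid). */
    static List<String> validate(String body) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { problems.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            problems.add("not well-formed: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate(VALID));       // no messages
        System.out.println(validate(MISSING_AGE)); // content does not match (name,age,sex)
        System.out.println(validate(BAD_ID));      // ID value is not a valid name
    }
}
```

Validity errors are reported to the error handler rather than thrown, which is why the sketch collects them in a list.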
- Schema:
Importing:
1. Write the root element of the XML document
2. Declare the xsi prefix: xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
3. Import the xsd file: xsi:schemaLocation="xxx/xml xxx/student.xsd" (a namespace name followed by the location of the xsd file)
4. Give each xsd constraint a prefix that identifies it (the prefix distinguishes multiple xsd documents): xmlns="xxx/xml" for the default namespace, or xmlns:a="xxx/xml" for the prefix a
<students xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="xxx/xml" xsi:schemaLocation="xxx/xml xxx/student.xsd">
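A schema can also be applied from code. The sketch below is one possible approach with the JDK's javax.xml.validation API: the schema is passed to the validator directly as a string instead of being located through xsi:schemaLocation. The namespace http://example.com/xml, the schema itself, and the class name SchemaValidationDemo are all made up for illustration. Unlike a DTD, the schema can type age as an integer.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaValidationDemo {

    static final String NS = "http://example.com/xml"; // hypothetical namespace

    // A small illustrative schema: students contains student elements with name and age.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='" + NS + "' xmlns='" + NS + "' elementFormDefault='qualified'>"
        + "<xs:element name='students'><xs:complexType><xs:sequence>"
        + "<xs:element name='student' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='age' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static Schema compileSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
    }

    /** Returns true when the document satisfies the schema. */
    static boolean isValid(String xml) {
        try {
            compileSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the validator throws on the first validity error
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a one-student document in the schema's namespace with the given age text. */
    static String students(String age) {
        return "<students xmlns='" + NS + "'><student><name>tom</name><age>"
                + age + "</age></student></students>";
    }

    public static void main(String[] args) {
        System.out.println(isValid(students("18")));  // true
        System.out.println(isValid(students("abc"))); // false: age is not an integer
    }
}
```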
Parsing: operating on an XML document by reading its data into memory
- Operating on XML documents
- Parsing (reading): read the data of the document into memory
- Writing: save the data in memory to an XML document for persistent storage
- Ways to parse XML:
- DOM: loads the whole markup document into memory at once and builds a DOM tree in memory
- Advantages: easy to work with; supports all CRUD operations on the document
- Disadvantages: uses a lot of memory
- SAX: reads the document sequentially and is event-driven
- Advantages: uses very little memory
- Disadvantages: read-only; cannot add, delete, or modify content
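The trade-off between the two approaches can be sketched with the JDK's own JAXP parsers. The class name DomVsSaxDemo and the sample document are assumptions for this example. The DOM method changes the tree after loading it, while the SAX method can only react to events as they stream past.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomVsSaxDemo {

    static final String XML =
        "<users>"
        + "<user id='1'><name>zhangsan</name></user>"
        + "<user id='2'><name>lisi</name></user>"
        + "</users>";

    /** DOM: the whole tree is in memory, so it can be modified after parsing. */
    static int countUsersAfterDomInsert() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Element extra = doc.createElement("user");
            extra.setAttribute("id", "3");
            doc.getDocumentElement().appendChild(extra); // add a third user to the tree
            return doc.getElementsByTagName("user").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** SAX: elements arrive as events; nothing is kept unless the handler stores it. */
    static int countUsersWithSax() {
        int[] count = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(XML)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attributes) {
                        if ("user".equals(qName)) {
                            count[0]++;
                        }
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countUsersAfterDomInsert()); // 3: two parsed, one added
        System.out.println(countUsersWithSax());        // 2: read-only pass over the input
    }
}
```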
- Common XML parsers:
- JAXP: the parser provided by Sun; supports both the DOM and SAX approaches
- DOM4J: a very good parser
- Jsoup: a Java HTML parser that can directly parse a URL or HTML text. It provides a very convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods.
- PULL: the parser built into the Android operating system; SAX-style
- Getting Started:
- Steps:
- Import the jar package
- Get the Document object
- Get the Element object for the desired tag
- Retrieve the data
- Code:
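import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// 2.1 Get the path of student.xml on the classpath
String path = JsoupDemo1.class.getClassLoader().getResource("student.xml").getPath();
// 2.2 Parse the XML document: load it into memory and get the DOM tree ---> Document
Document document = Jsoup.parse(new File(path), "utf-8");
// 3. Get the Element objects
Elements elements = document.getElementsByTag("name");
System.out.println(elements.size());
// 3.1 Get the first name Element; Elements extends ArrayList<Element>
Element element = elements.get(0);
// 3.2 Get the data
String name = element.text();
System.out.println(name);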
- Using the objects:
- Jsoup: utility class that parses an HTML or XML document and returns a Document
- parse: parses an HTML or XML document and returns a Document
- parse(File in, String charsetName): parses an XML or HTML file
- parse(String html): parses an HTML or XML string
- parse(URL url, int timeoutMillis): fetches the HTML or XML document at the given network address and returns its Document object
- Document: document object; represents the DOM tree in memory
- Getting Element objects
- getElementById(String id): gets the single element whose id attribute has the given value
- getElementsByTag(String tagName): gets the collection of elements with the given tag name
- getElementsByAttribute(String key): gets the collection of elements that have the given attribute name
- getElementsByAttributeValue(String key, String value): gets the collection of elements whose given attribute has the given value
- Elements: a collection of Element objects; can be used like an ArrayList<Element>
- Element: element object
- Getting child element objects
- getElementById(String id): gets the single element whose id attribute has the given value
- getElementsByTag(String tagName): gets the collection of elements with the given tag name
- getElementsByAttribute(String key): gets the collection of elements that have the given attribute name
- getElementsByAttributeValue(String key, String value): gets the collection of elements whose given attribute has the given value
- Getting attribute values
- String attr(String key): gets the attribute value for the given attribute name
- Getting text content
- String text(): gets the text content
- String html(): gets the entire content of the tag body (including child tags, as a string)
- Node: node object
- The parent class of Document and Element
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.net.URL;

public class JsoupDemo01 {
    public static void main(String[] args) throws IOException {
        // Fetch and parse the page over the network, with a 10-second timeout
        URL url = new URL("https://www.baidu.com/");
        Document document = Jsoup.parse(url, 10000);
        // Get elements by tag name
        Elements tag = document.getElementsByTag("map");
        // Get elements whose type attribute equals "submit"
        Elements btn = document.getElementsByAttributeValue("type", "submit");
        // attr on an Elements collection returns the value from the first matching element
        String text = btn.attr("value");
        System.out.println(btn);
        System.out.println("-------------");
        System.out.println(text);
        System.out.println("-------------");
    }
}
- Shortcut query methods:
- selector: CSS selectors
- Method: Elements select(String cssQuery)
- Syntax: see the selector syntax defined in the documentation of the Selector class
- XPath: XPath is the XML Path Language, a language for locating parts of an XML document (XML is a subset of the Standard Generalized Markup Language)
- Using XPath with Jsoup requires importing an additional jar package.
- Consult the w3school reference manual for the complete XPath query syntax
Code:
// 1. Get the path of student.xml
String path = JsoupDemo6.class.getClassLoader().getResource("student.xml").getPath();
// 2. Get the Document object
Document document = Jsoup.parse(new File(path), "utf-8");
// 3. Create a JXDocument object from the Document object
JXDocument jxDocument = new JXDocument(document);
// 4. Query with XPath syntax
// 4.1 Query all student tags
List<JXNode> jxNodes = jxDocument.selN("//student");
for (JXNode jxNode : jxNodes) {
    System.out.println(jxNode);
}
System.out.println("--------------------");
// 4.2 Query all name tags under student tags
List<JXNode> jxNodes2 = jxDocument.selN("//student/name");
for (JXNode jxNode : jxNodes2) {
    System.out.println(jxNode);
}
System.out.println("--------------------");
// 4.3 Query name tags under student tags that have an id attribute
List<JXNode> jxNodes3 = jxDocument.selN("//student/name[@id]");
for (JXNode jxNode : jxNodes3) {
    System.out.println(jxNode);
}
System.out.println("--------------------");
// 4.4 Query name tags under student tags whose id attribute value is "sex"
List<JXNode> jxNodes4 = jxDocument.selN("//student/name[@id='sex']");
for (JXNode jxNode : jxNodes4) {
    System.out.println(jxNode);
}
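The same four query shapes can be tried without an extra jar. This sketch swaps in the JDK's javax.xml.xpath API in place of JXDocument, so the calls differ from the code above, and the sample document, its id value n1, and the class name XPathDemo are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDemo {

    // Two students; only the first name element carries an id attribute.
    static final String XML =
        "<students>"
        + "<student number='s0001'><name id='n1'>tom</name><age>18</age></student>"
        + "<student number='s0002'><name>jerry</name><age>19</age></student>"
        + "</students>";

    /** Evaluates the XPath expression against XML and returns the matching nodes. */
    static NodeList select(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("query failed: " + expression, e);
        }
    }

    /** Number of nodes matched by the expression. */
    static int count(String expression) {
        return select(expression).getLength();
    }

    public static void main(String[] args) {
        System.out.println(count("//student"));                // 2: all student tags
        System.out.println(count("//student/name"));           // 2: name tags under student
        System.out.println(count("//student/name[@id]"));      // 1: name tags with an id attribute
        System.out.println(count("//student/name[@id='n1']")); // 1: id attribute equals n1
        System.out.println(select("//student/name[@id='n1']").item(0).getTextContent()); // tom
    }
}
```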