Java rookie supply station: HTML, XML, and parsing XML

Table of Contents

The difference between HTML and XML

Parse XML

DOM parsing

SAX parsing

Choose DOM or SAX?

DOM4J parsing

JDOM parsing


The difference between HTML and XML

1. XML is case-sensitive; HTML is not.

2. In HTML, if the context makes it clear where a paragraph or list item ends, you can omit closing tags such as </p> or </li>. In XML, the closing tag must never be omitted.

HTML: <img src="1.jpg"><br><br>

XML: <img src="1.jpg"></img><br/><br/>

3. In XML, an element written as a single tag with no matching closing tag must end with a / character. This tells the parser that there is no end tag to look for.

4. In XML, attribute values must be enclosed in quotation marks. In HTML, the quotation marks are optional.

5. In HTML, an attribute name can appear without a value. In XML, every attribute must have a value.

 

  • XML is used to store and transmit data.
  • HTML is used to display data.
  • HTML that fully complies with the XML syntax rules conforms to the XHTML standard. Pages that conform to XHTML are said to be good for SEO.
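
The five rules above can be checked mechanically with an XML parser. The following is a minimal sketch, assuming only the XML parser bundled with the JDK; the class name WellFormedCheck and the sample strings are made up for illustration and are not part of the original article.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original article.
class WellFormedCheck {

    /** Returns true if the given markup parses as well-formed XML. */
    static boolean isWellFormed(String markup) {
        try {
            // DefaultHandler ignores all content events; its fatalError()
            // rethrows the parser's exception, which is all we need here.
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(markup)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the markup
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<p><img src=\"1.jpg\"><br><br></p>",         // HTML style: tags left open
            "<p><img src=\"1.jpg\"></img><br/><br/></p>", // XML style: every tag closed
            "<P>text</p>",                                // tag names differ in case
            "<img src=1.jpg />",                          // unquoted attribute value
            "<input disabled/>"                           // attribute without a value
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```

Only the second sample is accepted; each of the others breaks one of the rules listed above.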

Parse XML

  • DOM parsing

DOM (Document Object Model)

    DOM is the official W3C standard for representing XML documents in a platform- and language-independent way. A DOM is a collection of nodes (pieces of information) organized in a hierarchy, which lets developers locate specific information in the tree. Working with this structure usually requires loading the entire document and building the hierarchy before any work can be done. Because it is built on this hierarchy of information, DOM is described as tree-based or object-based.

[Advantages]
      ① The application can change both the data and the structure.
      ② Access is two-way: you can navigate up and down the tree at any time to read and manipulate any part of the data.
[Disadvantages]
      ① The entire XML document usually has to be loaded to build the hierarchy, which consumes a lot of resources.
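
The two advantages listed above (modifying the tree, and navigating it in both directions) can be sketched with the DOM implementation that ships with the JDK. The class name DomSketch, the helper methods, and the sample XML are assumptions made for illustration; they do not come from the original article.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article.
class DomSketch {

    // Made-up sample data.
    static final String XML =
            "<list><emp id=\"1\"><name>Ann</name></emp><emp id=\"2\"><name>Bob</name></emp></list>";

    /** Loads the whole document into an in-memory tree. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Navigates down to a <name> element, then back up to its parent <emp>. */
    static String empIdForName(Document doc, String name) {
        NodeList names = doc.getElementsByTagName("name");
        for (int i = 0; i < names.getLength(); i++) {
            Node nameNode = names.item(i);
            if (name.equals(nameNode.getTextContent())) {
                Element parentEmp = (Element) nameNode.getParentNode(); // upward step
                return parentEmp.getAttribute("id");
            }
        }
        return null; // no employee with that name
    }

    /** Changes the structure: appends a new <emp> with a <name> child to the root. */
    static void addEmp(Document doc, String id, String name) {
        Element emp = doc.createElement("emp");
        emp.setAttribute("id", id);
        Element nameEle = doc.createElement("name");
        nameEle.setTextContent(name);
        emp.appendChild(nameEle);
        doc.getDocumentElement().appendChild(emp);
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(empIdForName(doc, "Bob"));
        addEmp(doc, "3", "Cy");
        System.out.println(doc.getElementsByTagName("emp").getLength());
    }
}
```

Nothing can be queried until parse() has returned, which is the resource cost the disadvantage above refers to.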

  • SAX parsing

SAX (Simple API for XML)

  The advantages of SAX processing resemble those of streaming media: parsing can start immediately instead of waiting for all of the data to be read. Moreover, because the application only examines the data as it is read, the data does not need to be held in memory, which is a huge advantage for large documents. The application does not even have to parse the entire document; it can stop as soon as a certain condition is met. Generally speaking, SAX is much faster than its alternative, DOM.
[Advantages]
① Parsing can start immediately, without waiting for all of the data to be read.
② Data is examined only as it is read and does not need to be stored in memory.
③ Parsing can stop as soon as a certain condition is met, without processing the entire document.
④ High efficiency and performance; it can parse documents larger than system memory.
[Disadvantages]
① The application itself is responsible for the tag-handling logic (for example, keeping track of parent-child relationships). The more complex the document, the more complex the program.
② Navigation is one-way: the parser cannot locate a position in the document hierarchy, and it is hard to access different parts of the same document at the same time. XPath is not supported.
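
The points above can be illustrated with the SAX parser bundled with the JDK. This is a minimal sketch under stated assumptions: the class names, the StopParsing exception, and the sample XML are hypothetical. The handler keeps its own stack of open elements (the bookkeeping that disadvantage ① describes) and aborts the parse as soon as it has the first employee's name (advantage ③).

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original article.
class SaxSketch {

    // Made-up sample data.
    static final String XML =
            "<list><emp id=\"1\"><name>Ann</name><age>30</age></emp>"
          + "<emp id=\"2\"><name>Bob</name></emp></list>";

    /** Thrown by the handler to abort parsing once the answer is known. */
    static final class StopParsing extends SAXException {
        StopParsing() { super("stopped on purpose"); }
    }

    static final class FirstNameHandler extends DefaultHandler {
        // SAX gives no tree, so the handler tracks the open elements itself.
        private final Deque<String> openElements = new ArrayDeque<>();
        private final StringBuilder text = new StringBuilder();
        String firstName;
        int elementsSeen;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            openElements.push(qName);
            elementsSeen++;
            text.setLength(0); // start collecting text for the new element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            openElements.pop();
            // After the pop, the top of the stack is the parent of the closed element.
            if ("name".equals(qName) && "emp".equals(openElements.peek())) {
                firstName = text.toString().trim();
                throw new StopParsing(); // skip the rest of the document
            }
        }
    }

    static FirstNameHandler findFirstName(String xml) {
        FirstNameHandler handler = new FirstNameHandler();
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing expected) {
            // parsing was cut short deliberately
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        FirstNameHandler result = findFirstName(XML);
        System.out.println(result.firstName + " found after " + result.elementsSeen + " elements");
    }
}
```

Throwing an exception from a callback is a common way to stop a SAX parse early; it is one option, not the only one.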
 

Choose DOM or SAX?

  • For developers who have to write their own code to process XML documents, choosing between the DOM and SAX parsing models is an important design decision. DOM accesses an XML document through a tree structure, while SAX uses an event model.
  • A DOM parser converts the XML document into a tree that holds its content, and the tree can then be traversed. The advantage of the DOM model is that it is easy to program against: developers only need to call the tree-building routine and then use the navigation APIs to reach the tree nodes they need. Elements in the tree are easy to add and modify. However, because a DOM parser has to process the entire XML document, its performance and memory requirements are relatively high, especially for large XML files. Thanks to its traversal capabilities, DOM is often used in services that change XML documents frequently.
  • A SAX parser uses an event-based model. It fires a series of events while parsing an XML document; when a given tag is found, it invokes a callback method to report that the tag has been encountered. SAX's memory requirements are usually low because the developer decides which tags to process. Its scalability shows most clearly when only part of the data in a document is needed. However, coding against a SAX parser is harder, and it is difficult to access several different pieces of data in the same document at the same time.
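
One way to see the trade-off in the list above: once a DOM tree is in memory, several unrelated parts of the document can be queried in any order, which a single SAX pass makes awkward. The sketch below assumes the javax.xml.xpath API from the JDK; the class name XPathOverDom and the sample data are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article.
class XPathOverDom {

    // Made-up sample data.
    static final String XML =
            "<list><emp id=\"1\"><name>Ann</name></emp><emp id=\"2\"><name>Bob</name></emp></list>";

    /** Builds the in-memory tree once; every later query reuses it. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Evaluates an XPath expression against the tree and returns the result as a string. */
    static String query(Document doc, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        // Three independent lookups against the same tree, in arbitrary order.
        System.out.println(query(doc, "/list/emp[@id='2']/name"));
        System.out.println(query(doc, "/list/emp[name='Ann']/@id"));
        System.out.println(query(doc, "count(/list/emp)"));
    }
}
```

The cost is that the whole document was loaded first, as the DOM bullet above points out.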
     
  • DOM4J parsing

DOM4J (Document Object Model for Java)

[Advantages]
① Makes heavy use of the Java collection classes, which is convenient for Java developers, and offers alternative methods that improve performance.
② Supports XPath.
③ Good performance.
[Disadvantages]
① Large: it uses a great many interfaces, so the API is more complicated.

Although DOM4J is a completely independent development effort, it started out as a fork of JDOM. It incorporates many features beyond basic XML document representation, including integrated XPath support, XML Schema support, and event-based processing for large or streaming documents. It also offers the option of building document representations that can be accessed in parallel through the DOM4J API and the standard DOM interfaces.

DOM4J has been under development since the second half of 2000. To support all of these features, it is built on interfaces and abstract base classes. DOM4J makes extensive use of the Collections classes in its API, but in many cases it also provides alternative methods that allow better performance or more direct coding. The direct benefit is that, although DOM4J pays the price of a more complex API, it offers much greater flexibility than JDOM.
Apart from the added goals of flexibility, XPath integration, and handling large documents, DOM4J's goals are the same as JDOM's: ease of use and intuitive operation for Java developers. It also aims to be a more complete solution than JDOM, one that can handle essentially all Java/XML problems. In pursuing that goal, it places less emphasis than JDOM on preventing incorrect application behavior.
DOM4J is an excellent open-source Java XML API with good performance, powerful features, and an easy-to-use design. More and more Java software uses DOM4J to read and write XML. Notably, even Sun's JAXM uses DOM4J.
 

  • JDOM parsing

JDOM (Java-based Document Object Model)

[Advantages]
① Uses concrete classes instead of interfaces, which simplifies the API compared with DOM.
② Makes heavy use of the Java collection classes, which is convenient for Java developers.
[Disadvantages]
① Less flexible.
② Poor performance.

The purpose of JDOM is to be a Java-specific document model that simplifies interaction with XML and is faster than using DOM. Because it was the first Java-specific model, JDOM has been heavily publicized and promoted. It is being considered, through Java Specification Request JSR-102, for eventual use as a "Java standard extension". JDOM development began in early 2000.
JDOM differs from DOM in two main ways. First, JDOM uses only concrete classes, not interfaces. This simplifies the API in some respects but also limits flexibility. Second, the API makes extensive use of the Collections classes, which makes it easier to use for Java developers who already know those classes. The JDOM documentation states that its purpose is to "use 20% (or less) effort to solve 80% (or more) Java/XML problems" (the 20% presumably refers to the learning curve). JDOM is certainly useful for most Java/XML applications, and most developers find its API much easier to understand than DOM's. JDOM also includes fairly extensive checks on program behavior to prevent users from doing anything that makes no sense in XML. However, it still requires a solid understanding of XML to do anything beyond the basics (or, in some cases, even to understand the errors). Gaining that understanding is probably more meaningful work than learning either the DOM or the JDOM interfaces.
JDOM itself does not contain a parser. It usually uses a SAX2 parser to parse and validate the input XML document (although it can also take a previously constructed DOM representation as input). It contains converters that output a JDOM representation as a SAX2 event stream, a DOM model, or an XML text document. JDOM is open source, released under a variant of the Apache license.

 

package xml;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

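/**
 * Parsing XML.
 * There are two basic ways to parse XML: SAX and DOM.
 * SAX (Simple API for XML) uses little memory and is fast, but because it
 * scans the document sequentially it has no view of the overall structure
 * and cannot modify the XML content.
 *
 * DOM (Document Object Model) builds the XML structure into an in-memory
 * tree and then reads the content by traversing that tree. Because it
 * holds the whole structure, it can modify the XML content. However,
 * building the entire tree takes more memory and is slower.
 * DOM is the parsing model recommended by the W3C.
 *
 * DOM4J: DOM for Java.
 */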
public class ParseXmlDemo {
	
	public static void main(String[] args) {
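		/*
		 * Rough steps for parsing XML with DOM4J:
		 * 1: Create a SAXReader.
		 * 2: Use the SAXReader to read the XML file and produce a Document object (this builds the in-memory tree).
		 * 3: Get the root element from the Document object.
		 * 4: Starting from the root element, fetch child elements level by level, following the XML structure, to traverse the document.
		 */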
		
		try {
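			// 1: create the reader
			SAXReader reader = new SAXReader();
			// 2: read the file and build the Document
			Document doc = reader.read(new File("./emplist.xml"));
			/*
			 * 3: Get the root element.
			 * Each Element instance represents one element (a pair of tags)
			 * in the XML document and gives access to information about it.
			 * Commonly used methods:
			 * String getName()
			 * returns the name of this element
			 * 
			 * String getText()
			 * returns the text between this element's tags
			 * 
			 * Element element(String name)
			 * returns the child element with the given name
			 * 
			 * List elements()
			 * returns all child elements of this element
			 * 
			 * List elements(String name)
			 * returns all child elements with the given name
			 * 
			 */
			// the <list> element
			Element root = doc.getRootElement();
			System.out.println(root.getName()); // print the root element's name
			/*
			 * Read every employee record in emplist.xml into a collection.
			 */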
			List<Emp> empList = new ArrayList<>();
			
			// get all <emp> elements under <list>
			List<Element> list = root.elements("emp");
			System.out.println(list.size());
			// iterate over each <emp> element
			for (Element empEle : list) {
				// get the employee's name:
				// first fetch the <name> child element of <emp>
				Element nameEle = empEle.element("name");
				String name = nameEle.getTextTrim();
				
				// get the age
				int age = Integer.parseInt(
						empEle.elementText("age")
						);
				
				// get the gender
				String gender = empEle.elementText("gender");
				
				// get the salary
				int salary = Integer.parseInt(
						empEle.elementText("salary")
						);
				// attributeValue returns the value of the named attribute of this element
				// get the id attribute of <emp>
				int id = Integer.parseInt(
						empEle.attributeValue("id")
						);
				Emp emp = new Emp(id, name, age, gender, salary);
				empList.add(emp);
			}
			System.out.println("Parsing complete!");
			for (Emp e : empList) {
				System.out.println(e);
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
		
		
	}
	
}
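
The listing above relies on an Emp class and an emplist.xml file that the article does not show. The following is a minimal sketch of what they might look like, inferred only from the constructor call and the element and attribute names used in the listing; the field types, the toString format, and the sample values are assumptions.

```java
/*
 * Hypothetical Emp class, not part of the original article. To compile with
 * the listing, it would need to live in the same package (xml).
 *
 * Assumed layout of emplist.xml, based on the names the listing reads:
 *
 * <?xml version="1.0" encoding="UTF-8"?>
 * <list>
 *   <emp id="1">
 *     <name>Ann</name>
 *     <age>30</age>
 *     <gender>F</gender>
 *     <salary>5000</salary>
 *   </emp>
 * </list>
 */
class Emp {
    private final int id;
    private final String name;
    private final int age;
    private final String gender;
    private final int salary;

    // Parameter order matches the call in the listing: new Emp(id, name, age, gender, salary)
    Emp(int id, String name, int age, String gender, int salary) {
        this.id = id;
        this.name = name;
        this.age = age;
        this.gender = gender;
        this.salary = salary;
    }

    int getId() { return id; }
    String getName() { return name; }
    int getAge() { return age; }
    String getGender() { return gender; }
    int getSalary() { return salary; }

    // The listing prints each Emp, so a readable toString is useful.
    @Override
    public String toString() {
        return "Emp{id=" + id + ", name=" + name + ", age=" + age
                + ", gender=" + gender + ", salary=" + salary + "}";
    }
}
```

Running the listing also requires the dom4j library on the classpath.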

 


Origin blog.csdn.net/c202003/article/details/107242503