What are the ways to parse XML data?

Last time we covered four ways of parsing JSON; this time we look at four ways of parsing XML.

Four ways of parsing

  • DOM parsing
  • SAX parsing
  • JDOM parsing
  • DOM4J parsing

Case practice

DOM parsing

DOM stands for Document Object Model. A DOM-based XML parser converts an XML document into a tree of objects (usually called the DOM tree), and the application reads and modifies the XML data by operating on that object model. An XML document is itself tree-structured, so the DOM represents it as a tree of nodes. The top of the DOM tree is the Document node, which represents the whole document and has exactly one root element.

Note: in DOM, each run of text is also a node, called a text node.
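A small sketch of what the note about text nodes means in practice. The class name TextNodeDemo and the sample XML are made up for illustration; the snippet only uses the JDK's built-in DOM parser. It lists the direct children of the root element and shows that the whitespace between tags is reported as text nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the original article.
final class TextNodeDemo {

    /** Returns the kinds of the root element's direct children, comma separated. */
    static String childTypes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (i > 0) {
                    sb.append(',');
                }
                // Text nodes are reported as "text", everything else by node name
                sb.append(child.getNodeType() == Node.TEXT_NODE ? "text" : child.getNodeName());
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // The line breaks and indentation around <person/> become text nodes
        System.out.println(childTypes("<people>\n  <person/>\n</people>"));
        // Without whitespace there is only the element itself
        System.out.println(childTypes("<people><person/></people>"));
    }
}
```

This is why code that walks getChildNodes() usually checks the node type before treating a child as an element.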

Core interfaces

DOM parsing has four core interfaces:

Document : Represents the entire XML document and is the root of the DOM tree. It is the entry point for accessing and manipulating the data in the document; all element content in the XML file can be reached through the Document node.

Node : The central interface of the DOM tree. Most of the other core DOM interfaces, such as Document and Element, extend Node. Every node in the DOM tree is represented by a Node.

NodeList : An ordered collection of nodes, for example the child nodes of a node. A NodeList is live: changes to the document are reflected in it immediately.

NamedNodeMap : A collection of nodes that are looked up by their unique names. It is mainly used to represent the attributes of an element.
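To make the four interfaces more concrete, here is a minimal sketch that touches each of them once. The class name DomInterfacesDemo, the sample XML, and the id attribute are assumptions chosen for illustration. It reads an attribute through a NamedNodeMap and then appends an element to show that the child NodeList follows changes to the document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original article.
final class DomInterfacesDemo {

    static String describe(String xml) {
        try {
            // Document: entry point to the whole tree
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Element extends Node
            Element root = doc.getDocumentElement();
            // NodeList: ordered collection of the root's children
            NodeList children = root.getChildNodes();
            int before = children.getLength();

            StringBuilder ids = new StringBuilder();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue;
                }
                // NamedNodeMap: attributes looked up by name
                NamedNodeMap attrs = child.getAttributes();
                Node id = attrs.getNamedItem("id");
                if (id != null) {
                    if (ids.length() > 0) {
                        ids.append(',');
                    }
                    ids.append(id.getNodeValue());
                }
            }

            // The same NodeList object sees the newly appended child
            root.appendChild(doc.createElement("person"));
            return root.getNodeName() + " ids=" + ids
                    + " children=" + before + "->" + children.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("<people><person id=\"1\"/><person id=\"2\"/></people>"));
    }
}
```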

DOM parsing process

To read an XML document with DOM, a program follows these steps:

① Create a DocumentBuilderFactory: DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
② Create a DocumentBuilder: DocumentBuilder builder = factory.newDocumentBuilder();
③ Create a Document: Document doc = builder.parse("path of the file to parse");
④ Create a NodeList: NodeList nl = doc.getElementsByTagName("name of the elements to read");
⑤ Read the XML data from the nodes
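The five steps can be put together roughly as follows. This is a sketch under a few assumptions: the class name DomParseSteps and the element name "name" are illustrative, and the XML is parsed from a string instead of a file so that the example is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original article.
final class DomParseSteps {

    /** Returns the text of every <name> element in the given XML. */
    static List<String> readNames(String xml) {
        try {
            // Step 1: create the factory
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Step 2: create the builder
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Step 3: parse into a Document (a file path or File works here too)
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // Step 4: collect the elements to read
            NodeList nl = doc.getElementsByTagName("name");
            // Step 5: read the data from each node
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nl.getLength(); i++) {
                names.add(nl.item(i).getTextContent());
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<people><person><name>lebyte</name><age>10</age></person></people>";
        System.out.println(readNames(xml));
    }
}
```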

SAX parsing

SAX (Simple API for XML) parses an XML file sequentially, from start to finish. SAX is not maintained by an official standards body, a company, or an individual; it is a de facto standard that anyone may use.

Unlike DOM, SAX reads the document in a single sequential pass, which makes it a fast way to read XML data. As the SAX parser scans the document it fires a series of events: when it reaches the start or end of the document, or the start or end of an element, it calls the corresponding handler method, and these methods do the processing. This continues until the whole document has been scanned.

To use SAX parsing, first build a SAX parser.

// 1. Create the parser factory
SAXParserFactory factory = SAXParserFactory.newInstance();
// 2. Get a parser
SAXParser parser = factory.newSAXParser();
// 3. Parse the file; MySaxHandler is a handler class that extends DefaultHandler.
//    A File is passed because parse(String, ...) treats the string as a URI.
File file = new File("resource/demo01.xml");
parser.parse(file, new MySaxHandler());
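The article does not show the MySaxHandler class. The following LoggingSaxHandler is a hypothetical stand-in that sketches what such a handler might look like: it extends DefaultHandler and records each event in the order the parser reports it. The class name, the event labels, and the sample XML are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not the MySaxHandler used by the original article.
final class LoggingSaxHandler extends DefaultHandler {

    final List<String> events = new ArrayList<>();
    // characters() may be called several times per element, so text is buffered
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        events.add("start:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            events.add("text:" + value);
        }
        text.setLength(0);
        events.add("end:" + qName);
    }

    /** Parses the XML string and returns the recorded events in order. */
    static List<String> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            LoggingSaxHandler handler = new LoggingSaxHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        parse("<people><name>lebyte</name></people>").forEach(System.out::println);
    }
}
```

Because the handler only sees one event at a time, any state it needs (here the buffered text) has to be kept in its own fields.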

JDOM parsing

DOM is the XML API standardized by the W3C, and SAX is the de facto standard for event-based parsing. From a development perspective each has its own strengths: DOM lets you modify the document but is not suited to reading large files, while SAX can read large files but cannot modify the document. JDOM aims to combine the two: a modifiable, DOM-like document tree that is built by a SAX parser. JDOM is a free, open source component that can be downloaded directly from www.jdom.org.

Common JDOM classes for working with XML:

Document: represents the entire XML document, which is a tree structure

Element: represents an XML element and provides methods to work with its child elements, text, attributes, and namespaces

Attribute: represents an attribute of an element

Text: represents XML text content

XMLOutputter: writes XML output; it is built on the JDK's stream classes

Format: provides output settings such as the encoding, style, and layout of the XML file

JDOM's output operations are much more convenient and intuitive than those of the traditional DOM. JDOM works with an in-memory document tree like DOM, but it also supports SAX, so a SAX parser can be used to read the document.

// Create a SAX-based builder
SAXBuilder builder = new SAXBuilder();
File file = new File("resource/demo01.xml");
// Build the document
Document doc = builder.build(file);
// Get the root element
Element root = doc.getRootElement();
System.out.println(root.getName());
// Get all child elements of the root; getChildren(name) returns only the direct children with that tag name
List<Element> list = root.getChildren();
System.out.println(list.size());
for (Element e : list) {
    // Print the name of the element and the text inside it
    System.out.println(e.getName() + "=" + e.getText());
    System.out.println("==================");
}

DOM4J parsing

dom4j is a simple open source library for processing XML, XPath and XSLT. It is based on the Java platform, uses the Java collections framework, and fully integrates with DOM, SAX and JAXP. Download locations:

http://www.dom4j.org/dom4j-1.6.1/

http://sourceforge.net/projects/dom4j

Like JDOM, DOM4J is a free, open source XML component. It is widely used in current development frameworks such as Hibernate and Spring, so it is worth getting to know. Neither library is better or worse: frameworks tend to use DOM4J, while JDOM is common in everyday application code. DOM4J also offers many convenient features, such as nicely formatted output.

File file = new File("resource/outputdom4j.xml");
SAXReader reader = new SAXReader();
// Read the file as a document
Document doc = reader.read(file);
// Get the root element of the document
Element root = doc.getRootElement();
// Iterate over all child elements of the root
Iterator<Element> iter = root.elementIterator();
while (iter.hasNext()) {
    Element child = iter.next();
    System.out.println("value = " + child.getText());
}

Extension: creating XML

DOM creation

To generate an XML file with DOM, create the document with the newDocument() method.

Writing the DOM document out is more cumbersome: it has to go through a Transformer, which takes several steps.

public static void createXml() throws Exception {
    // Get the builder factory
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // Get a builder
    DocumentBuilder builder = factory.newDocumentBuilder();
    // Create the document
    Document doc = builder.newDocument();
    // Create the elements and set up their relationships
    Element root = doc.createElement("people");
    Element person = doc.createElement("person");
    Element name = doc.createElement("name");
    Element age = doc.createElement("age");
    name.appendChild(doc.createTextNode("lebyte"));
    age.appendChild(doc.createTextNode("10"));
    doc.appendChild(root);
    root.appendChild(person);
    person.appendChild(name);
    person.appendChild(age);
    // Write the document out
    // Get the transformer factory
    TransformerFactory tsf = TransformerFactory.newInstance();
    Transformer ts = tsf.newTransformer();
    // Set the output encoding
    ts.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    // Wrap the DOM tree as the source of the transformation
    DOMSource source = new DOMSource(doc);
    // The StreamResult holds the result of the transformation
    File file = new File("src/output.xml");
    StreamResult result = new StreamResult(file);
    ts.transform(source, result);
}

SAX creation

// Create a SAXTransformerFactory
SAXTransformerFactory stf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
File file = new File("src/output.xml");
// try-with-resources closes the output stream once the document has been written
try (OutputStream out = new FileOutputStream(file)) {
    // Create a TransformerHandler from the factory
    TransformerHandler handler = stf.newTransformerHandler();
    // Get the Transformer behind the handler
    Transformer tf = handler.getTransformer();
    // Set the output properties
    tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    tf.setOutputProperty(OutputKeys.INDENT, "yes");
    // Create a Result and attach it to the handler
    handler.setResult(new StreamResult(out));
    // Write the XML content through the handler
    // Open the document
    handler.startDocument();
    AttributesImpl attr = new AttributesImpl();
    // Create the root element bookstore
    handler.startElement("", "", "bookstore", attr);
    attr.clear();
    attr.addAttribute("", "", "id", "", "1");
    handler.startElement("", "", "book", attr);
    attr.clear();
    handler.startElement("", "", "name", attr);
    String title = "颈椎病康复指南"; // sample book title
    handler.characters(title.toCharArray(), 0, title.length());
    handler.endElement("", "", "name");
    // Close the remaining elements
    handler.endElement("", "", "book");
    handler.endElement("", "", "bookstore");
    handler.endDocument();
} catch (SAXException | IOException | TransformerConfigurationException e) {
    e.printStackTrace();
}

JDOM creation

// Create the elements
Element person = new Element("person");
Element name = new Element("name");
Element age = new Element("age");
// Create an attribute
Attribute id = new Attribute("id", "1");
// Set the text
name.setText("lebyte");
age.setText("10");
// Set up the relationships
Document doc = new Document(person);
person.addContent(name);
name.setAttribute(id);
person.addContent(age);
// Write the document to a file
XMLOutputter out = new XMLOutputter();
File file = new File("resource/outputjdom.xml");
out.output(doc, new FileOutputStream(file.getAbsoluteFile()));

DOM4J creation

// Use DocumentHelper to create the Document object
Document document = DocumentHelper.createDocument();
// Create the elements and set up their relationships
Element person = document.addElement("person");
Element name = person.addElement("name");
Element age = person.addElement("age");
// Set the text
name.setText("lebyte");
age.setText("10");
// Create a pretty-printing output format
OutputFormat of = OutputFormat.createPrettyPrint();
of.setEncoding("utf-8");
// Output to a file
File file = new File("resource/outputdom4j.xml");
XMLWriter writer = new XMLWriter(new FileOutputStream(file), of);
// Write the document
writer.write(document);
writer.flush();
writer.close();

Origin blog.51cto.com/15064873/2571150