Java parsing XML

1. Common parsing methods of XML

The two most common ways of parsing XML are DOM and SAX.

1. DOM parsing method - based on document tree

DOM (Document Object Model) parses the XML document into a tree-shaped model that is held in memory. All later operations on the document are carried out on this tree. The in-memory tree is usually several times the size of the original document.
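
As a rough illustration of the tree model, the sketch below parses a small XML string with the JDK's built-in DOM parser and then reads nodes from the finished tree. The class name DomTreeSketch, the helper load and the sample XML are made up for this example.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of the original article
class DomTreeSketch {
	/** Parses the XML text into an in-memory tree and returns the document. */
	static Document load(String xml) throws Exception {
		DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
		return builder.parse(new InputSource(new StringReader(xml)));
	}

	public static void main(String[] args) throws Exception {
		Document doc = load("<MemInfo class=\"0501\"><person no=\"1\"><name>James</name></person></MemInfo>");
		Element root = doc.getDocumentElement();
		// The whole tree is already in memory, so nodes can be visited in any order
		Element person = (Element) root.getElementsByTagName("person").item(0);
		System.out.println(root.getAttribute("class"));
		System.out.println(person.getAttribute("no"));
		System.out.println(person.getElementsByTagName("name").item(0).getTextContent());
	}
}
```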

2. SAX parsing method - event-driven

SAX (Simple API for XML) reads through the document from start to finish, generates events according to the content it encounters, and hands those events to an event handler for processing.
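
To make the event-driven idea concrete, here is a minimal sketch that records the start and end events the JDK's SAX parser reports for a small XML string. The class name SaxEventSketch and the helper events are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of the original article
class SaxEventSketch {
	/** Returns the element events reported while the parser reads the XML text, in order. */
	static List<String> events(String xml) throws Exception {
		List<String> log = new ArrayList<>();
		DefaultHandler handler = new DefaultHandler() {
			@Override
			public void startElement(String uri, String localName, String qName, Attributes atts) {
				log.add("start " + qName);
			}

			@Override
			public void endElement(String uri, String localName, String qName) {
				log.add("end " + qName);
			}
		};
		SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
		return log;
	}

	public static void main(String[] args) throws Exception {
		// prints [start person, start name, end name, start age, end age, end person]
		System.out.println(events("<person><name>James</name><age>32</age></person>"));
	}
}
```

No tree is built here: once an event has been handled, the parser keeps nothing about it unless the handler stores it.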

3. Comparison of DOM and SAX parsing methods

SAX | DOM
Reads the document sequentially and generates events as it goes, so it can process XML documents of any size. | Builds the whole document tree in memory, so it is not suitable for very large XML documents.
The document can only be parsed once, in order; random access is not supported. | Any part of the document tree can be accessed at will, any number of times.
The XML content can only be read, not modified. | The document tree can be modified freely, which in turn modifies the XML document.
Development is more involved, because you have to implement the event handler yourself. | Easy to understand and easy to develop with.
More flexible: developers can build their own XML object model on top of the SAX events. | The document tree is already built for you by the DOM parser.
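
The "random access" and "can be modified" rows can be sketched as follows: the tree is searched for one person and its age is changed in place. The class DomEditSketch and its helpers are hypothetical, and the sketch assumes every person element has an age child.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of the original article
class DomEditSketch {
	static Document parse(String xml) throws Exception {
		return DocumentBuilderFactory.newInstance().newDocumentBuilder()
				.parse(new InputSource(new StringReader(xml)));
	}

	/** Finds the person element whose "no" attribute matches, or returns null. */
	static Element findPerson(Document doc, String no) {
		NodeList people = doc.getElementsByTagName("person");
		for (int i = 0; i < people.getLength(); i++) {
			Element p = (Element) people.item(i);
			if (no.equals(p.getAttribute("no"))) {
				return p;
			}
		}
		return null;
	}

	/** Returns the age text of the given person, or null if there is no such person. */
	static String ageOf(Document doc, String no) {
		Element p = findPerson(doc, no);
		// Assumption: every person element has an age child
		return p == null ? null : p.getElementsByTagName("age").item(0).getTextContent();
	}

	/** Overwrites the age of the given person in the tree; returns false if not found. */
	static boolean setAge(Document doc, String no, int age) {
		Element p = findPerson(doc, no);
		if (p == null) {
			return false;
		}
		p.getElementsByTagName("age").item(0).setTextContent(String.valueOf(age));
		return true;
	}

	public static void main(String[] args) throws Exception {
		Document doc = parse("<MemInfo><person no=\"1\"><age>32</age></person>"
				+ "<person no=\"2\"><age>38</age></person></MemInfo>");
		setAge(doc, "2", 39);
		System.out.println(ageOf(doc, "2")); // prints 39
		System.out.println(ageOf(doc, "1")); // prints 32
	}
}
```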

 

2. Parsing XML in Java

Sun provides the Java API for XML Processing (JAXP) for working with SAX and DOM. Through JAXP, an application can use any JAXP-compatible XML parser.
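
The pluggable part of JAXP is the factory lookup: application code asks a factory for a parser and never names a concrete implementation. The sketch below, with the made-up class name JaxpFactorySketch, configures both factories and prints which implementation was found. Turning on namespace awareness and XMLConstants.FEATURE_SECURE_PROCESSING is an optional choice made for this example, not something the rest of this article relies on.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical example class, not part of the original article
class JaxpFactorySketch {
	/** Returns a DOM factory; the concrete class is chosen by JAXP at runtime. */
	static DocumentBuilderFactory domFactory() throws Exception {
		DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
		factory.setNamespaceAware(true);
		factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
		return factory;
	}

	/** Returns a SAX factory configured the same way. */
	static SAXParserFactory saxFactory() throws Exception {
		SAXParserFactory factory = SAXParserFactory.newInstance();
		factory.setNamespaceAware(true);
		factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
		return factory;
	}

	public static void main(String[] args) throws Exception {
		// The printed class names depend on which JAXP implementation is installed
		System.out.println(domFactory().getClass().getName());
		System.out.println(saxFactory().getClass().getName());
	}
}
```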

1. Basic classes and the XML to be parsed

<?xml version="1.0" encoding="UTF-8"?>
<MemInfo class="0501">
	<person no="1">
		<name>James</name>
		<age>32</age>
	</person>
	<person no="2">
		<name>Kim</name>
		<age>38</age>
	</person>
	<person no="3">
		<name>Joe</name>
		<age>24</age>
	</person>
</MemInfo>

 

import java.util.List;

public class ClassInfo {
	private String no;
	private List<Person> students;

	public String getNo() {
		return no;
	}

	public void setNo(String no) {
		this.no = no;
	}

	public List<Person> getStudents() {
		return students;
	}

	public void setStudents(List<Person> students) {
		this.students = students;
	}
}

 

public class Person {
	private String no;
	private String name;
	private byte age;

	public String getNo() {
		return no;
	}

	public void setNo(String no) {
		this.no = no;
	}

	public String getName() {
		return name;
	}

	public void setName(String name) {
		this.name = name;
	}

	public byte getAge() {
		return age;
	}

	public void setAge(byte age) {
		this.age = age;
	}
}

2. SAX parsing

import java.util.ArrayList;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import com.alibaba.fastjson.JSONObject;

/**
 * SAX parser
 */
public class MemInfoParser extends DefaultHandler {
	/**
	 * log4j logs
	 */
	protected static Logger log = LogManager.getLogger();

	private ClassInfo cls;
	private Person person;
	/**
	 * Text collected for the element currently being read
	 */
	private final StringBuilder text = new StringBuilder();

	/**
	 * Called once when the document starts
	 */
	@Override
	public void startDocument() throws SAXException {
		cls = new ClassInfo();
		cls.setStudents(new ArrayList<>());
	}

	/**
	 * Called once when the document ends
	 */
	@Override
	public void endDocument() throws SAXException {
		log.info("Data obtained by parsing: " + JSONObject.toJSONString(cls));
	}

	/**
	 * Called at the start of every element
	 */
	@Override
	public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
		switch (qName) {
		case "MemInfo":
			cls.setNo(attributes.getValue("class"));
			break;
		case "person":
			person = new Person();
			person.setNo(attributes.getValue("no"));
			break;
		default:
			break;
		}
		// Discard text seen so far (such as the whitespace between person and name) and start collecting for the new element
		text.setLength(0);
	}

	/**
	 * Called at the end of every element
	 */
	@Override
	public void endElement(String uri, String localName, String qName) throws SAXException {
		switch (qName) {
		case "person":
			cls.getStudents().add(person);
			person = null;
			break;
		case "name":
			person.setName(text.toString().trim());
			break;
		case "age":
			person.setAge(Byte.parseByte(text.toString().trim()));
			break;
		default:
			break;
		}
		text.setLength(0);
	}

	/**
	 * Called for text content. The parser may deliver the text of one element in
	 * several chunks, so the text is only collected here and used in endElement
	 */
	@Override
	public void characters(char[] ch, int start, int length) throws SAXException {
		text.append(ch, start, length);
	}
}

 Test class:

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.XMLReader;

public class SaxParserTest {
	public static void main(String[] args) throws Exception {
		String path = "/data/workspace/tec-demo/src/main/java/cn/tinyf/demo/xml/sax/MemInfo.xml";
		// create a parsing factory
		SAXParserFactory factory = SAXParserFactory.newInstance();
		// create parser
		SAXParser parser = factory.newSAXParser();
		// get the reader
		XMLReader reader = parser.getXMLReader();
		// set the content handler
		MemInfoParser handler = new MemInfoParser();
		reader.setContentHandler(handler);
		// read xml document
		reader.parse(path);
	}
}

 

3. Reading and writing with DOM

 

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

import com.alibaba.fastjson.JSONObject;

/**
 * XML parsing - Dom implementation
 */
public class MemInfoParser {

	public static void main(String[] args) {
		String path = "/data/workspace/tec-demo/src/main/java/cn/tinyf/demo/xml/MemInfo.xml";
		System.out.println(JSONObject.toJSONString(parser(path)));
	}

	public static ClassInfo parser(String docPath) {
		DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
		DocumentBuilder db;
		try {
			// Get a DocumentBuilder instance from the DocumentBuilderFactory
			db = dbf.newDocumentBuilder();
			// Parse the XML file into a DOM document
			Document doc = db.parse(new File(docPath));
			/*
			 * Create related objects to store XML data
			 */
			ClassInfo cls = new ClassInfo();
			List<Person> stuList = new ArrayList<>();
			cls.setStudents(stuList);
			// Get the class information in the document node
			cls.setNo(doc.getDocumentElement().getAttribute("class"));
			/*
			 * Get all student nodes and traverse to get data
			 */
			NodeList stuNodes = doc.getElementsByTagName("person");
			int len = stuNodes.getLength();
			for (int i = 0; i < len; i++) {
				Element stu = (Element) stuNodes.item(i);
				Node eltName = stu.getElementsByTagName("name").item(0);
				Node eltAge = stu.getElementsByTagName("age").item(0);
				Person person = new Person();
				person.setName(eltName.getFirstChild().getNodeValue());
				person.setNo(stu.getAttribute("no"));
				person.setAge(Byte.parseByte(eltAge.getFirstChild().getNodeValue()));
				stuList.add(person);
			}
			return cls;
		} catch (ParserConfigurationException | SAXException | IOException e) {
			e.printStackTrace();
		}
		return null;
	}
}
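
The parser above reads text with getFirstChild().getNodeValue(), which assumes that each name and age element exists and is not empty. A slightly more defensive way to read child text could look like the sketch below. The class DomTextSketch and the helper childText are hypothetical names, not part of the code above.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of the original article
class DomTextSketch {
	/**
	 * Returns the trimmed text of the first descendant element with the given
	 * tag name, or null when there is no such element.
	 */
	static String childText(Element parent, String tag) {
		Node node = parent.getElementsByTagName(tag).item(0);
		return node == null ? null : node.getTextContent().trim();
	}

	public static void main(String[] args) throws Exception {
		Element person = DocumentBuilderFactory.newInstance().newDocumentBuilder()
				.parse(new InputSource(new StringReader("<person><name> Kim </name><age/></person>")))
				.getDocumentElement();
		System.out.println(childText(person, "name"));   // prints Kim
		System.out.println(childText(person, "age").isEmpty()); // prints true
		System.out.println(childText(person, "email"));  // prints null
	}
}
```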
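
import java.io.FileNotFoundException;
import java.io.FileOutputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/**
 * XML generation - DOM implementation
 */
public class MemInfoBuilder {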
	/**
	 * log4j2 logs
	 */
	protected static Logger log = LogManager.getLogger();

	public static void main(String[] args) {
		String xmlPath = "/data/workspace/tec-demo/src/main/java/cn/tinyf/demo/xml/dom/dom-data.xml";
		// Create the factory used to obtain a DocumentBuilder
		DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
		DocumentBuilder db;
		try {
			// Get a DocumentBuilder instance from the DocumentBuilderFactory
			db = dbf.newDocumentBuilder();
			// Create a new, empty DOM document
			Document doc = db.newDocument();
			/*
			 * Generate document tree
			 */
			// root node
			Element root = doc.createElement("MemInfo");
			// set root node properties
			root.setAttribute("class", "0501");
			// Add child node data to the root node
			root.appendChild(createStuElement(doc, "1", "James", 32));
			root.appendChild(createStuElement(doc, "2", "Kim", 38));
			root.appendChild(createStuElement(doc, "3", "Joe", 24));
			// add the root node to the document tree
			doc.appendChild(root);
			/*
			 * Prepare to generate files
			 */
			// Mark the document as standalone, meaning no DTD or schema describes it; with this set, the standalone attribute is not shown in the XML declaration
			doc.setXmlStandalone(true);
			// Create TransformerFactory object
			TransformerFactory tff = TransformerFactory.newInstance();
			// Create Transformer object
			Transformer tf = tff.newTransformer();
			// Indent the output so it is easier to read
			tf.setOutputProperty(OutputKeys.INDENT, "yes");
			// output to file
			tf.transform(new DOMSource(doc), new StreamResult(new FileOutputStream(xmlPath)));
		} catch (ParserConfigurationException | FileNotFoundException | TransformerException e) {
			log.error(e);
		}
	}

	private static Element createStuElement(Document doc, String no, String name, int age) {
		Element stuElem = doc.createElement("person");
		stuElem.setAttribute("no", no);
		//create name node
		Element nameElem = doc.createElement("name");
		nameElem.appendChild(doc.createTextNode(name));
		//create age node
		Element ageElem = doc.createElement("age");
		ageElem.appendChild(doc.createTextNode(age + ""));
		//Add the name and age nodes to the student node and return
		stuElem.appendChild(nameElem);
		stuElem.appendChild(ageElem);
		return stuElem;
	}
}
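
When trying out generated XML it can be handier to get the result as a String than to write a file. The sketch below does this with the same Transformer API and a StringWriter. The class name DomToStringSketch and the helper toXml are made up for this example, and the XML declaration is omitted only to keep the output short.

```java
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class, not part of the original article
class DomToStringSketch {
	/** Serializes the document tree to a String instead of a file. */
	static String toXml(Document doc) throws Exception {
		Transformer tf = TransformerFactory.newInstance().newTransformer();
		tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
		StringWriter out = new StringWriter();
		tf.transform(new DOMSource(doc), new StreamResult(out));
		return out.toString();
	}

	/** Builds a one-person document similar to the one created above. */
	static Document sample() throws Exception {
		Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
		Element root = doc.createElement("MemInfo");
		root.setAttribute("class", "0501");
		Element person = doc.createElement("person");
		person.setAttribute("no", "1");
		Element name = doc.createElement("name");
		name.appendChild(doc.createTextNode("James"));
		person.appendChild(name);
		root.appendChild(person);
		doc.appendChild(root);
		return doc;
	}

	public static void main(String[] args) throws Exception {
		System.out.println(toXml(sample()));
	}
}
```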

 

3. Other parsers

1. JDOM

JDOM is an open source project. It is based on a tree structure and uses pure Java to parse, generate, serialize and otherwise manipulate XML documents.

JDOM can work with existing XML technologies such as Simple API for XML (SAX) and Document Object Model (DOM).


 

2. dom4j

dom4j is an open source Java XML API for reading and writing XML files. It grew out of the JDOM effort as an alternative to it, and is known for good performance, a rich feature set and ease of use. Its performance is often reported to be better than that of the DOM implementation that ships with the JDK.

