1. Common XML parsing methods
The two most common methods of parsing XML are DOM and SAX.
1. DOM parsing method - based on document tree
DOM, the Document Object Model, parses the XML document into a tree-shaped model that is held in memory; all subsequent operations on the document are performed on this tree. The in-memory tree is typically several times the size of the document itself.
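As a rough illustration of the in-memory tree, the sketch below parses a small XML string and then reads nodes out of order. The class name DomTreeSketch and the helper nameAt are hypothetical names chosen for this example; only the standard JAXP and org.w3c.dom calls come from the JDK.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only
class DomTreeSketch {

    /** Parses an XML string into an in-memory DOM tree. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Returns the text of the <name> child of the <person> at the given index. */
    static String nameAt(Document doc, int index) {
        NodeList persons = doc.getElementsByTagName("person");
        Element person = (Element) persons.item(index);
        return person.getElementsByTagName("name").item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<MemInfo class=\"0501\">"
                + "<person no=\"1\"><name>James</name></person>"
                + "<person no=\"2\"><name>Kim</name></person>"
                + "</MemInfo>";
        Document doc = parse(xml);
        // The whole tree is in memory, so nodes can be read in any order, repeatedly.
        System.out.println(nameAt(doc, 1));
        System.out.println(nameAt(doc, 0));
        System.out.println(doc.getDocumentElement().getAttribute("class"));
    }
}
```

Because the tree stays in memory after parsing, the second person can be read before the first without parsing the document again.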
2. SAX parsing method - event-driven
SAX, the Simple API for XML, reads through the document sequentially, generates events according to the content it encounters, and hands these events to an event handler for processing.
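To make the event-driven idea concrete, here is a minimal sketch that records each callback the parser makes, in order. The class SaxEventTrace and its trace method are hypothetical names invented for this example; the callbacks themselves are the standard DefaultHandler methods.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only
class SaxEventTrace {

    /** Parses the XML string and returns the events in the order the parser reported them. */
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                events.add("startElement:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("characters:" + new String(ch, start, length));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement:" + qName);
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        // Prints one line per event, in document order
        trace("<person><name>Kim</name></person>").forEach(System.out::println);
    }
}
```

Nothing is kept in memory except what the handler chooses to store, which is why this style scales to large documents.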
3. Comparison of DOM and SAX parsing methods
SAX | DOM |
Reads the document sequentially and generates corresponding events; can process XML documents of any size. | Builds the document tree in memory; not suitable for large XML documents. |
The document can only be parsed once, in order; random access is not supported. | Any part of the document tree can be accessed at will, any number of times. |
XML content can only be read, not modified. | The document tree can be modified at will, thereby modifying the XML document. |
Development is more complicated, because you must implement the event handler yourself. | Easy to understand and easy to develop with. |
More flexible: developers can build their own XML object model on top of SAX. | The document tree is already built by the DOM parser. |
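The "can be modified" row is perhaps the easiest difference to show in code. The following sketch, under the assumption that a small document fits comfortably in memory, changes every age element in a DOM tree and serializes the result; DomEditSketch and setAllAges are hypothetical names used only here.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only
class DomEditSketch {

    /** Sets the text of every <age> element to newAge and returns the serialized document. */
    static String setAllAges(String xml, int newAge) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Modify the in-memory tree directly
            NodeList ages = doc.getElementsByTagName("age");
            for (int i = 0; i < ages.getLength(); i++) {
                ages.item(i).setTextContent(String.valueOf(newAge));
            }

            // Serialize the modified tree back to text
            Transformer tf = TransformerFactory.newInstance().newTransformer();
            tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            tf.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not edit XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(setAllAges("<person><name>Kim</name><age>38</age></person>", 39));
    }
}
```

A SAX handler has no tree to edit; to change a document with SAX you would have to write out a new document yourself as the events arrive.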
2. Parsing XML in Java
Sun provides the Java API for XML Processing (JAXP) as the interface to SAX and DOM. Through JAXP, we can use any JAXP-compatible XML parser.
1. Basic classes and the XML to be parsed
<?xml version="1.0" encoding="UTF-8"?>
<MemInfo class="0501">
    <person no="1">
        <name>James</name>
        <age>32</age>
    </person>
    <person no="2">
        <name>Kim</name>
        <age>38</age>
    </person>
    <person no="3">
        <name>Joe</name>
        <age>24</age>
    </person>
</MemInfo>
import java.util.List;

public class ClassInfo {
    private String no;
    private List<Person> students;

    public String getNo() {
        return no;
    }

    public void setNo(String no) {
        this.no = no;
    }

    public List<Person> getStudents() {
        return students;
    }

    public void setStudents(List<Person> students) {
        this.students = students;
    }
}
public class Person {
    private String no;
    private String name;
    private byte age;

    public String getNo() {
        return no;
    }

    public void setNo(String no) {
        this.no = no;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public byte getAge() {
        return age;
    }

    public void setAge(byte age) {
        this.age = age;
    }
}
2. SAX parsing
import java.util.ArrayList;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import com.alibaba.fastjson.JSONObject;

/**
 * SAX parser
 */
public class MemInfoParser extends DefaultHandler {
    /**
     * log4j2 logger
     */
    protected static Logger log = LogManager.getLogger();

    private ClassInfo cls;
    private Person person;
    /**
     * Name of the element whose text is currently being read; null between elements
     */
    private String preTag;

    /**
     * Called once when the document starts
     */
    @Override
    public void startDocument() throws SAXException {
        cls = new ClassInfo();
        cls.setStudents(new ArrayList<>());
    }

    /**
     * Called once when the document ends
     */
    @Override
    public void endDocument() throws SAXException {
        log.info("Data obtained by parsing: " + JSONObject.toJSONString(cls));
    }

    /**
     * Called at the start of each element (multiple times)
     */
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes)
            throws SAXException {
        switch (qName) {
        case "MemInfo":
            cls.setNo(attributes.getValue("class"));
            break;
        case "person":
            person = new Person();
            person.setNo(attributes.getValue("no"));
            break;
        default:
            break;
        }
        preTag = qName;
    }

    /**
     * Called at the end of each element (multiple times)
     */
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        switch (qName) {
        case "MemInfo":
            break;
        case "person":
            cls.getStudents().add(person);
            person = null;
            break;
        default:
            break;
        }
        preTag = null;
    }

    /**
     * Called for text nodes (multiple times)
     */
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // If preTag is null, this is whitespace between elements, so discard it.
        // PS: the SAX parser reports the whitespace between elements as text,
        // for example the whitespace between a closing name tag and the next age tag.
        if (preTag == null)
            return;
        // Text content. Note: a parser is allowed to deliver one text node in
        // several calls; this simple handler assumes short values arrive in one call.
        String text = new String(ch, start, length);
        switch (preTag) {
        case "name":
            person.setName(text);
            break;
        case "age":
            person.setAge(Byte.parseByte(text));
            break;
        default:
            break;
        }
    }
}
Test class:
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.XMLReader;

public class SaxParserTest {
    public static void main(String[] args) throws Exception {
        String path = "/data/workspace/tec-demo/src/main/java/cn/tinyf/demo/xml/sax/MemInfo.xml";
        // create a parsing factory
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // create parser
        SAXParser parser = factory.newSAXParser();
        // get the reader
        XMLReader reader = parser.getXMLReader();
        // set the content handler
        MemInfoParser handler = new MemInfoParser();
        reader.setContentHandler(handler);
        // read xml document
        reader.parse(path);
    }
}
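One caveat about the handler above: the SAX specification allows a parser to deliver a single run of text in several characters() calls, so reading the value inside characters() can occasionally see only part of it. A common workaround, sketched here under the assumption that only the name elements are of interest, is to accumulate text in a buffer and read it in endElement(). BufferedTextHandler and parseNames are hypothetical names for this example and are not part of the original code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant of the handler, for illustration only
class BufferedTextHandler extends DefaultHandler {
    /** Collects the text of the element currently being read */
    private final StringBuilder text = new StringBuilder();
    private final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // A new element starts: forget any whitespace collected so far
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so append instead of assigning
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The full text is only known once the element ends
        if ("name".equals(qName)) {
            names.add(text.toString().trim());
        }
        text.setLength(0);
    }

    List<String> getNames() {
        return names;
    }

    /** Convenience method: parses the XML string and returns all <name> values. */
    static List<String> parseNames(String xml) {
        BufferedTextHandler handler = new BufferedTextHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.getNames();
    }

    public static void main(String[] args) {
        String xml = "<MemInfo><person><name>James</name></person>"
                + "<person><name>Kim</name></person></MemInfo>";
        System.out.println(parseNames(xml));
    }
}
```

The same pattern would apply to the age element: parse the buffered string in endElement() rather than in characters().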
3. Reading and writing with DOM
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

import com.alibaba.fastjson.JSONObject;

/**
 * XML parsing - DOM implementation
 */
public class MemInfoParser {
    public static void main(String[] args) {
        String path = "/data/workspace/tec-demo/src/main/java/cn/tinyf/demo/xml/MemInfo.xml";
        System.out.println(JSONObject.toJSONString(parser(path)));
    }

    public static ClassInfo parser(String docPath) {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db;
        try {
            // Get the DocumentBuilder instance from DocumentBuilderFactory
            db = dbf.newDocumentBuilder();
            // Get the DOM document instance from the XML document
            Document doc = db.parse(new File(docPath));
            /*
             * Create related objects to store XML data
             */
            ClassInfo cls = new ClassInfo();
            List<Person> stuList = new ArrayList<>();
            cls.setStudents(stuList);
            // Get the class information from the root element
            cls.setNo(doc.getDocumentElement().getAttribute("class"));
            /*
             * Get all student nodes and traverse them to get the data
             */
            NodeList stuNodes = doc.getElementsByTagName("person");
            int len = stuNodes.getLength();
            for (int i = 0; i < len; i++) {
                Element stu = (Element) stuNodes.item(i);
                Node eltName = stu.getElementsByTagName("name").item(0);
                Node eltAge = stu.getElementsByTagName("age").item(0);
                Person person = new Person();
                person.setName(eltName.getFirstChild().getNodeValue());
                person.setNo(stu.getAttribute("no"));
                person.setAge(Byte.parseByte(eltAge.getFirstChild().getNodeValue()));
                stuList.add(person);
            }
            return cls;
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }
}
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/**
 * XML generation - DOM way
 */
public class MemInfoBuilder {
    /**
     * log4j2 logger
     */
    protected static Logger log = LogManager.getLogger();

    public static void main(String[] args) {
        String xmlPath = "/data/workspace/tec-demo/src/main/java/cn/tinyf/demo/xml/dom/dom-data.xml";
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db;
        try {
            // Get the DocumentBuilder instance from DocumentBuilderFactory
            db = dbf.newDocumentBuilder();
            // Create a new, empty DOM document
            Document doc = db.newDocument();
            /*
             * Generate document tree
             */
            // root node
            Element root = doc.createElement("MemInfo");
            // set root node attributes
            root.setAttribute("class", "0501");
            // Add child node data to the root node
            root.appendChild(createStuElement(doc, "1", "James", 32));
            root.appendChild(createStuElement(doc, "2", "Kim", 38));
            root.appendChild(createStuElement(doc, "3", "Joe", 24));
            // add the root node to the document tree
            doc.appendChild(root);
            /*
             * Prepare to generate the file
             */
            // Mark the document as standalone, that is, no DTD or schema describes it;
            // the standalone attribute then does not appear in the output
            doc.setXmlStandalone(true);
            // Create TransformerFactory object
            TransformerFactory tff = TransformerFactory.newInstance();
            // Create Transformer object
            Transformer tf = tff.newTransformer();
            // Indent the output
            tf.setOutputProperty(OutputKeys.INDENT, "yes");
            // Output to file; try-with-resources closes the stream afterwards
            try (OutputStream out = new FileOutputStream(xmlPath)) {
                tf.transform(new DOMSource(doc), new StreamResult(out));
            }
        } catch (ParserConfigurationException | IOException | TransformerException e) {
            log.error("Failed to generate the XML file", e);
        }
    }

    private static Element createStuElement(Document doc, String no, String name, int age) {
        Element stuElem = doc.createElement("person");
        stuElem.setAttribute("no", no);
        // create name node
        Element nameElem = doc.createElement("name");
        nameElem.appendChild(doc.createTextNode(name));
        // create age node
        Element ageElem = doc.createElement("age");
        ageElem.appendChild(doc.createTextNode(String.valueOf(age)));
        // Add the name and age nodes to the student node and return it
        stuElem.appendChild(nameElem);
        stuElem.appendChild(ageElem);
        return stuElem;
    }
}
3. Other parsers
1. JDOM
JDOM is an open-source, tree-based library that uses pure Java to parse, generate, serialize, and otherwise manipulate XML documents.
JDOM can work with existing XML technologies such as the Simple API for XML (SAX) and the Document Object Model (DOM).
2. dom4j
dom4j is an open-source Java XML API for reading and writing XML files, often described as an evolution of JDOM. It is widely regarded as fast, feature-rich, and easy to use, and it is often reported to outperform the DOM implementation that ships with the JDK.