dom4j and XML document manipulation

dom4j overview

1. dom4j is an open source XML parsing library produced by dom4j.org. It runs on the Java platform, uses the Java Collections Framework, and fully supports DOM, SAX and JAXP.

　　dom4j's most distinctive feature is its heavy use of interfaces. Its main interfaces are defined in the org.dom4j package:

Attribute	Defines an XML attribute.
Branch	Defines the common behavior of nodes that can contain child nodes, such as XML elements (Element) and documents (Document).
CDATA	Defines an XML CDATA section.
CharacterData	A marker interface that identifies character-based nodes, such as CDATA, Comment and Text.
Comment	Defines the behavior of an XML comment.
Document	Defines an XML document.
DocumentType	Defines an XML DOCTYPE declaration.
Element	Defines an XML element.
ElementHandler	Defines a handler for Element objects.
ElementPath	Used by ElementHandler to obtain information about the path currently being processed.
Entity	Defines an XML entity.
Node	Defines the polymorphic behavior shared by all XML nodes in dom4j.
NodeFilter	Defines the behavior of a filter or predicate applied to dom4j nodes.
ProcessingInstruction	Defines an XML processing instruction.
Text	Defines an XML text node.
Visitor	Used to implement the Visitor pattern.
XPath	Represents an XPath expression after it has been parsed from a string.

2. The inheritance relationships between these interfaces are as follows:

　　interface java.lang.Cloneable
　　    interface org.dom4j.Node
　　        interface org.dom4j.Attribute
　　        interface org.dom4j.Branch
　　            interface org.dom4j.Document
　　            interface org.dom4j.Element
　　        interface org.dom4j.CharacterData
　　            interface org.dom4j.CDATA
　　            interface org.dom4j.Comment
　　            interface org.dom4j.Text
　　        interface org.dom4j.DocumentType
　　        interface org.dom4j.Entity
　　        interface org.dom4j.ProcessingInstruction

XML documents

1. What is XML?

　　Extensible Markup Language (XML) is a subset of the Standard Generalized Markup Language (SGML). It is a markup language used to give electronic documents a structure.

　　"Extensible" means that you can define your own tags. Note that tags must come in pairs, for example <school></school>.

　　XML is used for configuration files, for storing data, and for transmitting data.

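2. XML structure

　　<?xml version="1.0" encoding="UTF-8"?> is the XML declaration (header). XML 1.0 does not strictly require it, but it is recommended, and when present it must be the very first thing in the document.
　　<?xml ... ?>: marks the XML declaration of the document
　　version: the XML version
　　encoding: the character encoding of the document

　　The document content follows the declaration.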

3. XML writing rules
　　1. Unlike HTML, XML is case sensitive: <School> and <school> are different tags.
　　2. Names beginning with "xml" (in any letter case) are reserved and should not be used as tag names.
　　3. Tags must be properly nested.
　　4. Tag names cannot begin with a digit.
　　5. A document has exactly one root tag.
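
Most of these rules are well-formedness constraints, so a parser can check them for you. The following is a minimal sketch that uses only the JDK's built-in DOM parser rather than dom4j; the class name WellFormedCheck and the sample strings are made up for illustration, and the rule about reserved names is not covered.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {

    /** Returns true if the text parses as well-formed XML, false if the parser rejects it. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler keeps the parser from printing errors to stderr;
            // fatal errors are still thrown as SAXException.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problem
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<school><class/></school>")); // true
        System.out.println(isWellFormed("<School></school>"));         // false: case mismatch
        System.out.println(isWellFormed("<a><b></a></b>"));            // false: bad nesting
        System.out.println(isWellFormed("<1school></1school>"));       // false: starts with a digit
        System.out.println(isWellFormed("<a/><b/>"));                  // false: two root tags
    }
}
```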

4. Reading XML documents

　　Step 1: obtain the Document object

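import java.io.File;
import java.net.URL;
import org.dom4j.Document;
import org.dom4j.io.SAXReader;

// Loads a Document from a file path; returns null if reading or parsing fails.
public static Document load(String filename) {
    Document document = null;
    try {
        SAXReader saxReader = new SAXReader();
        document = saxReader.read(new File(filename)); // read the XML file to obtain the Document object
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return document;
}

// Loads a Document from a URL; returns null if reading or parsing fails.
public static Document load(URL url) {
    Document document = null;
    try {
        SAXReader saxReader = new SAXReader();
        document = saxReader.read(url); // read the XML resource to obtain the Document object
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return document;
}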

　　Step 2: obtain the root element

Element root = document.getRootElement();

　　Step 3: iterate over the child elements of the root

for (Iterator it = root.elementIterator(); it.hasNext();) {
    Element element = (Element) it.next();
    // do something
}

　　Step 4: access the content of a node

String text = element.getText();

5. Commonly used methods

　　5.1 Document objects

　　　　1. Read an XML file to obtain the Document object.

SAXReader reader = new SAXReader();
Document document = reader.read(new File("input.xml"));

　　　　2. Parse XML in text form to obtain the Document object.

String text = "<members></members>";
Document document = DocumentHelper.parseText(text);

　　　　3. Create a Document object programmatically.

Document document = DocumentHelper.createDocument();
Element root = document.addElement("members"); // create the root element

　　5.2 Nodes

　　　　1. Obtain the root element of the document.

Element root = document.getRootElement();

　　　　2. Obtain a single child element of a node.

Element memberElm = root.element("member"); // "member" is the element name

　　　　3. Get the text of a node.

String text = memberElm.getText();
String nameText = root.elementText("name"); // text of the child element "name" under root

　　　　4. Get all child elements with a given name under a node and iterate over them.

List nodes = root.elements("member");
for (Iterator it = nodes.iterator(); it.hasNext();) {
    Element elm = (Element) it.next();
    // do something
}

　　　　5. Iterate over all child elements of a node.

for (Iterator it = root.elementIterator(); it.hasNext();) {
    Element element = (Element) it.next();
    // do something
}

　　　　6. Add a child element to a node.

Element ageElm = newMemberElm.addElement("age");

　　　　7. Set the text of a node.

ageElm.setText("29");

　　　　8. Delete a node.

parentElm.remove(childElm); // childElm is the node to delete, parentElm is its parent

　　　　9. Add a CDATA section.

Element contentElm = infoElm.addElement("content");
contentElm.addCDATA(diary.getContent());

　　5.3 Attributes

　　　　1. Obtain a specified attribute of a node.

Element root = document.getRootElement();
Attribute attribute = root.attribute("size"); // "size" is the attribute name

　　　　2. Obtain the text of an attribute.

String text = attribute.getText();
// Value of the firstname attribute of the child element "name" under root.
String text2 = root.element("name").attributeValue("firstname");

　　　　3. Iterate over all attributes of a node.

Element root = document.getRootElement();
for (Iterator it = root.attributeIterator(); it.hasNext();) {
    Attribute attribute = (Attribute) it.next();
    String text = attribute.getText();
    System.out.println(text);
}

　　　　4. Add an attribute with a value to a node.

newMemberElm.addAttribute("name", "sitinspring");

　　　　5. Set the text of an attribute.

Attribute attribute = root.attribute("name");
attribute.setText("sitinspring");

　　　　6. Delete an attribute.

Attribute attribute = root.attribute("size"); // "size" is the attribute name
root.remove(attribute);

　　5.4 Writing documents to XML files

　　　　1. If the document contains only ASCII text, write it directly without setting an encoding.

XMLWriter writer = new XMLWriter(new FileWriter("output.xml"));
writer.write(document);
writer.close();

　　　　2. If the document contains Chinese, set the encoding and then write.

OutputFormat format = OutputFormat.createPrettyPrint();
format.setEncoding("GBK"); // specify the XML encoding
// Pass an OutputStream so that XMLWriter encodes the bytes as GBK itself;
// a FileWriter would use the platform default charset, which may not match the declared encoding.
XMLWriter writer = new XMLWriter(new FileOutputStream("output.xml"), format);
writer.write(document);
writer.close();

　　5.5 Converting between strings and XML

　　　　1. Convert a string to XML.

String text = "<members> <member>sitinspring</member> </members>";
Document document = DocumentHelper.parseText(text);

　　　　2. Convert an XML document or node to a string.

SAXReader reader = new SAXReader();
Document document = reader.read(new File("input.xml"));
Element root = document.getRootElement();

String docXmlText = document.asXML();
String rootXmlText = root.asXML();

Element memberElm = root.element("member");
String memberXmlText = memberElm.asXML();

XPath

1. XPath queries in dom4j require jaxen-xx-xx.jar on the classpath.

2. Typical usage

List list = document.selectNodes("/books/book/@show");

3. Syntax

　　1. Selecting nodes

　　XPath uses path expressions to select nodes in an XML document. A node is selected by following a path or a sequence of steps.

　　Common path expressions:

Expression	Description
nodename	Selects the child elements of the current node that are named nodename
/	Selects from the root node
//	Selects matching nodes regardless of their position: anywhere in the document when it starts the expression, otherwise anywhere below the preceding step
.	Selects the current node
..	Selects the parent of the current node
@	Selects attributes

　　Example:

Path expression	Result
bookstore	Selects the bookstore child elements of the current node
/bookstore	Selects the root element bookstore
bookstore/book	Selects all book elements that are children of bookstore
//book	Selects all book elements, regardless of where they are in the document
bookstore//book	Selects all book elements that are descendants of bookstore, regardless of where they are beneath it
//@lang	Selects all attributes named lang
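
The expressions in the table are standard XPath 1.0, so they can be tried without dom4j or Jaxen. The sketch below is an assumption-laden stand-in: it uses the JDK's built-in javax.xml.xpath API instead of dom4j's selectNodes, and the class name PathExpressionDemo and the sample document (including the extra shelf element) are invented for illustration. Relative expressions are evaluated with the document node as the current node.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PathExpressionDemo {

    // Hypothetical sample: two books directly under bookstore, one nested deeper.
    static final String XML =
        "<bookstore>"
        + "<book><title lang=\"eng\">Harry Potter</title></book>"
        + "<book><title lang=\"eng\">Learning XML</title></book>"
        + "<shelf><book><title>XQuery Kick Start</title></book></shelf>"
        + "</bookstore>";

    /** Evaluates expr against the sample document and returns the number of selected nodes. */
    static int count(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad expression: " + expr, e);
        }
    }

    public static void main(String[] args) {
        String[] exprs = {"bookstore", "/bookstore", "bookstore/book", "//book", "bookstore//book", "//@lang"};
        for (String expr : exprs) {
            System.out.println(expr + " -> " + count(expr) + " node(s)");
        }
    }
}
```

Note how bookstore/book matches only the two direct children, while //book and bookstore//book also reach the book nested inside shelf.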

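　　2. Predicates

Path expression	Result
/bookstore/book[1]	Selects the first book element that is a child of bookstore.
/bookstore/book[last()]	Selects the last book element that is a child of bookstore.
/bookstore/book[last()-1]	Selects the second-to-last book element that is a child of bookstore.
/bookstore/book[position()<3]	Selects the first two book elements that are children of bookstore.
//title[@lang]	Selects all title elements that have an attribute named lang.
//title[@lang='eng']	Selects all title elements whose lang attribute has the value eng.
/bookstore/book[price>35.00]	Selects all book elements under bookstore whose price child element has a value greater than 35.00.
/bookstore/book[price>35.00]/title	Selects the title elements of all book elements under bookstore whose price child element has a value greater than 35.00.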
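
To see how predicates narrow a selection, here is a small sketch under the same kind of assumptions: it relies on the JDK's javax.xml.xpath rather than dom4j with Jaxen, and the class name PredicateDemo and the three-book sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PredicateDemo {

    // Hypothetical sample: the third title has no lang attribute.
    static final String XML =
        "<bookstore>"
        + "<book><title lang=\"eng\">Harry Potter</title><price>29.99</price></book>"
        + "<book><title lang=\"fr\">Learning XML</title><price>39.95</price></book>"
        + "<book><title>XQuery Kick Start</title><price>49.99</price></book>"
        + "</bookstore>";

    /** Returns the text content of every node selected by expr, in document order. */
    static List<String> texts(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad expression: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(texts("/bookstore/book[1]/title"));            // [Harry Potter]
        System.out.println(texts("/bookstore/book[last()]/title"));       // [XQuery Kick Start]
        System.out.println(texts("/bookstore/book[position()<3]/title")); // [Harry Potter, Learning XML]
        System.out.println(texts("//title[@lang='eng']"));                // [Harry Potter]
        System.out.println(texts("/bookstore/book[price>35.00]/title"));  // [Learning XML, XQuery Kick Start]
    }
}
```

Positions in XPath predicates start at 1, not 0, which is why book[1] is the first book.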

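　　3. Selecting unknown nodes

　　　　XPath wildcards can be used to select XML nodes whose names are not known in advance.

Wildcard	Description
*	Matches any element node
@*	Matches any attribute node
node()	Matches any node of any type

　　　　Example

Path expression	Result
/bookstore/*	Selects all child elements of the bookstore element
//*	Selects all elements in the document
//title[@*]	Selects all title elements that have at least one attribute.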
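
A quick way to check what each wildcard matches is to count the selected nodes. This is only a sketch: it uses the JDK's javax.xml.xpath in place of dom4j, and WildcardDemo and its sample document are made-up names and data. The sample has no whitespace between tags, so node() does not pick up stray whitespace text nodes.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WildcardDemo {

    // Hypothetical sample: 7 elements and 2 attributes (id and lang) in total.
    static final String XML =
        "<bookstore>"
        + "<book id=\"b1\"><title lang=\"eng\">Harry Potter</title><price>29.99</price></book>"
        + "<book><title>Learning XML</title><price>39.95</price></book>"
        + "</bookstore>";

    /** Returns how many nodes expr selects in the sample document. */
    static int count(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad expression: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/bookstore/*"));                    // 2: the two book elements
        System.out.println(count("//*"));                             // 7: every element
        System.out.println(count("//title[@*]"));                     // 1: only the first title has an attribute
        System.out.println(count("//@*"));                            // 2: id and lang
        System.out.println(count("/bookstore/book[1]/title/node()")); // 1: the text node inside the title
    }
}
```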

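　　4. Selecting several paths

　　　　Using the "|" operator in a path expression lets you select several paths at once.

　　　　Example

Path expression	Result
//book/title | //book/price	Selects all title and price elements of all book elements.
//title | //price	Selects all title and price elements in the document.
/bookstore/book/title | //price	Selects all title elements of the book elements under bookstore, plus all price elements in the document.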
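
The union operator merges node sets and keeps each node only once. The following sketch illustrates that with the JDK's javax.xml.xpath (again a stand-in for dom4j with Jaxen); the class name UnionDemo and the sample document with its extra magazine element are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class UnionDemo {

    // Hypothetical sample: two books and one magazine, each with a title and a price.
    static final String XML =
        "<bookstore>"
        + "<book><title>Harry Potter</title><price>29.99</price></book>"
        + "<book><title>Learning XML</title><price>39.95</price></book>"
        + "<magazine><title>XML Monthly</title><price>5.00</price></magazine>"
        + "</bookstore>";

    /** Returns how many distinct nodes expr selects in the sample document. */
    static int count(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad expression: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//book/title | //book/price"));     // 4: titles and prices of books only
        System.out.println(count("//title | //price"));               // 6: includes the magazine
        System.out.println(count("/bookstore/book/title | //price")); // 5: 2 book titles + 3 prices
        System.out.println(count("//price | //price"));               // 3: duplicates are not counted twice
    }
}
```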

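　　5. XPath axes

　　　　An axis defines a node set relative to the current node.

Axis name	Result
ancestor	Selects all ancestors (parent, grandparent, and so on) of the current node
ancestor-or-self	Selects all ancestors (parent, grandparent, and so on) of the current node and the current node itself
attribute	Selects all attributes of the current node
child	Selects all children of the current node.
descendant	Selects all descendants (children, grandchildren, and so on) of the current node.
descendant-or-self	Selects all descendants (children, grandchildren, and so on) of the current node and the current node itself.
following	Selects all nodes in the document after the closing tag of the current node.
namespace	Selects all namespace nodes of the current node
parent	Selects the parent of the current node.
preceding	Selects all nodes in the document that come before the start tag of the current node, excluding its ancestors.
preceding-sibling	Selects all siblings that come before the current node.
self	Selects the current node.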

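　　6. Step syntax: axisname::nodetest[predicate]

　　　　Example

Example	Result
child::book	Selects all book nodes that are children of the current node
attribute::lang	Selects the lang attribute of the current node
child::*	Selects all child elements of the current node
attribute::*	Selects all attributes of the current node
child::text()	Selects all text node children of the current node
child::node()	Selects all children of the current node
descendant::book	Selects all book descendants of the current node
ancestor::book	Selects all book ancestors of the current node
ancestor-or-self::book	Selects all book ancestors of the current node, and the current node itself if it is a book node
child::*/child::price	Selects all price grandchildren of the current node.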
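
Because every axis is defined relative to a current node, the sketch below first selects a context node and then evaluates a step from it. It is an illustration only: it uses the JDK's javax.xml.xpath instead of dom4j, and AxisDemo, its two-argument count helper, and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AxisDemo {

    // Hypothetical sample: two books, each with a title (with lang) and a price.
    static final String XML =
        "<bookstore>"
        + "<book><title lang=\"eng\">Harry Potter</title><price>29.99</price></book>"
        + "<book><title lang=\"eng\">Learning XML</title><price>39.95</price></book>"
        + "</bookstore>";

    /** Picks the node selected by contextExpr as current node, then counts what stepExpr selects from it. */
    static int count(String contextExpr, String stepExpr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(
                contextExpr, new InputSource(new StringReader(XML)), XPathConstants.NODE);
            NodeList nodes = (NodeList) xpath.evaluate(stepExpr, context, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad expression", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/bookstore", "child::book"));                 // 2
        System.out.println(count("/bookstore/book[1]/title", "attribute::lang")); // 1
        System.out.println(count("/bookstore/book[1]/price", "ancestor::book")); // 1
        System.out.println(count("/bookstore", "child::*/child::price"));       // 2
        System.out.println(count("/bookstore/book[1]", "following::*"));        // 3: second book, its title and price
        System.out.println(count("/bookstore/book[2]", "preceding::*"));        // 3: bookstore is an ancestor, so excluded
    }
}
```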

　　7. XPath operators

Operator	Description	Example	Return value
|	Computes the union of two node sets	//book | //cd	A node set containing all book and cd elements
+	Addition	6 + 4	10
-	Subtraction	6 - 4	2
*	Multiplication	6 * 4	24
div	Division	8 div 4	2
=	Equal	price=9.80	true if price is 9.80; false if price is 9.90.
!=	Not equal	price!=9.80	true if price is 9.90; false if price is 9.80.
<	Less than	price<9.80	true if price is 9.00; false if price is 9.90.
<=	Less than or equal to	price<=9.80	true if price is 9.00; false if price is 9.90.
>	Greater than	price>9.80	true if price is 9.90; false if price is 9.80.
>=	Greater than or equal to	price>=9.80	true if price is 9.90; false if price is 9.70.
or	Or	price=9.80 or price=9.70	true if price is 9.80; false if price is 9.50.
and	And	price>9.00 and price<9.90	true if price is 9.80; false if price is 8.50.
mod	Remainder of a division	5 mod 2	1
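
Arithmetic and comparison operators return numbers and booleans rather than node sets. The sketch below evaluates a few of the table's examples; it assumes the JDK's javax.xml.xpath API instead of dom4j, and OperatorDemo, its helper methods, and the one-book sample document (price 9.80) are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class OperatorDemo {

    // Hypothetical sample document with a single price of 9.80.
    static final String XML = "<book><price>9.80</price></book>";

    /** Evaluates expr against the sample document and converts the result to the requested XPath type. */
    static Object eval(String expr, QName type) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(XML)), type);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad expression: " + expr, e);
        }
    }

    static double number(String expr) {
        return (Double) eval(expr, XPathConstants.NUMBER);
    }

    static boolean bool(String expr) {
        return (Boolean) eval(expr, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) {
        System.out.println(number("6 + 4"));   // 10.0
        System.out.println(number("8 div 4")); // 2.0
        System.out.println(number("5 mod 2")); // 1.0
        System.out.println(bool("/book/price = 9.80"));                         // true
        System.out.println(bool("/book/price != 9.80"));                        // false
        System.out.println(bool("/book/price > 9.00 and /book/price < 9.90")); // true
        System.out.println(bool("/book/price = 9.70 or /book/price = 9.50"));  // false
    }
}
```

XPath numbers are floating point, which is why the arithmetic results print as 10.0 rather than 10.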