java-xml parsing

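Tags

Syntax:

<student></student>  start tag, tag body content, end tag
1) <student/> or <student></student> is an empty tag: it has no body content.
2) XML tag names are case-sensitive.
3) XML tags must be correctly paired.
4) XML tag names must not contain spaces.
5) XML tag names must not start with a digit.
6) Note: an XML document has exactly one root tag.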

Attributes

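Syntax:

<Student name="eric">student</Student>
Note:
1) Attribute values must be enclosed in quotes. The quotes cannot be omitted, and single and double quotes cannot be mixed around one value.
2) A tag may carry several attributes, but no attribute name may be repeated.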
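The tag and attribute rules above are what an XML parser enforces as well-formedness. As a minimal sketch, the check below feeds a few strings to the JDK's built-in JAXP parser (not Dom4j) and reports which ones are accepted. The class name `WellFormedCheck` and the helper `isWellFormed` are illustrative names, not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    /** Returns true if the JDK parser accepts the text as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the document.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<student></student>",          // paired tags
            "<student/>",                   // empty tag
            "<student></Student>",          // case mismatch
            "<1student/>",                  // name starts with a digit
            "<a/><b/>",                     // two root tags
            "<Student name=eric/>",         // unquoted attribute value
            "<Student id=\"1\" id=\"2\"/>"  // repeated attribute name
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```

Only the first two samples are accepted. Each of the others breaks one of the numbered rules.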

Escape characters

Special character  Escape sequence
 <                 &lt;
 >                 &gt;
 "                 &quot;
 '                 &apos;
 &                 &amp;
 space             &nbsp; (HTML only; XML does not predefine it, so use &#160; instead)
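To make the table concrete, here is a small sketch that escapes text by hand and then lets the JDK parser turn the entities back into the original characters. `XmlEscape`, `escape` and `roundTrip` are hypothetical names chosen for this example. Besides the characters listed, the sketch also handles the apostrophe (`&apos;`), which XML predefines too.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlEscape {
    /** Replaces the five characters XML treats specially with their predefined entities. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Wraps the escaped text in a tag, parses it, and returns the text the parser sees. */
    static String roundTrip(String text) {
        try {
            String xml = "<msg>" + escape(text) + "</msg>";
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "1 < 2 & 3 > 2";
        System.out.println(escape(raw));    // the escaped form stored in the file
        System.out.println(roundTrip(raw)); // the parser restores the original text
    }
}
```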

Document declaration

Syntax:

<?xml version="1.0" encoding="utf-8"?>

CDATA block

Syntax: <![CDATA[ content ]]>

Purpose: content that contains special characters is output exactly as written, without escaping.
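As a quick illustration of a CDATA block, the sketch below parses a tag whose body contains `<`, `>` and `&` inside `<![CDATA[ ... ]]>` and reads the text back unchanged. It uses the JDK's DOM parser rather than Dom4j, and `CdataDemo` and `textOf` are names made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataDemo {
    /** Parses the xml text and returns the text content of its root tag. */
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Inside CDATA the special characters need no escaping.
        String xml = "<code><![CDATA[if (a < b && b > c) {}]]></code>";
        System.out.println(textOf(xml));
    }
}
```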

xml parsing methods

DOM parsing

DOM parsing tools:

1) JAXP (official, from Oracle/Sun)
2) JDOM (unofficial)
3) Dom4J (unofficial)
    Dom4j is the default xml reading tool in the three major frameworks.

DOM parsing principle: the xml parser loads the entire xml document into memory at once and builds a Document object tree there. Node objects on the tree are obtained through the Document object, and the content of the xml document is accessed (and manipulated) through those node objects.
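The principle can be sketched with JAXP, the official tool from the list above: the whole document is parsed into a tree first, and only then is the tree walked through node objects. The class `DomWalk` and its methods are illustrative names, and the sample xml is invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomWalk {
    /** Loads the whole document into memory, then lists tag names in document order. */
    static List<String> elementNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            collect(doc.getDocumentElement(), names);
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Depth-first walk: record this tag, then recurse into child tags only. */
    private static void collect(Element elem, List<String> names) {
        names.add(elem.getTagName());
        NodeList children = elem.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) { // skip text, comments and other node types
                collect((Element) child, names);
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<contactList><contact id=\"001\"><name>eric</name><age>20</age></contact>"
                + "<contact id=\"002\"><name>jacky</name></contact></contactList>";
        System.out.println(elementNames(xml));
    }
}
```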

SAX parsing

SAX parsing tools:

1) SAX parser (official, from Oracle/Sun)

Dom4j Tools

//1. Create an xml parser object
SAXReader reader = new SAXReader();
//2. Read the xml document; returns a Document object
Document doc = reader.read(new File("./src/contact.xml"));
//3. nodeIterator: returns the direct child nodes of the current node (no grandchildren or deeper)
    Iterator<Node> it = doc.nodeIterator();
    while(it.hasNext()){//is there another node?
        Node node = it.next();//take the next node
        String name = node.getName();//get the node name
        //System.out.println(name);

        //System.out.println(node.getClass());
        //Go on to read the child nodes below it.
        //Only tag (element) nodes have child nodes,
        //so check whether the current node is a tag node.
        if(node instanceof Element){
            Element elem = (Element)node;
            Iterator<Node> it2 = elem.nodeIterator();
            while(it2.hasNext()){
                Node n2 = it2.next();
                System.out.println(n2.getName());
            }
        }
    }

/**
 * Traverse all nodes of the xml document
 * @throws Exception
 */
@Test
public void test2() throws Exception{
    //1. Read the xml document; returns a Document object
    SAXReader reader = new SAXReader();
    Document doc = reader.read(new File("./src/contact.xml"));

    //Get the root tag
    Element rootElem = doc.getRootElement();

    getChildNodes(rootElem);

}

/**
 * Visit every node below the given tag (children, grandchildren and so on)
 * @param elem
 */
private void getChildNodes(Element elem){
    System.out.println(elem.getName());

    //Get the child nodes
    Iterator<Node> it = elem.nodeIterator();
    while(it.hasNext()){
        Node node = it.next();

        //1. Check whether the node is a tag node
        if(node instanceof Element){
            Element el = (Element)node;
            //Recurse
            getChildNodes(el);
        }
    }
}

/**
 * Get tags
 */
@Test
public void test3() throws Exception{
    //1. Read the xml document; returns a Document object
    SAXReader reader = new SAXReader();
    Document doc = reader.read(new File("./src/contact.xml"));

    //2. Get the root tag
    Element  rootElem = doc.getRootElement();
    //Get the tag name
    String name = rootElem.getName();
    System.out.println(name);

    //3. Get the first child tag with the given name under the current tag
    /*
    Element contactElem = rootElem.element("contact");
    System.out.println(contactElem.getName());
    */

    //4. Get all child tags with the given name under the current tag
    /*
    Iterator<Element> it = rootElem.elementIterator("contact");
    while(it.hasNext()){
        Element elem = it.next();
        System.out.println(elem.getName());
    }
    */

    //5. Get all child tags under the current tag
    List<Element> list = rootElem.elements();
    //Ways to traverse a List:
    //1) classic for loop  2) enhanced for loop  3) iterator
    /*for(int i=0;i<list.size();i++){
        Element e = list.get(i);
        System.out.println(e.getName());
    }*/

    /*for(Element e:list){
        System.out.println(e.getName());
    }*/
    /*
    Iterator<Element> it = list.iterator(); //Eclipse shortcut: Ctrl+2, release, then L
    while(it.hasNext()){
        Element elem = it.next();
        System.out.println(elem.getName());
    }*/

    //Get a more deeply nested tag (these methods can only descend one level at a time)
    Element nameElem = doc.getRootElement().
                element("contact").element("name");
    System.out.println(nameElem.getName());

}

/**
 * Get attributes
 */
@Test
public void test4() throws Exception{
    //1. Read the xml document; returns a Document object
    SAXReader reader = new SAXReader();
    Document doc = reader.read(new File("./src/contact.xml"));

    //Getting attributes: first get the tag object that carries the attribute, then get the attribute
    //1. Get the tag object
    Element contactElem = doc.getRootElement().element("contact");
    //2. Get the attributes
    //2.1 Get the value of the attribute with the given name
    /*
    String idValue = contactElem.attributeValue("id");
    System.out.println(idValue);
    */

    //2.2 Get the attribute object with the given name
    /*Attribute idAttr = contactElem.attribute("id");
    //getName: attribute name    getValue: attribute value
    System.out.println(idAttr.getName() +"=" + idAttr.getValue());*/

    //2.3 Get all attribute objects as a List
    /*List<Attribute> list = contactElem.attributes();
    //Traverse the attributes
    for (Attribute attr : list) {
        System.out.println(attr.getName()+"="+attr.getValue());
    }*/

    //2.4 Get all attribute objects as an iterator
    Iterator<Attribute> it = contactElem.attributeIterator();
    while(it.hasNext()){
        Attribute attr = it.next();
        System.out.println(attr.getName()+"="+attr.getValue());
    }

}

/**
 * Get text
 */
@Test
public void test5() throws Exception{
    //1. Read the xml document; returns a Document object
    SAXReader reader = new SAXReader();

    Document doc = reader.read(new File("./src/contact.xml"));


    /**
     * Note: spaces and line breaks are also part of the xml content
     */
    String content = doc.getRootElement().getText();
    System.out.println(content);


    //Getting text: first get the tag, then get the text of that tag
    Element nameElem =
        doc.getRootElement().element("contact").element("name");
    //1. Get the text
    String text = nameElem.getText();
    System.out.println(text);

    //2. Get the text content of the child tag with the given name
    String text2 =
        doc.getRootElement().element("contact").elementText("phone");
    System.out.println(text2);

}

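Modifying an xml document with Dom4j

Document doc = new SAXReader().read(new File("./src/contact.xml"));
    //Where to write the output file
    FileOutputStream out = new FileOutputStream("e:/contact.xml");
    /**
     * 1. Choose the output format
     */
    OutputFormat format = OutputFormat.createCompactFormat(); //Compact format: no extra spaces or line breaks. Use it in production.
    //OutputFormat format = OutputFormat.createPrettyPrint(); //Pretty format: with indentation and line breaks. Use it while developing and debugging.
    /**
     * 2. Set the encoding of the generated xml document.
     *    This sets both the encoding the file is saved in and the encoding named in the xml declaration (the one used when the file is parsed).
     *    Conclusion: xml documents written this way avoid garbled Chinese characters.
     */
    format.setEncoding("utf-8");


    //1. Create the writer object
    XMLWriter writer = new XMLWriter(out,format);

    //2. Write the document
    writer.write(doc);
    //3. Close the stream
    writer.close();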
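
The point about encoding can be checked without Dom4j. The sketch below swaps in the JDK's own serializer (`javax.xml.transform.Transformer`) in place of `XMLWriter`. It writes a document as UTF-8 bytes, then parses those bytes back to confirm the Chinese text survives. `EncodingDemo`, `writeUtf8` and `readBack` are hypothetical names for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class EncodingDemo {
    /** Builds <name>text</name> and serializes it as UTF-8 bytes. */
    static byte[] writeUtf8(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element name = doc.createElement("name");
            name.setTextContent(text);
            doc.appendChild(name);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Controls both the bytes written and the encoding named in the declaration.
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses the bytes; the parser picks the charset from the xml declaration. */
    static String readBack(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String liSi = "\u674e\u56db"; // the name used in the modify example, as Unicode escapes
        byte[] bytes = writeUtf8(liSi);
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
        System.out.println(readBack(bytes).equals(liSi));
    }
}
```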

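Add:

    DocumentHelper.createDocument()  create a document
                addElement("name")  add a tag
                addAttribute("name", "value")  add an attribute

Modify:

                Attribute.setValue("value")  change an attribute value
                Element.addAttribute("existing attribute name", "value")  overwrite the value of the attribute with that name
                Element.setText("content")  change the text content

Delete:

                Element.detach();  delete a tag
                Attribute.detach();  delete an attribute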


/**
 * Add: document, tags, attributes
 */
@Test
public void test1() throws Exception{
    /**
     * 1. Create the document
     */
    Document doc = DocumentHelper.createDocument();
    /**
     * 2. Add tags
     */
    Element rootElem = doc.addElement("contactList");
    //doc.addElement("contactList");
    Element contactElem = rootElem.addElement("contact");
    contactElem.addElement("name");
    /**
     * 3. Add attributes
     */
    contactElem.addAttribute("id", "001");
    contactElem.addAttribute("name", "eric");

    //Write the modified Document object out to an xml file
    FileOutputStream out = new FileOutputStream("e:/contact.xml");
    OutputFormat format = OutputFormat.createPrettyPrint();
    format.setEncoding("utf-8");
    XMLWriter writer = new XMLWriter(out,format);
    writer.write(doc);
    writer.close();
}

/**
 * Modify: attribute values, text
 * @throws Exception
 */
@Test
public void test2() throws Exception{
    Document doc = new SAXReader().read(new File("./src/contact.xml"));

    /**
     * Option 1: modify an attribute value   1. get the tag object  2. get the attribute object  3. change the attribute value
     */
    //1.1 Get the tag object
    /*
    Element contactElem = doc.getRootElement().element("contact");
    //1.2 Get the attribute object
    Attribute idAttr = contactElem.attribute("id");
    //1.3 Change the attribute value
    idAttr.setValue("003");
    */
    /**
     * Option 2: modify an attribute value
     */
    //1.1 Get the tag object
    /*
    Element contactElem = doc.getRootElement().element("contact");
    //1.2 Change the value by adding an attribute with the same name
    contactElem.addAttribute("id", "004");
    */

    /**
     * Modify text   1. get the tag object  2. change the text
     */
    Element nameElem = doc.getRootElement().
        element("contact").element("name");
    nameElem.setText("李四");



    FileOutputStream out = new FileOutputStream("e:/contact.xml");
    OutputFormat format = OutputFormat.createPrettyPrint();
    format.setEncoding("utf-8");
    XMLWriter writer = new XMLWriter(out,format);
    writer.write(doc);
    writer.close();
}


/**
 * Delete: tags, attributes
 * @throws Exception
 */
@Test
public void test3() throws Exception{
    Document doc = new SAXReader().read(new File("./src/contact.xml"));

    /**
     * 1. Delete a tag     1.1 get the tag object  1.2 delete the tag object
     */
    //1.1 Get the tag object
    /*
    Element ageElem = doc.getRootElement().element("contact")
                .element("age");

    //1.2 Delete the tag object
    ageElem.detach();
    //ageElem.getParent().remove(ageElem);
    */
    /**
     * 2. Delete an attribute   2.1 get the tag object  2.2 get the attribute object  2.3 delete the attribute
     */
    //2.1 Get the tag object:
    //the second contact tag
    Element contactElem = (Element)doc.getRootElement().
        elements().get(1);
    //2.2 Get the attribute object
    Attribute idAttr = contactElem.attribute("id");
    //2.3 Delete the attribute
    idAttr.detach();
    //idAttr.getParent().remove(idAttr);

    FileOutputStream out = new FileOutputStream("e:/contact.xml");
    OutputFormat format = OutputFormat.createPrettyPrint();
    format.setEncoding("utf-8");
    XMLWriter writer = new XMLWriter(out,format);
    writer.write(doc);
    writer.close();
}

xpath

Purpose: quickly locate the node objects you need.

xpath methods

List<Node> selectNodes("xpath expression");  query multiple node objects

Node selectSingleNode("xpath expression");  query a single node object

xPath syntax

/       absolute path    starts from the root of the xml, or steps down to a child element (one level of the hierarchy)

//      relative path    selects elements at any depth, regardless of hierarchy

*       wildcard         matches all elements

[]      condition        selects the elements that satisfy a condition

@       attribute        selects attribute nodes

and     AND relation     logical AND of two conditions (like && in Java)

text()  text             selects text content
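
To see what each piece of syntax selects, the sketch below evaluates a few expressions with the XPath 1.0 engine that ships with the JDK (`javax.xml.xpath`), swapped in here for Dom4j's `selectNodes`. The sample document and the names `XPathDemo`, `count` and `first` are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDemo {
    // Three contacts: two carry an id attribute, one has none.
    static final String XML = "<contactList>"
            + "<contact id=\"001\" name=\"eric\"><name>eric</name></contact>"
            + "<contact id=\"002\"><name>jacky</name></contact>"
            + "<contact><name>rose</name></contact>"
            + "</contactList>";

    private static Document load() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
    }

    /** Number of nodes the expression selects in the sample document. */
    static int count(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expr, load(), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** String value of the first node the expression selects. */
    static String first(String expr) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, load());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] exprs = {
            "/contactList/contact", "//name", "/contactList/*", "/contactList//*",
            "//contact[@id]", "//contact[not(@id)]", "//@id",
            "//contact[@id='001' and @name='eric']", "//name/text()"
        };
        for (String expr : exprs) {
            System.out.println(count(expr) + "  " + expr);
        }
        System.out.println(first("//contact[2]/name"));
        System.out.println(first("//contact[last()]/name"));
    }
}
```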

/**
     * Requirement: delete the Student tag whose id is 2
     */
    Document doc = new SAXReader().read(new File("e:/student.xml"));

    //1. Find the Student tag whose id is 2
    //using xpath
    Element stuElem = (Element)doc.selectSingleNode("//Student[@id='2']");

    //2. Delete the tag (selectSingleNode returns null when nothing matches)
    if(stuElem != null){
        stuElem.detach();
    }

    //3. Write out the xml file
    FileOutputStream out = new FileOutputStream("e:/student.xml");
    OutputFormat format = OutputFormat.createPrettyPrint();
    format.setEncoding("utf-8");
    XMLWriter writer = new XMLWriter(out,format);
    writer.write(doc);
    writer.close();

Document doc = new SAXReader().read(new File("./src/contact.xml"));

    String xpath = "";

    /**
     * 1. /      absolute path    starts from the root of the xml, or steps down to a child element (one level of the hierarchy)
     */
    xpath = "/contactList";
    xpath = "/contactList/contact";

    /**
     * 2. //     relative path    selects elements at any depth, regardless of hierarchy
     */
    xpath = "//contact/name";
    xpath = "//name";

    /**
     * 3. *      wildcard         matches all elements
     */
    xpath = "/contactList/*"; //all direct child tags of the root tag contactList
    xpath = "/contactList//*";//all tags below the root tag contactList, at any depth

    /**
     * 4. []     condition        selects the elements that satisfy a condition
     */
    //contact tags that have an id attribute
    xpath = "//contact[@id]";
    //the second contact tag
    xpath = "//contact[2]";
    //the last contact tag
    xpath = "//contact[last()]";

    /**
     * 5. @      attribute        selects attribute nodes
     */
    xpath = "//@id"; //selects the id attribute nodes; returns Attribute objects
    xpath = "//contact[not(@id)]";//contact tags that have no id attribute
    xpath = "//contact[@id='002']";//contact tags whose id attribute is 002
    xpath = "//contact[@id='001' and @name='eric']";//contact tags whose id is 001 and whose name attribute is eric

    /**
     * 6. text() selects text content
     */
    //the text content under name tags; returns Text objects
    xpath = "//name/text()";
    xpath = "//contact/name[text()='张三']";//name tags whose text is 张三


    List<Node> list = doc.selectNodes(xpath);
    for (Node node : list) {
        System.out.println(node);
    }

sax parsing

DOM parsing principle: load the whole xml document into memory at once, then build the Document tree in memory. This requires more memory.

Disadvantage: it is not suited to reading large xml files, which can easily cause an out-of-memory error.

SAX parsing principle: load a little, read a little, process a little. Its memory requirements are comparatively low.

sax parsing tool

SAX parsing tool: provided by Sun and built into the JDK, in the org.xml.sax.* packages

Core API: the SAXParser class (javax.xml.parsers.SAXParser), which reads and parses an xml file

parse(File f, DefaultHandler dh) method: parses the xml file

Parameter 1: File: the xml file to read.
Parameter 2: DefaultHandler: the SAX event handler. Pass an instance of a DefaultHandler subclass.

public static void main(String[] args)throws Exception {
    //1. Create the SAXParser
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    //2. Read the xml file
    MyDefaultHandler2 handler = new MyDefaultHandler2();
    parser.parse(new File("./src/contact.xml"), handler);
    String content = handler.getContent();
    System.out.println(content);
}

public class MyDefaultHandler2 extends DefaultHandler {
//Accumulates the xml document content
private StringBuffer sb = new StringBuffer();

//Returns the xml content collected so far
public String getContent(){
    return sb.toString();
}


/**
 * Start tag
 * @param qName: the name of the start tag
 * @param attributes: the list of attributes on the start tag
 */
@Override
public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
    sb.append("<"+qName);
    //Check whether there are attributes
    if(attributes!=null){
        for(int i=0;i<attributes.getLength();i++){
            //Get the attribute name
            String attrName = attributes.getQName(i);
            //Get the attribute value
            String attrValue = attributes.getValue(i);
            sb.append(" "+attrName+"=\""+attrValue+"\"");
        }
    }
    sb.append(">");
}

/**
 * Text content
 * @param ch: the parser's character buffer that contains the text
 * @param start: the start position of the current text in the buffer
 * @param length: the length of the current text
 */
@Override
public void characters(char[] ch, int start, int length)
        throws SAXException {
    //Get the text that was just read
    String content = new String(ch,start,length);
    sb.append(content);
}

/**
 * End tag
 * @param qName: the name of the end tag
 */
@Override
public void endElement(String uri, String localName, String qName)
        throws SAXException {
    sb.append("</"+qName+">");
}
}

API of the DefaultHandler class:

void startDocument()  : called when the start of the document is read
void endDocument()  : called when the end of the document is read
void startElement(String uri, String localName, String qName, Attributes attributes)  : called when a start tag is read
void endElement(String uri, String localName, String qName)  : called when an end tag is read
void characters(char[] ch, int start, int length)  : called when text content is read
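
The order in which these callbacks fire is easiest to see by recording them. The sketch below is a minimal trace handler built on the JDK's SAX parser. `SaxEventTrace` and `trace` are illustrative names, and whitespace-only text is skipped to keep the trace short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        events.add("startElement:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) { // ignore the whitespace between tags
            events.add("characters:" + text);
        }
    }

    /** Parses the xml text and returns the callbacks in the order they were invoked. */
    static List<String> trace(String xml) {
        try {
            SaxEventTrace handler = new SaxEventTrace();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String event : trace("<contact id=\"001\"><name>eric</name></contact>")) {
            System.out.println(event);
        }
    }
}
```

The trace starts with startDocument and ends with endDocument, and each tag's end event arrives only after all the events for its content.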

DOM parsing vs SAX parsing

DOM parsing

  1. Principle: loads the xml document all at once, so it is not suitable for reading large files
  2. DOM parsing supports adding, deleting and modifying content freely
  3. DOM parsing can read data at any position, and can go back to earlier content
  4. DOM parsing is object-oriented (Node, Element, Attribute), so the code is relatively simple for Java developers.

SAX parsing

  1. Principle: load a little, read a little, process a little. Suitable for reading large files
  2. SAX parsing can only read
  3. SAX parsing reads strictly in order from top to bottom and cannot go back
  4. SAX parsing is event-based, so the code is relatively complex for Java developers.

Encapsulate xml content into objects

public class MyDefaultHandler3 extends DefaultHandler {
//Stores all Contact objects
private List<Contact> list = new ArrayList<Contact>();

public List<Contact> getList(){
    return list;
}
//Holds the information of one contact
private Contact contact;
/**
 * Approach:
 *  1) Create a Contact object
 *  2) Store the content of each contact tag in the Contact object
 *  3) Put the Contact object into the List
 */
//Temporarily stores the name of the tag currently being read
private String curTag;

@Override
public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
    curTag = qName;
    //On the contact start tag, create a Contact object
    if("contact".equals(qName)){
        contact = new Contact();

        //Set the id value
        contact.setId(attributes.getValue("id"));
    }
}

@Override
public void characters(char[] ch, int start, int length)
        throws SAXException {
    //The current text content.
    //Note: a parser may deliver one piece of text in several characters() calls;
    //this simple version keeps only the last chunk it receives.
    String content = new String(ch,start,length);

    if("name".equals(curTag)){
        contact.setName(content);
    }

    if("age".equals(curTag)){
        contact.setAge(content);
    }

    if("phone".equals(curTag)){
        contact.setPhone(content);
    }

    if("email".equals(curTag)){
        contact.setEmail(content);
    }

    if("qq".equals(curTag)){
        contact.setQq(content);
    }
}

@Override
public void endElement(String uri, String localName, String qName)
        throws SAXException {
    //Reset to null so that the spaces and line breaks between tags are not stored in the object's fields
    curTag = null;
    //On the contact end tag, put the Contact object into the List
    if("contact".equals(qName)){
        list.add(contact);
    }
}
}
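
A variation on the same idea, sketched under assumptions: instead of assigning inside characters(), the handler below buffers text in a StringBuilder and assigns it when the end tag arrives, so text delivered in several chunks is still stored whole. The `Contact` class here is a cut-down stand-in with public fields (the original `Contact` class is not shown), and `ContactSaxDemo`, `ContactHandler` and `parse` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ContactSaxDemo {
    /** Simplified stand-in for the Contact bean. */
    static final class Contact {
        String id;
        String name;
        String phone;
    }

    static final class ContactHandler extends DefaultHandler {
        final List<Contact> contacts = new ArrayList<>();
        private Contact current;
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            text.setLength(0); // start collecting text for this tag
            if ("contact".equals(qName)) {
                current = new Contact();
                current.id = attributes.getValue("id");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called more than once per text run
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("contact".equals(qName)) {
                contacts.add(current);
                current = null;
            } else if (current != null) {
                String value = text.toString().trim();
                if ("name".equals(qName)) {
                    current.name = value;
                } else if ("phone".equals(qName)) {
                    current.phone = value;
                }
            }
        }
    }

    /** Parses the xml text into a list of Contact objects. */
    static List<Contact> parse(String xml) {
        try {
            ContactHandler handler = new ContactHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.contacts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<contactList>\n"
                + "  <contact id=\"001\">\n    <name>eric</name>\n    <phone>123</phone>\n  </contact>\n"
                + "  <contact id=\"002\"><name>jacky</name><phone>456</phone></contact>\n"
                + "</contactList>";
        for (Contact c : parse(xml)) {
            System.out.println(c.id + " " + c.name + " " + c.phone);
        }
    }
}
```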

Origin http://43.154.161.224:23101/article/api/json?id=324703372&siteId=291194637