XML parsing - three ways of parsing XML on Android

Interview question and answer: What are the common ways of parsing XML?


XML is a general-purpose data exchange format. It is independent of platform, language, and system, which makes data integration and interaction much easier. The parsing approaches are the same across language environments; only the syntax differs.

  There are three ways to parse XML: 1. DOM parsing; 2. SAX parsing; 3. Pull parsing.

  The three methods are described in detail below, using the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
    <book id="1">
        <name>A Song of Ice and Fire</name>
        <author>George Martin</author>
        <year>2014</year>
        <price>89</price>
    </book>
    <book id="2">
        <name>Andersen's Fairy Tales</name>
        <year>2004</year>
        <price>77</price>
        <language>English</language>
    </book>    
</bookstore>

1. DOM parsing

  DOM stands for Document Object Model. A DOM-based XML parser converts an XML document into a collection of objects (usually called a DOM tree), and the application works with the document's data by manipulating this object model. Through the DOM interface, the application can access any part of the document at any time, which is why this mechanism is also called random access.

  The DOM interface exposes the information in an XML document through a hierarchical object model: a tree of nodes that mirrors the document structure. Whatever the document describes, be it tabular data, a list of items, or a text document, the model DOM generates is always a tree of nodes. In other words, DOM enforces a tree model for accessing XML. Because XML is itself hierarchical, this representation fits it well.

  The random access offered by the DOM tree gives application code great flexibility: it can read and change any part of the document. However, the DOM parser converts the entire XML document into a tree held in memory, so memory requirements grow with the size and complexity of the document, and traversing a complex tree is also time-consuming. A DOM parser therefore demands more of the machine and is not especially efficient. Even so, because its tree model matches the structure of XML and random access is so convenient, DOM remains widely used.
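
  As a rough illustration of random access, the sketch below parses a trimmed copy of the sample document from a string and reads the second book before the first. The class name DomRandomAccess and the helper childText are made up for this example; only standard JDK calls are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomRandomAccess {
    // Trimmed copy of the sample bookstore document, inlined so the sketch is self-contained.
    static final String XML =
        "<bookstore>"
        + "<book id=\"1\"><name>A Song of Ice and Fire</name><price>89</price></book>"
        + "<book id=\"2\"><name>Andersen's Fairy Tales</name><price>77</price></book>"
        + "</bookstore>";

    // Builds the whole DOM tree in memory.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Hypothetical helper: text of the first <tag> element inside the book at bookIndex.
    static String childText(Document doc, int bookIndex, String tag) {
        Element book = (Element) doc.getElementsByTagName("book").item(bookIndex);
        return book.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        // Any node can be reached in any order once the tree exists.
        System.out.println(childText(doc, 1, "price")); // second book first
        System.out.println(childText(doc, 0, "name"));  // then back to the first
    }
}
```

  Because the tree stays in memory, the order of the two lookups does not matter; a streaming parser would have to read the document twice to do the same.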

    Advantages:

      1. The document becomes a tree structure, which is easy to understand and work with, so the code is easy to write.

      2. The tree is kept in memory during parsing, so it is easy to modify.

    Disadvantages:

      1. Because the whole file is read in at once, memory consumption is relatively high.

      2. If the XML file is large, parsing performance suffers and an out-of-memory error may occur.
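
  The "easy to modify" point can be sketched as follows: change a node in the in-memory tree, then serialize the tree back to text. The class name DomModify and the method updateFirstPrice are invented for this example, and the identity Transformer is just one possible way to write the tree out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomModify {
    // Parses the XML, replaces the text of the first <price> element, and returns the new XML.
    static String updateFirstPrice(String xml, String newPrice) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // The change is made directly on the in-memory tree.
        Element price = (Element) doc.getElementsByTagName("price").item(0);
        price.setTextContent(newPrice);

        // An identity transform serializes the modified tree back to a string.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<bookstore><book id=\"1\"><price>89</price></book></bookstore>";
        System.out.println(updateFirstPrice(xml, "79"));
    }
}
```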

  The parsing code is as follows:

import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DOMTest {
    public static void main(String[] args) {
        // Create a DocumentBuilderFactory
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            // Create a DocumentBuilder
            DocumentBuilder db = dbf.newDocumentBuilder();
            // Load books.xml from the current project directory with DocumentBuilder.parse()
            Document document = db.parse("books.xml");
            // Get the list of all book nodes
            NodeList bookList = document.getElementsByTagName("book");
            // NodeList.getLength() returns the number of book nodes
            System.out.println("Total books: " + bookList.getLength());
            // Visit every book node
            for (int i = 0; i < bookList.getLength(); i++) {
                System.out.println("================= Start of book " + (i + 1) + " =================");
                // item(i) returns one book node; NodeList indexes start at 0
                Node book = bookList.item(i);
                // Get all attributes of the book node
                NamedNodeMap attrs = book.getAttributes();
                System.out.println("Book " + (i + 1) + " has " + attrs.getLength() + " attribute(s)");
                // Visit every attribute of the book
                for (int j = 0; j < attrs.getLength(); j++) {
                    // item(index) returns one attribute of the book node
                    Node attr = attrs.item(j);
                    // Attribute name
                    System.out.print("Attribute name: " + attr.getNodeName());
                    // Attribute value
                    System.out.println("--attribute value: " + attr.getNodeValue());
                }
                // Parse the child nodes of the book node
                NodeList childNodes = book.getChildNodes();
                // The count includes the whitespace text nodes between elements
                System.out.println("Book " + (i + 1) + " has "
                        + childNodes.getLength() + " child node(s)");
                for (int k = 0; k < childNodes.getLength(); k++) {
                    // Skip text nodes and keep only element nodes
                    if (childNodes.item(k).getNodeType() == Node.ELEMENT_NODE) {
                        // Name of the element node
                        System.out.print("Node " + (k + 1) + " name: "
                                + childNodes.item(k).getNodeName());
                        // Value of the element node. getTextContent() is used instead of
                        // getFirstChild().getNodeValue(), which throws a NullPointerException
                        // when the element is empty.
                        System.out.println("--node value: " + childNodes.item(k).getTextContent());
                    }
                }
                System.out.println("====================== End of book " + (i + 1) + " =================");
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}

2. SAX parsing

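  SAX stands for Simple API for XML. Unlike DOM, SAX provides sequential access, which is a fast way to read XML data. When a SAX parser processes an XML document, it fires a series of events and invokes the corresponding event handlers. The application accesses the document through these handlers, which is why the SAX interface is also called an event-driven interface.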
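
  To make the event sequence concrete, here is a minimal sketch that records the callbacks in the order the parser fires them. The class name SaxEventTrace and the "start:"/"end:" labels are invented for this example; the handler methods themselves are the standard DefaultHandler callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventTrace {
    // Parses the XML and returns the names of the callbacks in firing order.
    static List<String> trace(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // The parser drives the handler; the application never asks for the next event.
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<bookstore><book id=\"1\"><name>x</name></book></bookstore>")) {
            System.out.println(event);
        }
    }
}
```

  The list comes back in strict document order, which is the sequential access described above: there is no way to jump back to an earlier element without parsing again.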

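    Advantages:

      1. It is event-driven, so memory consumption is low.

      2. It is suitable when you only need to process the data in the XML file.

    Disadvantages:

      1. The code is more cumbersome to write.

      2. It is hard to access several different parts of the XML file at the same time.

  The parsing code is as follows: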

import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

public class SAXTest {
    public static void main(String[] args) {
        // Create a SAXParserFactory
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            // Get a SAXParser instance from the factory
            SAXParser parser = factory.newSAXParser();
            // Create the handler that receives the parsing events
            SAXParserHandler handler = new SAXParserHandler();
            parser.parse("books.xml", handler);
            System.out.println("Total books: " + handler.getBookList().size());
            // Book is a plain data class with String getters and setters (not shown here)
            for (Book book : handler.getBookList()) {
                System.out.println(book.getId());
                System.out.println(book.getName());
                System.out.println(book.getAuthor());
                System.out.println(book.getYear());
                System.out.println(book.getPrice());
                System.out.println(book.getLanguage());
                System.out.println("----finish----");
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}

import java.util.ArrayList;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParserHandler extends DefaultHandler {
    // Text of the element being read; characters() may deliver it in several chunks
    private final StringBuilder text = new StringBuilder();
    String value = null;
    Book book = null;
    private ArrayList<Book> bookList = new ArrayList<Book>();
    public ArrayList<Book> getBookList() {
        return bookList;
    }

    int bookIndex = 0;
    /**
     * Marks the start of parsing
     */
    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        System.out.println("SAX parsing started");
    }

    /**
     * Marks the end of parsing
     */
    @Override
    public void endDocument() throws SAXException {
        super.endDocument();
        System.out.println("SAX parsing finished");
    }

    /**
     * Called at the start of every XML element
     */
    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        // Call the startElement method of DefaultHandler
        super.startElement(uri, localName, qName, attributes);
        // Discard any whitespace collected between elements
        text.setLength(0);
        if (qName.equals("book")) {
            bookIndex++;
            // Create a Book object
            book = new Book();
            // Start parsing the attributes of the book element
            System.out.println("====================== Start of a book =================");
            // The attribute names and count are not known in advance, so iterate by index
            int num = attributes.getLength();
            for (int i = 0; i < num; i++) {
                System.out.print("Attribute " + (i + 1) + " of the book element is named: "
                        + attributes.getQName(i));
                System.out.println("---attribute value: " + attributes.getValue(i));
                if (attributes.getQName(i).equals("id")) {
                    book.setId(attributes.getValue(i));
                }
            }
        }
        else if (!qName.equals("name") && !qName.equals("bookstore")) {
            System.out.print("Node name: " + qName + "---");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        // Call the endElement method of DefaultHandler
        super.endElement(uri, localName, qName);
        // The complete text of the element is available only now
        value = text.toString().trim();
        text.setLength(0);
        if (!value.isEmpty()) {
            System.out.println("Node value: " + value);
        }
        // Check whether a whole book has been read
        if (qName.equals("book")) {
            bookList.add(book);
            book = null;
            System.out.println("====================== End of a book =================");
        }
        else if (qName.equals("name")) {
            book.setName(value);
        }
        else if (qName.equals("author")) {
            book.setAuthor(value);
        }
        else if (qName.equals("year")) {
            book.setYear(value);
        }
        else if (qName.equals("price")) {
            book.setPrice(value);
        }
        else if (qName.equals("language")) {
            book.setLanguage(value);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        super.characters(ch, start, length);
        // Append rather than overwrite: one text node may arrive in several calls
        text.append(ch, start, length);
    }
}
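
  The SAX code relies on a Book class that is not shown. A minimal sketch is given here, assuming every field is a plain String, since the handler passes attribute values and element text straight to the setters; the real class may differ.

```java
// Assumed shape of the Book data class used by SAXTest and SAXParserHandler.
// All fields are Strings because the handler stores the raw text it reads.
class Book {
    private String id;
    private String name;
    private String author;
    private String year;
    private String price;
    private String language;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getAuthor() { return author; }
    public void setAuthor(String author) { this.author = author; }

    public String getYear() { return year; }
    public void setYear(String year) { this.year = year; }

    public String getPrice() { return price; }
    public void setPrice(String price) { this.price = price; }

    public String getLanguage() { return language; }
    public void setLanguage(String language) { this.language = language; }
}
```

  With this shape, a field whose element is missing from a book (for example author in the second sample book) simply stays null.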

3. Pull parsing

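  Pull parsing is built into Android and is also the method the platform itself uses to parse layout files. Pull is similar to SAX in that both expose events such as start-element and end-element. The difference is that SAX drives the application through callbacks: you must supply the callback methods, and the SAX parser invokes them internally. A Pull parser does not require callback methods, because the event it reports is not a method call but a number (an integer event code). It is convenient to use and efficient.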
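
  Android's XmlPullParser is not part of the standard JDK, so the sketch below uses StAX (javax.xml.stream) instead. It is a different API, but it follows the same pull model: the application asks for the next event and gets back an int code. The class name StaxPullSketch and the method names are invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxPullSketch {
    // Collects the text of every <name> element by pulling events one at a time.
    static List<String> names(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // The application decides when to advance; the event is a plain int.
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("name")) {
                    // getElementText() reads the text and stops at the matching end tag.
                    names.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<bookstore>"
                + "<book id=\"1\"><name>A Song of Ice and Fire</name></book>"
                + "<book id=\"2\"><name>Andersen's Fairy Tales</name></book>"
                + "</bookstore>";
        for (String name : names(xml)) {
            System.out.println(name);
        }
    }
}
```

  Compared with the SAX handler, no callback class is needed: the loop and the comparison against an int constant take its place.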

import android.util.Xml;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;

// This method is assumed to belong to an Activity (or another Context), because it calls getAssets().
// The Student class (setId, setName, setAge) is not shown.
public List<Student> Xml_pull_parser() {
    List<Student> list = null;
    XmlPullParser parser = Xml.newPullParser();
    // try-with-resources closes the asset stream once parsing ends
    try (InputStream in = getAssets().open("student.xml")) {
        parser.setInput(in, "UTF-8");
        // The first event reported is START_DOCUMENT
        int event_code = parser.getEventType();
        Student student = null;
        while (event_code != XmlPullParser.END_DOCUMENT) {
            switch (event_code) {
                case XmlPullParser.START_DOCUMENT:
                    list = new ArrayList<>();
                    break;
                case XmlPullParser.START_TAG:
                    if (parser.getName().equals("student")) {
                        student = new Student();
                    }
                    if (student != null) {
                        // nextText() returns the element text and moves to the end tag
                        if (parser.getName().equals("id")) {
                            student.setId(Integer.parseInt(parser.nextText()));
                        } else if (parser.getName().equals("name")) {
                            student.setName(parser.nextText());
                        } else if (parser.getName().equals("age")) {
                            student.setAge(Integer.parseInt(parser.nextText()));
                        }
                    }
                    break;
                case XmlPullParser.END_TAG:
                    // A complete student has been read
                    if (parser.getName().equals("student")) {
                        list.add(student);
                        student = null;
                    }
                    break;
            }
            // Pull the next event
            event_code = parser.next();
        }
    } catch (XmlPullParserException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return list;
}

  • Memory usage: SAX and Pull use less memory than DOM, because they do not keep the whole document in memory.
  • Programming model: SAX is event-driven. When an event fires, the parser calls the method the user supplied, so each type of XML document needs its own handler class. DOM is a W3C specification. Pull is concise, and SAX is more complicated to use than Pull.
  • Access and modification: SAX reads the document as a forward-only stream, while DOM allows random access.
  • Access method: SAX and Pull process the data while the document is being read, while DOM works on the document only after all of it has been loaded.



Origin http://43.154.161.224:23101/article/api/json?id=325680692&siteId=291194637