Determining if at leaf node with SAX parser

Paul Reiners :

Using org.xml.sax.helpers.DefaultHandler, can you determine whether you're at a leaf node within endElement(String, String, String)?

Or do you need to use a DOM parser to determine this?

GhostCat salutes Monica C. :

Let's start with some basic definitions:

An XML document is an ordered, labeled tree. Each node of the tree is an XML element and is written with an opening and closing tag.

(from here). The great part about that: XML files have a very regular, simple structure. For example, the definition of a leaf node is just that: a node that doesn't have any children.

Now: endElement() is invoked whenever the SAX parser encounters the closing tag of an element (for an empty-element tag such as <inner/>, it fires right after startElement()). Assuming that your XML is well-formed, that also means that the parser gave you a corresponding startElement() call before!

In other words: all the information you need to determine whether you are "ending" a leaf node is available to you:

  • you were told which elements are "started"
  • you are told which elements end

Take this example:

<outer>
  <inner/>
</outer>

This will lead to such a sequence of events/callbacks:

  • event: start element outer
  • event: start element inner
  • event: end element inner
  • event: end element outer
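To see that sequence for yourself, here is a minimal sketch of a handler that just records the element callbacks. The class name EventTrace and its trace() helper are made up for this illustration; the parser is the JDK default, which is not namespace-aware, so the tag name arrives in qName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records the element events a SAX parser reports.
class EventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        events.add("start element " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end element " + qName);
    }

    // Parses the given XML text and returns the recorded events in order.
    static List<String> trace(String xml) {
        EventTrace handler = new EventTrace();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Checked SAX/IO exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        // Prints one line per callback for the example document.
        trace("<outer>\n  <inner/>\n</outer>").forEach(System.out::println);
    }
}
```

The whitespace between the tags is reported through characters(), which this sketch does not override, so only the four element events are recorded.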

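So, "obviously", when your handler remembers the history of events, determining which of inner and outer is a leaf node is straightforward: inner ends without any other element having been started inside it, so it is a leaf; outer had inner started inside it, so it is not.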

Thus, the answer is: no, you don't need a DOM parser. In the end, the DOM is constructed from the very same information anyway! If the DOM parser can deduce the "scope" of objects, so can your SAX handler.

But just for the record: you still need to carefully implement the data structures that keep track of "started", "open" and "ended" tags, for example to correctly determine that this one:

<outer> <inner> <inner/> </inner> </outer>

represents two non-leaf nodes (outer and the first inner), and one leaf node (the inner inner).
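One possible way to do that bookkeeping, as a rough sketch rather than the definitive implementation: keep a stack with one flag per open element, set the flag of the parent whenever a child starts, and check the flag in endElement(). The names LeafDetector and findLeaves are invented for this example, and "leaf" is assumed to mean "no child elements" (text content is ignored).

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the names of elements that have no child elements.
class LeafDetector extends DefaultHandler {
    // One entry per currently open element: has it seen a child element yet?
    private final Deque<Boolean> hasChildElement = new ArrayDeque<>();
    private final List<String> leaves = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // The element being opened is a child of whatever is on top of the stack.
        if (!hasChildElement.isEmpty()) {
            hasChildElement.pop();
            hasChildElement.push(true);
        }
        hasChildElement.push(false);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The top entry belongs to the element that is closing right now.
        boolean sawChild = hasChildElement.pop();
        if (!sawChild) {
            leaves.add(qName);
        }
    }

    // Parses the given XML text and returns leaf element names in document order.
    static List<String> findLeaves(String xml) {
        LeafDetector handler = new LeafDetector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Checked SAX/IO exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.leaves;
    }

    public static void main(String[] args) {
        System.out.println(findLeaves("<outer> <inner> <inner/> </inner> </outer>"));
    }
}
```

Because the flag lives on a stack, the two inner elements are kept apart even though they share a name: only the innermost one is closed with its flag still false. If elements containing nothing but text should be treated differently, the handler would also need to override characters().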
