How to parse text outside XML tags?

Darsh :

I want to parse text that appears in an XML file but outside of the XML tags. In the example below, I would like to parse only the text that is outside of the p tags, such as "FIELD OF THE TECHNOLOGY" and "DETAILED DESCRIPTION OF THE TECHNOLOGY".

An example of my XML file is:

<description>                        
FIELD OF THE TECHNOLOGY
<p>The present technology is directed ....</p>
<p>The present invention is.....</p>
<p>One promising approach has ...,</p>


DETAILED DESCRIPTION OF THE TECHNOLOGY
<p>The present tech provides, ....</p>
<p>A report by Kearse et al.,...</p>
</description>

kjhughes :

Terminology

In your example, the description element has mixed content. You're looking to extract the text node children of the description element. Identifying the right terminology is the first step to searching for answers (and narrowing overly broad questions).

Parsing XML

...with Java in general

...with mixed content:

...choosing parsing technology:

You can find many tutorials on choosing a parsing technology, but XPath is particularly well-suited for selecting parts of an XML document, and there are libraries available for most languages.

...via XPath, for example:

This XPath,

//description/text()

will select all text node children of the description element. It will not include the p elements or their descendants, as requested.
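
As a minimal sketch of how this XPath might be evaluated from Java, the following uses only the JDK's built-in javax.xml.parsers and javax.xml.xpath APIs. The class name DescriptionText and the helper textOutsideTags are hypothetical names chosen for illustration, and the sample XML is the one from the question. Note that text() also matches the whitespace-only text nodes that sit between the p elements, so the sketch trims each node and skips the blank ones; whether you want that filtering is an assumption about your use case.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DescriptionText {

    // Hypothetical helper: returns the trimmed, non-blank text node
    // children of every description element in the given XML string.
    static List<String> textOutsideTags(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                "//description/text()", doc, XPathConstants.NODESET);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // Whitespace between <p> elements is also a text node; skip it.
            String text = nodes.item(i).getNodeValue().trim();
            if (!text.isEmpty()) {
                result.add(text);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<description>\n"
                + "FIELD OF THE TECHNOLOGY\n"
                + "<p>The present technology is directed ....</p>\n"
                + "<p>The present invention is.....</p>\n"
                + "<p>One promising approach has ...,</p>\n"
                + "\n\n"
                + "DETAILED DESCRIPTION OF THE TECHNOLOGY\n"
                + "<p>The present tech provides, ....</p>\n"
                + "<p>A report by Kearse et al.,...</p>\n"
                + "</description>";

        // Prints each piece of text found outside the p elements.
        for (String text : textOutsideTags(xml)) {
            System.out.println(text);
        }
    }
}
```

For the sample document, this should print the two headings, one per line. If you need the raw text including surrounding whitespace, drop the trim() and isEmpty() steps.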