jsoup 1.13.1 has been released. Notable improvements include significantly faster parsing than 1.12.x, new selector features, a fix for the "Mark invalid" exception, and many other enhancements.
jsoup is, in our view, the best Java HTML parser. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. Consider the code below:
// log(...) is a printing helper that is not defined in this snippet
Document doc = Jsoup.connect("https://en.wikipedia.org/").get();
log(doc.title());
Elements newsHeadlines = doc.select("#mp-itn b a");
for (Element headline : newsHeadlines) {
    log("%s\n\t%s",
        headline.attr("title"), headline.absUrl("href"));
}
The code above fetches the Wikipedia home page, parses it into a DOM, and then selects the headlines from the "In the news" section into a list of Elements.
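The same select-and-iterate pattern can be tried without a network connection by parsing an HTML string. The following is a minimal sketch, not part of the release notes: the class name `HeadlineDemo`, the sample markup, and the `https://example.org/` base URI are all assumptions made for illustration, and the markup only imitates the structure targeted by the `#mp-itn b a` selector.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class HeadlineDemo {
    // Hypothetical markup standing in for the "In the news" box (not Wikipedia's real HTML).
    static final String HTML =
        "<html><head><title>Demo page</title></head><body>"
        + "<div id=\"mp-itn\"><ul>"
        + "<li><b><a href=\"/wiki/First\" title=\"First story\">First</a></b></li>"
        + "<li><b><a href=\"/wiki/Second\" title=\"Second story\">Second</a></b></li>"
        + "</ul></div></body></html>";

    // Parses the HTML, selects the headline links, and formats "title -> absolute URL".
    static List<String> headlines(String html, String baseUri) {
        // The base URI lets absUrl() resolve relative href values.
        Document doc = Jsoup.parse(html, baseUri);
        Elements links = doc.select("#mp-itn b a");
        List<String> out = new ArrayList<>();
        for (Element link : links) {
            out.add(link.attr("title") + " -> " + link.absUrl("href"));
        }
        return out;
    }

    public static void main(String[] args) {
        headlines(HTML, "https://example.org/").forEach(System.out::println);
    }
}
```

Passing a base URI to `Jsoup.parse` matters here: without one, `absUrl("href")` has nothing to resolve relative links against and returns an empty string.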
Download: https://jsoup.org/download
<dependency>
  <!-- jsoup HTML parser library @ https://jsoup.org/ -->
  <groupId>org.jsoup</groupId>
  <artifactId>jsoup</artifactId>
  <version>1.13.1</version>
</dependency>
Notable improvements in 1.13.1
- New Element.closest() method, which walks up the tree to find the nearest element matching the selector.
- Memory optimizations: the retained size of a Document is reduced by about 39%, and memory allocations by about 9%:
  1. The Attributes holder in an Element is created only when the element has attributes.
  2. An element tracks its own baseUri only when it is set through the DOM to a new value.
  3. After parsing, Document.parser no longer retains the input character reader (and its associated buffers).
- Parsing is substantially faster than in 1.12.x.
- Old methods and classes that were marked deprecated in previous releases have been removed.
- New Element.select(Evaluator) and Element.selectFirst(Evaluator) methods, which allow a parsed CSS selector to be reused when the same evaluator is applied many times.
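The two API additions in this list, Element.closest() and Element.select(Evaluator), can be illustrated with a short sketch. It assumes jsoup 1.13.1 or later on the classpath; the class name `ReleaseFeaturesDemo`, its helper methods, and the sample markup are hypothetical and chosen only for this example. The evaluator is obtained from `QueryParser.parse`, which compiles a CSS query once so that it can be applied repeatedly.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Evaluator;
import org.jsoup.select.QueryParser;

public class ReleaseFeaturesDemo {
    // Hypothetical sample markup used only for this illustration.
    static final String HTML =
        "<div class=\"card\" id=\"outer\"><section id=\"inner\">"
        + "<p><a href=\"/a\">one</a> <a href=\"/b\">two</a></p></section></div>";

    // Finds the first element matching startSelector, then walks up the tree
    // with closest() and returns the id of the nearest match (or null if none).
    static String closestId(String html, String startSelector, String ancestorSelector) {
        Document doc = Jsoup.parse(html);
        Element start = doc.selectFirst(startSelector);
        if (start == null) {
            return null;
        }
        Element match = start.closest(ancestorSelector);
        return match == null ? null : match.id();
    }

    // Applies one pre-parsed evaluator to several documents, so the CSS query
    // string is not re-parsed for every select() call.
    static int countMatches(Evaluator evaluator, String... htmlDocs) {
        int total = 0;
        for (String html : htmlDocs) {
            total += Jsoup.parse(html).select(evaluator).size();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(closestId(HTML, "a", "div.card"));
        System.out.println(closestId(HTML, "a", "section"));

        Evaluator links = QueryParser.parse("a[href]");
        System.out.println(countMatches(links, HTML, HTML));
    }
}
```

Reusing the evaluator is mainly worthwhile when the same query runs against many documents or elements; for a one-off query, the string-based select() is simpler.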
For the full list of changes, see https://jsoup.org/news/release-1.13.1