Campanula 1.2.0 released, adding script-based data extraction

Campanula is a lightweight and efficient crawler tool that is simple to configure and easy to extend through secondary development. It can crawl pages rendered by JavaScript, extract arbitrary data, save web page snapshots, and includes intelligent anti-blocking.

As the first release since the epidemic, this version of Campanula brings a larger set of changes. It adds custom data extraction through scripts, which makes the extraction feature more powerful.

In addition, to help ordinary users identify content pages, this release improves the content page confirmation mechanism. It is now easier to specify which pages hold the data to be crawled, which further improves crawling efficiency.

The contents of this update are as follows:

1 Added a script extraction strategy, which supports extracting the required data from downloaded web pages through a JS script
2 Added content page matching rules and content page filters, making content page selection more flexible
3 Optimized and standardized the code style, bringing it more in line with Alibaba's development guidelines
4 Optimized the simulation test interface, adding test interfaces for content extraction, link extraction, web page download, and content page rules
5 Optimized the content page processing strategy and improved content page processing performance
6 Clearly into the heart
7 Some other optimizations
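
To make the first two items more concrete, here is a minimal Python sketch of the general idea: a page counts as a content page when its URL matches a rule and passes every filter, and a user-supplied script then pulls data out of the downloaded HTML. This is not Campanula's API. The names `is_content_page`, `extract_with_script`, and `title_script`, the example URLs, and the use of a Python callable in place of a JS script are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List

# All names below are hypothetical; they only illustrate the concept.


def is_content_page(
    url: str,
    match_rules: List[str],
    filters: List[Callable[[str], bool]],
) -> bool:
    """Return True if the URL fully matches any rule and every filter accepts it."""
    if not any(re.fullmatch(rule, url) for rule in match_rules):
        return False
    return all(accept(url) for accept in filters)


def extract_with_script(
    html: str, script: Callable[[str], Dict[str, str]]
) -> Dict[str, str]:
    """Run a user-supplied extraction script against a downloaded page.

    The script is a Python callable here, standing in for a JS script.
    """
    return script(html)


def title_script(html: str) -> Dict[str, str]:
    """Example script: extract the page title, or nothing if there is none."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return {"title": match.group(1).strip()} if match else {}


# Matching rule: news article URLs, with an optional query string.
rules = [r"https://example\.com/news/\d+(\?.*)?"]
# Filter: reject anything marked as a draft.
filters = [lambda url: "draft" not in url]
```

In this sketch, only pages for which `is_content_page` returns `True` would be handed to `extract_with_script`, so the rules and filters decide what is crawled and the script decides what is kept.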

Source code: https://gitee.com/zhiyubujian/wind-bell
API documentation: https://apidoc.gitee.com/zhiyubujian/wind-bell/

Origin www.oschina.net/news/114879/wind-bell-1-2-0-released