A web crawler platform implemented in Java

Overview

spider-flow is a web crawler platform written in Java. Crawler processes are defined graphically, so a crawler can be built without writing any code.

Main features
1. Supports CSS selector and regular expression extraction
2. Supports JSON and XML formats
3. Supports XPath and JsonPath extraction
4. Supports multiple data sources, with SQL select/insert/update/delete
5. Supports crawling pages that are rendered dynamically by JavaScript
6. Supports proxies
7. Supports binary formats
8. Supports saving and reading files (csv, xls, jpg, etc.)
9. Provides commonly used string, date, file, encryption/decryption, and random functions
10. Supports nesting of crawler processes
11. Supports plug-in extensions (custom executors, custom functions, custom Controllers, type extensions, etc.)
12. Supports an HTTP interface
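
To make the regular expression and XPath extraction features more concrete, here is a minimal sketch of what those two kinds of extraction do, using only the JDK standard library. It is not the platform's own implementation. The class name ExtractionSketch, the method names, and the sample XML are all made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of spider-flow.
public class ExtractionSketch {

    /** Returns the first capture group of every match of the regex in the text. */
    public static List<String> extractByRegex(String text, String regex) {
        List<String> results = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }

    /** Evaluates an XPath expression against well-formed XML and returns the text of each matched node. */
    public static List<String> extractByXPath(String xml, String expression)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> results = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            results.add(nodes.item(i).getTextContent());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Sample input invented for this sketch.
        String xml = "<items><item id=\"1\">first</item><item id=\"2\">second</item></items>";
        System.out.println(extractByXPath(xml, "/items/item"));
        System.out.println(extractByRegex(xml, "id=\"(\\d+)\""));
    }
}
```

In the platform itself these extractions are configured in the graphical editor rather than written by hand; the sketch only shows the kind of result each technique produces.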

Installation and deployment

Step 1: Prepare the environment

1. Install a JDK
2. Install a MySQL database server; version 5.7 is recommended
3. Install Maven 3.0

Step 2: Run the project

1. Go to the Gitee download page (https://gitee.com/jmxd/spider-flow), download the project, and unzip it into your working directory
2. Configure the Maven settings in Eclipse: open Window -> Preferences -> Maven -> User Settings, click Browse next to the User Settings field, select the settings.xml file in the conf directory of your Maven installation, then click Apply and OK
3. Import the project into Eclipse: open File -> Import, select Maven -> Existing Maven Projects, click Next, select the working directory, and click Finish
4. Import the database using the base table script: spider-flow/db/spiderflow.sql
5. Open and run org.spiderflow.SpiderApplication.java
6. Open a browser and go to http://localhost:8088/
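
If the browser page does not load, it can help to check from the command line whether the application is answering at all. The following is a small sketch that sends one GET request and prints the HTTP status code. The class name StartupCheck and the three-second timeouts are arbitrary choices for this example; the default address is the one given in the last step above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for this article; not part of spider-flow.
public class StartupCheck {

    /** Sends a GET request to the URL and returns the HTTP status code. */
    public static int statusOf(String url) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        connection.setRequestMethod("GET");
        connection.setConnectTimeout(3000); // milliseconds, arbitrary value
        connection.setReadTimeout(3000);    // milliseconds, arbitrary value
        try {
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass a different URL as the first argument if your port differs.
        String url = args.length > 0 ? args[0] : "http://localhost:8088/";
        System.out.println(url + " -> HTTP " + statusOf(url));
    }
}
```

A connection error (an IOException) usually means the application has not started or is listening on a different port.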

Step 3: Add plug-ins

1. Download the required plug-in, then either import it into the workspace or install it into your local Maven repository
2. Declare the plug-in as a dependency in spider-flow/spider-flow-web/pom.xml
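
Declaring the plug-in in the pom.xml amounts to adding a Maven dependency entry. A sketch of what such an entry might look like follows. The groupId, artifactId, and version shown are placeholders, not real coordinates, so replace them with the values from the plug-in you downloaded.

```xml
<!-- Goes inside the existing <dependencies> element of spider-flow/spider-flow-web/pom.xml -->
<dependency>
    <!-- Placeholder coordinates: replace with the plug-in's actual values -->
    <groupId>com.example</groupId>
    <artifactId>example-spider-flow-plugin</artifactId>
    <version>1.0.0</version>
</dependency>
```

After saving the file, let Eclipse refresh the Maven project so the new dependency is resolved before you restart the application.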

Origin blog.csdn.net/weixin_49527334/article/details/114546591