EasySpider: an open-source, no-code visual web crawler that I hope will help everyone

Software Introduction

EasySpider is a visual crawler tool that lets anyone design and run crawler tasks through a graphical interface, without writing code. You select the content you want to crawl on the web page and follow the prompts to complete the design and run the task. The software can also be invoked through API calls as a web service, so it is easy to embed in other systems.

The following is a sample interface:

Related Links

Code repository

GitHub repository (stars are welcome):

EasySpider on GitHub: https://github.com/NaiboWang/EasySpider

Download EasySpider

Go to the Releases page to download the latest version:

EasySpider downloads: https://github.com/NaiboWang/EasySpider/releases

Video tutorials

Video tutorials on Bilibili:

Visual crawler EasySpider: free, open-source software for designing a crawler visually in a few minutes without writing code

Visual crawler EasySpider: how to crawl websites that require login, visually and without code

Visual crawler EasySpider: how to crawl websites that require CAPTCHAs

Flowchart execution logic explained, using a 58 City listings collection case: https://www.bilibili.com/video/BV1YL411z7uW

Tutorial on designing and running an eBay crawler task on macOS: https://www.bilibili.com/video/BV1WL411h71r

Documentation

For now, please refer to the English documentation, Wiki of EasySpider, or read the author's master's thesis (mainly Chapters 3 and 5): Design and Implementation of Intelligent Service Packaging System for WEB Applications https://github.com/NaiboWang/EasySpider/blob/master/Docs/%E9%9D%A2%E5%90%91WEB%E5%BA%94%E7%94%A8%E7%9A%84%E6%99%BA%E8%83%BD%E5%8C%96%E6%9C%8D%E5%8A%A1%E5%B0%81%E8%A3%85%E7%B3%BB%E7%BB%9F%E8%AE%BE%E8%AE%A1%E4%B8%8E%E5%AE%9E%E7%8E%B0.pdf

Related honors and publications

1. The software was the basis of the author's master's thesis at Zhejiang University, with which he earned his master's degree.

2. It has been granted a Chinese national invention patent, with the author as the first inventor.

3. Accepted by WWW 2023, a top-tier (CCF-A) conference: https://dl.acm.org/doi/abs/10.1145/3543873.3587345

4. Reposted and promoted on Weibo by the verified influencer account "Aikeke-Love Life", which has 816,000 followers: https://s.weibo.com/weibo?q=easyspider

I have just returned from WWW 2023 in the United States, where many attendees showed interest in the software. Here is the poster as presented on site:

Why use EasySpider

Compared with other visual crawler software, EasySpider has the following advantages:

1. Open source: the code is public, so you can build on it and develop it further.

2. Completely free: unlike "free" software such as Octopus, EasySpider requires no login, places no limit on the number of instances running at once or on the number of machines it is deployed on, and you do not pay the author a cent. (EasySpider is protected by a patent, however, so for commercial use please contact Tiandao Patent Office of Zhejiang University.) Other "free" tools, by contrast, come with many restrictions; see their pricing pages for details.

3. Secure: all information, including tasks and collected data, is stored locally on the user's machine, so there is no need to worry about data leaks.

4. Cross-platform: supports Windows, Linux, and macOS.

5. Fast: a crawler task can usually be designed in just 2-5 minutes, and collection is also fast, with the actual speed depending on the machine environment.

6. More flexible: it preserves more browser configuration information and, most importantly, it is extensible, so plug-ins such as CAPTCHA recognition plug-ins can be installed freely. The following plug-ins are recommended for recognizing CAPTCHAs:

From a practical standpoint, crawling is a basic need: we often have to collect information from the web. Among researchers, for example, people working on NLP often crawl Wikipedia for training corpora; people doing social network analysis often need data from Twitter and Weibo; and people working on recommender systems crawl shopping websites. There are many crawlers on the market, so I won't go into detail here. With EasySpider, whether or not you have written a crawler before, you no longer need to bother writing code.

Software-related screenshots

These figures are from my master's thesis. Only the figures themselves are shown here; explaining each one would take too long, so please read the thesis to see what they illustrate:

Design and Implementation of Intelligent Service Packaging System for WEB Applications https://github.com/NaiboWang/EasySpider/blob/master/Docs/%E9%9D%A2%E5%90%91WEB%E5%BA%94%E7%94%A8%E7%9A%84%E6%99%BA%E8%83%BD%E5%8C%96%E6%9C%8D%E5%8A%A1%E5%B0%81%E8%A3%85%E7%B3%BB%E7%BB%9F%E8%AE%BE%E8%AE%A1%E4%B8%8E%E5%AE%9E%E7%8E%B0.pdf

Technology Exchange

I did all of EasySpider's algorithm design, code implementation, and documentation on my own, so the project is certainly not as polished as one written by a team. There are many features I would like to develop but lack the time and energy for, so there is plenty of room for improvement. Since the code is fully public, you can fork it to modify it and add new features. You are also welcome to submit PRs to improve the software and help build a good open-source community. For details of the algorithms used in the software, see my master's thesis, which covers them thoroughly:

Design and Implementation of Intelligent Service Packaging System for WEB Applications https://github.com/NaiboWang/EasySpider/blob/master/Docs/%E9%9D%A2%E5%90%91WEB%E5%BA%94%E7%94%A8%E7%9A%84%E6%99%BA%E8%83%BD%E5%8C%96%E6%9C%8D%E5%8A%A1%E5%B0%81%E8%A3%85%E7%B3%BB%E7%BB%9F%E8%AE%BE%E8%AE%A1%E4%B8%8E%E5%AE%9E%E7%8E%B0.pdf

For the specific technologies used in development, such as Chrome extension development, WebSocket, and the cross-platform Electron framework, you can download the code and study how I used them. My code is by no means the best. At the time I wanted to graduate quickly and only aimed for a usable demo, so it is somewhat rough: the coupling is too tight and it is not modular enough, among other things. There is still a lot of room for improvement, and comments and suggestions are welcome.

For students who are new to CS, this project is also a good example to learn from. On the development side, it covers front-end development, back-end development, database operations, and browser extension development. On the algorithm side, it involves depth-first and breadth-first search, data structures, graphs, compiler principles, and recursion. If you want to learn, you may pick up something useful from the source code. Finally, I sincerely hope that the software can help everyone!
