Click " Python crawler and data mining " above to follow
Reply to " Books " to receive a total of 10 e-books of Python from beginner to advanced
now
day
Chickens
soup
He stored 10,000 books in his belly, and refused to bow his head in the grass.
Preface
Hi everyone, good to see you again. I'm your Python-advancement buddy. No more small talk, let's get straight to grinding. Here we go!
A preview of the crawler management UI we will end up with:
Dependencies
File: requirements.txt
The contents of the file are as follows:
appdirs==1.4.4
APScheduler==3.5.1
attrs==20.1.0
Automat==20.2.0
beautifulsoup4==4.9.1
certifi==2020.6.20
cffi==1.14.2
chardet==3.0.4
constantly==15.1.0
cryptography==3.0
cssselect==1.1.0
Django==1.11.29
django-apscheduler==0.3.0
django-cors-headers==3.2.0
djangorestframework==3.9.2
furl==2.1.0
gerapy==0.9.5
gevent==20.6.2
greenlet==0.4.16
hyperlink==20.0.1
idna==2.10
incremental==17.5.0
itemadapter==0.1.0
itemloaders==1.0.2
Jinja2==2.10.1
jmespath==0.10.0
lxml==4.5.2
MarkupSafe==1.1.1
orderedmultidict==1.0.1
parsel==1.6.0
Protego==0.1.16
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
PyDispatcher==2.0.5
pyee==7.0.2
PyHamcrest==2.0.2
pymongo==3.11.0
PyMySQL==0.10.0
pyOpenSSL==19.1.0
pyppeteer==0.2.2
pyquery==1.4.1
python-scrapyd-api==2.1.2
pytz==2020.1
pywin32==228
queuelib==1.5.0
redis==3.5.3
requests==2.24.0
Scrapy==1.8.0
scrapy-redis==0.6.8
scrapy-splash==0.7.2
scrapyd==1.2.1
scrapyd-client==1.1.0
service-identity==18.1.0
six==1.15.0
soupsieve==2.0.1
tqdm==4.48.2
Twisted==20.3.0
tzlocal==2.1
urllib3==1.25.10
w3lib==1.22.0
websocket==0.2.1
websockets==8.1
wincertstore==0.2
zope.event==4.4
zope.interface==5.1.0
Project files
File: qiushi.zip
What it does: a Qiushibaike (糗事百科) joke crawler. This is a Scrapy project, with the dependencies listed above.
Steps to run the project
After unzipping the project file, install the dependencies:
pip install -r requirements.txt
Then execute the command:
scrapy crawl duanzi --nolog
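To give a sense of what is inside the project, here is a minimal sketch of a spider like this one. It is a hypothetical reconstruction, not the actual code from qiushi.zip: the spider name matches the crawl command above, but the URL and selectors are assumptions and will differ from the real project.

import scrapy

class DuanziSpider(scrapy.Spider):
    # The name matches the `scrapy crawl duanzi` command above.
    name = 'duanzi'
    start_urls = ['https://www.qiushibaike.com/text/']

    def parse(self, response):
        # Hypothetical selectors; the real project's parsing logic may differ.
        for post in response.css('div.content'):
            yield {'joke': ''.join(post.css('span::text').getall()).strip()}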
Configure Scrapyd
Scrapyd can be understood as a service that manages the Scrapy projects we write. Once it is configured, we can control our crawlers through its API: run them, cancel them, and so on.
We won't cover its other features here, since most of them aren't needed; for now, all we have to do is start it.
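For illustration, this is what "run" and "cancel" look like through Scrapyd's HTTP JSON API, using the requests library pinned in requirements.txt. It assumes the qiushi project has already been deployed, which we do further below.

import requests

# Start the duanzi spider of the qiushi project; Scrapyd returns a job id.
resp = requests.post('http://127.0.0.1:6800/schedule.json',
                     data={'project': 'qiushi', 'spider': 'duanzi'})
job_id = resp.json()['jobid']

# Cancel the running job again.
requests.post('http://127.0.0.1:6800/cancel.json',
              data={'project': 'qiushi', 'job': job_id})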
Start the Scrapyd service
Switch into the qiushi crawler project directory (for a Scrapy project, you need to be inside the project directory to execute commands), then execute:
scrapyd
Open http://127.0.0.1:6800/ in your browser; if a page like the one below appears, the service is running correctly.
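If you would rather check from code than from the browser, Scrapyd's daemonstatus.json endpoint reports the service's health, for example:

import requests

# Prints something like:
# {'node_name': '...', 'status': 'ok', 'pending': 0, 'running': 0, 'finished': 0}
print(requests.get('http://127.0.0.1:6800/daemonstatus.json').json())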
Package the Scrapy project and upload it to Scrapyd
So far we have only started Scrapyd; the Scrapy project has not yet been deployed to it. To deploy the project, we need to configure the deploy section in the project's scrapy.cfg file.
The configuration is as follows:
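(The original article shows this configuration as a screenshot. A scrapy.cfg matching the deploy name qb and project name qiushi used below would look roughly like this; the settings module name is an assumption.)

[settings]
default = qiushi.settings

[deploy:qb]
url = http://localhost:6800/
project = qiushi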
Packaging command:
scrapyd-deploy <deploy-name> -p <project-name>
For this example:
scrapyd-deploy qb -p qiushi
If you see output like the screenshot below, the deployment succeeded.
Note: you may run into an error at this step; the solution is covered near the end of this article!
Go back to the browser and refresh: a qiushi project now appears in the list. At this point, the Scrapyd configuration is complete.
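As an aside, requirements.txt also pins python-scrapyd-api, which wraps the same HTTP API in a friendlier Python client. A minimal sketch, assuming the service and names used above:

from scrapyd_api import ScrapydAPI

scrapyd = ScrapydAPI('http://127.0.0.1:6800')
print(scrapyd.list_projects())                  # should now include 'qiushi'
job_id = scrapyd.schedule('qiushi', 'duanzi')   # start a crawl, returns a job id
print(scrapyd.list_jobs('qiushi'))              # pending / running / finished jobs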
Configure Gerapy
With the above configuration complete, we can configure Gerapy. Scrapyd actually offers far more functionality than what we used above, but it is all driven from the command line, which is not very friendly.
Gerapy is a visual crawler management framework. It requires Scrapyd to be started and kept running in the background; in essence, it still just sends requests to the Scrapyd service, only through a visual interface.
It is built on Scrapy, Scrapyd, Scrapyd-Client, Scrapy-Redis, Scrapyd-API, Scrapy-Splash, Jinja2, Django, and Vue.js.
Configuration steps
Gerapy has nothing to do with the Scrapy project itself, so you can set it up in any folder; here I created a gerapyDemo folder.
Execute the command to initialize Gerapy:
gerapy init
This generates a gerapy folder.
Enter the generated gerapy folder and execute the command to create the database tables:
gerapy migrate
Start the Gerapy service. The default is port 8000, but you can specify a host and port:
gerapy runserver
gerapy runserver 127.0.0.1:9000   # start on local port 9000
Open http://127.0.0.1:8000/ in your browser; if the following interface appears, the service started successfully.
Normally, though, you will be greeted by a login screen, so we first need to create an account. Stop the service and enter the command:
gerapy createsuperuser
Follow the prompts to create a username and password, then log in with that account.
Add a crawler project in Gerapy
With everything above configured, we can add the crawler project; from there, running the crawler is only a few clicks away.
Click Host Management --> Create. The IP is the host of the Scrapyd service and the port is Scrapyd's port (6800 by default); fill them in and click Create.
Then, from the host list, click Schedule and you can run the crawler.
Run the crawler
Get the result; the output has been written to a local file.
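For reference, writing results to a local file in Scrapy is typically done with a small item pipeline. This is a hypothetical sketch of such a pipeline, not the actual code shipped in qiushi.zip:

import json

class SaveToFilePipeline:
    # Append each scraped item to a local JSON-lines file.
    def open_spider(self, spider):
        self.file = open('duanzi.jl', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item

    def close_spider(self, spider):
        self.file.close()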
Package the crawler and upload it
In the process above we could only run crawlers that were already deployed, which is not the whole workflow. Properly speaking, there is also a packaging step; only once the crawler is packaged through Gerapy is the process truly complete.
Steps
First, copy the crawler project into the projects folder under the gerapy directory.
Refresh the page and click Project Management; you can see that the "configurable" and "packaged" columns are both in the x status.
Click Deploy, write a description, and click Package.
Go back to the main interface; you will find that the packaged status is now correct.
At this point, the whole process is basically complete.
Fixing "scrapyd-deploy is not recognized as an internal or external command"
On Windows, executing scrapyd-deploy will often prompt that scrapyd-deploy is not recognized as an internal or external command. Um... this is normal.
Resolution steps
In the Scripts folder under your Python interpreter's directory, create two new files:
scrapy.bat
scrapyd-deploy.bat
Edit the two files with the following content:
scrapy.bat
@echo off
D:\programFiles\miniconda3\envs\hy_spider\python D:\programFiles\miniconda3\envs\hy_spider\Scripts\scrapy %*
scrapyd-deploy.bat
@echo off
D:\programFiles\miniconda3\envs\hy_spider\python D:\programFiles\miniconda3\envs\hy_spider\Scripts\scrapyd-deploy %*
Note: the paths above point at the interpreter's location; replace D:\programFiles\miniconda3\envs\hy_spider with your own environment's path. Also, each command must stay on a single line, even if it wraps into two lines when pasted here; just keep the one-to-one correspondence.
Gerapy usage summary
1. gerapy init — initialize; this creates a gerapy folder in the current directory
2. cd gerapy
3. gerapy migrate
4. gerapy runserver — defaults to 127.0.0.1:8000
5. gerapy createsuperuser — create a username and password (there is none by default)
6. Open 127.0.0.1:8000 in a browser and log in with that account to reach the main page
7. Perform the various operations: add hosts, package projects, schedule timed tasks, and so on
Summary
The above walks through, in an introductory fashion, how to deploy crawlers visually with Gerapy + Scrapyd + Scrapy.
If you run into any problem while following along, remember to leave a message below; we will get it resolved as soon as we see it.
I'm the Monday programmer. If you found this article useful, remember to give it a thumbs up and leave a comment; thank you for reading. If you want to learn more about Python, you can refer to the learning site: http://pdcfighting.com/
------------------- End -------------------