Build Scrapyd Service
Check whether systemd is installed
On a CentOS 7 server:
[root@VM_0_6_centos ~]# yum install systemd
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
epel | 5.3 kB 00:00:00
extras | 2.9 kB 00:00:00
os | 3.6 kB 00:00:00
updates | 2.9 kB 00:00:00
Package systemd-219-67.el7_7.2.x86_64 already installed and latest version
Nothing to do
Create a new scrapyd.service file and add the content shown below. This requires root privileges; I performed all of these steps as the root account.
vim /lib/systemd/system/scrapyd.service
vim may not be installed by default; install it, or use vi or another editor instead.
Add the following content:
[Unit]
Description=scrapyd
After=network.target
Documentation=http://scrapyd.readthedocs.org/en/latest/api.html
[Service]
User=root
ExecStart=/usr/local/bin/scrapyd --logfile /var/scrapyd/scrapyd.log
[Install]
WantedBy=multi-user.target
- [Unit]: usually the first block of the unit file; it defines the unit's metadata and its relationships to other units
- After: if the units listed in this field are going to be started, they are started before the current service
- Documentation: address of the service's documentation
- Description: a short description
- [Service]: the service configuration block; only units of the service type have this block
- ExecStart: the command that starts the current service
- [Install]: usually the last block of the unit file; it defines how the unit is enabled and whether it starts at boot
- WantedBy: its value is one or more targets. When the current unit is enabled, a symbolic link to it is placed in a subdirectory of /etc/systemd/system named after the target plus a .wants suffix (here, multi-user.target.wants). After that, the new service can be managed from the command line
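As a quick sanity check on the fields described above, the following minimal sketch parses the unit file text with Python's standard configparser and reads the values back. It assumes the file is simple enough to be treated as INI (no repeated keys), which holds for this unit but not for systemd units in general. The helper name read_unit is hypothetical.

```python
import configparser

UNIT_TEXT = """\
[Unit]
Description=scrapyd
After=network.target
Documentation=http://scrapyd.readthedocs.org/en/latest/api.html

[Service]
User=root
ExecStart=/usr/local/bin/scrapyd --logfile /var/scrapyd/scrapyd.log

[Install]
WantedBy=multi-user.target
"""


def read_unit(text):
    """Parse unit-file text into a dict of {section: {key: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case (ExecStart, WantedBy, ...)
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


unit = read_unit(UNIT_TEXT)
print(unit["Service"]["ExecStart"])
print(unit["Install"]["WantedBy"])
```

To check the real file instead, the text could be read from /lib/systemd/system/scrapyd.service and passed to the same function.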
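Start Service
Run systemctl daemon-reload first so that systemd picks up the new unit file, then start the service with either of the following commands (on CentOS 7, service redirects to systemctl):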
systemctl start scrapyd
service scrapyd start
Use curl to check the status of the Scrapyd server:
[root@VM_0_6_centos ~]# curl http://localhost:6800/daemonstatus.json
{"node_name": "VM_0_6_centos", "status": "ok", "pending": 0, "running": 0, "finished": 1}
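The same check can be scripted. Below is a minimal sketch using only the Python standard library; the function names are hypothetical, and the URL assumes Scrapyd is listening on localhost:6800 as in the curl call above. The offline example at the end reuses the response shown above, so it runs without a server.

```python
import json
import urllib.request


def parse_daemon_status(body):
    """Decode a daemonstatus.json body; return (is_ok, decoded_dict)."""
    data = json.loads(body)
    return data.get("status") == "ok", data


def fetch_daemon_status(base_url="http://localhost:6800", timeout=5):
    """Request daemonstatus.json from a running Scrapyd instance."""
    url = base_url + "/daemonstatus.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_daemon_status(resp.read().decode("utf-8"))


# Offline example using the response shown above.
sample = (
    '{"node_name": "VM_0_6_centos", "status": "ok", '
    '"pending": 0, "running": 0, "finished": 1}'
)
healthy, info = parse_daemon_status(sample)
print(healthy, info["finished"])
```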
You can also check the status of the service with the following command:
[root@VM_0_6_centos ~]# systemctl status scrapyd
● scrapyd.service - scrapyd
Loaded: loaded (/usr/lib/systemd/system/scrapyd.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2020-01-10 22:46:46 CST; 18h ago
Docs: http://scrapyd.readthedocs.org/en/latest/api.html
Main PID: 12072 (scrapyd)
CGroup: /system.slice/scrapyd.service
└─12072 /usr/bin/python3 /usr/local/bin/scrapyd --logfile /var/scr...
Jan 10 22:46:46 VM_0_6_centos systemd[1]: Started scrapyd.
Use the following command to make Scrapyd start automatically with the operating system:
systemctl enable scrapyd
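Enabling the unit creates the symbolic link described under WantedBy. The sketch below only computes where that link is expected to appear, so you know which path to look at afterwards. The helper wants_link is hypothetical, and the base directory /etc/systemd/system is assumed from the description above.

```python
from pathlib import PurePosixPath


def wants_link(unit_name, target, base="/etc/systemd/system"):
    """Return the expected symlink path for one WantedBy target."""
    return PurePosixPath(base) / (target + ".wants") / unit_name


link = wants_link("scrapyd.service", "multi-user.target")
print(link)
```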
Add authentication to the Scrapyd server
Taking Nginx as an example, put a reverse proxy in front of Scrapyd to implement user authentication.
Install Nginx
yum install nginx
Nginx configuration
vim /etc/nginx/nginx.conf
Add a new server block inside the braces of the http section:
server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name _;
    root /usr/share/nginx/html;
    # Load configuration files for the default server block.
    include /etc/nginx/default.d/*.conf;
    location / {
    }
    error_page 404 /404.html;
    location = /40x.html {
    }
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
    }
}
# The following block is newly added
server {
    listen 6801;
    location / {
        proxy_pass http://127.0.0.1:6800;
        auth_basic "Restricted";
        auth_basic_user_file /etc/nginx/conf.d/.htpasswd;
    }
}
Here Nginx listens on port 6801 and forwards requests to Scrapyd on port 6800, so 6801 is the only port we expose.
Switch to the /etc/nginx/conf.d directory (create it if it does not exist) and create the user credentials file. On CentOS the htpasswd command is provided by the httpd-tools package.
[root@VM_0_6_centos ~]# htpasswd -c .htpasswd ray
New password:
Re-type new password:
After entering the password twice, the user ray has been created.
The final step
The Scrapyd service we started earlier must be stopped first:
killall scrapyd
Next, modify the Scrapyd configuration file so that outside clients cannot bypass Nginx and access port 6800 directly.
On startup, Scrapyd automatically searches for configuration files, and settings in files loaded later override those from files loaded earlier. The load order is:
- /etc/scrapyd/scrapyd.conf
- /etc/scrapyd/conf.d/*
- scrapyd.conf
- ~/.scrapyd.conf
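The "later file wins" rule can be illustrated with Python's standard configparser. This is a sketch of the override behaviour only, not Scrapyd's own loading code; the two config texts are made-up stand-ins for files in the search path, and load_in_order is a hypothetical helper.

```python
import configparser

# Stand-ins for two files in the search path (contents are illustrative).
SYSTEM_CONF = """\
[scrapyd]
bind_address = 0.0.0.0
http_port = 6800
"""
USER_CONF = """\
[scrapyd]
bind_address = 127.0.0.1
"""


def load_in_order(*texts):
    """Read config texts in order; later values replace earlier ones."""
    parser = configparser.ConfigParser(interpolation=None)
    for text in texts:
        parser.read_string(text)
    return parser


conf = load_in_order(SYSTEM_CONF, USER_CONF)
print(conf.get("scrapyd", "bind_address"))  # value from the later text
print(conf.get("scrapyd", "http_port"))     # untouched key is kept
```

Keys that the later file does not mention keep the value from the earlier file, so an override file only needs to list the settings it changes.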
Since there are currently no configuration files other than the default one, modify the default configuration file:
vim /etc/scrapyd/scrapyd.conf
Change it as follows.
The bind_address field must be set to 127.0.0.1 so that port 6800 cannot be reached directly, bypassing Nginx:
[scrapyd]
eggs_dir = eggs
logs_dir = logs
items_dir =
jobs_to_keep = 5
dbs_dir = dbs
max_proc = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
bind_address = 127.0.0.1
http_port = 6800
debug = off
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root
[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
After the configuration is complete, start Scrapyd and Nginx and test them, as follows:
Start the Scrapyd service
service scrapyd start
Start the Nginx service
Switch to the /etc/nginx directory and run nginx -t to check the configuration for errors. If there are none, run nginx to start the service.
Test with curl
The server's IP address has been masked in the output below.
(venv) F:\Crawl>curl http://***.***.***.**:6801
<html>
<head><title>401 Authorization Required</title></head>
<body>
<center><h1>401 Authorization Required</h1></center>
<hr><center>nginx/1.16.1</center>
</body>
</html>
The response above tells us that authentication is required, so our configuration is working.
Trying to access port 6800 directly results in a timeout error:
(venv) F:\Crawl>curl http://***.***.***.**:6800
curl: (7) Failed to connect to ***.***.***.** port 6800: Timed out
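The reachability of both ports can also be probed from a script. The following is a minimal sketch with the standard socket module; port_reachable is a hypothetical helper, and the demonstration uses a throwaway local listener rather than the real server so that it runs anywhere. Against the real server you would expect port 6801 to be reachable and port 6800 not.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


# Local demonstration: listen on a free port, probe it, close it, probe again.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]
print(port_reachable("127.0.0.1", demo_port))
listener.close()
print(port_reachable("127.0.0.1", demo_port))
```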
To authenticate with curl, add the parameter -u user:password:
(venv) F:\Crawl>curl http://***.***.***.**:6801/daemonstatus.json -u ray:*******
{"node_name": "VM_0_6_centos", "status": "ok", "pending": 0, "running": 0, "finished": 0}
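What curl -u does can be reproduced in a script by sending an HTTP Basic Authorization header. Below is a minimal sketch with the Python standard library; the function names are hypothetical, the base URL would be your server's address on port 6801, and "secret" is a placeholder password. Only the header construction is executed here, so no server is needed to run it.

```python
import base64
import json
import urllib.request


def basic_auth_header(user, password):
    """Build the Authorization header value for HTTP Basic authentication."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def daemon_status_via_proxy(base_url, user, password, timeout=5):
    """Fetch daemonstatus.json through the Nginx proxy with Basic auth."""
    request = urllib.request.Request(
        base_url + "/daemonstatus.json",
        headers={"Authorization": basic_auth_header(user, password)},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Offline check of the header format; "secret" is a placeholder password.
print(basic_auth_header("ray", "secret"))
```

Because Basic authentication only encodes the credentials and does not encrypt them, anyone who can observe the traffic can recover the password unless the connection is protected by TLS.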