Scrapyd server setup

Create the Scrapyd service

Check whether systemd is installed (this server runs CentOS 7):

[root@VM_0_6_centos ~]# yum install systemd
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
epel                                                                                       | 5.3 kB  00:00:00
extras                                                                                     | 2.9 kB  00:00:00
os                                                                                         | 3.6 kB  00:00:00
updates                                                                                    | 2.9 kB  00:00:00
Package systemd-219-67.el7_7.2.x86_64 already installed and latest version
Nothing to do

Create a new scrapyd.service file and add the content below. Root privileges are required; I performed these operations as root.

vim /lib/systemd/system/scrapyd.service

vim may not be installed by default; install it, or use vi or another editor instead.

Add the following content:

[Unit]
Description=scrapyd
After=network.target
Documentation=http://scrapyd.readthedocs.org/en/latest/api.html

[Service]
User=root
ExecStart=/usr/local/bin/scrapyd --logfile /var/scrapyd/scrapyd.log

[Install]
WantedBy=multi-user.target
  • [Unit]: usually the first block of the file; it defines the unit's metadata and its relationships with other units
  • After: the units listed here are started before the current service
  • Documentation: URL of the service's documentation
  • Description: a short description of the service
  • [Service]: the service configuration; only units of the Service type have this block
  • ExecStart: the command that starts the service. The path must match where scrapyd is actually installed (check with which scrapyd)
  • [Install]: usually the last block of the file; it defines how the unit is installed, i.e. whether it starts at boot
  • WantedBy: one or more targets. When the unit is enabled, a symbolic link to it is placed in the /etc/systemd/system subdirectory named after the target plus a .wants suffix (here multi-user.target.wants), so the service starts together with that target
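As a sketch of how these keys fit together, the Python snippet below renders the same unit file and computes the path of the symlink that systemctl enable creates for WantedBy=multi-user.target. render_unit and wants_symlink are hypothetical helpers written for illustration, not part of systemd or Scrapyd; the executable and log paths are the ones used above and may differ on your system.

```python
from pathlib import PurePosixPath


# Hypothetical helpers for illustration; systemd itself only reads the final text file.
def render_unit(exec_start: str, user: str = "root", description: str = "scrapyd",
                docs: str = "http://scrapyd.readthedocs.org/en/latest/api.html",
                wanted_by: str = "multi-user.target") -> str:
    """Return the text of a scrapyd.service unit with the keys explained above."""
    lines = [
        "[Unit]",
        f"Description={description}",
        "After=network.target",      # ordered after network.target
        f"Documentation={docs}",
        "",
        "[Service]",
        f"User={user}",
        f"ExecStart={exec_start}",   # command systemd runs to start Scrapyd
        "",
        "[Install]",
        f"WantedBy={wanted_by}",     # target that pulls the unit in once enabled
    ]
    return "\n".join(lines) + "\n"


def wants_symlink(unit_name: str, target: str = "multi-user.target") -> PurePosixPath:
    """Where `systemctl enable` places the symlink for WantedBy=<target>."""
    return PurePosixPath("/etc/systemd/system") / f"{target}.wants" / unit_name


if __name__ == "__main__":
    print(render_unit("/usr/local/bin/scrapyd --logfile /var/scrapyd/scrapyd.log"))
    print(wants_symlink("scrapyd.service"))
    # /etc/systemd/system/multi-user.target.wants/scrapyd.service
```

Writing the rendered text to /lib/systemd/system/scrapyd.service is equivalent to editing the file by hand.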

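Start the service with either of the following commands (on CentOS 7, service forwards to systemctl). After creating or editing a unit file, running systemctl daemon-reload first makes systemd reload its unit definitions.

systemctl start scrapyd
service scrapyd start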

Use curl to check the Scrapyd server status:

[root@VM_0_6_centos ~]# curl http://localhost:6800/daemonstatus.json
{"node_name": "VM_0_6_centos", "status": "ok", "pending": 0, "running": 0, "finished": 1}
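The same health check can be scripted. This is a minimal sketch using only the Python standard library, assuming Scrapyd listens on localhost:6800 as in this article; fetch_status and parse_status are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def parse_status(body: str) -> dict:
    """Decode a daemonstatus.json body and fail loudly unless Scrapyd reports "ok"."""
    status = json.loads(body)
    if status.get("status") != "ok":
        raise RuntimeError(f"Scrapyd reported a problem: {status}")
    return status


def fetch_status(base_url: str = "http://localhost:6800", timeout: float = 5.0) -> dict:
    """GET <base_url>/daemonstatus.json (no auth: run this on the server itself)."""
    with urlopen(f"{base_url}/daemonstatus.json", timeout=timeout) as resp:
        return parse_status(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Sample body copied from the curl output above.
    sample = ('{"node_name": "VM_0_6_centos", "status": "ok", '
              '"pending": 0, "running": 0, "finished": 1}')
    info = parse_status(sample)
    print(info["pending"], info["running"], info["finished"])  # 0 0 1
```

On the server, fetch_status() performs the same request as the curl command above.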

You can also check the service status with systemctl:

[root@VM_0_6_centos ~]# systemctl status scrapyd

● scrapyd.service - scrapyd
   Loaded: loaded (/usr/lib/systemd/system/scrapyd.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2020-01-10 22:46:46 CST; 18h ago
     Docs: http://scrapyd.readthedocs.org/en/latest/api.html
 Main PID: 12072 (scrapyd)
   CGroup: /system.slice/scrapyd.service
           └─12072 /usr/bin/python3 /usr/local/bin/scrapyd --logfile /var/scr...

Jan 10 22:46:46 VM_0_6_centos systemd[1]: Started scrapyd.

Use the following command to make Scrapyd start automatically at boot:

systemctl enable scrapyd

Add authentication to the Scrapyd server

Using Nginx as an example, put a reverse proxy in front of Scrapyd to implement user authentication.

Install Nginx

yum install nginx

Nginx configuration

vim /etc/nginx/nginx.conf

Add a new server block inside the braces of the http block:

    server {
        listen       80 default_server;
        listen       [::]:80 default_server;
        server_name  _;
        root         /usr/share/nginx/html;

        # Load configuration files for the default server block.
        include /etc/nginx/default.d/*.conf;

        location / {
        }

        error_page 404 /404.html;
            location = /40x.html {
        }

        error_page 500 502 503 504 /50x.html;
            location = /50x.html {
        }
    }
    # The following server block is newly added
    server {
        listen 6801;
        location / {
            proxy_pass http://127.0.0.1:6800;
            auth_basic "Restricted";
            auth_basic_user_file /etc/nginx/conf.d/.htpasswd;
        }
    }

Nginx listens on port 6801 and forwards requests to Scrapyd on port 6800, so 6801 is the only port we expose to the outside.

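Create a user for authentication in the /etc/nginx/conf.d directory (create the directory if it does not exist). Giving the full path makes the command work from any directory. The htpasswd tool comes from the httpd-tools package (yum install httpd-tools). The -c option creates the file and overwrites it if it already exists, so omit -c when adding further users.

[root@VM_0_6_centos ~]# htpasswd -c /etc/nginx/conf.d/.htpasswd ray
New password:
Re-type new password:

After entering the password twice, the user ray has been created.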
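If httpd-tools is not available, an entry can also be generated without htpasswd. Besides crypt() and apr1 hashes, Nginx's auth_basic_user_file accepts RFC 2307-style {SCHEME} entries, including {SSHA} (salted SHA-1). The sketch below builds such a line in Python; ssha_entry and check_ssha are hypothetical helpers, and salted SHA-1 is a weak password hash, so treat this as a fallback and prefer htpasswd when you can.

```python
import base64
import hashlib
import hmac
import os
from typing import Optional


def ssha_entry(user: str, password: str, salt: Optional[bytes] = None) -> str:
    """Return 'user:{SSHA}base64(sha1(password + salt) + salt)'."""
    if salt is None:
        salt = os.urandom(8)  # random salt per entry
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return f"{user}:{{SSHA}}{base64.b64encode(digest + salt).decode('ascii')}"


def check_ssha(entry: str, password: str) -> bool:
    """Verify a password against an {SSHA} line using the same scheme definition."""
    _user, _, hashed = entry.partition(":")
    prefix = "{SSHA}"
    if not hashed.startswith(prefix):
        raise ValueError("not an {SSHA} entry")
    raw = base64.b64decode(hashed[len(prefix):])
    digest, salt = raw[:20], raw[20:]  # SHA-1 digests are 20 bytes; the rest is salt
    candidate = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return hmac.compare_digest(candidate, digest)


if __name__ == "__main__":
    # Append the printed line to /etc/nginx/conf.d/.htpasswd
    print(ssha_entry("ray", "change-me"))
```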

The final step

First stop the Scrapyd service that is already running:

killall scrapyd

Because Scrapyd was started by systemd, systemctl stop scrapyd works as well.

Then modify the Scrapyd configuration file so that clients outside the server cannot bypass Nginx and access port 6800 directly.

At startup, Scrapyd automatically searches for configuration files, and settings in files loaded later override those in files loaded earlier. The load order is:

  • /etc/scrapyd/scrapyd.conf
  • /etc/scrapyd/conf.d/*
  • scrapyd.conf
  • ~/.scrapyd.conf
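The override rule can be illustrated with Python's configparser, whose read() method likewise skips files that do not exist and lets later files override earlier values. This is a sketch of the documented order, not Scrapyd's own loader; the helper names are hypothetical, and sorting conf.d/* alphabetically is an assumption.

```python
import configparser
import glob
import os
import tempfile
from typing import List, Optional


def scrapyd_config_paths(cwd: str = ".", home: Optional[str] = None) -> List[str]:
    """Candidate files in the load order listed above (conf.d/* assumed sorted)."""
    home = home if home is not None else os.path.expanduser("~")
    return [
        "/etc/scrapyd/scrapyd.conf",
        *sorted(glob.glob("/etc/scrapyd/conf.d/*")),
        os.path.join(cwd, "scrapyd.conf"),
        os.path.join(home, ".scrapyd.conf"),
    ]


def load_config(paths: List[str]) -> configparser.ConfigParser:
    """Read files in order; missing files are skipped and later values win."""
    parser = configparser.ConfigParser()
    parser.read(paths)
    return parser


if __name__ == "__main__":
    # Demonstrate the override rule with two temporary files.
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first.conf")
        second = os.path.join(d, "second.conf")
        with open(first, "w") as f:
            f.write("[scrapyd]\nbind_address = 0.0.0.0\nhttp_port = 6800\n")
        with open(second, "w") as f:
            f.write("[scrapyd]\nbind_address = 127.0.0.1\n")
        cfg = load_config([first, os.path.join(d, "missing.conf"), second])
        print(cfg["scrapyd"]["bind_address"], cfg["scrapyd"]["http_port"])  # 127.0.0.1 6800
```

The second file changes only bind_address, so http_port keeps the value from the first file.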

At this point there is no configuration file other than the default one, so modify the default file (create it if it does not exist):

vim /etc/scrapyd/scrapyd.conf

Change it as shown below. The bind_address field must be set to 127.0.0.1 so that port 6800 cannot be reached directly from outside, bypassing Nginx:

[scrapyd]
eggs_dir    = eggs
logs_dir    = logs
items_dir   =
jobs_to_keep = 5
dbs_dir     = dbs
max_proc    = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
bind_address = 127.0.0.1
http_port   = 6800
debug       = off
runner      = scrapyd.runner
application = scrapyd.app.application
launcher    = scrapyd.launcher.Launcher
webroot     = scrapyd.website.Root

[services]
schedule.json     = scrapyd.webservice.Schedule
cancel.json       = scrapyd.webservice.Cancel
addversion.json   = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json  = scrapyd.webservice.ListSpiders
delproject.json   = scrapyd.webservice.DeleteProject
delversion.json   = scrapyd.webservice.DeleteVersion
listjobs.json     = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
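To double-check the edit, a small script can parse the [scrapyd] section and confirm that bind_address is a loopback address, which is what keeps port 6800 off the public interface. is_loopback_only is a hypothetical helper; it reports False when bind_address is missing rather than guessing Scrapyd's default.

```python
import configparser
import ipaddress


def is_loopback_only(conf_text: str) -> bool:
    """True if [scrapyd] bind_address is a loopback address such as 127.0.0.1 or ::1."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    addr = parser.get("scrapyd", "bind_address", fallback=None)
    if addr is None:
        return False  # not set in this file: cannot confirm, so report False
    if addr == "localhost":
        return True
    try:
        return ipaddress.ip_address(addr).is_loopback
    except ValueError:
        return False  # hostname or malformed value: not verifiable offline
```

For example, pass it the contents of /etc/scrapyd/scrapyd.conf; with the configuration above it returns True, while 0.0.0.0 would return False.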

Once configuration is complete, start Scrapyd and Nginx and test them as follows:

Start the Scrapyd service

service scrapyd start

Start the Nginx service

Switch to the /etc/nginx directory and run nginx -t to check the configuration for errors. If no errors are reported, start Nginx by running nginx.

Test with curl

(The server's IP address has been masked.)

(venv) F:\Crawl>curl http://***.***.***.**:6801
<html>
<head><title>401 Authorization Required</title></head>
<body>
<center><h1>401 Authorization Required</h1></center>
<hr><center>nginx/1.16.1</center>
</body>
</html>

The response shows that authentication is required, which means the configuration is working.

Accessing port 6800 directly now fails with a timeout error:

(venv) F:\Crawl>curl http://***.***.***.**:6800
curl: (7) Failed to connect to ***.***.***.** port 6800: Timed out
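The same reachability check can be run from Python on your own machine. A timeout like the one above usually means a firewall or cloud security group is dropping the packets, while "connection refused" means the host answered but nothing is listening on that port; either way, port 6800 is not reachable from outside. port_open is a hypothetical helper.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError) and unreachable hosts.
        return False
```

For example, with 203.0.113.10 standing in as a placeholder for your server's address, port_open("203.0.113.10", 6800) should return False after this setup, and port_open("203.0.113.10", 6801) should return True.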

To authenticate with curl, add the parameter -u username:password:

(venv) F:\Crawl>curl http://***.***.***.**:6801/daemonstatus.json -u ray:*******
{"node_name": "VM_0_6_centos", "status": "ok", "pending": 0, "running": 0, "finished": 0}
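In scripts, the equivalent of curl -u is an HTTP Basic Authorization header: the string user:password, Base64-encoded, after the word Basic. A minimal standard-library sketch follows; the helper names are hypothetical, and the address and password in the usage note are placeholders.

```python
import base64
import json
from urllib.request import Request, urlopen


def basic_auth_header(user: str, password: str) -> str:
    """Build the value that curl -u user:password sends in the Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def daemon_status(base_url: str, user: str, password: str, timeout: float = 5.0) -> dict:
    """GET daemonstatus.json through the Nginx proxy; a wrong password raises HTTPError 401."""
    req = Request(f"{base_url}/daemonstatus.json",
                  headers={"Authorization": basic_auth_header(user, password)})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: daemon_status("http://203.0.113.10:6801", "ray", "your-password") should return a dictionary with the same fields as the JSON above.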


Origin www.cnblogs.com/1328497946TS/p/12180538.html