1. Configure the environment
The default Python on my Alibaba Cloud server is 2.7.5, so I installed a fresh 3.6.4 environment with pyenv. After installation, run pyenv global 3.6.4 to switch to 3.6.4. I personally like this approach because the environments don't interfere with each other.
Next, following Dacai's article, pip install gerapy is all it takes; there is no problem with this step. If you have questions, you can open an issue for Dacai.
2. Start the service
First, go to the Alibaba Cloud console and set up a security group; mine opens ports 8000 and 6800.
Then, in a terminal on the server, open ports 8000 and 6800 in the firewall as well.
Then execute
gerapy init
cd gerapy
gerapy migrate
# Pay attention to the next step
gerapy runserver 0.0.0.0:8000
# Locally, plain `gerapy runserver` is enough; on Alibaba Cloud,
# bind 0.0.0.0:8000 as above so the server is reachable from outside.
Now visit http://&lt;server-ip&gt;:8000 in the browser and you should see the main interface.
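If the page doesn't load, it helps to first check whether the port is reachable at all. A small stdlib-only sketch (the host and port values below are placeholders for your own server):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder values: substitute your server's public IP and port 8000
reachable = port_open("127.0.0.1", 8000)
```

If this returns False while the service is running, the problem is usually the security group, the firewall, or a 127.0.0.1 bind address.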
For what each part of the interface means, see Dacai's article.
3. Create a project
Create a new Scrapy crawler in the projects directory under gerapy; here I build the simplest possible one:
scrapy startproject gerapy_test
cd gerapy_test
scrapy genspider baidu www.baidu.com
This is the simplest possible crawler: set ROBOTSTXT_OBEY = False in settings.py, then optionally modify baidu.py under spiders/; all I did here was output the response.url that comes back.
4. Install scrapyd
pip install scrapyd
After installation, run:
scrapyd
Then open ip:6800 in the browser. If you haven't modified the configuration, the page will not open from outside, and the client configured in Gerapy will also show an error status.
Later I found the reason: scrapyd also binds to 127.0.0.1 by default.
So at this point we need to change the configuration. I modified it like this:
vim ~/.scrapyd.conf
[scrapyd]
bind_address = 0.0.0.0
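scrapyd reads INI-style config files, so a quick stdlib sanity check that the file parses and contains the bind address you intend (the content below mirrors the ~/.scrapyd.conf written above):

```python
import configparser

# Same content as the ~/.scrapyd.conf written above
conf_text = """\
[scrapyd]
bind_address = 0.0.0.0
"""

parser = configparser.ConfigParser()
parser.read_string(conf_text)
bind_address = parser["scrapyd"]["bind_address"]
print(bind_address)  # → 0.0.0.0
```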
Restart scrapyd and refresh, and the previous error becomes normal.
5. Packaging, deployment, scheduling
These steps are covered in detail in Dacai's article. After packaging and deploying, go to the client's scheduling interface and click the run button to start the crawler.
You can see the output result.
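Scheduling ultimately goes through scrapyd's HTTP API (the schedule.json endpoint). As a rough sketch, the equivalent request can be built with the stdlib alone; the host below is a placeholder and the request is only constructed, not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder host: replace with your server's public IP
SCRAPYD_URL = "http://127.0.0.1:6800"

def schedule_request(project, spider):
    """Build the POST request scrapyd's /schedule.json endpoint expects."""
    data = urlencode({"project": project, "spider": spider}).encode()
    return Request(f"{SCRAPYD_URL}/schedule.json", data=data, method="POST")

# The project and spider names from section 3
req = schedule_request("gerapy_test", "baidu")
```

Sending this request (e.g. with urllib.request.urlopen) would start the same run the button triggers.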
6. Conclusion
I recommend giving it a try; it is very convenient. I have only used it briefly here.