Build a crawler on AWS EC2

Chapter 1, Create an EC2 instance

First, register an AWS account, then open the EC2 console from the services menu.

You can see that there are currently no instances in the account. Click the orange [Launch instance] button.

The first step is to choose the operating system image. I choose Amazon Linux. Click [Next].

This choice is not critical; pick whichever operating system you are familiar with.

The second step is to choose the instance type. I choose the free-tier one. Its specs are quite low, of course; if money is no object, pick whatever you like. Click [Next].

Keep the defaults for steps three, four, and five, clicking [Next] each time.

Go to step 6 and assign a security group. There is only one user here, so the existing security group is fine. Click [Review and Launch].

In step 7, click [Launch], and a key-pair dialog box will pop up.

If you haven't created a key pair before, create one now: enter any key name you like, then download and save the key file (.pem).

Success: the instance is now running.

Chapter 2, Communicating with the EC2 instance

The local system communicates with EC2 over SSH. On macOS this can be done directly from the terminal; on Windows, a client such as MobaXterm is recommended.
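On macOS, a minimal connection looks like this (a sketch: the key file name and the public DNS are placeholders for your own values; ec2-user is the default user on Amazon Linux):

chmod 400 my-key.pem    # ssh refuses to use a key with loose permissions

ssh -i my-key.pem ec2-user@ec2-xx-xx-xx-xx.compute-1.amazonaws.com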

 

Open MobaXterm after installation and start a new SSH session. For the host name, use the instance's public DNS, which is shown in the EC2 console.

Then you may find: huh, why can't it connect?

Let's look at the security group's inbound rules and verify that there is a rule allowing traffic from your computer to port 22 (SSH).

Click the [default] security group shown next to the instance.

As you can see, the source here needs to be changed. Click [Edit] and set it according to your needs. For example, if you set it to [My IP], your IP address is filled in automatically; then click [Save]. Setting it to [Anywhere] also works, though that leaves port 22 open to every address.
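If you prefer the command line over the console, the same inbound rule can be added with the AWS CLI (a sketch: the security group ID and the IP address are placeholders for your own values):

aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 22 --cidr 203.0.113.10/32    # allow SSH from one address only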

 

Then reconnect, and the connection should succeed.

Chapter 3, Environment Deployment

First, check whether the instance's type is 32-bit or 64-bit.
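Once connected over SSH, the architecture can be checked directly on the instance:

uname -m    # prints x86_64 for a 64-bit instance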

1. Install Miniconda.

Since my project is written in Python 3, I choose the Python 3, Linux 64-bit installer.

Transfer the downloaded installer to EC2.
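A minimal sketch using scp (the key file and hostname are placeholders); alternatively, skip the local download and fetch the installer directly on the instance:

scp -i my-key.pem Miniconda3-latest-Linux-x86_64.sh ec2-user@ec2-xx-xx-xx-xx.compute-1.amazonaws.com:~    # copy to the instance's home directory

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh    # or download it on the instance itself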

Run:

bash Miniconda3-latest-Linux-x86_64.sh

Then add conda to the PATH environment variable:

export PATH=~/miniconda3/bin:$PATH
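Note that this export only lasts for the current shell session. One way to make it permanent, and to enable conda activate in new shells, is to let conda edit ~/.bashrc itself:

~/miniconda3/bin/conda init bash    # appends the conda setup to ~/.bashrc

source ~/.bashrc    # reload the shell configuration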

Enter conda list to test whether the installation succeeded; if it prints a list of packages, conda is installed.

2. Install the Python environment. You will need to configure this for your own project; I use several commonly used packages. Create an environment, activate it, and install the packages:


conda create -n scrapy python=3.7.5

conda activate scrapy

conda install scrapy

conda install beautifulsoup4

conda install lxml

conda install selenium


3. If your crawler uses a webdriver (for example through Selenium), the webdriver must be installed. Install Chrome first:


wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm

sudo yum install google-chrome-stable_current_x86_64.rpm

google-chrome-stable --version

Check the version and download the matching chromedriver from:

http://chromedriver.storage.googleapis.com/index.html


Copy the driver to EC2, and remember that the driver's path must match the one your crawler passes to Selenium.
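Instead of copying it over, you can also fetch the driver directly on the instance (a sketch: <VERSION> is a placeholder and must match the Chrome version installed above):

sudo yum install -y unzip    # in case unzip is not already present

wget http://chromedriver.storage.googleapis.com/<VERSION>/chromedriver_linux64.zip

unzip chromedriver_linux64.zip

chmod +x chromedriver    # make sure the driver is executable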

4. Copy the project to a directory on the instance, and run it.
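For a long crawl, it helps to run the project detached so it keeps going after the SSH session closes; a minimal sketch (the project directory and spider name are hypothetical):

cd ~/my_crawler    # hypothetical project directory

conda activate scrapy

nohup scrapy crawl my_spider > crawl.log 2>&1 &    # keeps running after logout

tail -f crawl.log    # watch the progress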

 


Origin: blog.csdn.net/Kangyucheng/article/details/106666528