Crawlers and crawler steps

1. What is a crawler?

A crawler is a piece of code that lets a computer imitate a human and automatically access websites.

2. The role of crawlers

Crawlers can automatically collect and organize data from the Internet on a person's behalf.

For example, you can crawl the prices of Lijiang inns during the National Day holiday.

For example, you can grab tickets, or batch-download pictures, documents, videos, and so on.

3. Crawler steps

[A crawler is usually divided into 3 steps]

  1. Get the web page

  2. Parse the web page

  3. Store the data

Figure 1-1

【Remark】The picture comes from "Fengbian Programming".

3.1 Get the web page

Getting the web page means retrieving the page's information.

In web crawling, what is obtained here is the web page's source code (the HTML).
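Below is a minimal sketch of this step, assuming the third-party requests library is installed; the URL and User-Agent string are placeholders for illustration, not values from the article.

```python
# Sketch of the "get the web page" step using the requests library.
import requests

def fetch_page(url: str) -> str:
    """Download a page and return its HTML source code."""
    headers = {
        # Many sites refuse requests that lack a browser-like User-Agent.
        "User-Agent": "Mozilla/5.0 (compatible; example-crawler/0.1)"
    }
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()                     # fail early on HTTP errors
    response.encoding = response.apparent_encoding  # guess the page encoding
    return response.text

if __name__ == "__main__":
    html = fetch_page("https://example.com")  # placeholder URL
    print(html[:200])                          # first 200 characters of the source
```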

3.2 Parse the web page

Parsing the web page means extracting the desired data from the page's source code.

Because web pages follow a fairly regular structure, third-party Python libraries can do this extraction efficiently.

The extracted data may be video links, image download addresses, file titles, and so on.
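As one possible illustration, here is a minimal sketch using the third-party beautifulsoup4 library; the HTML snippet, tag names, and class names are made up for the example.

```python
# Sketch of the "parse the web page" step using Beautiful Soup.
from bs4 import BeautifulSoup

html = """
<html><body>
  <h1 class="title">Lijiang inn prices</h1>
  <a href="/inn/1">Inn A</a>
  <a href="/inn/2">Inn B</a>
  <img src="/img/photo.jpg">
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

title = soup.find("h1", class_="title").get_text()     # a page title
links = [a["href"] for a in soup.find_all("a")]        # link addresses
images = [img["src"] for img in soup.find_all("img")]  # image download addresses

print(title)   # Lijiang inn prices
print(links)   # ['/inn/1', '/inn/2']
print(images)  # ['/img/photo.jpg']
```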

3.3 Store the data

Storing the data means saving the acquired data in some form (text, pictures, videos, etc.).

Storage is essentially a process of writing and saving.

Write the extracted data into a CSV file, Word, Excel, or a database, or save it to a folder, and so on.
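For example, a minimal sketch of writing records to a CSV file with Python's built-in csv module; the rows and file name are made-up example data, not results from the article.

```python
# Sketch of the "store the data" step: write extracted records to a CSV file.
import csv

rows = [
    {"name": "Inn A", "price": 268},
    {"name": "Inn B", "price": 399},
]

with open("inns.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price"])
    writer.writeheader()    # write the column names
    writer.writerows(rows)  # write one line per record
```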

4. Summary

In short, a crawler automates three steps: get the web page, parse it to extract the data you want, and store that data in a usable form.




Origin: blog.csdn.net/Z987421/article/details/133313638