How does a beginner get started with Python crawlers? This is the most detailed introductory tutorial I have ever seen

This article is aimed at beginners. I will use the simplest examples to show you how to get started with Python crawlers!


To get started with Python crawlers, you first need to cover four things:

  • Be familiar with Python programming
  • Understand HTML
  • Understand the fundamentals of web crawling
  • Learn to use Python crawler libraries

01 What is a crawler, and what is its basic process?

Web crawling is also called web data collection, a name that makes the idea easier to understand.

A crawler is a program that requests data (usually in the form of HTML) from a web server and then parses the HTML to extract the data you want.

The basic process is summarized in four steps:

1. Initiate a request

Use an HTTP library to send a Request to the target site. The Request can carry additional headers and other information. Then wait for the server to respond. This is like opening a browser, typing www.baidu.com into the address bar, and pressing Enter: the browser acts as a client and sends a request to the server.
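As a minimal sketch of this step, the code below builds a request with an extra header using only Python's standard library (`urllib.request`). The helper names `build_request` and `send` and the User-Agent value are illustrative assumptions, not something the article prescribes; third-party HTTP libraries work along the same lines.

```python
import urllib.request

# Hypothetical header value: many sites expect a browser-like User-Agent.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; demo-crawler/0.1)"}


def build_request(url, headers=None):
    """Create an HTTP GET request object with extra headers (nothing is sent yet)."""
    merged = dict(DEFAULT_HEADERS)
    merged.update(headers or {})
    return urllib.request.Request(url, headers=merged, method="GET")


def send(request, timeout=10):
    """Send the request and return the raw response body as bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Building the request is offline; only send() touches the network.
req = build_request("https://www.baidu.com")
print(req.get_method(), req.full_url)
# To actually send it (needs network access): body = send(req)
```

Separating "build" from "send" makes it easy to inspect or log exactly what the crawler is about to request before any traffic leaves the machine.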

2. Get the response content

If the server responds normally, we get a Response, whose body is the content we want. It may be HTML, a JSON string, binary data (pictures, videos, etc.), or another type. In browser terms, the server receives the client's request, processes it, and sends the web page's HTML file back to the browser.
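One way to tell these response types apart is to look at the `Content-Type` header. The sketch below is an illustration under assumptions: the function names and the coarse categories (`html`, `json`, `text`, `binary`) are made up for this example, and `describe_response` expects the kind of object that `urllib.request.urlopen()` returns for an HTTP URL.

```python
def classify_content_type(content_type):
    """Map a Content-Type header value to a coarse kind of content."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in ("text/html", "application/xhtml+xml"):
        return "html"
    if media_type == "application/json" or media_type.endswith("+json"):
        return "json"
    if media_type.startswith("text/"):
        return "text"
    return "binary"  # pictures, videos, archives, ...


def describe_response(response):
    """Summarize a response object (status code, kind of content, raw body)."""
    content_type = response.headers.get("Content-Type", "application/octet-stream")
    return {
        "status": response.status,
        "kind": classify_content_type(content_type),
        "body": response.read(),
    }


print(classify_content_type("text/html; charset=utf-8"))
print(classify_content_type("image/jpeg"))
```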

3. Parse content

If the content is HTML, it can be parsed with regular expressions or a web page parsing library. If it is JSON, it can be converted directly into a JSON object for analysis. If it is binary data, it can be saved or processed further. This step is equivalent to the browser receiving the server's files, interpreting them, and displaying the page.
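The three parsing approaches mentioned above can be sketched with the standard library alone: `html.parser` as the parsing library, `re` for regular expressions, and `json` for JSON text. The sample HTML and JSON strings are made-up stand-ins for a real response body, and `LinkCollector`, `extract_links`, and `extract_title` are hypothetical names chosen for this example.

```python
import json
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Parsing-library approach: walk the tags and keep the link targets."""
    parser = LinkCollector()
    parser.feed(html_text)
    return parser.links


def extract_title(html_text):
    """Regular-expression approach: fine for simple pages, fragile on messy HTML."""
    match = re.search(r"<title>(.*?)</title>", html_text, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


# Made-up sample data standing in for real response bodies.
sample_html = """
<html><head><title>Demo page</title></head>
<body><a href="/item/1">First</a> <a href="/item/2">Second</a></body></html>
"""
sample_json = '{"items": [{"id": 1, "price": 9.5}]}'

print(extract_title(sample_html))   # Demo page
print(extract_links(sample_html))   # ['/item/1', '/item/2']
print(json.loads(sample_json)["items"][0]["price"])  # 9.5
```

Regular expressions are quick for pulling out one well-defined fragment, while a real parser copes better with nested or irregular markup.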

4. Save data

The data can be saved as text, stored in a database, or written to a file in a specific format such as jpg or mp4. This is equivalent to downloading a picture or video from a web page while browsing.
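Two of these saving options, a text file and a database, might look like the sketch below. It assumes the parsed data is a list of dictionaries; the `products` table, its columns, and the sample rows are invented for illustration, and SQLite is used only because it ships with Python.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def save_as_csv(rows, path):
    """Write a list of dicts to a CSV text file; the first row's keys form the header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def save_to_sqlite(rows, db_path):
    """Insert rows into a hypothetical `products` table, creating it if needed."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")
            conn.executemany(
                "INSERT INTO products (name, price) VALUES (:name, :price)", rows
            )
    finally:
        conn.close()


# Made-up rows standing in for data extracted in the parsing step.
rows = [{"name": "Keyboard", "price": 199.0}, {"name": "Mouse", "price": 59.5}]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "products.csv"
    db_path = Path(tmp) / "products.db"
    save_as_csv(rows, csv_path)
    save_to_sqlite(rows, db_path)

    # Read everything back to confirm it was stored.
    csv_lines = csv_path.read_text(encoding="utf-8").splitlines()
    conn = sqlite3.connect(db_path)
    stored = conn.execute("SELECT name, price FROM products ORDER BY price").fetchall()
    conn.close()

print(csv_lines)
print(stored)
```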

02 What kind of data can crawlers capture?


1. Web page text, such as HTML documents and text in JSON format.

2. Video files.

3. Picture files: the binary data obtained is saved in a picture format.

4. Other files: anything that can be requested can be obtained.
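For pictures, videos, and other files, the response body is raw bytes that are written to disk unchanged. The sketch below shows one way to do that in chunks so that large files do not have to fit in memory. The helper names `save_stream` and `download_file` are assumptions for this example, and the demo uses made-up bytes instead of a real download so that it runs offline.

```python
import io
import tempfile
import urllib.request
from pathlib import Path


def save_stream(stream, path, chunk_size=64 * 1024):
    """Copy a binary file-like object to disk in chunks; return the bytes written."""
    written = 0
    with open(path, "wb") as f:  # "wb": binary mode, bytes are stored unchanged
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            f.write(chunk)
            written += len(chunk)
    return written


def download_file(url, path, timeout=10):
    """Request `url` and save the response body to `path` (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return save_stream(response, path)


# Offline demo: made-up bytes stand in for an image returned by a server.
fake_image = b"\x89PNG\r\n\x1a\n" + bytes(100)
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "picture.png"
    size = save_stream(io.BytesIO(fake_image), target)
    round_trip = target.read_bytes()

print(size, round_trip == fake_image)
```

The file extension (here `.png`) is only a label; what makes the saved file a valid picture or video is that the bytes are written exactly as the server sent them.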

03 Website crawler example

Simulating an automatic Taobao login. The login process is completely automated, with no need to enter login information manually.

Crawling product data.

Isn't it magical and interesting? If you like this article, leave a comment and a like below to show your support. Your comments, likes, and reposts are what keep me updating!

With no experience and no foundation, how do you learn a technical skill that can earn money?

Learning Python is worthwhile whether your goal is employment or earning money on the side, but you still need a study plan.
(Python part-time resources + a full set of Python learning materials)

Friends who need these materials can scan the officially certified CSDN QR code below with WeChat to get them for free [guaranteed 100% free]

1. Learning routes for every Python direction

The technical points of every Python direction are organized into a summary of knowledge points for each field. With it, you can look up learning resources for each knowledge point and make sure you learn comprehensively.

2. Essential development tools for Python


3. Python video collection

Watching videos made for complete beginners is the quickest and most effective way to learn. By following the teacher's line of thought in the videos, it is easy to progress from the basics to more advanced topics.

4. Practical cases

Theory alone is not enough. You have to follow along and write the code yourself so that you can put what you have learned into practice. At this stage, working through some real-world cases helps.

5. Python exercises

Use the exercises to check what you have learned.

Finally, hold on to the enthusiasm you started with, and let's become stronger and better together.


Origin blog.csdn.net/m0_59162248/article/details/130471017