Python crawlers for complete beginners: an easy way to get started! Don't miss it if you want to learn to write crawlers in Python!

To learn Python crawlers, we must first understand the basic principles of web crawlers and, more importantly, be familiar with Python programming. We also need to understand HTML, so that we can get started in a logical order.

First, we need to know what a web crawler is. It can also be called web data collection: a program requests data (in the form of HTML) from a web server, then parses the HTML to extract the data you want. This touches on many areas of knowledge, such as databases, web servers, and the HTTP protocol.

When you first get started with crawlers, find a beginner textbook or online tutorial to learn from. With a basic understanding of Python, getting started with crawlers is not difficult, so the suggestion is to learn the basics of Python first.

After laying the foundation, we need to understand the basic principle of a Python crawler.
In fact, a crawler program mainly consists of two steps: 1. send a GET request to get the HTML; 2. parse the HTML to get the data.
Parsing can only start once the HTML has been fetched. Python provides many libraries to help you parse HTML, so this part is also quite simple.
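
The two steps can be sketched with nothing but the standard library. This is only a minimal illustration, not the approach used in the rest of this article: the names TitleParser, fetch_html, and extract_title are made up for this example, html.parser.HTMLParser stands in for the more convenient parsing libraries, and the page is assumed to be UTF-8.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Only the first <title> tag is of interest.
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def fetch_html(url):
    """Step 1: send a GET request and return the page as text (assumes UTF-8)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_title(html_text):
    """Step 2: parse the HTML and pull out the data we want."""
    parser = TitleParser()
    parser.feed(html_text)
    return parser.title.strip()
```

With network access, extract_title(fetch_html("https://www.baidu.com/")) chains the two steps together. Keeping them in separate functions makes it possible to try the parsing step on a saved or hand-written HTML string first.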

Here we use fetching the title of the Baidu homepage as an example.
First, send a request for the HTML. You can use Python's built-in urllib library, whose urlopen function fetches the HTML for a given URL.

# Import the urlopen function from the urllib library
from urllib.request import urlopen

# Make a request and get the HTML
html = urlopen("https://www.baidu.com/")

# The returned HTML content is bytes, so decode it into a string
html_text = html.read().decode("utf-8")

# Print the HTML content
print(html_text)

After running it, you can compare the output with the source of the Baidu homepage. You will find that they are basically the same.
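
The request above is the simplest possible form. Some sites respond differently to clients without a browser-like User-Agent, some respond slowly, and some pages are not encoded as UTF-8. The sketch below shows one way to handle those cases with urllib. The helper names build_request, decode_body, and fetch are hypothetical, and the User-Agent value and the 10-second timeout are arbitrary choices for illustration.

```python
from urllib.request import Request, urlopen


def build_request(url):
    """Attach a User-Agent header (the value here is an arbitrary example)."""
    return Request(url, headers={"User-Agent": "Mozilla/5.0 (tutorial-example)"})


def decode_body(raw, charset=None):
    """Decode bytes using the declared charset, falling back to UTF-8."""
    try:
        return raw.decode(charset or "utf-8", errors="replace")
    except LookupError:
        # The server declared a charset name Python does not know.
        return raw.decode("utf-8", errors="replace")


def fetch(url, timeout=10):
    """Return the page text, or None if the request fails."""
    try:
        with urlopen(build_request(url), timeout=timeout) as response:
            # The charset comes from the Content-Type header, if present.
            charset = response.headers.get_content_charset()
            return decode_body(response.read(), charset)
    except OSError as err:
        # URLError, HTTPError and socket timeouts are all subclasses of OSError.
        print(f"Request failed: {err}")
        return None
```

Calling fetch("https://www.baidu.com/") then returns either the page text or None, so the caller can check the result before parsing.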

Once you have the HTML, the next step is to parse it. We can use the Python library BeautifulSoup to parse the downloaded page. BeautifulSoup is a third-party library, so it must be installed before use (for example with pip install beautifulsoup4).

Once installed, BeautifulSoup converts the HTML content into a structured tree of tags, and you then extract the data from those tags.
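
To see what this structured content looks like, here is a small sketch that parses a hand-written page instead of a live site, so it runs without network access. The HTML string and the variable names are invented for this example; only the BeautifulSoup calls themselves (attribute-style navigation, get_text, find_all) come from the library.

```python
from bs4 import BeautifulSoup

# A small hand-written page, used instead of a real download.
html_text = """
<html>
  <head><title>Example page</title></head>
  <body>
    <h1 id="top">Hello</h1>
    <a href="/news">News</a>
    <a href="/maps">Maps</a>
  </body>
</html>
"""

soup = BeautifulSoup(html_text, "html.parser")

# Tags can be reached as attributes, following the nesting of the document.
title_tag = soup.head.title
title_text = title_tag.get_text()

# find_all returns every matching tag; attributes are read like dict keys.
links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]

print(title_text)
print(links)
```

The same navigation works on a page fetched with urlopen, since BeautifulSoup accepts both strings and bytes.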

If you want a particular piece of information, just take it out of the corresponding tag. For example, to get the title of the Baidu homepage:

# Import the urlopen function
from urllib.request import urlopen

# Import BeautifulSoup
from bs4 import BeautifulSoup as bf

# Request the HTML
html = urlopen("https://www.baidu.com/")

# Parse the HTML with BeautifulSoup
obj = bf(html.read(), 'html.parser')

# Extract the title tag nested inside the head tag
title = obj.head.title

# Print the title
print(title)

This way you can get the result!
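
Note that printing a tag shows the whole element, including the surrounding title markup, and that obj.head.title raises an AttributeError when a page has no head tag. A slightly more defensive sketch is shown below. The function name get_title_text is hypothetical, and returning None for a missing title is just one possible choice.

```python
from bs4 import BeautifulSoup


def get_title_text(html):
    """Return the page title as plain text, or None if there is no <title>."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is None when the page has no <title> tag.
    if soup.title is None:
        return None
    # strip=True removes whitespace around the title text.
    return soup.title.get_text(strip=True)
```

It accepts either the bytes returned by html.read() or an already decoded string.
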
You can try it yourself. These basic principles cover the fundamentals of a Python crawler; more advanced and complex crawlers come with further study later on. Learning goes step by step, and those who are new to Python must first lay a solid foundation. There are many free tutorials and teaching materials shared online. You can find the basics on the web and practice more on your own.



Origin blog.csdn.net/pyjishu/article/details/115183791