Step 6 Python crawler: extracting data parsed by BeautifulSoup

In a previous article I covered how to parse data with BeautifulSoup; this article covers how to extract data with BeautifulSoup.
find() and find_all()
find() and find_all() are two methods of BeautifulSoup objects.
They match HTML tags and attributes and extract the matching data from a BeautifulSoup object.
The two are used in basically the same way.
The difference is that find() extracts only the first piece of data that meets the requirements, while find_all() extracts all the data that meets the requirements.
See the code below for usage:

import requests
from bs4 import BeautifulSoup
url = 'paste any website URL here'
res = requests.get(url)
print(res.status_code)
soup = BeautifulSoup(res.text, 'html.parser')
item = soup.find('div')  # use find() to extract the first <div> element and store it in the variable item
print(type(item))  # print item's data type
print(item)        # print item
import requests
from bs4 import BeautifulSoup
url = 'any website URL'
res = requests.get(url)
print(res.status_code)
soup = BeautifulSoup(res.text, 'html.parser')
items = soup.find_all('div')  # use find_all() to extract all matching data and store it in the variable items
print(type(items))  # print items' data type
print(items)        # print items

Tag object
A Tag object is a data type with three common usages:
1. You can call find() and find_all() on a Tag object to continue searching within it.
2. Extract the text of a Tag with Tag.text.
3. Extract a URL with Tag['href'] (in general, Tag['attribute name']).
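The three usages above can be sketched together in one self-contained example. Here a small inline HTML snippet stands in for the res.text you would get from a real page, so the sketch runs without a network request; the tag names and URLs in it are made up for illustration.

```python
from bs4 import BeautifulSoup

# A small inline HTML snippet, standing in for res.text from a real page
html = '''
<div class="post">
  <a href="https://example.com/article1">First article</a>
  <a href="https://example.com/article2">Second article</a>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# Usage 1: keep searching inside a Tag with find()/find_all()
div = soup.find('div')      # div is a Tag object
links = div.find_all('a')   # search again within that Tag

for link in links:
    # Usage 2: extract the text with Tag.text
    print(link.text)
    # Usage 3: extract an attribute with Tag['attribute name']
    print(link['href'])
```

Note that find() returns a single Tag (or None if nothing matches), while find_all() returns a list-like ResultSet you can loop over.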

So far, three of the crawler's four steps have been covered.
The four crawler steps:
1. Obtain the data. 2. Parse the data. 3. Extract the data. 4. Save the data.
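The four steps can be strung together in one short sketch. To keep it runnable offline, an inline HTML snippet stands in for the page a requests.get() call would fetch in step 1; the filename links.csv and the column names are my own placeholders.

```python
import csv
from bs4 import BeautifulSoup

# Step 1: obtain the data. In a real crawler this would be:
#   res = requests.get(url); html = res.text
# Here an inline snippet stands in for the fetched page.
html = '<div><a href="/a">A</a><a href="/b">B</a></div>'

# Step 2: parse the data
soup = BeautifulSoup(html, 'html.parser')

# Step 3: extract the data (text and href of every link)
rows = [(a.text, a['href']) for a in soup.find_all('a')]

# Step 4: save the data to a CSV file
with open('links.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['title', 'link'])
    writer.writerows(rows)
```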




Origin blog.csdn.net/LoraRae/article/details/104435424