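Simple scraping of Red Bull branch office data, part 02
This tutorial uses three modules: requests, pandas, and bs4. The 'lxml' parser passed to BeautifulSoup also requires the lxml package, and writing .xlsx files from pandas requires an Excel writer engine such as openpyxl.
The code is as follows:
Method 1: print directly to the terminal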
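import requests
from bs4 import BeautifulSoup

# Fetch the branch page; fail fast on network or HTTP errors
response = requests.get('http://www.redbull.com.cn/about/branch', timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'lxml')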
# Explicit-loop version, kept for reference:
# title_list = soup.find_all(name='h2')
# for title in title_list:
#     print(title.text)
# addr_list = soup.find_all(name='p', class_='mapIco')
# for addr in addr_list:
#     print(addr.text)
# email_list = soup.find_all(name='p', class_='mailIco')
# for email in email_list:
#     print(email.text)
# phone_list = soup.find_all(name='p', class_='telIco')
# for phone in phone_list:
#     print(phone.text)
# List comprehensions: collect the text of every matching tag
title_list = [title.text for title in soup.find_all(name='h2')]
# print(title_list)
addr_list = [addr.text for addr in soup.find_all(name='p', class_='mapIco')]
# print(addr_list)
email_list = [email.text for email in soup.find_all(name='p', class_='mailIco')]
# print(email_list)
phone_list = [phone.text for phone in soup.find_all(name='p', class_='telIco')]
# print(phone_list)
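# Print one record per branch. zip stops at the shortest list,
# so the loop does not depend on a hard-coded count of 40.
for title, addr, email, phone in zip(title_list, addr_list, email_list, phone_list):
    print('''
    Company name: %s
    Company address: %s
    Company email: %s
    Company phone: %s
    ''' % (title, addr, email, phone))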
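To see how find_all with name and class_ selects elements without touching the network, here is a minimal sketch run against a hand-written HTML snippet. The snippet, the sample values, and the extract_branches helper are hypothetical; only the tag and class names (h2, mapIco, mailIco, telIco) come from the script above, and the real page may be laid out differently. It uses Python's built-in html.parser so that lxml is not needed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML imitating the structure the selectors above assume;
# the real page may be laid out differently.
SAMPLE_HTML = """
<ul>
  <li>
    <h2>Branch A</h2>
    <p class="mapIco">1 Example Road</p>
    <p class="mailIco">a@example.com</p>
    <p class="telIco">010-0000000</p>
  </li>
  <li>
    <h2>Branch B</h2>
    <p class="mapIco">2 Example Road</p>
    <p class="mailIco">b@example.com</p>
    <p class="telIco">020-0000000</p>
  </li>
</ul>
"""


def extract_branches(html):
    """Return one dict per branch (hypothetical helper for illustration)."""
    soup = BeautifulSoup(html, 'html.parser')
    # name= filters by tag name, class_= filters by CSS class
    titles = [tag.text.strip() for tag in soup.find_all(name='h2')]
    addrs = [tag.text.strip() for tag in soup.find_all(name='p', class_='mapIco')]
    emails = [tag.text.strip() for tag in soup.find_all(name='p', class_='mailIco')]
    phones = [tag.text.strip() for tag in soup.find_all(name='p', class_='telIco')]
    # Pair the four lists position by position
    return [
        {'title': t, 'addr': a, 'email': e, 'phone': p}
        for t, a, e, p in zip(titles, addrs, emails, phones)
    ]


branches = extract_branches(SAMPLE_HTML)
for branch in branches:
    print(branch)
```

Pairing the lists by position only works when every branch has all four fields. If a field is missing somewhere, the later rows shift out of alignment.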
Method 2: save to an Excel spreadsheet
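import requests
import pandas
from bs4 import BeautifulSoup

# Fetch the branch page; fail fast on network or HTTP errors
response = requests.get('http://www.redbull.com.cn/about/branch', timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'lxml')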
# List comprehensions: collect the text of every matching tag
title_list = [title.text for title in soup.find_all(name='h2')]
# print(title_list)
addr_list = [addr.text for addr in soup.find_all(name='p', class_='mapIco')]
# print(addr_list)
email_list = [email.text for email in soup.find_all(name='p', class_='mailIco')]
# print(email_list)
phone_list = [phone.text for phone in soup.find_all(name='p', class_='telIco')]
# print(phone_list)
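# Each key becomes a column header; all four lists must have the same length
data_dict = {
    "Company name": title_list,
    "Company address": addr_list,
    "Company email": email_list,
    "Company phone": phone_list,
}
df = pandas.DataFrame(data_dict)
# Writes data_info.xlsx to the current working directory
df.to_excel(r'data_info.xlsx')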
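pandas.DataFrame raises a ValueError when the lists in the dictionary differ in length, which can happen if some branch on the page is missing a field. Below is a minimal sketch of a check that could run before the DataFrame is built. The check_equal_lengths helper is hypothetical (it is not part of pandas), and the sample data is made up.

```python
def check_equal_lengths(columns):
    """Raise ValueError if the lists in `columns` differ in length.

    `columns` maps a column header to its list of values.
    Hypothetical helper for illustration; not part of pandas.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError('column lengths differ: %r' % lengths)
    return columns


# Made-up sample data: in `bad`, the phone column is one item short
good = {'title': ['A', 'B'], 'phone': ['1', '2']}
bad = {'title': ['A', 'B'], 'phone': ['1']}

check_equal_lengths(good)  # passes silently

try:
    check_equal_lengths(bad)
    mismatch_detected = False
except ValueError as exc:
    mismatch_detected = True
    print(exc)
```

In the script above, the call would be check_equal_lengths(data_dict), placed just before pandas.DataFrame(data_dict), so that a mismatch is reported with the length of each column.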
More to come in future updates!