Building a Web Crawler in Python with requests and BeautifulSoup: Example Code

Category: Other | 2020-03-07 22:06:39 | Views: 0

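This article walks through building a simple web crawler in Python with requests and BeautifulSoup. The steps are as follows.

Overview

In Python, the requests module can request a URL and return the HTML of the response, which BeautifulSoup can then parse.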
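
As a minimal sketch of the parsing half of this workflow, the snippet below feeds a small hand-written HTML string to BeautifulSoup instead of a live response, so it runs offline. The markup, the `item` class name, and the function name `extract_links` are made up for illustration; in real use the string would come from something like `requests.get(url).text`.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the body of a requests response.
SAMPLE_HTML = """
<html><body>
  <div class="item"><a href="/page/1">First</a></div>
  <div class="item"><a href="/page/2">Second</a></div>
</body></html>
"""


def extract_links(html):
    """Return (text, href) pairs for every link inside a div.item element."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for div in soup.find_all("div", attrs={"class": "item"}):
        anchor = div.find("a")
        if anchor is not None and anchor.has_attr("href"):
            links.append((anchor.get_text(strip=True), anchor["href"]))
    return links


print(extract_links(SAMPLE_HTML))
```

The same two calls, `find_all` to select containers and attribute access to read `href`, are the ones the full program relies on.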

Example

Suppose we want information about the Top 100 movies on the Maoyan board at http://maoyan.com/board/4, specifically each movie's title and URL.
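
A Top 100 list is presumably split across several pages. The sketch below builds one URL per page, assuming (this is not stated in the article) that the board shows 10 movies per page and accepts an `offset` query parameter. The function name `build_page_urls` is hypothetical, and the site's actual pagination should be checked before relying on this.

```python
from urllib.parse import urlencode

BASE_URL = "http://maoyan.com/board/4"


def build_page_urls(total=100, page_size=10):
    """Build one URL per page.

    The `offset` parameter and the page size of 10 are assumptions.
    """
    urls = []
    for offset in range(0, total, page_size):
        urls.append(BASE_URL + "?" + urlencode({"offset": offset}))
    return urls


page_urls = build_page_urls()
print(len(page_urls))
print(page_urls[0])
print(page_urls[-1])
```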

Installing requests and BeautifulSoup

Install both packages with pip.

pip install requests

pip install beautifulsoup4

Program

# -*- coding: utf-8 -*-
__author__ = 'Qian Yang'

import requests
from bs4 import BeautifulSoup


def get_one_page(url):
  # Fetch the page; return its HTML as text, or None on a non-200 response.
  response = requests.get(url, timeout=10)
  if response.status_code == 200:
    # Decode as UTF-8 and skip invalid bytes. The original re-encoded the
    # text to GBK, which in Python 3 would hand bytes to the parser.
    return response.content.decode('utf8', 'ignore')
  return None


# Parse with BeautifulSoup
def bs4_paraser(html):
  all_value = []
  soup = BeautifulSoup(html, 'html.parser')
  # Find the container of every movie
  all_div_item = soup.find_all('div', attrs={'class': 'movie-item-info'})
  for r in all_div_item:
    # The title and URL are both read from the <p class="name"> element
    name_tag = r.find('p', attrs={'class': 'name'})
    value = {'title': name_tag.string, 'movie_url': name_tag.a['href']}
    all_value.append(value)
  return all_value


def main():
  url = 'http://maoyan.com/board/4'
  html = get_one_page(url)
  if html is None:
    # Without this check, a failed request would crash the parser
    print('Request failed: ' + url)
    return
  all_value = bs4_paraser(html)
  print(all_value)


if __name__ == '__main__':
  main()

The code has been tested and works.
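
A slightly more defensive variant of the fetch step in the program above is sketched below. The `User-Agent` value, the 10-second timeout, and the function name `fetch_html` are illustrative assumptions, not part of the original program. Some sites reject requests that carry the default requests user agent, but whether this site does is not verified here.

```python
import requests

# Illustrative header; the exact value is an assumption.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-crawler)"}


def fetch_html(url, get=requests.get, timeout=10):
    """Return the page text, or None if the request fails or is not 200.

    `get` is injectable so the function can be exercised without
    network access; by default it is requests.get.
    """
    try:
        response = get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    except requests.RequestException:
        # Covers connection errors, timeouts, and invalid URLs.
        return None
    if response.status_code != 200:
        return None
    return response.text
```

Calling `fetch_html('http://maoyan.com/board/4')` would then return either the page text or `None`, so the caller can check for failure before parsing.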
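
The `movie_url` values collected by the program come straight from `href` attributes, which may be relative paths. Assuming that is the case, `urllib.parse.urljoin` can resolve them into absolute URLs. The sample records and the function name `absolutize` below are made up for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "http://maoyan.com/board/4"

# Hypothetical records in the shape returned by the parser above.
sample_movies = [
    {"title": "Movie A", "movie_url": "/films/1"},
    {"title": "Movie B", "movie_url": "http://maoyan.com/films/2"},
]


def absolutize(movies, base_url=BASE_URL):
    """Return copies of the records with movie_url resolved against base_url."""
    return [
        {**movie, "movie_url": urljoin(base_url, movie["movie_url"])}
        for movie in movies
    ]


for movie in absolutize(sample_movies):
    print(movie["title"], movie["movie_url"])
```

URLs that are already absolute pass through `urljoin` unchanged, so the function is safe to apply to mixed input.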

Summary

That covers this example of building a crawler in Python with requests and BeautifulSoup. I hope it is helpful.

Author: 程序员浩然

Reposted from blog.csdn.net/haoxun09/article/details/104723024