Fetching web page content with the Python requests library

Site URL: https://www.k374.com/index.php
The site looks like this:
[Image: screenshot of the site, not preserved]

Step 1: import the requests library and use it to request the page and retrieve its source code.
The code is as follows:

import requests

# Send a GET request for the home page and print the HTML source of the response
r = requests.get('https://www.k374.com/index.php')
print(r.text)

Running it produces the following output:

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>飘花电影网首页-高清电影好看的电视剧-影视大全免费在线观看</title>
<meta name="keywords" content="飘花电影网,新飘花电影,飘花,飘花电影院,飘花网,飘花影视,飘雪影视在线观看影视大全,飘雪电影网电影,飘花电影网最新电影,飘花电影网站,飘花电影网app,飘花飘花电影网,飘花电影piaohua.com,飘雪网在线观看免费观看BD,飘花电影网手机电影网,,飘花影院,飘花影院手机版" />
<meta name="description" content="先看网(www.k374.com)是一家免费VIP影视在线观看的平台,拥有海量、优质、高清电影和好看的电视剧,清晰画质在线动漫。专业全网收集最新,最好看的电视剧、高清电影、经典动漫、综艺娱乐节目,飘花电影网拥有丰富的内容、极致的观看体验、便捷的高速播放、24小时多平台无缝应用体验以及快捷!" />
    <meta name="renderer" content="webkit|ie-comp|ie-stand">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0">
<link rel="shortcut icon" href=/statics/img/favicon.ico" type="image/x-icon" />
<link rel="stylesheet" href="/statics/css/mytheme-font.css?v=1.0.0" type="text/css" />
<link rel="stylesheet" href="/statics/css/mytheme-ui.css?v=1.0.0" type="text/css" />
<link rel="stylesheet" href="/statics/css/mytheme-site.css?v=1.0.0" type="text/css" />
<link rel="stylesheet" href="/statics/css/mytheme-color.css?v=1.0.0" type="text/css" name="default" />
<link rel="stylesheet" href="/statics/css/mytheme-color.css?v=1.0.0" type="text/css" name="skin" disabled/>
<link rel="stylesheet" href="/statics/css/mytheme-color1.css?v=1.0.0" type="text/css" name="skin" disabled/>
<link rel="stylesheet" href="/statics/css/mytheme-color2.css?v=1.0.0" type="text/css" name="skin" disabled/>
<link rel="stylesheet" href="/statics/css/mytheme-color3.css?v=1.0.0" type="text/css" name="skin" disabled/>

<script>var maccms={"path":"","mid":"","url":"www.k374.com","wapurl":"www.k374.com","mob_status":"0"};var myui={"tpl":"www.k374.com" ,"bdapi":"http://bdimg.share.baidu.com/static/api/js/share.js"}</script>
<script type="text/javascript" src="/statics/js/jquery.min.js?v=3.3.1"></script>
<script type="text/javascript" src="/statics/js/layer.js?v3.1.1"></script>
<script type="text/javascript" src="/statics/js/mytheme-site.js?v=1.0.0"></script>
<script type="text/javascript" src="/statics/js/mytheme-ui.js?v=1.0.0"></script>
<script type="text/javascript" src="/statics/js/mytheme-cms.js?v=1.1.0"></script>
<script type="text/javascript" src="/statics/js/home.js"></script>
<style type="text/css">[class*=col-],.myui-panel_hd,.myui-content__list li{ padding: 10px}.btn{ border-radius: 5px;}.myui-vodlist__thumb{ border-radius:5px; padding-top:150%; background: url(/statics/img/include.jpg) no-repeat;}.myui-vodlist__thumb.square{ padding-top: 100%; background: url(/statics/img/include.jpg) no-repeat;}.myui-vodlist__thumb.wide{  background: url(/statics/img/include.jpg) no-repeat;}.myui-vodlist__thumb.actor{ padding-top: 140%;}.flickity-prev-next-button.previous{ left: 10px;}.flickity-prev-next-button.next{ right: 10px;}.myui-sidebar{ padding: 0 0 0 20px;}.myui-panel{ padding: 10px; margin-bottom: 5px; border-radius: 5px;}.myui-layout{ margin: -10px -10px 20px;}.myui-panel-mb{ margin-bottom: 20px;}.myui-panel-box{ padding: 10px;}.myui-panel-box.active{ margin: -10px;}.myui-player__item .fixed{ width: 500px;}.myui-vodlist__text li a{ padding: 10px 15px 10px 0;}.myui-vodlist__media li { padding: 10px 0 10px;}.myui-screen__list{ padding: 10px 10px 0;}.myui-screen__list li{ margin-bottom: 10px; margin-right: 10px;}.myui-page{ padding: 0 10px;}.myui-extra{ right: 20px; bottom: 30px;}@media (min-width: 1400px){
   
   .container{ max-width: 1920px;}.container{ padding-left: 120px;  padding-right: 120px;}.container.min{ width: 1200px; padding: 0;}}@media (max-width: 767px){body,body.active{ padding-bottom: 50px;}[class*=col-],.myui-content__list li{ padding: 5px}.flickity-prev-next-button.previous{ left: 5px;}.flickity-prev-next-button.next{ right: 5px;}.myui-panel{ padding: 0; border-radius: 0;}.myui-layout{ margin: 0;}.myui-vodlist__text li a{ padding: 10px 15px 10px 0;}.myui-vodlist__media li { padding: 5px 0 5px;}.myui-screen__list{ padding: 10px 5px 0;}.myui-screen__list li{ margin-bottom: 5px; margin-right: 5px;}.myui-extra{ right: 20px; bottom: 80px;}.myui-page{ padding: 0 5px;}}</style>
<!--[if lt IE 9]>
<![endif]-->
</head>
    <body class="active">
        <header class="myui-header__top clearfix" id="header-top">
        <div class="container">
                <div class="row">
                        <div class="myui-header_bd clearfix">
                            <div class="myui-header__logo">
                                        <a class="logo" href="/">
                                                                <img class="img-responsive hidden-xs" alt='飘花电影网' src="https://www.k374.com/static/images/logo.jpg"/>
                                                <!--<img class="img-responsive visible-xs" src="https://www.k374.com/static/images/logo.jpg"/>-->  

                                </div>
                                <ul class="myui-header__menu nav-menu">


                                        <li class="active visible-inline-lg"><a href="/" rel='nofollow'>飘花电影网</a></li>
                                                                                    <li class=" visible-inline-lg"><a href="/vodtype/1.html" rel='nofollow'>电影</a></li>
                                                                    <li class=" visible-inline-lg"><a href="/vodtype/2.html" rel='nofollow'>热播电视剧</a></li>
                                                                    <li class=" visible-inline-lg"><a href="/vodtype/3.html" rel='nofollow'>综艺</a></li>
                                                                    <li class=" visible-inline-lg"><a href="/vodtype/4.html" rel='nofollow'>动漫</a></li>
                                                                    <li class=" visible-inline-lg"><a href="/vodtype/9.html" rel='nofollow'>科幻片</a></li>
                                                                    <!--        <li class=" visible-inline-lg"><a href="/actor.html">明星</a></li> 
                                        <li class=" visible-inline-lg"><a href="/topic.html">专题</a></li> -->


                                        <li class="dropdown-hover">
                                                <a href="javascript:;" rel='nofollow'>频道 <i class="fa fa-angle-down"></i></a>
                                                <div class="dropdown-box bottom fadeInDown clearfix">
                                                        <ul class="item nav-list clearfix">

                                                                <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-warm" href="/" rel='nofollow'>飘花电影网</a></li>
                                                                                                    <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-default" href="/vodtype/1.html" rel='nofollow'>电影</a></li>
                                                                    <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-default" href="/vodtype/2.html" rel='nofollow'>热播电视剧</a></li>
                                                                    <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-default" href="/vodtype/3.html" rel='nofollow'>综艺</a></li>
                                                                    <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-default" href="/vodtype/4.html" rel='nofollow'>动漫</a></li>
                                                                    <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-default" href="/vodtype/7.html" rel='nofollow'>喜剧片</a></li>
                                                                    <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-default" href="/vodtype/8.html" rel='nofollow'>爱情片</a></li>
                                                                    <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-default" href="/vodtype/9.html" rel='nofollow'>科幻片</a></li>
                                                                    <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-default" href="/vodtype/10.html" rel='nofollow'>恐怖片</a></li>
                                                                    <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-default" href="/vodtype/11.html" rel='nofollow'>剧情片</a></li>
                                                                    <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-default" href="/vodtype/12.html" rel='nofollow'>战争片</a></li>
                                                                    <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-default" href="/vodtype/13.html" rel='nofollow'>国产剧</a></li>
                                                                    <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-default" href="/vodtype/14.html" rel='nofollow'>台湾剧</a></li>
                                                                    <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-default" href="/vodtype/15.html" rel='nofollow'>日剧</a></li>
                                                                    <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-default" href="/vodtype/16.html" rel='nofollow'>欧美剧</a></li>
                                                                    <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-default" href="/vodtype/24.html" rel='nofollow'>海外剧</a></li>
                                                                    <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-default" href="/vodtype/25.html" rel='nofollow'>韩剧</a></li>
                                                                    <li class="col-lg-5 col-md-5 col-sm-5 col-xs-3"><a class="btn btn-sm btn-block btn-default" href="/vodtype/26.html" rel='nofollow'>港剧TVB</a></li>

                                                </ul>
                                                </div>
                                        </li>

                                                                    </div>
                                                        <ul class="myui-header__user">
                                                                <li class="visible-xs">
                                                <a class="open-search" href="javascript:;"><i class="fa fa-search"></i></a>
                                        </li>
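
The request in step 1 can be made a little more defensive. The following is only a minimal sketch, not part of the original script: the helper name fetch_html and the 10-second timeout are assumptions chosen for illustration. It sets a timeout, raises an exception on HTTP error codes, and falls back to the encoding that requests detects from the body when the server does not declare a charset.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the decoded HTML of `url` (hypothetical helper, not from the original post)."""
    # A timeout keeps the script from hanging forever on an unresponsive server.
    r = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    r.raise_for_status()
    # Without a declared charset, requests may fall back to ISO-8859-1 for text
    # responses, which garbles Chinese text; use the detected encoding instead.
    if r.encoding is None or r.encoding.lower() == 'iso-8859-1':
        r.encoding = r.apparent_encoding
    return r.text


# Example call (performs a real network request):
# html = fetch_html('https://www.k374.com/index.php')
```

The returned string can be passed straight to a parser, in place of r.text.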

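The returned HTML is long. The next step is to extract only the data we want. For example, if we only need the category names, extracting them completes the scrape.
In the example above we fetched a single page of the site, and what came back was an HTML document. What if we want to fetch files such as images, audio, or video?

Images, audio, and video are all binary data at bottom; we can view them as different kinds of media only because each has a specific storage format and a matching way of decoding it. To fetch them, we therefore need their binary data (requests exposes the raw bytes of a response as r.content, while r.text is the decoded string).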
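
As a rough sketch of fetching such a binary file, the snippet below writes the raw bytes of a response to disk. The helper name download_binary and the output file name are assumptions made for illustration; the logo URL in the example call is taken from the page source shown above.

```python
import requests


def download_binary(url, dest_path, timeout=10):
    """Save the raw bytes served at `url` to `dest_path`; return the byte count.

    Hypothetical helper for illustration, not part of the original post.
    """
    r = requests.get(url, timeout=timeout)
    r.raise_for_status()
    # r.content holds the undecoded bytes, so the file must be opened in binary mode.
    with open(dest_path, 'wb') as f:
        f.write(r.content)
    return len(r.content)


# Example call (performs a real network request):
# download_binary('https://www.k374.com/static/images/logo.jpg', 'logo.jpg')
```

For large audio or video files, reading the whole body into memory may not be desirable; requests also supports streaming the response in chunks.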
The code for extracting the category names is as follows:

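import re

import requests
from bs4 import BeautifulSoup

# Fetch the home page and parse it (the 'lxml' parser requires the lxml package)
r = requests.get('https://www.k374.com/index.php')
Content = r.text
soup = BeautifulSoup(Content, 'lxml')

# Find the element whose class attribute is 'item nav-list clearfix' (the channel menu)
q = soup.find_all(attrs={'class': 'item nav-list clearfix'})

# BeautifulSoup serializes attributes with double quotes, so the pattern uses rel="nofollow"
strq = str(q)
classname = re.findall('rel="nofollow">(.*?)</a></li>', strq)
print(classname)
print(len(classname))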

The result is as follows:

['飘花电影网', '电影', '热播电视剧', '综艺', '动漫', '喜剧片', '爱情片', '科幻片', '恐怖片', '剧情片', '战争片', '国产剧', '台湾剧', '日剧', '欧美剧', '海外剧', '韩剧', '港剧TVB']
18
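
The same category names can also be read without converting the parsed tree back to a string and running a regular expression over it. The sketch below is an alternative, not the method used above: the function name extract_categories is hypothetical, the sample markup is a shortened copy of the menu from the page source, and the built-in 'html.parser' is used so the example runs without lxml.

```python
from bs4 import BeautifulSoup


def extract_categories(html):
    """Return the link texts inside <ul class="item nav-list clearfix"> (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    # The CSS selector matches <a> tags nested in the menu list, whatever their quoting style.
    links = soup.select('ul.item.nav-list.clearfix li a')
    return [a.get_text(strip=True) for a in links]


# Shortened sample of the menu markup from the page source above
sample = '''<ul class="item nav-list clearfix">
<li class="col-lg-5"><a class="btn" href="/" rel='nofollow'>飘花电影网</a></li>
<li class="col-lg-5"><a class="btn" href="/vodtype/1.html" rel='nofollow'>电影</a></li>
</ul>'''

print(extract_categories(sample))  # ['飘花电影网', '电影']
```

Selecting the tags directly avoids depending on how BeautifulSoup happens to serialize attributes, which the regular expression relies on.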


Reposted from blog.csdn.net/m0_46400195/article/details/124660370