Scraping movie names, TV show names, or person names with Python

An introductory blog post on web crawlers:
http://blog.sina.com.cn/s/blog_63cf1c510101dshu.html
Using BeautifulSoup:
http://wiki.jikexueyuan.com/project/python-crawler-guide/beautiful-soup.html
https://www.crummy.com/software/BeautifulSoup/bs3/documentation.zh.html
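
As a quick reference, here is a minimal sketch of the BeautifulSoup calls the script below relies on; the HTML snippet is made up purely for illustration:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# A made-up snippet, just to show the API
html = '<div><a target="_blank" href="/a.html">Zhang San</a>' \
       '<a href="/b.html">skipped</a></div>'

bs = BeautifulSoup(html, 'html.parser')

# find_all can filter by tag name and attribute value at the same time
for a in bs.find_all('a', target='_blank'):
    print a.get_text()    # -> Zhang San
    print a['href']       # -> /a.html

Note that get_text() is usually cleaner than running a regex over str(tag), which is the trick the script below uses.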
Some notes on encoding issues:
https://www.cnblogs.com/nyist-xsk/p/7732279.html
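
The short version for Python 2: network responses are byte strings (str), and they need an explicit decode/encode at the boundaries. A minimal sketch, with a hard-coded byte string standing in for fetched data:

# -*- coding: utf-8 -*-
raw = '\xe5\xbc\xa0\xe4\xb8\x89'        # UTF-8 bytes for the name "Zhang San"
text = raw.decode('utf-8')              # str (bytes) -> unicode
fo = open('out.txt', 'w')
fo.write(text.encode('utf-8') + '\n')   # unicode -> bytes before writing
fo.close()

This explicit decode/encode dance is what the reload(sys)/setdefaultencoding hack in the script below papers over.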
My own script for scraping person names from http://www.resgain.net/xmdq.html:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import re
import urllib2
from bs4 import BeautifulSoup
import sys
reload(sys)                       # Python 2 only: re-expose setdefaultencoding
sys.setdefaultencoding('utf-8')   # make implicit str/unicode conversions use UTF-8


# Fetch the page content for a given URL
def gethtml(url):
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    html = response.read()
    return html

# Extract the names from one page of results
def getname(html):
    bs = BeautifulSoup(html)
    tmp = bs.find_all('a', target='_blank')
    #rel = u'([\u4E00-\u9FA5]+?)'      # alternative: match CJK characters directly
    rel = r'target=\"_blank\"\>(.+?)\<'
    names = re.findall(rel, str(tmp))
    return names

# Fetch one page and append every name on it to the output file
def save(url):
    html = gethtml(url)
    pname = getname(html)

    global fo
    for x in pname:
        #print x.decode("unicode_escape")
        fo.write(x.decode('unicode_escape') + '\n')

# Extract the per-category links from the saved main page
def getmain(html):
    bs = BeautifulSoup(html)
    # these class names appear in a browser's "view-source" rendering,
    # hence the locally saved t.html below
    tmp = bs.find_all('a', class_='html-attribute-value html-external-link')
    rel = r'href=\"(http://.[^w][^\"]+?)\"'
    tags = re.findall(rel, str(tmp))
    return tags


url="http://www.resgain.net/xmdq.html"
#html_main=gethtml(url)
html_main=open("t.html")
filename="name.txt"
fo=open(filename,"w")

all_tag = getmain(html_main)
#print all_tag

for i in all_tag:
    print i
    save(i)    # first page of this category

    # Pages 2-9: drop the link's last six characters, then re-append
    # '_<n>' plus the trailing '.html'
    i1 = i[:-6]
    i2 = i[-5:]
    for j in range(2, 10):
        url_child = i1 + '_' + str(j) + i2
        #print url_child
        save(url_child)
fo.close()
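
For what it's worth, urllib2 and the setdefaultencoding trick are Python 2 only. A rough Python 3 sketch of the same fetch-and-extract flow, untested against the live site (its markup may have changed since this was written):

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import urllib.request
from bs4 import BeautifulSoup

def gethtml(url):
    # read() returns bytes in Python 3; decode explicitly
    with urllib.request.urlopen(url) as response:
        return response.read().decode('utf-8', errors='replace')

def getname(html):
    # get_text() replaces the regex-over-str(tag) trick
    bs = BeautifulSoup(html, 'html.parser')
    return [a.get_text() for a in bs.find_all('a', target='_blank')]

if __name__ == '__main__':
    with open('name.txt', 'w', encoding='utf-8') as fo:
        for name in getname(gethtml('http://www.resgain.net/xmdq.html')):
            fo.write(name + '\n')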


Reprinted from blog.csdn.net/w_manhong/article/details/80018410