Tieba_Spider (web crawler) (py2.xx) - 代码天地

Other 2018-07-29 14:09:48 Views: 0

# urllib.request sends the HTTP requests; urllib.parse builds the query string.
import time
import urllib.parse
import urllib.request

def loadPage(url, filename):
    """Download the page at url and return its body as raw bytes."""
    print("Downloading " + filename)
    # Send a browser-like User-Agent; the default urllib one may be rejected.
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36"}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return response.read()

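def writePage(html, filename):
    """Save the downloaded bytes to filename."""
    print("Writing the file " + filename)
    # response.read() returns bytes, so the file is opened in binary mode.
    with open(filename, "wb") as myfile:
        myfile.write(html)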

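def tiebaSpider(url, startPage, endPage):
    """Download pages startPage through endPage (inclusive) of a Tieba forum."""
    print("Starting the spider")

    # range() excludes its stop value, so add 1 to include endPage.
    for page in range(startPage, endPage + 1):
        # pn is the offset of the first thread on the page, in steps of 50.
        pn = (page - 1) * 50
        fullurl = url + "&pn=" + str(pn)
        print(fullurl)

        filename = "di" + str(page) + "ye.html"
        html = loadPage(fullurl, filename)

        writePage(html, filename)

        # Pause between requests to avoid hammering the server.
        time.sleep(2)

    print("Thanks for using")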

if __name__ == "__main__":

    url = "http://tieba.baidu.com/f?"
    kw = input("Enter the name of the tieba to crawl: ")
    startpage = int(input("Enter the start page: "))
    endpage = int(input("Enter the end page: "))

    # urlencode percent-encodes the keyword, which matters for non-ASCII names.
    key = urllib.parse.urlencode({"kw": kw})
    fullurl = url + key
    print(fullurl)
    tiebaSpider(fullurl, startpage, endpage)
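
The page URL used by the script is built from two query parameters: `kw` (the forum name) and `pn` (an offset that advances by 50 per page). A minimal sketch of that arithmetic follows, so it can be checked without touching the network. The helper name `build_page_url` and its `page_size` parameter are hypothetical and not part of the original script.

```python
from urllib.parse import urlencode


def build_page_url(base_url, keyword, page, page_size=50):
    """Return the list URL for a 1-based page number.

    build_page_url and page_size are illustrative names (assumptions),
    mirroring pn = (page - 1) * 50 from the script.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    # urlencode percent-encodes the keyword and joins the pairs with "&".
    query = urlencode({"kw": keyword, "pn": (page - 1) * page_size})
    return base_url + query


if __name__ == "__main__":
    # Print the URLs for the first three pages of a sample keyword.
    for page in range(1, 4):
        print(build_page_url("http://tieba.baidu.com/f?", "python", page))
```

Putting both parameters through `urlencode` in one call avoids hand-written `&` and `=` separators, which is where the offset parameter is easiest to get wrong.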

Reposted from blog.csdn.net/weixin_42694291/article/details/81166255
