Scraping Dangdang product information with Python and Scrapy

Category: Other, 2018-05-29 10:22:37

Create the project: scrapy startproject dangdang

Then open the generated dangdang directory in PyCharm.

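Create the spider from the default basic template (run this inside the project directory): scrapy genspider -t basic dd dangdang.com

When the command finishes, the new spider file dd.py has been generated in the project's spiders directory.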

1. Write the item, the model for the information to scrape

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class DangdangItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    link = scrapy.Field()

2. Enable the item pipelines in settings.py (uncomment the ITEM_PIPELINES setting)
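
The original post shows neither the setting nor a pipeline body. As a minimal sketch, assuming the default class name DangdangPipeline that scrapy startproject generates in pipelines.py, the enabled setting and a simple pipeline might look like the following. The printing logic is illustrative only and is not the author's code.

```python
# --- settings.py (fragment) ---
# Uncomment the block that startproject generates. The number (0-1000)
# sets the order in which pipelines run, lower values first.
ITEM_PIPELINES = {
    "dangdang.pipelines.DangdangPipeline": 300,
}


# --- pipelines.py (illustrative sketch, not from the original post) ---
class DangdangPipeline:
    def process_item(self, item, spider):
        # The spider stores parallel lists of titles and links,
        # so pair them up one product at a time.
        for title, link in zip(item["title"], item["link"]):
            print(title.strip(), link)
        # Return the item so that any later pipeline still receives it.
        return item
```

Scrapy calls process_item once for every item the spider yields, so a pipeline is the usual place to clean or store the scraped data.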

3. Write the spider file

# -*- coding: utf-8 -*-
import scrapy
from dangdang.items import DangdangItem
from scrapy.http import Request

class DdSpider(scrapy.Spider):
    name = 'dd'
    allowed_domains = ['dangdang.com']
    start_urls = ['http://dangdang.com/']
    header = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER'}
    def parse(self, response):
        item = DangdangItem()
        item['title'] = response.xpath('//p[@name="title"]/a/text()').extract()
        item['link'] = response.xpath('//p[@name="title"]/a/@href').extract()
        print(item)
        yield item
        for i in range(1, 11):  # request category pages 1 to 10 (ten pages); Scrapy's duplicate filter drops repeats
            url = 'http://category.dangdang.com/pg' + str(i) + '-cp01.25.17.00.00.00.html'
            yield Request(url, callback=self.parse, headers=self.header)
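
The paging loop relies on the category URL pattern, and parse() stores all titles and links of a page in two parallel lists. The sketch below uses only the standard library and shows both ideas in isolation: building the page URLs and pairing each title with its link. The helper names page_urls and pair_products are hypothetical and do not appear in the original spider.

```python
# URL pattern taken from the spider above; only the page number varies.
BASE = "http://category.dangdang.com/pg{page}-cp01.25.17.00.00.00.html"


def page_urls(first, last):
    """Return the category page URLs for pages first..last (inclusive)."""
    return [BASE.format(page=n) for n in range(first, last + 1)]


def pair_products(titles, links):
    """Turn two parallel lists into one dict per product."""
    return [{"title": t.strip(), "link": l} for t, l in zip(titles, links)]


if __name__ == "__main__":
    for url in page_urls(1, 3):
        print(url)
    print(pair_products([" Book A ", "Book B"], ["http://a", "http://b"]))
```

Inside parse(), yielding one item per pair rather than one item per page would make each exported row correspond to a single product. That is a design choice and not something the original post does.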

Reposted from blog.csdn.net/qq_34288630/article/details/80492266