Scraping data with Scrapy in Python and saving it to MySQL, plus some problems I ran into

Copyright notice: please credit the source when reposting https://blog.csdn.net/cotyyang/article/details/81713237
First, open cmd as administrator and run pip install --index-url https://pypi.mirrors.ustc.edu.cn/simple/ pymysql to install pymysql.

Then move on to the next steps:

This is the items.py of my project:

import scrapy

class TaobaoItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    #pass
    price=scrapy.Field()
    goodsname=scrapy.Field()
    goodsurl=scrapy.Field()
    shopname=scrapy.Field()
    shopurl=scrapy.Field()
    monthlysales=scrapy.Field()
    totalcomment=scrapy.Field()

Once the table has been created in MySQL, add the following to settings.py:

MYSQL_HOST = 'localhost'
MYSQL_DBNAME = 'maorong'         # database name, change as needed
MYSQL_USER = 'root'             # database user, change as needed
MYSQL_PASSWD = '111000'         # database password, change as needed

MYSQL_PORT = 3306

ITEM_PIPELINES = {
    'taobao.pipelines.TaobaoPipeline': 300,
    #'taobao.pipelines.JsonWithEncodingPipeline': 300,  # save to a file instead
}

Then set up pipelines.py as follows:

import pymysql
# PyMySQL 1.0 removed pymysql.escape_string; the function is still available here
from pymysql.converters import escape_string

class TaobaoPipeline(object):

    maorong_name = 'maorong'
    # INSERT template; the placeholders are filled with escaped values in process_item
    maorong_insert="""
                        insert into item(price,goodsname,goodsurl,shopname,shopurl,monthlysales,totalcomment)
                        values('{price}','{goodsname}','{goodsurl}','{shopname}','{shopurl}','{monthlysales}','{totalcomment}')
                    """

    def __init__(self,settings):
        self.settings=settings

    def process_item(self, item, spider):
        # Use [0] where the item value is a list, and the bare value where it is a str
        sqltext = self.maorong_insert.format(
            price=escape_string(item['price'][0]),
            goodsname=escape_string(item['goodsname'][0]),
            goodsurl=escape_string(item['goodsurl']),
            shopname=escape_string(item['shopname'][0]),
            shopurl=escape_string(item['shopurl'][0]),
            monthlysales=escape_string(item['monthlysales'][0]),
            totalcomment=escape_string(item['totalcomment'][0]),
        )
        self.cursor.execute(sqltext)
        return item

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings)

    def open_spider(self, spider):
        # Connect to the database
        self.connect = pymysql.connect(
            host=self.settings.get('MYSQL_HOST'),
            port=self.settings.get('MYSQL_PORT'),
            db=self.settings.get('MYSQL_DBNAME'),
            user=self.settings.get('MYSQL_USER'),
            passwd=self.settings.get('MYSQL_PASSWD'),
            charset='utf8',
            use_unicode=True)

        # All inserts, deletes, queries and updates go through this cursor
        self.cursor = self.connect.cursor()
        self.connect.autocommit(True)

    def close_spider(self, spider):
        self.cursor.close()
        self.connect.close()

With that, everything works. Below are some problems I ran into along the way.

1. pymysql.err.ProgrammingError: 1064 (a Python string-escaping problem)
pymysql.err.ProgrammingError: (1064, 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near

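Escape the values with pymysql.escape_string(), for example price=pymysql.escape_string(item['price'][0]). (PyMySQL 1.0 and later no longer export it at the top level; use pymysql.converters.escape_string there.) Note the [0]: if the value stored under a key of your item is a list, you need the [0]; if it is a str, you do not.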
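
As an alternative to escaping by hand, the driver can do the quoting itself when the values are passed as query parameters. The following is a minimal sketch under that assumption, not the pipeline shown earlier: first_value and insert_item are hypothetical helper names, the sample item values are made up, and FakeCursor only stands in for a real PyMySQL cursor so that the example runs without a database.

```python
# Parameterized INSERT: PyMySQL uses %s as the placeholder for every value
# and escapes the arguments itself when execute() receives them separately.
INSERT_SQL = (
    "insert into item(price,goodsname,goodsurl,shopname,shopurl,"
    "monthlysales,totalcomment) values(%s,%s,%s,%s,%s,%s,%s)"
)

FIELDS = ("price", "goodsname", "goodsurl", "shopname", "shopurl",
          "monthlysales", "totalcomment")


def first_value(value):
    """Return the first element of a list/tuple ('' if it is empty),
    or the value itself when it is not a list/tuple."""
    if isinstance(value, (list, tuple)):
        return value[0] if value else ""
    return value


def insert_item(cursor, item):
    """Run the INSERT through a cursor that accepts %s-style parameters."""
    args = tuple(first_value(item[name]) for name in FIELDS)
    cursor.execute(INSERT_SQL, args)
    return args


class FakeCursor:
    """Hypothetical stand-in that records calls instead of talking to MySQL."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, args=None):
        self.calls.append((sql, args))


# Made-up sample item: some values are lists, one is a plain str.
item = {
    "price": ["19.90"],
    "goodsname": ["it's a coat"],          # the quote needs no manual escaping
    "goodsurl": "https://example.com/item/1",
    "shopname": ["demo shop"],
    "shopurl": ["https://example.com/shop/1"],
    "monthlysales": ["120"],
    "totalcomment": ["35"],
}

cursor = FakeCursor()
args = insert_item(cursor, item)
print(args)
```

With a real connection, the cursor created in open_spider could be passed to insert_item in place of FakeCursor. Because first_value accepts both lists and strings, the [0] question no longer has to be decided field by field.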

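2. Python: error when inserting into the database: Incorrect string value: '\xE9\x9C

This error usually appears when the stored data contains Chinese text. The cause is that the database was not set to utf8 when it was created. Note that every column in the table must be set to utf8 as well.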
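
To make the character-set requirement concrete, here is a rough sketch that builds the CREATE statements with the character set spelled out at database, table and column level. The database name maorong, the table name item and the column names come from the code in this post; the helper names and the VARCHAR(255) type for every column are assumptions for illustration only, so adjust them to your real schema.

```python
# Column names taken from the insert statement in the pipeline.
COLUMNS = ("price", "goodsname", "goodsurl", "shopname", "shopurl",
           "monthlysales", "totalcomment")


def build_create_database(dbname="maorong", charset="utf8"):
    """Return a CREATE DATABASE statement with an explicit default charset."""
    return "CREATE DATABASE IF NOT EXISTS `{0}` DEFAULT CHARACTER SET {1}".format(
        dbname, charset)


def build_create_table(table="item", charset="utf8"):
    """Return a CREATE TABLE statement with the charset set on every column.

    VARCHAR(255) is an assumed type, not the author's actual schema.
    """
    column_defs = ",\n".join(
        "    `{0}` VARCHAR(255) CHARACTER SET {1}".format(name, charset)
        for name in COLUMNS
    )
    return "CREATE TABLE IF NOT EXISTS `{0}` (\n{1}\n) DEFAULT CHARSET={2}".format(
        table, column_defs, charset)


print(build_create_database())
print(build_create_table())
```

The generated statements can be pasted into a MySQL client or passed to cursor.execute(). Passing charset="utf8mb4" produces the variant needed for the 4-byte characters discussed in problem 3.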

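3. Python reports an error when connecting to SQL: 1366, "Incorrect string value: '\xF0\x9F\x98\x81'

The cause is that MySQL's utf8 character set stores at most 3 bytes per character, so it cannot handle characters that take 4 bytes in UTF-8 and raises this error; this is most likely the root of the problem. Emoji such as 😁 (the \xF0\x9F\x98\x81 in the message) are 4-byte characters of this kind. To fix it, change the character set of the affected columns to utf8mb4, and change the connection collation to utf8mb4_general_ci as well. In the pipeline above, this also means passing charset='utf8mb4' to pymysql.connect.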
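
The 4-byte claim is easy to check in Python itself. This is a small sketch, with needs_utf8mb4 as a hypothetical helper name: it decodes the bytes from the error message and tests whether a string contains any character above U+FFFF, which is exactly the range that UTF-8 encodes in 4 bytes.

```python
def needs_utf8mb4(text):
    """Return True if text contains a character that UTF-8 encodes in 4 bytes."""
    return any(ord(ch) > 0xFFFF for ch in text)


# The bytes quoted in the 1366 error message decode to a single emoji.
emoji = b"\xF0\x9F\x98\x81".decode("utf-8")

print(len(emoji.encode("utf-8")))   # 4
print(needs_utf8mb4(emoji))         # True
print(needs_utf8mb4("中文"))         # False: Chinese characters here take 3 bytes each
```

A check like this could be used to log which scraped strings would be rejected by a column that is still utf8.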
