Using Scrapy to crawl images and save them locally

A previous blog post gave a brief introduction to Scrapy's crawling data flow; let us now continue learning Scrapy.

 

 

Goal:

Crawl the car titles, prices, and images, store the data in a database, and save the images locally.

 

Without further ado, let's get to the result we want to achieve.

 

As before, we use the Scrapy framework to write our project:

 

1. Start by creating a crawler project from the command line (see the earlier blog post for details), structured as shown in the figure.

 

 

2. First configure your settings.py. Note the image pipeline here (enable Scrapy's built-in ImagesPipeline in the pipeline settings to download the images),

as well as the image storage path (create a folder for the images inside the project).

There are several ways to store the images; I only cover one here, and you may take a different approach.
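As a minimal sketch, the relevant settings.py fragment might look like the following. The project name `newcar`, the pipeline class name `NewcarPipeline`, and the `./images` folder are illustrative assumptions, not taken from the original post:

```python
# settings.py (fragment) -- enable Scrapy's built-in image pipeline
# and point it at a local folder; names here are illustrative.

ROBOTSTXT_OBEY = False

ITEM_PIPELINES = {
    # Scrapy's built-in pipeline; it downloads every URL found
    # in item['image_urls'] and records the results in item['images']
    'scrapy.pipelines.images.ImagesPipeline': 1,
    # our own storage pipeline (MySQL or MongoDB), shown in step 6
    'newcar.pipelines.NewcarPipeline': 300,
}

# folder created inside the project to hold the downloaded images
IMAGES_STORE = './images'
```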

 

 

3. Then open your spider file (i.e. car.py) and start writing the code for the data you want to crawl. Note: replace start_urls [] with the URL we want to crawl, then extract the images with XPath
( you have to write this code yourself here; do not just copy it ).
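As a rough sketch only: the start URL, the item class name `NewcarItem`, and every XPath expression below are placeholders that you must adapt to the real target page, as the post says:

```python
# car.py -- spider sketch; URL and selectors are placeholders.
import scrapy

from newcar.items import NewcarItem  # assumed project/item names


class CarSpider(scrapy.Spider):
    name = 'car'
    start_urls = ['https://example.com/cars']  # replace with the real URL

    def parse(self, response):
        # one item per car entry on the page; selectors are illustrative
        for car in response.xpath('//div[@class="car"]'):
            item = NewcarItem()
            item['name'] = car.xpath('.//h2/text()').get()
            item['content'] = car.xpath('.//p/text()').get()
            item['price'] = car.xpath('.//span[@class="price"]/text()').get()
            # ImagesPipeline downloads every URL placed in this list
            item['image_urls'] = car.xpath('.//img/@src').getall()
            yield item
```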
 

 

4. Keep the fields in items.py consistent with what the spider crawls.

 

 

5. Start the spider with the command scrapy crawl car ; at the command line you will see the crawled images being saved locally.

6. Finally, storage: you need to write the data into a database. Both MySQL and MongoDB work here.

 

MySQL: create the database, the table, and its fields in advance.

import pymysql

class NewcarPipeline(object):
    # change the host, user, password and database to your own
    def __init__(self):
        self.conn = pymysql.connect(host='127.0.0.1', user='root',
                                    password='123456', db='zou')
        # create a cursor object
        self.cursor = self.conn.cursor()

    # insert each crawled item
    def process_item(self, item, spider):
        name = item['name']
        content = item['content']
        price = item['price']
        # insert into your own table; the columns must match your fields
        sql = "insert into zou(name,content,price) values(%s,%s,%s)"
        self.cursor.execute(sql, (name, content, price))
        self.conn.commit()
        return item

    # close the connection when the spider finishes
    def close_spider(self, spider):
        self.conn.close()

 

MongoDB: no need to create the database or collection in advance.

from pymongo import MongoClient

class NewcarPipeline(object):
    def open_spider(self, spider):
        # connect to the ip and port
        self.con = MongoClient(host='127.0.0.1', port=27017)
        # database
        db = self.con['p1']
        # authenticate with your own credentials
        db.authenticate(name='wumeng', password='123456', source='admin')
        # collection named after the spider
        self.coll = db[spider.name]

    def process_item(self, item, spider):
        # insert the item as a document
        self.coll.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        # close the connection
        self.con.close()

 

7. Run the start command scrapy crawl car again and you can see the data in the database.

 

 

That completes the crawler project. This is just a simple crawler, offered for reference; if you run into other problems, you can refer to my blog. Stay tuned!

 


Origin www.cnblogs.com/wudameng/p/11094772.html