Downloading images to local disk with Scrapy's ImagesPipeline and extracting the local save path

  1. Download images to local disk with Scrapy's built-in ImagesPipeline
    1. Uncomment ITEM_PIPELINES in settings.py and add this entry:
      'scrapy.pipelines.images.ImagesPipeline': 5,
      # The number is the execution priority; pipelines run in ascending numeric order
    2. Add the following to settings.py:
      import os

      IMAGES_URLS_FIELD = "image_url"  # image_url is the item field (defined in items.py) holding the image URLs to download; its value must be a list of URLs
      # Configure the local directory where images are saved
      project_dir = os.path.abspath(os.path.dirname(__file__))  # absolute path of the crawler project directory (where settings.py lives)
      IMAGES_STORE = os.path.join(project_dir, 'images')  # build the image storage path
      Many other settings are available for special needs (see the ImagesPipeline source for details), for example:

      IMAGES_MIN_HEIGHT = 100  # minimum height of images to download

      IMAGES_MIN_WIDTH = 100  # minimum width of images to download
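
As a rough illustration of the two behaviors described above (ascending priority order and the minimum-size settings), here is a minimal sketch in plain Python. It does not call Scrapy. The helper names `pipeline_order` and `passes_size_filter` are hypothetical, and `myproject.pipelines.ArticlePipeline` is a made-up second pipeline included only to show the ordering.

```python
# Hypothetical ITEM_PIPELINES table; the first entry is invented for illustration.
ITEM_PIPELINES = {
    "myproject.pipelines.ArticlePipeline": 300,
    "scrapy.pipelines.images.ImagesPipeline": 5,
}

IMAGES_MIN_HEIGHT = 100
IMAGES_MIN_WIDTH = 100


def pipeline_order(pipelines):
    """Return pipeline paths sorted by ascending priority number."""
    return sorted(pipelines, key=pipelines.get)


def passes_size_filter(width, height,
                       min_width=IMAGES_MIN_WIDTH,
                       min_height=IMAGES_MIN_HEIGHT):
    """Keep an image only if it meets both the minimum width and height."""
    return width >= min_width and height >= min_height


print(pipeline_order(ITEM_PIPELINES))   # ImagesPipeline (5) comes before 300
print(passes_size_filter(640, 480))     # True
print(passes_size_filter(640, 80))      # False: height is below the minimum
```

With the values above, an image that is 640x80 would be skipped, because it fails the height minimum even though the width is fine.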

       

      1. You may see this error:

        ModuleNotFoundError: No module named 'PIL'
        1. If so, install the Pillow library with pip install pillow
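
If it helps to check for the dependency before starting a crawl, a small sketch like the following could be used. It relies only on the standard library's `importlib.util.find_spec`; the function name `pillow_available` is a hypothetical helper, not part of Scrapy.

```python
import importlib.util


def pillow_available():
    """Return True if the PIL package (provided by Pillow) can be found."""
    return importlib.util.find_spec("PIL") is not None


if not pillow_available():
    print("Pillow is missing; install it with: pip install pillow")
```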

  2. Getting the local path where each image is stored
    1. To get the local path of a downloaded image, subclass ImagesPipeline, override item_completed, and register the subclass in ITEM_PIPELINES in settings.py
      # Import the base class before subclassing it
      from scrapy.pipelines.images import ImagesPipeline

      class ArticleImagePipeline(ImagesPipeline):
          # Override item_completed to read the download results
          def item_completed(self, results, item, info):
              # results is a list of (ok, value) tuples; set a breakpoint here to inspect the image path
              for ok, value in results:
                  if ok:  # on failure, value holds the error rather than a dict
                      image_file_path = value['path']  # path relative to IMAGES_STORE
                      item['front_image_path'] = image_file_path  # save the path on the item
              return item
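
To make the shape of `results` in item_completed more concrete, here is a minimal sketch that runs without Scrapy. The sample data is invented: each entry is a `(success, value)` tuple, where `value` is a dict with a `path` key on success and an error object on failure (a plain `Exception` stands in for it here). The helper name `extract_image_paths` and the package name `myproject` are assumptions for illustration only.

```python
def extract_image_paths(results):
    """Collect the stored path of every successfully downloaded image."""
    return [value["path"] for ok, value in results if ok]


# Fake results in the shape item_completed receives; all values are invented.
sample_results = [
    (True, {"url": "https://example.com/a.jpg",
            "path": "full/0a1b.jpg",
            "checksum": "abc"}),
    (False, Exception("download failed")),  # stands in for the failure object
]

print(extract_image_paths(sample_results))  # ['full/0a1b.jpg']

# Registering the subclass in settings.py; "myproject" is a placeholder
# for the real project package name.
ITEM_PIPELINES = {
    "myproject.pipelines.ArticleImagePipeline": 1,
}
```

Filtering on the success flag avoids indexing into a failed entry, which has no `path` key.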

       


Origin www.cnblogs.com/tulintao/p/11588591.html