Filtering duplicate data in Scrapy

To make sure the crawled data contains no duplicates, you can implement a deduplication item pipeline.

 

Add a constructor that initializes a set, which is used to deduplicate items by title.

 

In the process_item method, first read the item's name field and check whether it is already in the set. If it is, the item is a duplicate: raise a DropItem exception so that the item is discarded. Otherwise, add the name to the set and return the item.
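
The two steps above can be sketched roughly as follows. This is a minimal sketch rather than the original author's code: the class name DuplicatesPipeline and the attribute seen_names are made up for illustration, and the item is assumed to support item["name"] lookups. The fallback DropItem class exists only so that the snippet also runs where Scrapy is not installed.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Fallback stand-in (assumption) so the sketch runs without Scrapy.
    class DropItem(Exception):
        pass


class DuplicatesPipeline:
    """Drops any item whose name field has already been seen."""

    def __init__(self):
        # Names of all items that have passed through this pipeline so far.
        self.seen_names = set()

    def process_item(self, item, spider):
        name = item["name"]
        if name in self.seen_names:
            # Duplicate: raising DropItem tells Scrapy to discard the item.
            raise DropItem(f"Duplicate item found: {name!r}")
        # First occurrence: remember the name and pass the item along.
        self.seen_names.add(name)
        return item
```

For the pipeline to take effect it also has to be enabled in the project's ITEM_PIPELINES setting. Note that the set lives in memory, so it only catches duplicates within a single crawl run.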


Source: www.cnblogs.com/tulintao/p/11700374.html