E-commerce platform product data crawler analysis (integration testing is available during the test phase)

1. Brief description

The crawler captures product data from more than 50 well-known platforms worldwide, including Jingdong (JD), Taobao, Tmall, Taote, Pinduoduo, Alibaba, 1688, Douyin, Suning, Amazon China, Lazada, and AliExpress. It stores the data in a database and analyzes it.

2. Scraped data dictionary

the_basic_info = {
    'search_keyword': self.keyword,               # keyword used for the search
    'last_crawling_timestamp': datetime.now(),    # current crawling time
    'platform': 'JD',                             # crawling platform
    'product_name': product_name,                 # product name
    'seller_name': seller_name,                   # seller (business) name
    'sku_id': _data_pid,                          # product ID
    'default_price': float(final_price),          # final price
    'final_price': 0,
    'item_url': _http,                            # product web address
    'comments_ave_score': float(score_avg),       # product rating
    'comments_count': comment_count,              # number of product reviews
    'images': img,                                # product image address
    'current_stock': location_list,               # product storage address
    'search_rank': rank,                          # ranking under the current search index
    'search_order': order,                        # current index (by sales volume, price, popularity, etc.)
    'seller_url': seller_url,                     # seller web page address
    'comments_list': comment_list,                # individual comments; crawling 100 comments is supported
}
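Since the description says the captured data is stored in a database, a record shaped like the dictionary above could be persisted roughly as follows. This is a minimal sketch using SQLite from the standard library, not the author's storage layer: the table name `products`, the chosen subset of columns, the `(platform, sku_id)` primary key, and the helper `save_record` are all assumptions made for illustration.

```python
import json
import sqlite3
from datetime import datetime

# Hypothetical table layout; the column names simply mirror the dictionary keys.
SCHEMA = """
CREATE TABLE IF NOT EXISTS products (
    platform TEXT NOT NULL,
    sku_id TEXT NOT NULL,
    search_keyword TEXT,
    last_crawling_timestamp TEXT,
    product_name TEXT,
    seller_name TEXT,
    default_price REAL,
    comments_count INTEGER,
    comments_ave_score REAL,
    images TEXT,
    comments_list TEXT,
    PRIMARY KEY (platform, sku_id)
)
"""

COLUMNS = (
    "platform", "sku_id", "search_keyword", "last_crawling_timestamp",
    "product_name", "seller_name", "default_price", "comments_count",
    "comments_ave_score", "images", "comments_list",
)
# List-valued fields are stored as JSON text.
JSON_FIELDS = ("images", "comments_list")


def save_record(conn, record):
    """Insert one crawled record, replacing an older row for the same product."""
    row = {}
    for col in COLUMNS:
        value = record.get(col)
        if col in JSON_FIELDS:
            value = json.dumps(value, ensure_ascii=False)
        elif isinstance(value, datetime):
            value = value.isoformat(sep=" ")
        elif col == "sku_id" and value is not None:
            value = str(value)
        row[col] = value
    sql = "INSERT OR REPLACE INTO products ({}) VALUES ({})".format(
        ", ".join(COLUMNS), ", ".join(":" + c for c in COLUMNS))
    conn.execute(sql, row)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    save_record(conn, {"platform": "JD", "sku_id": 4824733,
                       "default_price": 6599.0, "images": []})
    print(conn.execute("SELECT sku_id, default_price FROM products").fetchall())
```

Keying on platform plus product ID means that crawling the same product again overwrites the earlier row; if a price history is wanted, the timestamp would have to become part of the key instead.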

An example record:

product_name Dell Inspiron 15PR-6748B 15.6-inch Gaming Laptop (i7-7700HQ 8G 128GSSD+1T GTX1050 4G IPS) black
last_crawling_timestamp 2017-12-28 20:20:09.684290
seller_name Dell JD self-operated flagship store
sku_id 4824733
default_price 6599.0
item_url 【DELL Inspiron 15PR-6748B】Dell DELL Inspiron 15.6-inch Gaming Laptop (i7-7700HQ 8G 128GSSD+1T GTX1050 4G Independent Display IPS Fast Heat Dissipation) Black【Quotation Price Evaluation】-Jingdong
comments_count 72000
comments_ave_score 5.0
images ['http://img13.360buyimg.com/n7/jfs/t12472/179/736139380/319777/f266f597/5a128bf6N079a87ba.jpg']
search_rank 1
seller_url Dell's self-operated official flagship of JD.com Shop - JD.com
comments_list [{'content_score': 5, 'content_time': '2017-12-05 18:54:31', 'content_title': None, 'content': 'It has been used for nearly a month, let me tell you about the experience. I bought it in the early morning of November 9th, and it arrived in the afternoon of the same day. The packaging is streamlined, and there is a Dell box in the Jingdong bag. The computer has a good appearance, A-side skin type, and the rear cooling vent is very handsome. The computer is not light and thin, because it is a bit thick because of the good workmanship, but this is a bit like a game book. There are also Shadow Elf 2pro and R720 in the dormitory. Compared with the 2pro keyboard, it is quite flexible to type, but the backlight is not as bright as the other two. Personally, I think the R720 has the best keyboard touch, and the keys are bigger. Let’s talk about the incomparable thing between R720 and 2PRO and the game box, that is the subwoofer, the sound quality is very good, the three roommates all praised and envied the sound quality of the game box. So my computer also became the stereo in our dormitory... The screen is ips45 color gamut. For those who have been using TN screens before, I feel that this computer screen is quite good. Let’s talk about performance. In fact, performance is the last thing to mention. The configuration is all there. Master Lu has a running score of nearly 18,000. 1050ti can handle most large-scale stand-alone games, and the picture quality in the game can run smoothly. When running a large game, the fan will run at full capacity, and the sound is a bit loud (good heat dissipation and low noise cannot be achieved at the same time), I pay more attention to heat dissipation, so it doesn't matter if the fan is louder, it sounds quite exciting. Solid state (not nvme protocol) and mechanical hard drives are relatively poor, and it takes about ten seconds to boot. Let's sum it up. Advantages: 1. High appearance 2. Good heat dissipation 3. Excellent workmanship 4. Configure subwoofer Disadvantages: 1. Low-end ips screen 2. Slightly thick and heavy 3. Hard disk is poor'}]
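Once records like the example above have been collected, the analysis step mentioned in the description can start with a simple per-seller summary. The following is only an illustrative sketch: the function `summarize_by_seller` and the choice of metrics are assumptions, and it relies solely on the `seller_name`, `default_price`, and `comments_count` keys from the dictionary in section 2.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_seller(records):
    """Group crawled records by seller and compute a few basic metrics."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["seller_name"]].append(rec)

    summary = {}
    for seller, recs in grouped.items():
        summary[seller] = {
            "products": len(recs),
            "avg_price": mean([r["default_price"] for r in recs]),
            "total_comments": sum(r["comments_count"] for r in recs),
        }
    return summary


if __name__ == "__main__":
    # Made-up sample values, used only to show the call.
    sample = [
        {"seller_name": "Store A", "default_price": 100.0, "comments_count": 10},
        {"seller_name": "Store A", "default_price": 200.0, "comments_count": 5},
    ]
    print(summarize_by_seller(sample))
```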

3. Test

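if __name__ == "__main__":
    # JDMonitoringEngine and logger are assumed to be defined earlier in this module.
    j = JDMonitoringEngine()
    # Search keyword, number of result pages, and sort order(s).
    j.set_searching_url(_keyword="dell", _page_limit=1, _order=["sales"])
    url_list = j.url_list
    for _index, url_dict in enumerate(url_list):
        logger.info("Sending {0}/{1} url dict to basic info extraction".format(
            _index + 1, len(url_list)))
        # Extract the basic info for every entry in this batch.
        results = [j.get_basic_info(x) for x in url_dict]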

Change the _keyword, _page_limit, and _order arguments in the main block of jd_monitoring_engine
to the values you want to test. The three parameters are the search keyword, the number of search pages, and the search index (sort order).

4. API wrapper code

1. Request method: HTTPS, GET or POST

2. Public parameters:

name | type | required | description
key | String | yes | Call key (must be appended to the URL in GET mode)
secret | String | yes | Call secret (copy vxin:Taobaoapi2014)
api_name | String | yes | API interface name, included in the request address [item_search, item_get, item_search_shop, etc.]
cache | String | no | [yes, no] Default is yes: cached data is returned, which is faster
result_type | String | no | [json, jsonu, xml, serialize, var_export] Format of the returned data; default is json. With jsonu the Chinese text in the output is directly readable
lang | String | no | [cn, en, ru] Translation language; default is cn (Simplified Chinese)
version | String | no | API version
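The public parameters above can be combined into a request URL as sketched below. This is an illustration under assumptions rather than an official SDK: the helper name `build_request_url` is made up, the host and the `/<platform>/<api_name>/` path layout are inferred from the sample URL in this article, and the allowed values are copied from the parameter list.

```python
from urllib.parse import urlencode

# Host inferred from the sample request URL in this article.
BASE_URL = "https://api-vxin.Taobaoapi2014.cn"

# Allowed values for the optional public parameters, as listed above.
ALLOWED = {
    "cache": ("yes", "no"),
    "result_type": ("json", "jsonu", "xml", "serialize", "var_export"),
    "lang": ("cn", "en", "ru"),
}


def build_request_url(platform, api_name, key, secret, **params):
    """Build a GET URL with key and secret appended to the query string."""
    for name, allowed in ALLOWED.items():
        if name in params and params[name] not in allowed:
            raise ValueError("{} must be one of {}".format(name, allowed))
    query = {"key": key, "secret": secret}
    query.update(params)
    # urlencode percent-encodes non-ASCII values such as Chinese keywords.
    return "{}/{}/{}/?{}".format(BASE_URL, platform, api_name, urlencode(query))


if __name__ == "__main__":
    print(build_request_url("jd", "item_search", "<your own apiKey>",
                            "<your own apiSecret>", q="dell", lang="en"))
```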

3. Request parameters:

Settings: q=&start_price=0&end_price=0&page=1&cat=0&discount_only=&sort=&seller_info=no&nick=&seller_info=&nick=&ppath=&imgid=&filter=

Parameter description:
q: search keyword (a URL is also supported)
cat: category ID
start_price: start price
end_price: end price
sort: sort order [bid,_bid,_sale,_review,_new]
  (bid: total price, sale: sales volume, review: number of reviews, new: new products; add the _ prefix to sort from largest to smallest)
page: page number
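The request parameters described above could be assembled and checked before sending, for example as follows. This is a hedged sketch: `make_sort` and `search_params` are hypothetical helpers, and it assumes that each of the four sort keys may be used with or without the `_` prefix, which the list above only partly shows.

```python
# Sort keys named in the parameter description; the prefix rule is an assumption.
SORT_KEYS = ("bid", "sale", "review", "new")


def make_sort(key, descending=False):
    """Return the sort value, adding the _ prefix for largest-to-smallest order."""
    if key not in SORT_KEYS:
        raise ValueError("sort key must be one of {}".format(SORT_KEYS))
    return "_" + key if descending else key


def search_params(q, page=1, cat=0, start_price=0, end_price=0, sort=""):
    """Build the item_search parameter dictionary with basic validation."""
    if page < 1:
        raise ValueError("page must be 1 or greater")
    if sort:
        base = sort[1:] if sort.startswith("_") else sort
        if base not in SORT_KEYS:
            raise ValueError("unknown sort value: {}".format(sort))
    return {
        "q": q,
        "start_price": start_price,
        "end_price": end_price,
        "page": page,
        "cat": cat,
        "sort": sort,
    }


if __name__ == "__main__":
    # Search for "dell", highest sales volume first.
    print(search_params("dell", sort=make_sort("sale", descending=True)))
```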

4. Request code samples; highly concurrent requests are supported (CURL, PHP, PHPsdk, Java, C#, Python...)

# coding:utf-8
"""
Compatible with python2.x and python3.x
requirement: pip install requests
"""
from __future__ import print_function
import requests
# Example request URL; the default request parameters are already URL-encoded
# (q=女装 searches for "women's clothing")
url = "https://api-vxin.Taobaoapi2014.cn/jd/item_search/?key=<your own apiKey>&secret=<your own apiSecret>&q=女装&start_price=0&end_price=0&page=1&cat=0&discount_only=&sort=&seller_info=no&nick=&seller_info=&nick=&ppath=&imgid=&filter="
headers = {
    "Accept-Encoding": "gzip",
    "Connection": "close"
}
if __name__ == "__main__":
    r = requests.get(url, headers=headers)
    json_obj = r.json()
    print(json_obj)
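Because the interface is described as supporting concurrent requests, several request URLs could be fetched in parallel with a thread pool. The sketch below is one possible approach for Python 3, not part of the provider's sample code: `fetch_json` and `fetch_all` are hypothetical helpers, the timeout and worker count are arbitrary, and any rate limits on the real service are not known from this article.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_json(url, timeout=10):
    """Fetch one URL with the same headers as the sample above."""
    # Imported here so fetch_all can also be used with a stub fetcher.
    import requests
    headers = {"Accept-Encoding": "gzip", "Connection": "close"}
    r = requests.get(url, headers=headers, timeout=timeout)
    r.raise_for_status()
    return r.json()


def fetch_all(urls, fetch=fetch_json, max_workers=8):
    """Fetch all URLs concurrently and map each URL to its result or exception."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                # Keep the failure so that one bad request does not hide the others.
                results[url] = exc
    return results
```

Passing the fetch function as an argument keeps the pool logic testable without network access; a stub such as `lambda url: {"url": url}` is enough to try it out.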

5. Due to the article's character limit, the response example is not shown for now.

Origin blog.csdn.net/tbprice/article/details/130217449