Python automatically crawls Taobao product data into an Excel file!

 

Hello everyone, I'm Yatou Shrine!

In the e-commerce era, product data from Taobao, JD.com, and Tmall is a great help to store operations, so collecting the data of the corresponding store products can bring real value. How, then, do we obtain that data?

In the previous article, we covered packaging Python into an exe executable: "The most detailed tutorial in history on packaging Python into an exe file": https://blog.csdn.net/xtreallydance/article/details/112643658. This time we will walk through the Taobao crawler code, shown below:

 
 

from selenium import webdriver
import time
import csv
import re

Import the browser-automation library selenium, the time library for pauses, the csv library for saving the results in CSV format, and the re library for regular-expression matching.
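A quick sanity check before running: the find_element_by_* calls used throughout this article belong to the Selenium 3.x API and were removed in Selenium 4, so it is worth confirming which version you have installed (a hedged note on my part, not part of the original script):

import selenium
# This article's find_element_by_* API targets Selenium 3.x
print(selenium.__version__)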

 
 

if __name__ == '__main__':
    keyword = input("Please enter the keyword of the product you want:")
    path = r'L:\webdriver\chromedriver.exe'
    driver = webdriver.Chrome(path)
    driver.get('https://www.taobao.com/')
    main()

Enter the query keyword, for example "ins trend t-shirt". path is where the chromedriver.exe driver is stored. We instantiate a driver object, use the get method to open the Taobao homepage, and then call the main() method.
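If you are on Selenium 4.x, passing the driver path positionally no longer works; a minimal equivalent sketch, assuming selenium>=4 is installed and the same hypothetical chromedriver location:

# Selenium 4 setup sketch (assumption: selenium>=4 is installed)
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

service = Service(r'L:\webdriver\chromedriver.exe')
driver = webdriver.Chrome(service=service)
driver.get('https://www.taobao.com/')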

 

Please remember, you must scan the QR code to log in manually! Otherwise Taobao's anti-crawler will detect you, as the picture shows!

  • The results of running the program are as follows:

 

 

 
 

def main():
    print('Crawling the first page of data')
    page = search_product(keyword)
    get_product()
    page_num = 1
    # q stays the same; s is the result offset: 0, 44, 88, ... (page number * 44)
    while page_num != page:
        print('-*-' * 50)
        print('Crawling the data of page {}'.format(page_num + 1))
        print('*-*' * 50)
        driver.get('https://s.taobao.com/search?q={}&s={}'.format(keyword, page_num * 44))
        # Implicit browser wait
        driver.implicitly_wait(2)
        # Maximize the browser window
        driver.maximize_window()
        get_product()
        page_num += 1

In the main() method, we first use the search_product and get_product functions to crawl one page of data, and then use a while loop to crawl the remaining pages. Let's start with crawling a single page.
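As the comment in main() notes, the q parameter stays fixed while s advances by 44 results per page. A tiny illustrative sketch of the URLs the loop requests (the keyword here is a hypothetical example, not from the original script):

keyword = 'ins trend t-shirt'  # hypothetical example keyword
for page_num in range(1, 3):
    # s is the result offset: page 2 -> 44, page 3 -> 88, ...
    print('https://s.taobao.com/search?q={}&s={}'.format(keyword, page_num * 44))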

 
 

def search_product(key):
    driver.find_element_by_id('q').send_keys(key)
    driver.find_element_by_class_name('btn-search').click()
    # Maximize the browser window
    driver.maximize_window()
    # Automated login is basically impossible, so pause for a manual QR-code login
    time.sleep(15)
    # Find the page-number label
    page = driver.find_element_by_xpath('//*[@id="mainsrp-pager"]/div/div/div/div[1]').text
    page = re.findall(r'(\d+)', page)[0]
    return int(page)

First, driver.find_element_by_id finds the input box and send_keys types the key variable into it; driver.find_element_by_class_name then finds the search button, and click() clicks it. We maximize the window and pause for 15 s: automated login to Taobao is recognized by Alibaba, so the pause gives you time to scan the QR code and log in manually. Then we use xpath to find the page-number label, match the digits, take the first value, and return the total page count; for example, if there are 5 pages, 5 is returned. That count is passed back as page, and get_product() is called to collect the detailed product data on the current page: product name, price, number of buyers, shop location, shop name, and so on. Let's look at the get_product() function next.
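If the fixed 15-second sleep feels brittle, an explicit wait is a common alternative: block until the pager element actually exists, which only happens once the results page loads behind your manual login. A minimal sketch, assuming the same pager XPath as above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 60 s for the pager to appear instead of sleeping a fixed 15 s
WebDriverWait(driver, 60).until(
    EC.presence_of_element_located((By.XPATH, '//*[@id="mainsrp-pager"]'))
)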

 

 

 
 

def get_product():
    divs = driver.find_elements_by_xpath('//div[@class="items"]/div[@class="item J_MouserOnverReq "]')
    print(divs)
    for div in divs:
        # Product name
        info = div.find_element_by_xpath('.//div[@class="row row-2 title"]/a').text
        # Product price (append the yuan unit)
        price = div.find_element_by_xpath('.//strong').text + "元"
        # Number of buyers
        deal = div.find_element_by_xpath('.//div[@class="deal-cnt"]').text
        # Shop name
        name = div.find_element_by_xpath('.//div[@class="shop"]/a').text
        # Shop location
        place = div.find_element_by_xpath('.//div[@class="location"]').text
        print(info, price, deal, name, place, sep='|')
        # utf-8-sig so Excel opens the Chinese text correctly
        with open('ins短袖.csv', 'a', newline='', encoding='utf-8-sig') as fp:
            csvwriter = csv.writer(fp, delimiter=',')
            csvwriter.writerow([info, price, deal, name, place])

First find the div tags of the product list, then loop over each product's div, use xpath syntax to extract the info, price, deal, name, and place fields, and append each row to a CSV file!
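Because the file is opened in append mode with no header, the CSV columns come out unlabeled. A small optional sketch that writes a header row once, before main() runs (the column labels are my own, not from the original script):

import csv
# Run once before crawling; 'ins短袖.csv' is the same file get_product() appends to
with open('ins短袖.csv', 'w', newline='', encoding='utf-8-sig') as fp:
    csv.writer(fp).writerow(['name', 'price', 'buyers', 'shop', 'location'])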

  • Finally, the crawled data is imported into Excel, as shown in the figure:
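If you want a genuine .xlsx workbook rather than just opening the CSV in Excel, one hedged option is pandas (assumption: pandas and openpyxl are installed; the column names are the same hypothetical labels as above):

import pandas as pd

# Convert the crawled CSV into an Excel file
# (drop names= if you already wrote a header row into the CSV)
df = pd.read_csv('ins短袖.csv', names=['name', 'price', 'buyers', 'shop', 'location'])
df.to_excel('ins短袖.xlsx', index=False)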

 

Okay, that's all for today's sharing. Now I'm off to keep studying, see you!


Origin: blog.csdn.net/weixin_43881394/article/details/112983517