Crawling Taobao product data with Python: a crawler outsourcing project worth thousands of dollars!

Preface

The text and pictures in this article come from the Internet and are for learning and communication purposes only, not for any commercial use. If you have any questions, please contact us.

Selenium

Selenium is a web automation tool that was originally developed for automated website testing. Much like a macro tool ("button wizard") for games, it can perform operations automatically according to specified commands.

Selenium drives the browser directly, just as a real user would: it can make the browser load pages on command, extract the required data, take screenshots of a page, and even check whether certain actions have occurred on the site.

Module installation

pip install selenium

ChromeDriver download links:

https://npm.taobao.org/mirrors/chromedriver/
http://chromedriver.storage.googleapis.com/index.html

Configure the browser driver:

Unzip the downloaded browser driver and place the extracted exe file in the Python installation directory, alongside python.exe.

or

Put the driver in the same directory as your script~

Determine the login page

Selenium simulates human actions in the browser, so we just follow the same steps a user would take~

Fields to crawl:

  • Product price
  • Product name
  • Sales volume
  • Shop name
  • Shipping location

1. Locate the search box element and enter the keyword to search for. Here we search for "women's bag":

driver.find_element_by_css_selector('#q').send_keys("women's bag")

2. Locate the search button element and click search:

driver.find_element_by_css_selector('.search-button').click()

3. The login page will pop up

  • Option 1: Locate the account and password elements and enter them in code. Set delays sensibly and the CAPTCHA will not appear.
  • Option 2: Locate the Alipay login element, click it, and log in manually by scanning the QR code.

We choose Option 2 here. Option 1 is feasible, but it requires your account and password, so try that yourself.
driver.find_element_by_css_selector('#login-form > div.login-blocks.sns-login-links > a.alipay-login').click()
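After clicking the Alipay button, the script has to pause until the manual scan completes. A minimal polling helper can do this; everything here (the function name and the condition idea) is an illustrative sketch, not part of the original code:

```python
import time

def wait_for(condition, timeout=60, interval=1):
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns True if the condition was met in time, otherwise False.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False
```

In the real crawler you would pass a check for a logged-in marker on the page, e.g. `lambda: len(driver.find_elements_by_css_selector('.some-logged-in-nick')) > 0`; that selector is a guess, so inspect the page yourself to find a reliable element that only appears after login.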

4. Get data on the product list page

This is the same as parsing pages with an ordinary crawler: grab the list-page elements, then extract each field from them.

We build a dictionary for each item to collect the data, which makes it easy to save to a CSV file.

lis = driver.find_elements_by_css_selector('#mainsrp-itemlist .item')
for li in lis:
    time.sleep(1)
    dit = {}  # one dictionary per item
    price = li.find_element_by_css_selector('.ctx-box .price strong').text + ' Yuan'  # product price
    dit['product price'] = price
    deal = li.find_element_by_css_selector('.ctx-box .deal-cnt').text  # sales volume
    dit['sales volume'] = deal
    row = li.find_element_by_css_selector('.ctx-box .row-2 a').text  # product name
    dit['product name'] = row
    shop = li.find_element_by_css_selector('.shop > a > span:nth-child(2)').text  # shop name
    dit['shop name'] = shop
    city = li.find_element_by_css_selector('.row-3 > div.location').text  # shipping location
    dit['shipping location'] = city
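The sales volume comes back as display text, typically something like "5000+人付款" or "1.2万人付款". If you want a number instead, a small helper can normalize it; this is a hypothetical addition (the function name and the assumed string format are mine, not from the original), so adjust it if the site shows a different format:

```python
import re

def parse_deal_count(text):
    """Convert a sales string such as '5000+人付款' or '1.2万人付款'
    into an approximate integer; returns 0 if no number is found."""
    m = re.search(r'([\d.]+)(万)?', text)
    if not m:
        return 0
    value = float(m.group(1))
    if m.group(2):  # '万' means ten thousand
        value *= 10000
    return int(value)
```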

5. Save data

The last step is saving the data, a basic file operation:

f = open('Taobao data.csv', mode='a', encoding='utf-8-sig', newline='')
csv_writer = csv.DictWriter(f, fieldnames=['product price', 'sales volume', 'product name', 'shop name', 'shipping location'])
csv_writer.writeheader()  # write the header once, before the crawl loop
csv_writer.writerow(dit)  # call this inside the loop, once per item
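To check the saving logic on its own, without a browser, here is a self-contained sketch of the same `csv.DictWriter` pattern; the sample rows are made up and only stand in for what the crawler collects:

```python
import csv

fieldnames = ['product price', 'sales volume', 'product name', 'shop name', 'shipping location']

# made-up sample rows standing in for scraped items
rows = [
    {'product price': '129 Yuan', 'sales volume': '5000+', 'product name': "women's bag A",
     'shop name': 'shop A', 'shipping location': 'Guangzhou'},
    {'product price': '89 Yuan', 'sales volume': '300+', 'product name': "women's bag B",
     'shop name': 'shop B', 'shipping location': 'Shanghai'},
]

with open('Taobao data.csv', mode='w', encoding='utf-8-sig', newline='') as f:
    csv_writer = csv.DictWriter(f, fieldnames=fieldnames)
    csv_writer.writeheader()       # header once
    for dit in rows:
        csv_writer.writerow(dit)   # one row per item
```

The `utf-8-sig` encoding writes a BOM so Excel displays Chinese text correctly, and `newline=''` prevents blank lines between rows on Windows.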

6. Turn the page and click the next page

def next_page():
    # click the "next page" button, then wait for the new page to load
    driver.find_element_by_css_selector('#mainsrp-pager > div > div > div > ul > li.item.next > a').click()
    driver.implicitly_wait(10)
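The overall control flow then just loops: scrape the current page, save it, and call `next_page()` until the "next" button is gone. Here is a minimal sketch of that loop using a stand-in paginator instead of the live driver (the `Paginator` class and `crawl` function are illustrative names, not from the original):

```python
class Paginator:
    """Stand-in for the browser: pretends the site has a fixed number of pages."""
    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.current = 1

    def next_page(self):
        # in the real crawler this is the driver.find_element(...).click() call,
        # which raises when the "next" button no longer exists
        if self.current >= self.total_pages:
            raise RuntimeError('no next page')
        self.current += 1

def crawl(pager, max_pages=100):
    visited = []
    for _ in range(max_pages):
        visited.append(pager.current)   # the list-page scraping from step 4 goes here
        try:
            pager.next_page()
        except RuntimeError:            # last page reached
            break
    return visited

print(crawl(Paginator(3)))  # → [1, 2, 3]
```

In the real script, the `except` clause would catch Selenium's `NoSuchElementException` instead of `RuntimeError`.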

7. Running results
Origin blog.csdn.net/weixin_43881394/article/details/109095820