Crawling Taobao product data with Python: a crawler outsourcing project worth thousands of dollars

Preface

The text and pictures in this article come from the Internet and are for learning and exchange purposes only, not for any commercial use. If you have any questions, please contact us and we will deal with them.


Selenium

Selenium is a web automation tool that was originally developed for automated testing of websites. Like a "button wizard" macro tool for games, it carries out operations automatically according to the commands you give it.

Selenium drives the browser directly, just as a real user would. Following your instructions, it can make the browser load pages, extract the data you need, take screenshots of a page, or check whether certain actions on the site have taken place.

Module installation

pip install selenium

ChromeDriver download links (pick the driver version that matches your installed Chrome):

https://npm.taobao.org/mirrors/chromedriver/
http://chromedriver.storage.googleapis.com/index.html

Configure the browser driver:

Unzip the downloaded driver and put the extracted .exe file into the Python installation directory, i.e. the same directory as python.exe.

or

Put the driver in the same directory as your code~
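If the browser fails to start, the first thing to check is whether the driver really sits in one of the two places above. A minimal stdlib sketch, assuming you want to check this before launching Chrome: it looks next to your script, then next to the running python.exe, then on PATH. The function name find_chromedriver is hypothetical and not part of Selenium.

```python
import os
import shutil
import sys


def find_chromedriver(script_dir, python_dir=None, search_path=None):
    """Return the path of a chromedriver executable, or None if none is found.

    Looks in the script's directory, then next to python.exe (the directory of
    the running interpreter), then on the search path (default: PATH).
    """
    name = "chromedriver.exe" if os.name == "nt" else "chromedriver"
    python_dir = python_dir or os.path.dirname(sys.executable)
    for folder in (script_dir, python_dir):
        candidate = os.path.join(folder, name)
        if os.path.isfile(candidate):
            return candidate
    # shutil.which also checks that the file is executable
    return shutil.which(name, path=search_path)
```

On Selenium 4 a found path can be passed explicitly with webdriver.Chrome(service=Service(path)), where Service comes from selenium.webdriver.chrome.service; on Selenium 3 the equivalent is webdriver.Chrome(executable_path=path).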

Determine the target page

Selenium works by simulating what a person does in the browser, so we just follow the same steps a user would~

Content to crawl:

  • Product price
  • Product name
  • Sales volume
  • Shop name
  • Shipping location

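1. Locate the search box and type in the keyword to search for. Here we search for 女式包包 (ladies' bags). Selenium 4.3 and later removed the old find_element_by_css_selector helper; driver.find_element(By.CSS_SELECTOR, ...) is the equivalent call and also works on Selenium 3.

from selenium.webdriver.common.by import By
driver.find_element(By.CSS_SELECTOR, '#q').send_keys('女式包包')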

2. Locate the search button and click it

from selenium.webdriver.common.by import By
driver.find_element(By.CSS_SELECTOR, '.search-button').click()

3. The login page pops up

  • Option 1:
    - Locate the account and password fields and fill them in from code. With reasonable delays, the verification code (captcha) usually does not appear~
  • Option 2:
    - Locate the Alipay login link, click it, and scan the QR code manually to log in.
    Here I choose Option 2. Option 1 also works, but it needs your account password, so you can try it yourself.
from selenium.webdriver.common.by import By
driver.find_element(By.CSS_SELECTOR, '#login-form > div.login-blocks.sns-login-links > a.alipay-login').click()
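With Option 2 the script has to pause while you scan the QR code by hand. A minimal polling sketch, assuming that after a successful login Taobao sends the browser back to the search results, so the item list (the '#mainsrp-itemlist .item' selector from step 4) eventually appears. wait_for_results is a hypothetical helper; it passes "css selector", the string value of Selenium's By.CSS_SELECTOR, so it needs no Selenium import.

```python
import time


def wait_for_results(driver, selector="#mainsrp-itemlist .item", timeout=120.0, poll=1.0):
    """Poll until at least one element matches selector, then return the matches.

    Gives you time to scan the Alipay QR code by hand. Raises TimeoutError if
    nothing matches within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the value of selenium's By.CSS_SELECTOR
        items = driver.find_elements("css selector", selector)
        if items:
            return items
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing matched {selector!r} within {timeout} s")
        time.sleep(poll)
```

Selenium's built-in WebDriverWait(driver, 120).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '#mainsrp-itemlist .item'))), with EC imported from selenium.webdriver.support.expected_conditions, does the same job. If an implicit wait is set, each find_elements call in the loop above may itself block for up to that timeout.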

4. Get the product list page data

This works just like parsing page data in an ordinary crawler~ First get the tag of every item on the list page, then extract each field from it.

Each product's fields are collected into a dictionary, which makes it convenient to save them to a CSV file.

import time
from selenium.webdriver.common.by import By

# Every product card in the search result list
lis = driver.find_elements(By.CSS_SELECTOR, '#mainsrp-itemlist .item')
for li in lis:
    time.sleep(1)  # pause briefly between items
    dit = {}  # a fresh dict for each product
    dit['商品价格'] = li.find_element(By.CSS_SELECTOR, '.ctx-box .price strong').text + '元'  # product price
    dit['成交量'] = li.find_element(By.CSS_SELECTOR, '.ctx-box .deal-cnt').text  # sales volume
    dit['商品名字'] = li.find_element(By.CSS_SELECTOR, '.ctx-box .row-2 a').text  # product name
    dit['店铺名字'] = li.find_element(By.CSS_SELECTOR, '.shop > a > span:nth-child(2)').text  # shop name
    dit['发货地址'] = li.find_element(By.CSS_SELECTOR, '.row-3 > div.location').text  # shipping location
    # dit now holds one product; save it inside this loop (step 5)
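find_element raises NoSuchElementException as soon as one product card lacks one of these fields, which stops the whole loop. A minimal sketch that keeps the same selectors in a column-to-selector table and uses find_elements instead, which returns an empty list rather than raising, so a missing field becomes an empty string. FIELDS and extract_item are hypothetical names, and the selectors are the ones above, which may no longer match Taobao's current markup.

```python
# CSV column -> CSS selector inside one product card
FIELDS = {
    '商品价格': '.ctx-box .price strong',          # product price
    '成交量': '.ctx-box .deal-cnt',                # sales volume
    '商品名字': '.ctx-box .row-2 a',               # product name
    '店铺名字': '.shop > a > span:nth-child(2)',   # shop name
    '发货地址': '.row-3 > div.location',           # shipping location
}


def extract_item(li, fields=FIELDS):
    """Return one product as a dict; a missing field becomes '' instead of an error."""
    dit = {}
    for column, selector in fields.items():
        # find_elements (plural) returns [] when nothing matches
        found = li.find_elements("css selector", selector)  # "css selector" == By.CSS_SELECTOR
        dit[column] = found[0].text.strip() if found else ''
    if dit.get('商品价格'):
        dit['商品价格'] += '元'  # keep the "price + 元" format used above
    return dit
```

Inside the loop, dit = extract_item(li) takes the place of the five find_element lines.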

5. Save data

The last step is to save the data, which is a basic operation: open the CSV file once before the loop, then write one row for each product.

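import csv
# Open once, before the loop; 'utf-8-sig' lets Excel show the Chinese text correctly
f = open('淘宝数据.csv', mode='a', encoding='utf-8-sig', newline='')
csv_writer = csv.DictWriter(f, fieldnames=['商品价格', '成交量', '商品名字', '店铺名字', '发货地址'])
if f.tell() == 0:  # in append mode, write the header only if the file is empty
    csv_writer.writeheader()
csv_writer.writerow(dit)  # call once per product, inside the step-4 loop
f.close()  # close after crawling finishes so all rows are flushed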

6. Turn the page by clicking "next page"

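from selenium.webdriver.common.by import By

def next_page():
    # Click the "next page" link in the pager below the result list
    driver.find_element(By.CSS_SELECTOR, '#mainsrp-pager > div > div > div > ul > li.item.next > a').click()
    driver.implicitly_wait(10)  # not a pause: later find_element calls retry for up to 10 s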
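next_page() clicks only once; crawling several pages means repeating "scrape, then click next" until a page limit or the last page. A minimal sketch of that loop, assuming scrape_page is your own function that returns the current page's rows (for example the step-4 loop wrapped in a function) and waits for the page to load. crawl and scrape_page are hypothetical names. The next link is checked with find_elements, so reaching the last page ends the loop instead of raising.

```python
def crawl(driver, scrape_page, max_pages=10):
    """Scrape up to max_pages result pages, clicking "next page" between them.

    scrape_page(driver) must return the rows of the current page. Stops early
    when the pager no longer has a "next" link.
    """
    next_selector = '#mainsrp-pager > div > div > div > ul > li.item.next > a'
    rows = []
    for page in range(1, max_pages + 1):
        rows.extend(scrape_page(driver))
        if page == max_pages:
            break  # page limit reached, no need to click further
        links = driver.find_elements("css selector", next_selector)  # By.CSS_SELECTOR
        if not links:
            break  # last page: no "next" link left
        links[0].click()
    return rows
```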

7. Running results


Source: blog.csdn.net/fei347795790/article/details/109076130