[Python] Collecting product data from an e-commerce platform

Environment

  • Python 3.8
  • PyCharm 2021 Professional Edition
  • selenium 3.141.0 (pip install selenium==3.141.0): the Python module that operates the browser through its driver
  • Chrome browser
  • ChromeDriver: the browser driver that lets the script operate Chrome and perform actions for us
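
The version pin matters because the code in this post uses the find_element_by_css_selector family of methods, which Selenium removed in release 4.3.0. A minimal sketch of a version check follows; the helper names parse_version and has_legacy_find_methods are made up for illustration and are not part of Selenium.

```python
from importlib import metadata  # in the standard library since Python 3.8


def parse_version(version):
    """Return (major, minor) from a version string such as '3.141.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def has_legacy_find_methods(version):
    """True if this Selenium version still ships the find_element_by_* methods."""
    return parse_version(version) < (4, 3)


if __name__ == "__main__":
    try:
        installed = metadata.version("selenium")
    except metadata.PackageNotFoundError:
        print("selenium is not installed")
    else:
        print(installed, has_legacy_find_methods(installed))
```

If the check reports False, either install the pinned version from the list above or port the calls to the newer find_element(By.CSS_SELECTOR, ...) form.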

Module preparation

from selenium import webdriver      # drives the browser
import time
import csv

Implementation code

1. Open the browser

If this step raises an error, the usual cause is that ChromeDriver was downloaded but not configured properly, for example it is not on PATH or its version does not match the installed Chrome.
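driver = webdriver.Chrome()

# parse() below writes to csv_writer, which the original post never defines.
# Assumed minimal setup; the file name 'suning.csv' is a placeholder.
f = open('suning.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.writer(f)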
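
When ChromeDriver is not picked up automatically, one option is to look it up and pass its location explicitly. The sketch below assumes the executable is named chromedriver; locate_chromedriver and open_browser are hypothetical helper names. executable_path is the Selenium 3 keyword argument for the driver location.

```python
import shutil


def locate_chromedriver(path=None):
    """Return the full path of a chromedriver executable on PATH, or None."""
    return shutil.which("chromedriver", path=path)


def open_browser():
    """Start Chrome, passing the driver path explicitly when it is found."""
    # Imported here so the lookup above works even without selenium installed
    from selenium import webdriver

    driver_path = locate_chromedriver()
    if driver_path is None:
        raise RuntimeError("chromedriver not found on PATH")
    # executable_path is the Selenium 3 way to point at the driver binary
    return webdriver.Chrome(executable_path=driver_path)
```

Calling open_browser() in place of webdriver.Chrome() turns a confusing startup failure into an explicit "not found" message.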

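# Let the lazy-loaded data finish loading
def drop_down():
    """Scroll the page down step by step."""
    for x in range(1, 12, 2):
        time.sleep(1)
        # Fraction of the page height to scroll to: 1/9, 3/9, ... 11/9
        j = x / 9
        js = 'document.documentElement.scrollTop = document.documentElement.scrollHeight * %f' % j
        driver.execute_script(js)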
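
To see what drop_down() actually sends to the browser, the same loop can be run without a driver. This is a small sketch; scroll_scripts is a hypothetical helper that only builds the JavaScript strings. It yields six statements, and the last factor is larger than 1, which the browser simply clamps to the bottom of the page.

```python
def scroll_scripts(start=1, stop=12, step=2, divisor=9):
    """Build the JavaScript statements that drop_down() would run, in order."""
    template = ('document.documentElement.scrollTop = '
                'document.documentElement.scrollHeight * %f')
    return [template % (x / divisor) for x in range(start, stop, step)]


if __name__ == "__main__":
    for js in scroll_scripts():
        print(js)
```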
# The function called in step 3: extract the fields of every product on the page
def parse():
    # Every product card on the results page has the class item-bg
    divs = driver.find_elements_by_css_selector('.item-bg')
    # Second pass: extract price, title and the other fields from each product card
    for div in divs:
        # div: one product card
        # .text returns the visible text of the matched element
        price = div.find_element_by_css_selector('.def-price').text
        title = div.find_element_by_css_selector('.title-selling-point a').text
        comment = div.find_element_by_css_selector('.info-evaluate').text
        store = div.find_element_by_css_selector('.store-stock').text
        # .get_attribute('href') reads an attribute value,
        # e.g. from <a href="https://www.baidu.com" class="" id=""></a>
        img_url = div.find_element_by_css_selector('.sellPoint img').get_attribute('src')
        link_url = div.find_element_by_css_selector('.title-selling-point a').get_attribute('href')
        print(title, price, comment, store, img_url, link_url)
        csv_writer.writerow([title, price, comment, store, img_url, link_url])
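
The rows written by parse() carry no header, so the resulting file does not say which column is which. A minimal sketch of writing a header row first, assuming column names that mirror the variable names in parse(); write_rows and FIELDS are hypothetical names, and the sample row is a placeholder rather than scraped data.

```python
import csv
import io

# Column names mirror the variables collected in parse()
FIELDS = ['title', 'price', 'comment', 'store', 'img_url', 'link_url']


def write_rows(f, rows):
    """Write a header row followed by the given rows; return the row count."""
    writer = csv.writer(f)
    writer.writerow(FIELDS)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Placeholder row for illustration only, not scraped data
    sample = [['title-1', 'price-1', 'comment-1', 'store-1', 'img-1', 'link-1']]
    buffer = io.StringIO()
    write_rows(buffer, sample)
    print(buffer.getvalue())
```

When writing to a real file instead of a StringIO buffer, open it with newline='' so the csv module controls line endings and no blank lines appear between rows on Windows.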


for page in range(0, 17):

2. Open the suning.com search results page

    # The domain is masked in the original post (it could not be published otherwise)
    driver.get(f'https://**.com/iPhone%2013/&iy=0&isNoResult=0&cp={page}')
    # Scroll the page so the lazy-loaded products are rendered
    drop_down()

3. Fetch the data

    parse()
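
The loop visits 17 result pages by changing only the cp parameter. The sketch below builds the same URL pattern for any keyword; page_urls is a hypothetical helper, and the base address in the usage note is a stand-in because the real domain is masked in the post.

```python
from urllib.parse import quote


def page_urls(base, keyword, pages):
    """Return one search URL per page index, following the post's URL pattern."""
    encoded = quote(keyword)  # 'iPhone 13' -> 'iPhone%2013'
    return [f'{base}/{encoded}/&iy=0&isNoResult=0&cp={page}'
            for page in range(pages)]
```

For example, page_urls('https://search.example.com', 'iPhone 13', 17) gives the 17 addresses for cp=0 through cp=16, each of which could be passed to driver.get() before calling drop_down() and parse().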

Finally

That's all for today.

If you have questions about this article or about Python in general, leave a comment or send me a private message. If you found the article useful, feel free to follow me or give it a thumbs up (/≧▽≦)/

Origin blog.csdn.net/sunanpython/article/details/128272118