Preface
The text and images in this article come from the Internet and are for learning and communication purposes only; they are not for commercial use. If you have any questions, please contact us.
Selenium
Selenium is a Web automation testing tool, originally developed for automated testing of websites. It works much like a game macro tool: it performs operations according to scripted commands.
Selenium drives the browser directly, just as a real user would. It can make the browser load pages on command, extract the data you need, take screenshots of a page, or check whether certain actions have occurred on a site.
Module installation
pip install selenium
ChromeDriver download links (pick the version matching your Chrome browser):
https://npm.taobao.org/mirrors/chromedriver/
http://chromedriver.storage.googleapis.com/index.html
Configure the browser driver:
Unzip the downloaded driver and put the chromedriver.exe file into the Python installation directory, i.e. the same directory as python.exe,
or
put the driver in the same directory as your script.
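Before launching the browser, it helps to confirm the driver is actually discoverable. A small stdlib check (the helper name is hypothetical):

```python
import shutil

def chromedriver_available(name: str = 'chromedriver') -> bool:
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(name) is not None
```

If this returns False, either add the driver's directory to PATH or pass the driver's full path when creating the webdriver.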
Crawling process
Selenium simulates human operations, so follow the same steps a user would.
Data to crawl:
- Product price
- Product name
- Sales volume
- Shop name
- Shipping location
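The snippets in the following steps assume a driver has already been created and the Taobao home page opened, which the article does not show. A minimal setup sketch (the URL, wait time, and function name are assumptions; it keeps the article's pre-Selenium-4 `find_element_by_css_selector` API):

```python
SEARCH_KEYWORD = '女式包包'            # the keyword used throughout the article
TAOBAO_URL = 'https://www.taobao.com'  # assumed start page

def open_search_page():
    """Launch Chrome, open Taobao, and submit the search keyword.
    Requires `pip install selenium` and a matching chromedriver on PATH."""
    # Imported here so the sketch can be read without selenium installed.
    from selenium import webdriver
    driver = webdriver.Chrome()
    driver.implicitly_wait(10)  # wait up to 10 s for elements to appear
    driver.get(TAOBAO_URL)
    driver.find_element_by_css_selector('#q').send_keys(SEARCH_KEYWORD)
    driver.find_element_by_css_selector('.search-button').click()
    return driver
```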
1. Get the search box element and enter the content to search for. Here the keyword is women's bags (女式包包)
driver.find_element_by_css_selector('#q').send_keys('女式包包')
2. Get the search button element and click search
driver.find_element_by_css_selector('.search-button').click()
3. The login page will pop up
- Option 1: get the account and password elements and enter them from code. With reasonable delays, the CAPTCHA will not appear.
- Option 2: get the Alipay login element, click it, and log in manually by scanning the QR code.
Option 2 is chosen here. Option 1 is feasible, but it requires your account and password, so try it yourself.
driver.find_element_by_css_selector('#login-form > div.login-blocks.sns-login-links > a.alipay-login').click()
4. Get the product list page data
Parse the page just like an ordinary crawler: get the list-page elements, then extract the fields from each one.
A dictionary is created here to receive the data, which makes it convenient to save to a csv file.
import time

lis = driver.find_elements_by_css_selector('#mainsrp-itemlist .item')
dit = {}
for li in lis:
    time.sleep(1)
    price = li.find_element_by_css_selector('.ctx-box .price strong').text + '元'  # product price
    dit['商品价格'] = price
    deal = li.find_element_by_css_selector('.ctx-box .deal-cnt').text  # sales volume
    dit['成交量'] = deal
    row = li.find_element_by_css_selector('.ctx-box .row-2 a').text  # product name
    dit['商品名字'] = row
    shop = li.find_element_by_css_selector('.shop > a > span:nth-child(2)').text  # shop name
    dit['店铺名字'] = shop
    city = li.find_element_by_css_selector('.row-3 > div.location').text  # shipping location
    dit['发货地址'] = city
5. Save the data
The last step is saving the data. Open the file and write the header once, before the loop from step 4; then call writerow once per item inside the loop.
import csv

f = open('淘宝数据.csv', mode='a', encoding='utf-8-sig', newline='')
csv_writer = csv.DictWriter(f, fieldnames=['商品价格', '成交量', '商品名字', '店铺名字', '发货地址'])
csv_writer.writeheader()  # write the header only once
# inside the for loop from step 4, after dit is filled:
csv_writer.writerow(dit)
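The saving step can be tested independently of the browser. A self-contained sketch, with made-up sample rows standing in for the scraped dictionaries (mode 'w' is used here for a clean demo file, while the article appends with 'a'):

```python
import csv

# Sample rows standing in for data scraped in step 4 (values are invented).
rows = [
    {'商品价格': '99元', '成交量': '200人付款', '商品名字': '女式包包A',
     '店铺名字': '示例店铺一', '发货地址': '广州'},
    {'商品价格': '59元', '成交量': '80人付款', '商品名字': '女式包包B',
     '店铺名字': '示例店铺二', '发货地址': '上海'},
]

with open('淘宝数据.csv', mode='w', encoding='utf-8-sig', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['商品价格', '成交量', '商品名字', '店铺名字', '发货地址'])
    writer.writeheader()   # header written once, before any rows
    writer.writerows(rows) # each dict becomes one csv row
```

The utf-8-sig encoding adds a BOM so Excel opens the Chinese text correctly.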
6. Turn the page: click "next"
def next_page():
    driver.find_element_by_css_selector('#mainsrp-pager > div > div > div > ul > li.item.next > a').click()
    driver.implicitly_wait(10)  # wait up to 10 s for the next page to load
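The steps above can be chained into one driving loop. A sketch, where `scrape_page` and `next_page` stand for the step 4 and step 6 code, and the page count and pause are assumptions:

```python
import time

def crawl(driver, scrape_page, next_page, pages=10, pause=2):
    """Scrape a fixed number of result pages, clicking 'next' between them.

    scrape_page -- a function taking the driver, runs the step 4 parsing
    next_page   -- the step 6 function that clicks the next-page button
    """
    for _ in range(pages):
        scrape_page(driver)  # parse and save the current list page
        next_page()          # advance to the next page
        time.sleep(pause)    # assumed pause to let results load
```

Passing the two step functions as parameters keeps the loop testable without a live browser.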
7. Running result (the original article shows a screenshot of the crawler running here)