Python3 web crawler (6): 618, if you love him/her, empty his/her shopping cart!

This article has been included on GitHub: https://github.com/Jack-Cherish/PythonPark . The repository contains practical technical articles, organized learning materials, and interview experiences from top-tier companies. Stars and improvements are welcome.

I. Introduction

The crawler series already has five articles.

By now, routine downloads of text, pictures, and videos, as well as API usage, should be a breeze for you.

Today I will explain a more advanced skill: "simulated login".

With the 618 shopping festival here, help him/her empty a shopping cart or two!

II. Simulated login

When learning to write crawlers, you will keep hearing the term "simulated login". What exactly is a "simulated login"?

In layman's terms, "simulated login" means that the program automatically logs in to a website with an account and password.

It then fetches the website data that is only available after logging in.

For example, we can only see what is in the shopping cart after logging in to our Taobao account.

This article takes a "simulated login" to Taobao as the example and helps him/her empty the shopping cart.

You only need to know his/her Taobao account and password and have a sufficiently full wallet; then you can run the program, scan the QR code, and pay in one go.

Experience the thrill of automatic checkout: the wallet is empty in seconds!

III. Selenium

There are essentially two ways to simulate a login: analyze the request packets and replay the login requests, or drive a real browser with an automated testing tool.

The former requires capturing packets and analyzing requests, parsing various parameters, and may involve some encryption algorithms.

The latter can bypass some tedious analysis and directly locate elements to operate on, but it will also run into some anti-crawling strategies.

Each approach has its own tricks.

The previous tutorials explained many crawler approaches based on analyzing request packets with requests.

This article explains a new approach: using the automated testing tool Selenium to simulate a login.

The basic usage of Selenium, and how to get around Taobao's anti-crawling strategy for Selenium, are covered below.

1. Selenium installation

Selenium is an automated testing tool that supports various mainstream browsers, such as Chrome, Safari, Firefox, etc.

It doesn't matter if you don't know what an automated testing tool is; I will explain it step by step through hands-on examples.

Anyway, install Selenium first.

pip install selenium

Use pip to install selenium directly.

In addition to installing the third-party Selenium library for Python, you also need to set up the browser driver that matches your browser.

Taking Chrome as an example, download its browser driver.

Driver download address (may require a proxy to get past the firewall): click to view

Select the driver to download according to your browser version.

If you cannot reach the download site, don't worry: I have uploaded three versions of the driver to Baidu Cloud.

Baidu cloud link: https://pan.baidu.com/s/1-AfONQGkK8xPwLaW5P-9Bw

Extraction code: cbsu

2. A small test

As a first test, use Selenium to open Baidu and take a look.

from selenium import webdriver

if __name__ == "__main__":
    # Raw string, so that backslashes such as \t are not read as escape sequences
    browser = webdriver.Chrome(r'path\to\your\chromedriver.exe')
    browser.get('https://www.baidu.com/')

The path\to\your\chromedriver.exe above is the location of the Chrome driver file you just downloaded. Change it to match your own setup; an absolute path is recommended. The result is shown below:

The program will automatically open the Chrome browser and open www.baidu.com.
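
One detail is worth checking when you fill in the driver location: in an ordinary Python string literal, a backslash followed by certain letters is an escape sequence, so a Windows path can silently change. A minimal sketch of the safer spellings follows; the path C:\tools\new_chromedriver.exe is a made-up example, not a location from this article.

```python
from pathlib import PureWindowsPath

# Hypothetical driver location, for illustration only.
plain = "C:\tools\new_chromedriver.exe"      # "\t" and "\n" become TAB and newline
raw = r"C:\tools\new_chromedriver.exe"       # raw string: backslashes stay literal
escaped = "C:\\tools\\new_chromedriver.exe"  # doubled backslashes also work

print("\t" in plain)               # the plain literal now contains a tab
print(raw == escaped)              # both safe spellings describe the same path
print(PureWindowsPath(raw).name)   # file name component of the path
```

Either the raw string or the doubled backslashes can be passed to webdriver.Chrome; forward slashes generally work on Windows as well.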

Here is a more complicated example.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

if __name__ == "__main__":
    # Raw string, so that backslashes such as \t are not read as escape sequences
    driver = webdriver.Chrome(r"path\to\your\chromedriver.exe")
    driver.get("https://www.python.org")
    assert "Python" in driver.title
    elem = driver.find_element_by_name("q")
    elem.send_keys("pycon")
    elem.send_keys(Keys.RETURN)
    print(driver.page_source)

This opens the official www.python.org website, finds the search box by its name attribute q, types pycon, and presses Enter to search.

Result:

Write the program and the browser operates on its own. Simple and cool, isn't it?

This is what an automated testing tool does. Once the program is written, the browser automatically carries out the operations you wrote.

The find_element_by_* methods locate elements on a web page. There are many of them:

find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector

You can find the element through the id attribute, name attribute, and class_name attribute of the tag, or through xpath.
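
A note for readers on newer Selenium 4 releases: as far as I know, the find_element_by_* helpers were deprecated there and later removed in favour of a single find_element(by, value) call. The sketch below maps the old helper names onto the locator strategy strings used by Selenium's By class. The helper find_legacy and the FakeDriver stand-in are hypothetical and exist only so the example runs without a browser.

```python
# Locator strategy strings as defined by selenium.webdriver.common.by.By
# (for example By.XPATH == "xpath").
LEGACY_TO_STRATEGY = {
    "find_element_by_id": "id",
    "find_element_by_name": "name",
    "find_element_by_xpath": "xpath",
    "find_element_by_link_text": "link text",
    "find_element_by_partial_link_text": "partial link text",
    "find_element_by_tag_name": "tag name",
    "find_element_by_class_name": "class name",
    "find_element_by_css_selector": "css selector",
}


def find_legacy(driver, legacy_name, value):
    """Hypothetical helper: translate an old helper name into find_element(by, value)."""
    return driver.find_element(LEGACY_TO_STRATEGY[legacy_name], value)


class FakeDriver:
    """Stand-in for a WebDriver so the sketch runs without a browser."""

    def find_element(self, by, value):
        # A real driver returns a WebElement; here we just echo the arguments.
        return (by, value)


result = find_legacy(FakeDriver(), "find_element_by_xpath", '//*[@id="kw"]')
print(result)
```

With a real driver, the equivalent call would read driver.find_element("xpath", '//*[@id="kw"]'), or By.XPATH in place of the string.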

Among them, the most practical is XPath, because it is easy to use.

You don't even need to work out how to write the XPath yourself; the browser can do it for you. For example, say I want to find the search box element on baidu.com:

In the browser's developer tools, right-click the search box element and choose Copy > Copy XPath to copy the XPath directly.

Paste it out and you will see the following:

//*[@id="kw"]

It means: search the whole document, starting from the root, for the element whose id attribute is kw.
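
To see what this expression selects without opening a browser, here is a small sketch using Python's standard xml.etree.ElementTree, which supports a limited XPath subset. The markup is a made-up, simplified stand-in for the real page, and ElementTree needs the relative form .//* where a browser accepts //*.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up markup; the real page is far larger.
page = """
<html>
  <body>
    <form id="form">
      <input id="kw" name="wd" />
      <input id="su" type="submit" />
    </form>
  </body>
</html>
"""

root = ET.fromstring(page.strip())

# Any element, at any depth, whose id attribute equals "kw".
box = root.find(".//*[@id='kw']")
print(box.tag, box.attrib["name"])
```

The tag name does not matter to the expression: the * matches any element, and only the id attribute decides the match.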

Once the search box is located, you can type Jack Cui into it and search Baidu for content related to me.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

if __name__ == "__main__":
    # Raw string, so that backslashes such as \t are not read as escape sequences
    driver = webdriver.Chrome(r"path\to\your\chromedriver.exe")
    driver.get("https://www.baidu.com")
    elem = driver.find_element_by_xpath('//*[@id="kw"]')
    elem.send_keys("Jack Cui")
    elem.send_keys(Keys.RETURN)

Result:

As you can see, running the program searches for Jack Cui and finds my personal website, CSDN, and Zhihu pages.

Selenium is that simple and easy.

If you want to learn more about Selenium's other basic methods and the basics of XPath, you can read the article I wrote 3 years ago.

Article address: click to view

For the detailed Selenium API documentation, see the official manual.

Official manual: click to view

Well, the basics are now in place.

As long as you can use Copy XPath and basic Selenium operations, you can start the "simulated login" to Taobao with me.

IV. Log in to Taobao and empty your wallet

This campaign of emptying the shopping cart for love needs to be completed in two steps:

  • Simulate login to Taobao
  • Shopping cart checkout

1. Simulate login to Taobao

With Selenium, a simulated login is a matter of writing down what you see: write the code following the steps a human would take.

Open Taobao. The first step is to click the login button. If you can't write the XPath yourself, just copy the XPath of this element.

So the code for clicking login is:

browser.find_element_by_xpath('//*[@id="J_SiteNavLogin"]/div[1]/div[1]/a[1]').click()

Locate the login element and call click().

After clicking login, you reach the login page. Locate the account box and password box, and enter the account and password.

Again, simply copy and paste the XPath.

browser.find_element_by_xpath('//*[@id="fm-login-id"]').send_keys(username)
browser.find_element_by_xpath('//*[@id="fm-login-password"]').send_keys(password)

username and password are the account and password you want to enter.

After entering the password, a slider captcha may appear.

This kind of slider is also easy to handle: again copy the XPath to match the element, then use Selenium's ActionChains to drag the slider.

Finally, click the login button.

After logging in, read the user name to check whether the login succeeded.

After the analysis is complete, go directly to the code.

from selenium import webdriver
import logging
import time
from selenium.common.exceptions import NoSuchElementException, WebDriverException
from selenium.webdriver import ActionChains

logging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

class taobao():
    def __init__(self):
        # Raw string, so that backslashes such as \t are not read as escape sequences
        self.browser = webdriver.Chrome(r"path\to\your\chromedriver.exe")
        # Maximize the window
        self.browser.maximize_window()
        self.browser.implicitly_wait(5)
        self.domain = 'http://www.taobao.com'
        self.action_chains = ActionChains(self.browser)

    def login(self, username, password):
        while True:
            self.browser.get(self.domain)
            time.sleep(1)
            
            # If you know XPath, these steps can be simplified
            #self.browser.find_element_by_class_name('h').click()
            #self.browser.find_element_by_id('fm-login-id').send_keys(username)
            #self.browser.find_element_by_id('fm-login-password').send_keys(password)
            self.browser.find_element_by_xpath('//*[@id="J_SiteNavLogin"]/div[1]/div[1]/a[1]').click()
            self.browser.find_element_by_xpath('//*[@id="fm-login-id"]').send_keys(username)
            self.browser.find_element_by_xpath('//*[@id="fm-login-password"]').send_keys(password)
            time.sleep(1)

            try:
                # A captcha appeared: do the slider verification
                slider = self.browser.find_element_by_xpath("//span[contains(@class, 'btn_slide')]")
                if slider.is_displayed():
                    # Drag the slider
                    self.action_chains.drag_and_drop_by_offset(slider, 258, 0).perform()
                    time.sleep(0.5)
                    # Release the slider, equivalent to releasing the mouse button after dragging
                    self.action_chains.release().perform()
            except (NoSuchElementException, WebDriverException):
                logger.info('No login captcha appeared')
            
            # If you know XPath, clicking the login button can be simplified
            #self.browser.find_element_by_class_name('password-login').click()
            self.browser.find_element_by_xpath('//*[@id="login-form"]/div[4]/button').click()
            
            nickname = self.get_nickname()
            if nickname:
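                logger.info('Login succeeded, nickname: ' + nickname)
                break
            # warning rather than debug, so the message is shown at the INFO log level
            logger.warning('Login failed, retrying in 5s')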
            time.sleep(5)

    def get_nickname(self):
        self.browser.get(self.domain)
        time.sleep(0.5)
        try:
            return self.browser.find_element_by_class_name('site-nav-user').text
        except NoSuchElementException:
            return ''


if __name__ == '__main__':
    # Fill in your own username and password
    username = 'username'
    password = 'password'
    tb = taobao()
    tb.login(username, password)

The code adds some exception handling and log output. Note that the slider is not displayed every time, so a check must be added.
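
One thing to keep in mind about the login loop: while True keeps retrying forever if the login never succeeds. A bounded variant might look like the sketch below. The names retry_login and max_attempts, and the fake attempt function, are hypothetical and used only for illustration.

```python
import time


def retry_login(attempt, max_attempts=3, delay=0.0):
    """Call attempt() until it returns a non-empty nickname or attempts run out."""
    for n in range(1, max_attempts + 1):
        nickname = attempt()
        if nickname:
            return nickname
        if n < max_attempts:
            time.sleep(delay)  # the article's code waits 5 seconds at this point
    return ""


# Fake attempt that fails twice, then succeeds, so the sketch runs offline.
results = iter(["", "", "jack"])
print(retry_login(lambda: next(results)))
```

In the class above, attempt would wrap one pass of the login steps followed by get_nickname().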

Fill in your account and password, specify the Chrome driver path, run the code, and see whether you can log in as expected.

You can see that the account and password have been entered, and the verification code has passed.

However, I just can't log in! Why is this?

2. Getting around Taobao's anti-Selenium login check

The reason is simple: Taobao has anti-crawler measures, and they specifically target Selenium.

Done this way, the login will never succeed.

When you run into this kind of anti-crawling, don't panic; think it through slowly.

Usually the first reaction to this kind of anti-crawler measure is: the captcha slider moved too fast.

And so it was detected.

I thought the same at the beginning, so I wrote a sliding method myself.

Constant speed, acceleration, deceleration, even jittery sliding: none of them work!

Obviously, it has nothing to do with the verification code slider.
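
For reference, a hand-written sliding routine of this kind usually splits the total distance into uneven steps that speed up and then slow down. The sketch below is only an illustration under assumed parameters, not the original code; build_track, accel_ratio, and steps are hypothetical names, and 258 is the drag offset used in the login code above.

```python
def build_track(distance, accel_ratio=0.7, steps=20):
    """Split distance (pixels) into integer steps that speed up, then slow down."""
    accel_steps = max(1, int(steps * accel_ratio))
    # Weights grow during the acceleration phase and shrink afterwards.
    weights = [i + 1 for i in range(accel_steps)]
    weights += [accel_steps - i for i in range(steps - accel_steps)]
    total = sum(weights)
    track = [distance * w // total for w in weights]
    # Integer division loses a few pixels; add them to the last step.
    track[-1] += distance - sum(track)
    return track


track = build_track(258)
print(len(track), sum(track))
```

Each step would be passed to ActionChains.move_by_offset(step, 0) between click_and_hold and release. As noted above, no variant of this helped against Taobao's check.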

At this point, you have to learn to test and analyze the site's anti-crawler strategy.

Testing step by step, you will find the following: let the program enter the account and password, let the program drag the slider, then pause the program and click the login button manually with the mouse, and the login succeeds.

Amazing, right?

Why is this?

My guess is that Taobao watches for click events triggered through Selenium's find_element_by_* methods.

As long as the click event is performed by Selenium, Taobao will not let you log in.

I don't know how it is implemented, but I know how to get around it.

It's very simple. Selenium's click method doesn't work, so switch to another third-party library!

If there is one thing Python has plenty of, it is third-party libraries.

Let's get to know pyautogui.

pyautogui is powerful: it can control the computer's mouse, much like the "keystroke wizard" macro tool.

Some features of pyautogui are even more powerful than "keystroke wizard".

The installation method is also very simple, just use pip.

python -m pip install pyautogui

Usage is very simple. Take a screenshot of the login button, like this:

Then pyautogui can find the coordinates of the button based on this picture, and then manipulate the computer's mouse to click.

coords = pyautogui.locateOnScreen('1.png')
x, y = pyautogui.center(coords)
pyautogui.leftClick(x, y)
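
What these three lines do can be spelled out with plain arithmetic. This sketch assumes the match is reported as a (left, top, width, height) box, and that a missing match may come back as None (some pyautogui versions raise an exception instead). box_center is a hypothetical helper, not part of pyautogui.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def box_center(box: Optional[Box]) -> Optional[Tuple[int, int]]:
    """Return the centre point of a match box, or None if nothing matched."""
    if box is None:
        return None
    left, top, width, height = box
    return left + width // 2, top + height // 2


# Made-up coordinates for a 120x40 button found at (800, 500).
print(box_center((800, 500, 120, 40)))
print(box_center(None))
```

Checking for a missing match before clicking avoids a confusing error when the screenshot is not found on screen, for example because the window is covered or displayed at a different scale.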

Powerful, isn't it?

Modify the code directly and start!

from selenium import webdriver
import logging
import time
from selenium.common.exceptions import NoSuchElementException, WebDriverException
from selenium.webdriver import ActionChains

import pyautogui
pyautogui.PAUSE = 0.5 

logging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

class taobao():
    def __init__(self):
        # Raw string, so that backslashes such as \t are not read as escape sequences
        self.browser = webdriver.Chrome(r"path\to\your\chromedriver.exe")
        # Maximize the window
        self.browser.maximize_window()
        self.browser.implicitly_wait(5)
        self.domain = 'http://www.taobao.com'
        self.action_chains = ActionChains(self.browser)

    def login(self, username, password):
        while True:
            self.browser.get(self.domain)
            time.sleep(1)
            
            # If you know XPath, these steps can be simplified
            #self.browser.find_element_by_class_name('h').click()
            #self.browser.find_element_by_id('fm-login-id').send_keys(username)
            #self.browser.find_element_by_id('fm-login-password').send_keys(password)
            self.browser.find_element_by_xpath('//*[@id="J_SiteNavLogin"]/div[1]/div[1]/a[1]').click()
            self.browser.find_element_by_xpath('//*[@id="fm-login-id"]').send_keys(username)
            self.browser.find_element_by_xpath('//*[@id="fm-login-password"]').send_keys(password)
            time.sleep(1)

            try:
                # A captcha appeared: do the slider verification
                slider = self.browser.find_element_by_xpath("//span[contains(@class, 'btn_slide')]")
                if slider.is_displayed():
                    # Drag the slider
                    self.action_chains.drag_and_drop_by_offset(slider, 258, 0).perform()
                    time.sleep(0.5)
                    # Release the slider, equivalent to releasing the mouse button after dragging
                    self.action_chains.release().perform()
            except (NoSuchElementException, WebDriverException):
                logger.info('No login captcha appeared')
            
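            # If you know XPath, clicking the login button can be simplified, but neither way logs in;
            # pyautogui is needed to perform the click
            #self.browser.find_element_by_class_name('password-login').click()
            #self.browser.find_element_by_xpath('//*[@id="login-form"]/div[4]/button').click()
            # Path to the screenshot of the login button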
            coords = pyautogui.locateOnScreen('1.png')
            x, y = pyautogui.center(coords)
            pyautogui.leftClick(x, y)
            
            nickname = self.get_nickname()
            if nickname:
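                logger.info('Login succeeded, nickname: ' + nickname)
                break
            # warning rather than debug, so the message is shown at the INFO log level
            logger.warning('Login failed, retrying in 5s')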
            time.sleep(5)

    def get_nickname(self):
        self.browser.get(self.domain)
        time.sleep(0.5)
        try:
            return self.browser.find_element_by_class_name('site-nav-user').text
        except NoSuchElementException:
            return ''


if __name__ == '__main__':
    # Fill in your own username and password
    username = 'username'
    password = 'password'
    tb = taobao()
    tb.login(username, password)

This is how Taobao's anti-crawling against Selenium is solved!

3. Empty the shopping cart

Now that we are logged in, emptying the shopping cart is a piece of cake!

Follow the previous steps and analyze it yourself.

It's very simple here, I will post the code directly.

from selenium import webdriver
import logging
import time
from selenium.common.exceptions import NoSuchElementException, WebDriverException
from selenium.webdriver import ActionChains

import pyautogui
pyautogui.PAUSE = 0.5 

logging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

class taobao():
    def __init__(self):
        # Raw string, so that backslashes such as \t are not read as escape sequences
        self.browser = webdriver.Chrome(r"path\to\your\chromedriver.exe")
        # Maximize the window
        self.browser.maximize_window()
        self.browser.implicitly_wait(5)
        self.domain = 'http://www.taobao.com'
        self.action_chains = ActionChains(self.browser)

    def login(self, username, password):
        while True:
            self.browser.get(self.domain)
            time.sleep(1)
            
            # If you know XPath, these steps can be simplified
            #self.browser.find_element_by_class_name('h').click()
            #self.browser.find_element_by_id('fm-login-id').send_keys(username)
            #self.browser.find_element_by_id('fm-login-password').send_keys(password)
            self.browser.find_element_by_xpath('//*[@id="J_SiteNavLogin"]/div[1]/div[1]/a[1]').click()
            self.browser.find_element_by_xpath('//*[@id="fm-login-id"]').send_keys(username)
            self.browser.find_element_by_xpath('//*[@id="fm-login-password"]').send_keys(password)
            time.sleep(1)

            try:
                # A captcha appeared: do the slider verification
                slider = self.browser.find_element_by_xpath("//span[contains(@class, 'btn_slide')]")
                if slider.is_displayed():
                    # Drag the slider
                    self.action_chains.drag_and_drop_by_offset(slider, 258, 0).perform()
                    time.sleep(0.5)
                    # Release the slider, equivalent to releasing the mouse button after dragging
                    self.action_chains.release().perform()
            except (NoSuchElementException, WebDriverException):
                logger.info('No login captcha appeared')
            
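            # If you know XPath, clicking the login button can be simplified, but neither way logs in;
            # pyautogui is needed to perform the click
            #self.browser.find_element_by_class_name('password-login').click()
            #self.browser.find_element_by_xpath('//*[@id="login-form"]/div[4]/button').click()
            # Path to the screenshot of the login button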
            coords = pyautogui.locateOnScreen('1.png')
            x, y = pyautogui.center(coords)
            pyautogui.leftClick(x, y)
            
            nickname = self.get_nickname()
            if nickname:
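                logger.info('Login succeeded, nickname: ' + nickname)
                break
            # warning rather than debug, so the message is shown at the INFO log level
            logger.warning('Login failed, retrying in 5s')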
            time.sleep(5)

    def get_nickname(self):
        self.browser.get(self.domain)
        time.sleep(0.5)
        try:
            return self.browser.find_element_by_class_name('site-nav-user').text
        except NoSuchElementException:
            return ''
            
    def clear_cart(self):
        cart = self.browser.find_element_by_xpath('//*[@id="J_MiniCart"]')
        if cart.is_displayed():
            cart.click()
        select = self.browser.find_element_by_xpath('//*[@id="J_SelectAll1"]/div/label')
        if select.is_displayed():
            select.click()
        time.sleep(0.5)
        go = self.browser.find_element_by_xpath('//*[@id="J_Go"]')
        if go.is_displayed():
            go.click()
        submit = self.browser.find_element_by_xpath('//*[@id="submitOrderPC_1"]/div/a[2]')
        if submit.is_displayed():
            submit.click()


if __name__ == '__main__':
    # Fill in your own username and password
    username = 'username'
    password = 'password'
    tb = taobao()
    tb.login(username, password)
    tb.clear_cart()

running result:

All that remains is to pay.

Scan the QR code to pay. As long as you have the money, you can even have the program enter the payment password and complete the payment directly without looking at the price.

V. Finally

  • Selenium is very convenient to use, but it will also run into anti-crawler measures, so you need to analyze each situation yourself.
  • 618 is here, get ready for a shopping spree! If you love him/her, empty his/her shopping cart!

Like first, then read, and make it a habit. Search for the WeChat official account 【JackCui-AI】 and follow a guy who spends his time crawling the Internet.

 

Origin blog.csdn.net/c406495762/article/details/106757531