Crawler: the selenium module

Question: how is the selenium module related to crawlers?

  • It makes it easy to obtain data that a site loads dynamically
  • It makes simulated login easy

What is the selenium module?

  • selenium is a module for browser-based automation

selenium usage workflow

  • Install the environment: pip install selenium

  • Download the driver for your browser

  • Instantiate a browser object: bro = webdriver.Chrome(executable_path='./chromedriver.exe')

  • Write the browser automation code

    • Send a request: get(url), e.g. bro.get('https://www.taobao.com/')
    • Locate a tag: the find family of methods, e.g. search_input = bro.find_element_by_id('q')
    • Interact with a tag: send_keys('xxx'), e.g. search_input.send_keys('Iphone')
    • Click the search button:
    btn = bro.find_element_by_css_selector('.btn-search')
    btn.click()
    • Execute JavaScript: execute_script('jsCode'), e.g. bro.execute_script('window.scrollTo(0,document.body.scrollHeight)')
    • Get the page source: html_source = bro.page_source; this property returns the HTML source of the page currently open in the browser
    • Go back and forward: back(), forward(), e.g. bro.back()
    • Close the browser: quit(), e.g. bro.quit()
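
The steps above can be collected into one function. The following is a minimal sketch, not the author's code: the function name search_and_scroll is hypothetical, and it uses the generic find_element(by, value) call with the locator strings 'id' and 'css selector' (the values of selenium's By.ID and By.CSS_SELECTOR) in place of the find_element_by_* helpers, which recent Selenium 4 releases no longer provide. It expects a driver that has already been created.

```python
# Locator strategy strings; they equal selenium's By.ID and By.CSS_SELECTOR.
BY_ID = "id"
BY_CSS = "css selector"


def search_and_scroll(bro, url, keyword):
    """Open url, search for keyword, scroll to the bottom, return the HTML.

    bro is an existing WebDriver instance; the element id 'q' and the
    class '.btn-search' are the ones used for taobao.com in the text above.
    """
    bro.get(url)                                      # send the request
    search_input = bro.find_element(BY_ID, "q")       # locate the search box
    search_input.send_keys(keyword)                   # type into it
    bro.find_element(BY_CSS, ".btn-search").click()   # click the search button
    # Run JavaScript in the page to scroll to the bottom
    bro.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    return bro.page_source                            # HTML of the current page
```

With a real browser this would be called as html = search_and_scroll(bro, 'https://www.taobao.com/', 'Iphone'), followed by bro.quit().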

selenium handling of iframes

If the tag you want to locate sits inside an iframe tag, you must first call switch_to.frame(id)

Action chains

Import: from selenium.webdriver import ActionChains

  • Instantiate an action chain object: action = ActionChains(bro)
  • click_and_hold(div): click and hold on the given tag
  • move_by_offset(x, y): move by x pixels horizontally and y pixels vertically
  • perform(): execute the queued actions immediately
  • action.release(): release the held mouse button (queued until perform() is called)

Example: iframe + action chain

from selenium import webdriver
from time import sleep
from selenium.webdriver import ActionChains     # import the action chain class

bro = webdriver.Chrome(executable_path='./chromedriver.exe')
bro.get('https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable')

# A tag inside an iframe can only be located after switching into that iframe
bro.switch_to.frame('iframeResult')  # switch the scope used for locating tags
div = bro.find_element_by_id('draggable')

# Action chain
action = ActionChains(bro)
# Click and hold the given tag
action.click_and_hold(div)

for i in range(5):
    # perform() executes the queued actions immediately
    # move_by_offset(x, y): x is horizontal, y is vertical
    action.move_by_offset(17,0).perform()
    sleep(0.5)

# Release the mouse button; release() is only queued, so perform() is needed to run it
action.release().perform()
bro.quit()

Headless browser (no visual interface)

Basic use 1: no visual interface

# Older style
from selenium import webdriver
from time import sleep
from selenium.webdriver.chrome.options import Options
# Run without a visible browser window
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
bro = webdriver.Chrome(executable_path='chromedriver.exe',chrome_options=chrome_options)


# Newer style: the chrome_options parameter is deprecated
from selenium import webdriver
from time import sleep
from selenium.webdriver import ChromeOptions

option = ChromeOptions()
# Run without a visible browser window
option.add_argument('--headless')
option.add_argument('--disable-gpu')
bro = webdriver.Chrome(executable_path='chromedriver.exe', options=option)

Basic use 2: evading detection (countering anti-crawling measures)

from selenium import webdriver
from time import sleep
from selenium.webdriver import ChromeOptions

option = ChromeOptions()
# Hide the automation switch so the site is less likely to detect selenium
option.add_experimental_option('excludeSwitches', ['enable-automation'])
bro = webdriver.Chrome(executable_path='chromedriver.exe', options=option)

Basic use of Chaojiying ("Super Eagles")

Chaojiying is an online CAPTCHA-recognition platform

# The following is the sample code provided by Chaojiying
import requests
from hashlib import md5

class Chaojiying_Client(object):

    def __init__(self, username, password, soft_id):
        self.username = username
        password =  password.encode('utf8')
        self.password = md5(password).hexdigest()
        self.soft_id = soft_id
        self.base_params = {
            'user': self.username,
            'pass2': self.password,
            'softid': self.soft_id,
        }
        self.headers = {
            'Connection': 'Keep-Alive',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
        }

    def PostPic(self, im, codetype):
        """
        im: image bytes
        codetype: CAPTCHA type, see http://www.chaojiying.com/price.html
        """
        params = {
            'codetype': codetype,
        }
        params.update(self.base_params)
        files = {'userfile': ('ccc.jpg', im)}
        r = requests.post('http://upload.chaojiying.net/Upload/Processing.php', data=params, files=files, headers=self.headers)
        return r.json()

    def ReportError(self, im_id):
        """
        im_id: ID of the image whose recognition result is being reported as wrong
        """
        params = {
            'id': im_id,
        }
        params.update(self.base_params)
        r = requests.post('http://upload.chaojiying.net/Upload/ReportError.php', data=params, headers=self.headers)
        return r.json()


chaojiying = Chaojiying_Client('bobo328410948', 'bobo328410948', '899370')
im = open('12306.jpg', 'rb').read()
print(chaojiying.PostPic(im, 9004)['pic_str'])
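
The last line of the sample indexes ['pic_str'] directly, which assumes that recognition succeeded. A hedged sketch of a more defensive wrapper follows. The helper name extract_pic_str is hypothetical, and the response fields err_no, err_str and pic_id are assumed from typical Chaojiying responses rather than shown in the sample code, so check them against the platform's documentation.

```python
def extract_pic_str(result):
    """Return (pic_id, pic_str) from a PostPic() result dict.

    Assumes the response carries 'err_no' (0 on success), 'err_str',
    'pic_id' and 'pic_str'; these field names are an assumption.
    """
    err_no = result.get("err_no")
    if err_no != 0:
        raise RuntimeError(
            "recognition failed: err_no=%r, err_str=%r"
            % (err_no, result.get("err_str"))
        )
    return result.get("pic_id"), result["pic_str"]


# Example with a hand-written dict shaped like the assumed response.
sample = {"err_no": 0, "err_str": "OK", "pic_id": "123", "pic_str": "253,83"}
pic_id, pic_str = extract_pic_str(sample)
print(pic_id, pic_str)
```

Keeping pic_id around is useful because ReportError(im_id) takes that ID when a result turns out to be wrong.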

12306 simulated login

Use Chaojiying to recognize the 12306 CAPTCHA online

Chaojiying: http://www.chaojiying.com/about.html

Chaojiying usage workflow

  • Register: as an ordinary user
  • Log in: as an ordinary user
  • Check your credit balance: top up if needed
  • Create a software entry (to obtain a software id)
  • Download the sample code

Coding workflow for the 12306 simulated login

  • Open the login page with selenium
  • Take a screenshot of the page that selenium currently has open
  • Crop the CAPTCHA region out of that screenshot
    • Benefit: the cropped CAPTCHA is exactly the one shown in the login session being simulated, so the two correspond one to one
  • Use Chaojiying to recognize the CAPTCHA image (it returns coordinates)
  • Use an action chain to click at those coordinates
  • Enter the username and password, then click the login button to log in
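
Two of the steps above are plain arithmetic and string handling, so they can be sketched without a browser. The helper names crop_box and parse_points are hypothetical. The sketch assumes that location and size are the dicts returned by a selenium element's location and size properties, and that Chaojiying returns the click coordinates as a string such as 'x1,y1|x2,y2', measured from the top-left corner of the cropped image (an assumption about the format for this CAPTCHA type).

```python
def crop_box(location, size):
    """Return (left, top, right, bottom) of the CAPTCHA inside the screenshot.

    location: {'x': ..., 'y': ...}, size: {'width': ..., 'height': ...},
    as reported by a selenium WebElement.
    """
    left, top = int(location["x"]), int(location["y"])
    return (left, top, left + int(size["width"]), top + int(size["height"]))


def parse_points(pic_str):
    """Turn 'x1,y1|x2,y2' (assumed format) into [(x1, y1), (x2, y2)]."""
    points = []
    for pair in pic_str.split("|"):
        x, y = pair.split(",")
        points.append((int(x), int(y)))
    return points


box = crop_box({"x": 100, "y": 200}, {"width": 300, "height": 190})
print(box)                              # (100, 200, 400, 390)
print(parse_points("253,83|153,147"))   # [(253, 83), (153, 147)]
```

The box tuple follows the (left, upper, right, lower) order that Pillow's Image.crop expects. On a display with scaling enabled the screenshot may be larger than the page coordinates, so the box may need to be multiplied by the scale factor. Each parsed point can then be clicked with an action chain, for example via move_to_element_with_offset on the CAPTCHA element; how that offset is measured may differ between Selenium versions, so verify it before relying on it.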

Origin www.cnblogs.com/liuxu2019/p/12112698.html