Pyppeteer library, part four: Pyppeteer page operations (second half)

Execute custom JS script

The Pyppeteer Page object provides a family of evaluate methods for running custom JS code in the page. There are three main APIs:

(1) page.evaluate(pageFunction[, ...args]) returns the result of executing pageFunction. pageFunction is the function or expression to run in the page, and args are the arguments passed to pageFunction.

Example:

await page.goto('https://www.baidu.com')
# Run an expression in the page (shows an alert dialog)
await page.evaluate('alert("Running a JS script in the browser!")')
# Pass an element as an argument to page.evaluate
element = await page.J('#ul>a[name="tj_trtieba"]')
print(await page.evaluate('el => el.innerHTML', element))
print(await page.evaluate('el => el.href', element))
# Execute a function
el = await page.evaluate('() => document.querySelector("#su").value')
print(el)

(2) page.evaluateHandle(pageFunction[, ...args]). The only difference from page.evaluate is that this method returns an in-page object handle (JSHandle) instead of the evaluated value.

Example:

await page.goto('https://www.baidu.com')
el = await page.evaluateHandle('() => document.querySelector("#su").value')
print(type(el))
print(el.toString())

(3) page.evaluateOnNewDocument(pageFunction[, ...args]). The function is invoked whenever a new document is created, before any of the page's own scripts run. It is often used to modify the page's JS environment.

The following example injects JS ahead of the page's scripts to override navigator.webdriver, a property that Taobao's JS checks to detect an automated browser:

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch({
        'headless': False,
        'args': ['--no-sandbox', '--window-size=1366,768']
    })
    page = await browser.newPage()
    await page.setViewport({'width': 1366, 'height': 768})
    await page.evaluateOnNewDocument('''() => {
        Object.defineProperty(navigator, 'webdriver', {get: () => false });
    }''')
    await page.goto('https://login.taobao.com')
    await page.evaluate('alert(navigator.webdriver)')
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

Element operations

ElementHandle represents a DOM element in the page; you can create one with the page.querySelector() method. ElementHandle shares some methods with Page: J(), JJ(), Jeval(), JJeval(), screenshot(), type(), click(), and tap(). It also has some useful methods of its own:

(1) Get the element's bounding box: boundingBox() returns the bounding box of the element (relative to the main frame) as x coordinate, y coordinate, width, and height

(2) Check whether the element is visible in the viewport: isIntersectingViewport()

(3) Upload file: uploadFile(*filepaths)

(4) Convert an ElementHandle to a Frame: contentFrame() returns None if the handle does not reference an iframe.

(5) Focus the element: focus()

(6) Mouse: hover() moves the mouse over the element

(7) Keyboard: press(key[, options]), where key is the name of the key. options can contain:

text (string): if specified, input events are generated with this text
delay (number): time to wait between keydown and keyup, in milliseconds; defaults to 0
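
The bounding box from item (1) is a dict with x, y, width, and height keys, so the center of an element can be computed from it before clicking or hovering. The following is a minimal sketch; box_center and the sample box are hypothetical names and values used only for illustration:

```python
def box_center(box):
    """Return the (x, y) center of a bounding-box dict."""
    return (box['x'] + box['width'] / 2, box['y'] + box['height'] / 2)

# Hypothetical box, shaped like the dict that boundingBox() returns.
example_box = {'x': 100, 'y': 40, 'width': 60, 'height': 20}
print(box_center(example_box))  # (130.0, 50.0)
```

With a real page, the dict would come from `await el.boundingBox()` and the result could be passed to page.mouse.click.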

Mouse events

The Mouse class operates in main-frame CSS pixels, relative to the top-left corner of the viewport.

(1) page.mouse.down([options]) presses a mouse button. options can contain:

button (str): which button to press, one of left, right, or middle; defaults to left
clickCount (int): the number of clicks, for example 1 for a single click or 2 for a double click

(2) page.mouse.up([options]) releases the mouse button; the options are the same as above

(3) page.mouse.move(x, y[, options]) moves the mouse to the given position; options.steps is the number of intermediate mousemove events sent along the way

(4) page.mouse.click(x, y[, options]) clicks at the given position; it is a shortcut for mouse.move, mouse.down, and mouse.up
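
To make the relationship between the four calls concrete, here is a rough sketch of a click built from the three primitives. RecordingMouse is a hypothetical stand-in for page.mouse that only records calls, so the snippet runs without a browser; click_with_primitives is likewise an illustrative name, not part of Pyppeteer:

```python
import asyncio

class RecordingMouse:
    """Hypothetical stand-in for page.mouse; it only records the calls."""
    def __init__(self):
        self.calls = []

    async def move(self, x, y, options=None):
        self.calls.append(('move', x, y))

    async def down(self, options=None):
        self.calls.append(('down',))

    async def up(self, options=None):
        self.calls.append(('up',))

async def click_with_primitives(mouse, x, y, delay=0.0):
    """Move to (x, y), press, optionally wait `delay` seconds, release."""
    await mouse.move(x, y)
    await mouse.down()
    if delay:
        await asyncio.sleep(delay)
    await mouse.up()

mouse = RecordingMouse()
asyncio.run(click_with_primitives(mouse, 10, 20))
print(mouse.calls)  # [('move', 10, 20), ('down',), ('up',)]
```

Passing the real page.mouse instead of the stub should produce the same sequence of events in the browser.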

Handling CAPTCHAs during simulated login

Methods that may be useful:

ElementHandle.boundingBox(), ElementHandle.hover()
mouse.down(), mouse.move(), mouse.up(), mouse.click()

Example 1: Dragging the slider for the Taobao CAPTCHA

(1) Taobao's CAPTCHA module checks the browser environment, so JS has to be injected to mask the automation traces;

(2) Simulate a real user as closely as possible; random delays slow down Pyppeteer's execution speed.

Example:

import asyncio
import random
from pyppeteer import launch

async def main():
    browser = await launch({
        'headless': False,
        'args': ['--no-sandbox', '--window-size=1366,768']
    })
    page = await browser.newPage()
    await page.setViewport({'width': 1366, 'height': 768})
    await page.evaluateOnNewDocument('''() => {
        Object.defineProperties(navigator, { webdriver: { get: () => false } });
    }''')
    await page.evaluateOnNewDocument('''() => {
        window.navigator.chrome = { runtime: {}, };
    }''')
    await page.evaluateOnNewDocument('''() => {
        Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
    }''')
    await page.evaluateOnNewDocument('''() => {
        Object.defineProperty(navigator, 'plugins', { get: () => [1, 2, 3, 4, 5,6], }); 
    }''')
    await page.goto('https://login.taobao.com')
    await asyncio.sleep(2)
    try:
        await page.click('div.login-links > a.forget-pwd.J_Quick2Static')
    except Exception:
        pass
    await asyncio.sleep(2)
    await page.type('#TPL_username_1', '123123123', {'delay': random.randint(60, 121)})
    await page.type('#TPL_password_1', '1234567890', {'delay': random.randint(100, 151)})
    await asyncio.sleep(1.5)
    try:
        el = await page.querySelector('#nc_1_n1z')
        box = await el.boundingBox()
        await page.hover('#nc_1_n1z')
        await page.mouse.down()
        await page.mouse.move(box['x'] + random.randint(333, 999), box['y'], {'steps': 5})
        await page.mouse.up()
    except Exception:
        pass
    await asyncio.sleep(1.8)
    await page.click('#J_SubmitStatic')
    await asyncio.sleep(5)
    await browser.close()


asyncio.get_event_loop().run_until_complete(main())
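
The example above drags the slider with a single mouse.move call and steps set to 5. One way to make the movement less uniform is to split the distance into uneven increments and issue one move per increment. The sketch below only generates the positions; drag_positions is a hypothetical helper, and nothing here is known to be sufficient to pass Taobao's checks:

```python
import random

def drag_positions(distance, steps, rng=random):
    """Return `steps` increasing x offsets that end exactly at `distance`."""
    # Random weights make some increments longer than others.
    weights = [rng.uniform(0.5, 1.5) for _ in range(steps)]
    total = sum(weights)
    positions, travelled = [], 0.0
    for weight in weights:
        travelled += weight
        positions.append(round(distance * travelled / total))
    positions[-1] = distance  # guard against floating-point rounding
    return positions

track = drag_positions(300, 10, random.Random(0))
print(track)
```

Each offset could then be used as `await page.mouse.move(box['x'] + offset, box['y'])` between mouse.down() and mouse.up(), with a short random sleep between moves.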

Example 2: The 12306 railway click CAPTCHA

(1) Analyze the 12306 CAPTCHA. It is one image containing a grid of small pictures, and the mouse can click the center point of each picture. The offsets of these centers from the top-left corner of the image can be calculated:

x offsets (columns): 37, 37 * 3, 37 * 5, 37 * 7, i.e. 37, 111, 185, 259
y offset (row 0): 70
y offset (row 1): 70 + (190 - 30) / 2, i.e. 150

If the top-left corner of the CAPTCHA image is at (x, y), the centers of the second and seventh pictures are (x+111, y+70) and (x+185, y+150).
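
The same arithmetic can be wrapped in a small helper. This is a sketch that assumes the pictures are numbered from 1, left to right and top to bottom, in four columns and two rows (which is what the offsets above imply); picture_center is a hypothetical name:

```python
def picture_center(index, cols=4, first_x=37, col_step=74, row_ys=(70, 150)):
    """Offset of picture `index` (1-based, row-major) from the image corner."""
    row, col = divmod(index - 1, cols)
    return (first_x + col * col_step, row_ys[row])

for i in (2, 7):
    print(i, picture_center(i))
# 2 (111, 70)
# 7 (185, 150)
```

Adding the result to box['x'] and box['y'] from boundingBox() gives the viewport coordinates to click.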

Example:

import asyncio
import random
from pyppeteer import launch

async def main():
    browser = await launch({
        'headless': False,
        'args': ['--window-size=1366,768', '--no-sandbox']
    })
    page = await browser.newPage()
    await page.goto('https://kyfw.12306.cn/otn/login/init',
                    {'waitUntil': 'networkidle0'})
    await page.setViewport({'width': 1366, 'height': 768})
    # Wait for the CAPTCHA image to load
    code = await page.waitForFunction(
        '''() => document.querySelector("img.touclick-image")''')
    # Take a screenshot of the CAPTCHA
    await code.screenshot({'path': 'code.png'})
    # Get the CAPTCHA coordinates
    box = await code.boundingBox()
    await page.waitFor(2 * 1000)
    # Click the 2nd picture
    await page.mouse.click(box['x']+111, box['y']+70)
    await page.waitFor(random.randint(567, 3456))
    # Click the 7th picture
    await page.mouse.click(box['x']+185, box['y'] + 150)
    await page.waitFor(3 * 1000)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

(2) CAPTCHA-solving platform for 12306

Recognizing this CAPTCHA automatically is hard (it is rather user-hostile), so handing it to a CAPTCHA-solving platform is a better choice. The principle is to send the platform the CAPTCHA image as bytes and get back a string of click coordinates, for example 183,68|193,161.

Chaojiying (Super Eagle) CAPTCHA-solving platform API:

chaojiying.py

#!/usr/bin/env python
# coding:utf-8

import requests
from hashlib import md5

class CodeInfo(object):

    def __init__(self):
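        self.username = 'username'  # replace with your account name
        self.password = md5('password'.encode('utf8')).hexdigest()  # replace with your password
        self.soft_id = '96001' # User center >> Software ID: generate one and replace 96001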
        self.base_params = {
            'user': self.username,
            'pass2': self.password,
            'softid': self.soft_id,
        }
        self.headers = {
            'Connection': 'Keep-Alive',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
        }

    def process(self, im, codetype):
        url = 'http://upload.chaojiying.net/Upload/Processing.php'
        params = {'codetype': codetype}
        params.update(self.base_params)
        files = {'userfile': ('ccc.jpg', im)}
        r = requests.post(url, data=params, files=files, headers=self.headers)
        return r.json()

    def report(self, im_id):
        """
        im_id: image ID of the CAPTCHA whose answer is being reported as wrong
        """
        url = 'http://upload.chaojiying.net/Upload/ReportError.php'
        params = {'id': im_id}
        params.update(self.base_params)
        r = requests.post(url, data=params, headers=self.headers)
        return r.json()

if __name__ == '__main__':
    im = open('code.png', 'rb').read()
    """
    9004 is the CAPTCHA type
    See http://www.chaojiying.com/price.html
    """
    answer = CodeInfo().process(im, 9004)
    print(answer)

Example: logging in to 12306

import asyncio
import random
from pyppeteer import launch
from chaojiying import CodeInfo

def pic_info():
    im = open('code.png', 'rb').read()
    answer = CodeInfo().process(im, 9004)
    print(answer)
    return answer['pic_str']

async def main():
    browser = await launch({
        'headless': False,
        'args': ['--window-size=1366,768', '--no-sandbox']
    })
    page = await browser.newPage()
    await page.goto('https://kyfw.12306.cn/otn/login/init',
                    {'waitUntil': 'networkidle0'})
    await page.setViewport({'width': 1366, 'height': 768})
    code = await page.waitForFunction(
        '''() => document.querySelector("img.touclick-image")''')
    await code.screenshot({'path': 'code.png'})
    await page.waitFor(2 * 1000)
    await page.type('#username', '[email protected]',
                    {'delay': random.randint(60, 121)})
    await page.waitFor(random.randint(345, 1234))
    await page.type('#password', '1234567890',
                    {'delay': random.randint(100, 151)})
    pic_str = pic_info()
    points = list(set(pic_str.split('|')))
    box = await code.boundingBox()
    for point in points:
        p = point.split(',')
        await page.mouse.click(box['x']+int(p[0]), box['y']+int(p[1]))
        await page.waitFor(random.randint(567, 3456))
    await page.click('#loginSub')
    await page.waitFor(5 * 1000)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())
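
The login example above splits pic_str on '|' and de-duplicates with set(), which does not preserve the order of the points. If click order matters, the string can be parsed into integer pairs while keeping the order. A minimal sketch, assuming the 'x,y|x,y' format shown earlier; parse_points is a hypothetical helper:

```python
def parse_points(pic_str):
    """Parse 'x,y|x,y' into a list of unique (x, y) tuples, in order."""
    points = []
    for part in pic_str.split('|'):
        part = part.strip()
        if not part:
            continue  # skip empty segments such as a trailing '|'
        x, y = part.split(',')
        point = (int(x), int(y))
        if point not in points:
            points.append(point)
    return points

print(parse_points('183,68|193,161'))  # [(183, 68), (193, 161)]
```

Each tuple can then be added to box['x'] and box['y'] before calling page.mouse.click.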

Keyboard events

Keyboard provides an interface for managing a virtual keyboard. The high-level interface is keyboard.type, which takes raw characters and generates the corresponding keydown, keypress/input, and keyup events on the page.

For finer control, you can use keyboard.down, keyboard.up, and keyboard.sendCharacter to trigger events manually, as if they came from a real keyboard.

The main keyboard APIs are:

keyboard.down(key[, options]) triggers a keydown event
keyboard.press(key[, options]) presses a key; key is the name of the key, such as 'ArrowLeft' for the left arrow
keyboard.sendCharacter(char) inputs a character
keyboard.type(text[, options]) inputs a string
keyboard.up(key) triggers a keyup event

Example: hold Shift to select part of a string, then delete it:

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch({'headless': False})
    page = await browser.newPage()
    await page.goto('https://www.baidu.com', {'waitUntil': 'networkidle0'})
    el = await page.J('#kw')
    await el.focus()
    await page.keyboard.type('Hello, World!')
    await page.keyboard.press('ArrowLeft')
    await page.keyboard.down('Shift')
    for _ in ' World':
        await page.keyboard.press('ArrowLeft')
    await page.keyboard.press('ArrowLeft')
    await page.keyboard.up('Shift')
    await page.keyboard.press('Backspace')
    # The resulting string is 'Hello!'
    await asyncio.sleep(5)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())
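
To check the key sequence above without a browser, the cursor and selection can be traced with a toy model of a single-line text field. This is only an illustration of the arithmetic, not how the browser implements editing; run_keys and the event tuples are hypothetical:

```python
def run_keys(events):
    """Apply (action, value) events to a toy single-line text field."""
    text, cursor, anchor, shift = '', 0, None, False
    for action, value in events:
        if action == 'type':
            text = text[:cursor] + value + text[cursor:]
            cursor += len(value)
        elif action in ('down', 'up') and value == 'Shift':
            shift = (action == 'down')
        elif action == 'press' and value == 'ArrowLeft':
            if shift:
                if anchor is None:
                    anchor = cursor  # the selection starts here
            else:
                anchor = None
            cursor = max(0, cursor - 1)
        elif action == 'press' and value == 'Backspace':
            if anchor is not None and anchor != cursor:
                lo, hi = sorted((anchor, cursor))
                text, cursor = text[:lo] + text[hi:], lo
            elif cursor > 0:
                text, cursor = text[:cursor - 1] + text[cursor:], cursor - 1
            anchor = None
    return text

events = [('type', 'Hello, World!'), ('press', 'ArrowLeft'), ('down', 'Shift')]
events += [('press', 'ArrowLeft')] * 7  # six for ' World' plus one for ','
events += [('up', 'Shift'), ('press', 'Backspace')]
print(run_keys(events))  # Hello!
```

The first ArrowLeft moves the cursor in front of '!', the seven shifted presses select ', World', and Backspace removes the selection.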

Example: typing an uppercase A:

await page.keyboard.down('Shift')
await page.keyboard.press('KeyA')
await page.keyboard.up('Shift')

The detailed key name mapping can be found in the source code:

Lib\site-packages\pyppeteer\us_keyboard_layout.py

Embedded frames

A Frame can be obtained through Page.frames or ElementHandle.contentFrame(), and it shares many methods with Page.

Other attributes and methods:

childFrames: returns the child frames as a list
parentFrame: returns the parent frame
content(): returns the HTML content of the frame
url: returns the URL
name: returns the name
title(): returns the title

Example:

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch({'headless': False})
    page = await browser.newPage()
    await page.goto('http://www.4399.com', {'waitUntil': 'networkidle0'})
    await page.click('#login_tologin')
    await asyncio.sleep(1)
    frame = page.frames[1]
    await frame.type('#username', '123456789')
    await frame.type('#j-password', '998765433')
    await asyncio.sleep(5)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

Or:

await page.click('#login_tologin')
await asyncio.sleep(1)
element = await page.J('iframe')
frame = await element.contentFrame()
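
Indexing page.frames by position depends on the order in which frames were attached. A somewhat more robust alternative is to pick the frame by its url attribute. The sketch below uses SimpleNamespace objects as hypothetical stand-ins for Frame objects so that it runs without a browser; find_frame and the sample URLs are illustrative only:

```python
from types import SimpleNamespace

def find_frame(frames, url_part):
    """Return the first frame whose url contains url_part, else None."""
    for frame in frames:
        if url_part in frame.url:
            return frame
    return None

# Hypothetical stand-ins for the Frame objects in page.frames.
frames = [
    SimpleNamespace(url='https://example.com/', name=''),
    SimpleNamespace(url='https://example.com/login-frame', name='login'),
]
match = find_frame(frames, 'login-frame')
print(match.name)  # login
```

With a real page, the call would be `find_frame(page.frames, ...)` with a fragment of the login iframe's actual URL.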
