Pyppeteer Chinese documentation

Introduction

Pyppeteer is an unofficial Python port of Puppeteer, the JavaScript library for automating (headless) Chrome/Chromium. Puppeteer runs on Node.js; Pyppeteer offers the same functionality for Python.
This document corresponds to Pyppeteer v0.0.25. Pyppeteer has not been updated for a long time, but it still works fine for crawlers and automated tests that are not very demanding.
This article covers installation and some notes on basic usage. Subsequent articles will introduce each API class in turn. Pyppeteer currently supports Python 3.5, 3.6, and 3.7, but using 3.5 is not recommended; the best environment is Python 3.6+.

Install

With Python 3.6+ installed, open a command prompt. On Windows, for example, press Win+R, type cmd, and press Enter.
Enter: pip install pyppeteer and wait for the installation to complete.
If you need the latest (development) version of Pyppeteer, you can install it from its GitHub repository with pip.
Enter: pip install -U git+https://github.com/miyakogi/pyppeteer.git@dev and wait for the installation to complete.

Usage

The first time you run Pyppeteer, it automatically downloads a recent version of Chromium (~170MB on Mac, ~282MB on Linux, ~280MB on Windows). If you would rather not have the download happen on first run, run the pyppeteer-install command manually before running any Pyppeteer script to fetch Chromium ahead of time. (In practice this is rarely necessary; under normal circumstances the automatic download is used.)
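
The same pre-download can also be triggered from Python. The following is a minimal sketch, assuming the pyppeteer.chromium_downloader module shipped with v0.0.25 exposes check_chromium(), download_chromium(), and chromium_executable(); the ensure_chromium name is a hypothetical helper introduced here for illustration.

```python
def ensure_chromium():
    """Download Chromium ahead of time if Pyppeteer has not fetched it yet.

    Hypothetical helper; returns the path of the Chromium executable.
    """
    # Imported inside the function so that defining the helper
    # does not itself require Pyppeteer to be importable.
    from pyppeteer.chromium_downloader import (
        check_chromium,
        chromium_executable,
        download_chromium,
    )

    if not check_chromium():   # True when the executable already exists
        download_chromium()    # same download the first launch() would trigger
    return chromium_executable()
```

Calling ensure_chromium() once, for example in a setup script, should mean that later launch() calls start without waiting for the download.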

Example: Open the page and take a screenshot

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('http://example.com')
    await page.screenshot({'path': 'example.png'})
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

Example: Execute script in page

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('http://example.com')
    await page.screenshot({'path': 'example.png'})

    dimensions = await page.evaluate('''() => {
        return {
            width: document.documentElement.clientWidth,
            height: document.documentElement.clientHeight,
            deviceScaleFactor: window.devicePixelRatio,
        }
    }''')

    print(dimensions)
    #>>> {'width': 800, 'height': 600, 'deviceScaleFactor': 1}
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

Pyppeteer has almost the same API as Puppeteer. For more API details, please refer to subsequent articles.

Differences between Pyppeteer and Puppeteer

Pyppeteer is similar to Puppeteer. Because Python and JavaScript differ in syntax, language features, and typical areas of use, there are slight differences in usage and performance, but these do not affect the results.

Keyword arguments for options

Puppeteer passes options to methods and functions as objects (the Python equivalent is a dictionary). Pyppeteer accepts options both as a dictionary and as keyword arguments.

Dictionary style options (similar to Puppeteer)

browser = await launch({'headless': True})

Keyword argument style options

browser = await launch(headless=True)
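
To see how one function can accept both styles, here is a minimal pure-Python sketch. The names merge_options and fake_launch are hypothetical and are not part of Pyppeteer; the rule that a keyword argument overrides a dictionary entry with the same key is an assumption of this sketch, not something taken from the Pyppeteer documentation.

```python
def merge_options(options=None, **kwargs):
    """Combine a dictionary of options with keyword arguments.

    Hypothetical helper for illustration only. Keyword arguments
    override dictionary entries with the same key (an assumption).
    """
    merged = dict(options or {})  # copy, so the caller's dict is not mutated
    merged.update(kwargs)
    return merged


def fake_launch(options=None, **kwargs):
    """Stand-in for a launch-like function; it only returns the resolved options."""
    return merge_options(options, **kwargs)


print(fake_launch({'headless': True}))                  # dictionary style
print(fake_launch(headless=True))                       # keyword style
print(fake_launch({'headless': True}, headless=False))  # both styles combined
```

The first two calls resolve to the same options, which is why the two launch() styles above can be used interchangeably.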

Element selector method name ($->querySelector)

In Python, $ cannot be used in method names, so Pyppeteer uses Page.querySelector(), Page.querySelectorAll(), and Page.xpath() in place of Puppeteer's Page.$(), Page.$$(), and Page.$x(). Pyppeteer also provides shorthand aliases for these methods: Page.J(), Page.JJ(), and Page.Jx().
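
The renamed methods and their shorthands can be tried side by side. This is a sketch rather than a reference example: selector_demo is a hypothetical name, the page and selectors are only illustrative, and running it requires Pyppeteer plus a downloaded Chromium.

```python
import asyncio


async def selector_demo():
    # Imported here so the function can be defined without launching anything.
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        await page.goto('http://example.com')

        heading = await page.querySelector('h1')       # Puppeteer: page.$('h1')
        paragraphs = await page.querySelectorAll('p')  # Puppeteer: page.$$('p')
        links = await page.xpath('//a')                # Puppeteer: page.$x('//a')

        # Shorthand aliases for the three methods above.
        heading_short = await page.J('h1')
        paragraphs_short = await page.JJ('p')
        links_short = await page.Jx('//a')

        text = await page.evaluate('(el) => el.textContent', heading)
        return {
            'heading': text,
            'paragraphs': len(paragraphs),
            'links': len(links),
            'shorthand_counts_match': (
                heading_short is not None
                and len(paragraphs_short) == len(paragraphs)
                and len(links_short) == len(links)
            ),
        }
    finally:
        await browser.close()
```

Run it the same way as the earlier examples: asyncio.get_event_loop().run_until_complete(selector_demo()).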

Parameters of Page.evaluate() and Page.querySelectorEval()

Puppeteer's evaluate() accepts either a native JavaScript function or a string containing a JavaScript expression, whereas Pyppeteer accepts only a JavaScript string. That string can be either a function or an expression. Pyppeteer tries to detect automatically whether the string is a function or an expression, but the detection sometimes fails. If an expression string is mistakenly treated as a function and an error is raised, add the force_expr=True argument, which forces Pyppeteer to treat the string as an expression.

Example: Get page content

content = await page.evaluate('document.body.textContent', force_expr=True)
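
To illustrate why automatic detection can go wrong, here is a deliberately simplified heuristic. It is an assumption made for illustration and is not Pyppeteer's actual check; looks_like_function and classify are hypothetical names. It shows how an expression that merely contains an arrow function could be mistaken for a function, and how a force_expr flag sidesteps the guess.

```python
import re

# Simplified heuristic (illustration only, not Pyppeteer's real logic):
# a string "looks like" a function if it starts with `function` or
# `async function`, or if it contains an arrow `=>` anywhere.
_FUNCTION_HINT = re.compile(r'^\s*(async\s+)?function\b|=>')


def looks_like_function(js_source):
    return bool(_FUNCTION_HINT.search(js_source))


def classify(js_source, force_expr=False):
    """Return 'function' or 'expression' for a JavaScript string."""
    if force_expr:
        return 'expression'  # skip the guess entirely
    return 'function' if looks_like_function(js_source) else 'expression'


print(classify('document.body.textContent'))
print(classify('() => document.title'))
# An expression that contains an arrow function fools the heuristic:
print(classify('[1, 2, 3].map(x => x * 2)'))
print(classify('[1, 2, 3].map(x => x * 2)', force_expr=True))
```

Under this heuristic the third call is classified as a function even though it is an expression, which is the kind of case where passing force_expr=True helps.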

Example: Get the inner text of an element

element = await page.querySelector('h1')
title = await page.evaluate('(element) => element.textContent', element)

Origin blog.csdn.net/chy555chy/article/details/133172305