requests_html module (part 2)
render method:
1. Install Chromium manually, then point the program at it by setting executablePath.
# Add the following at line 714 of the requests-html source code
executablePath='path/to/the/chromium'
2. Render the page:
from requests_html import HTMLSession
url = 'https://httpbin.org/get'
session = HTMLSession()
res = session.get(url = url)
res.html.render()
print(res.html.html)
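3. Type navigator.userAgent in the browser console to see the browser's User-Agent string, then copy it after --user-agent=.
Note that there must be no spaces around the equals sign. --no-sandbox disables the Chromium sandbox, which is needed when the browser runs with the highest privileges (for example as root).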
from requests_html import HTMLSession

url = 'https://httpbin.org/get'
session = HTMLSession(
    browser_args=[
        '--no-sandbox',
        '--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36"',
    ]
)
res = session.get(url=url)
res.html.render()
print(res.html.html)
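The flag list in step 3 is easy to mistype, so it can help to assemble it in one place. The following is a minimal sketch: `build_browser_args` is a hypothetical helper, not part of requests_html, and it assumes the list reaches Chromium without passing through a shell, so it adds no extra quoting around the User-Agent value.

```python
def build_browser_args(user_agent, no_sandbox=True):
    """Return a list of Chromium flags suitable for HTMLSession(browser_args=...).

    Hypothetical helper for illustration only.
    """
    args = []
    if no_sandbox:
        # Disable the Chromium sandbox (often needed when running as root).
        args.append('--no-sandbox')
    # No spaces around '=': the flag and its value must form a single argument.
    args.append('--user-agent=' + user_agent.strip())
    return args
```

The result would then be passed as `HTMLSession(browser_args=build_browser_args(ua))`, where `ua` is the string copied from navigator.userAgent.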
Startup parameters:
kwargs = {
    'headless': False,
    'devtools': False,  # whether to open the developer tools
    'ignoreDefaultArgs': False,  # whether to ignore the default arguments
    'userDataDir': './userdata',  # user data directory; cookies are saved here
    'args': [
        '--disable-extensions',
        '--window-size={width},{height}',
        '--hide-scrollbars',
        '--disable-bundled-ppapi-flash',
        '--mute-audio',  # mute the page
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-gpu',
        '--enable-automation',
    ],
    'dumpio': True,
}
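The `--window-size={width},{height}` entry above is a template, so the placeholders have to be filled in before the browser starts. A minimal sketch of one way to do that follows; `build_launch_kwargs` and its parameter names are hypothetical, and only a few of the flags listed above are included to keep it short.

```python
def build_launch_kwargs(width, height, headless=False, user_data_dir='./userdata'):
    """Build a startup-parameter dict with the window size filled in.

    Hypothetical helper for illustration; the keys mirror the dict above.
    """
    return {
        'headless': headless,
        'userDataDir': user_data_dir,  # cookies are kept in this directory
        'args': [
            # Substitute the real size into the template flag.
            '--window-size={width},{height}'.format(width=width, height=height),
            '--mute-audio',
            '--no-sandbox',
        ],
        'dumpio': True,
    }
```

With pyppeteer itself, a dict like this would typically be passed as `pyppeteer.launch(**build_launch_kwargs(1366, 768))`.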
render parameters:
- retries: number of retries, default 8
- script: JavaScript to execute on the page; optional, str, default None. If given, render() returns the script's return value
- wait: number of seconds to wait before loading the page, to prevent a timeout; optional, float, default 0.2
- scrolldown: number of times to scroll the page down; integer, default 0
- sleep: number of seconds to pause after the initial render; optional, integer, default 0
- reload: default True. If False, the content is loaded from memory instead of from the browser
- keep_page: default False. If True, you can interact with the browser page through r.html.page
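To keep the render defaults in one place, the parameters listed above can be collected in a small container and expanded at the call site. This is only a sketch: `RenderOptions` is a hypothetical class, not something requests_html provides, and its defaults simply repeat the values given in the list.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class RenderOptions:
    """Hypothetical holder for render() keyword arguments."""
    retries: int = 8              # number of retries
    script: Optional[str] = None  # JS to run; its return value is returned by render()
    wait: float = 0.2             # seconds to wait before loading the page
    scrolldown: int = 0           # how many times to scroll down
    sleep: int = 0                # seconds to pause after the initial render
    reload: bool = True           # False loads the content from memory
    keep_page: bool = False       # True keeps the page available for interaction

    def to_kwargs(self):
        """Return the options as a plain dict for ** expansion."""
        return asdict(self)
```

A call would then look like `res.html.render(**RenderOptions(sleep=1, keep_page=True).to_kwargs())`.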
JS injection, example 1:
script = """
() => {
    return {
        width: document.documentElement.clientWidth,
        height: document.documentElement.clientHeight,
        deviceScaleFactor: window.devicePixelRatio,
    }
}
"""
from requests_html import HTMLSession

url = 'https://httpbin.org/get'
session = HTMLSession(
    browser_args=[
        '--no-sandbox',
        '--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36"',
    ]
)
res = session.get(url=url)
# render() returns the value returned by the injected script
r = res.html.render(script=script)
print(r)
JS injection, example 2: change navigator.webdriver:
'''() => {
    Object.defineProperties(navigator, {
        webdriver: {
            get: () => undefined
        }
    })
}'''
Note that this requires modifying the source code.
Interact with the browser
page.screenshot([options])
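- options `<object>`: optional configuration
  - path `<string>`: path where the screenshot is saved. The image type is inferred from the file extension. A relative path is resolved from the current directory. If no path is given, the image is not saved to disk.
  - type `<string>`: screenshot type, either jpeg or png. Default 'png'.
  - quality `<number>`: image quality, 0-100. Does not apply to png.
  - fullPage `<boolean>`: if true, captures the full page, including the parts that need scrolling. Default false.
  - clip `<object>`: the region to clip. Requires:
    - x `<number>`: x coordinate of the clip region, relative to the top-left corner (0, 0)
    - y `<number>`: y coordinate of the clip region, relative to the top-left corner (0, 0)
    - width `<number>`: width of the clip region
    - height `<number>`: height of the clip region
  - omitBackground `<boolean>`: hides the default white background so the background is transparent. Default is opaque.
  - encoding `<string>`: image encoding, either base64 or binary. Default binary.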
Screenshot examples:
import asyncio
from requests_html import HTMLSession

url = 'https://httpbin.org/get'
session = HTMLSession(
    browser_args=[
        '--no-sandbox',
        '--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36"',
    ]
)
res = session.get(url=url)
script = """
() => {
    return {
        width: document.documentElement.clientWidth,
        height: document.documentElement.clientHeight,
        deviceScaleFactor: window.devicePixelRatio,
    }
}
"""
try:
    # keep_page=True keeps the page open so it can be used after rendering
    res.html.render(script=script, sleep=1, keep_page=True)

    async def main():
        # Options are passed as a dict; 'path' is where the screenshot is saved
        await res.html.page.screenshot({'path': '1.png'})

    asyncio.get_event_loop().run_until_complete(main())
finally:
    session.close()
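The screenshot options interact with each other (the type follows the file extension, and quality does not apply to png), so it can be useful to check them before calling page.screenshot. The sketch below encodes only the rules listed in the option descriptions; `screenshot_options` is a hypothetical helper and does not reproduce the library's own validation.

```python
import os


def screenshot_options(path=None, full_page=False, quality=None, clip=None):
    """Build an options dict for page.screenshot() from the documented rules.

    Hypothetical helper for illustration only.
    """
    options = {'fullPage': full_page}
    image_type = 'png'  # documented default
    if path is not None:
        # The image type is inferred from the file extension.
        ext = os.path.splitext(path)[1].lower()
        if ext in ('.jpg', '.jpeg'):
            image_type = 'jpeg'
        options['path'] = path
    options['type'] = image_type
    if quality is not None:
        if image_type == 'png':
            raise ValueError('quality does not apply to png screenshots')
        if not 0 <= quality <= 100:
            raise ValueError('quality must be between 0 and 100')
        options['quality'] = quality
    if clip is not None:
        missing = {'x', 'y', 'width', 'height'} - set(clip)
        if missing:
            raise ValueError('clip is missing keys: %s' % sorted(missing))
        options['clip'] = dict(clip)
    return options
```

Inside a coroutine this would be used as `await page.screenshot(screenshot_options('shot.jpg', quality=80))`.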
page.evaluate(pageFunction[, ...args])
- pageFunction (function / string): the function to run in the page, as in the example above
js1 = '''() => {
    Object.defineProperties(navigator, {
        webdriver: {
            get: () => undefined
        }
    })
}'''
js4 = '''() => {
    Object.defineProperty(navigator, 'languages', {get: () => ['en-US', 'en']});
}'''
await page.evaluate(js1)  # override navigator.webdriver
await page.evaluate(js4)  # override navigator.languages
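Because page.evaluate is a coroutine, the `await` calls above have to run inside an async function. A minimal sketch of wrapping them follows; `apply_scripts` and `OVERRIDE_SCRIPTS` are hypothetical names, and `page` is assumed to be the object exposed as res.html.page after rendering with keep_page=True.

```python
import asyncio

# Scripts that override navigator.webdriver and navigator.languages.
OVERRIDE_SCRIPTS = [
    "() => { Object.defineProperty(navigator, 'webdriver', {get: () => undefined}); }",
    "() => { Object.defineProperty(navigator, 'languages', {get: () => ['en-US', 'en']}); }",
]


async def apply_scripts(page, scripts=OVERRIDE_SCRIPTS):
    """Evaluate each script on the page in order and collect the return values."""
    results = []
    for script in scripts:
        results.append(await page.evaluate(script))
    return results
```

It can be driven the same way as the screenshot example: `asyncio.get_event_loop().run_until_complete(apply_scripts(res.html.page))`.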
page.setViewport(viewport)
Set the page size:
await page.setViewport({'width': 1336, 'height': 768})
page.cookies([...urls])
If no URL is specified, this method returns the cookies for the current page's URL. If URLs are specified, only the cookies for those URLs are returned.
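The cookies come back as a list of records rather than a simple mapping. Assuming each record is a dict that has at least `name` and `value` keys (which is how pyppeteer reports them, as far as I know), a small hypothetical helper can flatten the list:

```python
def cookies_to_dict(cookies):
    """Map cookie names to values from a list of cookie dicts.

    Hypothetical helper; later entries win if a name appears twice.
    """
    return {cookie['name']: cookie['value'] for cookie in cookies}
```

Inside a coroutine this could be called as `cookies_to_dict(await page.cookies())`, and the resulting dict can be reused for ordinary HTTP requests.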
page.type(selector, text[, options])
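- selector `<string>`: selector of the element to type into. If several elements match, the text goes into the first one.
- text `<string>`: the text to type
- options `<object>`
  - delay `<number>`: delay between characters, in milliseconds. Default 0.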
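page.click(selector[, options])
- selector `<string>`: selector of the element to click. If several elements match, the first one is clicked.
page.hover(selector)
- selector `<string>`: selector of the element to hover over. If several elements match, the first one is hovered.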
page.waitFor(selectorOrFunctionOrTimeout[, options[, ...args]])
- selectorOrFunctionOrTimeout `<string|number|function>`: a selector, a function, or a timeout
- options `<object>`: optional wait parameters
- ...args `<...Serializable|JSHandle>`: arguments passed to pageFunction
If selectorOrFunctionOrTimeout is a string, it is treated as a CSS selector or an XPath, depending on whether it starts with '//'. In that case the method is shorthand for page.waitForSelector or page.waitForXPath.
If selectorOrFunctionOrTimeout is a function, it is treated as a predicate, and the method is shorthand for page.waitForFunction().
If selectorOrFunctionOrTimeout is a number, it is treated as a timeout in milliseconds, and the returned Promise resolves after that time.
Otherwise an error is raised.
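The dispatch rules for page.waitFor can be written out as a small pure function. This sketch only mirrors the rules as described in this section, not the library's actual implementation, and `classify_wait_target` is a hypothetical name.

```python
import numbers


def classify_wait_target(target):
    """Return which wait behaviour the described rules select for `target`."""
    if isinstance(target, bool):
        # bool is a subclass of int in Python; do not treat it as a timeout.
        raise TypeError('unsupported wait target: %r' % (target,))
    if isinstance(target, str):
        # Strings starting with '//' are XPath, anything else is a CSS selector.
        return 'waitForXPath' if target.startswith('//') else 'waitForSelector'
    if isinstance(target, numbers.Real):
        return 'timeout'  # milliseconds
    if callable(target):
        return 'waitForFunction'
    raise TypeError('unsupported wait target: %r' % (target,))
```

For example, `classify_wait_target('#login')` selects the selector branch, while `classify_wait_target(500)` is a plain 500 ms wait.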
Keyboard Events
For the full list of key names, see USKeyboardLayout.
Syntax:
res.html.page.keyboard.XXX
keyboard.down(key[, options])
- key <string>: name of the key to press, such as ArrowLeft. For a list of all key names, see USKeyboardLayout.
- options <object>
  - text <string>: if specified, an input event is generated with this text.
keyboard.up(key)
- key <string>: name of the key to release, such as ArrowLeft.
keyboard.press(key[, options])
- key <string>: name of the key to press, such as ArrowLeft.
- options <object>
  - text <string>: if specified, an input event is generated with this text.
  - delay <number>: time to wait between keydown and keyup, in milliseconds. Default 0.
keyboard.type(text, options)
- text <string>: the text to type into the focused element.
- options <object>
  - delay <number>: time to wait between key presses, in milliseconds. Default 0.
await page.keyboard.type('喜欢你啊', {'delay': 100})
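The keyboard calls are usually combined with a click that focuses the target element first. A minimal sketch of that sequence follows, assuming `page` is the object available as res.html.page after rendering with keep_page=True; `fill_and_submit`, the selector, and the choice of the Enter key are illustrative.

```python
import asyncio


async def fill_and_submit(page, selector, text, delay_ms=100):
    """Focus an input, type into it key by key, then press Enter."""
    await page.click(selector)                            # give the element focus
    await page.keyboard.type(text, {'delay': delay_ms})   # delay between key presses, in ms
    await page.keyboard.press('Enter')                    # keydown followed by keyup
```

It can be run with `asyncio.get_event_loop().run_until_complete(fill_and_submit(res.html.page, '#search', 'hello'))`, where `#search` stands for whatever input the target page actually has.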
Mouse Events
r.html.page.mouse.XXX
mouse.click(x, y[, options])
- x <number>
- y <number>
- options <object>
  - button <string>: left, right, or middle. Default left.
  - clickCount <number>: default 1. See UIEvent.detail.
  - delay <number>: time to wait between mousedown and mouseup, in milliseconds. Default 0.
mouse.down([options])
- options <object>
  - button <string>: left, right, or middle. Default left.
  - clickCount <number>: default 1.
mouse.up([options])
- options <object>
  - button <string>: left, right, or middle. Default left.
  - clickCount <number>: default 1.
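mouse.click takes page coordinates rather than a selector, so a common pattern is to compute the centre of an element's bounding box first. The sketch below assumes the box is a dict with x, y, width, and height keys (the shape an element's bounding box is usually reported in); `box_center` and `double_click_center` are hypothetical helpers.

```python
import asyncio


def box_center(box):
    """Return the (x, y) centre of a bounding box dict."""
    return box['x'] + box['width'] / 2, box['y'] + box['height'] / 2


async def double_click_center(page, box, delay_ms=50):
    """Double-click the centre of `box` with the left button."""
    x, y = box_center(box)
    await page.mouse.click(x, y, {
        'button': 'left',
        'clickCount': 2,      # 2 produces a double click
        'delay': delay_ms,    # wait between mousedown and mouseup, in ms
    })
    return x, y
```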