Scraping Web Pages with Splash in Python

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/qq_38038143/article/details/82379051

First, start Splash:

sudo docker run -p 8050:8050 scrapinghub/splash
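
Before sending any scripts, it can help to confirm that the container is actually listening. Below is a minimal sketch, assuming Splash is published on localhost:8050 as in the command above and that its /_ping endpoint replies with a JSON object containing "status": "ok". The helper name splash_is_up is hypothetical, not part of Splash or of the original script.

```python
import json
import urllib.request


def splash_is_up(base_url="http://localhost:8050", timeout=3):
    """Return True if a Splash instance answers its /_ping endpoint.

    Hypothetical helper; base_url assumes the port mapping 8050:8050.
    """
    try:
        with urllib.request.urlopen(base_url + "/_ping", timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply
        return False
    return isinstance(body, dict) and body.get("status") == "ok"
```

Calling splash_is_up() should return True once the container has finished starting, and False while nothing is listening on the port.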

The Python script:

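import requests
from urllib.parse import quote
from requests import ConnectionError

# Lua script run by Splash: open Baidu, type a query, click search,
# wait for the results, and return a JPEG screenshot.
lua = '''
function main(splash)
    assert(splash:go("https://www.baidu.com"))
    local input = splash:select("#kw")
    input:send_text("Python")
    local submit = splash:select("#su")
    submit:mouse_click()
    splash:wait(3)
    return splash:jpeg()
end
'''
# URL-encode the Lua script and append it to the endpoint URL
url = "http://localhost:8050/execute?lua_source=" + quote(lua)
try:
    # Send the request to Splash
    response = requests.get(url)
    print(response.status_code)
    if response.status_code == 200:
        # Write the returned JPEG bytes to a file
        with open('baidu.jpg', 'wb') as f:
            f.write(response.content)
    else:
        # On failure Splash returns an error description, not an image
        print(response.text)
except ConnectionError as e:
    print(e)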

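Here, lua holds a script written in Lua, and execute in the URL is the Splash HTTP API endpoint that runs the script and returns whatever main returns (in this case, the JPEG screenshot).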
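
The execute endpoint also accepts a POST request with a JSON body, which avoids putting a long URL-encoded script in the query string. The sketch below is an assumption-laden variant, not the script from this post: it sends lua_source as JSON and passes the search keyword as an extra argument that the Lua side reads from splash.args. The names build_execute_request, render, LUA_SEARCH, and the keyword argument are all hypothetical, and it uses the standard library's urllib instead of requests so that it has no extra dependency.

```python
import json
import urllib.request

SPLASH_EXECUTE_URL = "http://localhost:8050/execute"  # assumes the port mapping above

# Same steps as the script in this post, but the query comes from splash.args
LUA_SEARCH = """
function main(splash)
    assert(splash:go("https://www.baidu.com"))
    local input = splash:select("#kw")
    input:send_text(splash.args.keyword)
    local submit = splash:select("#su")
    submit:mouse_click()
    splash:wait(3)
    return splash:jpeg()
end
"""


def build_execute_request(lua_source, **script_args):
    """Build a POST request whose JSON body carries the script and its arguments."""
    payload = dict(script_args, lua_source=lua_source)
    return urllib.request.Request(
        SPLASH_EXECUTE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def render(lua_source, http_timeout=60, **script_args):
    """Send the script to Splash and return the raw response body (JPEG bytes here)."""
    request = build_execute_request(lua_source, **script_args)
    with urllib.request.urlopen(request, timeout=http_timeout) as response:
        return response.read()
```

With Splash running, render(LUA_SEARCH, keyword="Python") should return the screenshot bytes, which can be written to a file opened in 'wb' mode. Keeping the request construction in its own function makes it possible to inspect the payload without contacting the server.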

Result:

[Image: the screenshot saved as baidu.jpg]
