Code
There is not much to explain, so let's go straight to the code:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# webdriver.PhantomJS is only available in Selenium 3.x; it was removed in Selenium 4
def getSource(url):
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400',
        'referer': 'http://www.taobao.com'
    }
    # Use copy() so the DesiredCapabilities dict defined by Selenium itself is not modified
    cap = DesiredCapabilities.PHANTOMJS.copy()
    # Each header becomes a phantomjs.page.customHeaders.<name> capability
    for key, value in headers.items():
        cap['phantomjs.page.customHeaders.{}'.format(key)] = value
    # Don't load images; pages are crawled much faster this way
    cap["phantomjs.page.settings.loadImages"] = False
    driver = webdriver.PhantomJS(desired_capabilities=cap)
    try:
        # url is assumed to be already percent-encoded
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()
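The loop that turns each header into a phantomjs.page.customHeaders.<name> key can be pulled out into a pure function, which makes it easy to inspect the capabilities before any browser is started. This is a minimal sketch; build_phantomjs_caps is a hypothetical helper name, not part of Selenium, and it only uses the capability keys already shown in the code above.

```python
def build_phantomjs_caps(headers, load_images=False, base=None):
    # Start from a copy of the base capabilities (e.g. DesiredCapabilities.PHANTOMJS)
    # so the caller's dict is never modified
    caps = dict(base) if base else {}
    # Each HTTP header becomes a phantomjs.page.customHeaders.<name> entry
    for key, value in headers.items():
        caps["phantomjs.page.customHeaders.{}".format(key)] = value
    # Skipping images makes crawling faster
    caps["phantomjs.page.settings.loadImages"] = load_images
    return caps


caps = build_phantomjs_caps({"referer": "http://www.taobao.com"})
print(caps)
```

With Selenium 3 the result could then be passed as webdriver.PhantomJS(desired_capabilities=build_phantomjs_caps(headers, base=DesiredCapabilities.PHANTOMJS)).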
Some blog posts set the User-Agent this way instead, and that also appears to work:
cap["phantomjs.page.settings.userAgent"] = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36'
Checking the headers
Visiting the following URL echoes back the request you sent, so you can check whether the headers you set actually took effect:
https://httpbin.org/get?show_env=1
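To make this check less manual, the JSON that httpbin returns can be parsed and compared with the headers you meant to send. This is a minimal sketch, assuming the response text is the raw JSON body; check_headers is a hypothetical helper. httpbin may report header names in a different capitalization (for example Referer rather than referer), so the comparison ignores case.

```python
import json


def check_headers(response_text, expected):
    """Return {header: (expected, actual)} for every header that did not arrive as expected."""
    data = json.loads(response_text)
    # Normalize received header names to lower case for a case-insensitive lookup
    received = {k.lower(): v for k, v in data.get("headers", {}).items()}
    mismatches = {}
    for name, value in expected.items():
        actual = received.get(name.lower())  # None if the header is missing
        if actual != value:
            mismatches[name] = (value, actual)
    return mismatches
```

An empty result means every expected header was echoed back unchanged. In a browser the JSON may arrive wrapped in HTML, so with Selenium 3 one option is to pass driver.find_element_by_tag_name('body').text rather than driver.page_source.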
For example, I accessed this address with the code above, appending two test parameters:
https://httpbin.org/get?show_env=1&q=nihao&bbb=c
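The test URL can also be built with the standard library instead of by hand, which takes care of percent-encoding non-ASCII values such as Chinese search terms. A minimal sketch; build_check_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_check_url(extra_params):
    # show_env=1 is the same parameter used in the check URL above
    params = {"show_env": 1}
    params.update(extra_params)
    # urlencode keeps dict insertion order and percent-encodes each value
    return "https://httpbin.org/get?" + urlencode(params)


print(build_check_url({"q": "nihao", "bbb": "c"}))
# → https://httpbin.org/get?show_env=1&q=nihao&bbb=c
```

A value like "你好" would be sent as q=%E4%BD%A0%E5%A5%BD.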
The page returns: