Supplement Three of Knowledge Points (selenium)

Preface

The importance of selenium is self-evident. In the face of dynamic webpages, ordinary crawlers are not easy to use, and the interface can be found by luck. Only selenium has the strongest practicality.

The key knowledge points of selenium

Obtaining and using cookies to achieve automatic login

A simple example, scan the code to log in to Douban and save the cookie

browser.get('https://accounts.douban.com/passport/login?source=movie')
time.sleep(15)
# # 获取cookie并通过json模块将dict转化成str
# dictCookies = browser.get_cookies()
# jsonCookies = json.dumps(dictCookies)
# # 登录完成后,将cookie保存到本地文件
# with open('cookies.json', 'w') as f:
#     f.write(jsonCookies)
#  初次建立连接,随后方可修改cookie
browser.get('https://accounts.douban.com/passport/login?source=movie')
# 删除第一次建立连接时的cookie
browser.delete_all_cookies()
# 读取登录时存储到本地的cookie
with open('cookies.json', 'r', encoding='utf-8') as f:
    listCookies = json.loads(f.read())
for cookie in listCookies:
    browser.add_cookie({
        'domain': '.accounts.douban.com',  # 此处xxx.com前,需要带点
        'name': cookie['name'],
        'value': cookie['value'],
        'path': '/',
        'expires': None
    })
# 再次访问页面,便可实现免登陆访问
browser.get('https://movie.douban.com/')

Many web pages require selenium to carry cookies in order to access them, and the timeliness of cookies on different web pages are also different. This example is the most common and practical.
Extension and extension
Use automatic login to realize the initial development of software such as automatic ticket grabbing.
About requests and cookies

Selenium element positioning

After practice, the
absolute path
is the fastest and most convenient. Right-click copy fullxpath in the check, and it can generally be located. When it cannot be located, it may be nested web pages or other difficult points.
Use
element attributes
more positioning methods,
less practice, less reliable experience

scroll bar

example

time.sleep(2)
driver.execute_script('window.scrollTo(0,400)')

More ways to manipulate the scroll bar

Disguise the phone

*To deal with Baidu Library
*

url = "https://wenku.baidu.com/view/aa31a84bcf84b9d528ea7a2c.html"
options = webdriver.ChromeOptions()
# 伪装成iPhone
options.add_argument('--user-agent=Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3')
driver = webdriver.Chrome(chrome_options=options)
driver.get(url)

Disguise the mobile terminal to proceed to the next crawler. Otherwise, entering the target URL will first jump to the homepage of the library, and then it will be more difficult to operate the positioning

Commonly reported errors

NoSuchElementException: Message: Unable to locate element:

Positioning fails or the page is not fully loaded

selenium.common.exceptions.ElementClickInterceptedException: Message: element click intercepted:

The target element is unexpectedly displayed on the page, just operate the scroll bar

Summary of typical examples

  1. Douban
  2. Baidu Library
  3. Baidu Pictures
  4. http://www.impawards.com/ (unsuccessful)
  5. China Railway Network
  6. Know almost

Guess you like

Origin blog.csdn.net/qq_51598376/article/details/113788403