Crawler Learning - Selenium Module

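- How Selenium relates to crawlers

Convenient access to data that a website loads dynamically

Convenient implementation of simulated login

A module based on browser automation (comparable to a Key Wizard macro script)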

- Usage workflow

Install the environment: pip install selenium

Download a browser driver: Installing the Google Chrome driver - Linda's blog - Blog Park (cnblogs.com)

from selenium import webdriver

Instantiate a browser object

Write the operation code based on browser automation
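
The workflow above can be sketched as a small helper. This is only an illustration under stated assumptions: `fetch_page_source` and its `make_driver` parameter are hypothetical names, not part of Selenium. `make_driver` is assumed to be any zero-argument callable that returns a driver, for example `webdriver.Chrome`.

```python
def fetch_page_source(make_driver, url):
    """Open url in a fresh browser and return the page's HTML.

    make_driver is a hypothetical parameter: any zero-argument callable
    that returns a Selenium driver, e.g. webdriver.Chrome.
    """
    driver = make_driver()          # instantiate a browser object
    try:
        driver.get(url)             # automation code goes here
        return driver.page_source   # HTML as currently loaded in the browser
    finally:
        driver.quit()               # always close the browser, even on errors
```

With Selenium installed and a driver available, a call might look like fetch_page_source(webdriver.Chrome, 'https://www.baidu.com/') after from selenium import webdriver. The try/finally is a design choice so that a failed request does not leave a browser process running.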

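- Some automation operations

Send a request: get(url)

Locate tags: the find family of methods, e.g. find_element(By.ID, 'xxx')

Interact with a tag: send_keys('xxx')

Click: click()

Execute a JS program: execute_script('jsCode')

Back, forward: back(), forward()

Close the browser: quit()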
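
The operations listed above could be combined roughly as follows. This is a minimal sketch, not a definitive implementation: `search_and_scroll`, `box_id`, and `button_id` are hypothetical names chosen for illustration, and the driver is assumed to be created (and later closed with quit()) by the caller. The locator strategy is passed as the plain string "id", which is the value of By.ID in Selenium.

```python
def search_and_scroll(driver, url, box_id, button_id, text):
    """Type text into an input, click a button, scroll, then use the history.

    box_id and button_id are hypothetical element ids supplied by the caller.
    """
    driver.get(url)                               # send the request
    box = driver.find_element("id", box_id)       # "id" is the value of By.ID
    box.send_keys(text)                           # tag interaction
    driver.find_element("id", button_id).click()  # click
    # Execute JavaScript: scroll to the bottom of the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    driver.back()                                 # go back one page
    driver.forward()                              # go forward again
```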

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

# Instantiate a browser object (Selenium 4 passes the driver path through Service)
bro = webdriver.Chrome(service=Service(executable_path='chromedriver.exe'))
# Have the browser send a request to the given URL
bro.get('https://i.qq.com/')
# Switch scope into the login iframe
bro.switch_to.frame('login_frame')
a_tag = bro.find_element(By.ID, 'switcher_plogin')
a_tag.click()
userName_tag = bro.find_element(By.ID, 'u')
passWord_tag = bro.find_element(By.ID, 'p')
time.sleep(3)
userName_tag.send_keys('2371964121')
time.sleep(3)
passWord_tag.send_keys('xxxxxx')
time.sleep(3)
btn = bro.find_element(By.ID, 'login_button')
btn.click()
time.sleep(3)
bro.quit()

- Handling iframes

If the tag you want to locate is inside an iframe tag, you must first call switch_to.frame(id); otherwise the element cannot be found.
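
As a hedged illustration of the rule above, the sketch below enters an iframe, clicks one element, and then returns to the top-level page with switch_to.default_content(). The function name `click_in_frame` and its parameters are hypothetical; the locator strategy "id" is the value of By.ID.

```python
def click_in_frame(driver, frame_id, element_id):
    """Click an element that lives inside an iframe, then leave the frame."""
    driver.switch_to.frame(frame_id)          # enter the iframe's scope
    try:
        driver.find_element("id", element_id).click()
    finally:
        driver.switch_to.default_content()    # return to the top-level page
```

For the QQ login page used in this post, a call might look like click_in_frame(bro, 'login_frame', 'switcher_plogin'). Switching back in a finally block is a design choice, so that later lookups outside the frame still work if the click raises.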

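Action chains

from selenium.webdriver import ActionChains

action = ActionChains(bro): instantiate an action chain

action.click_and_hold(element): click the specified element and keep the mouse button held down

action.move_by_offset(x, y).perform(): move by the given number of pixels; x is horizontal, y is vertical

perform(): execute the queued action chain operations immediately

action.release(): release the held mouse button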
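
The action chain calls above might be combined into a drag helper like the one below. It is a sketch under stated assumptions: `split_offset` and `drag_horizontally` are hypothetical helper names, and `actions` is assumed to be an ActionChains instance that the caller has already created, for example ActionChains(bro).

```python
def split_offset(total, steps):
    """Split a pixel distance into `steps` integer moves that sum to total."""
    base, extra = divmod(total, steps)
    # The first `extra` moves are one pixel longer so nothing is lost to rounding
    return [base + 1 if i < extra else base for i in range(steps)]


def drag_horizontally(actions, element, total, steps=5):
    """Hold element, move it horizontally in small steps, then release.

    actions is expected to be an ActionChains instance created by the caller.
    """
    actions.click_and_hold(element).perform()
    for dx in split_offset(total, steps):
        actions.move_by_offset(dx, 0).perform()   # x is horizontal, y is vertical
    actions.release().perform()
```

For example, split_offset(300, 5) yields five moves of 60 pixels each. Calling perform() after every step sends each move on its own instead of queuing all of them until the release.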

No visible interface (headless browser)

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Run without a visible browser window (headless)
options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
# Reduce the risk of Selenium being detected
options.add_experimental_option('excludeSwitches', ['enable-automation'])

# Pass a single options object; Selenium 4 no longer accepts chrome_options
bro = webdriver.Chrome(service=Service(executable_path='chromedriver.exe'), options=options)
bro.get('https://www.baidu.com/')
print(bro.page_source)
bro.quit()

Case study

12306 login

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver import ActionChains  # action chains
# Used to avoid automation detection
from selenium.webdriver import ChromeOptions
from selenium.webdriver.chrome.service import Service

import time


def login():
    driver.find_element(By.ID, 'J-userName').send_keys('zh')
    driver.find_element(By.ID, 'J-password').send_keys('mm')

    driver.find_element(By.ID, 'J-login').click()
    time.sleep(2)
    # Slider handle of the verification widget
    clock = driver.find_element(By.CLASS_NAME, 'nc_iconfont')

    action = ActionChains(driver)
    # Click and hold the slider handle
    action.click_and_hold(clock).perform()
    for i in range(5):
        # perform() sends each move right away, so the pause between steps takes effect
        action.move_by_offset(60, 0).perform()
        time.sleep(0.1)
    action.release().perform()


if __name__ == '__main__':
    url = 'https://kyfw.12306.cn/otn/resources/login.html'
    # Configure the browser so that Selenium is harder to detect
    options = ChromeOptions()
    options.add_argument("--disable-blink-features=AutomationControlled")
    options.add_experimental_option('excludeSwitches', ['enable-automation'])
    driver = webdriver.Chrome(service=Service(executable_path='./chromedriver.exe'), options=options)
    driver.get(url)
    login()
