The Five Realms of Extracting Google Chrome Cookies

Anyone who works with crawlers regularly knows how important cookies are. Most websites still use cookies to track login state; only a few have moved to JWT for that purpose.

Why you would want to extract cookies needs no explanation. So what are the more advanced ways of doing it? Read on:

Pure manual extraction of Google Chrome cookies

This is the approach that everyone who has written a crawler already knows, and it may be the hardest one for people who have never touched crawlers.

Open the developer tools with F12, visit the website whose cookie you want to extract, and then select the request you just made in the Network panel. The specific steps are as follows:

image-20220123163535458

Then find the cookie item in the request header and copy the corresponding value:

image-20220123163853695

Anyone who has learned crawling can do this. Can we use code to automate part of it?

My solution is to copy the request as a curl command and then extract the cookie with code. The operation is as follows:

image-20220123164148589

Once the request is on the clipboard as a curl command, the cookie can be extracted directly with code:

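import re
import pyperclip


def extractCookieByCurlCmd(curl_cmd):
    """Return the value of the cookie header in a curl command, or None if there is none."""
    # Matches headers written as -H 'cookie: ...' or -H $'cookie: ...'
    cookie_obj = re.search(r"-H \$?'cookie: ([^']+)'", curl_cmd, re.I)
    if cookie_obj:
        return cookie_obj.group(1)
    return None


cookie = extractCookieByCurlCmd(pyperclip.paste())
print(cookie)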

Comparing the two shows that the printed cookie is identical to the one copied by hand.
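The extracted cookie is one long string of `name=value` pairs separated by semicolons, while HTTP libraries often expect a name-to-value mapping. The following is a minimal sketch of that conversion using only the standard library. The helper name `cookie_header_to_dict` and the sample string are made up for illustration and are not part of the original code.

```python
def cookie_header_to_dict(cookie_header):
    """Split a Cookie header value like 'a=1; b=2' into a dict."""
    cookies = {}
    for pair in cookie_header.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first '=' only, because values may contain '='
        name, _, value = pair.partition("=")
        cookies[name] = value
    return cookies


# Made-up sample string, not a real cookie
sample = "sessionid=abc123; token=x=y=z; theme=dark"
print(cookie_header_to_dict(sample))
```

Splitting on the first `=` only matters for values such as base64 strings, which can end with `=` themselves.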

Logging in manually with selenium and saving the cookies

Take saving the login cookies of Bilibili (B站) as an example:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
import time
import json

browser = webdriver.Chrome()
browser.get("https://passport.bilibili.com/login")
print("Waiting for login...")
# Poll every 3 seconds until an element that only exists after login appears
while True:
    try:
        browser.find_element(By.XPATH,
                             "//div[@class='user-con signin']|//ul[@class='right-entry']"
                             "//a[@class='header-entry-avatar']")
        break
    except NoSuchElementException:
        time.sleep(3)
print("Logged in, saving cookies now...")
with open('cookie.txt', 'w', encoding='u8') as f:
    json.dump(browser.get_cookies(), f)
# quit() ends the whole browser session, not just the current window
browser.quit()
print("Cookies saved, the browser has exited automatically...")

Run the code above and selenium opens Bilibili's login page in Google Chrome and waits for you to log in manually. Once you have logged in, the cookies are saved automatically.

How can a later selenium program reuse these cookies at startup without logging in again? A demo:

import json

from selenium import webdriver

browser = webdriver.Chrome()
with open('cookie.txt', 'r', encoding='u8') as f:
    cookies = json.load(f)
# Open the site first so that the cookies are added for the right domain
browser.get("https://www.bilibili.com/")
for cookie in cookies:
    browser.add_cookie(cookie)
# Load the page again so that it is requested with the cookies
browser.get("https://www.bilibili.com/")

The process is: first visit the website the cookies belong to (selenium can only add cookies for the domain of the page that is currently open), then add the cookies, and finally load the page again so that they take effect.
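Cookies saved to cookie.txt do not stay valid forever. As a hedged sketch, the entries could be filtered before they are added, assuming each saved entry may carry an `expiry` field in seconds since the epoch, as the dicts returned by `get_cookies()` normally do. The helper name `drop_expired` and the sample data are hypothetical.

```python
import time


def drop_expired(cookies, now=None):
    """Keep session cookies (no 'expiry') and cookies that have not expired yet."""
    now = time.time() if now is None else now
    return [c for c in cookies if "expiry" not in c or c["expiry"] > now]


# Made-up entries in the shape of the saved cookie list
saved = [
    {"name": "SESSDATA", "value": "made-up", "expiry": 2000},
    {"name": "old", "value": "stale", "expiry": 1000},
    {"name": "tmp", "value": "session-only"},
]
fresh = drop_expired(saved, now=1500)
print([c["name"] for c in fresh])
```

In the demo above this would be applied to the list loaded from cookie.txt before the `add_cookie` loop. If the login cookie itself has been dropped, a fresh manual login is needed.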

Getting non-login cookies with selenium in headless mode

Some websites, such as Douyin, require an initial cookie before their videos can be downloaded, but the algorithm that generates this cookie is complicated and hard to reproduce with plain requests. In that case we can let selenium load the page and read the cookies, which saves the time of analyzing the JS. Since no manual interaction is needed, headless mode is the better choice.

The following demonstrates how to obtain the cookies of the Douyin website in headless mode:

from selenium import webdriver
import time


def selenium_get_cookies(url='https://www.douyin.com'):
    """Extract the cookies for the target URL in headless mode. Code author: 小小明-代码实体"""
    start_time = time.time()
    option = webdriver.ChromeOptions()
    option.add_argument("--headless")
    # Hide the most obvious signs of automation from the page
    option.add_experimental_option('excludeSwitches', ['enable-automation'])
    option.add_experimental_option('useAutomationExtension', False)
    option.add_argument(
        'user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36')
    option.add_argument("--disable-blink-features=AutomationControlled")
    print("Opening headless browser...")
    browser = webdriver.Chrome(options=option)
    print(f"Visiting {url} ...")
    browser.get(url)
    cookie_list = browser.get_cookies()
    # Shut down the browser
    browser.quit()
    cost_time = time.time() - start_time
    print(f"Headless browser took {cost_time:0.2f} s to get the cookies")
    return {row["name"]: row["value"] for row in cookie_list}


print(selenium_get_cookies("https://www.douyin.com"))

The printed result is as follows:

Opening headless browser...
Visiting https://www.douyin.com ...
Headless browser took 3.28 s to get the cookies
{'': 'douyin.com', 'ttwid': '1%7CZn_LJdPjHKdCy4jtBoYWL_yT3NMn7OZVTBStEzoLoQg%7C1642932056%7C80dbf668fd283c71f9aee1a277cb35f597a8453a3159805c92dfee338e70b640', 'AB_LOGIN_GUIDE_TIMESTAMP': '1642932057106', 'MONITOR_WEB_ID': '651d9eca-f155-494b-a945-b8758ae948fb', 'ttcid': 'ea2b5aed3bb349219f7120c53dc844a033', 'home_can_add_dy_2_desktop': '0', '_tea_utm_cache_6383': 'undefined', '__ac_signature': '_02B4Z6wo00f01kI39JwAAIDBnlvrNDKInu5CB.AAAPFv24', 'MONITOR_DEVICE_ID': '25d4799c-1d29-40e9-ab2b-3cc056b09a02', '__ac_nonce': '061ed27580066860ebc87'}
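The function above reads the cookies as soon as the page load returns. If a site sets a cookie from JavaScript slightly later, that cookie could be missing from the result. This is an assumption about how such pages may behave, not something measured here. A minimal polling sketch: `wait_for_cookie` is a hypothetical helper that accepts any callable returning a cookie list, so `browser.get_cookies` can be passed in directly. The fake function below only stands in for a browser.

```python
import time


def wait_for_cookie(get_cookies, name, timeout=10.0, interval=0.5):
    """Poll get_cookies() until a cookie called `name` appears; return its value or None."""
    deadline = time.monotonic() + timeout
    while True:
        for row in get_cookies():
            if row["name"] == name:
                return row["value"]
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Stand-in for browser.get_cookies: the cookie only shows up on the third call
calls = {"n": 0}


def fake_get_cookies():
    calls["n"] += 1
    if calls["n"] < 3:
        return []
    return [{"name": "ttwid", "value": "made-up"}]


print(wait_for_cookie(fake_get_cookies, "ttwid", timeout=2.0, interval=0.01))
```

With a real browser the call would look like `wait_for_cookie(browser.get_cookies, "ttwid")`, placed between `browser.get(url)` and the final `get_cookies()`.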

Getting cookies from the local Google Chrome

When selenium launches Google Chrome itself, the cookies of the Chrome we use every day are not loaded. Is there a way to get the cookies of sites we have already logged in to in Chrome?

It is actually simple: start the local Chrome in remote debugging mode, then attach selenium to it and read the cookies from the earlier logins:

import os
import winreg
from selenium import webdriver
import time


def get_local_ChromeCookies(url, chrome_path=None):
    """Extract the cookies for the target URL from the local Chrome. Code author: 小小明-代码实体"""
    if chrome_path is None:
        # Read the path of chrome.exe from the registry (Windows only)
        key = winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, r"ChromeHTML\Application")
        path = winreg.QueryValueEx(key, "ApplicationIcon")[0]
        chrome_path = path[:path.rfind(",")]
    start_time = time.time()
    command = f'"{chrome_path}" --remote-debugging-port=9222'
    # print(command)
    os.popen(command)
    option = webdriver.ChromeOptions()
    # Attach to the Chrome instance started above instead of launching a new one
    option.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
    browser = webdriver.Chrome(options=option)
    print(f"Visiting {url}...")
    browser.get(url)
    cookie_list = browser.get_cookies()
    # Close the current browser window
    browser.close()
    cost_time = time.time() - start_time
    print(f"Getting the Chrome cookies took {cost_time:0.2f} s")
    return {row["name"]: row["value"] for row in cookie_list}


print(get_local_ChromeCookies("https://www.douyin.com"))

The code above can directly obtain the cookies of any website previously visited in Google Chrome. Here is the result for Douyin:

DevTools listening on ws://127.0.0.1:9222/devtools/browser/9a9c19da-21fc-42ba-946d-ff60a91aa9d2
Visiting https://www.douyin.com...
Getting the Chrome cookies took 4.62 s

{'THEME_STAY_TIME': '10718', 'home_can_add_dy_2_desktop': '0', '_tea_utm_cache_1300': 'undefined', 'passport_csrf_token': 'c6bda362fba48845a2fe6e79f4d35bc8', 'passport_csrf_token_default': 'c6bda362fba48845a2fe6e79f4d35bc8', 'MONITOR_WEB_ID': 'd6e51b15-5276-41f3-97f5-e14d51242496', 'MONITOR_DEVICE_ID': 'd465b931-3a0e-45ba-ac19-263dd31a76ee', '': 'douyin.com', 'ttwid': '1%7CsXCoN0TQtHpKYiRoZnAKyqNJhOfkdJjNEJIdPPAibJw%7C1642915541%7C8a3308d87c6d2a38632bbfe4dfc0baae75162cedf6d63ace9a9e2ae4a13182d2', '__ac_signature': '_02B4Z6wo00f01I50-sQAAIDADnYAhr3RZZCOUP5AAEJ0f7', 'msToken': 'mUYtlAj8qr_9fuTIekLmAThy9N_ltbh0NJo05ns14o3X5As496_O5n7XT4-I81npZuGrIxt0V3JadDZlznmwgzwxqT6GZdIOBozEPC-WAZawQR-teML5984=', '__ac_nonce': '061ed25850009be132678', '_tea_utm_cache_6383': 'undefined', 'AB_LOGIN_GUIDE_TIMESTAMP': '1642915542503', 'ttcid': '3087b27658f74de9a4dae240e7b3930726'}

The code above obtains the path of Google Chrome by reading this registry entry:

image-20220123181457905

I cannot confirm that Chrome's location can be read from the registry in the same way on other computers, so the function also accepts the path as an explicit parameter.

Adding --remote-debugging-port=9222 to Chrome's startup parameters enables remote debugging, and selenium can connect to an existing Chrome instance that has remote debugging enabled through the debuggerAddress option.

This is how we can obtain the cookies of any website we have previously logged in to in Google Chrome.

Note, however, that if the shortcut you normally use to start Chrome has a --user-data-dir parameter, the code above must add the same parameter. Chrome on my machine has no such parameter:

image-20220123184921986
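To make the --user-data-dir case concrete, here is a small sketch that builds the launch command as an argument list, which avoids quoting problems with paths that contain spaces. The function name `build_chrome_command` and both paths are hypothetical. Only the two command-line switches come from the text above.

```python
def build_chrome_command(chrome_path, port=9222, user_data_dir=None):
    """Return the argument list for starting Chrome with remote debugging enabled."""
    command = [chrome_path, f"--remote-debugging-port={port}"]
    if user_data_dir:
        # Only needed when the everyday shortcut also passes this switch
        command.append(f"--user-data-dir={user_data_dir}")
    return command


# Made-up paths for illustration only
cmd = build_chrome_command(r"C:\Chrome\chrome.exe", user_data_dir=r"D:\ChromeData")
print(cmd)
```

The resulting list could be passed to `subprocess.Popen(cmd)` in place of the `os.popen(command)` call.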

Parsing and decrypting the file in which Google Chrome stores its cookies

Finally, the ultimate trick: decrypting the Cookies file directly. In versions of Google Chrome before 97, cookies were stored in %LOCALAPPDATA%\Google\Chrome\User Data\Default\Cookies. Starting with version 97, the file moved to %LOCALAPPDATA%\Google\Chrome\User Data\Default\Network\Cookies.

Whether the storage location and the encryption method will change again in versions after 97 is unknown. So far, the key data has always been stored in the file %LOCALAPPDATA%\Google\Chrome\User Data\Local State. (Before version 80, values can be decrypted directly with win32crypt.CryptUnprotectData(encrypted_value_bytes,None,None,None,0)[1]; no key is required.)

image-20220123183557545
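Because the file has lived in two places, code that reads it has to check both. The following sketch shows one way to do that with pathlib. The function name `find_cookie_db` and the choice to try the newer location first are assumptions of this example, and the demonstration runs on a throwaway directory rather than a real Chrome profile.

```python
import tempfile
from pathlib import Path


def find_cookie_db(user_data_dir, profile="Default"):
    """Return the first non-empty Cookies file of a profile, or None."""
    profile_dir = Path(user_data_dir) / profile
    # Newer location first, then the older one
    for candidate in (profile_dir / "Network" / "Cookies", profile_dir / "Cookies"):
        if candidate.is_file() and candidate.stat().st_size > 0:
            return candidate
    return None


# Demonstration on a temporary directory that imitates the newer layout
with tempfile.TemporaryDirectory() as tmp:
    network_dir = Path(tmp) / "Default" / "Network"
    network_dir.mkdir(parents=True)
    (network_dir / "Cookies").write_bytes(b"not a real database")
    print(find_cookie_db(tmp).relative_to(tmp))
```

On a real machine the first argument would be the User Data directory named above.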

The Cookies file is actually a SQLite database and can be opened directly with Navicat Premium 15:

image-20220123184212976
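A GUI tool is not required to look inside the file. Python's built-in sqlite3 module can read it too. The sketch below opens a database read-only and lists cookie names for a host. It runs against a tiny stand-in database created on the spot, which has only the three columns this article uses (the real table has more), and `list_cookie_names` is a hypothetical helper name.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def list_cookie_names(db_path, host):
    """Return (host_key, name) pairs for cookies whose host_key ends with `host`."""
    # Open read-only so the database cannot be modified by accident
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        sql = "select host_key, name from cookies where host_key like ? order by name"
        return conn.execute(sql, (f"%{host}",)).fetchall()
    finally:
        conn.close()


# Build a stand-in database with made-up rows
tmp_dir = tempfile.mkdtemp()
demo_db = os.path.join(tmp_dir, "Cookies")
conn = sqlite3.connect(demo_db)
conn.execute("create table cookies (host_key text, name text, encrypted_value blob)")
conn.executemany("insert into cookies values (?, ?, ?)", [
    (".douyin.com", "ttwid", b"v10..."),
    (".example.com", "sid", b"v10..."),
])
conn.commit()
conn.close()

print(list_cookie_names(demo_db, "douyin.com"))
```

Passing the host as a query parameter instead of formatting it into the SQL string keeps quoting problems out of the query.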

The values, however, are encrypted. For the encryption algorithm used since version 80, see: https://github.com/chromium/chromium/blob/master/components/os_crypt/os_crypt_win.cc
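Before any decryption, it helps to see how one encrypted value is laid out. As a sketch, assuming the Windows layout used since version 80 (a 3-byte `v10` prefix, a 12-byte AES-GCM nonce, then the ciphertext with its authentication tag appended), the pieces can be separated as follows. `split_encrypted_value` is a hypothetical helper name and the bytes are made up.

```python
def split_encrypted_value(data):
    """Split a 'v10' cookie value into (nonce, ciphertext_with_tag); None otherwise."""
    if not data.startswith(b"v10"):
        # Not the layout handled here (for example an older DPAPI-only blob)
        return None
    nonce = data[3:15]       # 12-byte AES-GCM nonce
    ciphertext = data[15:]   # ciphertext followed by the GCM authentication tag
    return nonce, ciphertext


# Made-up bytes, only to show where the slices fall
fake = b"v10" + bytes(range(12)) + b"ciphertext-and-tag"
nonce, ciphertext = split_encrypted_value(fake)
print(len(nonce), ciphertext)
```

Checking the prefix first gives a clear signal when a value uses a layout this code does not understand, instead of failing later inside the cipher.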

The Local State file, which stores the key needed for decryption, is a JSON file:

image-20220123185817864
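The key inside Local State is itself wrapped: it is base64-encoded, and the decoded bytes begin with the 5-byte marker `DPAPI` followed by the protected blob. The sketch below performs only these platform-independent steps on a made-up JSON fragment. The final unwrapping with win32crypt.CryptUnprotectData is Windows-only and is left out. `read_wrapped_key` is a hypothetical helper name.

```python
import base64
import json


def read_wrapped_key(local_state_text):
    """Return the DPAPI-protected key blob from the text of a Local State file."""
    state = json.loads(local_state_text)
    raw = base64.b64decode(state["os_crypt"]["encrypted_key"])
    if not raw.startswith(b"DPAPI"):
        raise ValueError("unexpected key header")
    # What remains is the blob to hand to win32crypt.CryptUnprotectData
    return raw[5:]


# A made-up Local State fragment; the real file contains many more fields
fake_blob = b"DPAPI" + b"\x01\x02\x03"
fake_state = json.dumps({"os_crypt": {"encrypted_key": base64.b64encode(fake_blob).decode()}})
print(read_wrapped_key(fake_state))
```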

The complete extraction code is as follows:

"""
Code by 小小明
CSDN homepage: https://blog.csdn.net/as604049322
"""
__author__ = '小小明'
__time__ = '2022/1/23'

import base64
import json
import os
import sqlite3

import win32crypt
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def load_local_key(localStateFilePath):
    """Read the key Chrome stores in the JSON file, then base64-decode and DPAPI-decrypt it to get the real AES-GCM key."""
    with open(localStateFilePath, encoding='u8') as f:
        encrypted_key = json.load(f)['os_crypt']['encrypted_key']
    encrypted_key_with_header = base64.b64decode(encrypted_key)
    # Skip the 5-byte header in front of the DPAPI blob
    encrypted_key = encrypted_key_with_header[5:]
    key = win32crypt.CryptUnprotectData(encrypted_key, None, None, None, 0)[1]
    return key


def decrypt_value(key, data):
    """Decrypt a cookie value with AES-GCM."""
    # Bytes 0-2 are a version prefix, bytes 3-14 the nonce, the rest is the ciphertext
    nonce, cipherbytes = data[3:15], data[15:]
    aesgcm = AESGCM(key)
    plaintext = aesgcm.decrypt(nonce, cipherbytes, None).decode('u8')
    return plaintext


def fetch_host_cookie(host):
    """Get all cookies under the given domain."""
    userDataDir = os.environ['LOCALAPPDATA'] + r'\Google\Chrome\User Data'
    localStateFilePath = userDataDir + r'\Local State'
    cookiepath = userDataDir + r'\Default\Cookies'
    # Version 97 moved Cookies into the Network directory
    if not os.path.exists(cookiepath) or os.stat(cookiepath).st_size == 0:
        cookiepath = userDataDir + r'\Default\Network\Cookies'
    # print(cookiepath)
    # Pass the host as a query parameter instead of formatting it into the SQL
    sql = "select name,encrypted_value from cookies where host_key like ?"
    cookies = {}
    key = load_local_key(localStateFilePath)
    with sqlite3.connect(cookiepath) as conn:
        cu = conn.cursor()
        for name, encrypted_value in cu.execute(sql, (f"%.{host}",)).fetchall():
            cookies[name] = decrypt_value(key, encrypted_value)
    return cookies


if __name__ == '__main__':
    print(fetch_host_cookie("douyin.com"))

Result:

{'ttcid': '3087b27658f74de9a4dae240e7b3930726', 'MONITOR_DEVICE_ID': 'd465b931-3a0e-45ba-ac19-263dd31a76ee', 'MONITOR_WEB_ID': '70892127-f756-4455-bb5e-f8b1bf6b71d0', '_tea_utm_cache_6383': 'undefined', 'AB_LOGIN_GUIDE_TIMESTAMP': '1642915542503', 'passport_csrf_token_default': 'c6bda362fba48845a2fe6e79f4d35bc8', 'passport_csrf_token': 'c6bda362fba48845a2fe6e79f4d35bc8', '_tea_utm_cache_1300': 'undefined', 'msToken': 'e2XPeN9Oe2rvoAwQrIKLvpGYQTF8ymR4MFv6N8dXHhu4To2NlR0uzx-XPqxCWWLlO5Mqr2-3hwSIGO_o__heO0Rv6nxYXaOt6yx2eaBS7vmttb4wQSQcYBo=', 'THEME_STAY_TIME': '13218', '__ac_nonce': '061ed2dee006ff56640fa', '__ac_signature': '_02B4Z6wo00f01rasq3AAAIDCNq5RMzqU2Ya2iK.AAMxSb2', 'home_can_add_dy_2_desktop': '1', 'ttwid': '1%7CsXCoN0TQtHpKYiRoZnAKyqNJhOfkdJjNEJIdPPAibJw%7C1642915541%7C8a3308d87c6d2a38632bbfe4dfc0baae75162cedf6d63ace9a9e2ae4a13182d2'}

As you can see, the cookies have been extracted perfectly from the local file.

Origin blog.csdn.net/as604049322/article/details/122656048