[100 Examples of JS Reverse] WebSocket Protocol Crawler: Wisdom Tree QR Code Login Case Analysis

Follow the WeChat official account "Brother K crawler" for regular technical posts on advanced crawling and JS/Android reverse engineering!

Statement

All content in this article is for learning and communication only. The captured content, sensitive URLs, and data interfaces have been desensitized. Commercial and illegal use is strictly prohibited; the author bears no responsibility for any consequences arising from such use. If there is any infringement, please contact me and I will delete it immediately!

Reverse goal

  • Goal: Wisdom Tree QR code login; the interface uses the WebSocket communication protocol
  • Homepage: aHR0cHM6Ly9wYXNzcG9ydC56aGlodWlzaHUuY29tL2xvZ2luI3FyQ29kZUxvZ2lu

Introduction to WebSockets

WebSocket is a protocol for full-duplex communication over a single TCP connection. WebSocket makes data exchange between clients and servers much simpler. In the WebSocket API, the browser and the server only need to complete a handshake, and a persistent connection can be created directly between the two, and two-way data transmission can be performed.

The WebSocket protocol is abbreviated as WS or WSS (WebSocket Secure), and request URLs start with ws:// or wss://. WSS is the encrypted version of WS, in the same way that HTTPS is the encrypted version of HTTP.
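
As a small illustration of the two URL schemes, the sketch below checks whether a URL is ws:// or wss:// and fills in the default port (80 for ws, 443 for wss). The function name describe_ws_url and the example.com URL are made up for this example.

```python
from urllib.parse import urlsplit

# Default ports of the two WebSocket URL schemes
DEFAULT_PORTS = {"ws": 80, "wss": 443}


def describe_ws_url(url):
    """Return (is_secure, host, port) for a ws:// or wss:// URL."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError("not a WebSocket URL: %r" % url)
    # Fall back to the scheme's default port when none is given
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.scheme == "wss", parts.hostname, port


# Hypothetical URL, used only to demonstrate the helper
print(describe_ws_url("wss://example.com/websocket?qrToken=abc"))
```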

The biggest feature of the WebSocket protocol is that the server can actively push information to the client, and the client can also actively send information to the server. It is a true two-way equal dialogue and belongs to a kind of server push technology. The comparison with HTTP is shown in the following figure:

01.png

Packet capture analysis

Open the Wisdom Tree QR code login page, capture packets, and select WS to show only WebSocket requests, as shown in the following figure:

02.png

The handshake request contains some special headers that ordinary HTTP/HTTPS requests do not have:

  • Upgrade: websocket: indicates that this is a WebSocket request;
  • Sec-WebSocket-Version: tells the server which WebSocket draft (protocol version) the client uses; it must be 13;
  • Sec-WebSocket-Extensions: protocol extensions; a protocol may support several extensions, which are used to enhance it;
  • Sec-WebSocket-Key: a base64-encoded random value generated by the browser and sent by the WebSocket client. The server must return the corresponding Sec-WebSocket-Accept response header computed from it; otherwise the client throws an "Error during WebSocket handshake" error and closes the connection.
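
To make the relationship between Sec-WebSocket-Key and Sec-WebSocket-Accept concrete, here is a minimal sketch of the calculation defined in RFC 6455: the server appends a fixed GUID to the key, takes the SHA-1 hash, and base64-encodes the result. The function name compute_accept is chosen for this example; the sample key is the one used in the RFC itself, not a value captured from the target site.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key):
    """Derive the Sec-WebSocket-Accept value from a Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from the RFC 6455 handshake example
print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Libraries such as websocket-client perform this check during the handshake, so crawler code normally never has to compute it by hand.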

We first scan the code to log in and then select the Messages tab, where the data exchange is visible. Green arrows mark data sent by the client to the server, and red arrows mark data returned by the server to the client, as shown in the following figure:

03.png

Let's observe the whole interaction. The WebSocket connection is established as soon as the QR code page opens, that is, when the QR code is loaded. About every 8 seconds the client sends a string, and the server returns the same string, but in dictionary format. When the code is scanned successfully, the server returns a scan-success message. When we tap to confirm the login, the server returns the login result; if it succeeds, the message contains a one-time password oncePassword and a uuid, which are expected to be needed in subsequent requests. If the code is not scanned for a long time, a message saying that the QR code has expired is returned after a while. The message sent every 8 seconds only serves to keep the connection alive and to fetch the QR code status.

So there are three problems here:

  1. Where does the string that is sent back and forth come from?

  2. How should WebSocket requests be implemented in Python?

  3. How can the client receive messages from the server in real time while sending data every 8 seconds? (The captured traffic shows that the scan result is pushed in real time, so it cannot be read only once every 8 seconds.)

Parameter acquisition

Let's solve the first problem: where does the string sent by the client come from? Locating it works the same way as for an HTTP/HTTPS request. In this example we can search for the string directly and find that it is returned by an interface, where img is the base64 value of the QR code image and qrToken is the string sent by the client, as shown in the following figure:

04.png
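
A minimal sketch of handling such a response is shown below. It only assumes the two field names mentioned above, img and qrToken; the function name parse_qr_response and the sample payload are invented for illustration and are not real data from the interface.

```python
import base64


def parse_qr_response(payload):
    """Split the QR code interface response into image bytes and the token."""
    image_bytes = base64.b64decode(payload["img"])
    return image_bytes, payload["qrToken"]


# Hypothetical payload: the image content is a stand-in, not a real PNG
sample = {
    "img": base64.b64encode(b"fake-png-bytes").decode("ascii"),
    "qrToken": "ca6e6cfb70de4f2f915b968aefcad404",
}
image_bytes, qr_token = parse_qr_response(sample)
print(len(image_bytes), qr_token)
```

The image bytes can then be written to a file for scanning, and the token is what the client later sends over the WebSocket connection.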

Note that not all WebSocket requests are this simple. Some clients send Binary Message data (binary data) or more complex encrypted parameters, which cannot be found by a direct search. For such cases there are other ways to locate the code:

  1. The statement that creates a WebSocket object is var Socket = new WebSocket(url, [protocol]);, so we can search for new WebSocket to locate where the connection is created.

  2. A WebSocket object has the following events, so we can search for the corresponding event handlers to locate the code:

     Event     Event handler      Description
     open      Socket.onopen      Triggered when the connection is established
     message   Socket.onmessage   Triggered when the client receives data from the server
     error     Socket.onerror     Triggered when a communication error occurs
     close     Socket.onclose     Triggered when the connection is closed

  3. A WebSocket object has the following methods, so we can search for the corresponding method calls to locate the code:

     Method           Description
     Socket.send()    Sends data over the connection
     Socket.close()   Closes the connection
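
The search itself is usually done in the browser's developer tools, but the same idea can be sketched in Python for a JavaScript file that has been saved and formatted locally. The pattern below covers the keywords listed above; the function name find_websocket_hooks and the sample script are assumptions for this example. Because .send( and .close( are common method names, expect some false positives.

```python
import re

# Keywords that reveal WebSocket usage: constructor, event handlers, methods
WS_PATTERN = re.compile(
    r"new\s+WebSocket\s*\("
    r"|\.on(?:open|message|error|close)\s*="
    r"|\.(?:send|close)\s*\("
)


def find_websocket_hooks(js_source):
    """Return (line_number, stripped_line) for every line that matches."""
    return [
        (number, line.strip())
        for number, line in enumerate(js_source.splitlines(), start=1)
        if WS_PATTERN.search(line)
    ]


# Hypothetical script, used only to demonstrate the search
sample_js = """
var socket = new WebSocket(url);
socket.onmessage = function (e) { handle(e.data); };
var x = 1;
socket.send(token);
"""
for hit in find_websocket_hooks(sample_js):
    print(hit)
```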

Implementing WebSocket requests in Python

Next, the second question: how should a WebSocket request be implemented in Python? There are many Python libraries for connecting to WebSockets. The more commonly used and stable ones are websocket-client (synchronous), websockets (asynchronous), and aiowebsocket (asynchronous). This case uses websocket-client. The third question also matters here: the client must send data every 8 seconds, while messages from the server must be received in real time, because the captured traffic shows that the scan result is pushed immediately. If we only read data every 8 seconds, messages could be lost, and the program would respond late and run less efficiently.

The official websocket-client documentation provides a long-lived connection demo that sends data three times in a row while listening for the server's replies in real time. In it, websocket.enableTrace(True) turns on the display of connection details:

import websocket
import _thread
import time


def on_message(ws, message):
    print(message)


def on_error(ws, error):
    print(error)


def on_close(ws, close_status_code, close_msg):
    print("### closed ###")


def on_open(ws):
    def run(*args):
        for i in range(3):
            time.sleep(1)
            ws.send("Hello %d" % i)
        time.sleep(1)
        ws.close()
        print("thread terminating...")
    _thread.start_new_thread(run, ())


if __name__ == "__main__":
    websocket.enableTrace(True)
    ws = websocket.WebSocketApp(
        "ws://echo.websocket.org/", on_open=on_open,
        on_message=on_message, on_error=on_error, on_close=on_close
    )

    ws.run_forever()

Let's adapt it. In the run function, the client sends qr_token every 8 seconds while messages from the server are received in real time. When the text "扫码成功" ("scan successful") appears in a message, the oncePassword and uuid it carries are stored and the connection is closed. The logic is shown below; after that, only the QR code acquisition logic needs to be connected. (The code has been desensitized and cannot be run directly.)

import json
import time
import _thread
import websocket


web_socket_url = "wss://appcomm-user.<redacted>.com/app-commserv-user/websocket?qrToken=%s"  # host redacted
qr_token = "ca6e6cfb70de4f2f915b968aefcad404"
once_password = ""
uuid = ""


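def wss_on_message(ws, message):
    global once_password, uuid
    print("=============== [message] ===============")
    message = json.loads(message)
    print(message)
    # "扫码成功" is the server's "scan successful" text; keep it verbatim for matching.
    # .get() avoids a KeyError on messages that carry no "msg" field.
    if "扫码成功" in message.get("msg", ""):
        once_password = message["oncePassword"]
        uuid = message["uuid"]
        ws.close()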


def wss_on_error(ws, error):
    print("=============== [error] ===============")
    print(error)
    ws.close()


def wss_on_close(ws, close_status_code, close_msg):
    print("=============== [closed] ===============")
    print(close_status_code)
    print(close_msg)


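def wss_on_open(ws):
    def run(*args):
        # Send qr_token every 8 seconds to keep the connection alive
        while True:
            try:
                ws.send(qr_token)
            except websocket.WebSocketConnectionClosedException:
                # The connection has been closed, so stop the sender thread
                break
            time.sleep(8)
    _thread.start_new_thread(run, (qr_token,))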


def wss():
    # websocket.enableTrace(True)  # whether to show connection details
    ws = websocket.WebSocketApp(
        web_socket_url % qr_token, on_open=wss_on_open,
        on_message=wss_on_message, on_error=wss_on_error,
        on_close=wss_on_close
    )
    ws.run_forever()
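
The sending side of this pattern can also be sketched independently of the network. The example below is only an illustration of the idea, not the author's code: it uses threading.Event instead of time.sleep so that the 8-second loop can be stopped immediately, for example from the on_message or on_close callback. The name start_heartbeat is hypothetical, and a plain list's append method stands in for ws.send.

```python
import threading
import time


def start_heartbeat(send, payload, interval, stop_event):
    """Call send(payload) every `interval` seconds until stop_event is set."""
    def run():
        while not stop_event.is_set():
            send(payload)
            # wait() returns early as soon as stop_event is set
            stop_event.wait(interval)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


# Stand-in for ws.send, so the pattern can be tried without a server
sent = []
stop = threading.Event()
worker = start_heartbeat(sent.append, "qr-token", 0.05, stop)

time.sleep(0.2)         # the main thread stays free to receive messages
stop.set()              # e.g. called once the login result has arrived
worker.join(timeout=1)
print(len(sent), worker.is_alive())
```

Because the sender runs in its own thread, receiving is never blocked by the wait between heartbeats, which is the point of the third question above.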

Implementing QR code login

The most important part, the WebSocket request, has been solved. Once oncePassword and uuid are obtained by scanning the code, the remaining steps are relatively simple. Here is the complete process:

  1. Request the homepage to get the first set of cookies: INGRESSCOOKIE, JSESSIONID, SERVERID, and acw_tc;
  2. Request the QR code interface to get the base64 value of the QR code image and the qrToken;
  3. Establish a WebSocket connection, scan the QR code, and get the one-time password oncePassword and the uuid (the uuid does not appear to be used afterwards);
  4. Request the login interface with the one-time password; it responds with a 302 redirect and sets the second set of cookies, CASLOGC and CASTGC, and also updates SERVERID;
  5. Request the 302 redirect address from step 4 to get the third set of cookies: SESSION;
  6. Request the user information interface with the complete cookies to get the real user name and other information.
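
The way cookies accumulate across steps 1, 4, and 5 can be illustrated with the standard library alone. In the article's code this work is done by requests through response.cookies.get_dict(); the sketch below only shows the principle. The function name merge_set_cookie is hypothetical, and every cookie value is a made-up placeholder.

```python
from http.cookies import SimpleCookie


def merge_set_cookie(jar, set_cookie_headers):
    """Update jar (name -> value) from a list of raw Set-Cookie header values."""
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)
        for name, morsel in parsed.items():
            jar[name] = morsel.value
    return jar


jar = {}
# Step 1 (placeholder values): first cookies from the homepage
merge_set_cookie(jar, ["JSESSIONID=abc; Path=/; HttpOnly", "SERVERID=s1; Path=/"])
# Step 4 (placeholder values): the login response adds CASTGC and refreshes SERVERID
merge_set_cookie(jar, ["CASTGC=tgt; Path=/", "SERVERID=s2; Path=/"])
print(jar)
```

Later values overwrite earlier ones with the same name, which is how SERVERID ends up updated in step 4.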

In fact, many requests follow after the WebSocket connection ends, and all of them look relevant, but Brother K's testing shows that only the two redirects matter. The packet capture is as follows:

05.png

Full code

Follow Brother K's crawler on GitHub for more crawler-related code. Stars are welcome! https://github.com/kgepachong/

The following only demonstrates part of the key code and cannot be run directly! Complete code repository address: https://github.com/kgepachong/crawler/

Python login code

import time
import json
import base64
import _thread
import requests
import websocket
from PIL import Image


web_socket_url = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
get_login_qr_img_url = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
login_url = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
user_info_url = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"

headers = {
    "Host": "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler",
    "Pragma": "no-cache",
    "Referer": "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"
}

qr_token = ""
once_password = ""
uuid = ""
cookie = {}


def get_cookies_first():
    response = requests.get(url=login_url, headers=headers)
    global cookie
    cookie = response.cookies.get_dict()


def get_login_qr_img():
    response = requests.get(url=get_login_qr_img_url, headers=headers, cookies=cookie).json()
    qr_img = response["img"]
    global qr_token
    qr_token = response["qrToken"]
    with open('code.png', 'wb') as f:
        f.write(base64.b64decode(qr_img))
    image = Image.open('code.png')
    image.show()
    print("Please scan the QR code!")


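def wss_on_message(ws, message):
    global once_password, uuid
    print("=============== [message] ===============")
    message = json.loads(message)
    print(message)
    # "扫码成功" is the server's "scan successful" text; keep it verbatim for matching.
    # .get() avoids a KeyError on messages that carry no "msg" field.
    if "扫码成功" in message.get("msg", ""):
        once_password = message["oncePassword"]
        uuid = message["uuid"]
        ws.close()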


def wss_on_error(ws, error):
    print("=============== [error] ===============")
    print(error)
    ws.close()


def wss_on_close(ws, close_status_code, close_msg):
    print("=============== [closed] ===============")
    print(close_status_code)
    print(close_msg)


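def wss_on_open(ws):
    def run(*args):
        # Send qr_token every 8 seconds to keep the connection alive
        while True:
            try:
                ws.send(qr_token)
            except websocket.WebSocketConnectionClosedException:
                # The connection has been closed, so stop the sender thread
                break
            time.sleep(8)
    _thread.start_new_thread(run, (qr_token,))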


def wss():
    # websocket.enableTrace(True)  # whether to show connection details
    ws = websocket.WebSocketApp(
        web_socket_url % qr_token, on_open=wss_on_open,
        on_message=wss_on_message, on_error=wss_on_error,
        on_close=wss_on_close
    )
    ws.run_forever()


def get_cookie_second():
    global cookie
    params = {
        "pwd": once_password,
        "service": "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
    }
    headers["Host"] = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
    headers["Referer"] = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
    response = requests.get(url=login_url, params=params, headers=headers, cookies=cookie, allow_redirects=False)
    cookie.update(response.cookies.get_dict())
    location = response.headers.get("Location")
    return location


def get_cookie_third(location):
    global cookie
    headers["Host"] = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
    headers["Referer"] = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
    response = requests.get(url=location, headers=headers, cookies=cookie, allow_redirects=False)
    cookie.update(response.cookies.get_dict())
    location = response.headers.get("Location")
    return location


def get_login_user_info():
    headers["Host"] = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
    headers["Origin"] = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
    headers["Referer"] = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
    params = {"time": str(int(time.time() * 1000))}
    response = requests.get(url=user_info_url, headers=headers, cookies=cookie, params=params)
    print(response.text)


def main():
    # First cookie fetch: INGRESSCOOKIE, JSESSIONID, SERVERID, acw_tc
    get_cookies_first()
    # Fetch the QR code
    get_login_qr_img()
    # Scan-to-login over WebSocket; yields the one-time password
    wss()
    # Second cookie fetch: updates SERVERID, adds CASLOGC and CASTGC
    location1 = get_cookie_second()
    # Third cookie fetch: adds SESSION
    get_cookie_third(location1)
    # Fetch the logged-in user's information
    get_login_user_info()


if __name__ == '__main__':
    main()


Origin my.oschina.net/u/4585873/blog/5346839