A simple js reverse-engineering case for a Python crawler
A learning task required me to collect data with a crawler, so I learned the basics of writing crawlers in Python.
I ran into a problem as soon as I started writing the crawler program; the solution is recorded below.
Brief introduction
Requirement: crawl the transaction data of a financial blockchain website, https://www.oklink.com/btc/tx-list .
Problem: crawling starts with sending a request and reading the response data. Page analysis shows that the required data is loaded dynamically through ajax, so I chose to get the response data by sending a request directly to the ajax url. The first obstacle is the request header: the ajax request that returns the data carries a dynamically changing, obfuscated parameter similar to the following:
x-apiKey: LWIzMWUtNDU0Ny05Mjk5LWI2ZDA3Yjc2MzFhYmEyYzkwM2NjfDI3MTk0ODY0MDUwNzA2Mjk=
Solution: analyze the js files requested by the browser to find the function that generates x-apiKey, then rewrite it in Python (rewriting is optional; the js code can also be executed from Python through a suitable js library).
1. Find the ajax data package that contains the required data
- Open the website, open the browser developer tools with the shortcut Ctrl+Shift+I, select Network->XHR, and refresh the page. Only one request packet appears.
- Opening it shows the requested url:
https://www.oklink.com/api/explorer/v1/btc/transactionsNoRestrict?t=1608475589424&limit=20&offset=0
- Explanation of the parameters in the ajax url:
1. The request is a GET request.
2. t=1608475589424 is a timestamp.
3. limit=20 is the number of transactions per page.
4. offset=0 is the position of the first transaction on the page.
- The parameters contained in the request header can be found further down in the same panel.
- x-apiKey is the obfuscated parameter in the request header, and it changes on every refresh:
x-apiKey: LWIzMWUtNDU0Ny05Mjk5LWI2ZDA3Yjc2MzFhYmEyYzkwM2NjfDI3MTk1ODY3MDA1MzExODU=
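As a minimal sketch, the ajax url described above could be assembled in Python as follows. The helper name `build_tx_list_url` is hypothetical; the endpoint and the parameter names t, limit and offset are taken from the url captured above, and the site may have changed them since.

```python
import time
from urllib.parse import urlencode

# Endpoint as captured in the browser's Network panel (may have changed since).
BASE_URL = "https://www.oklink.com/api/explorer/v1/btc/transactionsNoRestrict"


def build_tx_list_url(offset=0, limit=20, timestamp_ms=None):
    """Build the ajax url (hypothetical helper, not part of the site's API)."""
    if timestamp_ms is None:
        # t is a 13-digit millisecond timestamp
        timestamp_ms = int(time.time() * 1000)
    query = urlencode({"t": timestamp_ms, "limit": limit, "offset": offset})
    return f"{BASE_URL}?{query}"


# Reproduces the url captured above when given the same timestamp.
print(build_tx_list_url(offset=0, limit=20, timestamp_ms=1608475589424))
```

Paging through the list would then presumably be a matter of increasing offset in steps of limit.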
2. Keyword positioning through browser tools
- Open Search at the top right.
- Enter x-apiKey. Three results are found, as shown in the figure.
- The third js file is the one that sets the header, so open it. Click the {} button to format the code.
- Press Ctrl+F to search for x-apiKey inside the js file, locate it, and take out the associated code segment:
function(t, e, n) {
    "use strict";
    n(115),
    n(57),
    n(20),
    n(60);
    function r(t, e) {
        for (var n = 0; n < e.length; n++) {
            var r = e[n];
            r.enumerable = r.enumerable || !1,
            r.configurable = !0,
            "value" in r && (r.writable = !0),
            Object.defineProperty(t, r.key, r)
        }
    }
    var o = new (function() {
        function t() {
            !function(t, e) {
                if (!(t instanceof e))
                    throw new TypeError("Cannot call a class as a function")
            }(this, t),
            this.API_KEY = "a2c903cc-b31e-4547-9299-b6d07b7631ab"
        }
        return function(t, e, n) {
            e && r(t.prototype, e),
            n && r(t, n)
        }(t, [{
            key: "encryptApiKey",
            value: function() {
                var t = this.API_KEY
                  , e = t.split("")
                  , n = e.splice(0, 8);
                return t = e.concat(n).join("")
            }
        }, {
            key: "encryptTime",
            value: function(t) {
                var e = (1 * t + 1111111111111).toString().split("")
                  , n = parseInt(10 * Math.random(), 10)
                  , r = parseInt(10 * Math.random(), 10)
                  , o = parseInt(10 * Math.random(), 10);
                return e.concat([n, r, o]).join("")
            }
        }, {
            key: "comb",
            value: function(t, e) {
                var n = "".concat(t, "|").concat(e);
                return window.btoa(n)
            }
        }, {
            key: "getApiKey",
            value: function() {
                var t = (new Date).getTime()
                  , e = this.encryptApiKey();
                return t = this.encryptTime(t),
                this.comb(e, t)
            }
        }]),
        t
    }())
      , i = window.utils.ont
      , c = Object.assign({}, i);
    c.interceptors.request.use(function(t) {
        return t.url.indexOf("api/explorer/v1") > -1 && (t.headers.common["x-apiKey"] = o.getApiKey()),
        t
    });
    e.a = c
}
3. Analyze related js files to find out the specific implementation
Clearly, the x-apiKey header is set to the return value of getApiKey(), called on the object o:
t.headers.common["x-apiKey"] = o.getApiKey()
1. The getApiKey() function
Find the function definition:
key: "getApiKey",
value: function() {
    var t = (new Date).getTime()
      , e = this.encryptApiKey();
    return t = this.encryptTime(t),
    this.comb(e, t)
}
- The variable t is the current time in milliseconds, and the variable e is the return value of encryptApiKey().
- t is then passed to encryptTime(), and the result is assigned back to t.
- Finally, e and t are passed as arguments to comb(), whose return value is the final x-apiKey.
2. The encryptApiKey() function
key: "encryptApiKey",
value: function() {
    var t = this.API_KEY
      , e = t.split("")
      , n = e.splice(0, 8);
    return t = e.concat(n).join("")
}
- The variable t is a fixed string, defined in the constructor above:
this.API_KEY = "a2c903cc-b31e-4547-9299-b6d07b7631ab"
- The variable e is t split into an array of single characters, and n holds the first 8 characters, which splice(0, 8) removes from e:
"a", "2", "c", "9", "0", "3", "c", "c"
- The final t is what remains of e with n appended to the end, joined back into a string. The return value is:
"-b31e-4547-9299-b6d07b7631aba2c903cc"
This value is the variable e that getApiKey() passes to comb().
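The rotation performed by encryptApiKey() can be sketched in Python with two slices. The function name `encrypt_api_key` and its `shift` parameter are hypothetical names chosen for illustration; the key string and the shift of 8 come from the js code above.

```python
API_KEY = "a2c903cc-b31e-4547-9299-b6d07b7631ab"


def encrypt_api_key(api_key, shift=8):
    """Move the first `shift` characters to the end, like splice + concat in the js."""
    return api_key[shift:] + api_key[:shift]


print(encrypt_api_key(API_KEY))
```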
3. The encryptTime() function
key: "encryptTime",
value: function(t) {
    var e = (1 * t + 1111111111111).toString().split("")
      , n = parseInt(10 * Math.random(), 10)
      , r = parseInt(10 * Math.random(), 10)
      , o = parseInt(10 * Math.random(), 10);
    return e.concat([n, r, o]).join("")
}
- t is the incoming current time (a 13-digit millisecond timestamp). Adding 1111111111111 to it and splitting the result into characters gives e, for example:
["2", "7", "1", "9", "4", "9", "2", "4", "3", "8", "8", "9", "9"]
- The variables n, r and o are random integers from 0 to 9. Appending them to e and joining everything yields a 16-digit string: the shifted timestamp followed by 3 random digits.
This value is the variable t that getApiKey() passes to comb().
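A rough Python equivalent of encryptTime() is sketched below. The names `encrypt_time` and `TIME_OFFSET` are hypothetical; the constant 1111111111111 and the three random digits follow the js code above.

```python
import random

TIME_OFFSET = 1111111111111  # constant added to the timestamp in the js encryptTime()


def encrypt_time(timestamp_ms):
    """Shift the millisecond timestamp and append three random digits (0-9)."""
    shifted = str(timestamp_ms + TIME_OFFSET)
    digits = "".join(str(random.randint(0, 9)) for _ in range(3))
    return shifted + digits


# The timestamp from the captured url; the last three digits vary per call.
print(encrypt_time(1608475589424))
```

Because of the random suffix, two calls with the same timestamp will usually give different strings, which is consistent with the header changing on every refresh.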
4. The comb() function
key: "comb",
value: function(t, e) {
    var n = "".concat(t, "|").concat(e);
    return window.btoa(n)
}
- The variable n is the two incoming parameters t and e joined with "|". Taking t to be the time string and e to be the key string gives, for example:
2719492921233667|-b31e-4547-9299-b6d07b7631aba2c903cc
- window.btoa(n) Base64-encodes n (an encoding rather than real encryption). A value produced this way, taken from a separate run so its time digits differ from the string above, is:
MjcxOTQ5MzI5ODc0MjA2NXwtYjMxZS00NTQ3LTkyOTktYjZkMDdiNzYzMWFiYTJjOTAzY2M=
- Compare it with the x-apiKey in the browser request header:
LWIzMWUtNDU0Ny05Mjk5LWI2ZDA3Yjc2MzFhYmEyYzkwM2NjfDI3MTk0ODc1Mzk0ODMwOTE=
The two values are clearly inconsistent.
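For reference, comb() can be sketched in Python with the standard base64 module. The parameter names `first` and `second` are hypothetical and deliberately neutral; the call below uses the time-first order assumed in this section, so it reproduces the self-generated value rather than the browser's.

```python
import base64


def comb(first, second):
    """Join two strings with "|" and Base64-encode the result, like window.btoa."""
    joined = f"{first}|{second}"
    return base64.b64encode(joined.encode("utf-8")).decode("ascii")


# Time string first, key string second (the order assumed in this section).
print(comb("2719493298742065", "-b31e-4547-9299-b6d07b7631aba2c903cc"))
```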
5. Find the reason for the inconsistency
- Base64-decode the x-apiKey captured in the browser:
-b31e-4547-9299-b6d07b7631aba2c903cc|2719487539483091
- The string before encoding, as produced by the process above, is:
2719492921233667|-b31e-4547-9299-b6d07b7631aba2c903cc
- The only structural difference between our string and the correct one is the order of the two parts around "|". The cause is in getApiKey(): it calls this.comb(e, t), so the key string e arrives as the first parameter of comb() and the time string t as the second.
- After swapping the order, the correct result is obtained.
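The decoding step can be reproduced in a few lines of Python. This is only a sketch for inspecting a captured header; the variable names are hypothetical and the header value is the one quoted from the browser above.

```python
import base64

# x-apiKey value captured from the browser request header
header_value = "LWIzMWUtNDU0Ny05Mjk5LWI2ZDA3Yjc2MzFhYmEyYzkwM2NjfDI3MTk0ODc1Mzk0ODMwOTE="

decoded = base64.b64decode(header_value).decode("utf-8")
key_part, time_part = decoded.split("|")

print(decoded)
print("key part: ", key_part)   # the rotated API_KEY comes first
print("time part:", time_part)  # the shifted timestamp plus 3 random digits comes second
```

Decoding a real header like this is a quick way to check the field order before writing the generator.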
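6. Rewrite the above js code in Python
import base64
import random
import time


# Generate the dynamically changing, Base64-encoded x-apiKey
def get_x_apikey():
    # API_KEY is the fixed string found in the js file
    API_KEY = "a2c903cc-b31e-4547-9299-b6d07b7631ab"
    Key1 = API_KEY[0:8]
    Key2 = API_KEY[8:]
    # Move the first 8 characters of API_KEY to the end
    new_Key = Key2 + Key1
    # Get the current time in milliseconds
    cur_time = round(time.time() * 1000)
    # Shift the timestamp by the constant used in the js code
    new_time = str(1 * cur_time + 1111111111111)
    # Generate three random integers from 0 to 9
    random1 = str(random.randint(0, 9))
    random2 = str(random.randint(0, 9))
    random3 = str(random.randint(0, 9))
    # Append the random digits to the time string
    cur_time = new_time + random1 + random2 + random3
    # Join the rotated API_KEY and the time string, key first
    this_Key = new_Key + '|' + cur_time
    # Encode the string to bytes
    n_k = this_Key.encode('utf-8')
    # Base64-encode the bytes
    x_apiKey = base64.b64encode(n_k)
    # Return the encoded x-apiKey as a str
    return str(x_apiKey, encoding='utf8')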
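As a hedged usage sketch, the generated value could be attached to a request like this. The helper `build_request` is hypothetical, it only prepares the request without sending it, and the endpoint and header name are the ones observed earlier in this article, so they may no longer be valid. The standard-library urllib is used here only to keep the sketch dependency-free.

```python
import time
import urllib.request
from urllib.parse import urlencode

# Endpoint as captured in the browser (assumption: it may have changed since).
BASE_URL = "https://www.oklink.com/api/explorer/v1/btc/transactionsNoRestrict"


def build_request(x_api_key, offset=0, limit=20):
    """Prepare (but do not send) a GET request carrying the x-apiKey header."""
    query = urlencode({"t": int(time.time() * 1000), "limit": limit, "offset": offset})
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        headers={"x-apiKey": x_api_key},
        method="GET",
    )


# Placeholder value; in practice pass the result of get_x_apikey().
req = build_request("PLACEHOLDER_KEY", offset=20)
print(req.full_url)
# Sending it would look like: urllib.request.urlopen(req, timeout=10).read()
```

Since the key embeds the current time, it is probably safest to generate a fresh value for each request instead of reusing one.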
With that, this js reverse-engineering task is complete.