1. Foreword
Reverse engineering is genuinely challenging. I regularly see all kinds of tricks shared in crawler and reverse-engineering groups, and in the crawler field we only get better by sharing more. The encryption used by the website discussed in this issue is relatively difficult. While looking for a case to write up, I was also studying the ideas and techniques of more experienced people. There is no shortcut to learning reverse engineering; we can only rely on our own analysis and accumulated experience.
Target site (Base64-encoded):
aHR0cHM6Ly9xaWthbi5jcXZpcC5jb20vUWlrYW4vSm91cm5hbC9TdW1tYXJ5P2tpbmQ9MSZnY2g9OTUyNDNYJmZyb209UWlrYW5fSm91cm5hbF9TdW1tYXJ5
2. Target Analysis
First, open the case website and click to turn the page. The main goal this time is to analyze the encryption applied to the content that is loaded on each page turn. The search feature of this website is also encrypted. More data can be obtained through the page-turning URL.
Many people online have already analyzed the search area of this website. Some extracted the encryption code and ran it themselves, and some used RPC.
Opening this URL in another browser returns no data. The VIP journal website not only encrypts this parameter but also applies extra handling so that the data cannot be fetched directly from a copied URL; the request has to come from the site's own page. This protects the journal data to some extent and serves as an anti-crawling measure.
Fixed part:
https://*.com/Journal/RightArticle?
Encrypted part:
X2sCXRB4=0IsPQ0alqEtID3EakumIk.NZiLQ9yQeCdROMNRaLtSO0U74P8MpIcElFncx8UMr8l36GA6zSqfi_DEzmmnT2wbnwsgTuYbvql
After turning pages several times, it turns out that the first part is fixed while the encrypted part changes on every request. The string X2sCXRB4 at the start of the encrypted part is also fixed.
Evidently the URL is encrypted into a ciphertext, which is then appended to the fixed part to form the complete request URL.
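The observation above can be checked mechanically. The sketch below splits a page-turning URL into its fixed prefix, the fixed parameter name and the changing ciphertext. The function name splitEncryptedUrl is made up for illustration, and the sample URL keeps the masked host used in this article.

```javascript
// Hypothetical helper: split a page-turning URL into the three parts described above.
function splitEncryptedUrl(url) {
  const q = url.indexOf("?");
  if (q < 0) return null;               // no query string at all
  const prefix = url.slice(0, q + 1);   // fixed part, including the "?"
  const query = url.slice(q + 1);
  const eq = query.indexOf("=");
  if (eq < 0) return null;              // no "name=value" pair
  return {
    prefix: prefix,
    token: query.slice(0, eq),          // fixed name, X2sCXRB4 in this case
    cipher: query.slice(eq + 1),        // changes on every request
  };
}

const sample =
  "https://*.com/Journal/RightArticle?" +
  "X2sCXRB4=0IsPQ0alqEtID3EakumIk.NZiLQ9yQeCdROMNRaLtSO0U74P8MpIcElFncx8UMr8l36GA6zSqfi_DEzmmnT2wbnwsgTuYbvql";
const parts = splitEncryptedUrl(sample);
console.log(parts.prefix, parts.token, parts.cipher.length);
```

Comparing the token and cipher fields across several captured requests is a quick way to confirm which part is constant and which part changes.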
The website implements page turning with Ajax. However the Ajax call is wrapped here, underneath it goes through XMLHttpRequest, which is what lets the page send a request and receive data without refreshing.
Triggering a page turn returns data as follows:
Click to turn the page -> some initialization -> encrypt and assemble the complete URL -> send to the server -> server handles the request and returns data -> render into the web page
The next step is to extract the code that encrypts and assembles the URL and call it ourselves, building complete page-turning URLs so that more data can be fetched continuously.
On many websites the encryption code is obfuscated, just as valuable data usually sits behind an account login (simulated login schemes will be explained in detail later), so a direct keyword search generally does not find it.
On this website the encryption code cannot be located directly and there are no searchable keywords, so we can only start from the request itself.
Via XHR breakpoints:
Because every page turn requests Journal/RightArticle, add an XHR breakpoint on that string. Clicking another page then pauses execution at the Ajax request.
Of course, you can also use the Initiator column.
With XMLHttpRequest, send is the last step of communicating with the server: it is where the request for the URL actually goes out, so it is the frame to locate in the call stack. The function at the far end of the stack is the encryption function, but since encryption functions are usually obfuscated, it is not a good place to start. It is better to analyze the logic around send first.
From this request flow it follows that the encryption must happen before send is called.
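As a complement to the XHR breakpoint, the same information can be collected with a small hook. The sketch below wraps open on an XMLHttpRequest-like constructor and records the call stack whenever the URL contains a given string. The names installXhrTrace and FakeXHR and the page=2 parameter are illustrative assumptions; FakeXHR only stands in for the browser class so the snippet also runs under Node.

```javascript
// Wrap open() on an XMLHttpRequest-like constructor and record matching calls.
// installXhrTrace and FakeXHR are illustrative names, not part of the site.
function installXhrTrace(XhrCtor, needle, sink) {
  const originalOpen = XhrCtor.prototype.open;
  XhrCtor.prototype.open = function (method, url) {
    if (String(url).indexOf(needle) !== -1) {
      // The stack shows who called open(), much like the Initiator column.
      sink.push({ method: method, url: String(url), stack: new Error().stack });
    }
    return originalOpen.apply(this, arguments);
  };
  // Return an undo function so the hook can be removed again.
  return function restore() {
    XhrCtor.prototype.open = originalOpen;
  };
}

// Stand-in for the browser class so the sketch also runs under Node.
function FakeXHR() { this.opened = null; }
FakeXHR.prototype.open = function (method, url) { this.opened = method + " " + url; };

const hits = [];
const restore = installXhrTrace(FakeXHR, "Journal/RightArticle", hits);
const xhr = new FakeXHR();
xhr.open("get", "/Journal/RightArticle?page=2");   // hypothetical parameter
xhr.open("get", "/static/logo.png");
restore();
console.log(hits.length, hits[0].url);
```

In a browser console one would pass XMLHttpRequest instead of FakeXHR and inspect the recorded stack entries after turning a page.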
The basic XMLHttpRequest flow:
var xhr = new XMLHttpRequest();
xhr.open('get', 'http://*.com...', false);
xhr.send(data);
Locate the send function:
jquery.js is a third-party library, so the encryption function is generally not written there. Since the URL is handed to open before send is called, the working assumption from the analysis above is that the website has rewritten the open function.
To analyze open, set a breakpoint on the line where open is called and turn the page again:
Hover over open and you can see that it has been rewritten. Click the link to jump to the rewritten function:
Step through the code and enter n.apply(this, arguments)
This function is likely to be the entry to the encryption. Keep stepping and check the console.
The console output confirms the guess that this call is the encryption entry:
_$8f(arguments[1])
Replaying the request with the encrypted URL returns the same result as the page itself, which confirms that this encrypted URL is the one we want to reverse.
arguments[1] is clearly a plaintext string. After passing through the function _$8f it becomes an object, and the _$bt value in that object is the encrypted URL.
return _$6y[_$V8[36]](this, arguments)
_$6y in this line of code is the original open function.
This confirms that open has been rewritten and that the wrapper ultimately still calls the original open.
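The rewritten open described above follows a common wrapper pattern, sketched below. All readable names are illustrative, fakeEncryptUrl is only a placeholder for _$8f, and reading _$V8[36] as "apply" is an assumption based on the n.apply(this, arguments) call seen while stepping.

```javascript
// Sketch of the "rewritten open" pattern. All readable names are illustrative.
function FakeXHR() { this.requested = null; }
FakeXHR.prototype.open = function (method, url) { this.requested = url; };

// Placeholder for _$8f: returns an object whose `encrypted` field plays the role of _$bt.
function fakeEncryptUrl(url) {
  return { original: url, encrypted: url + "?X2sCXRB4=<ciphertext>" };
}

const nativeOpen = FakeXHR.prototype.open;       // role of _$6y
FakeXHR.prototype.open = function () {
  const wrapped = fakeEncryptUrl(arguments[1]);  // role of _$8f(arguments[1])
  arguments[1] = wrapped.encrypted;              // swap in the encrypted URL
  return nativeOpen.apply(this, arguments);      // assumed meaning of _$6y[_$V8[36]](this, arguments)
};

const xhr = new FakeXHR();
xhr.open("get", "/Journal/RightArticle");
console.log(xhr.requested);
```

The caller still passes a plaintext URL, and only the wrapper sees both forms, which is why a breakpoint inside the rewritten open is a convenient place to observe the encryption.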
Analyzing the encryption function _$8f
function _$8f(_$H_, _$wH) {
    var _$Pv, _$pq = null;
    var _$8_ = _$H_;
    function _$Rv(_$tx, _$65) {
        var _$xW = [];
        var _$DY = '';
        var _$dF = _$u3(_$4i());
        _$xW = _$xW[_$V8[9]](_$65, _$tx, _$wH || 0, _$dF);
        var _$eq = _$4b(923, _$2l[186], true, _$xW);
        var _$Q6 = _$rW + _$eq;
        _$pq = _$Tt(_$XH(_$Q6), _$2l[27]);
        return _$hT[_$V8[5]](_$DY, _$ph, _$V8[26], _$Q6);
    }
    function _$_L(_$tx) {
        if (_$tx._$V8) {
            var _$xW = _$ht(_$ht(_$tx._$wa, _$V8[38])[0], _$V8[78])[1];
            if (_$xW[_$V8[3]](_$Z_) >= 0 && _$xW[_$V8[3]](_$ph) >= 0) {
                return true;
            }
        }
        return false;
    }
    function _$Hc() {
        try {
            if (typeof _$H_ !== _$V8[0])
                _$H_ += '';
            _$Pv = _$wa(_$H_);
            if (_$_L(_$Pv)) {
                return;
            }
            if (_$45) {
                _$H_ = _$MJ(_$H_, _$Pv);
            }
        } catch (_$xW) {
            return;
        }
        if (_$Pv === null || _$Pv._$uF > _$2l[40]) {
            _$4b(953, _$2l[186]);
            return;
        }
        if (_$_u(_$Pv)) {
            _$4b(953, _$2l[186]);
            return;
        }
        _$H_ = _$Pv._$iB + _$Pv._$fp;
        var _$DY = _$s3(_$Pv);
        var _$dF = _$DY ? _$V8[78] + _$DY : '';
        var _$eq = _$3v(_$zB(_$iL(_$Pv._$nu + _$dF)));
        var _$Q6 = 0;
        if (_$Pv._$9h) {
            _$Q6 |= 1;
        }
        if (_$bP & _$2l[49]) {
            _$Q6 |= _$2l[40];
        }
        _$H_ += _$V8[78] + _$Rv(_$Q6, _$eq, _$wH);
        if (_$DY.length > 0) {
            if (_$24 && _$24 <= _$2l[149]) {
                _$H_ = _$e7(_$H_);
            }
            if (!(_$bP & _$2l[9])) {
                _$DY = _$e7(_$DY);
            }
            _$DY = _$V8[66] + _$zL(_$DY, _$pq, _$2l[40]);
        }
        _$H_ += _$DY;
    }
    function _$xa(_$tx) {
        _$5H(_$2l[27], _$Ka());
        if (_$pq === null || _$lh(_$Pv) === false) {
            return _$tx;
        }
        if (typeof _$tx === _$V8[0] || typeof _$tx === _$V8[447] || typeof _$tx === _$V8[347]) {
            _$tx = '' + _$tx;
            if (_$tx.length <= _$pC) {
                _$tx = _$zL(_$tx, _$pq, _$2l[178]);
            }
        }
        return _$tx;
    }
    function _$BZ() {
        return _$pq !== null;
    }
    function _$dF(_$tx, _$65) {
        if ((_$tx === 'get' || _$tx === _$V8[106]) && _$BZ() && (_$4o & 1) && (_$bP & _$2l[49]) && _$Pv && _$Pv._$uF < _$2l[178] && _$P4(_$Pv)) {
            if (_$Pv._$9h) {
                this._$Tt = true;
            } else {
                if (_$65 === _$Sc || _$65 === null || _$65 === '') {
                    _$65 = _$V8[105];
                }
                if (_$65 === _$V8[105]) {
                    this._$Tt = true;
                    return _$65;
                }
            }
        }
        return '';
    }
    _$Hc();
    return {
        _$ni: _$8_,
        _$bt: _$H_,
        _$rI: _$xa,
        _$K$: _$dF,
        _$Wu: _$Qw,
        _$Tt: false
    };
}
Several functions and variables are defined inside _$8f, and _$Hc() is called at the end. Why is it called here? As seen above, _$8f returns a value that contains the encrypted URL, so the call to _$Hc() is most likely what generates the encrypted parameter.
Finally, an object is returned. It has a key _$bt, and the value of this key turned out to be the encrypted URL. In other words, _$H_ holds the encrypted URL.
Where does this _$H_ come from? Within _$8f, only _$Hc() is actually called; the other functions are merely declared. So the URL value in _$H_ must be generated inside _$Hc().
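The structure just described is easier to see with readable names. The skeleton below mirrors only the shape of _$8f: an outer function, inner functions that share its variables, a single call, and a returned object. The names wrapUrl, makeParam and build are invented for illustration, and the "cipher" is a deliberately fake placeholder, not the site's algorithm.

```javascript
// Readable skeleton of the shape of _$8f. Names and the fake cipher are illustrative.
function wrapUrl(url) {
  const original = url;   // role of _$8_
  let current = url;      // role of _$H_, mutated by the inner function

  function makeParam(path) {          // role of _$Rv
    // Placeholder only: the real site runs its obfuscated routine here.
    return "X2sCXRB4=" + "fake" + path.length;
  }

  function build() {                  // role of _$Hc
    current += "?" + makeParam(current);
  }

  build();                            // the only inner function that is called
  return { original: original, encrypted: current };  // roles of _$ni and _$bt
}

const result = wrapUrl("/Journal/RightArticle");
console.log(result.original, result.encrypted);
```

Because build is the only inner function invoked before the return, it is the only place where the returned URL can have been modified, which is the same reasoning applied to _$Hc() above.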
Analyzing the _$Hc() function
What actually generates the url in this function is the following line of code:
_$H_ += _$V8[78] + _$Rv(_$Q6, _$eq, _$wH)
_$V8[78] is a question mark, and the _$Rv call that follows is what actually generates the URL's encrypted parameter.
It turns out that _$Rv, the function that generates the encrypted parameter, is the first function defined inside the large function _$8f above.
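Lookups such as _$V8[78] are string-table references, and replacing them with their literal values makes the code much easier to read. The sketch below does this with a regular expression. The helper name inlineStringTable is made up, and the table passed in contains only the one entry confirmed above (index 78 is "?"); any other entries would have to be read from the debugger.

```javascript
// Replace string-table lookups such as _$V8[78] with their literal values.
// Only indexes present in `table` are replaced; everything else is left as-is.
function inlineStringTable(source, tableName, table) {
  // Escape regex metacharacters in the table name ("$" in particular).
  const escaped = tableName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(escaped + "\\[(\\d+)\\]", "g");
  return source.replace(pattern, function (match, index) {
    return Object.prototype.hasOwnProperty.call(table, index)
      ? JSON.stringify(table[index])
      : match;                        // unknown index: leave it untouched
  });
}

const line = "_$H_ += _$V8[78] + _$Rv(_$Q6, _$eq, _$wH)";
console.log(inlineStringTable(line, "_$V8", { 78: "?" }));
// prints: _$H_ += "?" + _$Rv(_$Q6, _$eq, _$wH)
```

Since the obfuscated names change between page loads, the table name is passed as a parameter rather than hard-coded.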
Analyzing the _$Rv function
In this function, _$4b is the call that does the real encryption. The code after it joins the pieces together with concat:
return _$hT[_$V8[5]](_$DY, _$ph, _$V8[26], _$Q6)
_$ph is:
X2sCXRB4
Isn't this the string that comes right after the question mark in the URL? This also supports the conclusion that the function actually generating the encrypted string is _$4b.
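One plausible reading of the return statement in _$Rv is a plain string concatenation. The sketch below assumes that _$hT is String.prototype.concat, that _$V8[5] is "call" and that _$V8[26] is "="; none of these three values is shown in the article, so treat them as guesses to be verified in the debugger. The name joinParam is made up.

```javascript
// Assumed reading of: return _$hT[_$V8[5]](_$DY, _$ph, _$V8[26], _$Q6)
// i.e. String.prototype.concat.call("", token, "=", payload).
function joinParam(token, payload) {
  return String.prototype.concat.call("", token, "=", payload);
}

console.log(joinParam("X2sCXRB4", "<ciphertext>"));  // prints: X2sCXRB4=<ciphertext>
```

Under that reading, _$ph supplies the fixed name and _$Q6 the changing value, which matches the URL layout observed earlier.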
Stepping into _$4b reveals thousands of lines of code. _$4b is anti-crawling code obfuscated with control-flow flattening.
The URL's encrypted parameter is generated inside this flattened code. A simplified version of _$4b looks like this:
var _$1V, _$IF, _$f4 = _$P2, _$je = _$oY[0];
function _$4b(_$xI, _$H_, _$wH, _$_M) {
    function _$bc() {
    }
    function _$Lp() {
    }
    function _$gd() {
    }
    function _$q9() {
    }
    function _$xQ() {
    }
    function _$Po() {
    }
    var _$Pv, _$Xl, _$Rv, _$BZ, _$_L, _$pq, _$gy, _$xp, _$Xt, _$xW, _$8_, _$cy, _$iO, _$Xi, _$bX, _$DY, _$dF, _$Q6, _$p4, _$aB, _$9P, _$eq, _$zH, _$Hc, _$ss, _$xq, _$Hu, _$ya, _$xa, _$hI;
    var _$LP, _$vc, _$Rb = _$xI, _$eS = _$oY[1];
    while (1) {
        _$vc = _$eS[_$Rb++];
        if (_$vc < 256) {
            ...
        }
    }
}
(A large number of if branches are omitted here.)
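To make the shape of the flattened code above more concrete, here is a toy dispatcher built the same way: an array of opcodes, a position counter and a while loop with nested if ranges. The opcodes and what they do are invented for this example and have nothing to do with the site's real instruction set.

```javascript
// Toy illustration of a flattened dispatcher: an opcode array drives a while loop.
// Opcodes and their meaning are invented for this example.
function runFlattened(opcodes, start, input) {
  let pc = start;        // role of _$Rb, the position in the opcode array
  let acc = input;
  while (true) {
    const op = opcodes[pc++];   // role of _$vc = _$eS[_$Rb++]
    if (op < 2) {
      if (op === 0) { acc = acc + 1; }
      else { acc = acc * 2; }
    } else if (op < 4) {
      if (op === 2) { pc = opcodes[pc]; }   // jump: next cell holds the target
      else { return acc; }                  // op 3: leave the loop
    } else {
      throw new Error("unknown opcode " + op);
    }
  }
}

// Same code array, different entry points, different behaviour.
const program = [0, 1, 3, 1, 2, 0];
console.log(runFlattened(program, 0, 5));  // prints: 12
console.log(runFlattened(program, 3, 5));  // prints: 22
```

The second call shows why the first argument matters: in the simplified _$4b, _$xI becomes the starting position _$Rb, so calls such as _$4b(923, ...) and _$4b(953, ...) enter the same loop at different points.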
The specific encryption code has now been located. What remains is extracting this obfuscated code. The flattened control-flow structure is the hard part, and time is limited, so I will cover it in a later post.
If you use RPC, you don't need to extract the code at all: just find the entry to the encryption function and call it directly.
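The RPC idea can be sketched without committing to a transport. Below, a handler receives a JSON message, looks up a registered function and returns a JSON reply. In practice the page would receive such messages over some channel (a WebSocket is a common choice) and the registered function would call the real entry point. The names createRpcHandler and encryptUrl and the message format are assumptions made for this example.

```javascript
// Transport-agnostic RPC handler: the page registers its encryption entry and
// answers requests that arrive over some channel (for example a WebSocket).
function createRpcHandler(registry) {
  return function handle(message) {
    let req;
    try {
      req = JSON.parse(message);
    } catch (e) {
      return JSON.stringify({ id: null, error: "bad json" });
    }
    if (req === null || typeof req !== "object") {
      return JSON.stringify({ id: null, error: "bad request" });
    }
    const known = Object.prototype.hasOwnProperty.call(registry, req.method);
    if (!known || typeof registry[req.method] !== "function") {
      return JSON.stringify({ id: req.id, error: "unknown method" });
    }
    try {
      const result = registry[req.method].apply(null, req.params || []);
      return JSON.stringify({ id: req.id, result: result });
    } catch (e) {
      return JSON.stringify({ id: req.id, error: String(e && e.message) });
    }
  };
}

// Stand-in for the real entry point (in the page this would call _$8f(url)._$bt).
const handle = createRpcHandler({
  encryptUrl: function (url) { return url + "?X2sCXRB4=<ciphertext>"; },
});
const reply = handle('{"id":1,"method":"encryptUrl","params":["/Journal/RightArticle"]}');
console.log(reply);
```

The crawler then only needs to send the plaintext URL and read back the encrypted one, leaving the obfuscated code to run untouched in the browser.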
Interested readers can try this website themselves. Note that the function and variable names shown in this article change dynamically: the obfuscated code is regenerated every time the page is refreshed in the browser, so the names you see will differ from the ones here.
That's all for this issue. Writing these posts takes effort, so please leave a like before you go. Your support keeps me writing, and I hope to bring you more high-quality articles.