Notes on reverse-engineering the dynamic key of an image site

Foreword

The target this time is an image site in a "you know what I mean" category. Registration is currently restricted and non-members cannot access it. Two days ago I happened to get an invitation code, and after getting in I found the quality decent, so I tried to crawl it. While writing the crawler I found that the site uses quite a few measures to block automated scripts (or replay attacks), which makes it a fairly representative reverse-engineering case for crawlers, so I am recording it here.

Analysis process

After logging in, the page shows a Loading animation and then loads from top to bottom. Right-click and view the page source:

<!DOCTYPE html> <html lang="zh-Hans"> <head> <title>Loading... - Poi</title> <meta charset="utf-8">.......

  

At this point it is fairly safe to conclude that this is a site that loads its resources asynchronously. The bottom of the source also references vendor.js, so it was probably built with Vue. Traditional page-element locating does not apply here; the thing to look for is the API. Because I want to integrate this into my existing system (see the previous article on the E-station crawler), the crawler's entry point is the link to an album page, so there is no need to analyze the index pages. Open Charles, click into any album, find the request for the corresponding link, and look closely at the headers and cookies.

 

 

 

First, try the Ctrl+C / Ctrl+V approach: copy the headers and cookies directly and construct an identical request. This works on quite a few sites that require login, but not on this one: the site returns a friendly error page with a message along the lines of "don't cause trouble". Evidently some fields in the cookies or headers vary with time or are generated locally by an algorithm. In fact, st in the cookies is clearly a timestamp, and the remaining fields look like IDs or tokens. To understand how these fields are generated, it is probably necessary to start the analysis from the login.

 

Judging from the capture, login consists of two steps against https://xxxx.com/auth/login: first a GET, then a POST. The POST submits the username and password serialized as JSON. The response to the GET sets cookies, assigning values to st and poi_session, and the POST carries these same two cookies (plus three Google Analytics cookies), so the cookies need no special attention. The headers, however, gain one key field:

 

 

 

The string contains two consecutive equals signs, which is the telltale mark of base64 encoding, but here they appear at the front, so the string has presumably been reversed. Reverse it and decode:
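A minimal Python sketch of this reverse-then-decode step. The helper name decode_reversed_b64 and the sample value are hypothetical; the sample is built from the placeholder bytes b"demo" and is not the captured header value.

```python
import base64


def decode_reversed_b64(value: str) -> bytes:
    """Undo a 'base64, then reverse' encoding: reverse the text, then base64-decode it."""
    return base64.b64decode(value[::-1])


# Hypothetical sample built the same way (base64 of b"demo", then reversed),
# so its '=' padding ends up at the front of the string.
sample = base64.b64encode(b"demo").decode("ascii")[::-1]
print(sample)
print(decode_reversed_b64(sample))
```

The same helper applies to any field that shows leading equals signs.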

 

Still no recognizable pattern, so it is very likely generated dynamically by JS. To crack it, we need to find the generating function and the arguments passed to it. Back in Charles: in chronological order, after the GET of the login page and before the POST, three JS files and a /env endpoint are requested. The three JS files are manifest.js, vendor.js, and app.js, which confirms that the site is bundled with webpack; each runs to thousands of lines, so set them aside for now. The request headers of the /env request contain no unusual fields, which suggests that x-api-key is generated after it. The endpoint returns JSON, and one of the more interesting fields in it is client_secret.

 

 

Looks awfully familiar... Compared with the string we base64-decoded above, the order is different but the set of characters appears to be basically the same, so there must be some connection between the two.

(In fact I did not notice this at first; it only clicked after I found the generating function later.)

/env offers no further clues, and the <script> tags in the login page itself contain nothing useful, so the function that generates x-api-key can only be in one of the three JS files. app.js is the program's entry file, so it is the reasonable place to start, and a field that is obviously home-grown encryption is unlikely to live in a third-party library. My first thought was to set breakpoints, but this is not a click event and single-stepping is basically impractical, so I tried locating the function by keyword. Searching app.js for x-api-key turned up... nothing. Searching for authtoken, however, did find the related function definitions and several call sites, and there is no reason for functions with similar purposes to be scattered across different files.

Think about it from another angle: a JS statement that adds a header will contain, besides the field name, the word "headers". So what about searching for "headers"? There are not many occurrences, and one of the hits, about three quarters of the way through the file, is this:

(function(e){return e.headers.common[atob(atob("V0MxaFVHa3RTMFY1")).toUpperCase()]=t.e()

atob is the base64 decoding function. Decode "V0MxaFVHa3RTMFY1" twice and see:
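The double decode is easy to reproduce in Python. This is just a sketch of the JavaScript expression atob(atob(...)).toUpperCase(); the variable names are mine.

```python
import base64

encoded = "V0MxaFVHa3RTMFY1"

# JavaScript: atob(atob(encoded)).toUpperCase()
once = base64.b64decode(encoded).decode("ascii")
twice = base64.b64decode(once).decode("ascii")
header_name = twice.upper()

print(once)         # WC1hUGktS0V5
print(twice)        # X-aPi-KEy
print(header_name)  # X-API-KEY
```

Encoding the name twice also explains why a plain search for x-api-key in app.js finds nothing.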

 

Found it: the decoded name, once upper-cased, is X-API-KEY, so this is the core statement that sets x-api-key! Next, look at what t and e are.

Looking upward, the nearest code involving t is this:

function() {
    var t = this,
        e = arguments.length > 0 && void 0 !== arguments[0] ? arguments[0] : 0;
    this.initUserState()
        .then((function() {
            return t.initialized = !0
        }))
        .catch((function() {
            e < 2 && setTimeout((function() {
                return t.initUser(++e)
            }), 2e3)
        }))
}

t = this, so the focus is still on e. The e in this snippet is clearly a number, not a method, so keep looking.

 

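Because variable names are heavily reused in app.js, locating e() through call relationships is difficult. But judging by the style of function definitions in this JS, the definition of e must look like this: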

e:function(){.....

Sure enough, a search finds it:

e: function() {
    var t = this.env.client_secret,
        e = this.$moment()
        .unix() + this.serverTimeOffset,
        n = (Math.pow(e, 2) + Math.pow(navigator.userAgent.length, 2))
        .toString()
        .split("")
        .map((function(e) {
            return t[e]
        }))
        .join("");
    return btoa(n)
        .split("")
        .reverse()
        .join("")
}

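This involves taking the current timestamp and the length of the browser's user-agent header, squaring both and adding them, then splitting the resulting integer into individual digits and mapping each digit to the character at that index in client_secret; the mapped string is finally base64-encoded and reversed. client_secret was already obtained earlier, so the only thing missing is serverTimeOffset. Searching for it finds the function that defines it: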

setServerTimeOffset: function() {
    var t = Math.floor((window.performance.timing.responseEnd - window.performance
        .timing.responseStart) / 1e3) || 0;
    t = t >= 0 ? t : 0, this.serverTimeOffset = Number(cookies.get("st")) +
        t - this.$moment()
        .unix()
}

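t is determined by the latency of the response. With a delay of a few hundred milliseconds, the result can simply be taken as 0 (not rigorous, but fine most of the time). So serverTimeOffset is just the st value from the cookies minus the current time. With that, all the inputs to x-api-key are known. Written in Python: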

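import base64
import time
client_secret = self.env.get("client_secret")  # from the /env JSON; this fragment lives inside the crawler class
# The network-delay term t is taken as 0, so the offset is st minus the local time.
serverTimeOffset = int(self.session.cookies.get("st")) + 0 - int(time.time())
e = int(time.time()) + serverTimeOffset  # approximately the server timestamp st
# Square and add, then map each decimal digit to the character at that index of client_secret.
n = "".join(client_secret[int(d)] for d in str(e ** 2 + len(head['user-agent']) ** 2))
x_api_key = base64.b64encode(n.encode("utf-8")).decode("utf-8")[::-1]  # base64, then reverse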
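To check the logic outside the crawler class, the same computation can be wrapped in a pure function. This is a sketch with toy inputs: the function name compute_x_api_key and the ten-letter secret are hypothetical. Note that JavaScript's Math.pow works on double-precision floats and a squared Unix timestamp exceeds 2^53, so the browser's digit string may differ from Python's exact integer in its last few digits; this sketch follows the exact-integer snippet above.

```python
import base64


def compute_x_api_key(client_secret: str, timestamp: int, ua_length: int) -> str:
    """Python port of e() from app.js, using exact integer arithmetic."""
    total = timestamp ** 2 + ua_length ** 2
    # Each decimal digit selects the character at that index of client_secret.
    mapped = "".join(client_secret[int(digit)] for digit in str(total))
    # base64-encode, then reverse, mirroring btoa(...).split("").reverse().join("").
    return base64.b64encode(mapped.encode("utf-8")).decode("ascii")[::-1]


# Toy inputs (hypothetical): 10**2 + 3**2 = 109 -> "b", "a", "j" -> "baj"
print(compute_x_api_key("abcdefghij", 10, 3))  # qFmY
```

In the crawler, timestamp would be the e value computed from st, and ua_length the length of the user-agent string actually sent.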

That completes the analysis of how x-api-key is constructed. Next comes the album detail page.


 

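The headers and cookies of the detail page have nothing special about them; sentinel and auth_token are added via Set-Cookie during the login POST and the GET of the index page, respectively.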

The detail page is likewise loaded asynchronously. The content API is shown in the figure below and is requested with GET.

 

 

 

Besides x-api-key, the headers now include authorization, whose value is simply "Bearer " + auth_token. Easy enough, but some of the data in the returned JSON is not plaintext:

 

 

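The equals signs are at the front, so reverse and decode without hesitation, and out comes the title. If reversing had not occurred to you, searching app.js for "encrypt" or "title" also turns up the definitions of the encryption and decryption functions; the idea is the same as above.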

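The list of image resources is also in this JSON, stored in plaintext. The given addresses cannot be used to download the images directly, but after extracting the identifying code with a regular expression, the real image address can be assembled.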

 

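The last pitfall is the heartbeat. Every page of this site sends a heartbeat request to /heartbeat every 120 s. I paid no attention to it at first and only later realized that the heartbeat updates the st value in the cookies. x-api-key is computed from st, and it has to be recomputed for every request that carries the x-api-key field! If st is more than 120 seconds behind the current time, the computed x-api-key is invalid. The symptom is that after downloading one comic (which usually takes more than two minutes), accessing a new page returns 401. The fix does not actually require sending a heartbeat every two minutes: it is enough to send one a few seconds before requesting a new page so that st gets refreshed.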
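One possible way to decide when a refresh is needed, as a variation on simply sending a heartbeat before every new page. This is only a sketch: the function name needs_heartbeat and the margin parameter are my own assumptions, and only the 120-second interval comes from the observation above.

```python
import time
from typing import Optional

HEARTBEAT_INTERVAL = 120  # seconds; the interval observed between /heartbeat requests


def needs_heartbeat(st: int, now: Optional[float] = None, margin: int = 10) -> bool:
    """Return True if st is old enough that it should be refreshed before the next request.

    margin is a hypothetical safety buffer, not a value taken from the site.
    """
    if now is None:
        now = time.time()
    return now - st >= HEARTBEAT_INTERVAL - margin


# st was set 115 seconds ago: refresh first.
print(needs_heartbeat(1000, now=1115))
# st was set 50 seconds ago: still fresh.
print(needs_heartbeat(1000, now=1050))
```

When the check returns True, the crawler would request /heartbeat, read the new st from the cookies, and recompute x-api-key before the real request.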


 

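Finally, my crawler design is as follows: use a session from the requests package to maintain cookies across requests; on the first crawl, log in by simulating the form submission; after each GET of a new page completes, serialize the session object and save it to local disk (with the pickle package); the next time the program starts, if a saved file is detected, load the session straight into memory and skip the login. In testing, this crawler downloads the selected resources stably.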
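The save-and-restore part can be sketched with the standard library alone. The function names and the file path are hypothetical; in the crawler the object passed in would be the requests session, but any picklable object works with this sketch.

```python
import os
import pickle


def save_session(session, path: str) -> None:
    """Serialize the session object to disk after a page has been fetched."""
    with open(path, "wb") as fh:
        pickle.dump(session, fh)


def load_session(path: str):
    """Return the saved session, or None if no file exists and a fresh login is needed."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

At startup the crawler would call load_session first and fall back to the login flow only when it returns None. A restored session may carry a stale st, so the heartbeat refresh still applies.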

Summary

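Reverse-engineering this site took a day. I am not a professional and my technique is rather rough. If there are takeaways, they are these: for sites designed with separated front end and back end, pay attention to the order of packets when capturing; when locating JS functions, those with similar purposes are often written close together; when a field cannot be found, try searching for its base64-encoded form; and observe carefully.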

 


Origin www.cnblogs.com/qjfoidnh/p/12329621.html