Scraping Tmall Product Details and Reviews with Python (Including sign Encryption Analysis)

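[Author Homepage]: Wu Qiulin (吴秋霖)
[About the Author]: A top-rated Python creator, Alibaba Cloud blog expert, and Huawei Cloud expert, with a long-standing focus on research and development in Python and web scraping.
[Author's Recommendations]: If you are interested in JS reverse engineering, follow 《爬虫JS逆向实战》 (Practical JS Reverse Engineering for Crawlers); if you are interested in distributed crawler platforms, follow 《分布式爬虫平台搭建与开发实战》 (Building and Developing a Distributed Crawler Platform).
More series on CAPTCHA bypassing, app reverse engineering, and Python in general will keep coming.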

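1. API Analysis

First, I opened a product from a meat shop that I bought during last Double Eleven, opened the developer tools, and clicked the product's reviews. The request is sent with the following parameters:

[Screenshot: parameters of the review request]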

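At a glance, sign looks like the only parameter that is generated rather than passed through as-is. A 32-character value will make anyone with some experience guess MD5 right away (strictly speaking a hash, not encryption). I covered techniques for locating JS encryption functions in an earlier article, which is worth a read: 《JS逆向中快速搜索定位加密函数技巧总结》 (A Summary of Techniques for Quickly Locating Encryption Functions in JS Reverse Engineering).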
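
The "32 characters, probably MD5" heuristic can be written down as a quick check. This is only a minimal sketch: the helper name looks_like_md5 is made up for illustration, and a match is a hint, not proof, since any 128-bit value printed as hex has the same shape.

```python
import re

# 32 hexadecimal characters: the shape of an MD5 digest printed as hex.
MD5_HEX_PATTERN = re.compile(r"[0-9a-fA-F]{32}")


def looks_like_md5(value: str) -> bool:
    """Return True if value has the shape of an MD5 hex digest (hypothetical helper)."""
    return MD5_HEX_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    # The MD5 of the empty string, used here only as a sample 32-character value.
    print(looks_like_md5("d41d8cd98f00b204e9800998ecf8427e"))
```

A positive result only narrows the search; the breakpoint analysis in the next section is what actually confirms the algorithm.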

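Searching the code for MD5-related keywords turns up a function h that looks like a stock MD5 implementation, as shown below:

[Screenshot: the h function found by searching]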

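2. Breakpoint Analysis

Experience only gets you so far; the guess still has to be verified in practice. To confirm it, let's dig a little deeper. Do a global search for the request parameter names; with a bit of luck this saves a lot of time. Once the sign parameter is found, set a breakpoint there and refresh to trigger the request again, as shown below:

[Screenshot: breakpoint at the place where sign is assigned]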

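OK, j is the value we are after. Looking a little further up at how j is produced, h appears to be the hash function. Leave h aside for now and look at the plaintext before hashing first: take the expression passed to h and run it in the console, as shown below:

[Screenshot: the argument of h evaluated in the console]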

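That is the plaintext before hashing. Passing it through h in the console gives exactly the sign value. We will come back to h later; what matters now is what the plaintext is made of. token appears to be a fixed value:

[Screenshot: the value of token]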

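i is a 13-digit (millisecond) timestamp, g is the appKey parameter from the request (fixed), and data is also a request parameter that can be read off directly.

Debugging confirms that the string passed to MD5 is a concatenation of these parameters, summarized below: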

var token = ''; // fixed
var data = '{"itemId":"746109770909","bizCode":"ali.china.tmall","channel":"pc_detail","pageSize":20,"pageNum":2}'; // review parameters: product ID, page number, and so on
var timestamp = 1700909312000; // (new Date()).getTime(), millisecond timestamp
var appKey = ''; // fixed; it is a constant in the JS and stays the same across requests

var concatenatedString = token + "&" + timestamp + "&" + appKey + "&" + data; // concatenate

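Hash the concatenated string with a standard MD5 implementation and compare the result with the sign generated by the page. If the two match, the page's MD5 is the stock algorithm and has not been modified.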
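
The comparison itself takes only a few lines of Python. This is a minimal sketch, assuming you have copied the plaintext and the page-generated sign out of the console yourself; the function names are made up for illustration.

```python
import hashlib


def standard_md5_hex(plaintext: str) -> str:
    """Hash the UTF-8 bytes of plaintext with the standard-library MD5."""
    return hashlib.md5(plaintext.encode("utf-8")).hexdigest()


def is_unmodified_md5(plaintext: str, observed_sign: str) -> bool:
    """Compare a standard MD5 of plaintext with the sign observed in the browser."""
    return standard_md5_hex(plaintext) == observed_sign.strip().lower()


if __name__ == "__main__":
    # Sanity check against the well-known MD5 test vector for "abc".
    print(is_unmodified_md5("abc", "900150983cd24fb0d6963f7d28e17f72"))
    # For the real check, paste the console plaintext and the page's sign here.
```

One difference to keep in mind: the page's h replaces "\r\n" with "\n" before hashing, which this sketch does not do. For a plaintext without Windows line endings the two should agree.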

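3. Implementing the Algorithm

We now know every component of the plaintext, so let's go back to h. Hover over it and click through to its definition, as shown below:

[Screenshot: jumping to the definition of h]

After the jump, h turns out to be nothing more than a stock MD5 implementation. The code is as follows: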

function h(a) {
    function b(a, b) {
        return a << b | a >>> 32 - b
    }
    function c(a, b) {
        var c, d, e, f, g;
        return e = 2147483648 & a,
        f = 2147483648 & b,
        c = 1073741824 & a,
        d = 1073741824 & b,
        g = (1073741823 & a) + (1073741823 & b),
        c & d ? 2147483648 ^ g ^ e ^ f : c | d ? 1073741824 & g ? 3221225472 ^ g ^ e ^ f : 1073741824 ^ g ^ e ^ f : g ^ e ^ f
    }
    function d(a, b, c) {
        return a & b | ~a & c
    }
    function e(a, b, c) {
        return a & c | b & ~c
    }
    function f(a, b, c) {
        return a ^ b ^ c
    }
    function g(a, b, c) {
        return b ^ (a | ~c)
    }
    function h(a, e, f, g, h, i, j) {
        return a = c(a, c(c(d(e, f, g), h), j)),
        c(b(a, i), e)
    }
    function i(a, d, f, g, h, i, j) {
        return a = c(a, c(c(e(d, f, g), h), j)),
        c(b(a, i), d)
    }
    function j(a, d, e, g, h, i, j) {
        return a = c(a, c(c(f(d, e, g), h), j)),
        c(b(a, i), d)
    }
    function k(a, d, e, f, h, i, j) {
        return a = c(a, c(c(g(d, e, f), h), j)),
        c(b(a, i), d)
    }
    function l(a) {
        for (var b, c = a.length, d = c + 8, e = (d - d % 64) / 64, f = 16 * (e + 1), g = new Array(f - 1), h = 0, i = 0; c > i; )
            b = (i - i % 4) / 4,
            h = i % 4 * 8,
            g[b] = g[b] | a.charCodeAt(i) << h,
            i++;
        return b = (i - i % 4) / 4,
        h = i % 4 * 8,
        g[b] = g[b] | 128 << h,
        g[f - 2] = c << 3,
        g[f - 1] = c >>> 29,
        g
    }
    function m(a) {
        var b, c, d = "", e = "";
        for (c = 0; 3 >= c; c++)
            b = a >>> 8 * c & 255,
            e = "0" + b.toString(16),
            d += e.substr(e.length - 2, 2);
        return d
    }
    function n(a) {
        a = a.replace(/\r\n/g, "\n");
        for (var b = "", c = 0; c < a.length; c++) {
            var d = a.charCodeAt(c);
            128 > d ? b += String.fromCharCode(d) : d > 127 && 2048 > d ? (b += String.fromCharCode(d >> 6 | 192),
            b += String.fromCharCode(63 & d | 128)) : (b += String.fromCharCode(d >> 12 | 224),
            b += String.fromCharCode(d >> 6 & 63 | 128),
            b += String.fromCharCode(63 & d | 128))
        }
        return b
    }
    var o, p, q, r, s, t, u, v, w, x = [], y = 7, z = 12, A = 17, B = 22, C = 5, D = 9, E = 14, F = 20, G = 4, H = 11, I = 16, J = 23, K = 6, L = 10, M = 15, N = 21;
    for (a = n(a),
    x = l(a),
    t = 1732584193,
    u = 4023233417,
    v = 2562383102,
    w = 271733878,
    o = 0; o < x.length; o += 16)
        p = t,
        q = u,
        r = v,
        s = w,
        t = h(t, u, v, w, x[o + 0], y, 3614090360),
        w = h(w, t, u, v, x[o + 1], z, 3905402710),
        v = h(v, w, t, u, x[o + 2], A, 606105819),
        u = h(u, v, w, t, x[o + 3], B, 3250441966),
        t = h(t, u, v, w, x[o + 4], y, 4118548399),
        w = h(w, t, u, v, x[o + 5], z, 1200080426),
        v = h(v, w, t, u, x[o + 6], A, 2821735955),
        u = h(u, v, w, t, x[o + 7], B, 4249261313),
        t = h(t, u, v, w, x[o + 8], y, 1770035416),
        w = h(w, t, u, v, x[o + 9], z, 2336552879),
        v = h(v, w, t, u, x[o + 10], A, 4294925233),
        u = h(u, v, w, t, x[o + 11], B, 2304563134),
        t = h(t, u, v, w, x[o + 12], y, 1804603682),
        w = h(w, t, u, v, x[o + 13], z, 4254626195),
        v = h(v, w, t, u, x[o + 14], A, 2792965006),
        u = h(u, v, w, t, x[o + 15], B, 1236535329),
        t = i(t, u, v, w, x[o + 1], C, 4129170786),
        w = i(w, t, u, v, x[o + 6], D, 3225465664),
        v = i(v, w, t, u, x[o + 11], E, 643717713),
        u = i(u, v, w, t, x[o + 0], F, 3921069994),
        t = i(t, u, v, w, x[o + 5], C, 3593408605),
        w = i(w, t, u, v, x[o + 10], D, 38016083),
        v = i(v, w, t, u, x[o + 15], E, 3634488961),
        u = i(u, v, w, t, x[o + 4], F, 3889429448),
        t = i(t, u, v, w, x[o + 9], C, 568446438),
        w = i(w, t, u, v, x[o + 14], D, 3275163606),
        v = i(v, w, t, u, x[o + 3], E, 4107603335),
        u = i(u, v, w, t, x[o + 8], F, 1163531501),
        t = i(t, u, v, w, x[o + 13], C, 2850285829),
        w = i(w, t, u, v, x[o + 2], D, 4243563512),
        v = i(v, w, t, u, x[o + 7], E, 1735328473),
        u = i(u, v, w, t, x[o + 12], F, 2368359562),
        t = j(t, u, v, w, x[o + 5], G, 4294588738),
        w = j(w, t, u, v, x[o + 8], H, 2272392833),
        v = j(v, w, t, u, x[o + 11], I, 1839030562),
        u = j(u, v, w, t, x[o + 14], J, 4259657740),
        t = j(t, u, v, w, x[o + 1], G, 2763975236),
        w = j(w, t, u, v, x[o + 4], H, 1272893353),
        v = j(v, w, t, u, x[o + 7], I, 4139469664),
        u = j(u, v, w, t, x[o + 10], J, 3200236656),
        t = j(t, u, v, w, x[o + 13], G, 681279174),
        w = j(w, t, u, v, x[o + 0], H, 3936430074),
        v = j(v, w, t, u, x[o + 3], I, 3572445317),
        u = j(u, v, w, t, x[o + 6], J, 76029189),
        t = j(t, u, v, w, x[o + 9], G, 3654602809),
        w = j(w, t, u, v, x[o + 12], H, 3873151461),
        v = j(v, w, t, u, x[o + 15], I, 530742520),
        u = j(u, v, w, t, x[o + 2], J, 3299628645),
        t = k(t, u, v, w, x[o + 0], K, 4096336452),
        w = k(w, t, u, v, x[o + 7], L, 1126891415),
        v = k(v, w, t, u, x[o + 14], M, 2878612391),
        u = k(u, v, w, t, x[o + 5], N, 4237533241),
        t = k(t, u, v, w, x[o + 12], K, 1700485571),
        w = k(w, t, u, v, x[o + 3], L, 2399980690),
        v = k(v, w, t, u, x[o + 10], M, 4293915773),
        u = k(u, v, w, t, x[o + 1], N, 2240044497),
        t = k(t, u, v, w, x[o + 8], K, 1873313359),
        w = k(w, t, u, v, x[o + 15], L, 4264355552),
        v = k(v, w, t, u, x[o + 6], M, 2734768916),
        u = k(u, v, w, t, x[o + 13], N, 1309151649),
        t = k(t, u, v, w, x[o + 4], K, 4149444226),
        w = k(w, t, u, v, x[o + 11], L, 3174756917),
        v = k(v, w, t, u, x[o + 2], M, 718787259),
        u = k(u, v, w, t, x[o + 9], N, 3951481745),
        t = c(t, p),
        u = c(u, q),
        v = c(v, r),
        w = c(w, s);
    var O = m(t) + m(u) + m(v) + m(w);
    return O.toLowerCase()
}

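We could extract this JS and call it directly to reproduce the algorithm. But since all that is needed is a plain MD5 hash, there is no reason to go to that trouble: one line of Python does the job. The implementation is as follows: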

import time
import hashlib

token = ''  # fill in your own
app_key = ''  # fill in your own
data = '{"itemId":"746109770909","bizCode":"ali.china.tmall","channel":"pc_detail","pageSize":20,"pageNum":1}'
current_time = int(time.time() * 1000)  # 13-digit millisecond timestamp
# Same order as in the JS: token & timestamp & appKey & data
to_be_hashed = f'{token}&{current_time}&{app_key}&{data}'
# compute sign
sign = hashlib.md5(to_be_hashed.encode()).hexdigest()
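
Because sign covers data, the data string that is hashed has to be exactly the string that is sent. When paging through reviews it is easier to build it from a dict than to edit it by hand. The sketch below assumes the field names from the captured request stay the same; the helper name build_review_data is made up for illustration.

```python
import json


def build_review_data(item_id: str, page_num: int, page_size: int = 20) -> str:
    """Serialize the review query in the compact form seen in the captured request."""
    payload = {
        "itemId": item_id,
        "bizCode": "ali.china.tmall",
        "channel": "pc_detail",
        "pageSize": page_size,
        "pageNum": page_num,
    }
    # separators=(",", ":") drops the spaces json.dumps inserts by default,
    # so the output has the same layout as the string captured in the browser.
    return json.dumps(payload, separators=(",", ":"))


if __name__ == "__main__":
    for page in (1, 2):
        print(build_review_data("746109770909", page))
```

The returned string can be used as data both when computing sign and when sending the request.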

The request code is as follows:

import requests

# data is the same data string used above
# sign is the sign computed above
# timestamp is the current_time that went into sign; t has to be the same value
def make_request(data, sign, timestamp):
    url = 'https://h5api.m.tmall.com/h5/mtop.alibaba.review.list.for.new.pc.detail/1.0/'

    cookies = {
        'Cookie': '...'  # cookies
    }

    headers = {
        'authority': 'h5api.m.tmall.com',
        'accept': '*/*',
        'accept-language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
        'referer': 'https://detail.tmall.com/',
        'sec-ch-ua': '"Google Chrome";v="117", "Not;A=Brand";v="8", "Chromium";v="117"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"macOS"',
        'sec-fetch-dest': 'script',
        'sec-fetch-mode': 'no-cors',
        'sec-fetch-site': 'same-site',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36'
    }

    params = {
        'jsv': '2.7.0',
        'appKey': '12574478',  # should be the same app_key that went into sign
        't': timestamp,  # reuse the signed timestamp instead of calling time.time() again
        'sign': sign,
        'api': 'mtop.alibaba.review.list.for.new.pc.detail',
        'v': '1.0',
        'isSec': '0',
        'ecode': '0',
        'timeout': '10000',
        'ttid': '2022@taobao_litepc_9.17.0',
        'AntiFlood': 'true',
        'AntiCreep': 'true',
        'preventFallback': 'true',
        'type': 'jsonp',
        'dataType': 'jsonp',
        'callback': 'mtopjsonp4',
        'data': data,
    }

    response = requests.get(url, params=params, cookies=cookies, headers=headers)

    print(response.text)

Running the code above in a terminal gives the following result:

[Screenshot: response printed in the terminal]
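
Since the request asks for type jsonp with callback mtopjsonp4, the body comes back wrapped in a callback call rather than as bare JSON. Below is a minimal sketch for unwrapping it, assuming the body has the usual callback(...) form; parse_jsonp is a made-up helper and the sample payload is invented purely for illustration, not a real response.

```python
import json
import re


def parse_jsonp(text: str) -> dict:
    """Strip a JSONP wrapper such as mtopjsonp4({...}) and parse the JSON inside."""
    match = re.fullmatch(r"\s*[\w$.]+\s*\((.*)\)\s*;?\s*", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("response is not in callback(...) form")
    return json.loads(match.group(1))


if __name__ == "__main__":
    # Hypothetical payload, used only to show the unwrapping.
    sample = ' mtopjsonp4({"ret": ["OK"], "data": {"items": []}})'
    print(parse_jsonp(sample))
```

In make_request, print(response.text) could then be replaced with parse_jsonp(response.text) to get a dict to work with.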

One caveat: a slider CAPTCHA can be triggered:

[Screenshot: slider CAPTCHA]

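The product detail API works the same way; only the parameters differ slightly, as shown below:

[Screenshot: parameters of the product detail request]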

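That's all for this one. Writing takes effort, so please leave a like before you go. Your support keeps me writing, and I hope to bring you more quality articles.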

Reposted from blog.csdn.net/qiulin_wu/article/details/134643581