Proxy packet capture

App packet capture

In earlier posts we learned some basics of Python crawlers, but everything we crawled was web pages opened in a PC browser. People now use apps on their phones more and more, and many apps have no web version at all; Douyin, for example, has none. Does that mean we cannot batch-crawl its videos?

The answer is no! An App communicates much like a web page: it sends requests to a backend server to fetch data. In a browser we can open the developer tools and inspect each request in detail, but in an App we cannot see them directly. So we use a packet capture tool to obtain the App's requests and responses. Common packet capture tools include Wireshark, Fiddler and Charles. Today we look at how to capture a phone App's traffic with Fiddler.

Fiddler works as a proxy. Once it is configured, every request the phone App sends goes through Fiddler first, and every response the server returns also passes back through Fiddler. This lets us see in Fiddler both the requests the App sends to the server and the server's responses.
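To make the proxy idea concrete from the Python side, here is a minimal sketch that routes a script's own requests through Fiddler instead of the phone's. It assumes Fiddler listens on its default port 8888; `fiddler_proxies` is a hypothetical helper name, not part of Fiddler or requests.

```python
def fiddler_proxies(host: str, port: int = 8888) -> dict:
    """Build a requests-style ``proxies`` mapping that routes traffic through Fiddler.

    Assumption: Fiddler listens on ``port`` (8888 is Fiddler's default).
    """
    proxy = f"http://{host}:{port}"
    # Fiddler is a plain HTTP proxy; HTTPS requests reach it as CONNECT tunnels,
    # so both schemes point at the same http:// address.
    return {"http": proxy, "https": proxy}
```

For example, `requests.get(url, proxies=fiddler_proxies("127.0.0.1"), verify=False)` should make the request appear as a session in Fiddler. `verify=False` is needed unless Fiddler's root certificate is trusted, because Fiddler re-signs the HTTPS traffic it decrypts.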

Installing and configuring Fiddler

After installing Fiddler, open Tools > Options > HTTPS and check the two options shown below, Capture HTTPS CONNECTs and Decrypt HTTPS traffic.

Figure I

Then, on the Connections tab, check Allow remote computers to connect so that Fiddler accepts requests from other devices.
Also note the port number shown here (Fiddler's default is 8888); you will need to enter it on the phone.

Figure II

When the configuration is complete, save it, then be sure to close Fiddler and reopen it.

Phone configuration

Make sure the phone and the computer are on the same LAN. To find the computer's IP address, run ipconfig in cmd. My computer is on a wireless network, and its IP address is 192.168.1.3.

Figure III
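ipconfig is the simplest way on Windows. As a cross-platform alternative, the following Python sketch (`lan_ip` is a hypothetical helper) asks the OS which local address it would use for outgoing traffic, which on a typical home network is the LAN address the phone needs. It relies on a common trick: connecting a UDP socket sends no packets but makes the OS choose a route and bind a local address.

```python
import socket


def lan_ip() -> str:
    """Return the IPv4 address this machine would use for outgoing traffic.

    Connecting a UDP socket transmits nothing; it only makes the OS pick the
    outgoing interface, whose address getsockname() then reports.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # 8.8.8.8 is only used to choose a route; no packet is sent
        s.connect(("8.8.8.8", 80))
        return s.getsockname()[0]
    except OSError:
        # No route available (for example, offline): fall back to loopback
        return "127.0.0.1"
    finally:
        s.close()
```

On a machine with several adapters (VPN, virtual machines), double-check the result against ipconfig, since the OS may prefer a different interface than the Wi-Fi one.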

On the phone, open the Wi-Fi settings and select the network you are connected to. Long-press it and choose Modify network, then set the proxy to manual and enter the computer's IP address and the Fiddler port, as shown below:

Figure IV

Figure V

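After saving, open http://192.168.1.3:<port> in the phone's built-in browser, using the computer's IP address from above and the Fiddler port noted earlier (8888 by default). When I tried this in the Quark browser it did not work; you must use the phone's own built-in browser.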
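If that page does not load on the phone, it can help to check from another computer on the same LAN whether the Fiddler port is reachable at all. Below is a minimal sketch using the standard socket module; `proxy_reachable` is a hypothetical name. A False result usually means Fiddler is not running, Allow remote computers to connect has not taken effect (Fiddler was not restarted), or a firewall is blocking the port.

```python
import socket


def proxy_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only proves something is listening on that port; it does not check
    that the listener is actually Fiddler.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or no route to host
        return False
```

For example, `proxy_reachable("192.168.1.3", 8888)` run from a second machine on the LAN.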

Once the page opens, click the link shown in the figure below to download the certificate, then install it.

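The browser on the computer should also open this address and install the certificate, which makes capturing the browser's traffic easier later on.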

Figure VI

Once the certificate is installed, everything is ready: open the App on the phone and capture its traffic in Fiddler.

Packet capture

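Open the Douyin App and you will see many sessions appear in Fiddler. First clear the irrelevant sessions, then scroll to a user's profile page to see all the videos they have posted, and find the video list request in Fiddler at the same time.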

Figure VII

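After some inspection and filtering, we can see that the request in the figure above is the one we need. This URL can actually be opened in a browser, but the browser's User-Agent has to be changed first (I used a Firefox add-on); once opened, the content matches what Fiddler shows in its right-hand pane. Let's look at the response to this request in Fiddler's right-hand pane.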

Figure VIII

The response is JSON. aweme_list holds the video addresses we need, and has_more=1 means that scrolling further will load more. With that, we can write the code.
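Before writing the downloader, it helps to pin down which fields of this JSON the code relies on. The sketch below pulls (name, video URL) pairs out of one page and reports has_more. The field paths (aweme_list, desc, aweme_id, video.play_addr.url_list) are the ones used by the code in this post; `parse_page` is a hypothetical helper, and the real response contains many more fields.

```python
def parse_page(data: dict) -> tuple:
    """Return ([(name, play_url), ...], has_more) for one page of results."""
    videos = []
    for aweme in data.get("aweme_list") or []:
        # Prefer the video description as a name; fall back to its ID
        name = aweme.get("desc") or str(aweme["aweme_id"])
        urls = aweme["video"]["play_addr"]["url_list"]
        if urls:
            # The first URL in url_list is used, as in the download code
            videos.append((name, urls[0]))
    # has_more == 1 means scrolling further would load another page
    return videos, data.get("has_more") == 1
```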

Code

The code is simple and works just like in the previous posts: request the URL directly with requests.

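The code is only a simple example and downloads just the videos on the current page. To download all of a user's videos, build new URLs from the has_more and max_cursor values in each JSON response and keep downloading.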

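The user_id in the URL can be changed to the user you want to crawl. To find a user's user_id, share that user's profile to WeChat and open the link in a browser; the user_id appears in the resulting URL.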
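Editing user_id by hand in such a long URL is error-prone. Here is a small sketch that uses urllib.parse to replace query parameters while keeping the rest of the captured URL intact; `with_params` is a hypothetical helper, not part of any library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_params(url: str, **params) -> str:
    """Return url with the given query parameters replaced (or appended).

    All other parameters captured in Fiddler keep their original order.
    """
    parts = urlsplit(url)
    updated = []
    seen = set()
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key in params:
            # Swap in the new value for a parameter that already exists
            value = str(params[key])
            seen.add(key)
        updated.append((key, value))
    # Append parameters that were not in the original URL
    updated += [(k, str(v)) for k, v in params.items() if k not in seen]
    return urlunsplit(parts._replace(query=urlencode(updated)))
```

For example, `with_params(url, user_id=<the id you found>)`. Note that urlencode re-escapes some characters (the `*` in `resolution=1080*1920` becomes `%2A`), so the result is equivalent to, but not byte-for-byte identical with, the captured URL.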

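import re
import urllib.request

import requests


def safe_filename(name):
    # Replace characters that are not allowed in file names on Windows/Linux
    return re.sub(r'[\\/:*?"<>|\r\n]', '_', name).strip() or 'video'


def get_url(url):
    # Send a mobile User-Agent, as when opening this URL in a browser
    headers = {'user-agent': 'mobile'}
    # verify=False skips certificate checks (requests prints a warning)
    resp = requests.get(url, headers=headers, verify=False)
    data = resp.json()
    # aweme_list holds one entry per video on the current page
    for aweme in data['aweme_list']:
        # Name the file after the video description, or its ID if empty
        name = aweme['desc'] or str(aweme['aweme_id'])
        video_url = aweme['video']['play_addr']['url_list'][0]
        urllib.request.urlretrieve(video_url, filename=safe_filename(name) + '.mp4')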


if __name__ == "__main__":
    get_url('https://api.amemv.com/aweme/v1/aweme/post/?max_cursor=0&user_id=98934041906&count=20&retry_type=no_retry&mcc_mnc=46000&iid=58372527161&device_id=56750203474&ac=wifi&channel=huawei&aid=1128&app_name=aweme&version_code=421&version_name=4.2.1&device_platform=android&ssmix=a&device_type=STF-AL10&device_brand=HONOR&language=zh&os_api=26&os_version=8.0.0&uuid=866089034995361&openudid=008c22ca20dd0de5&manifest_version_code=421&resolution=1080*1920&dpi=480&update_version_code=4212&_rticket=1548080824056&ts=1548080822&js_sdk_version=1.6.4&as=a1b51dc4069b2cc6252833&cp=dab7ca5f68594861e1[wIa&mas=014a70c81a9db218501e1433b04c38963ccccc1c4cac4c6cc6c64c')


After running it, you get the list of videos:

Figure IX
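As noted earlier, the example above only downloads the first page. Following the has_more and max_cursor approach described before the code, here is a minimal pagination sketch. It assumes each response carries a top-level max_cursor that is the value to send with the next request, as the post describes. `fetch` is injected so the loop can be tested without network access, and `iter_all_videos` and `set_cursor` are hypothetical names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_cursor(url: str, cursor) -> str:
    """Return url with its max_cursor query parameter set to cursor."""
    parts = urlsplit(url)
    query = [(k, str(cursor) if k == "max_cursor" else v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)]
    if "max_cursor" not in dict(query):
        query.append(("max_cursor", str(cursor)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def iter_all_videos(first_url: str, fetch, max_pages: int = 50):
    """Yield every aweme entry, following has_more/max_cursor page by page.

    fetch(url) must return the decoded JSON dict, for example
    lambda u: requests.get(u, headers=headers, verify=False).json().
    max_pages is a safety limit so a misbehaving response cannot loop forever.
    """
    url = first_url
    for _ in range(max_pages):
        data = fetch(url)
        yield from data.get("aweme_list") or []
        # Stop when the server says there is nothing more to load
        if data.get("has_more") != 1:
            break
        # The next page starts at the cursor returned with this page
        url = set_cursor(url, data["max_cursor"])
```

Each yielded entry can then go through the same naming and download steps as the loop in get_url.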


Source: www.cnblogs.com/lowen107/p/11230356.html