A Must-Learn for Python Crawler Engineers: Scraping App Data in Practice

As the mobile internet's market share keeps growing, mobile apps have become a fixture of daily life. Data analysis used to rely on crawlers scraping web pages, but some newer products ship only as an app and have no web front end, which makes extracting their data harder. This article uses the Douguo Meishi (豆果美食) app as an example to show how to extract data from a phone app.

Installing Fiddler

Official download page: http://www.fiddler2.com/fiddl... (I simply found a copy through a Baidu search.)

Installation is just a matter of clicking Next until it finishes. Once it is installed, a few settings need to be configured.

Allow Fiddler to capture HTTPS packets
Open Fiddler and go to Tools -> Options. On the HTTPS tab, check Decrypt HTTPS traffic, then select Ignore server certificate errors in the additional options that appear. Fiddler will now capture HTTPS packets as well.

Allow external devices to send HTTP/HTTPS traffic to Fiddler
On the Connections tab, check Allow remote computers to connect.

Connecting the phone to the computer

A major difficulty in scraping data from a phone app is that you do not know the addresses of the interfaces it requests data from. On a PC you just visit the website and a packet capture tool shows you the requests. So the first step is to set up the environment so that any address the phone visits (any network request it sends) can be captured by Fiddler on the computer.

Step 1: Make sure the phone and the computer are on the same network
In my setup the computer is connected by network cable and runs a separately installed Wi-Fi sharing tool (Wi-Fi Sharing Wizard); the phone (iPhone 6s) connects to the Wi-Fi it shares.

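Step 2: Check the computer's IP address
Open cmd on the computer and run ipconfig to see the IP address.
Note that the address to use is the one for the wireless network connection (无线网络连接), not the one for the local area connection (本地连接). This is an easy pitfall.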
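To make the pitfall concrete, here is a minimal sketch that picks the wireless adapter's address out of ipconfig output. The sample text, the adapter names, and the helper names `ipv4_by_adapter` and `wireless_ipv4` are assumptions for illustration; real output varies by machine, and on a Chinese-language Windows the labels are localized, so the patterns would need adjusting.

```python
import re

# Hypothetical sample of `ipconfig` output; real adapter names and
# addresses will differ on your machine.
SAMPLE_IPCONFIG = """\
Ethernet adapter Local Area Connection:

   IPv4 Address. . . . . . . . . . . : 10.0.0.5
   Subnet Mask . . . . . . . . . . . : 255.255.255.0

Wireless LAN adapter Wireless Network Connection:

   IPv4 Address. . . . . . . . . . . : 192.168.23.1
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
"""


def ipv4_by_adapter(ipconfig_text):
    """Map each adapter heading to the first IPv4 address listed under it."""
    addresses = {}
    adapter = None
    for line in ipconfig_text.splitlines():
        # Adapter headings start in column 0 and end with a colon.
        if line and not line[0].isspace() and line.rstrip().endswith(":"):
            adapter = line.rstrip().rstrip(":")
            continue
        match = re.search(r"IPv4 Address[ .]*:\s*(\d+(?:\.\d+){3})", line)
        if match and adapter is not None:
            addresses.setdefault(adapter, match.group(1))
    return addresses


def wireless_ipv4(ipconfig_text):
    """Return the IPv4 address of the first adapter whose name mentions 'wireless'."""
    for adapter, address in ipv4_by_adapter(ipconfig_text).items():
        if "wireless" in adapter.lower():
            return address
    return None


print(wireless_ipv4(SAMPLE_IPCONFIG))
```

On a real machine the text could come from `subprocess.run(["ipconfig"], capture_output=True, text=True).stdout` instead of the sample string.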

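Step 3: Set the HTTP proxy on the phone
Open the phone's Wi-Fi settings, select the network you are connected to, and tap the small circled exclamation-mark icon to open its details. Choose Configure Proxy, enter the IP address from the previous step, and set the port to 8888, the port configured in Fiddler.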
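If the phone cannot reach the proxy, it helps to first confirm that something is actually accepting connections on that address and port. The following is a minimal sketch using only the standard library; the function name `proxy_is_listening` is made up for this example, and a successful connection only shows that the port is open, not that it is Fiddler answering.

```python
import socket


def proxy_is_listening(host, port=8888, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `proxy_is_listening("192.168.23.1")` run from another machine on the same network should return True once Fiddler is running with remote connections allowed (192.168.23.1 is the address used in this article; substitute your own).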

Step 4: Install the certificate on the phone and the computer
On the computer, visit http://localhost:8888/ to install it.
On the phone, visit the computer's IP address on port 8888; in my case the address is http://192.168.23.1:8888

Step 5: Test the setup
Finally, test it: open any app on the phone and browse some of its content. Fiddler should now show the network requests being sent. Here I opened the Douguo Meishi app.
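The same check can be made from Python on the computer itself: send a request through the proxy and see whether it shows up in Fiddler's session list. This is only a sketch under the assumption that Fiddler listens on 127.0.0.1:8888; the helper names `fiddler_proxies` and `open_via_fiddler` are invented here. It routes plain HTTP only, because HTTPS through a decrypting proxy would also require Python to trust Fiddler's root certificate.

```python
import urllib.request


def fiddler_proxies(host="127.0.0.1", port=8888):
    """Build the proxy mapping that points plain-HTTP traffic at Fiddler."""
    return {"http": f"http://{host}:{port}"}


def open_via_fiddler(url, host="127.0.0.1", port=8888, timeout=10):
    """Fetch url through the Fiddler proxy and return the raw response bytes."""
    handler = urllib.request.ProxyHandler(fiddler_proxies(host, port))
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Calling `open_via_fiddler("http://api.douguo.net/personalized/home/0/20")` while Fiddler is running should make the request appear alongside the ones coming from the phone.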

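Analyzing the app's request addresses

Looking through the requests in Fiddler, you can find http://api.douguo.net/persona..., which requests part of the home page data. Paste the address directly into a browser and you can see the JSON data it returns.

This is in fact the most important and most difficult part, and where your experience is put to the test: you have to pick out the correct API requests and analyze the data structure they return, in preparation for the later data analysis.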
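One way to get a first look at an unfamiliar JSON structure is to print an outline of its keys and value types. The sketch below does that for a made-up payload: the field names in `SAMPLE_RESPONSE` are invented for illustration and are not taken from the real Douguo response, and `outline` is a hypothetical helper, not part of any library.

```python
import json

# Hypothetical payload: the field names below are invented for illustration
# and are not taken from the real Douguo response.
SAMPLE_RESPONSE = '{"state": "success", "result": {"list": [{"id": 1, "title": "soup"}]}}'


def outline(value, depth=0, max_depth=4):
    """Return indented lines describing the keys and value types in parsed JSON."""
    indent = "  " * depth
    if depth >= max_depth:
        return [f"{indent}..."]
    lines = []
    if isinstance(value, dict):
        for key, child in value.items():
            lines.append(f"{indent}{key}: {type(child).__name__}")
            if isinstance(child, (dict, list)):
                lines.extend(outline(child, depth + 1, max_depth))
    elif isinstance(value, list) and value:
        # Lists are usually homogeneous, so only the first element is described.
        lines.append(f"{indent}[0]: {type(value[0]).__name__}")
        if isinstance(value[0], (dict, list)):
            lines.extend(outline(value[0], depth + 1, max_depth))
    return lines


print("\n".join(outline(json.loads(SAMPLE_RESPONSE))))
```

Running the same function on a real captured response shows at a glance where the list of records sits and which fields each record carries.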

Fetching the data with a Python 3.x crawler

A plain urllib.request call is enough here; no framework is used. The code is as follows:

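import urllib.request

# Send a request to the URL; urlopen returns a file-like response object
response = urllib.request.urlopen("http://api.douguo.net/personalized/home/0/20")

# The response object supports file-style methods;
# read() returns the entire body as bytes
html = response.read()

# Print the response, decoding \uXXXX escapes so the Chinese text is readable
print(html.decode("unicode_escape"))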

Running the code prints the returned data.
Storing or analyzing this data is follow-up work; at this point we have completed the steps for extracting data from a phone app.
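As one possible starting point for the storage step, the sketch below parses a response body and writes it to a file as readable JSON. It assumes the body is UTF-8 encoded JSON; `save_response` and the file name in the usage note are hypothetical names chosen for this example.

```python
import json
from pathlib import Path


def save_response(raw_bytes, path):
    """Decode a JSON response body and write it to path as readable UTF-8 JSON."""
    # json.loads turns \uXXXX escapes into real characters, so no
    # separate "unicode_escape" decoding step is needed here.
    data = json.loads(raw_bytes.decode("utf-8"))
    Path(path).write_text(
        json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return data
```

With the bytes returned by `response.read()`, a call such as `save_response(html, "home.json")` would leave a file that can be reopened later for analysis.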


Origin www.cnblogs.com/itye2/p/11653443.html