Crawling in-app data: getting started with mitmproxy, a practical introduction to Python crawlers

How do you get at the data behind a mobile app? This walkthrough uses the TapTap mobile app as an example and collects its stand-alone game ranking data.

Step 1: Configure the environment

First, install mitmproxy on your computer. The official website has installation instructions for each platform; the following uses macOS as an example.

brew install mitmproxy

Python 3 is also required, along with the requests and openpyxl libraries. In China, both libraries can be installed from a mirror, for example:

pip3 install openpyxl -i http://pypi.douban.com/simple --trusted-host=pypi.douban.com
pip3 install requests -i http://pypi.douban.com/simple --trusted-host=pypi.douban.com

Then import the libraries the script uses:
import requests
from openpyxl import Workbook
from openpyxl.drawing.image import Image
import os
import random
import time

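Finally, connect the phone to the same network as the computer and set the phone's Wi-Fi proxy to the computer's IP address (mitmproxy listens on port 8080 by default). With mitmproxy running, open mitm.it in the phone's browser and install the certificate offered there, so that HTTPS traffic can be decrypted.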

Step 2: Data acquisition

Once mitmproxy is installed, run mitmproxy in a terminal on the computer.


Then open the TapTap app on your phone and go to Discover -> Stand-alone. The app's HTTP requests show up in the terminal on the computer.

Open the requests one by one and check the Response tab until you find the request that returns the data we need.

The Request tab of that entry shows the request URL and its parameters. These are what we need in order to fetch the data ourselves.
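
Clicking through requests by hand gets tedious when the app sends many of them. As a rough sketch, a small mitmproxy script can print only the JSON responses from hosts of interest. The file name, the HOST_KEYWORD filter, and the describe_flow helper are assumptions made for illustration and are not part of the original walkthrough; adjust the keyword to whatever host appears in your own capture.

```python
# capture_taptap.py (hypothetical file name)
import json

# Assumption: the API host contains this substring; change it to match your capture.
HOST_KEYWORD = "taptap"


def describe_flow(flow):
    """Return a summary dict for JSON responses from matching hosts, else None."""
    request, response = flow.request, flow.response
    if HOST_KEYWORD not in request.pretty_host:
        return None
    if "json" not in response.headers.get("content-type", ""):
        return None
    return {
        "url": request.pretty_url,
        "query": dict(request.query),  # query-string parameters, e.g. from
        "status": response.status_code,
    }


def response(flow):
    """mitmproxy event hook, called once for every completed response."""
    summary = describe_flow(flow)
    if summary is not None:
        print(json.dumps(summary, ensure_ascii=False))
```

Saved under that name, the script can be loaded with mitmdump -s capture_taptap.py, and each matching response is printed as one line of JSON.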

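Scroll through a few more pages on the phone and open the details of a few more requests. Comparing them shows that the from parameter controls paging. In Python 3, the same request can be reproduced with requests.get by passing the captured URL, parameters, and headers, and changing from for each page.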
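
A minimal sketch of such a paged request follows. API_URL, BASE_PARAMS, and HEADERS are hypothetical placeholders, because the real values have to be copied from the Request tab in mitmproxy. The sketch also assumes that from is a record offset that advances by the page size, and limit is an assumed name for the page-size parameter.

```python
import random
import time

import requests

# Hypothetical placeholders: copy the real values from the captured request.
API_URL = "https://example.invalid/app-top/list"  # not the real endpoint
BASE_PARAMS = {"limit": 10}                       # assumed page-size parameter
HEADERS = {"User-Agent": "copied-from-captured-request"}


def page_params(page, base_params=BASE_PARAMS):
    """Build the query parameters for a zero-based page number."""
    params = dict(base_params)
    params["from"] = page * params["limit"]  # assumption: from is an offset
    return params


def fetch_page(page, session=None):
    """Fetch one page and return the decoded JSON body."""
    if session is None:
        session = requests.Session()
    resp = session.get(API_URL, params=page_params(page),
                       headers=HEADERS, timeout=10)
    resp.raise_for_status()
    return resp.json()


def fetch_pages(page_count, session=None, max_delay=2.0):
    """Yield the JSON body of each page, pausing randomly between requests."""
    for page in range(page_count):
        yield fetch_page(page, session)
        time.sleep(random.random() * max_delay)
```

Each value yielded by fetch_pages plays the role of content in the parsing step. If the capture shows from counting pages rather than records, only page_params needs to change.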

Step 3: Data analysis

Comparing the returned JSON with what the app displays makes it fairly easy to work out which field holds which value.

Here is how to handle it in Python.

# content is the parsed JSON body of one page of results
data_list = content['data']['list']
for data in data_list:
  data_id = data['id']
  data_title = data['title']
  data_stat = data['stat']
  link = f'https://www.taptap.com/app/{data_id}/'
  tags = ','.join([tag['value'] for tag in data['tags']])
  icon_url = data['icon']['url']
  score = data_stat['rating']['score']
  fans_count = data_stat['fans_count']  # followers
  hits_total = data_stat['hits_total']  # downloads
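
Captured responses do not always contain every field, so it can help to wrap the extraction in a function that tolerates missing optional keys. The sketch below is one way to do that; parse_item is a hypothetical helper, and the sample record is made up to mirror the field names used above rather than taken from a real response.

```python
def parse_item(data):
    """Flatten one entry of content['data']['list'] into a plain dict."""
    stat = data.get("stat") or {}
    return {
        "id": data["id"],
        "title": data["title"],
        "link": f"https://www.taptap.com/app/{data['id']}/",
        "tags": ",".join(tag["value"] for tag in data.get("tags") or []),
        "icon_url": (data.get("icon") or {}).get("url"),
        "score": (stat.get("rating") or {}).get("score"),
        "fans_count": stat.get("fans_count"),  # followers
        "hits_total": stat.get("hits_total"),  # downloads
    }


# Made-up sample shaped like the fields above; not real TapTap data.
sample = {
    "id": 123,
    "title": "Example Game",
    "tags": [{"value": "puzzle"}, {"value": "offline"}],
    "icon": {"url": "https://example.invalid/icon.png"},
    "stat": {"rating": {"score": "9.1"}, "fans_count": 1000, "hits_total": 5000},
}

record = parse_item(sample)
print(record["link"], record["tags"], record["score"])
```

The id and title keys are still required and raise KeyError when absent, while the optional fields fall back to None or an empty string.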

Step 4: Data storage

This time we save the data to an Excel file using the openpyxl library. We can also embed each game's icon: download the images into an icon folder first, then insert each image into the table as its record is written. Note that openpyxl needs the Pillow library installed in order to insert images.

First, create the icon folder and write the header row of the table.

ICON_TEMP = 'icon'
os.makedirs(ICON_TEMP, exist_ok=True)  # create the icon folder if it is missing

TITLE_LIST = ['Ranking', 'id', 'Game Name', 'Address', 'Rating', 'Follow', 'Downloads', 'icon']
wb = Workbook()
dest_filename = 'taptap_rank.xlsx'
ws1 = wb.active
# Write the header row; openpyxl rows and columns are 1-based
for col, title in enumerate(TITLE_LIST, start=1):
    ws1.cell(column=col, row=1, value=title)
row_count = 1  # last row written so far (the header)

Then, for each record, download the icon image and write the fields into the next row of the table.

# Runs once per record, inside the loop over data_list
row_count += 1
icon_path = os.path.join(ICON_TEMP, f'{data_id}.png')
if not os.path.isfile(icon_path):
    time.sleep(random.random() * 2)  # random pause of up to 2 s before each download
    icon_r = requests.get(icon_url)
    with open(icon_path, 'wb') as fd:
        fd.write(icon_r.content)
# Column order matches TITLE_LIST; the rank is the row number minus the header row
row = [row_count - 1, data_id, data_title, link, score, fans_count, hits_total]
for col, value in enumerate(row, start=1):
    ws1.cell(column=col, row=row_count, value=str(value))
img = Image(icon_path)
img.width = img.height = 50
ws1.add_image(img, f'H{row_count}')
# After the last record has been written (outside the loop): wb.save(dest_filename)
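
The snippets above never write the workbook to disk, so nothing appears until wb.save is called. As a compact sketch of the text-only part of this step, the header, the rows, and the save can live in one function. save_rank is a hypothetical helper, the record keys are assumed to match the variable names used earlier, and the icon column is left out to keep the example short.

```python
from openpyxl import Workbook

TITLE_LIST = ['Ranking', 'id', 'Game Name', 'Address',
              'Rating', 'Follow', 'Downloads']


def save_rank(records, dest_filename):
    """Write a header row plus one row per record, then save the workbook."""
    wb = Workbook()
    ws = wb.active
    ws.append(TITLE_LIST)  # append() writes to the next empty row
    for rank, rec in enumerate(records, start=1):
        ws.append([rank, rec['id'], rec['title'], rec['link'],
                   rec['score'], rec['fans_count'], rec['hits_total']])
    wb.save(dest_filename)  # without this call no file is produced
    return ws.max_row
```

Unlike the cell-by-cell version, this keeps numbers as numbers instead of converting every value to a string, which makes sorting and filtering in Excel easier.
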

Result preview:

Summary:

First, capture the data URL and its parameters through the mitmproxy proxy. Next, compare the response with what the phone displays to find the fields we need. Finally, write the code that requests and parses the data and saves it to an Excel table.

That is what I have learned most recently. If you spot any mistakes or have new ideas, please leave a comment to point them out! When I learn something new, I will share it as soon as I can!

The images used in this article come from the Internet, and the copyright belongs to the original authors. If there is any infringement, please contact us!

This article is only for personal communication and learning, please do not use it for other purposes.

 

Origin blog.csdn.net/m0_68353775/article/details/127618316