Using Python to fetch a short-video platform's hot search list and save it to Excel


1. Target and preparation

1. Target: the hot search list of a short-video platform (the code below uses Douyin's web interface)

2. Preparation

  • Environment: Python 3.x
  • requests
  • pandas

       requests and pandas are the libraries this tutorial needs: requests sends the HTTP request, and pandas processes the data and saves the results as an Excel file (writing .xlsx files also requires an engine such as openpyxl).

  • Open the target page in Chrome and press F12 to open DevTools. In the Network panel, filter by XHR and find the request that returns the hot search data; its URL is the data interface.

2. Start coding

  1. Import the required libraries:
import requests
import pandas as pd
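
The script imports requests and pandas directly, but the Excel engine pandas needs is only loaded inside `to_excel`, so a missing openpyxl surfaces at the very end, after the request has already been made. A minimal sketch for checking everything up front with the standard library; `REQUIRED` and `missing_modules` are illustrative names, not part of the original tutorial:

```python
import importlib.util

# Packages this tutorial relies on. openpyxl is one of the engines pandas
# can use to write .xlsx files (XlsxWriter is another).
REQUIRED = ["requests", "pandas", "openpyxl"]


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All dependencies are available.")
```
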
  2. Construct the request headers:
browse_header = {
    "Accept": "application/json, text/plain, */*",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36",
    "Host": "www.douyin.com",
    "Referer": "https://www.douyin.com/discover",
    # Cookie copied from the browser; replace it with the value from your own session
    "Cookie": "_xsrf=Pd0NpG6J8kZdHtzBVnNyQP1g0rO7NKeg; _zap=d7f27b9f-4fe3-4ef4-9376-df278af16940;"
}
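
A Cookie copied from the browser belongs to one session and is likely to expire, so hard-coding it means editing the script each time. A minimal sketch that reads it from an environment variable instead; the variable name `DOUYIN_COOKIE` and the `build_headers` helper are hypothetical. It leaves out `Host`, because requests fills that header in from the URL:

```python
import os

# Hypothetical environment variable holding the Cookie string copied from
# your own browser session (DevTools > Network > request headers).
COOKIE_ENV_VAR = "DOUYIN_COOKIE"


def build_headers(cookie=None):
    """Build the request headers; Cookie is only added when a value is available."""
    if cookie is None:
        cookie = os.environ.get(COOKIE_ENV_VAR, "")
    headers = {
        "Accept": "application/json, text/plain, */*",
        "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/100.0.4896.60 Safari/537.36"),
        "Referer": "https://www.douyin.com/discover",
    }
    if cookie.strip():
        headers["Cookie"] = cookie.strip()
    return headers
```

It would then be passed as `requests.get(url, headers=build_headers(), timeout=10)`.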
  3. Define the request interface, i.e. the data address:
url = "https://www.douyin.com/aweme/v1/web/hot/search/list/?device_platform=webapp&aid=6383&channel=channel_pc_web"
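
The query string is easier to read and change when kept as a dict. A sketch that splits the URL above into a base address and the three parameters visible in DevTools; what `aid` and `channel` mean is not explained in this tutorial, so they are copied unchanged:

```python
from urllib.parse import urlencode

# Base address of the hot search interface and its query parameters,
# taken from the URL above.
BASE_URL = "https://www.douyin.com/aweme/v1/web/hot/search/list/"
PARAMS = {
    "device_platform": "webapp",
    "aid": "6383",
    "channel": "channel_pc_web",
}

# Rebuild the full URL from the parts (dict order is preserved).
url = BASE_URL + "?" + urlencode(PARAMS)
print(url)
```

requests can also take the dict directly, as `requests.get(BASE_URL, params=PARAMS, headers=browse_header)`, which should produce the same request URL.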
  4. Send the request. The interface returns JSON, so the response is parsed with .json() in the same step:
res = requests.get(url, headers=browse_header, timeout=10).json()  # timeout (seconds) prevents hanging indefinitely
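
If the request is rejected or the cookie is no longer valid, the server may answer with something other than JSON (an empty body or an HTML page, for example), and `.json()` then raises a decoding error that says little about the cause. A hedged sketch of a helper, `decode_json` (a hypothetical name), that parses the body with the standard `json` module and shows the start of the body when parsing fails:

```python
import json


def decode_json(text):
    """Parse a response body as JSON, with a readable error if it is not JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Show the first characters so an HTML page or empty body is easy to spot.
        preview = text[:100].replace("\n", " ")
        raise ValueError(f"Response is not JSON (starts with: {preview!r})") from exc
```

With requests it could be used as `resp = requests.get(url, headers=browse_header, timeout=10)`, then `resp.raise_for_status()` and `res = decode_json(resp.text)`.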
  5. Extract the hot search data list:
# List of real-time rising hot topics
content_list = res['data']['word_list']
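
`res['data']['word_list']` fails with a bare `KeyError` if the response layout changes. A small sketch that fails with a clearer message; `get_word_list` is a hypothetical helper, and the key names are the ones used in this tutorial:

```python
def get_word_list(res):
    """Return the hot search entries from the parsed response.

    Raises KeyError with a readable message if the expected
    res['data']['word_list'] layout is not present.
    """
    data = res.get("data") or {}
    word_list = data.get("word_list")
    if not isinstance(word_list, list):
        raise KeyError("res['data']['word_list'] not found; the response layout may have changed")
    return word_list
```

After `content_list = get_word_list(res)`, printing `sorted(content_list[0])` (when the list is non-empty) shows which fields each entry actually carries, a quick way to confirm names such as `word`, `position`, `hot_value` and `sentence_id`.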
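  6. Parse each entry and collect the fields (title, rank, hot value, description, link address), then assemble them into a DataFrame and save it as Excel:
title_list, order_list, score_list, desc_list, url_list = [], [], [], [], []
for content in content_list:
    title_list.append(content['word'])        # title
    order_list.append(content['position'])    # rank
    score_list.append(content['hot_value'])   # hot value
    desc_list.append(content['word'])         # description (the title is reused)
    url_list.append(f"https://www.douyin.com/hot/{content['sentence_id']}")  # link address

df = pd.DataFrame(  # assemble the scraped data into a DataFrame
    {
        '热搜标题': title_list,  # hot search title
        '热搜排名': order_list,  # hot search rank
        '热搜热度': score_list,  # hot value
        '描述': desc_list,       # description
        '链接地址': url_list     # link address
    }
)
df.to_excel('抖音热搜榜.xlsx', index=False)  # save the results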
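
Instead of five parallel lists that must stay the same length, each entry can become one dict per row; `pd.DataFrame` accepts a list of dicts and uses the keys as column names. A sketch with the same fields and column names as above (`to_records` is a hypothetical helper name):

```python
def to_records(content_list):
    """Convert hot search entries into one dict per row (keys become Excel column names)."""
    return [
        {
            "热搜标题": content["word"],       # hot search title
            "热搜排名": content["position"],   # rank
            "热搜热度": content["hot_value"],  # hot value
            "描述": content["word"],           # description (title reused, as in the tutorial)
            "链接地址": f"https://www.douyin.com/hot/{content['sentence_id']}",  # link address
        }
        for content in content_list
    ]
```

`df = pd.DataFrame(to_records(content_list))` followed by `df.to_excel('抖音热搜榜.xlsx', index=False)` should then give the same table, with the columns in the order the keys are written.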

Note: the link address needs adjusting in this code. It is built from each entry's sentence_id:
url_list.append(f"https://www.douyin.com/hot/{content['sentence_id']}")

Full code:

import requests
import pandas as pd

# Request headers copied from the browser; replace the Cookie with the value from your own session
browse_header = {
    "Accept": "application/json, text/plain, */*",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36",
    "Host": "www.douyin.com",
    "Referer": "https://www.douyin.com/discover",
    "Cookie": "_xsrf=Pd0NpG6J8kZdHtzBVnNyQP1g0rO7NKeg; _zap=d7f27b9f-4fe3-4ef4-9376-df278af16940;"
}

# Hot search list interface found under Network > XHR
url = "https://www.douyin.com/aweme/v1/web/hot/search/list/?device_platform=webapp&aid=6383&channel=channel_pc_web"

# Send the request and parse the JSON response; the timeout (seconds) avoids hanging forever
res = requests.get(url, headers=browse_header, timeout=10).json()
# Real-time rising hot topics
content_list = res['data']['word_list']
title_list = []
order_list = []
score_list = []
desc_list = []
url_list = []
for content in content_list:
    order_list.append(content['position'])   # rank
    title_list.append(content['word'])       # title
    score_list.append(content['hot_value'])  # hot value
    desc_list.append(content['word'])        # description (the title is reused)
    url_list.append(f"https://www.douyin.com/hot/{content['sentence_id']}")  # link address

df = pd.DataFrame({
    '热搜标题': title_list,  # hot search title
    '热搜排名': order_list,  # hot search rank
    '热搜热度': score_list,  # hot value
    '描述': desc_list,       # description
    '链接地址': url_list     # link address
})
df.to_excel('抖音热搜榜.xlsx', index=False)  # save the results as an Excel file
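
If no Excel engine is installed, a CSV file is a simple fallback that Excel can still open. A sketch using the standard `csv` module; the `utf-8-sig` encoding writes a byte-order mark, which helps Excel recognise the file as UTF-8 so the Chinese column names display correctly. `save_csv` and its `records` argument (a list of row dicts with identical keys) are illustrative:

```python
import csv


def save_csv(records, path):
    """Write a list of row dicts to a CSV file that Excel can open.

    'utf-8-sig' prepends a UTF-8 byte-order mark so Excel recognises
    the encoding and shows non-ASCII column names correctly.
    """
    if not records:
        raise ValueError("no rows to write")
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)
```

With pandas the equivalent is `df.to_csv('抖音热搜榜.csv', index=False, encoding='utf-8-sig')`.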

Finally, check the obtained data:
In total, 50 entries were obtained.


Source: blog.csdn.net/qq_43762932/article/details/131307467