How do you crawl the titles and addresses of the works on Bilibili's daily real-time ranking?
Open Bilibili and click the ranking entry on the site to reach the ranking page.
Right-click, choose Inspect, and find the markup that corresponds to a work. This tells us roughly where each work sits in the page's source code.
Then, in PyCharm, import requests and BeautifulSoup:
import requests
from bs4 import BeautifulSoup
Each work sits inside a div block with class='info', so use find_all to grab that block for every work. (Bilibili's ranking page can be fetched successfully without disguising the request with custom headers.)
url = 'https://www.bilibili.com/v/popular/rank/all'
res = requests.get(url)  # Bilibili doesn't require a spoofed headers/User-Agent
name_list = []
b_list = BeautifulSoup(res.text, 'lxml').find_all('div', class_='info')
print(b_list)
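Although the request above works without any headers, many sites reject bare requests, so it is worth knowing how to attach a browser-like User-Agent. A minimal sketch using only the standard library's urllib (the header string below is an arbitrary example, not one Bilibili requires; the request is built but not sent):

```python
from urllib.request import Request

url = 'https://www.bilibili.com/v/popular/rank/all'

# Build a request carrying a browser-like User-Agent header.
# The exact string is an illustrative placeholder.
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})

# The header is now attached and would be sent by urlopen(req).
print(req.get_header('User-agent'))  # urllib stores header names capitalized
```

With requests, the equivalent is passing a `headers=` dict to `requests.get`.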
At this point we have the markup inside each info block. The title and address we need are both in there, so we make a second pass: loop over b_list with a variable q, collect each title into the name_list created at the start, and pull the address from the href attribute of the work's <a> tag.
href_list = []  # collect the addresses alongside the titles
for q in b_list:
    name_list.append(q.a.text)     # title: text of the first <a> in the block
    href_list.append(q.a['href'])  # address: that tag's href attribute
Finally, number and print each title together with its address. (In the original code, a single variable kind was overwritten on every iteration, so every line printed the last work's address; collecting the hrefs in a list fixes that.)
for i, x in enumerate(name_list):
    print(i+1, x + "\t" + 'Address: ' + href_list[i] + '\n')
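The i+1 arithmetic can be avoided, since enumerate accepts a start argument, and zip keeps each title paired with its own address. A tiny sketch with made-up data (the titles and addresses below are invented for illustration):

```python
# Made-up titles and addresses, standing in for the scraped lists.
names = ['First work', 'Second work']
hrefs = ['//example.com/a', '//example.com/b']

lines = []
# enumerate(..., start=1) yields 1-based ranks directly.
for rank, (name, href) in enumerate(zip(names, hrefs), start=1):
    lines.append(f"{rank} {name}\tAddress: {href}")

print('\n'.join(lines))
```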
The crawl succeeded! Each output line is serial number + work title + address.
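The whole extraction can be exercised offline on a hand-written fragment shaped like the ranking page's info blocks (the fragment below is invented for illustration; the real page's markup may differ, and the page layout can change over time). This version uses only the standard library's html.parser, so it runs even without bs4 installed:

```python
from html.parser import HTMLParser

# A made-up fragment imitating the structure the tutorial targets:
# each work lives in <div class="info"> with an <a> holding title and href.
SAMPLE = """
<div class="info"><a href="//www.bilibili.com/video/BV1xx">First work</a></div>
<div class="info"><a href="//www.bilibili.com/video/BV2yy">Second work</a></div>
"""

class InfoParser(HTMLParser):
    """Collect (title, href) pairs from <a> tags inside div.info blocks."""

    def __init__(self):
        super().__init__()
        self.in_info = 0   # nesting depth inside div.info
        self.href = None   # href of the <a> we are currently inside
        self.pairs = []    # collected (title, href) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div' and 'info' in attrs.get('class', '').split():
            self.in_info += 1
        elif tag == 'a' and self.in_info:
            self.href = attrs.get('href')

    def handle_data(self, data):
        # Text encountered while an <a> inside div.info is open is the title.
        if self.href is not None and data.strip():
            self.pairs.append((data.strip(), self.href))
            self.href = None

    def handle_endtag(self, tag):
        if tag == 'div' and self.in_info:
            self.in_info -= 1

parser = InfoParser()
parser.feed(SAMPLE)
for i, (name, href) in enumerate(parser.pairs, start=1):
    print(i, name + '\t' + 'Address: ' + href)
```

This is only a sketch of the parsing step; the real script still fetches the live page with requests, and BeautifulSoup remains the more convenient tool when it is available.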