Using a crawler to get the names and addresses of the top 100 works on Bilibili's ranking

How do you crawl the titles and addresses of the works on Bilibili's daily real-time ranking?

Open Bilibili and click the works ranking link to enter the ranking page.
Right-click the page and choose Inspect to view the source, then find the HTML that corresponds to a ranked work. This shows the approximate position of each work in the markup.
Then import requests and BeautifulSoup in PyCharm:

import requests
from bs4 import BeautifulSoup

Because each work's information lives in a div block with class='info', use find_all to collect that block for every work. (Bilibili does not require spoofing the request headers to access this page successfully.)

url = 'https://www.bilibili.com/v/popular/rank/all'
res = requests.get(url)  # Bilibili does not require a spoofed header
name_list = []
b_list = BeautifulSoup(res.text, 'lxml').find_all('div', class_='info')
print(b_list)
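To see what this find_all step returns without hitting the network, here is a small offline sketch. The HTML snippet is a hypothetical stand-in, not the real Bilibili markup; it only mimics the structure the tutorial relies on, where each work sits in a div with class "info" whose first anchor carries the title text and the address in its href. The stdlib html.parser is used so the snippet runs without installing lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the ranking page's markup (not the real
# Bilibili HTML): each work sits in a <div class="info"> whose first <a>
# carries the title text and, in its href, the work's address.
html = '''
<div class="info"><a href="//www.bilibili.com/video/BV1xx">Work One</a></div>
<div class="info"><a href="//www.bilibili.com/video/BV1yy">Work Two</a></div>
'''

# html.parser is used here so the snippet runs without installing lxml
blocks = BeautifulSoup(html, 'html.parser').find_all('div', class_='info')
titles = [b.a.text for b in blocks]      # title text of each work
links = [b.a['href'] for b in blocks]    # address of each work
print(titles)  # → ['Work One', 'Work Two']
print(links)   # → ['//www.bilibili.com/video/BV1xx', '//www.bilibili.com/video/BV1yy']
```

On the real page, `b_list` is a list of these same div blocks, one per ranked work.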

At this point we have the source inside each info block, and the title and address of each work are both in it. So we traverse the blocks a second time with a loop variable q, appending each title to the empty name_list created at the start and reading each work's address from its anchor tag's href attribute.

link_list = []                     # collect each work's address alongside its title
for q in b_list:
    name_list.append(q.a.text)     # the first <a> in the block holds the title...
    link_list.append(q.a['href'])  # ...and its href attribute holds the address

Finally, print the numbered results.

for i, (x, link) in enumerate(zip(name_list, link_list)):
    print(i + 1, x + "\t" + 'Address: ' + link + '\n')

The crawl succeeded! Each output line is rank number + work title + address.
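The steps above can be combined into one end-to-end sketch. This assumes the page keeps the div.info structure described earlier; the parsing is split into its own function so it can be exercised without network access, and the User-Agent header (not strictly required by Bilibili, per the note above) is sent defensively. The function names here are illustrative, not from the original post.

```python
import requests
from bs4 import BeautifulSoup

URL = 'https://www.bilibili.com/v/popular/rank/all'

def parse_ranking(html):
    """Extract (title, address) pairs from ranking-page HTML."""
    blocks = BeautifulSoup(html, 'html.parser').find_all('div', class_='info')
    return [(b.a.text.strip(), b.a['href']) for b in blocks]

def fetch_ranking(url=URL):
    # A User-Agent is not strictly required by Bilibili, but sending one
    # is a harmless precaution against header-based blocking.
    res = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
    res.raise_for_status()  # fail loudly on HTTP errors
    return parse_ranking(res.text)

# Example usage (requires network access):
# for i, (title, link) in enumerate(fetch_ranking(), start=1):
#     print(i, title + '\t' + 'Address: ' + link)
```

Checking the HTTP status with raise_for_status and setting a timeout makes the failure modes explicit, which the step-by-step version above omits for brevity.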


Origin blog.csdn.net/JasonZ227/article/details/109962293