Crawling the Kugou Music soaring chart top ten with Python, part 1: writing to a CSV file

Crawl the Kugou soaring chart and write the results to a CSV file.

Crawling the song title, artist, and duration of the top ten songs on the Kugou Music soaring chart is a good example of scraping web content. Readers who are not yet familiar with crawlers can learn from this example how a crawler extracts content from a web page.

Libraries used: requests, BeautifulSoup (from bs4), and lxml as the HTML parser.
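If they are not installed yet, they can typically be installed with pip (beautifulsoup4 provides the bs4 module, and lxml is the parser handed to BeautifulSoup below):

pip install requests beautifulsoup4 lxml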

Request headers: 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'

URL: https://www.kugou.com/yy/rank/home/1-6666.html?from=rank
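Before parsing anything, it helps to confirm the page is reachable with these headers. Here is a minimal sanity-check sketch; the timeout value and status-code print are additions for illustration, not part of the original script:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}
url = 'https://www.kugou.com/yy/rank/home/1-6666.html?from=rank'

# 200 means the chart page was served; anything else (e.g. 403)
# usually means the request was blocked or the URL has changed.
resp = requests.get(url, headers=headers, timeout=10)
print(resp.status_code)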

 

The complete code:

import requests
from bs4 import BeautifulSoup

# Request headers: identify as a normal browser so the site serves the page
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}

def requests_list(url):
    wb_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    nums = soup.select('span.pc_temp_num')                      # rank
    titles = soup.select('div.pc_temp_songlist > ul > li > a')  # "singer - song" title
    times = soup.select('span.pc_temp_tips_r > span')           # song duration
    # n counts the rows so that only the top ten songs on the soaring chart are kept
    n = 0
    # each crawled row is appended to this list
    data = []
    data.append(['num', 'singer', 'song', 'time'])
    for num, title, duration in zip(nums, titles, times):
        data.append([
            num.get_text().strip(),
            title.get_text().split('-')[0].strip(),  # "-" separates singer and song name
            title.get_text().split('-')[1].strip(),
            duration.get_text().strip()
        ])
        n = n + 1
        if n >= 10:
            break
    print(data)
    return data

def save_to_csv(data):
    # open kugou.csv and write the crawled rows into it
    with open('kugou.csv', 'w') as fr:
        for s in data:
            fr.write(','.join(s) + '\n')

if __name__ == '__main__':
    url = 'https://www.kugou.com/yy/rank/home/1-6666.html?from=rank'
    save_to_csv(requests_list(url))
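One caveat about save_to_csv: it joins fields with commas by hand, so a song name that itself contains a comma would corrupt the file. Below is a sketch of a safer variant using Python's built-in csv module; the name save_to_csv_safe is made up for this example, and it is meant as a drop-in replacement for save_to_csv above:

import csv

def save_to_csv_safe(data, filename='kugou.csv'):
    # csv.writer quotes any field that contains a comma, so song
    # names with commas survive the round trip intact
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(data)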

Note: if you run into any problems while crawling, leave a comment below the post and I will answer them.

Source: www.cnblogs.com/cyt99/p/12041399.html