Scraping the Kugou Soaring Chart and Writing It to a CSV File
Scraping the song, artist, and duration of the top ten tracks on the Kugou Music soaring chart is a good example of crawling web content. Readers who are not yet familiar with crawlers can use this example to see how a crawler extracts content from a web page.
Libraries needed: the requests library, the BeautifulSoup library (with the lxml parser), and the time library.
Request header: 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
URL: https://www.kugou.com/yy/rank/home/1-6666.html?from=rank
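Before running the full script, it can help to try the same CSS selectors offline. The snippet below is only a sketch: SAMPLE_HTML is a hypothetical, simplified stand-in for the chart page (not a copy of the real markup), and it uses Python's built-in "html.parser" instead of lxml so that nothing extra needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a simplified stand-in for the chart page, not the real HTML.
SAMPLE_HTML = """
<div class="pc_temp_songlist">
  <ul>
    <li>
      <span class="pc_temp_num">1</span>
      <a href="#">Singer A - Song A</a>
      <span class="pc_temp_tips_r"><span>3:45</span></span>
    </li>
    <li>
      <span class="pc_temp_num">2</span>
      <a href="#">Singer B - Song B</a>
      <span class="pc_temp_tips_r"><span>4:02</span></span>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Each select() call returns a list of matching tags, in document order.
nums = [tag.get_text().strip() for tag in soup.select("span.pc_temp_num")]
titles = [tag.get_text().strip()
          for tag in soup.select("div.pc_temp_songlist > ul > li > a")]
times = [tag.get_text().strip()
         for tag in soup.select("span.pc_temp_tips_r > span")]

# zip() pairs the n-th rank, title, and duration into one row.
rows = list(zip(nums, titles, times))
print(rows)
```

If the real page changes its class names, these selectors will simply return empty lists, so printing the lengths of nums, titles, and times is a quick first check when the output looks wrong.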
The complete code:
import requests
from bs4 import BeautifulSoup
import time  # listed among the required libraries; not called in this single-page example

# Request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}

def requests_list(url):
    wb_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    nums = soup.select('span.pc_temp_num')  # rank
    titles = soup.select('div.pc_temp_songlist > ul > li > a')  # title
    times = soup.select('span.pc_temp_tips_r > span')  # song duration
    # n counts the rows so that only the top ten songs of the soaring chart are kept.
    n = 0
    # Each row scraped in the loop is appended to this list; the first row is the header.
    data = []
    data.append(['num', 'singer', 'song', 'time'])
    # The loop variable is named duration so that it does not shadow the time module.
    for num, title, duration in zip(nums, titles, times):
        data.append([
            num.get_text().strip(),
            title.get_text().split('-')[0],  # split singer and song name on "-"
            title.get_text().split('-')[1],
            duration.get_text().strip()
        ])
        n = n + 1
        if n >= 10:
            break
    print(data)
    return data

def save_to_csv(data):
    # Open kugou.csv and write the scraped rows into it;
    # the with block closes the file even if a write fails.
    with open("kugou.csv", "w") as fr:
        for s in data:
            fr.write(",".join(s) + "\n")

if __name__ == '__main__':
    urls = "https://www.kugou.com/yy/rank/home/1-6666.html?from=rank"
    save_to_csv(requests_list(urls))
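Two spots in the script are fragile: split('-')[1] raises an IndexError when a title has no hyphen, and joining fields with "," produces a broken row when a singer or song name itself contains a comma. The following is a minimal sketch of one way around both, using only the standard library. The helper names split_title and rows_to_csv_text are made up for this illustration and are not part of the original script.

```python
import csv
import io

def split_title(title):
    """Split 'singer - song' on the first hyphen only.

    If there is no hyphen, the song part is an empty string
    instead of raising an IndexError.
    """
    singer, _, song = title.partition("-")
    return singer.strip(), song.strip()

def rows_to_csv_text(rows):
    """Render rows as CSV text; csv.writer quotes fields that contain commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

# Sample values made up for the demonstration.
singer, song = split_title("Singer A, Singer B - Song-Name")
text = rows_to_csv_text([
    ["num", "singer", "song", "time"],
    ["1", singer, song, "3:45"],
])
print(text)
```

To write to disk instead of a string, the same csv.writer can be given a file opened with open("kugou.csv", "w", newline=""), as the csv module documentation recommends.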
Note: if you run into a problem while crawling, leave a comment below this blog post and the author will reply.