Python crawler: Kugou TOP 500

import requests
from bs4 import BeautifulSoup
import time
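# Browser-like User-Agent, so the request looks like it comes from a browser
headers = {"User-Agent":"Mozilla/5.0"}
def get_info(url):
    # Download one ranking page and parse the HTML
    wb_data = requests.get(url,headers=headers)
    soup = BeautifulSoup(wb_data.text,'lxml')
    # Rank numbers, "singer - song" link texts, and song durations
    ranks = soup.select('span.pc_temp_num')
    titles = soup.select('div.pc_temp_songlist>ul>li>a')
    times = soup.select('span.pc_temp_tips_r>span')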
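    for rank,title,length in zip(ranks,titles,times):
        # Split "singer - song" on the first hyphen only, so hyphens in the song name survive
        parts = title.get_text().split('-',1)
        data = {
            'rank':rank.get_text().strip(),
            'singer':parts[0].strip(),
            'song':parts[-1].strip(),
            'time':length.get_text().strip()
        }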
        print(data)
if __name__=='__main__':
    urls = ['https://www.kugou.com/yy/rank/home/{}-8888.html?from=rank'.format(str(i)) for i in range(1,24)]
    for url in urls:
        get_info(url)
        time.sleep(1)  # pause one second after each page request

If this program already makes sense to you, there is no need to read further. If anything is unclear, read on.


  1. Import libraries
    Import the libraries the program needs. The requests library fetches the web pages, BeautifulSoup parses the returned HTML, and the sleep() method of the time library pauses the program.
  2. Simulated browser
    The headers dictionary sets a User-Agent so that the request looks like it comes from a browser, which makes the crawler more stable.
  3. Functions

3.1 select() function

  1. soup.select('div'): all <div> elements.
  2. soup.select('#author'): the element whose id attribute is author.
  3. soup.select('.notice'): all elements whose CSS class is notice.
  4. soup.select('div span'): all <span> elements anywhere inside a <div> element.
  5. soup.select('div > span'): all <span> elements directly inside a <div> element, with no other element in between.
  6. soup.select('input[name]'): all <input> elements that have a name attribute, whatever its value.
  7. soup.select('input[type="button"]'): all <input> elements whose type attribute has the value button.
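
The selector forms listed above can be tried on a small document. The following is a minimal sketch: the HTML snippet is made up for illustration, and it uses Python's built-in 'html.parser' instead of 'lxml' so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, invented only to exercise each selector form.
html = """
<div id="author">Al</div>
<div class="notice"><p><span>nested</span></p><span>direct</span></div>
<input name="q" type="text">
<input type="button" value="Go">
"""
soup = BeautifulSoup(html, "html.parser")

by_tag = soup.select("div")                    # every <div>
by_id = soup.select("#author")                 # element with id="author"
by_class = soup.select(".notice")              # elements with class="notice"
descendants = soup.select("div span")          # <span> at any depth inside a <div>
children = soup.select("div > span")           # <span> that is a direct child of a <div>
with_name = soup.select("input[name]")         # <input> that has a name attribute
buttons = soup.select('input[type="button"]')  # <input> with type="button"

print([tag.get_text() for tag in descendants])
print([tag.get_text() for tag in children])
```

Note the difference between 'div span' and 'div > span': the first also matches the <span> nested inside the <p>, the second does not.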

3.2 strip() function

The strip() string method returns a new string with the whitespace removed from the beginning and the end.
The lstrip() and rstrip() methods remove whitespace from the left or the right end only.
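
A short illustration of the three methods. The sample string is made up; text scraped from a page often carries this kind of surrounding whitespace.

```python
# Hypothetical scraped text with spaces and a newline around the value.
raw = "  1 \n"

both = raw.strip()    # whitespace removed from both ends
left = raw.lstrip()   # whitespace removed from the left end only
right = raw.rstrip()  # whitespace removed from the right end only

print(repr(both))   # '1'
print(repr(left))   # '1 \n'
print(repr(right))  # '  1'
```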
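
3.3 split() function

Called without arguments, split() uses whitespace as the delimiter, splits the string into parts, and returns the parts in a list. The program passes '-' as the delimiter to separate the singer from the song.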
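
The sketch below shows how split() behaves with and without a delimiter. The title string is hypothetical and was chosen to contain a hyphen in the song name, which is where a plain split('-') gives more than two parts; limiting the split to one cut is one possible way around that.

```python
# Hypothetical link text in the "singer - song" form.
title = "Artist A - Song-B"

words = "a b  c".split()   # no argument: split on runs of whitespace
naive = title.split('-')   # splits at every hyphen
singer, song = [part.strip() for part in title.split('-', 1)]  # split once, then trim

print(words)         # ['a', 'b', 'c']
print(naive)         # ['Artist A ', ' Song', 'B']
print(singer, song)  # Artist A Song-B
```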

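  4. Main function
    The main entry point of the program. The if __name__=='__main__': block runs only when the file is executed directly, not when it is imported as a module.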
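
As a rough sketch of how the entry point can be laid out, the page URLs can be built in a small helper and the guard can call it. The helper name build_urls is an assumption introduced here for illustration; the URL template and the page range 1 to 23 come from the program above.

```python
def build_urls(first=1, last=23):
    """Return the ranking-page URLs for pages first..last, inclusive.

    build_urls is a hypothetical helper, not part of the original program.
    """
    template = 'https://www.kugou.com/yy/rank/home/{}-8888.html?from=rank'
    return [template.format(page) for page in range(first, last + 1)]


if __name__ == '__main__':
    # Runs only when the file is executed directly, not on import.
    urls = build_urls()
    print(len(urls))
    print(urls[0])
```

Keeping the URL construction in a function makes it possible to import the file and check the URLs without starting the crawl.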

Origin blog.csdn.net/qq_42692319/article/details/102711695