The basic routine for crawling with Python and BeautifulSoup - Code World

Others 2022-04-28 05:04:23 views: 0

Using Python 3, here is an example that crawls the Kugou ranking list:

import time

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with every request.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}

def get_info(url):
    # Download the page and parse it with the lxml parser.
    wb_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    # Each CSS selector returns a list of matching tags, in page order.
    ranks = soup.select('span.pc_temp_num')
    titles = soup.select('div.pc_temp_songlist > ul > li > a')
    durations = soup.select('span.pc_temp_tips_r > span')
    # The loop variable is named duration so it does not shadow the time module.
    for rank, title, duration in zip(ranks, titles, durations):
        data = {
            'rank': rank.get_text().strip(),
            # The link text is assumed to have the form "singer - song".
            'singer': title.get_text().split('-')[0].strip(),
            'song': title.get_text().split('-')[1].strip(),
            'time': duration.get_text().strip()
        }
        print(data)

if __name__ == '__main__':
    urls = ['http://www.kugou.com/yy/rank/home/{}-8888.html'.format(str(i)) for i in range(1, 2)]
    for url in urls:
        get_info(url)
        time.sleep(5)  # pause between requests

In the code above, from bs4 import BeautifulSoup imports the library first.
Then headers sets a browser-like User-Agent for the request.
Then soup = BeautifulSoup(wb_data.text,'lxml') calls BeautifulSoup on the response text
and selects lxml as the parser.
Then, in
ranks = soup.select('span.pc_temp_num')
titles = soup.select('div.pc_temp_songlist > ul > li > a')
the arguments are CSS selectors, not XPath expressions. They can be found with the Inspect function of the Chrome browser's developer tools.
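To see what these selectors match without touching the network, the following is a minimal sketch run against a small inline HTML fragment. The fragment is hypothetical: it only mimics the class names used in the code above, and the real Kugou page may be structured differently. It also assumes Python's built-in html.parser instead of lxml, so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that mimics the class names used above;
# the markup of the real page may differ.
SAMPLE_HTML = """
<div class="pc_temp_songlist">
  <ul>
    <li>
      <span class="pc_temp_num"> 1 </span>
      <a href="#">Singer A - Song A</a>
      <span class="pc_temp_tips_r"><span> 3:45 </span></span>
    </li>
    <li>
      <span class="pc_temp_num"> 2 </span>
      <a href="#">Singer B - Song B</a>
      <span class="pc_temp_tips_r"><span> 4:10 </span></span>
    </li>
  </ul>
</div>
"""

# Assumption: html.parser is swapped in for lxml to avoid an extra dependency.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "tag.class" matches by class name; ">" matches direct children only.
ranks = [tag.get_text().strip() for tag in soup.select("span.pc_temp_num")]
titles = [
    tag.get_text().strip()
    for tag in soup.select("div.pc_temp_songlist > ul > li > a")
]
durations = [
    tag.get_text().strip()
    for tag in soup.select("span.pc_temp_tips_r > span")
]

print(ranks)
print(titles)
print(durations)
```

Each call to select returns the matching tags in document order, which is why the three lists can later be paired up position by position.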
Then a loop builds a dictionary for each entry and prints it. Note the use of strip() to remove surrounding whitespace.
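The pairing and cleanup step can be tried on plain strings. The sketch below assumes the title text has the form "singer - song". The helper parse_row and the sample values are hypothetical and exist only for illustration. It uses str.partition rather than split so that a title without a hyphen yields an empty song instead of raising an IndexError.

```python
def parse_row(rank_text, title_text, duration_text):
    """Build one record from the raw text of three matching elements."""
    # partition splits on the first hyphen only; if there is no hyphen,
    # song is an empty string.
    singer, _, song = title_text.partition("-")
    return {
        "rank": rank_text.strip(),
        "singer": singer.strip(),
        "song": song.strip(),
        "time": duration_text.strip(),
    }


# Hypothetical raw values, with the stray whitespace that scraped text often has.
raw_ranks = ["\n 1\n", "\n 2\n"]
raw_titles = ["Singer A - Song A", "Singer B - Song B"]
raw_durations = ["\n 3:45 \n", " 4:10 "]

# zip pairs the three lists element by element and stops at the shortest one.
rows = [
    parse_row(rank, title, duration)
    for rank, title, duration in zip(raw_ranks, raw_titles, raw_durations)
]

for row in rows:
    print(row)
```

Because zip stops at the shortest input, a selector that matches fewer elements than the others silently drops rows, so it can be worth comparing the list lengths when the output looks incomplete.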
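Finally,
urls = ['http://www.kugou.com/yy/rank/home/{}-8888.html'.format(str(i)) for i in range(1,2)]
is a list comprehension, a very distinctive piece of Python syntax. It sets up a URL template in which {} is replaced by the argument passed to format. Note that range(1,2) yields only 1, so this list contains a single URL, for the first page.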
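The URL template can be checked on its own, without sending any request. In the sketch below, build_urls is a hypothetical helper with an inclusive last page. Whether the site actually serves pages beyond the first is not confirmed by the original, so the page range here is only an assumption for illustration.

```python
URL_TEMPLATE = "http://www.kugou.com/yy/rank/home/{}-8888.html"


def build_urls(first_page, last_page):
    """Return the URLs for first_page through last_page, inclusive."""
    # range excludes its upper bound, hence last_page + 1.
    # format accepts an int directly, so str(page) is not required.
    return [URL_TEMPLATE.format(page) for page in range(first_page, last_page + 1)]


urls = build_urls(1, 3)
for url in urls:
    print(url)
```

Printing the generated URLs before crawling is a cheap way to confirm that the range bounds produce the pages intended.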
