Python web crawler - crawling network keyword information

 

This code uses the requests and BeautifulSoup libraries to fetch and parse the result titles on a so.com (360 Search) results page. The specific steps are as follows:

  1. Import the required libraries: requests and BeautifulSoup.
  2. Define a function, get_search_results(), for obtaining keyword search results.
  3. Construct the URL of the search keyword and splice the keyword into the URL.
  4. Set request header information, including User-Agent.
  5. Use the get() method of the requests library to send an HTTP request and obtain the response.
  6. Check whether the response status code is 200. If so, the request is successful. Use the BeautifulSoup library to parse the response HTML document.
  7. Use the find_all() method to find all "h3" tag elements with a "class" attribute of "res-title", which contain the title of the search results.
  8. Traverse all found title elements, use the get_text() method to get the title text, and print it out.
  9. If the response status code is not 200, a "request failed" message will be printed.

At the end of the code, call the get_search_results() function and pass in the keyword you want to search for, such as "全国高校智慧渔业大赛" ("National University Smart Fishery Competition").

Please make sure you have installed the required libraries (requests, and BeautifulSoup, which is installed as the beautifulsoup4 package), and adjust the request headers and the keyword in the code as needed.
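
Before running the script, it can help to confirm that both libraries are importable. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical and not part of the original code. BeautifulSoup is imported under the name bs4.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The crawler imports `requests` and `bs4` (the import name of BeautifulSoup).
    missing = missing_modules(["requests", "bs4"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```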

This code obtains the result titles from a so.com (360 Search) results page.

  1. Import library
import requests
from bs4 import BeautifulSoup

This imports the required libraries, requests and BeautifulSoup.

  2. Define the function get_search_results(keyword) to obtain search results
def get_search_results(keyword):
    # Build the search URL by splicing the keyword into the query string
    url = f"https://www.so.com/s?q={keyword}"

    # Send a browser-like User-Agent with the request
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }

    response = requests.get(url, headers=headers)

    if response.status_code == 200:
        soup = BeautifulSoup(response.text, "html.parser")

        # Each result title sits in an <h3> element with the class "res-title"
        results = soup.find_all("h3", class_="res-title")

        for result in results:
            title = result.get_text()
            print(title)
    else:
        print("Request failed")

This function takes one parameter, keyword, which specifies the search keyword. Inside the function, the URL of the search results page is built from the keyword, and the request headers, including User-Agent, are set. The function then sends a GET request to fetch the HTML of the search results page. If the request succeeds (status code 200), BeautifulSoup parses the HTML, and the find_all() method finds all h3 tags with the class name res-title, which contain the titles of the search results. The function then iterates over these elements, uses the get_text() method to get each title's text, and prints it. If the request fails, a failure message is printed instead.
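
To see what find_all() and get_text() do without sending a request, the parsing step can be run on a small hand-written HTML fragment. This is only an illustrative sketch: the helper extract_titles and the sample markup are made up for the example, and the structure of the real results page may differ or change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely imitating a results page.
SAMPLE_HTML = """
<div>
  <h3 class="res-title"><a href="#">First <em>result</em></a></h3>
  <h3 class="other">Not a result title</h3>
  <h3 class="res-title"> Second result </h3>
</div>
"""


def extract_titles(html):
    """Return the text of every <h3> element whose class includes "res-title"."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() joins the text of nested tags; strip() trims outer whitespace.
    return [h3.get_text().strip() for h3 in soup.find_all("h3", class_="res-title")]


print(extract_titles(SAMPLE_HTML))  # prints ['First result', 'Second result']
```

Returning a list instead of printing inside the loop makes the parsing step easy to check on its own, separately from the network request.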

  3. Call the function and pass in the keyword
get_search_results("全国高校智慧渔业大赛")

In this part of the code, the get_search_results function is called with the keyword "全国高校智慧渔业大赛" ("National University Smart Fishery Competition"). This prints the titles found on the search results page.
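
Keywords that contain reserved characters such as & or # would alter the query string if spliced into the URL directly. One way to guard against this, sketched here with the standard library's urllib.parse.urlencode, is to percent-encode the keyword first. The helper build_search_url is hypothetical and not part of the original code.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.so.com/s"


def build_search_url(keyword):
    """Return the search URL with the keyword percent-encoded as the q parameter."""
    return f"{BASE_URL}?{urlencode({'q': keyword})}"


print(build_search_url("fish & chips"))  # prints https://www.so.com/s?q=fish+%26+chips
```

With requests, passing params={"q": keyword} to requests.get() applies the same kind of encoding without building the URL by hand.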

import requests
from bs4 import BeautifulSoup


def get_search_results(keyword):
    url = f"https://www.so.com/s?q={keyword}"

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }

    response = requests.get(url, headers=headers)

    if response.status_code == 200:
        soup = BeautifulSoup(response.text, "html.parser")

        results = soup.find_all("h3", class_="res-title")

        for result in results:
            title = result.get_text()
            print(title)
    else:
        print("Request failed")


# Call the function here and pass in the keyword you want to search for
get_search_results("全国高校智慧渔业大赛")

Origin blog.csdn.net/weixin_66547608/article/details/134126912