[For Beginners] An Example of Scraping NBA Player Data with Python



Foreword

This article walks through sample code for scraping NBA player data with Python: it sends an HTTP request, parses the returned HTML page, extracts the ranking, name, team, and score information, and saves the result to a file.

Import required libraries and modules


import requests
from lxml import etree
  • Use the requests library to send HTTP requests.
  • Use the lxml library to parse HTML.

Set request header and request address


url = 'https://nba.hupu.com/stats/players'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36'
}
  • Set the request headers, including the User-Agent.
  • Set the request URL to 'https://nba.hupu.com/stats/players'.

Send HTTP request and get response


resp = requests.get(url, headers=headers)
  • Use the requests library to send an HTTP GET request, passing in the request URL and request headers.
  • Save the returned response in the variable resp.

Handle the response result


e = etree.HTML(resp.text)
  • Use the etree.HTML function to parse the response text into a queryable HTML element tree.
  • Save the parsed tree in the variable e.
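A minimal offline sketch of this step (assuming lxml is installed; the HTML fragment below is a made-up stand-in for a real response body):

```python
from lxml import etree

# A tiny HTML fragment standing in for resp.text
html_text = '<html><body><h1>NBA Stats</h1></body></html>'

# Parse the string into an element tree
tree = etree.HTML(html_text)

# XPath queries can now be run against the tree
titles = tree.xpath('//h1/text()')
print(titles)  # → ['NBA Stats']
```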

Parse the data


nos = e.xpath('//table[@class="players_table"]//tr/td[1]/text()')
names = e.xpath('//table[@class="players_table"]//tr/td[2]/a/text()')
teams = e.xpath('//table[@class="players_table"]//tr/td[3]/a/text()')
scores = e.xpath('//table[@class="players_table"]//tr/td[4]/text()')
  • Use XPath expressions to extract the required data from the HTML element tree.
  • Save the rankings (nos), names (names), teams (teams), and scores (scores) in the corresponding variables.
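Note that text() nodes can come back padded with whitespace or newlines. If the extracted strings need cleaning, a stdlib-only sketch (the sample values are made up):

```python
# Raw values as XPath text() nodes sometimes return them, padded with whitespace
raw_scores = [' 36.1\n', ' 30.0\n']

# Strip each entry before further use
scores = [s.strip() for s in raw_scores]
print(scores)  # → ['36.1', '30.0']
```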

Save the result to a file

with open('nba.txt', 'w', encoding='utf-8') as f:
    for no, name, team, score in zip(nos, names, teams, scores):
        f.write(f'Rank: {no} Name: {name} Team: {team} Score: {score}\n')
  • Open the file nba.txt in write mode ('w') with UTF-8 encoding.
  • Use the zip function to iterate over the rankings, names, teams, and scores in parallel, combining them into tuples.
  • Write each row of data to the file in the specified format.
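If you prefer structured output over plain text, the same rows could be written as CSV with the standard csv module. A sketch with made-up data in place of the scraped lists:

```python
import csv

# Illustrative data standing in for the scraped lists
nos = ['1', '2']
names = ['Player A', 'Player B']
teams = ['Team X', 'Team Y']
scores = ['36.1', '30.0']

# newline='' prevents blank lines between rows on Windows
with open('nba.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['rank', 'name', 'team', 'score'])  # header row
    for row in zip(nos, names, teams, scores):
        writer.writerow(row)
```

A CSV file opens directly in spreadsheet software, which makes the scraped table easier to sort and filter later.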

Full code

# Import the requests library, used to send HTTP requests
import requests
# Import the lxml library, used to parse HTML
from lxml import etree

# Set the request URL
url = 'https://nba.hupu.com/stats/players'
# Set the request headers, including the user agent (User-Agent)
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36'
}

# Send an HTTP GET request with the URL and headers; save the response in resp
resp = requests.get(url, headers=headers)

# Parse the response text into a queryable HTML element tree
e = etree.HTML(resp.text)

# Extract the required data from the element tree with XPath expressions
nos = e.xpath('//table[@class="players_table"]//tr/td[1]/text()')
names = e.xpath('//table[@class="players_table"]//tr/td[2]/a/text()')
teams = e.xpath('//table[@class="players_table"]//tr/td[3]/a/text()')
scores = e.xpath('//table[@class="players_table"]//tr/td[4]/text()')

# Open the file nba.txt in write mode ('w') with UTF-8 encoding
with open('nba.txt', 'w', encoding='utf-8') as f:
    # Iterate over rankings, names, teams, and scores in parallel with zip
    for no, name, team, score in zip(nos, names, teams, scores):
        # Write each row to the file in the specified format
        f.write(f'Rank: {no} Name: {name} Team: {team} Score: {score}\n')

Detailed analysis

# pip install requests
import requests

Import the requests library, which is used to send HTTP requests.

# pip install lxml
from lxml import etree

Import the lxml library, which is used to parse HTML.

# The request URL
url = 'https://nba.hupu.com/stats/players'

Set the URL to which the request will be sent.

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36'
}

Set the request headers, including the User-Agent. This tells the server that the request comes from a browser rather than a script, which helps avoid being blocked by anti-crawler mechanisms.

# Send the request
resp = requests.get(url, headers=headers)

Use the requests.get method to send an HTTP GET request, passing in the request URL and headers. The returned response is saved in the variable resp.

e = etree.HTML(resp.text)

Use the etree.HTML function to parse the response text into a queryable HTML element tree. etree.HTML accepts a string argument; here resp.text supplies the text content of the response.

nos = e.xpath('//table[@class="players_table"]//tr/td[1]/text()')
names = e.xpath('//table[@class="players_table"]//tr/td[2]/a/text()')
teams = e.xpath('//table[@class="players_table"]//tr/td[3]/a/text()')
scores = e.xpath('//table[@class="players_table"]//tr/td[4]/text()')

Use XPath expressions to extract the required data from the HTML element tree. Four expressions extract the ranking, name, team, and score data, which are saved in the variables nos, names, teams, and scores respectively.
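These expressions can be exercised offline against a miniature table with the same structure (assuming lxml is installed; the markup below is a simplified, made-up stand-in for the real page):

```python
from lxml import etree

# Simplified stand-in for the players_table markup on the real page
html_text = '''
<table class="players_table">
  <tr><td>1</td><td><a>Player A</a></td><td><a>Team X</a></td><td>36.1</td></tr>
  <tr><td>2</td><td><a>Player B</a></td><td><a>Team Y</a></td><td>30.0</td></tr>
</table>
'''

e = etree.HTML(html_text)
# td[1] selects each row's first cell; td[2]/a the link text in the second
nos = e.xpath('//table[@class="players_table"]//tr/td[1]/text()')
names = e.xpath('//table[@class="players_table"]//tr/td[2]/a/text()')
print(nos)    # → ['1', '2']
print(names)  # → ['Player A', 'Player B']
```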

with open('nba.txt', 'w', encoding='utf-8') as f:
    for no, name, team, score in zip(nos, names, teams, scores):
        f.write(f'Rank: {no} Name: {name} Team: {team} Score: {score}\n')

Open a file named nba.txt in write mode ('w') with UTF-8 encoding. Then use the zip function to iterate over the rankings, names, teams, and scores in parallel, combining them into tuples. Looping over the tuples, write each row of data to the file in the specified format.
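One property of zip worth noting: it stops at the shortest input, so if one XPath list comes back shorter (for example, because a cell is missing its link), trailing rows are silently dropped. A quick stdlib illustration:

```python
names = ['A', 'B', 'C']
scores = ['10', '20']  # one entry shorter than names

# zip truncates to the shortest input; 'C' gets no row
rows = list(zip(names, scores))
print(rows)  # → [('A', '10'), ('B', '20')]
```

When scraping, it is worth checking that the four extracted lists have equal lengths before zipping them.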

In this way, the code scrapes the NBA player data and saves the result to the file nba.txt.

Running result


Conclusion

Through the sample code in this article, you have seen how to scrape NBA player data with Python: the requests library sends the HTTP request, the lxml library parses the HTML, XPath expressions extract the required data, and the result is finally saved to a file. This example illustrates the basic principles and steps of a web crawler while also producing real NBA player data. I hope it helps you understand and master Python crawling techniques.


Origin blog.csdn.net/qq_33681891/article/details/131974796