Teaching you how to crawl Baidu search results with Python and save them


I. Introduction

Hi everyone, this is Cui Yanfei. As we all know, searching for a keyword directly on Baidu returns a flood of results, often mixed with advertisements; click one by accident and it takes a while to back out, which is tiresome.

Recently, a member of our group asked for the titles and links of Baidu articles about food. As it happens, I have been learning about crawlers, so I decided to use this request for practice. As we all know, Python offers a large number of ready-made libraries, so this is not hard to implement. Let's get started.

II. Project goals

Crawl the Baidu search results for the keyword "food" (粮食), save them, and hand them to the client for further analysis of China's food policy.

III. Project preparation

Software: PyCharm

Required libraries: json, requests, lxml (for etree)

IV. Project analysis

1) How to search for keywords?

Use the requests library to send a GET request to the search URL and obtain the results page. The URL is as follows:

https://www.baidu.com/s?wd=粮食
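The keyword in this URL contains Chinese characters, which the browser percent-encodes behind the scenes. A minimal sketch of how the final URL is built, using the standard library's `urlencode` (the requests library does the same thing automatically when you pass `params`):

```python
from urllib.parse import urlencode

# Percent-encode the Chinese keyword into the query string.
base = 'https://www.baidu.com/s'
url = base + '?' + urlencode({'wd': '粮食'})
print(url)  # https://www.baidu.com/s?wd=%E7%B2%AE%E9%A3%9F
```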

2) How to get the title and link?

After parsing the raw page source with etree, locate the article titles and href attributes through XPath to obtain each title and article link.
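As a sketch of this step, here is the same extraction run on a toy HTML snippet. The snippet only mimics the assumed structure of a Baidu result entry (an `h3` with class `t` holding the link, a `c-abstract` div holding the summary); the real markup may differ and changes over time:

```python
from lxml import etree

# Toy snippet imitating one Baidu search result (assumed structure).
snippet = '''
<div>
  <h3 class="t"><a href="http://example.com/article">Sample <em>food</em> title</a></h3>
  <div class="c-abstract">A short abstract about food policy.</div>
</div>
'''
html = etree.HTML(snippet, etree.HTMLParser())
title = html.xpath('//h3')[0].xpath('string(.)')       # string(.) flattens nested tags
link = html.xpath('//*[@class="t"]/a/@href')[0]
abstract = html.xpath('//*[@class="c-abstract"]')[0].xpath('string(.)')
print(title)     # Sample food title
print(link)      # http://example.com/article
print(abstract)  # A short abstract about food policy.
```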

3) How to save search results?

Create a new txt file and write the search results into it in a loop.
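A minimal sketch of the save step, using sample data in place of real search results: open the file and write each field on its own line.

```python
# Sample (title, link) pairs standing in for real search results.
results = [('Title A', 'http://example.com/a'),
           ('Title B', 'http://example.com/b')]

# Write each field on its own line; utf-8 handles Chinese text.
with open('ok.txt', 'w', encoding='utf-8') as f:
    for title, link in results:
        f.write(title + '\n')
        f.write(link + '\n')
```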

V. Project implementation

1. The first step is to import the required libraries

import json
import requests
from lxml import etree

2. The second step is to send the search request with requests

headers = {
    "User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
}
response = requests.get('https://www.baidu.com/s?wd=粮食&lm=1', headers=headers)

3. The third step is to parse the obtained source code and locate the required elements through XPath

r = response.text
html = etree.HTML(r, etree.HTMLParser())
r1 = html.xpath('//h3')                       # result titles
r2 = html.xpath('//*[@class="c-abstract"]')   # result abstracts
r3 = html.xpath('//*[@class="t"]/a/@href')    # result links

4. The fourth step is to loop through the results, save them, and print them

for i in range(10):
    r11 = r1[i].xpath('string(.)')   # title text, with nested tags flattened
    r22 = r2[i].xpath('string(.)')   # abstract text
    r33 = r3[i]                      # article link
    with open('ok.txt', 'a', encoding='utf-8') as c:
        c.write(json.dumps(r11, ensure_ascii=False) + '\n')
        c.write(json.dumps(r22, ensure_ascii=False) + '\n')
        c.write(json.dumps(r33, ensure_ascii=False) + '\n')
    print(r11)
    print('------------------------')
    print(r22)
    print(r33)
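One caveat: the loop above assumes the page always yields at least 10 results, so `r1[i]` raises `IndexError` on a shorter page. A safer variant (shown here with sample data in place of the parsed lists) zips the three lists so iteration stops at the shortest one:

```python
# Sample data standing in for the parsed title/abstract/link lists.
titles = ['Title A', 'Title B']
abstracts = ['Abstract A', 'Abstract B']
links = ['http://example.com/a', 'http://example.com/b']

# zip() stops at the shortest list, so no IndexError on short result pages.
with open('ok.txt', 'a', encoding='utf-8') as f:
    for title, abstract, link in zip(titles, abstracts, links):
        f.write(title + '\n' + abstract + '\n' + link + '\n')
```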

    

VI. Results

1. The program's console output is shown in the figure below:

2. The final contents of the saved txt file are shown in the figure below:

VII. Summary

This article introduced how to use Python to crawl Baidu search results and save them. It is only a small crawler, but it shows one of the fun sides of Python: a large number of free libraries are available to help you meet all kinds of needs and save a lot of work. Learn to use Python!

Finally, if you need the full code for this article, reply with the keyword "grain" in the backend of the official account to get it. If you run into any problems while running it, feel free to leave a message or add the editor as a friend; I will help you fix the bug when I see it.

------------------- End -------------------


Origin blog.csdn.net/pdcfighting/article/details/113840170