ChatGPT helps me automatically write Python crawler scripts

As everyone knows, the ChatGPT chatbot has exploded in popularity recently, and I went to some trouble to register an account. Rumor has it that it will become a paid service later.

ChatGPT is a generative AI built on a large language model. In other words, it generates human-like text and gives you a coherent, reasoned answer, which is quite different from a traditional search tool.

ChatGPT can not only answer ordinary questions about the humanities, science, emotions, and so on, but also write code and fix bugs. That has programmers worried that it is coming for their jobs, and the Internet is full of anxious comments claiming that ChatGPT will put you out of work.

As the saying goes, "seeing once is better than hearing a hundred times," so I asked ChatGPT to write crawler scripts in Python to see whether it really works.

1. Crawl a column article on Zhihu

Ask:

Help me write code to crawl websites with python

ChatGPT:
[screenshot of ChatGPT's reply]

I pasted the code it gave into PyCharm and ran it. No error was reported, and the content was printed.

import requests
from bs4 import BeautifulSoup

# Fetch the article page and parse the returned HTML
url = "https://zhuanlan.zhihu.com/p/595050104"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# Extract the title and body by their CSS class names
title = soup.find("h1", class_="Post-Title").text.strip()
body = soup.find("div", class_="Post-RichText").text.strip()

print("Title:", title)
print("Body:", body)

Although the code ChatGPT gave runs, ChatGPT also pointed out that the target website can change at any time. If the HTML changes, the code may need to be adjusted before it works again.

Anyone who has written crawlers will understand this: hand-written crawler code is never finished once and for all and has to be updated whenever the site changes.

So ChatGPT's caveat makes sense.
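One way to soften this fragility is to try several selectors in order and return None when none of them match, instead of letting the script crash. The sketch below is only an illustration, not code from ChatGPT. The helper name `first_text`, the sample HTML, and the fallback selectors `h1` and `article` are assumptions made up for this example.

```python
from bs4 import BeautifulSoup


def first_text(soup, selectors):
    """Return the stripped text of the first selector that matches, else None."""
    for selector in selectors:
        node = soup.select_one(selector)
        if node is not None:
            return node.get_text(strip=True)
    return None


# Inline sample HTML so the sketch runs without network access.
# It imitates a page whose body container was renamed after a redesign.
SAMPLE_HTML = """
<html><body>
  <h1 class="Post-Title"> Hello Zhihu </h1>
  <article><p>Body text.</p></article>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The first selector in each list is the one from the original script;
# the later ones are hypothetical fallbacks.
title = first_text(soup, ["h1.Post-Title", "h1"])
body = first_text(soup, ["div.Post-RichText", "article"])
missing = first_text(soup, ["div.no-such-class"])

print("Title:", title)
print("Body:", body)
print("Missing:", missing)
```

With this approach a layout change shows up as a None value that the script can report, rather than as an AttributeError in the middle of a run.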

Later I ran the same test on articles from Medium and Baijia. The code ChatGPT produced had almost the same structure as the code above, but it did not return results when run as-is and needed some fine-tuning first.

2. Crawl the comments of a product on JD.com

To make things harder for ChatGPT, I asked it to crawl the user comments on an e-commerce website.

Ask:

Please use python to write code to crawl all user reviews of this Jingdong product https://item.jd.com/13652780.html

ChatGPT:

[screenshot of ChatGPT's reply]

The method ChatGPT provided could not crawl any comments, probably because this is a dynamic page.
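A quick way to test that guess is to check whether text you can see in the browser appears anywhere in the raw HTML the server returns. If it does not, the content is most likely filled in later by JavaScript. The sketch below shows the idea. The function name, the HTML, and the comment text are all invented for illustration and are not taken from the JD page.

```python
def missing_from_static_html(html, visible_snippets):
    """Return the snippets that do not occur anywhere in the raw HTML."""
    return [snippet for snippet in visible_snippets if snippet not in html]


# Hypothetical raw HTML: the comment list is an empty container
# that a script is expected to fill in after the page loads.
static_html = """
<html><body>
  <h1>Some product</h1>
  <div id="comments"></div>
  <script src="load-comments.js"></script>
</body></html>
"""

# Text copied from what the browser displays (made up for this sketch).
seen_in_browser = ["Some product", "Great quality, fast delivery"]

absent = missing_from_static_html(static_html, seen_in_browser)
print("Not in static HTML:", absent)
```

Here the product name is found but the comment text is not, which is the pattern you would expect from a dynamically loaded comment section.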

I then asked:

What should I do if the crawled result is a null value?

ChatGPT:

[screenshot of ChatGPT's reply]

ChatGPT listed three possible causes, but it did not modify the code for me.

So I asked again:

Still empty, please help me rewrite the code to crawl

ChatGPT:

[screenshot of ChatGPT's reply]

This time it was impressive. It rewrote the crawler with Selenium and explained that crawling a dynamic web page requires simulating browser behavior, which is why Selenium is needed.

I did not run this code to check whether it is correct, but ChatGPT really surprised me: it kept track of the earlier conversation and came up with a sensible solution.
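For readers who have not seen this pattern, a Selenium-based crawler often looks roughly like the sketch below. This is not the code ChatGPT returned, which was only shown in a screenshot. The function names are hypothetical, and the simple polling loop stands in for Selenium's own wait utilities such as WebDriverWait. It assumes Selenium 4 with Chrome and a matching driver available locally.

```python
import time


def wait_for_texts(driver, css_selector, timeout=10.0, poll=0.5):
    """Poll the page until elements matching css_selector appear.

    Returns the stripped text of each element, or an empty list if
    nothing shows up before the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the locator string behind selenium's By.CSS_SELECTOR.
        elements = driver.find_elements("css selector", css_selector)
        if elements:
            return [element.text.strip() for element in elements]
        if time.monotonic() >= deadline:
            return []
        time.sleep(poll)


def crawl_comments(url, css_selector):
    """Open url in Chrome, wait for matching elements, and return their texts."""
    # Imported here so the helper above can be read and tried without Selenium.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are installed
    try:
        driver.get(url)
        return wait_for_texts(driver, css_selector)
    finally:
        driver.quit()  # always close the browser, even on errors
```

A call might look like crawl_comments("https://item.jd.com/13652780.html", ".comment-text"), where ".comment-text" is only a placeholder. The real selector has to be read from the page in the browser's developer tools, and the site may also require a login or block automated browsers.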

3. Continue with more tests

The tests above only scratch the surface, but ChatGPT has already won me over.

I plan to spend more time testing how ChatGPT handles various kinds of crawlers and how well it fixes bugs.

Judging by code writing alone, ChatGPT already seems comparable to an intermediate or senior programmer, and the breadth of its knowledge far exceeds that of even the strongest human programmers.

ChatGPT can generate the content people want through dialogue. This is a huge breakthrough in AI, and it is hard to imagine how widely it will be applied in the future.

Origin blog.csdn.net/m0_59596937/article/details/129052428