Extracting Text from HTML Web Pages with ChatGPT

Web Automation Tools

ChatGPT understands programming and markup languages well, so can it be used to process web pages automatically? The answer is yes. Given the HTML source of a page, ChatGPT can identify the text inside the page elements and extract the useful information.

For example, we give ChatGPT a relatively complex, deeply nested piece of HTML, as shown in the following figure:

The part marked with the red box in the picture is the text that needs to be extracted. We ask ChatGPT to extract the text and check whether the result matches our expectation, that is, whether only the part in the red box is extracted. The result is shown below:

We see that ChatGPT successfully extracted the text in the red box, and it did not extract the default display text (the alt attribute) of the image tag in this HTML fragment:

<img data-v-ae3ef2f2="" data-v-28d01aa9="" src="https://static001.infoq.cn/resource/image/c1/ab/c1a96a0372f54a63493051b05b3d5aab.png" alt="Image default text: Musk's open source Twitter algorithm! The recommendation mechanism is officially released, and the number of GitHub stars has exceeded 10,000" class="article-image">

The recognition success rate for the text part is good, and we can carry out further processing based on the extracted text.
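For comparison, the same distinction between visible text and attribute values can be reproduced with a conventional parser. The following is a minimal sketch using Python's standard html.parser module. It is not how ChatGPT works internally, and the sample markup and the names VisibleTextExtractor and extract_visible_text are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes only; attribute values such as alt are never read."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # handle_data receives the text between tags, not tag attributes.
        # Note: it also receives <script>/<style> content if the page has any.
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_visible_text(html):
    """Return the non-empty text nodes of an HTML string, in document order."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.chunks


# Hypothetical sample markup, not the page used in the article.
sample = (
    '<div class="item"><img src="cover.png" alt="Image default text">'
    '<a href="/news/1">Article title</a><p>Short summary.</p></div>'
)
print(extract_visible_text(sample))
```

The alt text of the image never appears in the result because the sketch only listens for text nodes, which matches the behaviour observed from ChatGPT above.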

ChatGPT's understanding of HTML goes beyond text extraction: it can recognize the structure of the entire document. Building on this ability, we can ask ChatGPT for more interesting functions. For example, suppose we want to control elements in a web page with text commands, performing operations such as click, input, and scroll. We can have ChatGPT analyze each command and, following a preset list of operations and an output template, generate the corresponding instruction fragment. Because the fragments share a unified format, we can then parse and process them in our own code. To do this, we first give ChatGPT a prompt that tells it what to do and which template to follow in its output:

You are a browser page automation assistant.

Actions you can use include:
openLink (element href attribute)
click(elementId)

You will receive a task to execute and a DOM string. You need to choose the most appropriate Action, and you can retry failed operations at most once.
Here's an example of how you respond when you receive a task:
<Thought>I should click the Add to Cart button</Thought>
<Action>click(223)</Action>
You must always include <Thought> and <Action> opening/closing tags, otherwise your response will be marked as invalid.
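The last rule of the prompt, that a response without both tag pairs is invalid, is something the receiving program has to enforce itself. A minimal sketch of such a check is given below, assuming the reply arrives as a plain string. The function name split_response and the sample reply are hypothetical and not part of the original prompt.

```python
import re

# Matches one <Thought>...</Thought> followed by one <Action>...</Action>.
RESPONSE_PATTERN = re.compile(
    r"<Thought>(?P<thought>.*?)</Thought>\s*<Action>(?P<action>.*?)</Action>",
    re.DOTALL,
)


def split_response(reply):
    """Return (thought, action), or None if either tag pair is missing."""
    match = RESPONSE_PATTERN.search(reply)
    if match is None:
        return None  # treat the reply as invalid, as the prompt states
    return match.group("thought").strip(), match.group("action").strip()


# Hypothetical reply in the format the prompt asks for.
reply = "<Thought>The Add to Cart button has id 223</Thought>\n<Action>click(223)</Action>"
print(split_response(reply))
```

A caller could use a None result to trigger the single retry that the prompt allows.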

After ChatGPT receives this prompt, it keeps it as the context for the conversation and replies with its own understanding of the prompt. We can check whether that understanding is correct, as shown in the figure below:

ChatGPT has understood the prompt we set for it. Next, we test whether it can execute our instructions correctly. We enter the following in the input box:

The user initiates the following tasks:
Please open the article link

Here is the page content:
<div data-v-7ce5c5d7="" class="list">
<div data-v-28d01aa9="" data-v-7ce5c5d7="" article-item="" class="article-item image-position-right">
    <div data-v-28d01aa9="" item-main="" class="item-main">
    <div data-v-28d01aa9="" data-icon="" data-video="" class="image"><img data-v-ae3ef2f2="" data-v-28d01aa9=""
...

Next, let's see how ChatGPT responds to this task, as shown in the figure below:

We see that ChatGPT correctly identified the link tag in the HTML:

<a data-v-65bacb95="" data-v-28d01aa9="" com-article-title="" href="https://www.infoq.cn/news/3OOPEivwhT0gLcKP0Nwl" target="_blank" rel=""  class="com-article-title">

It passed the href attribute of the link tag as the parameter to the openLink() function and produced its output according to the template we set. Because the instruction fragments come in a unified format, we can parse them and then process them as required.
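Parsing such an instruction fragment could look roughly like the sketch below. It assumes the content of the Action tag has already been isolated as a string, and that the argument may or may not be wrapped in quotes, since the article does not show the exact output. The names parse_action, run_action, and handlers are hypothetical, and the handlers only return strings here instead of driving a real browser.

```python
import re

# Only the two actions listed in the prompt are accepted.
ACTION_PATTERN = re.compile(r"^(?P<name>openLink|click)\((?P<arg>.*)\)$")


def parse_action(action_text):
    """Split e.g. 'click(223)' into ('click', '223')."""
    match = ACTION_PATTERN.match(action_text.strip())
    if match is None:
        raise ValueError(f"Unsupported action: {action_text!r}")
    # Assumption: the model may quote the argument, so strip quotes if present.
    arg = match.group("arg").strip().strip("\"'")
    return match.group("name"), arg


def run_action(action_text, handlers):
    """Look up the handler for the parsed action and call it with the argument."""
    name, arg = parse_action(action_text)
    return handlers[name](arg)


# Placeholder handlers; a real tool would call a browser automation library here.
handlers = {
    "openLink": lambda href: f"open {href}",
    "click": lambda element_id: f"click element {element_id}",
}

print(run_action("openLink(https://www.infoq.cn/news/3OOPEivwhT0gLcKP0Nwl)", handlers))
```

Restricting the pattern to the known action names means that anything else the model outputs is rejected rather than executed.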

We can foresee that this kind of web page automation built on ChatGPT will find more applications in the future. It can make web-based tools more intelligent and convenient, and its application scenarios will keep widening.

Origin blog.csdn.net/shiyunzhe2021/article/details/130426844