ChatGPT Principles and Front-End Field Practice | JD Cloud Technical Team

1. Introduction to ChatGPT

The popularity of ChatGPT

As a web application, ChatGPT accumulated 100 million monthly active users within 3 months of its release in December 2022. Before that, the fastest record holder took 9 months to reach 100 million MAUs.

ChatGPT's anti-crawling measures

ChatGPT (https://chat.openai.com) is currently inaccessible from mainland China for various policy reasons. And because it is so popular, a large number of users experience ChatGPT through proxies, crawlers, and other means.

OpenAI is not a company specializing in network services, so it delegates anti-crawling protection to a third party, Cloudflare.

Cloudflare is currently the world's largest CDN provider, with roughly a 16% market share, and OpenAI's traffic reportedly ranks among the top two across all of Cloudflare.

The typing effect of ChatGPT

As you can see, ChatGPT streams its output word by word, a typing effect built on SSE (Server-Sent Events), a server push technology. Below is a Chrome DevTools Network screenshot of an SSE response:

Compared with the more familiar WebSocket, SSE is a one-way (server-to-client) push channel that runs over plain HTTP and reconnects automatically, whereas WebSocket is a bidirectional protocol with its own handshake. For a chat reply that only streams from server to client, SSE is the lighter fit.
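A minimal sketch of consuming an SSE stream in the browser, assuming a hypothetical /chat endpoint that emits OpenAI-style data: chunks:

// EventSource is the browser's built-in SSE client.
const source = new EventSource("/chat?q=hello");
source.onmessage = (event: MessageEvent) => {
  if (event.data === "[DONE]") {      // OpenAI-style streams end with a [DONE] sentinel
    source.close();
    return;
  }
  document.body.append(event.data);   // render each token as it arrives: the typing effect
};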

So, is this typing effect deliberate or accidental?

2. The core principle of ChatGPT

We can break ChatGPT down into four parts: Chat, G(enerative), P(retrained), and T(ransformer). Before going further, let's introduce a few easy-to-grasp machine learning concepts:

1. Model: a model is essentially a program (a function), like y = ax + bx², where a and b are parameters. For example, GPT-3 has 175B parameters, i.e. it is a program with 175 billion parameters; ChatGLM-6B has 6 billion.

2. Machine learning: in the functions we usually write, both the logic and the parameters are decided by humans, whereas in machine learning the machine determines the parameters through training. Finding a function with specific parameters generally takes 3 steps (a toy sketch follows this list):

  • Determine the function set: the space of all candidate parameterizations; the CNN, RNN, Transformer, etc. you see in articles are function sets;
  • Data: use a dataset to evaluate how good or bad each candidate function is;
  • Execution parameters: e.g. how many samples per batch, the maximum number of iterations, and so on. These are called "hyperparameters", as distinct from the parameters inside the function itself (when algorithm engineers self-deprecatingly call themselves "parameter-tuning engineers", these hyperparameters are what they tune).
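A toy illustration of the three steps, assuming nothing beyond plain TypeScript:

// Function set: y = a*x + b*x^2, where a and b are the learnable parameters.
function predict(a: number, b: number, x: number): number {
  return a * x + b * x * x;
}

// Data: a small dataset generated from the "true" function y = 2x + 3x^2.
const data = [1, 2, 3, 4].map((x) => ({ x, y: 2 * x + 3 * x * x }));

// Hyperparameters: chosen by humans, not learned by the machine.
const learningRate = 0.001;
const epochs = 5000;

// Training: the machine adjusts a and b by gradient descent.
let a = 0, b = 0;
for (let i = 0; i < epochs; i++) {
  for (const { x, y } of data) {
    const error = predict(a, b, x) - y;  // how bad are the current parameters?
    a -= learningRate * error * x;       // nudge a against the gradient
    b -= learningRate * error * x * x;   // nudge b against the gradient
  }
}
console.log(a.toFixed(2), b.toFixed(2)); // should approach 2 and 3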

Generative: word solitaire

ChatGPT is essentially a generative function that is executed recursively. Let's look at two examples:

Case 1: Radish and greens

When you see the four characters for "radish and greens" (萝卜青菜), what comes to mind?

Most likely the rest of the idiom: "each has their own love" (各有所爱, i.e. to each their own).

Given these four characters plus a comma, GPT predicts that the most probable next character is "each" (各).

GPT then feeds "radish and greens, each" back into itself and predicts the next most probable character, and so on.

This is why ChatGPT outputs text word by word: the form matches the way an LLM operates at the lowest level. It is also better user experience, since the user sees the first word sooner and the interaction feels like chatting rather than reading. So yes, it is deliberate. Here we reach the first conclusion:

The operating principle of ChatGPT (the model/function) is: take all the text so far as input (including what it has already returned) and predict the next 1 word.
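A minimal sketch of this recursive loop, with a hypothetical nextTokenDistribution() standing in for the real model's forward pass:

type Distribution = { token: string; probability: number }[];

// Stand-in for the real model: given all text so far, return P(next token).
declare function nextTokenDistribution(text: string): Distribution;

function generate(prompt: string, maxTokens: number): string {
  let text = prompt;
  for (let i = 0; i < maxTokens; i++) {
    const dist = nextTokenDistribution(text);                             // P(next | everything so far)
    const next = dist.sort((a, b) => b.probability - a.probability)[0];   // greedy pick of the likeliest token
    if (next.token === "<eos>") break;                                    // the model decided the answer is done
    text += next.token;                                                   // feed its own output back in
  }
  return text;
}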

Case 2: Nerd

Take an example from the article "A Ten-Year Review of Front-End Technology":

In this example, why is the output "bullied"?

From the full text, the subject here should be "front-end technology". If we considered only "front-end technology" and "being in elementary school", we would more likely predict words like "promoted", "popularized", "educated", or "praised"; "bullied" is hard to arrive at.

"Bullied" appears here largely because of words like "bullying" and "nerd" just before it: the influence of those nearby keywords far outweighs that of the distant "front-end technology". So we get the second conclusion:

In a generative language model, the farther away a preceding word is, the less influence it has on the generated result.

Word solitaire vs. cloze

As an aside, let me bring in BERT, a close relative of GPT; both are based on the Transformer architecture covered below. The difference is one of first principles: GPT plays word solitaire (predicting the next token from everything before it), while BERT plays cloze (filling in masked tokens using context on both sides).

Transformer: the attention mechanism

Case 3: Oasis

In this case, "oasis" is predicted not because of the words immediately before it, but because of "desert" and "camel" three sentences earlier. This is where the famous Transformer comes in: a neural network architecture first proposed by Google in the 2017 paper "Attention Is All You Need". Its core is the self-attention mechanism, which solves the problem of weighting text across long distances.
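For reference, the core formula from the paper, where Q, K, and V are the query/key/value projections of the input tokens and d_k is the key dimension:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V

The softmax produces a weight between every pair of tokens, so a word can attend strongly to a relevant word many sentences away, which is exactly what the "oasis" case needs.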

The author does not come from a machine learning background, so I will not go deeper here; reading the paper itself and the many articles explaining it is recommended.

Pretrained: pre-training

Through the word-solitaire scheme above, a model pre-trained on a huge amount of data acquires general language ability. "Pre-trained" here has two implications:

  • It can perform a variety of general NLP tasks (classification, ranking, summarization, etc.);
  • With a small amount of fine-tuning, it can handle domain-specific language tasks (without training from scratch).

Chat: dialogue (achieved through fine-tuning)

Because pre-training happens without human supervision and its corpus is all-encompassing, the resulting general model does not necessarily reply in chat form. For example, if I say "the weather is terrible today", the model's historical experience amounts to "here are the ways people phrase 'the weather is terrible today'", so it tends to output alternative phrasings of my sentence instead of commiserating with me like a chat partner. Below is the output of OpenAI's GPT-3 base model for "the weather is terrible today":

To make GPT-3 respond like a chat partner, it needs fine-tuning, for example on a corpus with a specific question-and-answer structure:
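As an illustration, OpenAI's GPT-3 fine-tuning API accepts JSONL training data in a prompt/completion structure; a minimal, made-up sample:

{"prompt": "User: The weather is terrible today.\nAssistant:", "completion": " Yeah, it looks pretty gloomy out there. Take an umbrella if you're heading out."}
{"prompt": "User: Any plans for the weekend?\nAssistant:", "completion": " I don't have plans of my own, but I'd love to help you make some. What do you enjoy?"}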

Once it can chat, a model headed for production must also be shackled: it cannot output content that conflicts with human values, or the iron fist of capitalism will come down on it.

OpenAI improves the quality of ChatGPT's answers and corrects its value tendencies through human labeling and reinforcement learning (RLHF). If you want to dig deeper, look up articles on the algorithm behind ChatGPT.

3. Applications of ChatGPT

OpenAI officially lists 49 common ChatGPT application scenarios:
https://openai.com/blog/chatgpt

In general, it can be divided into:

  • Copywriting
  • Summarization
  • Code writing
  • Language polishing / cross-language translation
  • Role play

What front-end developers care about most is its coding ability. I recently tried out ChatGPT's capabilities in a Mini Program to Taro refactoring project:

1. It understands Mini Program template syntax and converts it into a TypeScript Taro component

2. It understands the Mini Program's page logic and fixes up the props

In a Mini Program, the page logic (page.js) is separate from index.wxml. After getting the Taro component generated from the pure wxml, I had ChatGPT merge the page.js code into it.

3. It can absorb new knowledge: you can teach it unfamiliar syntax by analogy. A small illustration of the kind of conversion involved follows.
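A hypothetical snippet (not the actual project code) of the wxml-to-Taro conversion:

<!-- Mini Program wxml: list rendering with a tap handler -->
<view wx:for="{{list}}" wx:key="id" bindtap="onTap">{{item.name}}</view>

// Equivalent Taro component in TypeScript
import { View } from "@tarojs/components";

interface Item { id: string; name: string }

export function List({ list, onTap }: { list: Item[]; onTap: () => void }) {
  return (
    <>
      {list.map((item) => (
        <View key={item.id} onClick={onTap}>{item.name}</View>
      ))}
    </>
  );
}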

Integrating ChatGPT into HiBox

How can such a useful capability be consolidated into our tooling?

Our first thought was a VSCode extension. HiBox already has login, custom Webview, and remote configuration capabilities, so we integrated ChatGPT into HiBox (very cool): the Node side connects to the ChatGPT API, the Webview front end implements a chat window, and commonly used prompts are distributed through the configuration system, so front-end developers can use ChatGPT conveniently from within VSCode. The overall architecture is as follows:
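A minimal sketch of the Node-side call, assuming the standard OpenAI chat completions HTTP API (the actual HiBox implementation may differ):

// Requires Node 18+ for built-in fetch.
type Message = { role: "system" | "user" | "assistant"; content: string };

export async function chat(messages: Message[]): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: "gpt-3.5-turbo", messages }),
  });
  const data = await res.json();
  return data.choices[0].message.content; // the assistant's reply
}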

In terms of data source, we also gradually switched from the crawler-based ChatGPT version to an API proxy service. The proxy connects to the GPT-3.5 model capability, and the overall experience is very close to ChatGPT. Proxy service documentation:
https://joyspace.jd.com/pages/yLnDY3B5UJ1rXP8UYrN6

HiBox's ChatGPT feature is currently free to use after ERP login. For more usage and installation instructions, see the HiBox Quick Start.

Private Domain Data Integration

In the process of using ChatGPT, we also noticed 2 problems:

  • The company's sensitive code and information must not be sent to ChatGPT
  • ChatGPT has never learned certain non-sensitive domain-specific knowledge, such as water drop templates

Our first thought was to bake the private-domain data into a large language model (LLM) via fine-tuning and then deploy it privately on company servers, so that any code or document could safely be sent to it. We tried the following 2 approaches:

GPT-3 fine-tuning

The first was the GPT-3 fine-tuning API described on OpenAI's website: you upload the private-domain data to OpenAI, OpenAI runs the fine-tuning on their servers, and the resulting model is also deployed on OpenAI's servers. The whole process is a black box.

ChatGLM-6B fine-tuning

The second was to take Tsinghua's open-source ChatGLM-6B as the base model, apply for a GPU machine on the company's Jiushu platform, fine-tune on the private-domain data with LoRA to obtain LoRA weights, and deploy the result ourselves. The whole process is fully private.

GPT-3.5 + LangChain

In practice, the inference quality of both approaches after deployment falls well short of the GPT-3.5 API, so we finally tried the embedding-based plug-in knowledge base approach, using the open-source LangChain to handle document splitting, vectorized storage, vector matching, and so on. Note that with this approach the data is still exposed to OpenAI.
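The core retrieval pattern LangChain implements can be sketched directly against OpenAI's embeddings endpoint (an in-memory toy; a real setup would use LangChain's splitters and a vector database, and chat() is the helper sketched earlier):

async function embed(text: string): Promise<number[]> {
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: "text-embedding-ada-002", input: text }),
  });
  return (await res.json()).data[0].embedding;
}

// Cosine similarity: how close two pieces of text are in meaning.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Query time: embed the question, pick the top-3 most similar chunks,
// and prepend them so the LLM answers from the private-domain data.
async function ask(question: string, chunks: { text: string; vector: number[] }[]) {
  const q = await embed(question);
  const context = chunks
    .sort((a, b) => cosine(b.vector, q) - cosine(a.vector, q))
    .slice(0, 3)
    .map((c) => c.text)
    .join("\n");
  return chat([{ role: "user", content: `Answer using this context:\n${context}\n\nQuestion: ${question}` }]);
}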

4. Current status and prospects of LLM

LLM explosion

In fact, after GPT-3 came out in 2020, most of the leading machine learning players realized the feasibility of this route and actively followed up:

A special note on Baidu. According to public documents, Baidu launched ERNIE as early as 2019 (the name answers Google's BERT: Ernie and Bert are a pair of pals in Sesame Street), and it was indeed the earliest player in China to bet on large language models. Baidu took the same route as Google, i.e. BERT's cloze route: at the 2018-2019 time point, GPT had only just appeared and the first-generation GPT was inferior to BERT in every respect, and Baidu, like Google, had deep accumulation in search engines, so BERT's route was the natural choice.

Recently, the "alpaca" family (LLaMA and derivatives such as Alpaca and Vicuna) and domestic large language models have also exploded:

LLM Application Status & Trends

Platform

The role-playing ability of LLMs may be the key to the next transformation in human-computer interaction. OpenAI has also launched its Plugin model: through plugins, users can book a flight or find articles they want to read purely via natural-language chat. Some say this is an "iPhone moment" comparable to the launch of the App Store:

Autonomous execution and capability integration

In the style of Auto-GPT and LangChain, you can get ChatGPT to return text that triggers a specific command by agreeing on a template up front. For example, agree with ChatGPT that when it needs a search it should reply [search: query]; the client then matches the reply with the regular expression /\[search:(.*?)\]/, runs the search with the extracted query, and feeds the result back to ChatGPT to compose the final answer.

A dummy example:

1. user: What will the weather in Shenzhen be like tomorrow?
2. chatgpt (hits its 2021 knowledge cutoff and returns the agreed search format): [search: Shenzhen weather on April 27, 2023]
3. The client's regex match triggers a search; a headless browser queries Baidu and takes the first result: Thursday, April 27, 2023, Shenzhen weather: cloudy, north wind at force 1-2 (direction 0°, speed 3 km/h), temperature 22°C-27°C, pressure 1006 hPa, rainfall 0.0 mm, relative humidity 84%, visibility 25 km, UV index 4, sunshine...
4. user (sends the search result together with the original question to ChatGPT a second time): What will the weather in Shenzhen be like tomorrow? Reference data: Thursday, April 27, 2023, Shenzhen weather: cloudy, north wind at force 1-2 (direction 0°, speed 3 km/h), temperature 22°C-27°C, pressure 1006 hPa, rainfall 0.0 mm, relative humidity 84%, visibility 25 km, UV index 4, sunshine...
5. chatgpt (answers in natural language based on the question and context): The weather in Shenzhen tomorrow looks decent: mostly cloudy, 22°C-27°C.
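That loop, sketched in code (chat() is the hypothetical helper from earlier; search() stands in for the headless-browser search):

async function askWithSearch(question: string): Promise<string> {
  const first = await chat([{ role: "user", content: question }]);
  const match = first.match(/\[search:(.*?)\]/);  // did the model ask us to run a search?
  if (!match) return first;                       // no tool call needed, answer directly
  const result = await search(match[1].trim());   // run the agreed-upon command
  return chat([                                   // second round: question plus retrieved context
    { role: "user", content: `${question}\nReference data: ${result}` },
  ]);
}

declare function search(query: string): Promise<string>; // e.g. headless browser + Baidu, first result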

Multimodal

GPT-4, released in March 2023, already has image-recognition ability. The Case below shows the presenter generating a front-end page consistent with a design sketch, a classic "the front end is dead" moment:

Limitations of LLMs

Although ChatGPT's technology is impressive, we must also view its limitations soberly: it is essentially an empirical imitation of human text output, fitted from historical data.

For example, ChatGPT simply cannot do 4-digit multiplication. Given operands like 4xx6 × 3xx7, it can infer from the key local cues that the answer ends in 2 (because 6 × 7 = 42) and starts with 1 (because 4 × 3 = 12), but the digits in between are essentially random and never converge to the correct product. The situation is the same for ChatGPT and GPT-4:
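A concrete instance of that pattern (my own made-up operands):

console.log(4276 * 3587); // 15338012: starts with 1 (4 × 3 = 12...), ends in 2 (6 × 7 = ...42)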

For another example, if you ask it about niche professional knowledge that ordinary people routinely get wrong, it will likewise output wrong answers learned from the majority's wrong answers:

For example, the article "A Comprehensive Reading of the V8 Promise Source Code: You Actually Know Nothing About Promise" contains a famously tricky question: what will the following code print?

Promise.resolve().then(() => {
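    // Returning a promise from then() costs two extra microtask ticks,
    // which is why the 4 below is printed after 3 rather than right after 0.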
    console.log(0);
    return Promise.resolve(4)
}).then(res => {
    console.log(res);
})

Promise.resolve().then(() => {
    console.log(1);
}).then(() => {
    console.log(2);
}).then(() => {
    console.log(3);
}).then(() => {
    console.log(5);
}).then(() => {
    console.log(6);
})

Most people will answer: 0, 1, 4, 2, 3, 5, 6
GPT-3.5's answer: 0, 1, 4, 2, 3, 5, 6
GPT-4's answer: 0, 1, 2, 3, 4, 5, 6

Only GPT-4's answer is correct, but even though the answer is right, its step-by-step analysis is wrong. It has likely memorized the correct answer from a similar example somewhere without actually "understanding" it, and its subsequent analysis repeats the same mistakes most people make.

Conclusion

Finally, let me close with Zhou Zhezhi's lines from The Wandering Earth 2:

Facing the arrival of AI, we should not overestimate it strategically: AI has its own limitations, so stay optimistic, the front end will not die that easily. Tactically, though, take it seriously, follow its development, and try to apply it in work and life, because the tide of technological change will not bend to any individual's will.

Writing through the night is not easy. If you have read this far, please give it a like. Thanks♪(・ω・)ノ

Author: Jingdong Retail Chen Longde

Content source: JD Cloud developer community
