Don't always think about training large models yourself; your business may not need one

At least that's the case for enterprise knowledge bases.

"I want to train a large model", "I want a private deployment", "I want to fine-tune a model", "I want a vertical GPT". I have heard statements like these a lot recently, and they remind me of a video I once saw:

I want to fly a plane when I grow up, I want to be an astronaut, I want to have a big house hahahahaha

The video was titled "Remember your childhood dreams?". After the cute, happy calf and lamb, the camera cut to steak and skewers.

Yes, we have all had dreams. Ideals that came true used to be called dreams realized, but many people's dreams are really just beautiful imaginings.


A few days ago, Yu You posted an article quoting a talk by a Tsinghua professor:

Some teams claim that very small models, 7B for example, have reached GPT-3.5 level. Judging from what Zhipu has announced so far, such bragging may actually be possible.

This remark resonated with several people in the group, and it points to yet another hurdle in privately deploying large models, after buying compute and deploying the model: pre-training.

This article will not discuss pre-training. Instead I will return to the essence of the business and, combining the current development of large models with enterprise knowledge bases, share some of our views on the technology and the industry.

In short, we believe that large models, like any other technology, should serve the business: if a model is not necessary, don't use one.

Whether you are still hesitating over privately deploying a large model or are already experimenting, I hope this article offers some inspiration.

If you don't have time to read the full text, the key points are:

  1. Large models have made breakthroughs, but challenges remain. Model miniaturization keeps advancing, yet a very small model reaching GPT-3.5 level is not realistic in the short term. Meanwhile, hallucination and emergent intelligence are two sides of the same coin, a problem that is hard to solve in principle.

  2. Embedding first is the right way for enterprises to use large models. In this period of rapid model development, the most important thing is to polish your own business. Only once the business is proven viable is it worth fine-tuning or even privately deploying a large model. This is especially true for AI-native intelligent applications.

  3. In the era of large models, a new paradigm of enterprise knowledge services has emerged. Knowledge provision is shifting from web browsing to serving people through chat and serving AI through knowledge APIs, and may further develop into a knowledge exchange market built on a federated architecture. This may well be the focus of future intelligent transformation.

1. Current status and challenges of large model capabilities

The wave of strong AI brought by large models has already passed a variety of human exams [1].

GPT-4 Exam Results

By now hardly anyone doubts whether it can pass the Turing test. This also means that in many scenarios, AI services delivered through chat already match or even surpass humans.

But we must recognize that many of these cutting-edge capabilities exist only in ChatGPT; most other large-model AI is still catching up.

Even at Google, which invented the Transformer architecture, Bard still lags far behind ChatGPT.

For Chinese large models, see the benchmark rankings SuperCLUE released in June; the gaps between models are plain to see.

SuperCLUE Chinese large-model benchmark rankings, June

A reminder: a difference of a few points on a benchmark may in practice be the difference between usable and unusable.

Capability emergence requires large parameter scales

Research on emergent abilities [2] has long suggested that a large model's capabilities only emerge once its parameter scale is large enough. Although the mechanism is not fully understood, studies show that the instruction-following ability used by the intelligent customer service we described before [3] begins to emerge at roughly 68B parameters (about 10^23 FLOPs of training compute).

The Scale at which Large Model Capabilities Emerge

How to achieve capability emergence at smaller scales is a focus of current research. We can expect new breakthroughs, but they will take time.

What matters is that the high cost and high training demands of privately deploying a usable large model are beyond ordinary enterprises. This is one reason some voices have recently begun to lament that large-model applications are hard to land.

Whether a dumb large model is a large model or just dumb, that is the question.

But stepping back: even given the capability, we still need to ask whether it is the most suitable solution.

In our simple observation, if a problem has a more economical, lower-cost solution, that solution is more likely to win the competition, regardless of how much a company is willing to invest.

Currently, among large-model applications, Embedding plus a foundation model is in many cases a cheaper and better-performing solution than privately deploying a large model.

Large models make things up and cannot explain themselves

Of course, large models have more problems than that.

The more we understand how large models work, the more hallucination looks like a problem that cannot be solved in principle. Gradually we have even begun to accept the nonsense, because humans make mistakes too, and humans also lie.

Yes, it merely makes the mistake every large model makes.

We discovered this when we first implemented intelligent customer service [4]. At the time we used special prompt instructions to constrain ChatGPT, which mitigates the problem to a degree, but this approach demands strong instruction-following ability from the model.

In practical applications, however, product design can raise our tolerance for hallucination. One way is to choose professional scenarios: let the AI act as an assistant that advises and supports professionals, or let professionals act as content verifiers who check everything before final output.

Human-machine collaboration in intelligent customer service is one example: customer service staff inspect and supplement the AI's replies. In-house AI assistants are another: employees know the business and products, so they naturally scrutinize AI-generated content.

This leads to another problem with large models: interpretability.

Interpretability was already a hard problem for AI models, and it is harder still for large language models. A large model can answer questions, fill in blanks, and support decisions, but it cannot tell you why it answered that way or where the knowledge came from, just as we know the architecture and algorithms yet do not understand why capabilities emerge.

But once information must be screened, explaining the AI's answers becomes necessary. For an enterprise knowledge base in particular, we need to trace knowledge back to its source in order to learn more or perform updates.

The large model alone cannot solve these problems for now, but an enterprise knowledge base solution that combines large models with knowledge tracing can, as we describe in detail below.

To sum up: large-model technology has made genuine breakthroughs, but many challenges remain, which means applications must avoid or solve these problems in a targeted way. That in turn requires not clinging to the large model itself, but flexibly using external infrastructure such as vector databases to complete the overall solution.

2. New paradigm of enterprise knowledge base management

We keep talking about large models and GPT, but two other things must not be ignored.

One is external service in API form, the same shift that cloud services brought to the industry in past years [5], except this time it is AI's turn, as we noted before [6].

The other, which is less obvious, is service interaction in the form of chat.

ChatGPT = Chat + GPT

In the past, humans interacted with computers by having professional programmers learn the computer's languages. Now computers have learned human language, and interaction can happen in natural language. This will inevitably reshape the form of applications; intelligent customer service is one example, and the enterprise knowledge base is another.

When interaction changes come up, some people think of the impact of WeChat voice messages on text-based social networking, but this change may be far bigger. The real comparison is the GUI replacing command-line CLI interaction, a product change on the scale of the Windows operating system.

Serving users through chat, the conversational user interface (CUI), used to be called Conversational UI, but I increasingly feel ChatUI is the better name.

These two things are why we build Chat AI Cloud, and they are the basis of the new enterprise knowledge base paradigm. Below we elaborate on three aspects: modes, principles, and product.

Three modes of using large-model services

There are three usage modes, from light to heavy:

  1. Pure prompt mode (PromptOnly): call the large-model API directly with a prompt; the easiest way to get started (see the sketch after this list);

  2. Embedding vector mode (Embedding): preprocess knowledge into a vector database; at question time, retrieve related knowledge by similarity search, assemble it with the question into a prompt, then call the large-model API;

  3. Fine-tuned model mode (Fine-tune): bake the knowledge into the large model through fine-tune training, then call it with a plain prompt at use time.
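For concreteness, mode 1 is nothing more than a single API call. A minimal sketch, assuming the openai 0.x Python SDK; the model name, prompt text, and question are illustrative:

```python
import openai  # openai 0.x Python SDK

openai.api_key = "sk-..."  # your API key

# Mode 1 (PromptOnly): everything the model needs rides along in the prompt itself.
resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "You are a support assistant. Answer from the text below.\n<knowledge pasted here>"},
        {"role": "user", "content": "How do I reset my password?"},
    ],
)
print(resp["choices"][0]["message"]["content"])
```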

When discussing these modes, we often hear two misconceptions:

Misconception 1: context windows are limited now, but Claude has already extended to 100K; all major models will surely follow, and then everything can simply be stuffed into the prompt. The Embedding mode is just a transitional solution.

The biggest problem with this view is that it ignores cost. Take the 100K context as an example: if mode 1 carries the full knowledge text on every call while mode 2 selects 4K of knowledge fragments, then every mode-1 call costs 25 times as much as a mode-2 call.

The problem worsens as contexts grow larger: the bigger a single API call, the slower the large model responds, and sooner or later the latency becomes unbearable for users.

There is an algorithmic reason behind this: the Transformer's attention has O(n^2) complexity, meaning that as the sequence length grows, the compute required for attention grows quadratically.
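For reference, this is the standard scaled dot-product attention from the Transformer paper, where n is the sequence length and d_k the key dimension:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
```

The QK^T product is an n x n matrix, which is exactly where the quadratic cost in context length comes from.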

This leads to the second problem: for a long time to come, the context a large model supports will stay at the MB level at best, while the knowledge base supported by mode 2 easily exceeds the GB level.

Misconception 2: the Embedding mode lacks Chain-of-Thought ability and cannot reason completely over the knowledge when answering.

We covered the large model's problems in the previous section. Even though mode 3 keeps the chain of thought, it offers no solution to those other problems, problems that mode 2 resolves easily.

And we still cannot ignore cost. Our simple view remains: if AI is to do what we imagine and penetrate every corner of our applications, its cost must be low enough. Mode 3 costs nearly two orders of magnitude more than mode 2.

Let's compare using OpenAI's prices.

In Embedding mode, training calls the Ada v2 model at $0.0001 / 1K tokens; usage calls the ChatGPT model at $0.0015 / 1K tokens for questions and $0.002 / 1K tokens for answers.

In Fine-tune mode, training calls the Davinci model at $0.0300 / 1K tokens, and usage also runs on Davinci at $0.1200 / 1K tokens.

In other words, the latter's training cost is 300 times the former's, and its usage cost is nearly 80 times higher.

And this compares only a single training run. Since data in mode 3 cannot be withdrawn, any update triggers a full retraining, and the implied time and resource costs are huge.
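A quick back-of-the-envelope check of those ratios, using just the list prices quoted above:

```python
# Mid-2023 OpenAI list prices, USD per 1K tokens, as quoted above.
EMBED_TRAIN = 0.0001               # Ada v2 embeddings (preprocessing)
CHAT_IN, CHAT_OUT = 0.0015, 0.002  # ChatGPT question / answer
FT_TRAIN = 0.0300                  # Davinci fine-tune training
FT_USE = 0.1200                    # fine-tuned Davinci usage

print(f"training cost ratio: {FT_TRAIN / EMBED_TRAIN:.0f}x")  # -> 300x
print(f"usage cost ratio:    {FT_USE / CHAT_IN:.0f}x")        # -> 80x (vs. question price)
```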

How well can the Embedding mode perform? That depends on how Embedding works.

How the Embedding mode works

The Embedding mode operates in three steps:

  1. Preprocessing

First segment the knowledge document, then call the large-model API on each knowledge segment to obtain its corresponding vector. This vector is what we call the embedding.

Then store the embedding-segment key-value pairs in the vector database, and preprocessing is done. Note that this stage also incurs large-model call costs.

  2. Retrieving related knowledge fragments

When a user asks a question, the knowledge base calls the large-model API to obtain the question's vector, then passes that vector to the vector database, which returns the TopK most similar knowledge fragments via a similarity algorithm.

  3. Assembling the prompt

When the large model is finally invoked, the service combines three parts into the final prompt: the preset prompt, the knowledge fragments retrieved in the previous step, and the user's question.

Yes, you read that right: the final call is still just a prompt, only one dynamically augmented with the relevant knowledge retrieved by vector similarity.
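Putting the three steps together, the whole pipeline fits in a few dozen lines. Below is a minimal sketch, assuming the openai 0.x Python SDK and an in-memory list in place of a real vector database; the model names, the fixed-size chunking, the file name, and the TopK value are illustrative choices, not fixed parts of the scheme:

```python
import numpy as np
import openai  # openai 0.x SDK, as in the sketch above

openai.api_key = "sk-..."

def embed(text: str) -> np.ndarray:
    """Call the embedding API and return the vector for one text."""
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

# Step 1: preprocessing. Segment the document, embed each segment,
# and store (embedding, segment) pairs. A plain list stands in for the vector database.
doc = open("company_handbook.txt", encoding="utf-8").read()  # hypothetical knowledge document
segments = [doc[i:i + 500] for i in range(0, len(doc), 500)]  # naive fixed-size chunking
store = [(embed(seg), seg) for seg in segments]

# Step 2: embed the question and keep the TopK most similar segments (cosine similarity).
def top_k(question: str, k: int = 3) -> list[str]:
    q = embed(question)
    sim = lambda v: float(np.dot(v, q) / (np.linalg.norm(v) * np.linalg.norm(q)))
    return [seg for _, seg in sorted(store, key=lambda p: -sim(p[0]))[:k]]

# Step 3: assemble preset prompt + retrieved knowledge + question, then call the model.
def answer(question: str) -> str:
    knowledge = "\n".join(top_k(question))
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the knowledge below.\n" + knowledge},
            {"role": "user", "content": question},
        ],
    )
    return resp["choices"][0]["message"]["content"]
```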

If you have used prompts, you know how powerful they are. Most of the ChatGPT applications you see today are actually built on prompts. ChatGPT's familiar few-shot learning ability [7] and its reinforcement learning from human feedback (RLHF) [8] both play a role here.

What exactly are Embedding, embedding, and vectors?

When people first encounter this solution, the hardest part is the terminology. Simply put, they are all the same thing: representations of knowledge inside a large model.

The quickest explanation of this concept comes from here [9] :

The manifold assumption in deep learning: natural raw data is a low-dimensional manifold embedded in the high-dimensional space where the data lives.

The task of deep learning is to map high-dimensional raw data (images, sentences) onto low-dimensional manifolds, so that the data becomes separable after the mapping. This mapping is called embedding.

Later, people began calling the representation vector on the low-dimensional manifold an Embedding, which is strictly a misuse.

But the misuse stuck through sheer usage. In OpenAI's documentation today, Embedding is no longer the mapping but the result of the mapping: the vector.

Once you understand this mapping, you can imagine why Embedding is, in principle, effective enough.

All knowledge, whether from pre-training or fine-tuning, ultimately enters the model and is distributed across the model's space. When the model generates a reply and predicts each word, the words with the greatest influence on the result are naturally those closest to it. And if only the nearby words matter, then I only need to bring along the sentences related to the question, because other sentences have little effect on the result.

Which sentences in the knowledge count as near the question is computed from sentence-vector similarity. As for why vector similarity reflects the proximity of sentences, there is dedicated research on that; space does not permit expanding here.
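The most common such measure is cosine similarity between the question vector u and a segment vector v:

```latex
\mathrm{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
```

A score near 1 means the segment points in nearly the same direction as the question in embedding space; retrieval simply keeps the TopK segments by this score.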

Business first, fine-tune later: Embedding first is the right way to use large models

So far we have analyzed mainly from the application perspective, but academia has studied this area for some time. What we called the prompt mode is in fact prompt-tuning. You will often hear that using prompts is itself a kind of tuning, and that is not wrong.

Prompt-tuning only began attracting attention when GPT took off, because its advantages over fine-tuning are obvious: the work adjusts the input rather than modifying the model, so the compute cost is low and the resources and training time required are small.

According to Google's research [10], once the parameter scale exceeds a billion or so, prompt-tuning can match model fine-tuning. That said, gaps remain at smaller scales.

Research from Tsinghua and Zhiyuan [11] took a step in this direction. They introduced prefix-tuning, adding a prefix in front of the prompt and treating that prefix specially inside the model, and finally achieved performance comparable to fine-tuning on smaller models of various scales.

That is to say, on public-cloud models, mode 1 approaches mode 3; on private deployments (which default to smaller scales), mode 1 is slightly inferior to mode 3, but with prefix-tuning research, mode 2, an enhanced mode 1, may well reach mode-3 levels as the research develops.

Of course, modes 2 and 3 are not mutually exclusive. If we split knowledge into public industry knowledge and company-proprietary knowledge, feed the former into a private model via fine-tuning and handle the latter with mode 2, the combination should give the best results, but the complexity rises and the cost is the highest.

To sum up, we believe mode 2 is the first choice: using public large models through Embedding is the right way for enterprises to adopt large models. In this period of rapid model development, the most important thing is to polish your own business. Only once the business is proven viable is fine-tuning, or even private deployment, worth considering.

Give the enterprise an AI veteran who understands the business


Internet slang has a widely circulated phrase: GIYF, Google Is Your Friend. A reminder to ask Google before asking your friends and colleagues.

The phenomenon is telling. In exchanging work skills, compared with talking to a person, any document or website feels blunt and inefficient. The best way for a newcomer to learn is to ask questions and digest the answers. That may look like laziness on the newcomer's part, but it is human nature.

The veterans in an enterprise, though, have the experience and knowledge newcomers need but rarely the patience to coach every detail. More importantly, they have their own work: passing on experience and knowledge is a low priority, and most of the time not even part of their job.

They are here for work, not customer service.

Hence, alongside the slang, most companies have a bad-tempered old expert. Enterprises need them and hope they will pass on business knowledge, but contrary to expectations, they cannot do it well.

Things are a little different now, because we have large models. With an AI that can think, backed by a knowledge base that has become knowledgeable, patiently and meticulously answering internal questions becomes a matter of course.

3. Enterprise knowledge service with large model as the core

What will the enterprise knowledge base look like in the future?

Building around large-model AI is not just about being able to chat. Much more is needed: knowledge traceability and updating, support for multiple document types, permission management, vertical-domain customization, on-premises deployment, and so on.

What does an enterprise knowledge base do?

1. Knowledge traceability

As mentioned earlier, the large model's hallucinations and the interpretability of its replies make knowledge traceability essential. Especially when professionals act as verifiers of large-model AI, the first thing the knowledge base service must do is let them confirm the AI's replies against their sources. If you have used the new Bing search, you will have felt its thinking on this point.

An enterprise knowledge base is, in essence, enterprise knowledge management and retrieval.
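Because mode 2 retrieves explicit fragments before answering, traceability comes almost for free: return the fragments, and their source documents, alongside the answer. A minimal sketch, reusing the hypothetical `top_k` helper and 0.x SDK call from the pipeline sketch above:

```python
def answer_with_sources(question: str) -> dict:
    """Return the AI's answer together with the fragments that grounded it."""
    fragments = top_k(question)  # in practice, also keep each fragment's doc id and offset
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer using only the knowledge below.\n" + "\n".join(fragments)},
            {"role": "user", "content": question},
        ],
    )
    # A human verifier can now check the answer against its sources.
    return {"answer": resp["choices"][0]["message"]["content"], "sources": fragments}
```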

2. Knowledge update

We explained above the Embedding mode's advantages over fine-tuning: updating knowledge in real time without retraining is a huge leap in both cost and method.

3. Multiple document types

Knowledge inside an enterprise is scattered across assorted documents, wikis, and internal websites, so the knowledge base must support enough knowledge sources. We currently support Doc/Docx, PDF, Markdown, TXT, HTML, CSV, and Xls/Xlsx, and can also pull content automatically from a URL.

4. Permission management

Where there is knowledge, there are permissions: different levels of knowledge may be read and accessed by different people. So beyond serving as a shared internal assistant, the knowledge base must add permission management to match the enterprise's access controls.

5. On-premises private deployment

A voice we used to hear: "I must privately deploy the large model, because my data cannot enter the large model, or it will learn my data and leak it."

The worry is not unreasonable, which is why OpenAI revised its API data usage policy on March 1, 2023 [12]. It mainly says two things:

1) Data uploaded via the API is not used to train models unless you explicitly opt in; 2) data uploaded via the API is deleted after 30 days unless the law requires otherwise.

I do not believe OpenAI did this out of kindness; rather, without it, the large-model ecosystem would take a fatal hit, just as cloud computing providers must promise not to touch the data on customers' cloud hosts. Without that promise, many customers would never move to the cloud at all.

We believe this will become a basic industry rule for large-model AI services. So, provided data regulations are satisfied, using domestic public-cloud large-model services is fine in many scenarios.

Everything besides the model itself, namely the chat service and the knowledge base service, can also be deployed easily thanks to our cloud-native design, ensuring all of the user's business data stays effectively managed and controlled.

A reminder: privately deploying this knowledge base does not conflict with privately deploying the large model. If a large model has been built locally, it can naturally be used. We simply recommend considering the two separately, for the reasons explained earlier.

6. Vertical-domain customization

The Embedding principle behind the knowledge base is simple and many people are trying it, yet many of the results are unsatisfactory. In our experience, including with customers, the key is optimizing how knowledge is woven into the prompt. One fixed set of parameters cannot fit all knowledge.

So we added per-document prompt presets to knowledge documents, guiding the AI in how to use each document's knowledge. We also exposed console settings for segment size, the number of fragments selected per question, and fragment overlap. Judging by the results, the improvement is quite visible.
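For illustration only, the knobs just mentioned might be modeled like this; the names and defaults are hypothetical, not our console's actual parameters:

```python
from dataclasses import dataclass

# Hypothetical per-document settings; names and defaults are illustrative.
@dataclass
class KnowledgeDocConfig:
    segment_size: int = 500   # characters per knowledge segment
    overlap: int = 50         # characters shared by adjacent segments
    top_k: int = 3            # fragments retrieved per question
    preset_prompt: str = ""   # document-specific instructions for the AI

def segment(text: str, cfg: KnowledgeDocConfig) -> list[str]:
    """Split text into overlapping segments, so a sentence cut at one boundary
    still appears whole in the neighboring segment."""
    step = cfg.segment_size - cfg.overlap
    return [text[i:i + cfg.segment_size] for i in range(0, len(text), step)]
```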

Space is limited, so this is only a brief introduction. If you are interested, we can cover it in later articles.

Industry knowledge bases and the federated architecture

Even with the technology in place, enterprises still face a data challenge when building knowledge bases. Just as pre-training needs data, some businesses need industry knowledge during knowledge base construction. Where does industry knowledge come from?

Enterprises can accumulate and collect industry knowledge themselves, but we believe an independent industry knowledge base service is also worth building.

Providing industry knowledge is a new category in the next generation of enterprise knowledge services. For example, today's CNKI offers paper search; a next-generation CNKI could offer a paper knowledge base whose knowledge queries are consumed by AI.

This is where the federated architecture of our enterprise knowledge base comes in:

Enterprise Knowledge Base Federation Architecture

The principle is shown in the figure: when the enterprise knowledge base fetches knowledge fragments, it can additionally issue federated query requests to obtain knowledge from other knowledge base services, then combine everything before calling the large model.
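A sketch of what that federated step might look like, assuming each peer knowledge base exposes a hypothetical HTTP knowledge API that returns fragments; the endpoint and response shape are invented for illustration:

```python
import requests

# Hypothetical peer knowledge base services; endpoint is invented for illustration.
PEER_KNOWLEDGE_APIS = [
    "https://industry-kb.example.com/api/query",
]

def federated_fragments(question: str, k: int = 3) -> list[str]:
    """Combine locally retrieved fragments with fragments from peer knowledge bases."""
    fragments = top_k(question, k)  # local retrieval, from the pipeline sketch above
    for url in PEER_KNOWLEDGE_APIS:
        try:
            resp = requests.post(url, json={"question": question, "top_k": k}, timeout=3)
            fragments += resp.json().get("fragments", [])  # invented response shape
        except requests.RequestException:
            pass  # a slow or unreachable peer must not block the local answer
    return fragments
```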

This knowledge base solution, i.e. the Embedding mode described above, is affordable for most enterprises.

So we believe that in the not-too-distant future, knowledge provision can shift from web browsing to serving people through chat and serving AI through knowledge APIs, with the federated architecture promoting industry knowledge exchange and, further, the formation of a knowledge trading market.

This is also very likely to be the direction of the intelligent transformation of enterprises in the future.

Infrastructure in the age of large models: CVL

So don't ignore vector databases.

Vector databases are already on many people's radar. At NVIDIA GTC 2023 in March [13], Jensen Huang said NVIDIA would release its own vector database, RAFT, within the year.

The relationship between the vector database and the large model resembles that between the CPU and storage in a computer system. A large model can hold knowledge, but not unlimited knowledge; industry knowledge, domain knowledge, and business knowledge need to be stored and managed in vector databases.

For applications, we believe another piece of future large-model infrastructure is the chat service: as users' expectations of the chat experience keep rising and cloud services mature, adding chat to a product has already shifted from in-house development to integrating an IM SDK [14].

In the era of large models, enterprises building their own intelligent applications will combine CVL: C for the chat service, V for the vector database, and L for the LLM large-model service.

Imagine what it would be like if every website and app turned into a dialog box.

Lanying product information

The BlueVector enterprise knowledge base is about to be released and has entered invite-only testing. If you would like to try it, you are welcome to add "Xiaolan" to sign up.

If you are interested in enterprise knowledge bases, or have new needs in mind, you are welcome to join the group discussion.

Scan the code to add Xiaolan

This article has been loaded into the Xiaolan article knowledge base; you can ask questions about it via this Lanying Link:

https://lanying.link/00h0vp[15]

About Lanying IM

Lanying IM is a new-generation intelligent chat cloud service, Next-Gen Chat AI Cloud.

By integrating the Lanying IM SDK, enterprises get both Chat and AI capabilities. The AI engine already supports ChatGPT, and Baidu Wenxin Yiyan and Alibaba Tongyi Qianwen are connected as well.

If you want to polish your product in the era of strong AI, keep following Lanying IM; we will keep sharing our latest experience and techniques:

Build a new generation of smart chat apps with the Lanying IM SDK!

References

[1] GPT-4 exam results: https://openai.com/research/gpt-4
[2] Emergent abilities of large language models: https://arxiv.org/abs/2206.07682
[3] ChatGPT's ten guidelines for intelligent customer service: https://docs.lanyingim.com/articles/product-and-technologies/chatgpt-intelligent-customer-service-ten-service-guidelines.html
[4] How to implement intelligent customer service with ChatGPT: https://docs.lanyingim.com/articles/product-and-technologies/how-to-implement-an-intelligent-customer-service-by-chatgpt.html
[5] How cloud services changed the industry: https://docs.lanyingim.com/articles/Industry-development/the-next-decade-of-cloud-services.html
[6] How to add ChatGPT to your app: https://docs.lanyingim.com/articles/product-and-technologies/how-to-add-chatgpt-to-your-app.html
[7] Few-shot learning ability: https://arxiv.org/abs/2005.14165
[8] Reinforcement learning from human feedback (RLHF): https://arxiv.org/abs/2203.02155
[9] How to intuitively understand Embedding: https://www.zhihu.com/question/38002635
[10] The Power of Scale for Parameter-Efficient Prompt Tuning: https://arxiv.org/abs/2104.08691
[11] P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks: https://arxiv.org/abs/2110.07602
[12] OpenAI API data usage policy: https://openai.com/policies/api-data-usage-policies
[13] Jensen Huang's keynote at NVIDIA GTC 2023: https://www.woshipm.com/ai/5848163.html
[14] How we built IM over the past fifteen years: https://docs.lanyingim.com/articles/Industry-development/how-we-build-an-instant-messging-system-in-the-past-fifteen-years.html
[15] Xiaolan AI article reading assistant: https://lanying.link/00h0vp
