Exploring the evolution of application engineering architecture in the AI era: how far off is the one-person company?

Preamble

In the current era of generative AI, understanding and using AI-related technologies is something front-end and back-end engineers will have to face sooner or later.

Every product is worth rebuilding with AI. The fundamental reason is that AI in its current form, the generative model, changes and creates new product forms rather than merely supplementing existing ones the way previous technologies did.

Simply put, product engineers can now do more.

1. Characteristics of contemporary AI

Contemporary AI is on the rise. It has powerful, general-purpose reasoning capabilities that span different fields, and both theory and practice have grown explosively over the past two years. Everyone's understanding of contemporary AI starts from roughly the same line, which is one important reason AI is so attractive.

Meanwhile, a large body of work argues that human consciousness is non-algorithmic. From Gödel's incompleteness theorems to Turing's uncomputability results, there are arguments that artificial intelligence built on Turing machines — including contemporary pre-trained language models — cannot establish a concept of "self".

Contemporary AI therefore still rests on Turing's theoretical framework and still solves Turing-computable problems, so it still needs a good, sustainable application architecture to constrain, guide, and manage it.

2. Challenges to R&D

Back in reality, the existing experience and knowledge of front-end and back-end engineers cannot cross this threshold in a short time. Moreover, fields such as large-model algorithms, training and inference acceleration, and heterogeneous computing are not where front-end and back-end engineers have an advantage.

However, the recent flood of practical AIGC articles shows that many practitioners are not algorithm specialists, which suggests that front-end and back-end engineers can do this too. In other words, the threshold for building applications on top of existing large models can still be crossed.

3. AI application engineering

What is currently called AI-oriented development is the process of continuously feeding prompts to a large model, letting it reason under the control of a context, and obtaining the results we expect.

Beyond the stability of the large model itself, the biggest factor in the efficiency of the reasoning process and the quality of the results is our practical skill, that is, the technique of asking AI questions and guiding it.

Imagine a person in front of us instead of an AI: how should we establish a context through dialogue and generate guidance so that the other party satisfies our needs, even when the request is unreasonable? The book "The Art of Deception" proposed the concept of social engineering for exactly this kind of scene.

The AI counterpart is today's popular prompt engineering. Someone famously had ChatGPT role-play a grandmother telling her grandson stories about Windows activation codes, and obtained usable MAK keys. This kind of prompt engineering, so similar to social engineering, completely subverts traditional programming common sense in how AI solves a need.

4. Differentiation of AI scenarios

Unlike the general notion of AIGC content generation, AI takes on different characteristics in different scenarios. The following are three typical intelligent scenarios:

4.1 Knowledge-intensive

Unlike traditional knowledge scenarios, in the AI era we also have scenarios such as knowledge summarization, extraction, classification, and content processing and transformation [12].

For example, transforming knowledge structures into diagrams (mind maps, flowcharts, architecture diagrams, etc.), or supplementing detailed content (adding examples, notes), etc.

4.2 Interaction-intensive

Examples include role-playing, social assistance, scene consulting, decision support, and coordination across office documents — human-computer interaction scenarios in which large models play different assistive roles [15].

4.3 Text/code type

Beyond generating large amounts of unstructured text, there are also professional coding scenarios — code generation, code testing, code translation, and code review — in low-code, no-code, and hybrid development settings [15].

It is clear that the intelligent scenarios we face are complex, and it is hard to solve such ever-changing demands through human pre-planning alone, because these scenarios have too many degrees of freedom. Compared with the dozen or so keywords of a typical programming language, human thinking is free and unconstrained. Under these conditions, using colloquial prompts alone to solve complex problems in AI applications is almost uncontrollable.

Therefore, how to engineer AI applications to be controllable — solving today's problems of large-model hallucination and drift — is well worth deliberation and is the core issue discussed here. We need to introduce new theory to guide new architectures that solve these problems.

5. Reasoning ability

The following sections illustrate typical large-model algorithms and practical architectures in the industry:

5.1 Basic reasoning

The core capability we use from large models is reasoning. The following introduces several well-known AI reasoning approaches in the industry.

5.1.1 Standard IO

When there is no reasoning process, we ask the large model and it answers directly. This zero-process form is called Standard IO. In most cases Standard IO cannot solve our problem in one step; we have to evaluate the answer and guide the model further ourselves. It is almost unusable for complex tasks, so it generally serves as a comparison baseline in optimization experiments.

5.1.2 Chain of Thought (CoT)

In 2022, the publication of the famous Chain-of-Thought (CoT) paper [11] played a key role in AI's handling of complex tasks: split a complex task into multiple manageable simple subtasks and let the large model think step by step, so that the prompt and reasoning for each small task are controllable.

It can be understood simply as: "For a problem, do not ask the large model for the result directly; let it reason step by step, generating intermediate inferences, and give the result at the end." This technique often achieves very good results in zero-shot/few-shot settings. CoT is already an essential paradigm in AI application engineering, much like the procedural programming model, and we will keep using it.
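As a minimal sketch of the idea — the `call_llm` function below is a hypothetical stand-in for any chat-completion API, not a real client:

```python
def build_cot_prompt(question: str) -> str:
    """Wrap a question in a zero-shot Chain-of-Thought prompt."""
    return (
        f"Question: {question}\n"
        "Let's think step by step, and state the final answer "
        "on the last line as 'Answer: <result>'."
    )

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call (e.g. a chat-completion
    # request); replace with your provider's client.
    raise NotImplementedError

prompt = build_cot_prompt("If a train covers 120 km in 2 hours, what is its average speed?")
print(prompt)
```

The only change versus Standard IO is the instruction to reason before answering; the final line format makes the answer easy to parse programmatically.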

5.2 Chains Architecture

Here we have to mention Chains [24], a module provided by the well-known large-model application framework LangChain. As the name suggests, the Chains architecture can be understood as an implementation and extension of CoT, from the most basic LLMChain to chains for common scenarios: APIChain, RetrievalQAChain, SQLChain, etc.:

It can be seen that in the Chains architecture, each step from prompt to answer is standardized as a different type of LLMChain.

The whole journey of a requirement, from proposal to result, is abstracted into a series of LLMChains, which is very similar in expression to the structured and functional programming we know well.

This is good news. If prompts and answers are water and soil, then the theoretical guidance of CoT and the structure of Chains are like digging channels and building rivers: the reasoning process that AI would otherwise let sprawl uncontrollably is solidified into chains and the connections between them, and everything returns to the kind of process we understand.
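To make the pattern concrete, here is a minimal home-grown sketch of chain composition — deliberately not the actual LangChain API (which has changed across versions), just the shape it standardizes: a prompt template plus a model call, composable in sequence. The model is a deterministic fake so the sketch runs without an API key:

```python
from typing import Callable

LLM = Callable[[str], str]

class SimpleChain:
    """One link: format a prompt template, call the model, return text."""
    def __init__(self, llm: LLM, template: str):
        self.llm = llm
        self.template = template

    def run(self, **inputs: str) -> str:
        return self.llm(self.template.format(**inputs))

def sequential(chains, first_input: str) -> str:
    """Feed each chain's output into the next, SequentialChain-style."""
    text = first_input
    for chain in chains:
        text = chain.run(text=text)
    return text

# Deterministic fake model standing in for a real LLM call.
fake_llm: LLM = lambda prompt: f"[model output for: {prompt}]"

summarize = SimpleChain(fake_llm, "Summarize: {text}")
translate = SimpleChain(fake_llm, "Translate to French: {text}")
result = sequential([summarize, translate], "Chains standardize LLM calls.")
print(result)
```

The human designs the pipeline; the model only fills in each link. That fixed topology is exactly what the later Agents section relaxes.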

But is this really the future of AI application development? Is Chains' practice — relying on the human brain to design and solidify the reasoning process — all there is to AI?

Having said that, you can think about a question:

In the field of AI, are we using the gold of AI as a hoe, solving needs by tilling soil? Intuitively, don't you feel the capabilities and usage of AI go beyond this? Is it traditional architecture and programming paradigms that limit our imagination?

5.3 Better reasoning

5.3.1 CoT Self-Consistency(SC)

In May 2023, the SelfCheckGPT paper [7] described a mechanism called Self-Consistency that makes an important contribution to hallucination detection. It can be understood simply as: "for one question, let several people each think through the steps and answer, then have another person grade the answers and choose the best one."

Concretely: generate multiple CoTs for one question at once, vote on the inference of each CoT, and take the inference closest to the consensus as the result. The vote is an evaluation function, commonly BERTScore or n-gram overlap.
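A minimal sketch of the voting idea, with a canned sampler standing in for real model sampling and exact-match majority voting standing in for BERTScore/n-gram scoring:

```python
from collections import Counter
from itertools import cycle

# Canned answers standing in for CoT samples at temperature > 0.
_fake_samples = cycle(["42", "41", "42", "42", "40"])

def sample_cot_answer(question: str) -> str:
    # Stub: a real system samples a full chain of thought from the
    # model and parses out the final answer line.
    return next(_fake_samples)

def self_consistency(question: str, n_samples: int = 5) -> str:
    answers = [sample_cot_answer(question) for _ in range(n_samples)]
    # Majority vote stands in for BERTScore / n-gram agreement scoring.
    return Counter(answers).most_common(1)[0][0]

result = self_consistency("What is 6 x 7?")
print(result)  # → 42
```

The noisy minority answers ("41", "40") are outvoted, which is the whole point: individually unreliable samples become a reliable consensus.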

5.3.2 Tree of Thought (ToT)

Also this year, the Tree of Thoughts (ToT) paper [10] was published. If CoT is a chain, ToT is a tree composed of multiple CoT branches. ToT shows that AI can autonomously extend its own reasoning and decision process, which is a major breakthrough.

CoT emphasizes decomposing a task into subtasks; ToT emphasizes that decomposition generates multiple candidate thinking processes. The whole ToT ultimately forms a tree of thoughts, so we can treat the path from a complex problem to a result as a classic Tree data structure and solve it with breadth-first (BFS) or depth-first (DFS) search, where each reasoning state along the path is evaluated by the aforementioned Self-Consistency or other, more advanced, methods.

A Tree built this way, with the large model reasoning and deciding by itself, is completed through AI-driven scene drill-down and logical self-consistency. Simply put, it replaces the understanding, analysis, execution, and verification that humans previously had to do, repeating the whole process until a correct result is obtained.
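A toy BFS sketch of the ToT search loop. Both `propose_thoughts` and `score` are stand-ins for model calls (candidate generation and state evaluation); here they operate on strings so the search is deterministic and inspectable:

```python
def propose_thoughts(state: str) -> list:
    # Stub: a real system asks the model for k candidate next thoughts.
    return [state + "a", state + "b"]

def score(state: str) -> float:
    # Stub: a real system scores a state via Self-Consistency voting
    # or a value prompt; this toy scorer prefers more 'a's.
    return state.count("a")

def tot_bfs(root: str, depth: int = 3, beam: int = 2) -> str:
    frontier = [root]
    for _ in range(depth):
        # Expand every frontier state, then keep only the best `beam`
        # candidates — pruning is what makes the tree tractable.
        candidates = [t for s in frontier for t in propose_thoughts(s)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return max(frontier, key=score)

best = tot_bfs("")
print(best)  # → aaa
```

Swapping the loop for a stack (and keeping a best-so-far) turns this into the DFS variant; the evaluator is unchanged either way.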

6. Augmented language model (ALM)

At this point we already have limited automatic reasoning and hallucination-recognition capabilities, but the potential of large models goes further. Turing Award winner Yann LeCun's paper published in early 2023 introduced the concept of the Augmented Language Model (ALM) and described three parts:

  • Reasoning: decomposing potentially complex tasks into simple subtasks that the language model can solve by itself or by calling other tools.
  • Acting: the tools an ALM invokes affect the virtual or real world, and the model observes the results.
  • Tools: the language model's ability to call external modules through rules or special tokens — retrieval systems that fetch external information, tools that drive a robotic arm, and so on.

The context length of the large models we can use today is too small and cannot keep up with the growing scale of applications. The large model should therefore be able to fetch data from, or exert influence on, the outside world to extend its context. Here we call this outside world the environment.

Take "the large-model manipulator picks up a cup of coffee from the table" [16]: for this Act, the Tool is the manipulator, the Action is picking up, the Action Input is the coffee on the table, and "the coffee is in the manipulator" / "there is no coffee on the table" are Observations [16].

The WebGPT [17] example in the figure is very similar to a GPT version of Bing, and is a relatively pure Act-style large model. When a question is put to WebGPT, it searches the web and proposes results; users can sort and filter these results, which WebGPT then processes to generate the answer.

ReAct [2][12]

Previously, Acting and Reasoning were developed separately, and even when combined they were not viewed with architectural thinking. In October 2022, ReAct was proposed, finally connecting Reasoning and Acting; it has become the strongest player at the moment and the de facto standard. So what does application engineering practice look like for this architecture?

7. Agents architecture

Since the stunning launch of its initial version in April this year, AutoGPT has quickly become popular in the AI application community. One reason is that AutoGPT's behavior seems closer to the AI application architecture we long for:

With AutoGPT, we only need to set a goal, grant it resources and the ability to interact with those resources, and provide a set of rules to constrain its behavior; it can then approach the goal step by step by "asking and answering itself" and, helped by evaluation of intermediate results, finally complete the requirement.

Unlike Chains, which relies on the human brain to design and solidify the reasoning process, AutoGPT seems to let AI bootstrap its own reasoning. Compared with the Chains architecture, AutoGPT's Reasoning and Acting processes are automatic, which in practice gives up the advantages humans bring through prompt engineering.

However, although this unmodified ask-and-answer-yourself loop can help solve some complex problems, its reasoning and decision-making are far less efficient than a human-designed, solidified reasoning process, and it lacks effectiveness and flexibility on real-world decision tasks. Its limited ability to engage with the real world and the lack of benchmarks contribute to these uncertainties. More optimization is therefore needed to approach our ideal AI application architecture.

The industry noticed the importance of the reasoning and decision-making process in application architectures early on, and has benchmarked the effectiveness and flexibility of AutoGPT-like applications. From LangChain Agents, the recently released (still experimental) Transformers Agents from Hugging Face, and Unity ML-Agents in game development, we can distill a more complete AI application architecture, differentiated by scenario, at the current stage: the Agents architecture.

Agents [13] [24]

A typical Agents architecture includes the following components:

7.1 Agent

A well-tuned large model dedicated to reasoning and acting. Its core capabilities are task planning and reflection/continuous improvement, which require strong reasoning and decision-making abilities.

7.1.1 Task planning

Task planning: break large tasks down into smaller, manageable sub-goals so that complex tasks can be executed efficiently.

XoT & ReWOO [4]

The XoT family (CoT, CoT-SC, ToT) covered in the reasoning section above is typical. Also worth introducing is ReWOO, another plan-based solution. The idea: when a problem is raised, the model formulates every Plan needed to solve it, leaving the results blank (a "blueprint"); Workers execute the plans and fill their results into the blueprint, and the completed blueprint is finally handed to the large model to produce the answer. Unlike the usual scheme, it does not need to execute step by step, which nicely highlights the "planning" capability.
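A minimal sketch of the ReWOO blueprint idea: the planner emits all steps up front with `#E<n>` evidence placeholders, workers fill them in, and only then does a final model call see the completed blueprint. Both the plan and the tools are canned stand-ins for model/API calls:

```python
# Each plan step: (evidence slot, tool name, tool input); inputs may
# reference earlier slots. A real planner emits this whole list in a
# single model call, before any tool runs.
plan = [
    ("#E1", "search", "population of France"),
    ("#E2", "calculator", "#E1 / 2"),
]

def run_tool(name: str, arg: str) -> str:
    # Stubbed workers; a real system would call a search API, a math
    # tool, etc. Values here are invented for illustration.
    fake = {"search": "68 million", "calculator": "34 million"}
    return fake[name]

def execute_blueprint(plan):
    evidence = {}
    for slot, tool, arg in plan:
        for k, v in evidence.items():   # substitute earlier evidence
            arg = arg.replace(k, v)
        evidence[slot] = run_tool(tool, arg)
    return evidence

evidence = execute_blueprint(plan)
print(evidence["#E2"])  # → 34 million
```

Because the plan is fixed up front, the expensive planner model is called once rather than after every observation — the contrast with the ReAct loop above.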

7.1.2 Reflection and continuous improvement

Next is reflection and continuous improvement. Simply put, this means giving the large model improvement feedback so it can learn from previous mistakes and complete future tasks better.

ART [6] & Reflexion [15][8]

Take ART as an example. It is a supervised solution that can persist reasoning processes that have already occurred and recall them for reuse later. The process: a Task Library stores CoTs for various task types; when a question is put to an ART instance, it finds the most suitable task case in the Task Library and sends it to the large model together with the user's question; the final result is reviewed and fixed by a human, then persisted back into the Task Library.

Reflexion, mentioned alongside it, replaces the human part with a language model, turning this into a structure in which the large model learns by itself and optimizes its own behavior, solving decision-making, programming, and reasoning tasks through trial, error, and self-reflection.

Notable industry cases include ReAct, BabyAGI, and others; ReAct is the current de facto standard with far-reaching influence. OpenAI also offers planning-tuned models (GPT-3.5-turbo / GPT-4, the 0613 versions) through its recently announced function calling.

7.2 Memory

Memory includes Context and History [13].

7.2.1 Context

Context, which we are already familiar with, is similar to the human brain's short-term memory (STM); it provides the Agent with contextual ability, and the prompt engineering of current large models is built on it.

7.2.2 History

History is recall, similar to the human brain's long-term memory (LTM); it provides the Agent with the ability to store and recall associated data.
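A toy sketch of the two memory layers: a bounded context window (STM) plus a keyword-recall history store (LTM). Real systems use token budgets instead of a turn count and vector similarity instead of substring matching:

```python
from collections import deque

class Memory:
    def __init__(self, context_size: int = 3):
        self.context = deque(maxlen=context_size)  # STM: last few turns
        self.history = []                          # LTM: everything

    def add(self, message: str):
        self.context.append(message)   # old turns fall out automatically
        self.history.append(message)

    def recall(self, keyword: str):
        # Toy LTM recall by substring; real systems embed the query
        # and search a vector store instead.
        return [m for m in self.history if keyword in m]

mem = Memory()
for turn in ["user: hi", "bot: hello", "user: my cat is Tom", "bot: noted"]:
    mem.add(turn)

print(list(mem.context))   # only the 3 most recent turns
print(mem.recall("cat"))   # → ['user: my cat is Tom']
```

The point of the split: the prompt only ever carries `context`, while `recall` selectively pulls older material back in when it becomes relevant.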

RAG [9][14] & FLARE [8]

Retrieving data, as WebGPT does, is a very common scenario. Beyond traditional content retrieval, there are also solutions that enhance retrieval with large models, such as RAG and FLARE.

In practice, an approximate nearest neighbor (ANN) database that supports fast maximum inner product search (MIPS) is usually chosen to go with these schemes. There are many vector databases to pick from, and this is a hot area in the current market. Interested readers can look into Alibaba Cloud's Tair-based VectorDB and the cloud-native vector data warehouse AnalyticDB for PostgreSQL; they are not introduced in detail here.
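A minimal sketch of the retrieval core: score documents by inner product against a query vector and return the top-k — exactly the operation a MIPS/ANN index accelerates. The embeddings are hard-coded toy vectors; a real system would compute them with an embedding model:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(query_vec, docs, k=2):
    # Brute-force maximum inner product search; a vector database
    # replaces this linear scan with an ANN index (HNSW, IVF, ...).
    scored = sorted(docs, key=lambda d: dot(query_vec, d["vec"]), reverse=True)
    return [d["text"] for d in scored[:k]]

docs = [
    {"text": "CoT splits tasks into steps", "vec": [0.9, 0.1, 0.0]},
    {"text": "Vector DBs index embeddings", "vec": [0.1, 0.9, 0.2]},
    {"text": "Agents call external tools",  "vec": [0.0, 0.2, 0.9]},
]
query = [0.2, 0.8, 0.1]  # pretend embedding of "how are embeddings stored?"

hits = top_k(query, docs, k=1)
print(hits)  # → ['Vector DBs index embeddings']
```

In a RAG pipeline the returned texts are pasted into the prompt as context, which is how retrieval works around the context-length limits discussed above.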

7.3 Tools

A set of tools, or all the external resources the Agent can use — the Agent's callable and executable capabilities. A tool can be a function, an API, or any other large model, including another Agent application, etc. [13]

ChatGPT plugins and the OpenAI API's function calling are the best examples of Tools in use.

A common approach suited to the internet is to provide APIs in different domains plus descriptions and usage documents for those APIs, and let the Agent reason about whether an API it needs exists in Tools. This is a continuous process of consulting, calling, and verifying:

API Bank: Augment Tool[3]

API Bank is a benchmarking tool whose paper offers a workable API-calling procedure:

  • Step 1. Provide the API manual to the Agent. In each planning step, the Agent can use keywords to retrieve and summarize the usage of the APIs it needs from the manual; the usage instructions can guide the Agent with few-shot or zero-shot CoT, following prompt-engineering practice.
  • Step 2. Provide the APIs and an input checker to the Agent. Once the Agent has mastered an API's usage, it generates the parameters the API needs and calls it to obtain results. Throughout this process it must keep checking that the input parameters are correct and evaluating whether the output matches expectations.
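The two steps above can be sketched as keyword lookup in a mini API manual followed by a call guarded by an input checker. The manual entries and the `get_weather` function are invented for illustration:

```python
API_MANUAL = {
    # name -> (description, required parameters) — invented examples
    "get_weather": ("Current weather for a city", {"city"}),
    "get_time":    ("Current time in a timezone", {"tz"}),
}

def lookup_api(keywords: str):
    """Step 1: retrieve candidate APIs from the manual by keyword."""
    return [name for name, (desc, _) in API_MANUAL.items()
            if any(w in desc.lower() for w in keywords.lower().split())]

def checked_call(name: str, args: dict, impl):
    """Step 2: validate parameters against the manual before calling."""
    required = API_MANUAL[name][1]
    missing = required - args.keys()
    if missing:
        return f"error: missing parameters {sorted(missing)}"
    return impl(**args)

def get_weather(city: str) -> str:
    return f"sunny in {city}"  # stub implementation

print(lookup_api("weather forecast"))                        # → ['get_weather']
print(checked_call("get_weather", {}, get_weather))          # validation error
print(checked_call("get_weather", {"city": "Hangzhou"}, get_weather))
```

The checker's error message is fed back to the Agent as an observation, driving the consult-call-verify loop described above.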

8. Thinking about the future

Compared with Chains, Agents truly unlock the potential of AI, and this is the advanced architecture about to become mainstream. In my understanding, Agents looks less like the application-layer architecture we usually discuss and more like an implementation architecture for a Turing machine.

Recall the von Neumann architecture built on Turing's theory, or the similar Harvard architecture.

One fact: when actually developing for von Neumann or Harvard architecture devices, we care about how to use protocols to address and read/write buses to operate different devices — UART, I2C, SPI, and so on are all things we must learn and master — but we basically never care about the CU, ALU, and other units inside the CPU.

Another fact: inside a PC's CPU is a multi-bus Harvard architecture, while outside the CPU is a single-bus von Neumann architecture, and a system on a chip (SoC) further integrates the related high-speed components.

Computer architecture keeps developing this way, seeking common ground while preserving differences, further encapsulating these units, high-speed storage, and buses into abstractions. AI applications are similar: the Agent will keep encapsulating planning, reflection, and improvement as its own core capabilities.

It is therefore very likely that the future AI application will not be a program running on a traditional computer, but a standardized requirement running directly on a virtual instance of an AI computer whose CPU is a large model specialized in planning. The application architecture we discussed today will sink to the bottom layer and become the core architecture of the AI computer.

In the AI computer, the planning-and-decision-specialized Agent model will replace the traditional evaluation system for computing units — core counts and GHz — with benchmarks of planning and decision-making ability, while the peripherals the AI computer relies on, namely Tools, will go deeper into their areas of expertise to provide more specialized execution capabilities.

Ultimately, the AI computer will be Turing complete, and through AI bootstrapping its iterated products will spread from the engineering field into industry.

AI vendors, in turn, will shift from today's multi-agent solutions such as MetaGPT and AgentVerse to being manufacturers of AI computers, related clusters, and other integrated solutions. By developing two-in-one "advanced roles", the era of the one-person company may arrive sooner than expected.

Author|Left


Original link

This article is the original content of Alibaba Cloud and may not be reproduced without permission.



Origin my.oschina.net/yunqi/blog/10108319