Dr. Han Xiao, Founder of Jina AI: Unveiling the brutal truth behind the Auto-GPT hype

Is Auto-GPT a groundbreaking project, or an overhyped AI experiment? This article cuts through the noise and examines the limitations that make Auto-GPT unsuitable for real-world production use.


Background

In the past few days, Auto-GPT, a project that lets the most powerful language model, GPT-4, complete tasks on its own, became famous overnight and set the entire AI community abuzz. In just seven days it collected an astonishing number of GitHub Stars, now exceeding 50,000, and drew the attention of countless open source developers.

The one real inconvenience of ChatGPT, the previous sensation, is that it requires a human to type in the prompts. The major breakthrough of Auto-GPT is that the AI can prompt itself. In other words, does the AI not need us humans at all?

Amid the carnival over Auto-GPT, it is worth taking a step back to examine its potential shortcomings and to discuss the limitations and challenges facing this "AI prodigy".

Next, Dr. Han Xiao discusses with us whether Auto-GPT is a groundbreaking project or just another overhyped AI experiment.


How does Auto-GPT work?

It must be said that Auto-GPT has made huge waves in the AI field. It is as if GPT-4 were given memory and a body, allowing it to tackle tasks on its own, and even learn from experience to continuously improve its performance.

To make it easier to understand how Auto-GPT works, let's break it down with some simple metaphors.

First, imagine Auto-GPT as a resourceful robot.

Every time we assign it a task, Auto-GPT comes up with a corresponding plan. If it needs to browse the Internet or use new data, it adjusts its strategy until the task is completed. It's like having a personal assistant that can handle all kinds of tasks: market analysis, customer service, marketing, finance, and so on.

Specifically, for Auto-GPT to run, it needs to rely on the following four components:

1. Architecture

Auto-GPT is built on the powerful GPT-4 and GPT-3.5 large language models, which act as the robot's brain, enabling it to think and reason.

2. Independent iteration

It's like a robot's ability to learn from its mistakes. Auto-GPT can look back at its work, build on previous efforts, and use its history to produce more accurate results.

3. Memory management

Integration with vector databases, which serve as a memory store, enables Auto-GPT to preserve context and make better decisions. It's like equipping the robot with a long-term memory that remembers past experiences.

4. Versatility

Auto-GPT's features such as file manipulation, web browsing, and data retrieval make it versatile. It's like giving the robot multiple skills so it can handle a wider range of tasks.

However, these tantalizing prospects have not yet translated into capabilities Auto-GPT can actually deliver.

Sky-high cost

If you want to use Auto-GPT in a real production environment, the first obstacle you face is its high cost.

Since a task is completed through a series of thought iterations, and each step must carry the full reasoning context to produce a better prompt, each step of the model typically uses up the entire context window.

However, GPT-4 tokens are not cheap.

According to OpenAI's pricing, the GPT-4 model with an 8K context window costs $0.03 per 1,000 tokens for the prompt and $0.06 per 1,000 tokens for the completion.

And 1,000 tokens correspond to roughly 750 English words.

Let's break down the cost of each step in the chain of thought, assuming each action uses a full 8,000-token context window, of which 80% is prompt (6,400 tokens) and 20% is completion (1,600 tokens).

  • Prompt cost: 6,400 tokens x $0.03/1,000 tokens = $0.192

  • Completion cost: 1,600 tokens x $0.06/1,000 tokens = $0.096

So the cost per step is: $0.192 + $0.096 = $0.288

On average, Auto-GPT takes 50 steps to complete a small task.

So the cost of completing a single task is: 50 steps x $0.288/step = $14.40
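The arithmetic above can be sketched in a few lines of Python. Prices are OpenAI's published GPT-4 8K rates quoted in the text; the function names are illustrative:

```python
# Back-of-the-envelope cost model for an Auto-GPT run (illustrative only).
PROMPT_PRICE = 0.03 / 1000   # dollars per prompt token (GPT-4, 8K context)
RESULT_PRICE = 0.06 / 1000   # dollars per completion token

def step_cost(prompt_tokens: int, result_tokens: int) -> float:
    """Cost in dollars of a single chain-of-thought step."""
    return prompt_tokens * PROMPT_PRICE + result_tokens * RESULT_PRICE

def task_cost(steps: int, context: int = 8000, prompt_ratio: float = 0.8) -> float:
    """Cost of a whole task, assuming every step fills the context window."""
    prompt_tokens = int(context * prompt_ratio)   # 6,400 tokens of prompt
    result_tokens = context - prompt_tokens       # 1,600 tokens of completion
    return steps * step_cost(prompt_tokens, result_tokens)

round(task_cost(50), 2)  # 14.4 dollars for a typical 50-step task
```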

Take VueGPT as an example: an AI created with Auto-GPT to build website applications using Vue JS. Here is one of the steps in its chain of thought:

VUEGPT THOUGHTS: Let's start by checking if there are any updates to VueJS. If there are, we can update to the latest version and proceed. Otherwise, we can move on to creating the TODO list website application.
REASONING: Starting with the most updated and stable version of the framework will ensure our project has the latest features, bug fixes and is properly supported. Creating the TODO list website application is our primary goal, so we can move on to that if there are no updates.
PLAN:
-  Check for VueJS updates
-  Update to latest version if there are updates
-  If no updates, move on to creating the TODO list website application
CRITICISM: None
NEXT ACTION:  COMMAND = google ARGUMENTS = {'input': 'VueJS latest version update'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for VueGPT...

And that assumes every result is usable on the first try; if anything has to be regenerated, the cost climbs even higher.

From this perspective, Auto-GPT is currently unrealistic for most users and organizations.

Development-to-production challenges

At first glance, spending $14.40 to complete a complex task might seem acceptable.

As an example, suppose we first ask Auto-GPT to create a Christmas recipe. Then we ask it for a Thanksgiving recipe. Guess what happens?

That's right: Auto-GPT runs through exactly the same chain of thought all over again. In other words, we spend another $14.40.

But in fact, the two tasks should differ in only one "parameter": the holiday.

Having already spent $14.40 developing a method for creating a recipe, it makes no sense to spend the same amount again just to change a parameter.

Imagine playing Minecraft and building everything from scratch every time. Obviously, this would make the game very boring.

And this exposes a fundamental problem with Auto-GPT: it cannot distinguish between development and production.

When Auto-GPT achieves its goal, the development phase is complete. Unfortunately, there is no way to "serialize" the resulting series of operations into a reusable function for production.

Therefore, users have to start over from the development phase every time they want to solve a problem, which is not only time-consuming and labor-intensive but also expensive.
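To picture the missing development/production split: if the expensive chain of thought could be captured once and replayed with new parameters, the Thanksgiving recipe would cost almost nothing. A hypothetical sketch of what such "serialization" might look like; none of these functions exist in Auto-GPT, which is exactly the point:

```python
# Hypothetical sketch: cache the result of the expensive "development" run
# and reuse it as a parameterized procedure in "production".
from typing import Callable

plan_cache: dict[str, Callable[..., str]] = {}

def develop_plan(task_template: str) -> Callable[..., str]:
    """Expensive phase: imagine running the full ~$14.40 chain of thought
    here once, then returning a reusable, parameterized procedure."""
    def plan(**params: str) -> str:
        # In reality this would replay the recorded tool calls with the
        # new parameters instead of re-reasoning from scratch.
        return task_template.format(**params)
    return plan

def run(task_template: str, **params: str) -> str:
    """Cheap phase: reuse the cached plan if this shape of task was solved."""
    if task_template not in plan_cache:
        plan_cache[task_template] = develop_plan(task_template)  # pay once
    return plan_cache[task_template](**params)

run("Create a {holiday} recipe", holiday="Christmas")     # expensive first time
run("Create a {holiday} recipe", holiday="Thanksgiving")  # nearly free
```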

This inefficiency raises questions about the usefulness of Auto-GPT in real-world production environments and highlights the limitations of Auto-GPT in providing sustainable, cost-effective solutions to large-scale problem solving.

The quagmire of loops

Still, if $14.40 actually did the trick, it might be worth it.

But the problem is that when Auto-GPT is actually used, it often falls into an endless loop...

So why does Auto-GPT get stuck in these loops?

To understand this, we can think of Auto-GPT as using GPT to solve tasks in a very simple "programming language" of predefined commands.

Whether a task can be solved depends on two factors: the range of functions available in that language, and GPT's divide-and-conquer ability, that is, how well GPT can decompose the task into that predefined language. Unfortunately, GPT falls short on both counts.

The limited functionality is visible in Auto-GPT's source code: it provides commands for searching the web, managing memory, interacting with files, executing code, and generating images. This restricted feature set narrows the range of tasks Auto-GPT can perform effectively.
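Conceptually, this feature set behaves like a small dispatch table that every plan the model produces must be expressed in. The sketch below is illustrative; the command names and stub behaviors are invented, not Auto-GPT's actual identifiers:

```python
# Toy dispatch table illustrating a restricted command set.
# Any task the model cannot decompose into these commands is out of reach.
def google_search(query: str) -> str:
    return f"results for {query}"            # stub: real code would call a search API

def write_file(path: str, text: str) -> str:
    return f"wrote {len(text)} bytes to {path}"  # stub: real code would touch disk

def browse_website(url: str) -> str:
    return f"contents of {url}"              # stub: real code would fetch the page

COMMANDS = {
    "google": google_search,
    "write_file": write_file,
    "browse_website": browse_website,
}

def execute(command: str, **args: str) -> str:
    """Run one step of the model's plan, if (and only if) it maps to a command."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    return COMMANDS[command](**args)
```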

In addition, the decomposition and reasoning capabilities of GPT are still limited. Although GPT-4 has significantly improved over GPT-3.5, its reasoning ability is far from perfect, further limiting the problem-solving ability of Auto-GPT.

The situation is similar to trying to build a game as complex as StarCraft in Python. While Python is a powerful language, decomposing StarCraft into Python functions is extremely challenging.

Essentially, the combination of a limited feature set and GPT-4's constrained reasoning creates this quagmire of loops, leaving Auto-GPT unable to achieve the desired result in many cases.

The difference between humans and GPT

Divide and conquer is the key to Auto-GPT. Although GPT-3.5/4 has significantly improved over its predecessors, its reasoning ability still cannot reach human level when using divide and conquer.

1. Insufficient problem decomposition

The effectiveness of divide and conquer depends heavily on the ability to decompose a complex problem into smaller, manageable sub-problems. Human reasoning can often find multiple ways to break down problems, while GPT-3.5/4 may not have the same level of adaptability or creativity.

2. Difficulty in identifying suitable base cases

Humans can intuitively choose appropriate base cases for efficient solutions. In contrast, GPT-3.5/4 may struggle to determine the most efficient base case for a given problem, which can significantly affect the overall efficiency and accuracy of the divide-and-conquer process.

3. Insufficient understanding of the problem background

While humans can use their domain knowledge and background understanding to better deal with complex problems, GPT-3.5/4 is limited by its pre-trained knowledge and may lack the background information needed to effectively solve some problems with divide and conquer.

4. Dealing with overlapping subproblems

Humans can often recognize when they are solving overlapping subproblems and strategically reuse previously computed solutions, whereas GPT-3.5/4 may lack that awareness and redundantly solve the same subproblem multiple times, resulting in less efficient solutions.
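The textbook illustration of point 4 is Fibonacci: memoization reuses overlapping subproblems, while naive recursion re-solves them again and again, much as GPT may redundantly redo work:

```python
from functools import lru_cache

calls = 0  # count how often the naive version is invoked

def fib_naive(n: int) -> int:
    """Re-solves the same subproblems many times (GPT-style redundancy)."""
    global calls
    calls += 1
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    """Reuses previously computed subproblems (human-style strategy)."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

fib_naive(20)  # 21,891 calls for a result that needs only 21 distinct subproblems
fib_memo(20)   # each subproblem solved exactly once
```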

Vector databases: an overkill solution

Auto-GPT relies on vector databases for faster k-Nearest Neighbor (kNN) searches. These databases retrieve previous chains of thought and incorporate them into the context of the current query to provide GPT with a sort of memory effect.

However, given Auto-GPT's constraints and limitations, this approach has been criticized as excessive and needlessly resource-hungry. The main argument against using a vector database stems from the cost ceiling on Auto-GPT's chain of thought.

A 50-step chain of thought costs $14.40, and a 1,000-step chain costs proportionally more. Consequently, memory sizes, i.e., chain-of-thought lengths, rarely exceed four digits. At that scale, an exhaustive nearest-neighbor search (i.e., a dot product between a 256-dimensional query vector and a 10,000 x 256 matrix) is efficient enough, taking well under a second.
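The exhaustive search described above is a single matrix-vector product. A quick sketch with the sizes from the text, assuming dot-product similarity over random placeholder embeddings:

```python
import numpy as np

# 10,000 stored 256-dim embeddings: the upper end of a realistic
# Auto-GPT "memory", given the cost ceiling on chain length.
memory = np.random.rand(10_000, 256).astype(np.float32)
query = np.random.rand(256).astype(np.float32)

def knn(query: np.ndarray, memory: np.ndarray, k: int = 5) -> np.ndarray:
    """Exact top-k by dot product; at this scale it finishes almost instantly."""
    scores = memory @ query                # one matrix-vector product
    return np.argsort(scores)[::-1][:k]    # indices of the k best matches

top5 = knn(query, memory)
```

At this memory size the brute-force search is so fast that an approximate index cannot meaningfully speed up the overall loop.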

By comparison, each GPT-4 call takes about 10 seconds to process, so it is GPT, not the database, that actually limits the system's processing speed.

Deploying a vector database in an Auto-GPT system to speed up the kNN "long-term memory" search therefore looks like an unnecessary luxury and an overkill solution, even though vector databases do have advantages in other scenarios.

The Birth of the Agent Mechanism

Auto-GPT introduces a very interesting concept: spawning agents (Agents) to which tasks can be delegated.

Admittedly, this mechanism is still in its infancy, and its potential has not been fully tapped. Still, there are ways to enhance and extend the current agent system, opening up new possibilities for more efficient and dynamic interactions.

A potential improvement is to introduce asynchronous agents. By adopting the async-await pattern, agents can operate concurrently without blocking one another, significantly improving the overall efficiency and responsiveness of the system. The idea is borrowed from modern programming paradigms, which manage many tasks simultaneously with an asynchronous approach.

Image source: https://scutapm.com/blog/async-javascript
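A minimal sketch of the async-await idea, assuming each agent is a coroutine wrapping a (here simulated) slow model call. The agent names and tasks are invented for illustration:

```python
import asyncio

async def agent(name: str, task: str) -> str:
    # Simulate a slow LLM call; a real agent would await an API request here.
    await asyncio.sleep(0.1)
    return f"{name} finished: {task}"

async def main() -> list[str]:
    # All three agents run concurrently instead of blocking one another,
    # so total wall time is ~0.1s rather than ~0.3s.
    return await asyncio.gather(
        agent("researcher", "search the web"),
        agent("writer", "draft the report"),
        agent("critic", "review the draft"),
    )

results = asyncio.run(main())
```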

Another promising direction is to **enable agents to communicate with each other**. By allowing agents to exchange messages and collaborate, they can work together to solve complex problems more effectively. This is similar to the IPC concept in programming, where multiple threads or processes share information and resources to achieve a common goal.
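The IPC analogy can be sketched with a shared queue between two agents, a toy producer/consumer rather than Auto-GPT's actual mechanism:

```python
import asyncio

async def researcher(outbox: asyncio.Queue) -> None:
    # Producer agent: publishes findings for other agents to consume.
    for fact in ["fact A", "fact B"]:
        await outbox.put(fact)
    await outbox.put(None)  # sentinel: no more messages

async def writer(inbox: asyncio.Queue) -> list[str]:
    # Consumer agent: builds on the researcher's findings as they arrive.
    notes = []
    while (fact := await inbox.get()) is not None:
        notes.append(f"summarized {fact}")
    return notes

async def main() -> list[str]:
    channel = asyncio.Queue()  # the shared "IPC" channel
    _, notes = await asyncio.gather(researcher(channel), writer(channel))
    return notes

notes = asyncio.run(main())
```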

Generative agents are the way of the future

As GPT-powered agents continue to develop, the future of this innovative approach seems bright.

New research, such as "Generative Agents: Interactive Simulacra of Human Behavior," highlights the potential of agent-based systems to simulate believable human behavior.

The generative agents proposed in the paper can interact in complex and engaging ways, form opinions, initiate dialogues, and even plan and participate in activities autonomously. This work further supports the argument that agent mechanisms hold promise in the development of AI.

By embracing a paradigm shift towards asynchronous programming and facilitating inter-agent communication, Auto-GPT can open up new possibilities for more efficient and dynamic problem solving.

Integrating the architecture and interaction modes introduced in the "Generative Agents" paper can realize the integration of large language models with computing and interactive agents. This combination has the potential to revolutionize the way tasks are assigned and performed within an AI framework and enable more realistic simulations of human behavior.

The development and exploration of agent systems can greatly facilitate the development of AI applications, providing more powerful and dynamic solutions to complex problems.

Summary

In conclusion, the buzz around Auto-GPT raises important questions about the state of AI research and the role of public understanding in driving hype for emerging technologies.

As demonstrated above, the limitations of Auto-GPT in terms of reasoning capabilities, the overuse of vector databases, and the early stages of development of the agent mechanism reveal that it is still a long way from being a practical solution.

The hype surrounding Auto-GPT is a reminder that superficial understanding can lead to inflated expectations, ultimately leading to a distorted perception of what AI is really capable of.

Having said that, Auto-GPT does point to a promising direction for the future of AI: generative agent systems.

Finally, Dr. Han Xiao concluded: "Let us learn from the Auto-GPT hype and foster a more nuanced and informed dialogue about AI research."

In this way, we can harness the transformative power of generative agent systems to continue pushing the boundaries of AI capabilities and shape a future where technology truly benefits humanity.

Author: Dr. Han Xiao, Founder and CEO of Jina AI
Translator: Xinzhiyuan Editorial Department

Jina AI (https://jina.ai) leverages cloud-native, MLOps and LMOps technologies to give enterprises and developers access to the best search and generation technology. Its core product, Finetuner+, uses advanced fine-tuning techniques to customize and privately deploy large models for enterprises. The company has raised a total of US$37.5 million from Chinese and American investors including GGV, Yunqi Capital, and SAP. It is headquartered in Berlin, Germany, with offices in China and the United States, and its team members, drawn from more than 10 countries, come from top technology companies such as Microsoft, Google, Tencent, and Adobe.

Origin: blog.csdn.net/Jina_AI/article/details/130201504