A small step for Agent development, a giant leap for large model applications

https://www.sohu.com/a/708426242_425761
The large model craze set off by ChatGPT is undoubtedly the hottest track of the first half of the year. With the release of GPT-4, major Internet giants and technology companies have entered the game, and in the domestic market large models have been emerging intensively over the past few months.

It has to be said that ChatGPT is an important milestone in the development of large models: it has pushed AI back to the center of the stage and made the field the commanding heights of a new round of digital technology competition.

While the "Battle of Hundreds of Models" is intensifying, OpenAI founding member Andrej Karpathy has turned his attention to the other end - Agent

"Whenever a new Agent paper comes out, the team will be excited and discuss it seriously.
You (developers) are all at the forefront of Agent development, and OpenAI has little accumulation in this field."
Andrej Karpathy, founding member of OpenAI In his hackathon speech, he said that compared to large model training, OpenAI is currently paying more attention to the Agent field.

What is an Agent?

In the context of large models, an Agent can be understood as a system that can autonomously understand, plan, and execute complex tasks.

Technology demonstration projects represented by AutoGPT and BabyAGI briefly became popular in April this year, but they remain some distance from real business application.

Now a second wave of the Agent explosion is brewing, marked by a new round of applications that are more closely integrated with real scenarios.

Not surprisingly, it was the programming and development industry that took action first.

The recently popular open-source project Sweep integrates directly with GitHub's Issue and Pull Request workflows, automatically "sweeping up" bug reports and feature requests and writing the corresponding code.

Among startups, there is also the OpenAI-backed code editor Cursor, which elevates code generation to producing an entire project skeleton from a single sentence.

Next, Agent will also become a new starting point and an indispensable component for building a new generation of AI applications in all walks of life.

In this regard, the founder of the startup Seednapse AI proposed a five-layer "cornerstone" theory for building AI applications, which attracted attention in the industry (the first three layers are sketched in code after the list):

★ Models: the large model APIs we are already familiar with.

★ Prompt Templates: prompt templates that introduce variables into the prompt to adapt to user input.

★ Chains: chained calls to the model, where the previous output becomes part of the next input.

★ Agent: an Agent can autonomously execute chained calls and access external tools.

★ Multi-Agent: multiple Agents share part of their memory and collaborate with one another autonomously.
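To make the first three layers concrete, here is a minimal sketch in Python. It is an illustration of the layering only: `call_model` is a hypothetical stand-in for any large model API, and the two templates are invented for the example.

```python
# Minimal sketch of the Models / Prompt Templates / Chains layers.
# call_model() is a hypothetical placeholder for any large model API.

def call_model(prompt: str) -> str:
    """Layer 1 (Models): stand-in for a real large model API call."""
    raise NotImplementedError("wire this to your model provider")

# Layer 2 (Prompt Templates): variables adapt the prompt to user input.
SUMMARIZE = "Summarize the following ticket in one sentence:\n{ticket}"
CLASSIFY = "Is this summary a 'bug' or a 'feature request'?\n{summary}"

def triage(ticket: str) -> str:
    """Layer 3 (Chains): the previous output feeds the next input."""
    summary = call_model(SUMMARIZE.format(ticket=ticket))
    return call_model(CLASSIFY.format(summary=summary))
```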

Beyond entrepreneurial pioneers, even AI infrastructure giants have begun to work on Agents.

For example, the Amazon Bedrock Agents capability announced at the Amazon Cloud Technology Summit in New York is the most representative manifestation of this trend.

On top of the fully managed foundation model service, Amazon Bedrock Agents also packages together the capabilities for developing, deploying, and managing multiple Agents.

If we follow the previous five-layer cornerstone theory, this type of service is equivalent to starting directly from the fifth layer, which greatly lowers the development threshold.

As Amazon Cloud Technology described at the press conference:

☞ Create generative AI applications that can perform tasks in just a few clicks.

It is foreseeable that, with the threshold lowered, Agent applications will explode in all walks of life.

Agent: the starting point of a new era of AI applications

What counts as an Agent application? OpenAI researcher Lilian Weng gave an intuitive "recipe":

☞ Agent = large model + memory + active planning + tool use
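As a rough illustration of how those four ingredients fit together, the toy loop below sketches the recipe in Python. Every function and the "tool_name: argument" step format are hypothetical placeholders, not any vendor's actual API.

```python
# A toy agent loop: large model + memory + active planning + tool use.
# All functions are hypothetical placeholders, not a real vendor API.

def plan(model_call, goal: str, memory: list[str]) -> str:
    """Active planning: ask the model for the next step given goal + memory."""
    context = "\n".join(memory)
    return model_call(f"Goal: {goal}\nSo far:\n{context}\nNext step?")

def run_agent(model_call, tools: dict, goal: str, max_steps: int = 5) -> list[str]:
    memory: list[str] = []                       # memory of past observations
    for _ in range(max_steps):
        step = plan(model_call, goal, memory)    # planning
        if step.startswith("DONE"):
            break
        # Assumed step format "tool_name: argument", e.g. "search: open claims"
        tool_name, _, arg = step.partition(":")
        result = tools[tool_name.strip()](arg.strip())   # tool use
        memory.append(f"{step} -> {result}")
    return memory
```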

Taking the Amazon Cloud Technology platform as an example, when developing an Agent application you must first select an appropriate foundation model for the Agent based on the specific task scenario.

In addition to its own Amazon Titan large model, Amazon Bedrock also brings together models from vendors such as Anthropic, which emphasizes safety and controllability, Cohere, which is strong at retrieval and summarization, and Stability AI, which specializes in text-to-image generation.

Once a model is selected, you describe the task instructions in natural language so that the Agent understands the role it should play and the goals it should accomplish.

Instructions can be structured prompts comprising a series of "question, thinking steps, action steps, examples". With the support of ReAct (synergizing reasoning and acting), the foundation model can work out a solution through reasoning and decision-making.
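As a hedged example of such structured instructions, the template below follows the question / thought / action pattern popularized by the ReAct paper. The wording, tool names, and scenario are illustrative assumptions, not Amazon Bedrock's actual prompt format.

```python
# Illustrative ReAct-style instructions (not Bedrock's actual format).
REACT_INSTRUCTIONS = """
You are an insurance claims assistant.

Answer each request by interleaving Thought and Action steps:
Question: the user request
Thought: reason about what to do next
Action: one of [list_open_claims, get_claim_details, send_reminder]
Observation: the result of the action
... (repeat Thought / Action / Observation as needed)
Final Answer: the response to the user

Example:
Question: Which claims still need paperwork?
Thought: I should first list all open claims.
Action: list_open_claims
"""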

The next highlight is adding Action Groups.

Here you define the specific tasks the Agent should complete and the tools it may use, such as enterprise system APIs and Lambda functions.

The official demo is an insurance claims management scenario, where the Agent manages insurance claims by extracting a list of open claims, identifying outstanding paperwork for each claim, and sending reminders to policyholders.
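To make the action group idea concrete, here is a hedged sketch of a Lambda function that could sit behind such an action group in the claims scenario. The event and response field names approximate the general shape of the Bedrock Agents Lambda contract but should be verified against current documentation, and the claims data is an obvious placeholder.

```python
import json

# Hedged sketch of a Lambda tool behind a Bedrock Agents action group.
# Field names approximate the action-group Lambda contract; verify them
# against the current Amazon Bedrock documentation before relying on them.

OPEN_CLAIMS = {"claim-001": "missing repair invoice"}  # placeholder data

def lambda_handler(event, context):
    api_path = event.get("apiPath")        # which API the agent invoked
    if api_path == "/open-claims":
        body = {"claims": list(OPEN_CLAIMS)}
    else:
        body = {"error": f"unknown path {api_path}"}
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup"),
            "apiPath": api_path,
            "httpMethod": event.get("httpMethod"),
            "httpStatusCode": 200,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }
```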

After all action groups are set up, Agent creation and deployment can be completed in just a few clicks.

After the deployment is completed, you can see in the test that the Agent understands the user request, breaks down the task into multiple steps (collecting open insurance claims, finding claim IDs, sending reminders) and performs corresponding operations.
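Beyond the console, a deployed Agent can also be tested programmatically. The sketch below uses the boto3 `bedrock-agent-runtime` client; the agent and alias IDs are placeholders, and client and field names may differ by SDK version, so treat it as an assumption to verify.

```python
import uuid
import boto3

# Hedged sketch: invoking a deployed agent with boto3. Agent and alias IDs
# are placeholders; verify client/method names against your boto3 version.
client = boto3.client("bedrock-agent-runtime")

response = client.invoke_agent(
    agentId="AGENT_ID",           # placeholder
    agentAliasId="ALIAS_ID",      # placeholder
    sessionId=str(uuid.uuid4()),  # groups multi-turn requests together
    inputText="Send reminders for all open claims with missing paperwork.",
)

# The completion is streamed back as chunks of bytes.
for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode("utf-8"), end="")
```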

Amazon Bedrock reduces the coding effort required to configure foundation models through a wizard-style interactive interface.

Action groups provide the ability to call APIs to implement specific functions and to use your own data to build differentiated applications, allowing the foundation model to complete more complex real business tasks.

Throughout the process, you can also draw on the various security services of the Amazon Cloud Technology platform. For example, use PrivateLink to establish a private connection between the foundation model and your local network, so that traffic is never exposed to the public Internet.
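As a hedged sketch of that setup, creating an interface VPC endpoint keeps model traffic on the private network. All resource IDs below are placeholders, and the exact endpoint service name for the model service should be verified for your region.

```python
import boto3

# Hedged sketch: a PrivateLink interface endpoint so calls to the model
# service stay off the public Internet. IDs and the service name are
# placeholders; check the exact endpoint service name for your region.
ec2 = boto3.client("ec2")

ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",                          # placeholder
    ServiceName="com.amazonaws.us-east-1.bedrock-runtime",  # verify per region
    SubnetIds=["subnet-0123456789abcdef0"],                 # placeholder
    SecurityGroupIds=["sg-0123456789abcdef0"],              # placeholder
    PrivateDnsEnabled=True,
)
```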

By providing a fully managed service, developers can leverage the capabilities of foundation models without having to manage the underlying infrastructure.

Ultimately, the cycle from foundation model to practical application is shortened, and the value the foundation model creates for the business arrives sooner.

What else should we pay attention to when accelerating large model applications?

With Amazon Bedrock's Agent capabilities, we can quickly put large models to work in real business and help enterprises reduce costs, improve efficiency, or innovate.

But to truly harness the full value of generative AI, realize its potential, and outpace competitors, private data is fundamental.

In other words, the key to the implementation of large model applications is the company's own valuable industry data.

How to integrate these rich resources into our Agents, so that our large model applications can efficiently access the right information when executing tasks, is a problem every enterprise must face today.

Of course, all this must be done on the premise of ensuring privacy.

Beyond integrating and invoking private data, on the road to deploying large model applications the most basic support of all, computing power, is a perennial topic.

As we all know, GPU resources are currently extremely scarce and expensive.

For example, a survey found that by mid-April this year Nvidia's H100 was selling for more than $40,000 on overseas e-commerce platforms, and price tags of $65,000 were not uncommon.

Whether purchased or rented, this has become a large expenditure for enterprises around the world in exploring generative AI applications.

How to make this expenditure more economical is likewise something every company is thinking about.

It is worth noting that leading providers, represented by Amazon Cloud Technology, are offering systematic answers to these challenges and pain points in deploying generative AI, addressing the problems above one by one.

In response to the private data issue, Amazon Cloud Technology announced vector engines for three of its data services, making it easier to integrate generative AI applications with the business.

We know that since the rise of generative AI, vector databases have also become very popular, because they can give the model responses that are more relevant to its context than traditional relational databases can.

These new services store our private data in a database equipped with a vector engine; when running generative AI applications, we can query internal enterprise data through simple API calls.
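The underlying pattern, often called retrieval-augmented generation, is straightforward. The sketch below shows it in outline; `embed`, `search_vectors`, and `call_model` are hypothetical helpers standing in for an embedding model, a vector engine, and a foundation model, not any specific Amazon API.

```python
# Retrieval-augmented generation in outline. embed(), search_vectors(),
# and call_model() are hypothetical helpers, not a specific vendor API.

def embed(text: str) -> list[float]:
    raise NotImplementedError   # e.g. an embedding-model API call

def search_vectors(vector: list[float], k: int) -> list[str]:
    raise NotImplementedError   # k-NN lookup in the vector engine

def call_model(prompt: str) -> str:
    raise NotImplementedError   # the foundation model call

def answer(question: str) -> str:
    docs = search_vectors(embed(question), k=3)   # fetch private context
    context = "\n---\n".join(docs)
    return call_model(f"Using only this context:\n{context}\n\nQ: {question}")
```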

Depending on needs such as where the data currently lives, familiarity with database technology, vector dimensionality, number of embeddings, and performance requirements, Amazon Cloud Technology provides three options:

- Amazon Aurora PostgreSQL-Compatible Edition, a relational database that supports the open-source pgvector vector similarity search plugin (see the sketch after this list);

- Amazon OpenSearch, a distributed search and analytics service, with a k-NN (k-nearest-neighbor) plugin and a vector engine for Amazon OpenSearch Serverless;

- Amazon RDS (Amazon Relational Database Service) for PostgreSQL, which also supports the pgvector plugin.
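For the two PostgreSQL-based options, the pgvector workflow looks roughly like this minimal psycopg2 sketch. The connection string, table name, and tiny 3-dimensional vectors are placeholders for illustration only.

```python
import psycopg2

# Minimal pgvector sketch for Aurora/RDS PostgreSQL. The connection string,
# table name, and 3-dimensional vectors are placeholders for illustration.
conn = psycopg2.connect("postgresql://user:pass@host:5432/db")  # placeholder
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, "
            "content text, embedding vector(3));")
cur.execute("INSERT INTO docs (content, embedding) VALUES (%s, %s::vector);",
            ("claims policy excerpt", "[0.1, 0.2, 0.3]"))

# Nearest-neighbor search: <-> is pgvector's Euclidean distance operator.
cur.execute("SELECT content FROM docs "
            "ORDER BY embedding <-> %s::vector LIMIT 5;",
            ("[0.1, 0.2, 0.25]",))
print(cur.fetchall())
conn.commit()
```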

Of course, the most noteworthy is the newly launched vector engine for Amazon OpenSearch Serverless. Its biggest advantage is that enterprises need only care about storing and retrieving vector data, without bearing any of the underlying operations and maintenance burden.

With data integration addressed, on the underlying support side Amazon Cloud Technology has also launched the new Amazon EC2 P5 instance powered by the H100. Computing power that was once quite scarce for most enterprises is now "at your fingertips".

This instance contains 8 NVIDIA H100 Tensor Core GPUs with 640 GB of high-bandwidth GPU memory, plus third-generation AMD EPYC processors, 2 TB of system memory, 30 TB of local NVMe storage, 3200 Gbps of aggregate network bandwidth, and GPUDirect RDMA support, enabling lower latency and efficient scale-out performance.

Compared with the previous generation of GPU-based instances, Amazon EC2 P5 can shorten training time by up to 6 times (from days to hours) and reduce training costs by up to 40%.

Coupled with the previously released Amazon EC2 Inf2 and Amazon EC2 Trn1n instances based on Amazon's self-developed chips, which also perform well, there is plenty of room for on-demand choice when it comes to computing power.

Beyond the underlying support above, various out-of-the-box AI services are also part of the lineup:

For example, Amazon CodeWhisperer, an AI programming assistant for the development process, is now integrated with Amazon Glue, extending AI code generation to a new group of people: data engineers. Using natural language alone (such as "Use the content in the json file to create a Spark DataFrame"), these developers can handle a variety of tasks;
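As a rough illustration, the kind of PySpark code such a prompt might generate looks like the snippet below; the file path and app name are hypothetical placeholders, and the actual generated code may differ.

```python
from pyspark.sql import SparkSession

# Roughly the kind of code the prompt "Use the content in the json file
# to create a Spark DataFrame" could generate; the path is a placeholder.
spark = SparkSession.builder.appName("json-to-dataframe").getOrCreate()

df = spark.read.json("s3://example-bucket/input.json")  # placeholder path
df.printSchema()
df.show(5)
```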

Another example is Amazon QuickSight for business intelligence (BI), which also allows business analysts to use natural language to perform daily tasks and create various data visualization charts in seconds;

There is also Amazon HealthScribe, which can be used in the medical industry to generate clinical documents and save doctors time.

These tools are designed to allow companies to focus on their core business and improve production efficiency.

Finally, to briefly summarize:

Since April this year, Amazon Cloud Technology has officially entered the generative AI market, based on its own positioning and real user needs, serving every company that wants to use generative AI technology to accelerate or innovate its business.

In just four months, Amazon Cloud Technology has launched a wide range of foundational resources, from foundation models to computing power, and from private data storage to efficient development tools.

The latest releases at the New York Summit provide everything needed to keep accelerating the development of generative AI applications.

From the computing layer represented by Amazon EC2 P5 instances, to the tool layer represented by the Amazon OpenSearch Serverless vector engine and Amazon Bedrock Agents, to the application layer represented by Amazon QuickSight, an end-to-end solution has taken shape.

Throughout, Amazon Cloud Technology keeps lowering the threshold for generative AI. Whether you are a startup or a traditional enterprise, and whichever layer of the generative AI stack you work at, you can find the right tools here without spending too much energy on the underlying plumbing, and move quickly into real business.

As Swami Sivasubramanian, global vice president of database, data analysis and machine learning at Amazon Cloud Technology, said:

"I believe that generative AI will change every application, industry, and enterprise."

In fact, as the AI model war continues to escalate, generative AI has entered the spotlight. Companies that have accumulated experience in AI are exploring application directions that suit them, trying to find new opportunities in this unprecedented change.

The many services of Amazon Cloud Technology have undoubtedly won enterprises more room to develop, by reducing development costs and accelerating commercialization.
