Led by a team from Tsinghua University, the first systematic benchmark for AI agents has launched its website: AgentBench.com.cn

AI agents, or autonomous intelligent agents, are not just the superhuman assistants of sci-fi movies such as Jarvis; they have also become a research hotspot in real-world AI. In particular, the emergence of large AI models, exemplified by GPT-4, has pushed the concept of AI agents to the forefront of technology.

In the widely discussed Stanford "virtual town", 25 AI agents lived freely in a simulated town and even organized a Valentine's Day party; Voyager, the embodied agent proposed by Nvidia and others, learned various survival skills in "Minecraft" and built its own world; and autonomous task-completing agents such as AutoGPT, BabyAGI, and AgentGPT have sparked widespread public interest and heated discussion.

Even Andrej Karpathy, the former Tesla AI director who has since returned to OpenAI, revealed at a developer event that whenever a new AI agent paper appears, OpenAI takes great interest and discusses it seriously.

Yet despite the intense interest in AI agent research, the industry still lacks a systematic, standardized benchmark for evaluating the intelligence of LLMs acting as agents.

To this end, a research team from Tsinghua University, The Ohio State University, and the University of California, Berkeley proposed the first such systematic benchmark, AgentBench (agentbench.com.cn), which evaluates the reasoning and decision-making abilities of LLMs as agents across 8 distinct environments drawn from real-world challenges.
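To give a sense of what "evaluating an LLM as an agent" means in practice, here is a minimal sketch of an agent-environment evaluation loop. All names in it (GridEnv, EchoAgent, evaluate) are hypothetical illustrations, not AgentBench's actual API: the benchmark's real environments and protocol differ.

```python
class GridEnv:
    """Toy environment: the agent must step right until it reaches the goal."""
    def __init__(self, goal=3):
        self.pos = 0
        self.goal = goal

    def observe(self):
        # A textual observation, as an LLM-based agent would receive.
        return f"position={self.pos}, goal={self.goal}"

    def step(self, action):
        # Apply one action string; return (done, success).
        if action == "right":
            self.pos += 1
        done = self.pos >= self.goal
        return done, self.pos == self.goal


class EchoAgent:
    """Stand-in for an LLM: maps an observation string to an action string."""
    def act(self, observation):
        return "right"  # a real agent would query a language model here


def evaluate(agent, env, max_turns=10):
    """Run one episode and report success, as a benchmark harness would."""
    for _ in range(max_turns):
        done, success = env.step(agent.act(env.observe()))
        if done:
            return success
    return False


print(evaluate(EchoAgent(), GridEnv()))  # prints True: the trivial policy succeeds
```

A benchmark like AgentBench runs many such episodes across diverse environments and aggregates the success rates into a per-model score.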

Source: blog.csdn.net/qinglingye/article/details/132272949