Beyond GPT-4: Google's multimodal large model Gemini, which draws on AlphaGo technology, enters small-scale testing

Google's multimodal AI system Gemini is being tested on a small scale, which suggests it will soon be opened to the public. Gemini integrates multiple modalities such as text and images, and uses reinforcement learning and other techniques from AlphaGo. Its goal is to make breakthroughs in planning, memory, and multimodality. Gemini may become another landmark product after the ChatGPT series.


01

Over the past two days, I saw reports that Google's multimodal large model Gemini is being tested by a small number of companies, so I looked into it:

Google has given a small group of companies access to an early version of the Gemini software, which suggests it will soon build Gemini into its consumer services and sell it to enterprises through its cloud computing services.

As early as May, Google officially announced Gemini, a multimodal large model built from the ground up, saying that it performs well at tool use and API integration and is designed to enable innovations in memory (ChatGPT currently has no memory) and planning.

In July, news reports said that Sergey Brin, co-founder of Google, had returned to work at Google alongside artificial intelligence researchers to help build the Gemini system.

The model, developed after the merger of Google Brain and DeepMind, is expected to have a trillion-scale parameter count, similar to what has been reported for GPT-4.

The first generation of Gemini is reportedly trained on TPUv4. Later iterations are said to be training on TPUv5-based pods, with a total training compute of up to ~1e26 FLOPs (floating-point operations), about 5 times the compute used to train GPT-4.
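As a quick sanity check on those figures, the short sketch below works out the GPT-4 training compute they imply. It assumes the ~1e26 figure is a total operation count (not a per-second rate) and that "5 times" means a factor of 5. Both numbers are reported estimates, and the names used here are illustrative only.

```python
# Figures quoted in the text; these are reported estimates, not official numbers.
GEMINI_TRAINING_OPS = 1e26   # assumed: total floating-point operations for training
RATIO_TO_GPT4 = 5            # assumed: Gemini compute / GPT-4 compute


def implied_baseline_ops(total_ops: float, ratio: float) -> float:
    """Return the baseline compute implied by a total and a ratio."""
    return total_ops / ratio


gpt4_ops = implied_baseline_ops(GEMINI_TRAINING_OPS, RATIO_TO_GPT4)
print(f"Implied GPT-4 training compute: {gpt4_ops:.1e} operations")
```

Under these assumptions, the quoted numbers put GPT-4's training compute at about 2e25 operations.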

02

DeepMind co-founder and CEO Demis Hassabis said that his engineers are drawing on techniques from AlphaGo to build an artificial intelligence system called Gemini, which he says will be more capable than the system behind OpenAI's ChatGPT.

What is Google DeepMind Gemini?

Google DeepMind Gemini is a very large artificial intelligence language model designed from the ground up to be multimodal, integrating text, images, and other data types. Its goal is to combine techniques from AlphaGo with language models.

AlphaGo is an artificial intelligence program developed by DeepMind that defeated top human Go players. AlphaGo relies on reinforcement learning, a technique pioneered by DeepMind in which software learns to solve hard problems and choose actions through repeated trials and feedback on its performance. It also uses a method called tree search to explore and remember possible moves on the board.
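To make the idea of "exploring and remembering possible moves" concrete, here is a minimal sketch of a game-tree search on a toy stone-taking game. It is not AlphaGo's algorithm, which pairs Monte Carlo tree search with neural networks. This is a plain exhaustive search with memoization, and the game rules and function names are chosen only for illustration.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # toy rule: a player removes 1, 2, or 3 stones per turn


@lru_cache(maxsize=None)  # remembers positions that were already searched
def can_win(stones: int) -> bool:
    """Return True if the player to move can force a win.

    The player who takes the last stone wins, so a player facing
    zero stones has already lost.
    """
    # A position is winning if some legal move leaves the opponent
    # in a losing position.
    return any(not can_win(stones - m) for m in MOVES if m <= stones)


def best_move(stones: int):
    """Return a move that forces a win, or None if no such move exists."""
    for m in MOVES:
        if m <= stones and not can_win(stones - m):
            return m
    return None


print(best_move(10))  # a winning move from 10 stones
print(best_move(8))   # None: every move from 8 stones loses against best play
```

The search tries every legal move, looks ahead to the end of the game, and caches each position so it is evaluated only once. Real Go is far too large for exhaustive search, which is why AlphaGo samples the tree and uses learned networks to guide it.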

Like AlphaGo, Gemini is expected to use deep learning and reinforcement learning to solve complex problems. Gemini's development team hopes to apply AlphaGo's reinforcement learning and tree search techniques to language models in order to add new capabilities, such as planning and problem solving.
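The trial-and-feedback loop behind reinforcement learning can be illustrated with a very small example. The sketch below is an epsilon-greedy multi-armed bandit, one of the simplest reinforcement learning setups. It is not the method used in AlphaGo or Gemini, and the reward probabilities and parameter values are made up for the demonstration.

```python
import random


def run_bandit(win_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn action values by repeated trials and reward feedback."""
    rng = random.Random(seed)
    counts = [0] * len(win_probs)    # how often each action was tried
    values = [0.0] * len(win_probs)  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(win_probs))
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(range(len(win_probs)), key=lambda a: values[a])
        # Feedback from the environment: reward 1 with the hidden probability.
        reward = 1.0 if rng.random() < win_probs[action] else 0.0
        counts[action] += 1
        # Incremental update of the running average.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


# Hypothetical environment: three actions with hidden success rates.
values, counts = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=lambda a: values[a])
print("estimated values:", [round(v, 2) for v in values])
print("best action:", best_action)
```

The agent is never told the success rates. It discovers which action pays off best purely from the rewards it receives, which is the core idea the article refers to.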

At a high level, Gemini aims to combine the strengths of AlphaGo-style systems with the impressive language capabilities of large models.

In addition, Gemini combines the text capabilities of large language models such as GPT-4 with the ability to generate images from text descriptions, similar to AI image generators such as Midjourney and Stable Diffusion.

So it may be the first truly multimodal large model.

Gemini aims to provide solutions in areas such as climate change, healthcare, aviation, food, and agriculture. It is expected to improve efficiency and accuracy in these fields by processing text data.

Gemini development costs could run into tens or even hundreds of millions of dollars.

03

Gemini Features

Gemini can use tools and integrate with APIs.

Gemini may be the largest language model ever created, possibly exceeding the size of GPT-3, which has 175 billion parameters.

Gemini is a family of models that will be available in different sizes and with different capabilities.

Gemini may use memory, fact-checking against sources such as Google Search, and improved reinforcement learning to increase accuracy and reduce harmful hallucinated content.

Gemini aims to combine scale with innovation, although combining planning and memory is still at an early stage of exploration.

Gemini may use retrieval methods to output entire chunks of information, rather than generating them word by word, to improve factual consistency. (The current generation of ChatGPT models still produces text one token at a time.)

Gemini builds on DeepMind's multimodal work, such as the image captioning system Flamingo.
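The retrieval idea mentioned in the feature list can be sketched with a toy example. The code below returns a stored passage whole instead of composing an answer word by word. The passage store and the keyword-overlap scoring are illustrative assumptions and say nothing about how Gemini actually implements retrieval.

```python
import re

# Hypothetical passage store, used only for this demonstration.
PASSAGES = [
    "AlphaGo is a Go-playing program developed by DeepMind.",
    "Flamingo is a multimodal system from DeepMind.",
    "TPUs are accelerator chips designed by Google.",
]


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, passages: list) -> str:
    """Return the stored passage sharing the most words with the query."""
    query_tokens = tokenize(query)
    return max(passages, key=lambda p: len(query_tokens & tokenize(p)))


# The answer is an existing passage returned as one chunk.
print(retrieve("Who developed AlphaGo?", PASSAGES))
```

Because the output is copied from a trusted store, it cannot drift from the source text the way freely generated text can. Production systems typically use learned embeddings instead of word overlap to find the matching passage.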

As for Bard, covered in the earlier article "Google Bard Late Night Update: Supports Chinese, Voice Input/Broadcast, Code Export, Conversation Sharing", such conversational AI systems are "not the final state" but only a preliminary version.

OpenAI has been dismissive of the claim that "Google Gemini may surpass GPT-4."


Judging from the steady stream of media reports about Gemini and the early access given to a small group of developers outside Google, the model will probably be opened to more users soon, and a beta version may be released and integrated into services such as Google Cloud Vertex AI.


Origin blog.csdn.net/fogdragon/article/details/132960371