Google releases large model Gemini to catch up with GPT-4

The largest and most powerful Google model yet is here. On December 6, local time, Google CEO Sundar Pichai announced the official launch of Gemini 1.0.

The Gemini model released this time is a natively multimodal large model and the first step in a new era of Google large models. It comes in three tiers: Gemini Ultra, the most capable; Gemini Pro, suited to a wide range of tasks; and Gemini Nano, built for specific tasks and on-device use.


Now, Google's ChatGPT-like application Bard has been upgraded to run on Gemini Pro, enabling more advanced reasoning, planning, and understanding, while remaining free. Google is expected to launch "Bard Advanced", powered by Gemini Ultra, early next year.

This is the biggest update since Bard's introduction.

Since the release of ChatGPT, there has been intense curiosity about the capabilities of Gemini, the competing model Google has been promising. The model was rumored as early as March this year and entered "coming soon" status at the I/O conference in May.

According to people familiar with the matter, Gemini is said to have trillions of parameters and to have been trained with five times the compute of GPT-4. Even so, its official release appeared to be repeatedly delayed for various reasons.

To compete with OpenAI and Microsoft, Google decisively shifted its focus from PaLM 2 to Gemini, and even merged Google Brain and DeepMind in April this year, pooling the strength of the two labs into the newly formed Google DeepMind to tackle the key problems.


This shows Google’s all-or-nothing mentality in the large-model arms race.

So, can Gemini really surprise us? Beyond achieving the best results on various benchmarks, even surpassing human experts on some, one interesting moment came at the press conference: asked by reporters what new capabilities Gemini has compared with previous large models, Google DeepMind Vice President of Product Eli Collins replied, "I suspect it does," adding that Google is still trying to understand Gemini Ultra's full capabilities.

The following is a statement from Google CEO Pichai:

Every technological change is an opportunity to advance scientific discovery, accelerate human progress, and improve lives. I believe the AI shift we are seeing now will be the most profound shift in our lifetimes, far greater than the previous shifts to mobile or the web. Artificial intelligence has the potential to create opportunities for people around the world, from the everyday to the extraordinary. It will usher in a new wave of innovation and economic progress and drive knowledge, learning, creativity and productivity at an unprecedented scale.

This excites me: the opportunity to make artificial intelligence helpful to everyone, everywhere.

We're almost eight years into our journey as an AI-first company, and the pace of progress is only accelerating: millions of people are now using generative AI in our products to do things they couldn't do just a year ago. Everything from finding answers to more complex questions to using new tools to collaborate and create. At the same time, developers are using our models and infrastructure to build new generative AI applications, and startups and enterprises around the world are growing using our AI tools.

This is incredible momentum, yet we’ve only begun to scratch the surface of what’s possible.

We are doing this work boldly and responsibly. This means being ambitious in our research and pursuing capabilities that can bring huge benefits to people and society, while building safeguards and working with governments and experts to address the risks of AI becoming more powerful. We will continue to invest in the best tools, foundational models, and infrastructure and bring them into our products and beyond, guided by our AI principles.

Google’s large model Gemini is officially released

Google DeepMind CEO and co-founder Demis Hassabis officially launched the large model Gemini on behalf of the Gemini team.

Hassabis said that Google has long wanted to build a new generation of large AI models. In his view, what AI brings to people is no longer just smart software, but something more like a useful and intuitive expert helper or assistant.

Today, Google finally unveiled Gemini, the most powerful and versatile model it has ever built. Gemini is the result of a massive collaboration across teams at Google, including researchers at Google Research.

Of particular note, Gemini is a multimodal large model, meaning it can generalize and seamlessly understand, manipulate, and combine different types of information, including text, code, audio, images, and video.

Google said that Gemini is also their most flexible model to date, able to run efficiently on multiple types of platforms, including data centers and mobile devices. The SOTA capabilities provided by Gemini will significantly enhance the way developers and enterprise customers build and scale AI.


Currently, Gemini 1.0 provides three different size versions, as follows:

  • Gemini Ultra: The largest and most capable, used to handle highly complex tasks;
  • Gemini Pro: The best model for scaling across a wide range of tasks;
  • Gemini Nano: The most efficient model for on-device tasks.

Google rigorously tested Gemini models and evaluated their performance on a variety of tasks. From tasks such as natural image, audio and video understanding to mathematical reasoning, Gemini Ultra outperformed current SOTA results in 30 of 32 academic benchmarks widely used in large language model development.

In addition, Gemini Ultra achieved a score of 90.0% on MMLU (Massive Multitask Language Understanding), surpassing human experts for the first time. The MMLU dataset covers 57 subjects, including mathematics, physics, history, law, medicine, and ethics, and is used to test the knowledge and problem-solving abilities of large models.

A new approach to the MMLU test set allows Gemini to use its reasoning capabilities to think more carefully before answering difficult questions, resulting in significant improvements in performance compared to just answering based on first impressions of the question.
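
The technical report describes this approach as an uncertainty-routed chain-of-thought setup: the model samples several reasoned answers and commits to the consensus answer only when the samples agree strongly, otherwise falling back to a plain greedy answer. A minimal sketch of that selection logic is shown below, with hypothetical `sample_answer` and `greedy_answer` callables standing in for actual model calls:

```python
from collections import Counter

def uncertainty_routed_answer(question, sample_answer, greedy_answer,
                              k=32, threshold=0.6):
    """Sketch of uncertainty-routed chain-of-thought answer selection.

    sample_answer(question) -> final answer from one chain-of-thought sample
    greedy_answer(question) -> answer produced without sampled reasoning
    Both callables are hypothetical stand-ins for real model calls.
    """
    # Draw k chain-of-thought samples and keep only their final answers.
    answers = [sample_answer(question) for _ in range(k)]
    top_answer, count = Counter(answers).most_common(1)[0]

    # Commit to the consensus answer only if the samples agree strongly;
    # otherwise fall back to the plain greedy answer.
    if count / k >= threshold:
        return top_answer
    return greedy_answer(question)
```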


For more details, see the full technical report: https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf

On the new MMMU benchmark, Gemini Ultra also achieved a state-of-the-art score of 59.4%. This benchmark consists of multimodal tasks spanning different domains that require deliberate reasoning.

In image benchmarks, Gemini Ultra outperformed previous state-of-the-art models without assistance from OCR systems that extract text from images, which highlights the powerful multimodal capabilities built into Gemini and is an early sign of its more complex reasoning abilities.

Next-generation all-round capability upgrade

Gemini is designed to support multi-modality natively, pre-trained on different modalities from the beginning, and then fine-tuned with additional multi-modal data to improve effectiveness. As a result, Gemini is able to seamlessly understand and reason about a variety of inputs, far outperforming existing multi-modal models, and its capabilities are among the strongest in almost every domain.

Complex reasoning ability

Gemini 1.0 features sophisticated multimodal reasoning capabilities to help understand complex written and visual information. This makes it particularly good at discovering hard-to-discern knowledge in massive amounts of data. Gemini 1.0 has the extraordinary ability to extract insights from hundreds of thousands of documents by reading, filtering and understanding information, which helps make new breakthroughs at ultra-fast speeds in many fields such as science and finance.

Simultaneously understand information in text, images, audio and more modalities

After training, Gemini 1.0 can simultaneously recognize and understand text, images, audio, and more, so it can more fully understand the details of the information in the input and answer questions about complex topics. As such, it is particularly good at reasoning about problems in complex subjects such as mathematics and physics.

In one demo example, a teacher has drawn a physics problem of a skier coming down a slope, and a student has written out a solution for the skier's speed at the bottom of the slope. Using its multimodal reasoning capabilities, Gemini can read the messy handwriting, correctly understand the problem formulation, convert both the problem and the solution into mathematical formulas, identify the specific reasoning step where the student went wrong, and then provide the correct solution.
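
For illustration, a prompt like this one could in principle be reproduced through the multimodal Gemini API. The sketch below assumes the google-generativeai Python package and the gemini-pro-vision model name, with problem.jpg standing in as a hypothetical photo of the worksheet:

```python
# Minimal sketch: asking a multimodal Gemini model to check a handwritten
# physics solution from a photo. Assumes the google-generativeai package;
# "problem.jpg" is a hypothetical image file of the worksheet.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")  # key obtained from Google AI Studio

model = genai.GenerativeModel("gemini-pro-vision")
image = PIL.Image.open("problem.jpg")

response = model.generate_content([
    image,
    "Read the handwritten problem and the student's solution. "
    "Point out the step where the reasoning goes wrong and give the correct answer.",
])
print(response.text)
```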


Advanced coding

Gemini can understand, interpret, and generate high-quality code in popular programming languages (such as Python, Java, C++, and Go), and its strong ability to work across languages and reason about complex information makes it one of the world's leading coding foundation models.

Gemini Ultra performs well on several coding benchmarks, including HumanEval, an important industry standard for evaluating performance on coding tasks, and Natural2Code, an internal Google dataset that uses author-generated source code rather than web-based information.

Gemini can also be used as an engine for more advanced coding systems. Two years ago, Google launched AlphaCode, the first artificial intelligence code generation system to reach a competitive level in programming competitions.

Using a specialized version of Gemini, Google created AlphaCode 2, a more advanced code generation system that excels at solving competitive programming problems that go beyond coding and involve complex mathematics and theoretical computer science.


Evaluated on the same platform as the original AlphaCode, AlphaCode 2 showed a huge improvement, solving nearly twice as many problems.


Dedicated TPU training

Google trained Gemini 1.0 at scale on AI-optimized infrastructure using its in-house designed Tensor Processing Units (TPU) v4 and v5e, making it Google's most reliable and scalable model to train and its most efficient model to serve.

On TPUs, Gemini runs significantly faster than earlier, smaller, less capable models. These custom-designed AI accelerators are at the heart of Google's AI-powered products, serving billions of users across Search, YouTube, Gmail, Google Maps, Google Play, and Android. They have also enabled companies around the world to cost-effectively train large-scale AI models.

Today, Google also released the most powerful, efficient, and scalable TPU system to date—Cloud TPU v5p, designed for training cutting-edge artificial intelligence models. The new generation of TPU will accelerate the development of Gemini, helping developers and enterprise customers to train large-scale generative AI models faster, so that new products and new features can reach customers faster.

Starting today, Google will add Gemini to its products, such as Bard, which will use a fine-tuned version of Gemini Pro to perform more advanced tasks such as reasoning, planning, understanding, and more. This is also the biggest upgrade to Bard since its launch.

The upgraded version of Bard will be available in English in more than 170 countries, and will expand to more modalities and support more languages in the near future.

Google is also bringing Gemini to Pixel. The Pixel 8 Pro will be the first smartphone to run Gemini Nano.


In the coming months, Gemini will appear in more Google products and services, including search, ads, Chrome, Duet AI, and more.

Google says it has been experimenting with Gemini in Search, where it is making the Search Generative Experience (SGE) faster for users, with 40% lower latency and improved quality.

User guide and future plans

Finally, how do developers use Gemini?

Starting December 13, developers and enterprise customers can access Gemini Pro through the Gemini API in Google AI Studio or Google Cloud Vertex AI.
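
As a rough sketch of what that access looks like from Python (assuming the google-generativeai client package; the exact package and model names should be checked against the official documentation):

```python
# Minimal sketch of calling Gemini Pro through the Gemini API from Python.
# Assumes the google-generativeai package; get an API key from Google AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-pro")

response = model.generate_content(
    "Explain the difference between Gemini Ultra, Pro, and Nano in two sentences."
)
print(response.text)
```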

Starting with Pixel 8 Pro devices, Android developers can also build with Gemini Nano through AICore. Android AICore is a new system service in Android 14 that handles model management, runtime, security features, and more, making it easier for developers to integrate AI into their applications.


AICore supports low-rank adaptation (LoRA) fine-tuning with Gemini Nano. This powerful technique lets application developers create small LoRA adapters trained on their own data. AICore loads the adapter at runtime, yielding a large language model fine-tuned for the application's own use cases.
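
The core idea behind a LoRA adapter can be sketched in a few lines: instead of updating a full weight matrix W, training learns two small low-rank matrices A and B whose scaled product is added to W. The NumPy sketch below is illustrative only and says nothing about AICore's actual adapter format or loading API:

```python
# Illustrative LoRA sketch (not AICore's actual format or API).
# A frozen base weight W is augmented with a small low-rank update B @ A.
import numpy as np

d_out, d_in, rank = 512, 512, 8           # adapter rank r is much smaller than d
alpha = 16.0                              # common LoRA scaling factor

W = np.random.randn(d_out, d_in) * 0.02   # frozen base weights
A = np.random.randn(rank, d_in) * 0.02    # trainable, maps d_in -> r
B = np.zeros((d_out, rank))               # trainable, maps r -> d_out (starts at zero)

def lora_forward(x):
    # Base projection plus the scaled low-rank correction.
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = np.random.randn(d_in)
print(lora_forward(x).shape)  # (512,)

# Only A and B (r * (d_in + d_out) values) need to be stored in the adapter,
# far fewer than the d_out * d_in values of a fully fine-tuned weight matrix.
```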

In addition, Google revealed that Gemini Ultra will be released soon, as well as Bard’s next upgrade plan.

The Gemini Ultra model is currently undergoing trust and safety checks, including red-teaming by trusted external parties, and is being further refined using fine-tuning and reinforcement learning from human feedback (RLHF).

As part of this process, Google will make Gemini Ultra available to select customers, developers, partners, and security and liability experts for early experimentation and feedback, before rolling it out to developers and enterprise customers early next year.

Gemini Ultra is Google's largest and most powerful model, designed for highly complex tasks. The first way regular users will experience Gemini Ultra will be through Bard Advanced, which Google will launch early next year.

Google said it will continue to expand Gemini's capabilities, including advances in planning and memory, as well as a larger context window so the model can process more information and give better responses.

Original article: Introducing Gemini: our largest and most capable AI model

Source: blog.csdn.net/xiangzhihong8/article/details/134885018