Leaked Google internal document: neither Google nor OpenAI has a moat, and open source is tearing down the barrier to entry for large models!


Reprinted from: Heart of the Machine

Is the power of the open source community underestimated?

“We don’t have a moat, and neither does OpenAI.” In a recently leaked document, a researcher inside Google expressed such views.

The researcher argues that although OpenAI and Google appear to be racing neck and neck on large AI models, the real winner may be neither of them, because a third force is quietly rising.

That force is "open source". Building on open models such as Meta's LLaMA, the community is rapidly producing models that approach the capabilities of OpenAI's and Google's large models, and the open-source versions iterate faster, are more customizable, and are more private. "People will not pay for a restricted model when free, unrestricted alternatives are comparable in quality," the author writes.

The document was originally shared by an anonymous person on a public Discord server; industry outlet SemiAnalysis, which was authorized to republish it, says it has verified the document's authenticity.

The article has been widely shared on Twitter and other platforms. Alex Dimakis, a professor at the University of Texas at Austin, offered the following takeaways:

  • Open-source AI is winning, and I agree this is a great thing for the world and for building a competitive ecosystem. We haven't gotten there yet in the LLM space, but OpenCLIP has already beaten OpenAI's CLIP, and Stable Diffusion is better than the closed models.

  • You don't need a huge model; high-quality data is more effective and more important, and Alpaca-style models distilled from API outputs further weaken the moat.

  • You can start with a good base model, and parameter-efficient fine-tuning (PEFT) algorithms like LoRA work really well in a day. Algorithmic innovation has finally begun!

  • Universities and the open source community should organize more efforts to curate datasets, train base models, and build fine-tuning communities like Stable Diffusion did.

Of course, not all researchers agree with the article. Some are skeptical that open-source models can really match the power and generality of OpenAI's large models.

Either way, the rise of open source is good news for academia: it means that even researchers without 1,000 GPUs still have meaningful work to do.

The following is the original text of the document:

Neither Google nor OpenAI has a moat

We don't have a moat, and neither does OpenAI.

We have been watching OpenAI closely. Who will hit the next milestone? What will the next move be?

But the uncomfortable truth is that we are not positioned to win this arms race, and neither is OpenAI. While we have been squabbling, a third faction has been quietly eating our lunch.

This faction is the "open-source faction". Frankly, they are lapping us. Things we considered "major open problems" have already been solved and are in people's hands today.

A few examples:

  • Large language models that can run on phones: People can run the base model on a Pixel 6 at a speed of 5 tokens/sec.

  • Scalable Personal AI: You can spend an evening fine-tuning a personalized AI on your laptop.

  • Responsible release: this problem has been less "solved" than "sidestepped". Entire websites are full of image-generation models with no restrictions whatsoever, and text models are not far behind.

  • Multimodal: The current multimodal scientific QA SOTA is trained in under an hour.
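The on-device numbers above are plausible mainly because of aggressive weight quantization. A back-of-the-envelope sketch (the parameter count and bit widths are illustrative assumptions, not figures from the document):

```python
def model_size_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate storage needed for model weights, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

# A 7-billion-parameter model in fp16 vs. 4-bit quantized:
fp16_gb = model_size_gb(7e9, 16)  # far too large for a phone's RAM
q4_gb = model_size_gb(7e9, 4)     # small enough to fit on a flagship phone
print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {q4_gb:.1f} GB")
```

At 4 bits per weight, a 7B-parameter model needs roughly 3.5 GB for weights alone instead of 14 GB in fp16, which is what makes phone- and laptop-scale inference feasible at all.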

While our models still hold a slight edge in quality, the gap is closing at an astonishing rate. Open-source models are faster, more customizable, more private, and, capability for capability, more capable. They are doing things with $100 and 13 billion parameters that we struggle to do with $10 million and 540 billion parameters. And they do it in weeks, not months. This has profound implications for us:

  • We have no secret weapon. Our best hope is to learn from and collaborate with what others are doing outside Google. We should prioritize enabling third-party (3P) integrations.

  • People will not pay for a restricted model when free, unrestricted alternatives are of comparable quality. We should consider where our real added value lies.

  • Giant models are slowing us down. In the long run, the best models are the ones that can be iterated on quickly. Now that we know what is possible in the under-20-billion-parameter regime, small variants should be more than an afterthought.

The open source revolution wrought by LLaMA

In early March, Meta's LLaMA weights leaked to the public, and the open-source community got its first really capable base model. It had no instruction or conversation tuning and no RLHF. Nonetheless, the community immediately grasped the significance of what it had been given.

What followed was a constant stream of innovation, with major advances only days apart (running LLaMA models on a Raspberry Pi 4B, instruction fine-tuning LLaMA on a laptop, running LLaMA on a MacBook, and so on). Barely a month later, variants with instruction tuning, quantization, quality improvements, multimodality, RLHF, and more had all appeared, many building on top of one another.

Most importantly, they have solved the scaling problem to the point where anyone can tinker: anyone is free to modify and optimize the model, and many new ideas now come from ordinary people. The barrier to entry for training and experimentation has dropped from the output of a major research organization to one person, one evening, and a powerful laptop.

LLM's Stable Diffusion Moment

In many ways this should not surprise anyone. The current renaissance in open-source LLMs comes on the heels of a renaissance in image generation, in what many are calling the LLM's "Stable Diffusion moment".

In both cases, low-cost public participation was enabled by a vastly cheaper fine-tuning mechanism, low-rank adaptation (LoRA), combined with a significant breakthrough in scale. Ready access to a high-quality model let individuals and institutions around the world conceive of a pipeline of ideas and iterate on them, quickly outpacing the large corporations.

These contributions were crucial in image generation and set Stable Diffusion on a different path from DALL-E. Having an open model led to product integrations, marketing, user interfaces, and innovations that did not happen for DALL-E.

The effect was palpable: Stable Diffusion's cultural impact quickly became dominant compared to OpenAI's offering. Whether LLMs follow the same trajectory remains to be seen, but the broad structural elements are the same.

What did Google miss?

Open-source projects use innovative approaches and techniques that directly solve problems we are still wrestling with. Paying closer attention to open-source work could help us avoid repeating their mistakes ourselves. Among these techniques, LoRA is extremely powerful and deserves far more of our attention.

LoRA works by representing model updates as low-rank factorizations, which shrinks the update matrices by a factor of up to several thousand. This makes fine-tuning a model possible at a fraction of the cost and time. Being able to personalize a language model in a few hours on consumer hardware matters, especially for anyone aiming to incorporate new and diverse knowledge in near real time. Although the technique bears directly on some of the projects we most want to accomplish, it is underutilized inside Google.
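A minimal numpy sketch of the idea (the hidden size and rank below are illustrative choices, not values from the document): instead of learning a full d×d update to a frozen weight matrix W, LoRA learns two skinny factors B (d×r) and A (r×d) and uses W + BA at inference time.

```python
import numpy as np

d, r = 4096, 8                     # hidden size and LoRA rank (illustrative)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))    # frozen pretrained weight matrix
B = np.zeros((d, r))               # trainable factor, initialized to zero
A = rng.standard_normal((r, d))    # trainable factor

delta = B @ A                      # the low-rank update, rank <= r
W_adapted = W + delta              # effective weight after fine-tuning

full_params = d * d                # parameters in a full-matrix update
lora_params = d * r + r * d        # parameters LoRA actually trains
print(f"update is {full_params / lora_params:.0f}x smaller")
```

With d = 4096 and r = 8, the trainable update is 256× smaller than a full weight update; larger hidden sizes or smaller ranks push the ratio into the thousands described above. Initializing B to zero means the adapted model starts out identical to the base model, which is the standard LoRA setup.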

The Magic Power of LoRA

One reason LoRA is so effective is that, like other forms of fine-tuning, it stacks. Improvements such as instruction tuning can be applied and then layered with later fine-tunes for dialogue or reasoning. While each individual fine-tune is low-rank, their sum need not be, allowing full-rank updates to the model to accumulate over time.
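The low-rank-versus-full-rank point can be checked numerically (the dimensions here are toy values chosen for illustration): each individual LoRA delta has rank at most r, but a sum of independent deltas climbs in rank, so accumulated fine-tunes can amount to a full-rank update.

```python
import numpy as np

d, r = 16, 2                       # toy hidden size and LoRA rank
rng = np.random.default_rng(1)

def lora_delta():
    """One fine-tune's update matrix: rank at most r."""
    B = rng.standard_normal((d, r))
    A = rng.standard_normal((r, d))
    return B @ A

one = lora_delta()
stacked = sum(lora_delta() for _ in range(8))  # 8 accumulated fine-tunes

print(np.linalg.matrix_rank(one))      # a single update stays low-rank
print(np.linalg.matrix_rank(stacked))  # the sum reaches full rank
```

With random factors, eight rank-2 updates on a 16-dimensional matrix generically sum to a full-rank (rank-16) update, which is the mechanism the paragraph above describes.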

This means that, as newer and better datasets and tasks become available, the model can be cheaply kept up to date without ever paying the cost of a full training run.

By contrast, training a giant model from scratch throws away not only the pretraining but all the iterative improvements made on top of it. In the open-source world, those improvements quickly dominate, making a full retrain prohibitively expensive.

We should seriously consider whether each new application or idea really requires an entirely new model. If we do have significant architectural improvements that preclude direct reuse of model weights, then we should aim for a more aggressive distillation approach that retains as much previous generation functionality as possible.

Large models vs. small models: which is more competitive?

LoRA updates are very cheap to produce (~$100) for the most popular model sizes, which means almost anyone with an idea can generate one and distribute it. Training times under a day are the norm, and at that pace the cumulative effect of fine-tunes quickly overcomes the initial size disadvantage. Indeed, in terms of engineer-hours, these models improve far faster than our largest variants can, and the best of them are already largely indistinguishable from ChatGPT. Focusing on maintaining some of the largest models on the planet actually puts us at a disadvantage.

Data quality trumps data size

Many of these projects save time by training on small, highly curated datasets, which suggests there is some flexibility in the data scaling laws. Such datasets build on ideas from Data Doesn't Do What You Think, and they are fast becoming the standard way to do training outside Google. They are created with synthetic methods (e.g., filtering the best responses out of an existing model) and by scavenging from other projects, neither of which is dominant at Google. Fortunately, these high-quality datasets are open source, so they are free to use.
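Mechanically, "filtering the best responses out of an existing model" reduces to a curation loop like the sketch below; the scoring function, threshold, and sample records are hypothetical placeholders, not anything the document specifies.

```python
def curate(samples, score, threshold=0.8):
    """Keep only the (prompt, response) pairs whose quality score clears the bar."""
    return [s for s in samples if score(s) >= threshold]

# Hypothetical examples: model outputs annotated with a quality/reward score.
raw = [
    {"prompt": "Explain LoRA", "response": "Low-rank adapters factorize...", "reward": 0.9},
    {"prompt": "Explain LoRA", "response": "idk", "reward": 0.2},
]

# Curate on the reward signal; only the high-quality response survives.
curated = curate(raw, score=lambda s: s["reward"])
print(len(curated))  # 1
```

A small dataset distilled this way trades raw volume for quality, which is exactly the flexibility in the data scaling laws the paragraph above points to.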

Competing with open source is doomed

These recent developments have very direct implications for business strategy. Who would pay for a Google product with usage restrictions when a free, high-quality alternative without them exists? And we should not expect to catch up: the modern internet runs on open source for a reason. Open source has significant advantages that we cannot replicate.

"We need them" more than "They need us"

Keeping our technology secret was always a tenuous proposition. Google researchers leave for other companies on a regular cadence, so we can assume they know everything we know, and they will continue to as long as that pipeline stays open.

But holding a technological advantage gets even harder now that cutting-edge LLM research is affordable. Research institutions all over the world are building on each other's work, exploring the solution space in a breadth-first way that far outstrips our own capacity. We can try to hold tightly to our secrets while outside innovation dilutes their value, or we can try to learn from one another.

Individuals are not constrained by licenses to the same degree as corporations

Much of this innovation is happening on top of the model weights leaked by Meta. While that will inevitably change as truly open models improve, the point is that the community did not have to wait. The legal cover of "personal use" and the impracticality of prosecuting individuals mean people are getting access to these technologies while they are hot.

Owning the Ecosystem: Letting Open Source Work for You

Paradoxically, the one clear winner in all of this is Meta: the leaked model was theirs. Since most open-source innovation happens on top of their architecture, nothing stops them from incorporating it directly into their own products.

The value of owning the ecosystem cannot be overstated. Google itself has successfully used this paradigm in open-source offerings like Chrome and Android. By incubating the platform where innovation happens, Google cements itself as a thought leader and direction setter, earning the ability to shape the narrative on ideas larger than itself.

The more tightly we control our models, the more attractive open alternatives become. Google and OpenAI have both gravitated toward defensive release patterns that let them keep tight control over how their models are used. But this control is impractical: anyone seeking to use LLMs for unsanctioned purposes can simply pick a freely available model.

Google should therefore establish itself as a leader in the open-source community, taking the lead by cooperating with, rather than ignoring, the broader conversation. This probably means taking the uncomfortable step of publishing the model weights for small ULM variants. It necessarily means relinquishing some control over our models, but that compromise is unavoidable: we cannot hope to both drive innovation and control it.

Where is the future of OpenAI?

All this talk of open source can feel unfair given OpenAI's current closed policy. Why should we share if they won't? But the fact is, we are already sharing everything with OpenAI through a steady stream of poached senior researchers. Until we stem that tide, secrecy is a moot point.

And in the end, OpenAI doesn't matter. They are making the same mistakes we are with their posture toward open source, and their ability to maintain an edge is necessarily in question. Open-source alternatives can and eventually will eclipse them unless they change their stance. In this respect, at least, we can make the first move.

Original address: https://www.semianalysis.com/p/google-we-have-no-moat-and-neither

Origin blog.csdn.net/amusi1994/article/details/130538217