OpenAI's economies of scale and its second moat


Although the industry has open sourced many large language models with excellent performance, the low price of OpenAI's closed-source models, set against the high cost of deploying a model yourself, has discouraged most organizations that would like to use open-source LLMs.

OpenAI's cost advantage comes partly from economies of scale and partly from its deep investment in infrastructure. To succeed, open-source LLM providers must catch up with, or even surpass, OpenAI on both fronts.

In addition, the author believes that open-source LLMs must keep improving to reduce application complexity and to capitalize on demand for customization.

The author of this article, Dr. Vikram Sreekanti, graduated from the RISE Lab at the University of California, Berkeley, where he studied data systems and distributed systems. Joseph E. Gonzalez is a professor at the University of California, Berkeley. The two co-founded RunLLM, which provides a developer platform for the LLM stack; its products offer easy-to-use, scalable components that let users quickly define, deploy, and run LLM-based applications.

(This article was compiled and published by OneFlow. Please contact OneFlow for authorization before reprinting. Original text: https://generatingconversation.substack.com/p/openai-is-too-cheap-to-beat)

Source | Generating Conversation

OneFlow compilation

Translation | Yang Ting, Wan Zilin

Since the advent of the Internet, the data flywheel has spawned giant companies: first Google and the social media platforms, and now OpenAI and the other large language model providers.

OpenAI's user adoption alone likely exceeds that of the other large-model vendors combined, with Google and Anthropic accounting for most of the remaining market share. These companies are collecting massive amounts of data: they can see not only user prompts but also explicit feedback (likes or dislikes) and implicit feedback (for example, a user restating a question in more detail after failing to get an ideal answer). In addition, they communicate directly with customers to understand users' needs and the models' limitations.

The data and feedback described above are critical to training future models, and the related investments are accelerating: Anthropic CEO Dario Amodei recently predicted that training their models will cost $10 billion over the next two years.

Model quality matters, but it is only part of these companies' advantage. The scalability of their infrastructure and the quality of their model serving are more important moats. The fine-tuning API below illustrates this.

The RunLLM team has recently been experimenting with GPT's fine-tuning API. A fine-tuning run of GPT-3.5 costs between $4 and $12 and takes about 1 to 1.5 hours for roughly 1 million tokens.
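For concreteness, here is a minimal sketch of what one such run looks like with OpenAI's Python SDK (v1+); the training file name is a placeholder, not RunLLM's actual data, and hyperparameters are left at their defaults:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload chat-formatted JSONL training data (placeholder file name).
training_file = client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)

# Launch the fine-tuning job; OpenAI bills per training token,
# which is where the ~$4-$12 per run above comes from.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```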

Meanwhile, a p4d.24xlarge on AWS costs $32.77 per hour on demand, or $19.22 per hour with a one-year reservation. Each machine carries 8 NVIDIA A100 GPUs. Even assuming OpenAI uses a full 8 GPUs to fine-tune GPT-3.5, OpenAI is 3-8 times cheaper than renting a p4d.24xlarge from Amazon, and that is before counting the technical expertise required to deploy and run the job.
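A quick back-of-the-envelope check on that ratio, using only the figures above (the one-hour run duration is the optimistic end of the observed range):

```python
# Figures from the article: OpenAI charges $4-$12 per fine-tuning run;
# an on-demand p4d.24xlarge (8x A100) costs $32.77/hour.
OPENAI_RUN_USD = (4.0, 12.0)
AWS_ON_DEMAND_HOURLY = 32.77

# Assume one run occupies the machine for about an hour.
aws_run_usd = AWS_ON_DEMAND_HOURLY * 1.0

ratios = (aws_run_usd / OPENAI_RUN_USD[1], aws_run_usd / OPENAI_RUN_USD[0])
print(f"AWS costs {ratios[0]:.1f}x-{ratios[1]:.1f}x the OpenAI price")
# => AWS costs 2.7x-8.2x the OpenAI price: the "3-8x" cited above
```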

Granted, AWS charges a premium for the EC2 instances it provides. But OpenAI's costs, in turn, include training and storing model weights (possibly using relatively cheap LoRA techniques), building and maintaining fine-tuning infrastructure, and the in-house expertise needed to manage a large fleet of GPUs [1].

If you have a sufficiently intensive workload, you might consider a one-year reservation of the p4d.24xlarge, which at $19.22 per hour works out to about $168,000 per year.

Assume we again use LoRA to fine-tune the model on 8 A100 GPUs, with each fine-tuning run taking about 2 hours. That allows 12 runs per day, or 4,380 runs per year on these GPUs. We could assign an engineer to deploy, check, and verify the fine-tuning runs (we admire them!), at a cost of perhaps $200,000 per year. (This assumes we have enough data to keep fine-tuning continuously.)

At $368,000 per year ($168,000 for AWS and $200,000 for labor), each fine-tuning run costs about $84, or 7 to 21 times what we paid OpenAI!
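The full self-hosting estimate, reproduced from the inputs stated above:

```python
# Annual cost model for self-hosted fine-tuning on a reserved p4d.24xlarge.
HOURS_PER_YEAR = 24 * 365                 # 8,760

aws_annual = 19.22 * HOURS_PER_YEAR       # ~$168,367 for the 1-year reservation
engineer_annual = 200_000                 # estimated fully-loaded engineer cost
total_annual = aws_annual + engineer_annual   # ~$368,000

runs_per_year = (24 // 2) * 365           # 2-hour LoRA runs, back to back: 4,380
cost_per_run = total_annual / runs_per_year   # ~$84

# Compare with OpenAI's $4-$12 per run.
print(f"${cost_per_run:.0f}/run, {cost_per_run/12:.0f}x-{cost_per_run/4:.0f}x OpenAI")
# => $84/run, 7x-21x OpenAI
```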

And that is only the cost of fine-tuning the model. Although per-token inference on a fine-tuned GPT-3.5 is about 10 times more expensive than on the base GPT-3.5, it is still 10 times cheaper than GPT-4! Serving a model on your own hardware costs significantly more unless you reach a scale large enough to fully utilize the server hardware or can scale elastically (hard to achieve while GPU supply is constrained).

The rough estimate above proves a key point: for the major LLM vendors, the advantage lies not only in model quality but also in their ability to serve models at extremely high economies of scale. For most organizations, without good infrastructure, deploying language models themselves makes no economic sense. There is no reason to waste time, manpower, and money on an unwinnable optimization problem while competitors build on OpenAI, move faster, and potentially achieve better model quality.

Of course, this does not mean the open-source model has no future. Last week, Nathan Lambert published an article on Interconnects about exactly that. Over time, open-source models must significantly reduce cost and application complexity and capitalize on demand for customization.

In other areas, the major language model vendors will dominate.

Note:

[1] You may wonder whether OpenAI is subsidizing fine-tuning and serving costs to capture market share, much as Uber and Lyft did in ride-hailing for years. Famously, those companies never killed off competition the way many predicted, but switching costs in software infrastructure are much higher than switching costs in consumer mobile apps. Even if prices eventually rise, these companies will still dominate the market, and they would have enormous headroom before their prices approached the cost of self-hosted models.

It's also worth noting that we are comparing AWS's published GPU pricing with the potentially heavily subsidized GPU pricing OpenAI gets on Azure, and OpenAI's scale will only further solidify its advantage here.


Try OneFlow: github.com/Oneflow-Inc/oneflow/
