Moving towards 2024, how do we think about AI entrepreneurship?

In December 2023, Nature released its annual list of the "Top Ten Influential People in Science". For the first time in the list's history, a "non-human" was included: ChatGPT. Nature explained: "Although ChatGPT is not an individual and does not fully meet the selection criteria, we decided to make an exception to recognize that generative artificial intelligence is fundamentally changing the trajectory of science."

▲ Image source: Nature

In the technology landscape of 2023, generative AI undoubtedly marks an important turning point. Its development has not only attracted widespread industry attention, but has also had a profound impact on the global economy, social structures, and even our expectations for the future.

This is an AI revolution that every ordinary person can participate in. From the continued development of large language models, to the widespread application of AI across industries, to the ongoing contest between open-source and closed-source strategies, every step of AI's development sketches the outline of future trends.

Facing this surging tide, China has successively introduced a series of supportive policies and measures, including the "14th Five-Year Plan for National Informatization" and the "Guiding Opinions on Accelerating Scenario Innovation and Promoting High-Quality Economic Development through High-level Application of Artificial Intelligence". China's artificial intelligence industry has grown rapidly, and a number of internationally competitive AI companies have emerged.

At year's end, we review the development of generative AI in 2023, discussing the technology's impact on humans, the industry landscape and future trends, and entrepreneurship and investment opportunities. This is both a review of the AI field over the past year and a reflection on the direction of AI's development.

Let me share the core conclusions first:

  • Before a truly valuable AI application ecosystem flourishes, it makes sense to bet on core technology sources such as large models and on "shovel-selling" companies. But the AI applications now booming are also the source of value creation and the "sea of stars" we want to pursue.

  • Closed-source large language models such as OpenAI's charge usage-based "traffic fees" to apps built on their APIs. To reduce this cost burden, one option for application companies is to use open-source models and train a small or medium-sized model themselves; another is to optimize the business model to absorb the traffic costs.

  • As AI technology advances, the way we work will also change. AI technology may reconstruct both people's workflow and the workflow of the language model itself.

  • How to make good use of extremely intelligent tools like AI is undoubtedly a huge challenge for humans. However, let’s not be so pessimistic. AI’s capabilities have limits.

  • In AI technology, the United States and China are on different development paths. The leading large language model camp in the United States is basically settled, while China's large language models are proliferating. For China, it is more important to vigorously develop the AI application ecosystem.

  • AI Agent is an entrepreneurial direction worthy of attention. AI Agent is a kind of intelligent software that can autonomously perform tasks, make independent decisions, actively explore, self-iterate, and collaborate with each other.

  • Although many technological breakthroughs have been achieved in large language models, there are still many areas to iterate and improve, such as reducing "hallucinations", increasing context length, realizing multimodality, embodied intelligence, complex reasoning, and self-iteration.

  • Several key points for entrepreneurship in AI applications: create a high-quality native new application experience; be forward-looking, non-consensus, and disruptive; focus on user growth and commercialization potential; ride the dividends of macro trends; keep a safe distance from the large models while building your own business depth; and, most important of all, the team.

  • Startups must dare to do the right thing rather than the easy thing in non-consensus areas.

/ 01 / 

What are the new changes in the AI field in 2023?

From an industry perspective, AI's development so far can be divided into two stages: stage 1.0 mainly focused on analysis and judgment, while stage 2.0 focuses more on generation. The representative models of the 2.0 stage are large language models and image generation models; two underlying architectures, the Transformer and the diffusion model, drive generative AI forward.

For most of 2023, the products of the startup OpenAI sat firmly at the top of the high-performance large language model rankings, especially after the release of the GPT-4 model in March. In December, however, Google released its latest large language model, Gemini, forming a two-power structure with GPT-4.

In the AI field, the open-source model community has never been absent. With the support of Meta's (formerly Facebook) open-source large language models LLaMA and Llama 2, the open-source community is iterating intensively on research and engineering: trying to get smaller models to exhibit capabilities similar to large ones, supporting longer contexts, and training models with more efficient algorithms and frameworks.

Multimodality (media forms such as images and videos) has become a hot topic in AI research. Multimodality has two sides, input and output: input means letting the model understand the information contained in images and videos, while output means generating media beyond text, such as text-to-image. Given that humanity's capacity to produce and collect data is limited and may not sustain AI training indefinitely, models may eventually need to be trained on data synthesized by AI itself.

In AI infrastructure, NVIDIA has become the industry leader on the back of huge market demand for its GPUs, joining the trillion-dollar market capitalization club. But it also faces fierce competition from old rivals among chipmakers such as AMD and Intel, from major players such as Google, Microsoft, and OpenAI, and from model upstarts.

In addition to large models, the industry has strong demand for various types of AI applications. Generative AI has made significant progress in many fields such as images, videos, programming, voice, and intelligent collaboration applications.

Users around the world have shown great enthusiasm for generative AI: ChatGPT reached 100 million monthly active users in just two months. Compare the super apps of the smartphone era: even with large promotion budgets, TikTok took 9 months, Instagram 2.5 years, WhatsApp 3.5 years, and YouTube and Facebook 4 years.


▲ The time it takes for different types of technology applications to reach 100 million monthly active users.

Image source: 7 Global Capital

Venture capital institutions are also investing heavily to support progress in the AI field. According to statistics from the US investment firm COATUE, as of November 2023, venture investors had put nearly US$30 billion into AI: roughly 60% into large language model upstarts such as OpenAI, about 20% into the infrastructure that supports and serves these models (AI cloud services, semiconductors, model-operation tools, etc.), and about 17% into AI application companies.


▲ Image source: COATUE

Before a truly valuable AI application ecosystem flourishes, this investment logic of betting on core technology sources and "shovel-selling" companies makes some sense. But the AI applications now booming are also the source of value creation and the "sea of stars" we want to pursue.

▎Multiple technological breakthroughs have occurred in the field of multi-modal generation

In 2022, after Stable Diffusion went open source, we witnessed the launch of a large number of "text-to-image" (images generated from text) products. 2022 can be seen as the year the problem of image generation was largely solved.

Then in 2023, AI technology for recognizing speech and producing audio also made significant progress. Today, AI speech recognition and synthesis are very mature, and synthesized voices are difficult to distinguish from human ones.

As the technology develops further, video generation and processing will be the focus of the next stage of AI. There have already been many breakthroughs in "text-to-video" (video generated from text), and AI has shown real potential in video content generation. With models and applications such as the newcomers Runway Gen-2, Pika, and Stanford University's WALT, users need only enter a text description to get a video clip.

Jim Fan, a well-known AI researcher at NVIDIA, believes that AI will most likely make major progress in the video field in 2024.


▲ Image source: X.com

If we think about media formats along another dimension: add a time dimension to a two-dimensional image and you get video; add a spatial dimension and you get 3D. Render a 3D model, and you get a more precisely controllable video. AI may gradually conquer 3D models as well, but that will take longer.

“Compression is intelligence”

In 2023, Ilya Sutskever, chief scientist of OpenAI, advanced the view that "compression is intelligence" in a public talk: the higher a language model's compression ratio over text, the higher its level of intelligence.

"Compression is intelligence" may not be entirely rigorous, but it offers an explanation consistent with human intuition: to compress data to the extreme, the most extreme compression algorithm must abstract higher-level meaning from a full understanding of that data.

Take Llama2-70B, a language model developed by Meta, as an example. It is the 70-billion-parameter version of the Llama 2 model and is currently one of the largest open-source language models.

Llama2-70B was trained on roughly 10 TB (10 trillion bytes) of text. The trained model is a 140 GB file, a compression ratio of roughly 70x (10 TB / 140 GB).

In daily work, we typically compress large text files into Zip archives at a ratio of about 2x. By comparison, you can imagine the strength of Llama 2's compression. Of course, Zip is lossless while a language model is lossy compression, so the two are not strictly comparable.
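The arithmetic behind these ratios is easy to check. A back-of-envelope sketch in Python, using the 10 TB and 140 GB figures quoted above (the figures themselves are the article's approximations):

```python
# Back-of-envelope compression ratios from the figures quoted above.
GB = 1024**3

training_bytes = 10 * 1024 * GB   # ~10 TB of training text (article's figure)
model_bytes = 140 * GB            # Llama2-70B weights file, ~140 GB

llama_ratio = training_bytes / model_bytes
print(f"Llama2-70B 'compression' ratio: ~{llama_ratio:.0f}x")  # ~73x, i.e. roughly 70x

zip_ratio = 2                     # typical Zip ratio on plain text
print(f"Versus Zip: ~{llama_ratio / zip_ratio:.0f}x stronger")
```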


▲ Screenshot shared by OpenAI's Andrej Karpathy.

Image source: Web3 Sky City

The amazing thing is that a 140 GB file can preserve so much human knowledge and intelligence. Most laptops can hold a 140 GB file, and when a laptop's compute and GPU memory are large enough, a large language model can be run with a C program of about 500 lines.

/ 02 / 

The open source ecosystem and the "traffic tax" of large language models

▎Open research and open source ecosystem are important forces promoting the development of AI


▲ The open source ecosystem promotes AI technology innovation. Image source: Coatue.com

Open research is the foundation of AI's development. The world's top scientists and engineers publish large numbers of papers on sites such as arXiv to share their technical practice. From the early AlexNet convolutional neural network, to Google's Transformer paper that laid the algorithmic foundation, to the model papers published by companies such as OpenAI and Meta, these major scientific and engineering breakthroughs have led the development of AI technology.

The development and iteration of the open source community deserves special attention. With the support of open source large language models, researchers and engineers can freely explore various new algorithms and training methods. Even closed-source large language models can learn from and draw lessons from the open-source community.

It can be said that the open source community has achieved a certain degree of technological equality, allowing people around the world to share the latest technological achievements in the field of AI.

▎The “traffic tax” of large language models

Returning to the essence of the business: training large language models is very expensive. Take GPT as an example. According to statistics from the Yuanchuan Research Institute, training GPT-3 cost more than US$10 million and training GPT-4 more than US$100 million; the next generation of models may cost US$1 billion to train. In addition, running these models to serve users consumes expensive compute and energy.

The business model of large language models is MaaS (Model as a Service): intelligence is billed by input and output traffic (tokens, roughly word units). Because training and running large language models is so expensive, the traffic fees they charge will most likely rise with the tide.



▲ Image source: openai.com

Taking OpenAI as an example, the image above shows the per-token billing for some of its models as displayed on its official website. A rough estimate: at the median level of GPT-3.5 Turbo traffic called by AI applications, each daily active user (DAU) costs the app company behind that user about 0.2 yuan per day in fees to OpenAI. By extension, an app with ten million daily active users connected to the GPT API would pay OpenAI about 2 million yuan in traffic fees every day.
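The scaling in this estimate is plain multiplication. A quick sketch, taking the article's rough figure of 0.2 yuan per daily active user as given (it is an estimate, not an official OpenAI price):

```python
cost_per_dau_yuan = 0.2      # rough API cost per daily active user (article's estimate)
dau = 10_000_000             # ten million daily active users

daily_fee = cost_per_dau_yuan * dau
print(f"Daily traffic fee: ~{daily_fee:,.0f} yuan")   # ~2,000,000 yuan per day
print(f"Per year: ~{daily_fee * 365 / 1e8:.1f} hundred-million yuan")
```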


▲ Image source: WeChat public account @AI Empowerment Laboratory

The traffic fees quoted by domestic large models, shown above, are broadly in line with OpenAI's prices. Some small and medium-sized models are cheaper, but with a gap in performance.

Traffic costs shape how AI applications design their business models. To lighten the burden, some startups use the capabilities of the open-source ecosystem to build their own small or medium-sized models to handle most user needs, calling the large language model only for requests that exceed the smaller model's capabilities.
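One minimal sketch of this routing idea, with stand-in functions in place of real models (all names and the confidence heuristic here are illustrative assumptions, not any vendor's API):

```python
# Hypothetical router: try a cheap self-hosted model first, fall back to a
# metered large model only when the small model is not confident.

def small_model(query: str) -> tuple[str, float]:
    """Stand-in for a self-hosted small/medium model: returns (answer, confidence)."""
    if len(query.split()) <= 8:               # toy heuristic: short queries are "easy"
        return f"[small-model answer to: {query}]", 0.9
    return "", 0.2

def large_model(query: str) -> str:
    """Stand-in for a per-token-billed call to a hosted large language model."""
    return f"[large-model answer to: {query}]"

def answer(query: str, threshold: float = 0.7) -> tuple[str, str]:
    reply, confidence = small_model(query)
    if confidence >= threshold:
        return reply, "small"                 # most traffic stays on the cheap path
    return large_model(query), "large"        # pay the "traffic tax" only when needed

print(answer("What time is it?"))
print(answer("Summarize the differences between these two long legal contracts in detail"))
```

In a real system the confidence signal might come from a classifier or from the small model's own log-probabilities; the point is simply that most traffic never reaches the expensive model.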

This type of small or medium-sized model may be deployed directly on the device side, closest to the user, becoming an "on-device model". On-device models test the level of hardware integration: in the future, our computers and phones may integrate GPUs and other accelerator chips more widely, gaining the ability to run small models locally. Google and Microsoft have both launched small models that run on-device. Nano, the smallest of the Gemini family released by Google, is designed specifically for mobile devices; it needs no Internet connection and runs locally and offline on the device.

/ 03 / 

How has AI affected human society?


▎Every technological revolution brings new efficiency tools

There have been several major technological revolutions in human history. The first industrial revolution, beginning around 1760, produced mechanical equipment; the second, after 1860, produced electrical and electronic equipment; after 1970, we experienced three more waves of innovation, computer software, the PC Internet, and smartphones, which some call the third industrial revolution, or the information revolution.

The generative AI revolution beginning in 2023 may come to be called the fourth industrial revolution: we have created new intelligence. Generative AI is a new tool for humans to understand and transform the world, a new layer of abstraction.

Historical experience suggests that every technological revolution greatly improves human productivity. After the first and second industrial revolutions, two layers of abstraction were built over the natural world: mechanical and electrical equipment. In the 1970s, the information technology revolution represented by the computer introduced a new abstraction layer, software. Through software, people began to understand, transform, and interact with the world in more efficient ways. The subsequent rise of the PC Internet and smartphones further propelled software technology.

How does AI affect people’s work?

Beyond the efficiency gains AI brings, we also need to pay attention to how machines replace human work. According to statistics, before the first industrial revolution the UK's agricultural population was about 75% of the total; afterwards it fell to 16%. After the information revolution in the United States, the industrial workforce fell from 38% to 8.5%, with most of it becoming white-collar workers. That white-collar population is the first to bear the brunt of the AI intelligence revolution.

With the advancement of AI technology, a series of changes may occur in organizational forms and collaboration methods in business society.

First, companies may become smaller. Business outsourcing may become very common; for example, companies may outsource R&D, marketing, and other functions.

The second is the reconstruction of workflows: standard operating procedures (SOPs) may change. People differ in abilities and energy, and workflows let them work efficiently in their respective roles. Researchers are exploring how human workflows might adapt as AI replaces certain functions, and where current language models can already improve efficiency and extend capabilities. Language models themselves may also need workflow orchestration in order to collaborate.

Beyond technical skills, improving other capabilities becomes critical. For example, only with good appreciation and taste can you get AI to help generate better plans or works; likewise, stronger critical thinking helps you judge and verify AI-generated content.

We must make more active use of AI, treat it as an auxiliary tool, or co-pilot, in work and life, and make full use of its potential and advantages.

AI capabilities have boundaries

With AI's rapid development, many have raised the specter of AI as a threat and worry about its negative impact on humanity. Indeed, humans are now inventing tools that appear smarter than themselves, and controlling "silicon-based organisms" like AI is undoubtedly a huge challenge. Scientists are working on the problem; OpenAI has published papers exploring related issues.

However, we should not be so pessimistic: at least for now, the limited degree of digitization of human society bounds what AI can do.

Today's large language models are trained mainly on large amounts of text. Text has been highly digitized and abstracted by humans, so its information density is high, which is why training on it works so well.

But outside the space of text, AI's intelligence faces many limits, because it has not been trained on corresponding data. So we need not worry too much for now: AI is not yet that powerful or comprehensive. We have enough time to grow familiar with it, adapt to it, and find ways to live well alongside silicon-based creatures.

/ 04 / 

Outlook for 2024:

How will large language models and AI applications develop?

▎The leading large language model camps

Globally, large language models show distinctly regional development patterns; the paths of the United States and China each have their own character. The leading US camp is basically settled, concentrated in a few large technology companies, or in combinations of those companies with several leading model startups. The US AI field has entered a high-cost arms-race stage, and it is hard for new players to enter the game.

China's large language models are blooming everywhere: more than a hundred projects currently claim to be developing large models. China may rely more on the open-source ecosystem to develop new language models.

Currently, no country other than the United States has developed a large language model comparable to GPT-4. In the field of large model technology, there is still a gap between China and the United States.

But the global AI competition is far from over. For China, the most important thing is to vigorously develop the AI application ecosystem. In the Internet and digital-economy era, China excelled in the application field and exported those application practices overseas. Provided we keep up with the latest large-model technology, letting the application ecosystem flourish first and then breaking through on core technology in the other direction may be a viable path.

▎How will large language models develop?

Although many technological breakthroughs have been achieved in large language models, there are still many areas to iterate and improve, such as reducing "hallucinations", increasing context length, realizing multimodality, embodied intelligence, complex reasoning, and self-iteration.

First, the phenomenon of "hallucinations". A hallucination is a false output, which Meta defines as "confident falsehood". The most common cause is that the knowledge or data the model has absorbed is not dense enough. Yet hallucination can also be seen as a form of creativity: just as a poet may write beautiful verse after drinking, AI hallucinations may sometimes yield wonderful content.

There are many ways to reduce hallucinations: training on higher-quality corpora; improving the model's accuracy and fit through fine-tuning and reinforcement learning; and adding more background information to the prompt so the model can understand and answer questions more accurately on the basis of that information.
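The third technique, grounding the prompt in retrieved background information, can be sketched as follows. The `retrieve` function and prompt template are purely illustrative; real systems typically use embedding-based search rather than word overlap:

```python
# Minimal sketch: retrieve relevant passages, then build a prompt that tells
# the model to answer only from those passages (reducing confident falsehoods).

def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retrieval: rank passages by word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(corpus, key=lambda p: -len(q_words & set(p.lower().split())))
    return scored[:k]

def build_grounded_prompt(question: str, corpus: list[str]) -> str:
    context = "\n".join(f"- {p}" for p in retrieve(question, corpus))
    return (
        "Answer using ONLY the context below; say 'unknown' if it is not covered.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

corpus = [
    "Gemini was released by Google in December 2023.",
    "GPT-4 was released by OpenAI in March 2023.",
    "Stable Diffusion was open-sourced in 2022.",
]
print(build_grounded_prompt("When was GPT-4 released?", corpus))
```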

Second, increasing context length. Context length is like the brain capacity of a language model; it is usually 32K tokens today, and at most 128K, which is under 100,000 words. For understanding complex texts and handling complex tasks, that is far from enough, so next-generation models will most likely strive to expand context length.

The third is multimodality. Humans rely mainly on vision to obtain information, while current language models are trained mainly on text. Visual data can help language models better understand the physical world. In 2023, visual data began to be added to model training at scale: GPT-4 introduced multimodal data, and Google's Gemini is said to use large amounts of image and video data as well. Judging from Gemini's demonstration video, its multimodal interaction appears significantly improved, though gains in intelligence such as complex reasoning are not yet evident.

The fourth is embodied intelligence: an intelligent system that perceives and acts through a physical body, able to gather information from its environment, understand problems, make decisions, and act. The concept is not complicated; all living things on Earth can be said to have embodied intelligence, and humanoid robots, for example, are also a form of it. Embodied intelligence effectively gives AI movable "hands and feet".

The fifth is complex reasoning. Today, GPT typically gives an answer in one pass, without overt multi-step reasoning or backtracking. When humans tackle complex problems, they list steps on paper and deduce and calculate repeatedly. Researchers are exploring methods such as tree-of-thought prompting to teach GPT this kind of complex multi-step reasoning.

Finally, self-iteration. Current language models rely on people to design their algorithms, supply compute, and feed them data. Looking ahead, can language models iterate on themselves? This may depend on new training and fine-tuning methods such as reinforcement learning. OpenAI is reportedly trying a training method code-named "Q*" to study how AI might iterate on its own, though its progress is unknown.

Large models are still in a period of rapid development, and there is still a lot of room for improvement. In addition to the points listed above, there are still many areas to be solved and improved, such as interpretability, improved security, output content that is more in line with human values, etc.

▎Future application software: the AI Agent

In September 2023, the official website of Sequoia Capital published the article "Generative AI's Act Two", mentioning that generative AI has entered the second stage. The first stage mainly focuses on the development of language models and surrounding simple applications, while the focus of the second stage turns to the development of new intelligent applications that truly solve customer needs.

Future application software may gradually turn to AI Agents: intelligent software that can autonomously perform tasks, make independent decisions, explore proactively, self-iterate, and collaborate with one another. Existing legacy software may need corresponding adaptation and improvement. Compared with traditional 1.0-era software, an AI Agent can provide a more lifelike, high-quality, one-on-one service experience.

However, the difficulty in developing AI Agents is that language models today are still immature and unstable. To create a good experience, you need to layer small models, rule-based algorithms, and even human service into key links on top of the language model, so as to deliver a stable experience in vertical scenarios or specific industries.

Multi-agent collaboration has become a popular research direction. Built on standard operating procedures, multiple cooperating AI Agents can produce better results than calling a language model alone. Intuitively, each Agent has its own strengths, weaknesses, and specializations, much like a human division of labor: everyone performs their own role and collaborates under new SOPs and supervision.
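A minimal sketch of such an SOP-driven pipeline, with three hypothetical agent roles standing in for real language-model calls:

```python
# Toy multi-agent pipeline following a fixed SOP: each "agent" has one
# specialization and hands its output to the next. Roles are illustrative.

def planner(task: str) -> list[str]:
    """Break the task into steps (a real system would call an LLM here)."""
    return [f"outline section on {t.strip()}" for t in task.split(",")]

def writer(step: str) -> str:
    """Draft content for one step."""
    return f"draft: {step}"

def reviewer(draft: str) -> str:
    """Check and sign off on a draft."""
    return draft + " [reviewed]"

def run_sop(task: str) -> list[str]:
    """Planner -> writer -> reviewer, mimicking a human division of labor."""
    return [reviewer(writer(step)) for step in planner(task)]

for line in run_sop("large models, AI agents"):
    print(line)
```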

/ 05 / 

Entrepreneurship and investment opportunities

▎In non-consensus areas, do the right thing rather than the easy thing

In a new era, a startup needs to think seriously about what entrepreneurial opportunities exist for native new models built on this technological shift, and also about which opportunities belong to new entrants and which to incumbent industry leaders.

We can look back at how the two technological waves of the PC Internet and smartphones created new opportunities.

In the PC Internet era, the main capability provided is connection, that is, PCs, servers and some other devices around the world are connected to the Internet. The native new models produced in the PC era include: search, e-commerce and social communication, etc., giving birth to leading companies in various industries such as BAT.

In the era of smartphones, the main capability provided is that most people have a mobile phone with functions such as mobile Internet, GPS, and camera. This basic condition makes new models such as the sharing economy, instant messaging, short video sharing, and mobile financial payment possible. Industry leading companies in the previous era had a strong first-mover advantage and seized many new model opportunities. For example, Tencent and Alibaba launched WeChat and Alipay respectively. But we have also seen some new forces such as Meituan, Douyin and Didi achieve great success. Why can they do it?

I think the key to their success was doing the right thing rather than the easy thing in non-consensus areas.

Take Meituan and Douyin as examples. The native new model Meituan chose, food delivery, belongs to the O2O (online-to-offline) slice of the "sharing economy": a vast number of restaurants on one side, a vast number of consumers on the other, and in the middle a "heavy model" of thousands of riders. Early Internet companies preferred, and were good at, "light models", so entering the catering industry was non-consensus: the delivery service chain is long, hard to digitize, and hard to operate with precision. But in the end Meituan got it done, and those hard things became its biggest core advantages and competitive barriers.

Looking at Douyin, the native new model it chose, short-video sharing, was part of the then-popular "creator economy". Douyin's biggest non-consensus move was bridging the video creator economy and trillions in e-commerce GMV, achieving large-scale, efficient conversion.

Before e-commerce livestreaming rose, there were two kinds of livestreams, game streaming and influencer streaming, both monetized mainly through viewer tips. That monetization model is economically small and cannot support so many outstanding creators. Douyin, however, closed the huge business loop from content to e-commerce through recommendation algorithms, building out its creator and merchant ecosystems, establishing the Douyin Shop loop, and optimizing content-to-commerce conversion. With that done, Douyin could invite the country's most numerous and best creators to its platform and reward them with enormous e-commerce sales revenue.

That is why, after TikTok, the overseas version of Douyin, went abroad, many local short-video and livestream platforms could not beat it. TikTok is not merely a video content platform with creators on one side and consumers on the other; it combines a new creator economy with massive e-commerce GMV conversion, a new species with compound competitive advantages.

In summary, startups must dare to choose and enter non-consensus areas, and strive to get things done in difficult circumstances.

▎Business direction and key points

▲ Generative AI-related startups in our portfolio.

In terms of direction, the large-model field is crowded with giants and will most likely not be entrepreneurs' first choice. Between the large models and the applications sits a "middle layer": infrastructure, application frameworks, model services, and the like. This layer is easily squeezed from both sides by models and applications, and in some segments the giants are many and the room for startups small.

To sum up, we tend to believe that, given the current technology and business environment, the AI application ecosystem is what should be vigorously developed.

The picture above shows the generative AI-related startups we have invested in, including: a new DevOps platform designed for language models, a social game platform, an intelligent companionship service, AI-assisted RNA drug development, automated store marketing, a global intelligent commercial-video SaaS, a new online psychological counseling platform, and a remote employment platform for Chinese and American engineers.

We have summarized several key points for entrepreneurship in the field of AI applications:

First, build high-quality native new applications. It is not easy to seize the new capabilities of the AI era, namely the supply of intelligence and creative ability, and craft a high-quality, unique, native application experience. As noted above, language-model intelligence is not yet mature or stable and has clear capability boundaries, so startups may need to choose relatively vertical, segmented scenarios and combine various technologies and operational means to deliver a good experience.

Second, be non-consensus, forward-looking, and disruptive. Non-consensus means not following the crowd in choosing a track, daring to enter hard areas, and "doing the right thing rather than the easy thing". Forward-looking means choosing challenging business and technical routes.

For example, adopt more advanced technical architectures that are still developing: entrepreneurs should prioritize building Agents rather than Copilots, since Copilots are more an opportunity for incumbents (think Microsoft and GitHub). Or, entrepreneurial teams can conceive and design applications in advance around the capabilities of next-generation language models such as GPT-5.

(This article is reproduced from Fengrui Capital, with slight modifications)

Origin blog.csdn.net/richerg85/article/details/135267842