AIGC casts "physical magic", 3D vision breaks through the "precision limit"


Text|Yao Yue, edited|Wang Yisu

"No art, all physics! Physics makes you happy, doesn't it?"

Recently, at SIGGRAPH 2023, the world's leading computer graphics conference, Nvidia founder and CEO Huang Renxun (Jensen Huang) announced the combination of generative AI with the simulation platform Omniverse, as excited as when he declared that "AIGC is the iPhone moment".

Unlike large language models, which can only be applied to images and text, generative AI backed by a simulation platform grounded in physical laws can be applied directly to the real world.

Beyond Huang Renxun, Li Feifei's team at Stanford University has also recently integrated large models into robots, which not only lets the robots interact effectively with their environment but also lets them complete a variety of tasks without additional data or training.

"The generative AI based on the simulation of the physical world is generative AI 2.0," said Jia Kui, the founder of Kuawei Intelligence and a professor at South China University of Technology, to Light Cone Intelligence. Combining with embodied intelligence, generative AI will play a more definite role. Sexuality.

With the enhancement of general capabilities, AI is also expected to break the "curse" of commercialization.

01 When generative AI learns physics

Combining generative AI with the physical world is not easy, and the technical chain involved is very long.

First, the basic laws of the physical world must be mastered so that the real world can be modeled on the simulation platform.

The simulation platform can not only simulate physical scenes, but also simulate the interaction, movement and deformation of objects in the real world.

The addition of generative AI will allow the simulation platform to have the ability to "preview".

"Humans have known physical knowledge since childhood, but AI does not know it." Huang Renxun said, "The combination of generative AI and simulation platform is to make the future of AI take root in physics."

Huang Renxun further explained that the idea is to let AI learn to perceive its environment in a virtual world and, through reinforcement learning, understand the effects and consequences of physical actions, so that it can achieve specific goals.
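To make "learning consequences through reinforcement learning in a virtual world" concrete, here is a minimal sketch using tabular Q-learning on a toy one-dimensional "simulator". Everything in it (the five-position line, the reward, the function names `step`, `train`, and `greedy_policy`, and the hyperparameters) is an assumption made for illustration; the article does not describe how Nvidia's platform actually trains agents.

```python
import random

N_STATES = 5          # positions 0..4 on a line; position 4 is the goal
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    """Toy simulator: move along the line, clamp at the walls, reward 1.0 at the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    reward = 1.0 if done else 0.0
    return nxt, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: learn the value of each action from simulated outcomes."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _episode in range(episodes):
        state = 0
        for _t in range(100):                    # cap the episode length
            # Explore at random with probability epsilon, or when both values tie.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            # The observed consequence of the action becomes the learning target.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q

def greedy_policy(q):
    """Best known action for every non-goal position."""
    return [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q[:-1]]

if __name__ == "__main__":
    print(greedy_policy(train()))
```

The agent is never told that moving right leads to the goal; it only sees the consequences the simulator returns, which is the point of the paragraph above.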

This requires the use of generative AI to predict tens of millions or even hundreds of millions of possibilities in the physical world to form valuable synthetic data.

For example, a robotic arm relies on the "eyes" of 3D vision to grasp accurately. But how can it shut out interference from a changing environment and still recognize the objects it has to grasp (such as parts in a factory)?

Once the simulation platform has captured physical laws such as "how light reflects and refracts off objects in a scene", generative AI can predict and simulate: a bottle showing different degrees of reflection under different scene lighting; the surface appearance of objects made of different materials, such as metal, plastic, and wood, under the same lighting; every possible way a pile of nails can be scattered...
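The variations listed above (lighting, material, how the nails are scattered) can be thought of as random draws over scene parameters, an approach commonly called domain randomization. The sketch below only samples scene descriptions and leaves rendering out; the parameter ranges, the `SceneSample` fields, and the function names are hypothetical, and the article does not say which method the companies it mentions actually use.

```python
import random
from dataclasses import dataclass

# Hypothetical material list; a real pipeline would take this from its renderer.
MATERIALS = ["metal", "plastic", "wood"]

@dataclass(frozen=True)
class SceneSample:
    material: str           # surface material of the object
    light_intensity: float  # relative brightness of the scene light, 0.1 to 1.0
    light_angle_deg: float  # direction of the light, in degrees
    nail_poses: tuple       # one (x, y, rotation_deg) triple per nail in the pile

def sample_scene(rng, n_nails=5):
    """Draw one randomized scene description."""
    nails = tuple(
        (rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0), rng.uniform(0.0, 360.0))
        for _ in range(n_nails)
    )
    return SceneSample(
        material=rng.choice(MATERIALS),
        light_intensity=rng.uniform(0.1, 1.0),
        light_angle_deg=rng.uniform(0.0, 360.0),
        nail_poses=nails,
    )

def generate_dataset(n_scenes, seed=0):
    """Sample many scenes; a fixed seed makes the dataset reproducible."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n_scenes)]

if __name__ == "__main__":
    for scene in generate_dataset(3):
        print(scene.material, round(scene.light_intensity, 2), len(scene.nail_poses))
```

Each sampled description would then be handed to a physically based renderer to produce the actual training image.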

Next, all of this data has to be run through AI on the simulation platform.

This step is to train the 3D visual large model. Different from large language models, 3D visual large models are crucial for understanding and reasoning about the compositional characteristics of visual scenes, and need to deal with complex relationships between objects, positions, and changes in the real environment.

Finally, connect it with intelligent hardware such as a robotic arm, so that it can learn to operate intelligently.

It can be seen that the full technical chain for combining generative AI with the physical world spans not only disciplines such as physics, graphics, computer vision, and robotics, but also cutting-edge technologies such as digital twins, geometric deep learning, kinematics computation, hybrid intelligence, and intelligent hardware.

Correspondingly, the chain of the entire industry is also relatively complex, from data to models, and then from models to deployment.

In these links, there is a node that is very different from the previous AI path, and that is "synthetic data generation".

Training large models on data that generative AI synthesizes according to physical laws will bring a leap forward for real-world industries.

02 Train a large 3D visual model without a real picture

Why not train large models directly on real data?

At present, most 3D-vision-based robotic arms in the industry use real data to train the algorithms of their control systems. Because of issues such as commercial confidentiality, such real data is rarely available in general-purpose datasets and is mostly collected by the enterprises themselves.

However, self-collected real data fares poorly, first of all, on the two key operating metrics of efficiency and cost.

This is because end-use scenarios are fragmented and the data does not transfer between them at all. To collect real data, enterprises must do "carpet" collection across every industry, every factory, and every scene. Moreover, the collected data cannot be used directly; it first has to go through a series of processing steps.

In this process, there is even an "artificial intelligence paradox".

"Collecting real data, more than half of the cost of AI technology is data cost, and the processing of data collection, cleaning, labeling, and enhancement is often the result of a large amount of manpower accumulation." Some analysts have pointed out, The essence of artificial intelligence is to replace artificial intelligence. "The irony is that such AI has obvious labor-intensive industrial characteristics."

What about synthetic data?

"Using the real data accumulated in five or six years and thousands of cases, it can be completed in a few days and a few weeks through synthetic data." Jia Kui told Lightcone Intelligence that compared with manual collection and labeling of data, the cost of synthetic data can be achieved. decrease by several orders of magnitude.

The most important thing is that in terms of training effect, synthetic data can be better than real data.

Because it is synthesized from physical laws, synthetic data comes with perfectly accurate annotations from the start, which makes AI learning very efficient.
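Why the annotations come for free can be shown with a small sketch: in a simulated scene the 3D position of every object is already known, so its pixel location and depth can be computed rather than hand-labeled. The example assumes a simple pinhole camera model; the `Camera` fields, the intrinsics values, and the object names are illustrative and not taken from the article.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Camera:
    fx: float  # focal length in pixels along x
    fy: float  # focal length in pixels along y
    cx: float  # principal point x, in pixels
    cy: float  # principal point y, in pixels

def project(cam, point):
    """Project a 3D point (x, y, z) in the camera frame onto the image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    u = cam.fx * x / z + cam.cx
    v = cam.fy * y / z + cam.cy
    return u, v

def annotate(cam, objects):
    """Build labels from simulated ground truth; no human labeling is involved.

    `objects` maps an object name to its (x, y, z) position in metres.
    """
    return [
        {"name": name, "pixel": project(cam, pos), "depth_m": pos[2]}
        for name, pos in objects.items()
    ]

if __name__ == "__main__":
    cam = Camera(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
    scene = {"bolt": (0.25, -0.125, 0.5), "nut": (0.0, 0.0, 1.0)}
    for label in annotate(cam, scene):
        print(label)
```

The labels are exact with respect to the simulated scene; how well they carry over to a real camera still depends on how faithful the simulation is.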

In addition, the "comprehensiveness" of synthetic data is unmatched by real data. "Generative AI 2.0 can create countless worlds, and it can make this world evolve rapidly." Jia Kui said.

When it comes to the 3D vision industry, the robotic arm is like the "hand of God", which can control all the past and future.

"Of course, this cannot be outside the laws of the physical world." Jia Kui emphasized.

"At present, we can complete the 3D visual model training of the robotic arm for complex scene operations without using a real picture." Jia Kui told Lightcone Intelligence that the flexible operation of the robotic arm can be guided by the model trained entirely with synthetic data, which can realize on-site More than 99.9% stable grip.

It is precisely because of this that synthetic data is called the "data perpetual motion machine" of large models.

At present, beyond 3D vision, many other fields are also trying synthetic data because general-purpose data is scarce and noisy. However, synthetic data also faces strong skepticism: critics say that if it is used heavily in training without careful tuning, it can cause model collapse and irreversible defects.

From the perspective of technological evolution, synthetic data will not be the only solution for large models.

However, Jia Kui pointed out, "Until a better way is found, synthetic data is currently the best way to solve practical problems. If we keep relying on real data piled up by human labor, AGI (artificial general intelligence) will never be possible in many fields, including 3D vision."

03 Breaking the "curse" of commercialization of AI

In the field of machine vision, the demand for synthetic data is stronger, and the value that generative AI 2.0 can unlock will be even greater.

As a very important perception method of machine vision, 3D vision has an urgent need for synthetic data.

"Find the difference" among a bunch of similar parts, and change the material and color of the object, and you need to adjust the parameters." A 3D vision practitioner said that the different needs of different fields make the landing scene too fragmented. Finish one project and then re-customize another project.

This means that an enterprise focused on solving the requirements of one or a few projects finds it hard to form standardized products, and cannot enter and expand markets or pursue profit at scale through rapid replication.

It is difficult to reduce the marginal cost, which will turn a technology company into a project company and eventually drag it down.

The "devil" is in the details.

How fragile is traditional 3D visual perception? Jia Kui described to Light Cone Intelligence, "During the grasping process of the robotic arm, if someone passes by and changes the light, the task may fail."

This is caused by the imaging principle of the hardware 3D camera. 3D camera imaging is easily affected by the environment, object shape, material, color, scattering medium, etc., and this problem is difficult to solve in a short time.

"It may take a hundred steps to solve a problem, but the effort in the last step may be the same as the sum of the previous 99 steps." Yang Fan, co-founder of SenseTime, once said that most of the energy of the enterprise needs to be used to deal with small parts Long tail problem.

But now, "generative AI 2.0 with strong versatility can solve the long-tail problem, which is crucial for product standardization." Jia Kui said.

Compared with the industry's traditional custom-development model, generative AI 2.0 lets enterprises use a general-purpose large model to develop products modularly and deploy them out of the box, then expand directly within the same industry and reuse the work effectively across different industries. The commercialization problem of the 3D vision industry will then be far easier to solve.

At the same time, costs fall sharply at every link: data, development, deployment, hardware, and industry expansion.

Catalyzed by generative AI 2.0, once 3D vision takes off, vertical scenarios that rely heavily on 3D vision technology, such as robotic arms, robots, unmanned vehicles, and the metaverse, will reap the AI dividend faster.

Plenty of data bears this out. In fields such as data labeling, synthetic data, industrial robots, and machine vision, the global market is growing rapidly; the compound annual growth rate of synthetic data in particular exceeds 30%.

Behind this lies the strategic value of generative AI 2.0, which technology giants and many manufacturing giants take very seriously.

From established manufacturers such as Siemens and Ford, to technology giants such as Nvidia, Tesla, and Google, to star start-ups such as Waabi, all have begun exploring the greater possibilities of generative AI 2.0 in fields such as industry, robotics, autonomous driving, medical care, and retail.

At the same time, investor enthusiasm has also been strongly mobilized. According to incomplete statistics, overseas financing related to synthetic data has totaled nearly 800 million US dollars in recent years.

In China, companies working on synthetic data have also attracted investors' attention. In June 2022, Kuawei Intelligence announced the completion of its Pre-A round, worth tens of millions of yuan, bringing cumulative financing to nearly 100 million yuan less than a year after its founding; in July this year, Guanglun Smart also announced the completion of its angel+ round, with cumulative financing in the tens of millions of yuan.

It can be said that from being able to compose poetry to studying physics, generative AI 2.0 is opening up a grand future of industrial digitization.



Origin blog.csdn.net/GZZN2019/article/details/132405125