"Reshaping SenseTime's R&D system" and "mobilizing the entire company": I talked with Wang Xiaogang, SenseTime's chief scientist, about large models...


Text丨Tan Jing

Original丨Dear Data

Today, the story of large models is one that must be told, as fully as possible.

And the more that is true, the more opinions swarm like crucian carp crossing a river, and the harder it becomes to tell good from bad.

Meanwhile the large-model race is in full swing, and top scientists have no time to refute rumors.

Take one figure, for my own reference only: the pool of genuine large-model talent nationwide is not huge, perhaps around 100 people.

As for me, Tan, what I especially hope to do is chase down and interview the experts who are building large models firsthand.

On the second Friday of April 2023, I had the honor of chatting with Professor Wang Xiaogang of SenseTime about large models. Wang Xiaogang is co-founder and chief scientist of SenseTime, president of its Jueying Intelligent Vehicle Business Group, and a professor in the Department of Electronic Engineering at the Chinese University of Hong Kong.


Without further ado, here are the highlights of our conversation.

First, understanding complex things requires a good metaphor.

The analogy Professor Wang Xiaogang gave me is this:

"Compare the general-AI large model to nuclear fusion. First there is the fusion device, then the fuel. The device is our existing hardware infrastructure; the fuel is the very rich data and the rich tasks of various industries."

I asked Professor Wang: "What positive effect will SenseTime's years of accumulated talent and technology have on conquering the mountain of large models?"

His answer stayed with the nuclear-fusion metaphor.

He said: "'Good raw material' means going deep into various industries and accumulating a great deal of know-how. The American company OpenAI could build ChatGPT because of many years of accumulation behind it, from the research and development of small models to large ones, and a great deal of know-how."

Turning back to SenseTime, Professor Wang believes the company has very similar advantages: a large body of R&D staff who go deep into the front line, use models to solve practical problems, and have built up solid experience.

He believes this good raw material can help SenseTime's large model succeed.

On the second point, he emphasized: "You need good top-level design, organizing the raw materials into a system that works toward the goal of general artificial intelligence. Beyond the large devices and infrastructure, the algorithms and frameworks must also be designed well as a whole system, so that the R&D team can focus on the first aspect."

With both in place, the large model's success will follow.

Confusion and anxiety about ChatGPT come not only from ordinary people but also from technology practitioners and researchers, people who tend to have strong educational and research backgrounds.

On the morning GPT-4 was released, Mei Lingrui, a first-year graduate student at the National University of Science and Technology in Beijing (also a reader of mine), frankly shared his anxiety with me:

"Many of the benchmarks and fields that GPT-4 instantly crushes were cultivated by countless researchers over decades with other methods. Once GPT-4 came out, those efforts seemed, in an instant, to have become detours in technological development."

"On the morning of the GPT-4 release, the university's Machine Translation seminar turned directly into a GPT-4 seminar," he said.

(To explain: the "Machine Translation" course covers intermediate NLP tasks.)

Shock and bewilderment hit at the same time.

That scene will stay in my mind for a long time. Failing is not what is frightening; what is frightening is that the opponent succeeds.

Amid the chaos GPT-4 has brought, it is hard to stay clear-headed.

I wrote down the following questions specifically to ask Professor Wang Xiaogang for advice. You could also say I asked them on behalf of the readers of "Dear Data".

When GPT-4 came out, developers said in unison: NLP intermediate tasks are dead.

When SAM came out, the developers said in unison: CV is dead.

(These declarations mix Chinese and English; the gist is that intermediate natural-language tasks have died and computer vision has died.)

What does it signify when one AI technology "kills off" other AI technologies?

The practitioners behind the "killed" technologies are highly skilled people. What changes should they make in mindset and in action?

How would you encourage the SenseTime R&D team to face such a "change" or "frustration"?

In short, the keywords of Professor Wang Xiaogang's answer were "embrace the new research paradigm" and "change your mindset".

He replied: "Thank you for the question. This is nothing new; history repeats itself again and again. Look back ten years, to the era when deep learning replaced traditional algorithms. At the time, everyone had accumulated traditional techniques, and deep learning overturned all of that tradition at once.

At first people didn't quite believe it; the general view was that deep learning might only do well in speech. It quickly proved feasible on computer vision classification problems too. One after another, things everyone felt the new technology could not do, it later did."

"Disruption will continue to occur, and it will occur very quickly," he emphasized.

In his view, there is now a new opportunity, and that is very good for the development of the industry as a whole.

He said: "We want to embrace the new research paradigm, and the key is to change our mindset. Ten years ago, before SenseTime was even founded, the decision our founding team made was: all in on deep learning. We also had a long accumulation in traditional vision technology, but when the new technology came, we embraced it decisively. The same is true today."

He said: "What SenseTime R&D wants to figure out now is how to make good use of these new technologies. The new paradigm they bring includes human-machine co-intelligence, creating new intelligence together. The core is how to use our technology so that the large model forms positive feedback, instead of blindly saying we are 'very afraid' or 'being disrupted'."

For researchers, he pointed out, this is a very exciting thing.

I have observed that rapid, unhesitating consensus is a common trait of almost all large-model participants: the pursuit not only of commercial returns but also of technical excellence.

The next question is about "emergence".


Recently, when I chatted with the leaders of various large-model efforts, the first question I asked was often:

"Has your big model emerged yet?"

Hearing this question, everyone smiled, giving the impression that only insiders get the inside joke.

Professor Wang Xiaogang gave an affirmative answer and also explained the technical term "emergence".

He said: "The phenomenon of emergence means the large model will keep surprising you; it can acquire new capabilities."

He said: "Through human-computer interaction or the design of chains of thought, scientists keep unlocking new capabilities of large models. For example, a model can give very high-quality reasoning steps to answer questions it has never encountered before."

Simply put, "a question it has never encountered before" is a kind of unknown task.
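Professor Wang's mention of "the design of chains of thought" refers to what the literature calls chain-of-thought prompting. As a rough illustration (the prompt wording and function names below are my own, not SenseTime's), the difference between a direct prompt and a chain-of-thought prompt can be as small as one appended instruction:

```python
# A minimal sketch of chain-of-thought prompting. Only the prompt
# construction is shown; how any particular model responds is not assumed.

def build_direct_prompt(question: str) -> str:
    """A plain prompt: the model is asked for the answer directly."""
    return f"Q: {question}\nA:"

def build_cot_prompt(question: str) -> str:
    """A chain-of-thought prompt: the model is nudged to produce
    intermediate reasoning steps before giving the final answer."""
    return f"Q: {question}\nA: Let's think step by step."

question = "A farmer has 17 sheep and all but 9 run away. How many are left?"
print(build_direct_prompt(question))
print(build_cot_prompt(question))
```

The surprising empirical finding behind "emergence" is that this small change in prompting can unlock reasoning ability that smaller models simply do not show.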

Immediately afterwards, Professor Wang Xiaogang talked about the adjustment of SenseTime.

He said: "Going forward, we will reshape SenseTime's R&D system along the established direction and rhythm, and mobilize the entire SenseTime research team to form a joint force, so that in the end we can build a good general artificial intelligence model. This is definitely not just training a model with a very large number of parameters; it is a systematic project."

Wang Xiaogang has publicly emphasized large models. Through this conversation, I believe readers can also see SenseTime's determination to build a good model.

"Reshaping the R&D system" and "mobilizing the entire company" are major events for any technology company, and often affect organizational strategy and culture, team management and business operations.

For a listed company, embracing the new paradigm involves far more than technical challenges.

I am very concerned about the technical development of multimodal large models. So, I asked about technical difficulties.

Professor Wang Xiaogang first restated the question, then answered it: "What is the difficulty here?"

He said: "Images are completely different from natural language; the granularity and power of expression are completely different. To combine image and natural-language technologies, the interface, or task interface, needs to be redesigned. An image is one kind of representation; natural language is another."

He specifically emphasized where innovation is needed: "How to design this is, I think, something that requires a great deal of energy and innovation."

It should be done in two steps, mobilizing different forces.

First, define the task itself.

Once the definition work is done, scholars are very good at using various mathematical tools to solve the interface problem.

(I discussed the word "interaction" with Professor Wang Xiaogang and asked whether "fusion" might be used instead. He still thinks "interaction" is right, because compared with interaction, the road to fusion may be long.)

Take the autonomous driving scenario: how do you describe an autonomous driving system in natural language? Current computer vision practice describes scenes with detection boxes and pixels, which is completely different from how people understand the world through language.
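To make that gap concrete, here is a toy sketch of translating detection-box output into a sentence. The box format, thresholds, and the `describe_scene` helper are all hypothetical illustrations for this article, not SenseTime's actual interface:

```python
# Toy bridge between two representations: detection boxes (label, x, y, w, h)
# in normalized image coordinates (0..1) versus a natural-language sentence.

def describe_scene(detections):
    """Turn (label, x, y, w, h) detection boxes into one English sentence."""
    if not detections:
        return "The road ahead is clear."
    parts = []
    for label, x, y, w, h in detections:
        side = "left" if x < 0.5 else "right"    # crude position cue
        size = "near" if h > 0.4 else "far"      # taller box ~ closer object
        parts.append(f"a {size} {label} on the {side}")
    return "Ahead there is " + ", ".join(parts) + "."

boxes = [("pedestrian", 0.2, 0.5, 0.1, 0.5),
         ("truck", 0.7, 0.4, 0.3, 0.3)]
print(describe_scene(boxes))
```

Even this trivial mapping shows why the interface must be designed, not assumed: the rules above throw away pixel-level detail, and a real multimodal model has to decide what a linguistic description should preserve.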

I vaguely sense that many people do not pay attention to multimodal large-model technology, do not understand it, or are unwilling to understand it because it is too complex. But I think it is becoming more and more important.

In multimodal large-model technology, GPT-4 has already delivered results, and many large-model teams in China have already made their moves. (Because many domestic large models have not yet been released, it is inconvenient for me to say more.)

The last question I asked was: "Do you think multimodal large model technology has been underestimated before?"

Wang Xiaogang replied: "Yes, it must be."

He went on:

"The capabilities displayed by large natural-language models, and the new functions that emerge, truly dazzle us. People naturally think about how to better combine natural language and images. After all, about 90% of the information people absorb is visual. Language has opened up a very large space for our imagination; the next question is how to combine it with vision.

I think this is an entirely new question to think about. As emphasized earlier, the two are very different, and the ways of integrating them are completely different. In computer vision, when multiple information sources are involved, they are basically combined by weighted averaging, a similar kind of fusion. Natural-language technology, however, has its own unique processing."

He emphasized: "So-called multimodality is not just putting language and images together. How to make the two interact and help each other is, I think, the key point we need to explore fully later."
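The "weighted average" fusion Professor Wang contrasts with true interaction can be sketched in a few lines. The feature vectors and weight below are made-up illustrations, not any real model's features:

```python
# Classic multi-source fusion in computer vision: blend two feature
# vectors linearly. Neither modality conditions on the other; the two
# are simply mixed by a fixed weight alpha.

def weighted_average_fusion(image_vec, text_vec, alpha=0.5):
    """Element-wise weighted average of two equal-length feature vectors."""
    return [alpha * i + (1 - alpha) * t for i, t in zip(image_vec, text_vec)]

image_vec = [0.8, 0.1, 0.3]   # illustrative image features
text_vec = [0.2, 0.9, 0.5]    # illustrative text features
print(weighted_average_fusion(image_vec, text_vec, alpha=0.5))
```

The point of the contrast: in this scheme the fusion weight is fixed in advance, whereas "interaction" in the sense he describes means each modality can dynamically reshape how the other is read, which is exactly what simple averaging cannot do.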

Technological change is ruthless and cruel. It conquers those who talk tough, but not those whose resolve is hard.

(over)


Read more

AI framework series:

1. The group of people who engage in deep learning frameworks are either lunatics or liars (1)

2. The group of people who engage in AI frameworks 丨 Liaoyuanhuo, Jia Yangqing (2)

3. Those who engage in AI frameworks (3): the fanatical AlphaFold and the silent Chinese scientists

4. The group of people who engage in AI framework (4): the prequel of AI framework, the past of big data system

Note: (3) and (4) have not yet been published; they will meet you in book form.

Comic series

1.  Interpretation of the Silicon Valley Venture Capital A16Z "Top 50" data company list

2.  AI algorithm is a brother, isn't AI operation and maintenance a brother?

3.  How did the big data's social arrogance come about?

4.  AI for Science, is it "science or not"?

5.  If you want to help mathematicians, how old is AI? 

6.  The person who called Wang Xinling turned out to be the magical smart lake warehouse

7.  It turns out that the knowledge map is a cash cow for "finding relationships"?

8.  Why can graph computing be able to positively push the wool of the black industry?

9.  AutoML: Saving up money to buy a "Shan Xia Robot"?

10.  AutoML : Your favorite hot pot base is automatically purchased by robots

11. Reinforcement learning: Artificial intelligence plays chess, take a step, how many steps can you see?

12.  Time-series database: good risk, almost did not squeeze into the high-end industrial manufacturing

13.  Active learning: artificial intelligence was actually PUA?

14.  Cloud Computing Serverless: An arrow piercing the clouds, thousands of troops will meet each other

15.  Data center network : data arrives on the battlefield in 5 nanoseconds

16. Data Center Network "Volume" AI: It's not terrible to be late, but the terrible thing is that no one else is late

AI large model and ChatGPT series:

17. ChatGPT fire, how to set up an AIGC company, and then make money?

18.  ChatGPT: Never bully liberal arts students

19.  How does ChatGPT learn by analogy? 

20.  Exclusive丨From the resignation of the masters Alex Smola and Li Mu to the successful financing of AWS startups, look back at the evolution of the "underlying weapon" in the era of ChatGPT large models

21.  Exclusive 丨 Former Meituan co-founder Wang Huiwen is "acquiring" the domestic AI framework OneFlow, and wants to add a new general from light years away

22.  Is it only a fictional story that the ChatGPT large model is used for criminal investigation?

DPU chip series:

1.  Building a DPU chip, like a dream bubble? 丨 Fictional short stories

2.  Never invest in a DPU?

3.  How does Alibaba Cloud perform encryption calculations under the support of DPU?

4.  Oh CPU, don’t be tired, brother CIPU is helping you on the cloud

Long article series:

1.  I suspect that JD.com’s mysterious department Y has realized the truth about the smart supply chain

2. Supercomputers and artificial intelligence: supercomputers in big countries, unmanned pilots


Finally, a word of self-introduction from the editor-in-chief.

I'm Tan Jing, an author of science, technology, and popular-science writing.

To discover the stories of our times,

I chase down tech gurus and doorstep technology companies.

Occasionally I write novels and draw comics.

Life is short; don't take shortcuts.

Original writing is not easy. Thank you for sharing.

If you'd like to read more of my articles, just follow "Dear Data".
