The first large-scale model seminar at CVPR was successfully held, and the Wenxin Large-scale Model International Competition attracted more than 1,000 teams

CVPR is a world-class conference in computer vision and pattern recognition. It is a venue where scholars present cutting-edge scientific and technological results, and also a platform where enterprises explore cutting-edge applications. In recent years, with the explosive development of large-scale model technology, innovative applications built on it have been releasing substantial value across industry. Baidu, a long-term investor in artificial intelligence technology, has strong technical advantages and deep technical accumulation in large-scale models. Its independently developed, industrial-level, knowledge-enhanced large-scale model forms a complete system that covers basic large models, task large models, and industry large models, designed to meet the needs of industrial applications. As one of the core components of the Wenxin large model, the Wenxin·CV large model VIMER has been widely used in core businesses such as autonomous driving, cloud-intelligence integration, and the mobile ecosystem.

Baidu holds CVPR's first large-scale model seminar

Explore the status quo and future of large-scale model technology

To further promote the development of large-scale visual model technology, Baidu, together with Zhejiang University, the University of Hong Kong, and the Aerospace Academy of the Chinese Academy of Sciences, jointly held the first large-scale model workshop at CVPR 2023. Top scholars and practitioners in the field discussed the latest progress and future trends of large-scale model technology, and the workshop included papers from enterprises and universities such as Baidu and the University of Cambridge.

At the CVPR 2023 Foundation Model Workshop held on June 19, Professor Xi Teng, a senior engineer at Baidu and a visiting researcher at the Chinese Academy of Sciences, and Zhang Gang, chief architect at Baidu, delivered the opening and closing speeches respectively. Professor Xi Teng also gave a focused introduction to the Baidu Wenxin CV large model and the Wenxin traffic large model.

Xie Lingxi, a senior researcher from Huawei, explained the differences between NLP large models and CV large models, and outlined the future opportunities and challenges for CV large models.

Thousands of teams compete in the industrial-level large-scale model competition

Industry-Academia Jointly Exploring the Way of Technological Innovation

The first international multi-task large-scale model competition in the field of intelligent transportation was a highlight of this year's CVPR 2023 large-scale model seminar. Starting from key issues in the field of foundation models, the competition set up a multi-task large-scale model track and a cross-modal large-scale model track. It attracted more than 1,500 participants from 35 countries and regions around the world, and received many solutions from companies such as Meituan, NetEase, and Dahua, as well as from universities and institutes such as Tsinghua University, Hong Kong University of Science and Technology, Huazhong University of Science and Technology, and the Chinese Academy of Sciences.

In recent years, the growth of industries such as smart cars and artificial intelligence has created good opportunities for smart transportation. Intelligent transportation technologies have become part of daily life, but the multi-task modes of existing large models and traditional perception methods (such as classification, detection, and segmentation) cannot yet satisfy the pursuit of broader traffic scenarios and higher levels of autonomous driving. Starting from key issues in current technology research, Baidu set up two tracks:

Track 1: Unified multi-tasking large model track 

This track aims to resolve the conflicts that arise when multiple tasks and multiple datasets are combined. With a well-designed network structure and loss functions, joint training on multiple tasks can greatly improve a model's generalization. Because the data for any specific task contains noise, training on a single task alone carries a risk of overfitting. A unified multi-task large model trains on the data of multiple tasks together, which can average out the noise of the individual tasks and lets the model learn better features. To further explore the upper limit of what a unified multi-task large model can do, this track uses typical traffic-scene tasks as its subject and folds three types of CV tasks (classification, detection, and segmentation) into a single large model, with the goal that one model handles all three while outperforming dedicated single-task models. The final ranking is based on a weighted combination of the all-in-one large model's metrics on the classification, detection, and segmentation tasks.
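
As a rough illustration of how a weighted ranking criterion over the three tasks might be computed, here is a minimal sketch. The function name, the equal weights, and the metric values are all assumptions made for illustration; the article does not state the actual weights or metrics used in the competition.

```python
from typing import Mapping


def weighted_score(metrics: Mapping[str, float],
                   weights: Mapping[str, float]) -> float:
    """Combine per-task metrics into one score using normalized weights.

    Hypothetical helper: the competition's real formula is not given
    in the article.
    """
    if set(metrics) != set(weights):
        raise ValueError("metrics and weights must cover the same tasks")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted average: each task contributes in proportion to its weight.
    return sum(metrics[task] * weights[task] for task in metrics) / total_weight


# Made-up numbers for illustration only (not competition results).
example_metrics = {"classification": 0.90, "detection": 0.60, "segmentation": 0.75}
# Equal weights are an assumption; any positive weights would work.
equal_weights = {"classification": 1.0, "detection": 1.0, "segmentation": 1.0}

score = weighted_score(example_metrics, equal_weights)
print(f"combined score: {score:.3f}")
```

Normalizing by the sum of the weights keeps the combined score on the same scale as the individual metrics, so changing the weights shifts emphasis between tasks without changing the score's range.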

Track 2: Cross-modal image retrieval track 

This track aims to improve the accuracy of text-to-image retrieval. In traffic scenes, high-performance image retrieval plays a very important role in traffic law enforcement and public security governance. Traditional image retrieval methods usually first recognize image attributes and then retrieve by comparing them with the expected attributes. With the development of multimodal large model technology, unified text and image representations and conversion between modalities have come into wide use. Using these capabilities can further improve the accuracy and flexibility of image retrieval.
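
To make the idea of retrieval over a unified text and image representation concrete, here is a minimal sketch that ranks images by cosine similarity to a text query in a shared embedding space. The tiny hand-written vectors stand in for the outputs of a multimodal encoder, which is not shown; the file names, vectors, and function names are hypothetical and are not taken from the competition or from VIMER.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_images(query_vec: Sequence[float],
                image_vecs: Dict[str, Sequence[float]],
                top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (image name, similarity) pairs, best match first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in image_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumed): in practice these would come from image and
# text encoders that map both modalities into the same vector space.
toy_gallery = {
    "red_car.jpg": [0.9, 0.1, 0.0],
    "blue_truck.jpg": [0.1, 0.9, 0.0],
    "pedestrian.jpg": [0.0, 0.1, 0.9],
}
toy_query = [1.0, 0.0, 0.0]  # stand-in for the embedding of "a red car"

results = rank_images(toy_query, toy_gallery, top_k=2)
for name, similarity in results:
    print(name, round(similarity, 3))
```

Unlike the attribute-matching pipeline described above, this approach needs no fixed attribute vocabulary: any free-form text that the encoder can embed can serve as a query.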

At the seminar, Professor Xi Teng announced the winners of this year's Foundation Model International Competition. The CTRL team and the njust team won the multi-task large-scale model track and the cross-modal large-scale model track respectively, and shared their technical solutions on site.

With the official conclusion of the CVPR 2023 large-scale model seminar, the application scenarios discussed at the conference are gradually making their way into industry. Baidu says it will continue to deliver technical capabilities and solutions to a range of industry scenarios while strengthening the "internal strength" of its AI technology, and will further promote the upgrading and development of industrial intelligence.

Wenxin·CV large model repository

https://github.com/PaddlePaddle/VIMER

Origin blog.csdn.net/PaddlePaddle/article/details/131335814