Thoroughly understand the application of the Transformer algorithm in detection, segmentation, 3D vision, autonomous driving, and large vision models


Institute of Computer Vision



Public account: Computer Vision Research Institute

Learning group: Scan the QR code on the homepage to get the joining method

Computer Vision Research Institute column


Since Transformer and BERT came out, they have dominated the NLP field. Over the past few years, Vision Transformer has gradually replaced earlier CNN architectures on various visual benchmark datasets, with a simpler overall architecture. Recently, Transformer-based multimodal large models and AIGC generation have also become research hotspots in both industry and academia. At the same time, large models are widely used for data labeling and model distillation in autonomous driving.
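At the core of both Transformer and Vision Transformer is scaled dot-product self-attention: each token scores every other token, normalizes the scores with a softmax, and takes a weighted sum of value vectors. A minimal, stdlib-only sketch (for brevity it uses identity projections; a real Transformer learns separate W_q, W_k, W_v matrices and uses multiple heads):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention over token vectors X (n x d).

    Queries, keys, and values are all X itself here; a real Transformer
    applies learned linear projections W_q, W_k, W_v first.
    """
    d = len(X[0])
    scale = math.sqrt(d)  # scale by sqrt(d) to keep dot products well-behaved
    out = []
    for q in X:  # one query per token
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in X]
        weights = softmax(scores)  # attention distribution over all tokens
        # Weighted sum of value vectors, dimension by dimension.
        out.append([sum(w * v[j] for w, v in zip(weights, X))
                    for j in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(tokens))
```

ViT applies exactly this operation (with learned projections and positional embeddings) to a sequence of flattened image patches instead of word tokens.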


Autonomous driving is a safety-critical application that demands high-performance, highly reliable deep learning models, and Vision Transformer is an ideal choice. Mainstream autonomous driving perception algorithms now largely build on Vision Transformer-related techniques, including segmentation, 2D/3D detection, and recently popular large models (such as SAM); Vision Transformer is blooming everywhere in the autonomous driving field. It is also a standard interview topic for autonomous driving and image algorithm positions, requiring both a deep grasp of the theory and hands-on use of the related techniques in projects.



How to get started?

The design philosophy of Vision Transformer differs greatly from the hand-crafted CNN designs that came before it. It is not very intuitive, and it is hard to learn on your own: many students don't know where to start, and most don't know how to apply Vision Transformer to concrete tasks — for example, how to design a model structure for a perception task, how to choose a Transformer model suited to their business needs, or how to apply large vision models to their own business or research.


To this end, after in-depth research into everyone's needs, we have selected the industry's mainstream Vision Transformer perception fundamentals and application algorithms. The content covers ViT-based segmentation, detection, large models, and applications in autonomous driving perception. From 0 to 1, it walks through network structure design, algorithm optimization, and hands-on practice in detail, distills general design principles and research progress across the field, and keeps up with new methods and research hotspots since 2022.




Closely combined with hands-on practice, the course helps everyone better understand algorithm implementation details, and systematically and comprehensively introduces Transformer-based segmentation and detection models. This is the first complete Vision Transformer theory-and-practice tutorial in China. It is especially suitable for beginners and for students who need to work on autonomous driving perception or image algorithms, and it also suits practitioners in other areas who want a systematic overview of this new direction.

The course outline is as follows:


Main lecturer

Mr. Tiger, a PhD from a Top-2 university in China, is currently a researcher in industry. His main research directions are image and video detection and segmentation, multimodal scene understanding, object tracking, and multimodal large models. He has published nearly 30 papers in top computer vision conferences (CVPR, ECCV, ICCV, NeurIPS, ICLR, etc.) and top journals (T-PAMI, IJCV, TIP), 15 of them as first author. He is familiar with the design and implementation of commonly used segmentation and detection algorithms, and has mentored more than 6 junior PhD and master's students to publish top-conference papers.

Who this course is for

  1. Undergraduate, master's, and PhD students researching computer vision and autonomous driving perception;

  2. 2D/3D perception algorithm engineers working on autonomous driving;

  3. Those who want to move into Transformer-based perception algorithms;

  4. Algorithm engineers and enterprise technical managers who want to upskill at work;

Prerequisites for this course

  1. Some experience with Python and PyTorch, and familiarity with the basic algorithms commonly used in deep learning;

  2. Some understanding of 2D perception, including basic approaches to detection and segmentation;

  3. Some grounding in linear algebra and matrix theory;

  4. A computer with its own CUDA-capable GPU (at least 12 GB of video memory);

What you will gain:

  1. A systematic, in-depth understanding of recent Transformer-based segmentation and detection models.

  2. The ability to build your own Transformer perception system to solve multimodal tasks.

  3. Proficiency in implementing Transformer segmentation and detection algorithms, and mastery of their improvements and applications in autonomous driving systems, down to code-level understanding.

  4. After completing this course, the ability to carry out your own research on Transformer-based segmentation and detection, or to design new methods in algorithm engineering.

  5. Connections with practitioners and learning partners across industries, deepening your understanding through exchange.

Start time

Classes officially start on September 8, 2023 — join us to learn the fundamentals. The course ends two months after the start date, with offline teaching and Q&A in the WeChat group (the discussion environment is excellent, and a very important part of the course)!

Course consultation


Scan the code to join learning!


Add the assistant to ask questions and receive a course gift package!

(WeChat: AIDriver004)

Origin blog.csdn.net/gzq0723/article/details/132267605