"Unexpected" vol27. Review | Well-known visual SLAM expert Gao Xiang: Let's talk about the practical application of visual SLAM in the fields of autonomous driving and robotics

Following his best-selling book "Fourteen Lectures on Visual SLAM", Gao Xiang has released a new book, "SLAM Technology in Autonomous Driving and Robotics". It has attracted much attention since publication, giving readers a comprehensive and in-depth view of SLAM technology.

In the 27th "Yu Jian" closed-door sharing session, I am very happy to invite the well-known visual SLAM expert Gao Xianglai to share with you the application of laser and visual SLAM in autonomous driving and robots, as well as his thoughts on SLAM entrepreneurship. opinions, and conduct QA exchanges and interactions with online classmates.

During the open-mic discussion session, attendees asked one SLAM question after another. I have transcribed the essence of Gao Xiang's answers, lightly edited, and present them here~

If you want to learn more about visual SLAM-related work, you are also welcome to add thexiaojiang on WeChat, join the SLAM communication community, and interact with more partners in related fields!

Guest introduction


Gao Xiang

Well-known visual SLAM expert

A native of Huzhou, Zhejiang, Gao Xiang holds a PhD from the Department of Automation at Tsinghua University and completed a postdoctoral fellowship at the Technical University of Munich. He has long been engaged in research on computer vision, positioning, and mapping algorithms, and has served as a senior algorithm engineer and algorithm director for autonomous driving at companies including Baidu, Zhixingzhe, and Mainline Technology. His authored and translated books include "Fourteen Lectures on Visual SLAM: From Theory to Practice", "State Estimation in Robotics", and "SLAM Technology in Autonomous Driving and Robotics", and he has published many papers at well-known international venues such as ICRA, IROS, and RA-Letters.

Homepage:

https://www.techbeat.net/grzytrkj?id=183

1. What are the technical barriers to SLAM in the fields of autonomous driving and robotics? What is the development and application situation at home and abroad?

Gao Xiang: The main barrier for SLAM (simultaneous localization and mapping) technology in autonomous driving and robotics is that building a stable algorithm requires a large number of real application cases and accumulated experience. Datasets in academia are relatively small, while industry faces larger-scale and more complex application scenarios. There is not a large purely technological gap between the laboratory and industry, because the stability of the algorithm is tied directly to the real-world performance of the product: in industry, a product's iteration time and the number of real application cases play the key role in an algorithm's stability and performance.

In terms of development at home and abroad, laboratories differ in their research directions and methods, but all focus on solving practical problems and improving product quality. In terms of application, SLAM has already been put to practical use in products such as robot vacuums; the companies that sell the most units tend to have the most stable performance.

2. How to implement efficient SLAM and autonomous driving algorithms in real-time applications to meet real-time requirements and run on embedded systems with limited computing resources?

Gao Xiang: The main problem in realizing efficient SLAM and autonomous driving algorithms in real-time applications is the limitation of computing resources. Currently, most positioning algorithms run without problems on embedded systems, such as hardware from domestic companies like Horizon and Black Sesame, or from foreign companies like Nvidia. Most domestic companies wrap a software layer of their own around Nvidia's hardware and add their own products on top. From a positioning perspective, this is not a big problem. For mapping, the mainstream approach is still to run on a PC or server. If you want to implement more complex features, such as semantics, BEV, or real-time map generation, the overall engineering pipeline will differ depending on how complex the desired feature is.
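Voxel-grid downsampling is one common way to cut the compute load of point cloud processing on resource-limited hardware. Below is a minimal, self-contained sketch of the idea (not code from the talk, and independent of any particular SLAM stack):

```python
from collections import defaultdict

def voxel_downsample(points, voxel=0.2):
    """Reduce a point cloud by keeping one representative point
    (the centroid) per occupied voxel of side `voxel` metres."""
    buckets = defaultdict(list)
    for p in points:
        # integer voxel index of each point
        key = tuple(int(c // voxel) for c in p)
        buckets[key].append(p)
    # replace each voxel's points with their centroid
    return [tuple(sum(cs) / len(ps) for cs in zip(*ps))
            for ps in buckets.values()]

# Three points, two of which fall in the same voxel
sparse = voxel_downsample([(0.01, 0, 0), (0.03, 0, 0), (1.0, 0, 0)])
```

Real systems (e.g. PCL or Open3D voxel filters) do the same thing with optimized data structures; the point is that downstream matching cost scales with the number of surviving points.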

3. How to effectively fuse multiple sensor data, such as lidar, cameras, inertial measurement units (IMU), etc., to improve the accuracy and robustness of positioning and environmental perception?

Gao Xiang: To achieve sensor fusion and improve the accuracy and robustness of positioning and environmental perception, the first focus should be robustness against abnormal situations. When designing the system framework, plan for failures, for example by configuring multiple sensors in the car as redundant backups. For each scenario, specific requirements need to be clarified, such as the spacing and width of parking-garage pillars. At the laboratory stage it is hard to foresee the complexity and richness of the field, so it is necessary to test thoroughly in real scenarios, resolve the various corner cases, and continuously optimize the algorithm to adapt to different situations.

In practical applications, conditions are normal most of the time, so filters, factor graphs, and similar methods can be used for data processing. When abnormal situations occur, redundancy mechanisms are needed to compensate. For example, in addition to its own dead reckoning (DR), a car will also be equipped with visual odometry (VO) or a radar/lidar odometer as a redundant backup.

4. Compared with traditional sparse point clouds and depth maps, the current implicit scene representations, typified by radiance fields, offer high resolution and direct 360-degree modeling. What are the challenges of combining these implicit representations with the SLAM framework?

Gao Xiang: Compared with traditional sparse point clouds and depth maps, implicit scene representations typified by radiance fields do offer high resolution and direct 360-degree modeling. However, combining them with the SLAM framework faces some challenges: the direction is relatively new, still hotly debated in academia, and carries many uncertainties. Industry is usually more conservative and only considers applications after academia has reached consensus on a problem.

Most of the algorithms currently used in industry were stabilized by academia several years ago, and research on combining implicit representations with the SLAM framework is still in its early stages, so method-level innovation still needs to mature. In addition, implicit scene representations carry a lot of uncertainty in terms of network structure. At the moment everyone is doing their own research and development, and the field as a whole is at a relatively early stage: many studies combine existing methods rather than proposing brand-new ones, so at the level of individual modules, methodological innovation may not yet be sufficient. Of course, this also means there are still many directions worth trying and exploring.

5. How can one build large-scale datasets suitable for SLAM and autonomous driving, and define evaluation criteria and metrics so that the performance of different algorithms can be compared and evaluated?

Gao Xiang: Building large-scale datasets suitable for SLAM and autonomous driving involves many considerations. First, major companies may have their own large-scale datasets, but these are usually not public. The situation differs in universities, where the data collected over long periods is relatively limited because there are few vehicles. In a company, by contrast, a large fleet of vehicles and back-end databases can feed cloud servers that collect and store data, with a dedicated system for maintenance and testing.

In this process, building and maintaining the infrastructure, including databases and storage systems, is key. Internet companies have an advantage here; companies such as Baidu have done quite well in infrastructure construction. For autonomous driving, dataset size matters a great deal: datasets in academia are usually in the hundreds of gigabytes, while industry needs far larger ones, on the order of tens or even hundreds of terabytes.

When it comes to testing and storage, you need to consider how to run tests across multiple machines and how to collect and organize the results, which requires a very stable system. In summary, building large-scale datasets and defining evaluation criteria and metrics is a challenging process that requires technical support and investment on many fronts.
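One evaluation metric widely used for comparing SLAM algorithms is the absolute trajectory error (ATE). Below is a minimal sketch of its RMSE form; real benchmarks (e.g. the TUM evaluation scripts) first align the estimated trajectory to ground truth with a Umeyama fit, which this toy version skips:

```python
import math

def ate_rmse(gt, est):
    """RMSE of position error between aligned ground-truth and
    estimated trajectories, each a list of (x, y) positions."""
    assert len(gt) == len(est), "trajectories must be time-associated"
    sq = [(gx - ex) ** 2 + (gy - ey) ** 2
          for (gx, gy), (ex, ey) in zip(gt, est)]
    return math.sqrt(sum(sq) / len(sq))

gt  = [(0, 0), (1, 0), (2, 0)]
est = [(0, 0.1), (1, -0.1), (2, 0.1)]
err = ate_rmse(gt, est)   # ≈ 0.1 m
```

A companion metric, relative pose error (RPE), measures drift over fixed time or distance windows; large-scale benchmarks typically report both.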

6. For vehicle models that do not use lidar, how can previously accumulated point cloud map data continue to be used, to improve efficiency and avoid redeveloping in a new technical direction?

Gao Xiang: There are several solutions for continuing to use previously accumulated point cloud map data on vehicle models without lidar. One is to build maps and localize by detecting features such as lane lines and walls in specific scenes like garages; however, this method depends heavily on the stability and accuracy of the detection results. Another is to reconstruct point clouds using vision techniques, but the generalization ability of this approach still needs verification. Sensor technology is still developing: if you build a point cloud with a stereo camera, it is essentially similar to solid-state lidar, except that its point cloud accuracy is not fixed as with lidar but depends on the measured distance.
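The distance dependence mentioned above follows from the stereo depth equation z = f·b/d (focal length in pixels, baseline, disparity): propagating a disparity error through it gives a depth error that grows roughly quadratically with range, unlike lidar's near-constant ranging error. A small sketch, with a hypothetical camera rig:

```python
def stereo_depth_sigma(z, f_px, baseline_m, disp_sigma_px=0.5):
    """Approximate 1-sigma depth error of a stereo rig at range z.
    Differentiating z = f*b/d gives sigma_z ≈ z^2 / (f*b) * sigma_d,
    so the error grows quadratically with distance."""
    return (z ** 2) / (f_px * baseline_m) * disp_sigma_px

# Hypothetical rig: 700 px focal length, 12 cm baseline,
# half-pixel disparity noise
for z in (5.0, 20.0, 50.0):
    print(f"range {z:5.1f} m -> sigma_z {stereo_depth_sigma(z, 700.0, 0.12):7.3f} m")
```

With these assumed numbers, centimetre-level error at 5 m degrades to metre-level beyond a few tens of metres, which is why stereo-built maps are hard to hold to lidar-map accuracy at long range.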

Regarding the idea of using visual methods to create local maps with SURF feature matching, I think assessing its feasibility requires considering the stability and accuracy of visual mapping. Visual mapping must be based on reliable three-dimensional data, and stereo vision can be affected by factors such as texture and color, introducing uncertainty into spatial positions. We therefore need to consider the consistency and stability of the visual data, as well as how well it matches the lidar data.

There is a team in South Korea doing surround-view reconstruction followed by point cloud mapping for indoor positioning, but so far I have only seen their demo, not a concrete product application. I believe solid-state lidar may become cheaper in the future, and outdoor ToF devices may also become widespread, which would provide more stable perception data for autonomous driving.

In general, the consistency and stability of a visually reconstructed point cloud map may vary with the scene and the vehicle's trajectory. What we need to consider is how to ensure the stability and consistency of visual data across different scenes and motion states, so as to achieve accuracy that matches lidar data.

7. With the trend toward mass-producing L2 autonomous driving with lightweight maps or even no maps, what role can SLAM continue to play?

Gao Xiang: "Lightweight-map" or "map-free" does not mean eliminating the map entirely; it means turning offline map building into real-time online construction on the vehicle side. On the vehicle, attention shifts to road-level mapping, that is, lanes and how the road extends. Although the current mainstream approach can achieve this, the results are highly uncertain and may not cover all situations. I am skeptical that this path can reach L4-level autonomous driving, as it may not meet the accuracy requirements.

Now everyone is doing BEV, but it may saturate in a few years. Achieving L4-level autonomous driving relies heavily on high-precision maps, which is indeed a problem, and I doubt BEV alone can reach that level. If we pursue a low takeover rate through BEV, I do not think we can achieve L4-like functions. This requires asking whether a given function is really L2-level or L4-level.

For the parking function, if the car has already pulled up at the edge of the parking space, the driver is basically right next to it, so there is no need to chase the takeover rate. But driving the car into a space on its own involves L4-level capability, because the driver cannot come back and park it after a failure. Achieving L4-level autonomous driving requires traditional L4 methods, such as high-precision maps, radar maps, and point cloud maps of parking structures. Ideally, the vehicle could explore in real time as it drives and discover everything inside, but this cannot yet be implemented reliably. Therefore, I think we should stick with the L4 approach, focusing on lightweight maintenance and the rapid, simplified generation and upkeep of high-precision maps.

8. What do you think of the new "large perception" organizational structure that has recently appeared at some autonomous driving companies, that is, placing perception and positioning/mapping in the same department?

Gao Xiang: The new "large perception" organizational structure that has emerged at autonomous driving companies places perception and positioning/mapping in the same department, and this approach has certain advantages. Overall, it is better to look at L2 and L4 separately. For an L2 company, such a structure is fine. Many companies are now advocating removing high-precision maps, or using them only partially, so that vehicles can identify their location from road signs or pavement markings. Putting perception and positioning together makes the whole system more integrated.

However, the overall behavior of a robot or vehicle does not necessarily have to follow the existing L4 architecture. Many L4 companies lack polished functions such as lane keeping, and their systems rely too heavily on high-precision maps and high-precision positioning. How should a vehicle respond in tunnels or mountainous areas, where high-precision positioning cannot be fully guaranteed? This requires higher-level generalists to design the vehicle's behavior.

At the same time, in practical applications there is a tension between the system's availability and its accuracy metrics: if the accuracy requirement cannot be met, the vehicle has to stop. So when designing the company structure, you need to consider how to resolve this tension, which again calls for higher-level generalists to design vehicle behavior for different scenarios. Putting perception and positioning in the same department helps address this and improves the performance and stability of the autonomous driving system.

9. Regarding the sensor configuration of a humanoid robot, should one choose stereo cameras to cover both perception and positioning, or lidar or depth sensors to ensure it works in extreme situations? How should a startup building humanoid robots make this decision?

Gao Xiang: Regarding the sensor configuration of humanoid robots, the first step is to clarify the robot's specific functions and business goals, and then select sensors accordingly. For example, if a robot is designed specifically to grasp objects, its sensor configuration will be clear. If a general-purpose humanoid robot is to be built, sensor selection becomes more complicated and multiple possibilities must be considered.

In the design process, the role of product managers is very important. They need to understand the technology and cannot propose functional requirements just based on imagination, because these requirements may not be realized. The design of humanoid robots needs to consider specific business problems and functional goals, and then derive the required sensor configuration.

10. What do you think of Boston Dynamics’ operating methods and technical difficulties?

Gao Xiang: It has had a certain impact on the industry, but it has not found a good business model. Its technology path relied on heavy up-front investment, which makes commercialization difficult now. In autonomous driving, for example, companies like Boston Dynamics, and even Google and Baidu, took the approach of buying the best equipment available at the time and realizing every function regardless of cost. This makes the autonomous driving experience better, but the cost is so high that consumers find it hard to accept.

For the robotics industry, future development will become more complex: more motors, more complex joints, and richer information. In terms of autonomous locomotion, the hope is that robots can grasp automatically and walk on difficult terrain. If a company stays in businesses such as cleaning, logistics, or food delivery, the current form is relatively mature, and in the future it may reduce sensor costs and grow the market.

Realizing a robot with autonomous walking and grasping capabilities demands a high level of technology. The general direction is correct; who takes bigger steps and who takes smaller ones will affect the speed and certainty of getting there.

11. What are your views on the current situation and future development of the robotics industry?

Gao Xiang: Judging from the current situation, the robotics industry is developing rapidly and is already fairly large. Compared with two years ago, the technical level has improved, and progress in hard technology has grown the whole industry, which is a good trend. However, compared with autonomous driving, the robotics industry pays more attention to cost and actual product output.

At present, most teams are still researching and developing robot technology, which remains at the pilot stage. Compared with earlier forms such as robotaxis, there is still a big difference: the robotics industry is more pragmatic and must consider what problem to solve, at what cost, and how to sell the product.

The logic that robots can replace people is very solid: if the cost can be made low enough, they can indeed take over some human work. The robotics industry as a whole is worth watching because it is a practical business.


  About TechBeat Artificial Intelligence Community

TechBeat (www.techbeat.net) is affiliated with Jiangmen Venture Capital and is a growth community that gathers global Chinese AI elites.

We hope to create more professional services and experiences for AI talents, accelerate and accompany their learning and growth.

We look forward to this becoming a high ground for learning cutting-edge AI knowledge, a fertile ground for sharing your latest work, and a home base for leveling up on the road of AI advancement!

More detailed introduction >> TechBeat, a learning and growth community that gathers global Chinese AI elites


Origin blog.csdn.net/hanseywho/article/details/132496944