What is interactive perception? A comprehensive review of social interaction dynamic models and decision-making frontiers in autonomous driving!


Editor | Heart of Autonomous Driving

Foreword & the author's personal understanding

Interaction-aware autonomous driving (IAAD) is a rapidly growing research field focused on developing autonomous vehicles that can interact safely and efficiently with human road users. This is a challenging task because it requires autonomous vehicles to understand and predict the behavior of human road users. In this literature review, the authors survey the current state of IAAD research. The review begins with a survey of terminology, then focuses on the challenges and existing models of driver and pedestrian behavior. Next, a comprehensive review of techniques for interaction modeling is provided, covering cognitive methods, machine learning methods, and game-theoretic methods. The paper concludes by discussing the potential benefits and risks of IAAD, as well as key issues for future research.

Introduction to interactive perception

In recent years, advances in robotics and machine learning have driven growing interest in autonomous vehicle technology, enabling engineers to develop algorithms that can address the complexities of the driving task. Autonomous vehicles have the potential to improve traffic flow, reduce traffic accidents, and improve the quality of travel time. Today, more and more autonomous vehicles are being deployed in the real world, sharing the environment with human road users. This has raised concerns that self-driving cars may not be able to understand and interact smoothly with other human road users, potentially leading to traffic disruptions and safety issues. To operate efficiently and safely, autonomous vehicles need to behave in a human-like manner and generate behavior that accounts for interactions with surrounding human road users; this is essential to reduce potential traffic conflicts. For example, a cautious but unnecessary stop at an intersection can cause a rear-end collision. Developing fully autonomous vehicles requires progress in many areas, including perception, decision-making, planning, and control. Interaction with surrounding human road users is becoming increasingly important for predicting their behavior and making decisions for the autonomous vehicle accordingly, since the behavior of the autonomous vehicle influences the behavior of surrounding road users, and vice versa.

The purpose of this paper is to conduct a detailed survey of the state of the art in interaction-aware motion planning and decision-making for autonomous driving. Specifically, the text first covers models of human road user behavior to highlight the factors that influence their decisions on the road. Models of driver and pedestrian behavior matter for self-driving cars for two reasons. First, they can be used to assess and predict the behavior of road users around the autonomous vehicle. Second, they can help develop human-like autonomous vehicle behavior. They therefore have both predictive value and relevance for model and system design.

This review is divided into five main parts, covering different areas of interaction-aware autonomous driving. Section 2 introduces the terminology used in interaction-aware autonomous driving; see Figure 1 below for an overview of the paper structure. Section 3 covers human-factors research on the factors that affect human driving decisions, as well as research on pedestrian behavior. Section 4 provides a broad overview and classification of existing techniques for interaction modeling. Finally, Sections 5 and 6 cover state-of-the-art techniques for motion planning and decision-making in interactive scenarios.

Although autonomous driving has been an active research area in recent years, most research has focused on scenarios involving only vehicles. Relatively few works deal with heterogeneous scenes that include both vehicles and pedestrians. In this paper, the focus is on heterogeneous scenes, but Sections 5 and 6 will also cover related work on handling scenes without pedestrians. This is because the techniques used in these papers can be easily adapted to mixed traffic scenarios, or they can provide important insights into the general problem of dealing with mixed traffic scenarios.

[Figure 1: Overview of the paper structure]

Interaction-aware autonomous driving terminology

Before discussing recent advances in interaction-aware motion planning and decision-making, this article first defines some terms used in the field. In autonomous driving, the term "ego vehicle" refers to the specific vehicle being controlled and studied. All other vehicles, cyclists, pedestrians, etc. occupying the area around the ego vehicle are considered interactive obstacles and are called surrounding traffic participants (see Figure 2a below). Since road traffic is unlikely to become fully automated in the near future, autonomous vehicles will inevitably operate in a mixed environment with human road users (HRUs), such as human drivers and pedestrians. Interaction-aware autonomous driving is therefore a research area focused on developing autonomous vehicles that can safely and effectively interact with surrounding HRUs. Traditional autonomous driving methods usually treat surrounding HRUs as dynamic obstacles. However, this is not realistic, as HRUs constantly change their behavior to adapt to the current situation.

Often, multiple surrounding HRUs may come into conflict over shared space, either among themselves or with the ego vehicle: a space-sharing conflict is a situation in which it is reasonable to infer that two or more road users intend to occupy the same region of space in the near future (see Figure 2b). Road users involved in a conflict are said to exhibit interactive behaviour, meaning that they would behave differently if there were no space-sharing conflict. Interactions, however, do not necessarily involve conflict. They can also consist of explicit or implicit communication that signals road user intentions and influences other HRUs. For example, a driver can adapt their driving strategy based on the turn signal of the vehicle ahead so that the two vehicles do not end up in the same lane and do not come into conflict. Interactive behavior therefore refers to the different ways in which road users adapt to the behavior of others, or solicit responses from them, in order to achieve their goals. Since interactions occur continuously while driving, autonomous driving algorithms must understand the dynamics of interactions between road users. Such algorithms are called interaction-aware and are the focus of much recent autonomous driving research.

Currently, safe and socially acceptable interaction-aware autonomous driving systems face several challenges. One challenge is the lack of adequate theories of how HRUs interact. This is difficult because the theories needed go beyond predicting and modeling HRU behavior; they must also explain behavioral patterns and their underlying mechanisms. Integrating autonomous vehicles seamlessly into traffic, as humans do, will require more advanced behavioral theories and models. Another challenge is developing algorithms that interact safely and efficiently with other HRUs and produce autonomous vehicle behavior consistent with human norms. Figure 3 below shows the main modules of an autonomous vehicle system. Raw sensor data are processed by the perception module, which detects the surrounding environment and performs localization, enabling the generation of a global route plan that takes the ego vehicle to its destination. The scene can then be further interpreted, and predictions about surrounding traffic participants can be made. Interaction-aware models play an important role in prediction because road users influence each other's trajectories and decisions.

Decision-making and path planning are two of the most important tasks in autonomous driving. They determine how the vehicle moves through the environment. Decision-making is the process of choosing an action from a set of possible options; for example, a vehicle may need to decide whether to change lanes, slow down, or stop. Path planning is the process of generating safe and feasible trajectories that the vehicle can follow. The two tasks are closely related: the decision-making process typically outputs a high-level plan, such as "change lane left", and the path planning process then turns this plan into a detailed trajectory the vehicle can follow. Both tasks must take into account the vehicle's current position, its capabilities, and the surrounding traffic, which is why interaction-aware models are highly relevant to both. From a control-system perspective, the dynamics of a vehicle are represented by its states, i.e. position and orientation and their time derivatives. The state of the environment is determined by the states of all dynamic and static entities. The physical state space can also be augmented with latent variables that capture the intentions or behavioral preferences of surrounding road users, as estimated by a scene understanding system.
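
As a rough illustration of how such an augmented state could be organized in code (the class and field names below are hypothetical, not from the paper), a planner might carry physical states alongside inferred latent variables:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class PhysicalState:
    """Physical state of one traffic participant: pose and its time derivatives."""
    x: float        # position [m]
    y: float        # position [m]
    heading: float  # orientation [rad]
    vx: float       # velocity [m/s]
    vy: float       # velocity [m/s]

@dataclass
class AugmentedState:
    """Environment state for planning: physical states plus latent behavioral variables."""
    ego: PhysicalState
    others: Dict[str, PhysicalState] = field(default_factory=dict)
    # Latent variables estimated by scene understanding, e.g. an inferred intention
    # ("cross", "yield", "ignore") or a behavioral style in [0, 1] (passive .. aggressive).
    intentions: Dict[str, str] = field(default_factory=dict)
    styles: Dict[str, float] = field(default_factory=dict)

# Example: one pedestrian whose crossing intention has been inferred by the prediction module
state = AugmentedState(
    ego=PhysicalState(0.0, 0.0, 0.0, 10.0, 0.0),
    others={"ped_1": PhysicalState(30.0, 3.5, -1.57, 0.0, -1.2)},
    intentions={"ped_1": "cross"},
    styles={"ped_1": 0.4},
)
```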

[Figure 2: Ego vehicle, surrounding traffic participants, and space-sharing conflicts] [Figure 3: Main modules of an autonomous vehicle system]

Human Behavior Research and Interaction

This section synthesizes the results of empirical and modeling studies on the behavior of HRUs (human road users), including human drivers and pedestrians interacting with autonomous or conventional vehicles, especially from a communication perspective. The focus is on research involving on-road interactions, with the aim of uncovering insights that may support the development of interaction-aware autonomous vehicles. Research on the impact of macroscopic traffic conditions, such as routing, weather, or regulations, is beyond the scope of this paper.


Driver behavior research

Driver behavior models (DBMs) are used to predict and understand how drivers behave in different driving scenarios. These models can be used to improve the safety and efficiency of transportation systems and to aid the design of autonomous vehicles. Many factors may influence driving behavior, including individual characteristics (age, gender, personality, experience), environmental factors such as road and weather conditions, and social factors, including the driver's interactions with other HRUs. The focus here is on DBMs related to vehicle-pedestrian interaction.

The most common driver behavior models include:

  • Driver risk field (DRF) model: (Figure 4a below) this model describes how drivers perceive risk in different driving situations. The basic idea of the DRF model is that drivers make decisions based on their perception of risk. The results of [16] show that driving behavior can be described by a cost function that accounts for the effect of noise on human perception and action. Risk perception around autonomous vehicles was also analyzed in [19] using driving-simulator scenarios. A toy sketch of the risk-field idea appears after this list.

  • Theory-based models: (Figure 4b below) models of perception and cognition. Perception-based models describe driver behavior in terms of perceptual cues such as distance, vehicle speed, acceleration, visual expansion angle, and reaction time. Cognitive models describe the driver's internal psychological states and the motivations that regulate his or her behavior.

  • Data-driven models: (Figure 4c below) this group of methods relies on machine learning applied to naturalistic driving data to analyze driver behavior. Data-driven models can learn generative or discriminative models of human behavior to predict a driver's future decisions or preferred driving style. Model validation can be done by comparing predictions to real data and through human-in-the-loop simulation.
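
The toy sketch referred to in the first bullet: risk is represented as a field over positions around an obstacle, and a candidate path is scored against it. The Gaussian shape and all parameters are assumptions for illustration, not the formulation used in [16].

```python
import numpy as np

def driver_risk_field(xs, ys, obstacle_xy, sigma_long=8.0, sigma_lat=1.5):
    """Toy risk field: a 2D anisotropic Gaussian centered on an obstacle.

    xs, ys      -- 1D arrays of longitudinal/lateral positions [m] in the ego frame
    obstacle_xy -- (x, y) position of a surrounding road user
    Returns a risk value for every (x, y) grid cell.
    """
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    dx, dy = X - obstacle_xy[0], Y - obstacle_xy[1]
    return np.exp(-0.5 * ((dx / sigma_long) ** 2 + (dy / sigma_lat) ** 2))

xs = np.linspace(0.0, 60.0, 121)
ys = np.linspace(-4.0, 4.0, 33)
risk = driver_risk_field(xs, ys, obstacle_xy=(25.0, 1.0))

# Perceived risk along a straight candidate path (y = 0): sum the field over sampled points
path_cells = [(np.searchsorted(xs, x), np.searchsorted(ys, 0.0)) for x in range(0, 60, 5)]
path_cost = sum(risk[i, j] for i, j in path_cells)
print(f"accumulated risk along straight path: {path_cost:.3f}")
```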

[Figure 4: Driver behavior model categories: (a) driver risk field, (b) theory-based, (c) data-driven]

Existing research characterizes driver behavior in the presence of pedestrians through naturalistic driving data analysis. The authors of [24] found that drivers tend to maintain smaller minimum lateral gaps and lower overtaking speeds when overtaking pedestrians walking along the lane, when overtaking pedestrians walking in the opposite direction, or when oncoming traffic is present. The minimum lateral gap and time-to-collision were only weakly related to overtaking speed. Results in [25] showed that vehicle deceleration behavior is related to the initial time to collision (TTC), the driver's subjective judgment of pedestrian crossing intention, vehicle speed, pedestrian position, and crossing direction.

Less attention has been paid to multi-road-user settings where multiple vehicles and pedestrians interact. In [26], the authors developed a multi-road-user inverse reinforcement learning (IRL) framework based on data collected at intersections to simulate driver and pedestrian behavior at intersections. Overall, DBMs are a promising research area that could significantly improve the safety and efficiency of transportation systems. However, considerable work is still needed to develop and validate these models. Future research should focus on more comprehensive models that account for a wider range of factors, such as the driver's internal state, the environment, and interactions with other HRUs.

Pedestrian behavior research

Since pedestrians are considered the most vulnerable road users, lacking protective equipment and moving at slower speeds, investigating pedestrian behavior and autonomous vehicle-pedestrian interactions has clear relevance for safety and acceptability. Pedestrian behavior has been the subject of extensive research for decades. The emergence of autonomous vehicles has recently given rise to many new research questions about pedestrian behavior. Given the large amount of work in this area and the authors' goals, this section surveys the main studies rather than providing an exhaustive survey. The review covers research on pedestrian behavior interacting with vehicles from three perspectives: communication, theories and models of crossing behavior, and applications involving autonomous vehicles. The aim is to identify and summarize their value for the development of autonomous vehicles with interactive awareness capabilities.

Communication

In a dynamic traffic environment, road users intentionally or unintentionally signal information to each other through their movements and spatial cues, resulting in explicit and implicit communication. Findings agree that, because autonomous vehicles lack a human driver, their kinematics and signaling information have a significant impact on pedestrian road behavior. Research on identifying the key motion cues and signals that influence pedestrian road behavior is therefore of great significance (see Figure 5a below).

[Figure 5: (a) Communication cues between vehicles and pedestrians; (b) cognitive processes in pedestrian crossing behavior]

Implicit communication signals, such as a vehicle's motion cues, involve behavior that primarily serves the road user's own movement but can be interpreted by others as cues to intentions or future motion. The distance between an approaching vehicle and a pedestrian, and the TTC, are the most critical pieces of implicit information affecting pedestrian behavior. Evidence shows that pedestrians tend to rely more on distance than on TTC: for the same TTC, pedestrians cross more often when vehicles approach at higher speeds. Recent research shows that pedestrians use multiple sources of information from vehicle motion rather than relying on a single cue; the effects of speed, distance, and TTC on pedestrian behavior are coupled.

Braking behavior is another key piece of implicit information that affects pedestrians. A vehicle's braking profile is related to pedestrians' trust in the vehicle, their emotions, and their decision-making. When an approaching vehicle slows down early and brakes gently, pedestrians feel comfortable and begin to cross the street promptly, whereas sudden braking leads to avoidance behavior. An early braking onset and pronounced brake pitch reduce the time pedestrians need to understand the vehicle's intention. In contrast, a vehicle that approaches slowly while yielding may be harder for pedestrians to interpret.

Traffic characteristics, such as traffic volume and gap size, provide implicit information to pedestrians. High traffic volumes force pedestrians to accept smaller traffic gaps because of the increased time cost, increasing their propensity to take risks. However, there is substantial evidence that pedestrians who tend to wait are more cautious and less likely to accept risky gaps. The relationship between traffic volume and pedestrian crossing behavior is context dependent and may be affected by the size and sequence of gaps in traffic.

Additionally, pedestrian movement toward the road, curbside standing, and pedestrian head direction may convey critical implicit information to approaching vehicles. Pedestrians often assert their right of way by stepping onto the road or looking toward approaching vehicles.

Explicit communication signals involve road users conveying information to others without affecting their own movement or perception. A common example is a vehicle communicating with pedestrians via an external human-machine interface (eHMI). In the context of autonomous vehicles, where there is no human driver, the eHMI becomes important, and substantial evidence supports its benefits in pedestrian interactions with autonomous vehicles. Various eHMI prototypes have been proposed, such as headlights, light strips, and anthropomorphic symbols, but consensus on the best eHMI form and the message to be conveyed remains elusive.

Many studies have shown that eHMI performance depends on various factors. Pedestrians' familiarity with, trust in, and interpretation of the eHMI significantly affect how well it conveys information. For example, pedestrians understand conventional cues (flashing headlights) as a signal that the vehicle will give way better than they understand novel eHMIs (light strips). Pedestrians' excessive trust in the eHMI may also lead them to rely on it rather than on vehicle motion cues, which is dangerous if the eHMI fails. Messages addressed to the pedestrian, such as "OK TO CROSS", are more persuasive than messages describing the vehicle's own state, such as "STOPPING". In addition, eHMI reliability is affected by weather, lighting conditions, and vehicle behavior; in inclement weather, for example, pedestrians may not be able to read the eHMI display, and when a vehicle does not yield or brakes sharply, pedestrians' willingness to cross is not affected by the eHMI. Other concepts, such as mounting the eHMI on road infrastructure rather than on vehicles, or combining the eHMI with vehicle motion cues, may outperform the eHMI alone.

Additionally, although less commonly studied from the vehicle's perspective, pedestrians also use explicit signals to communicate with self-driving cars. These include eye contact and hand gestures, which pedestrians use to make sure the self-driving car has seen them and to request the right of way. To compensate for the absence of a human driver, self-driving cars can use human-like visual avatars in the driver's seat and wireless communication technology to enhance vehicle-pedestrian communication.

Theories and models of crossing behavior

Pedestrian crossing behavior involves various cognitive processes. Previous research suggests that crossing behavior in interactions can be structured into three levels of processes: perception, decision-making, and crossing initiation and movement. Based on this framing, the following sections synthesize theories and models of pedestrian crossing behavior for each of these three processes (see Figure 5b).

The theory of visual perception, laid down by Gibson, explains that as an object approaches an observer, its image on the retina expands, forming the basis of human collision perception. In a crossing scene, when the expansion rate of the vehicle's image on the retina reaches a certain threshold, pedestrians perceive that the vehicle is approaching, a phenomenon known as visual looming. A psychophysical model reduces this expansion rate to the rate of change of the visual angle subtended by the vehicle at the pedestrian's eye, denoted θ̇ (Fig. 6a). Recent research shows that pedestrians use θ̇ as a key visual cue for judging approaching vehicles. However, although θ̇ provides spatial information, it does not convey when the vehicle will reach the pedestrian's location. In a crossing scenario with a yielding vehicle, pedestrians need temporal information to estimate whether the vehicle can stop in time. Lee showed mathematically that the visual cue τ, defined as the ratio of θ to θ̇, approximates the TTC of an approaching vehicle. Furthermore, the first time derivative of τ, denoted τ̇, can be used to detect whether the current deceleration rate is sufficient to avoid a collision. In addition, studies have found that pedestrians may visually perceive an impending collision via the azimuth (bearing) angle, i.e. the angle between the vehicle and the pedestrian's line of gaze (Figure 6b).
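
The relationships among θ, θ̇, and τ can be made concrete with a short numerical sketch (the vehicle width, speed, and distance below are arbitrary illustrative values):

```python
import numpy as np

def visual_angle(distance, width=1.8):
    """Optical angle subtended at the pedestrian's eye by a vehicle of the given width [rad]."""
    return 2.0 * np.arctan(width / (2.0 * distance))

# Vehicle approaching at a constant 10 m/s from 50 m away
t = np.linspace(0.0, 4.0, 401)
d = 50.0 - 10.0 * t                 # distance to the pedestrian [m]
theta = visual_angle(d)             # visual angle θ
theta_dot = np.gradient(theta, t)   # θ̇: retinal expansion rate
tau = theta / theta_dot             # τ = θ / θ̇, an estimate of time-to-collision

i = 100                             # t = 1 s, true TTC = 40 m / 10 m/s = 4 s
print(f"θ = {theta[i]:.4f} rad, θ̇ = {theta_dot[i]:.5f} rad/s, τ = {tau[i]:.2f} s")
```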

[Figure 6: (a) visual angle θ and its expansion rate θ̇; (b) azimuth angle; (c) ANN-based crossing decision model; (d) speed-distance classification of crossing decisions; (e) evidence-accumulation model]

In addition to visual cues, pedestrian perception may depend on perceptual strategies. The study by Tian et al. suggests that pedestrian estimation of vehicle behavior may be a separate process or a sub-process of crossing decision-making. When the traffic gap is large, pedestrians tend to rely less on vehicle driving behavior and more on the gap size. Likewise, Delucia points out that humans tend to use 'heuristic' visual cues such as θ and θ̇ when a collision event is far away. However, as a collision becomes imminent, optical invariants such as τ dominate perception, providing richer spatiotemporal information.

In addition to the sensing mechanism, various factors may affect pedestrian perception. Research shows that older and child pedestrians face a higher crash risk due to age-related perceptual limitations. Older pedestrians tend to rely more on distance than on TTC to judge approaching vehicles, while children have difficulty detecting vehicles approaching at higher speeds. Distractions, especially those involving visual and manual components (such as using a smartphone), divert considerable attentional resources and interfere with pedestrians' ability to observe traffic. In contrast, cognitive distraction, such as listening to music, may not significantly affect pedestrian perception.

Decision-making. At uncontrolled intersections without signals, pedestrians often interact with vehicles that may or may not yield. When vehicles do not yield, pedestrians typically make crossing decisions by evaluating the gaps between approaching vehicles, known as gap acceptance (GA) behavior. This concept led to the development of critical gap models, including Raff's model, the HCM2010 model, and Rasouli's model. In addition, other approaches treat the crossing decision as a binary variable and apply machine learning algorithms such as artificial neural networks (ANN), support vector machines (SVM), and logistic regression (LR). For example, Kadali et al. used an ANN to predict crossing decisions from various independent variables (Fig. 6c), while Sun et al. used LR with variables such as pedestrian age, gender, group size, and vehicle type.
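
A minimal sketch of such a binary crossing-decision model, trained here on a synthetic toy dataset rather than real observations (the features and coefficients are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic observations: time gap [s], vehicle speed [m/s], pedestrian age [years]
n = 500
gap = rng.uniform(1.0, 10.0, n)
speed = rng.uniform(5.0, 20.0, n)
age = rng.uniform(10.0, 80.0, n)

# Assumed ground-truth tendency: larger gaps encourage crossing, higher speed discourages it
logit = 1.5 * (gap - 4.0) - 0.15 * (speed - 12.0) - 0.01 * (age - 40.0)
crossed = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X = np.column_stack([gap, speed, age])
model = LogisticRegression().fit(X, crossed)

# Predicted probability that a 30-year-old pedestrian accepts a 3 s gap at 15 m/s
p = model.predict_proba([[3.0, 15.0, 30.0]])[0, 1]
print(f"P(cross) = {p:.2f}")
```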

In scenarios involving yielding vehicles, crossing decisions tend to follow a bimodal pattern known as bimodal crossing (BC) behavior: pedestrians are more likely to cross either when the traffic gap is large enough or when the vehicle is clearly about to stop. Decision-making in this situation is challenging because of the opposing relationship between the decision cues and collision risk, which is negatively correlated with the traffic gap and positively correlated with vehicle speed. Zhu et al. classified crossing decisions into three groups based on vehicle speed and distance: crossing, dilemma conditions, and waiting (Fig. 6d). Furthermore, Tian et al. hypothesized that pedestrians adopt different decision-making strategies underlying BC behavior and modeled crossing decisions as responses to different visual cues.

While the above methods model crossing decisions from observed behavioral patterns, other models delve into the psychological mechanisms underpinning these decisions. Tian et al. modeled pedestrians' GA behavior based on visual cues and extended it to yielding scenarios with more complex visual-perception mechanisms. Wang et al. used a reinforcement learning (RL) model to capture pedestrian crossing behavior under limited sensing. Furthermore, one class of models, evidence accumulation (EA) models such as the drift-diffusion model, proposes that crossing decisions result from the accumulation of noisy visual evidence: once a threshold is reached, the decision is made. By integrating established psychological theory, such models explain pedestrian crossing decisions in detail (Fig. 6e). Additionally, game theory has been applied to model crossing decisions when pedestrians and vehicles negotiate the right of way; traditional game theory, the Sequential Chicken (SC) game, and the Double Accumulator (DA) game have been used to characterize dynamic crossing decisions.
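
A minimal simulation of the evidence-accumulation idea behind such models (drift rate, noise level, and threshold are arbitrary; this is not a calibrated pedestrian model):

```python
import numpy as np

def simulate_crossing_decision(drift, noise=1.0, threshold=1.0, dt=0.01, t_max=10.0, seed=0):
    """Drift-diffusion sketch: accumulate noisy evidence until one of two boundaries is hit.

    Returns (decision_time [s], decision) where decision is "cross", "wait",
    or None if no boundary was reached within t_max.
    """
    rng = np.random.default_rng(seed)
    evidence, t = 0.0, 0.0
    while t < t_max:
        evidence += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if evidence >= threshold:
            return t, "cross"
        if evidence <= -threshold:
            return t, "wait"
    return t_max, None

# Stronger evidence (e.g. a clearly yielding vehicle) leads to faster decisions on average
for drift in (0.2, 0.8):
    times = [simulate_crossing_decision(drift, seed=s)[0] for s in range(200)]
    print(f"drift = {drift}: mean decision time = {np.mean(times):.2f} s")
```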

The diversity of environments and the heterogeneity of pedestrians further complicate crossing-decision modeling. For example, crossing multiple lanes often involves pedestrians waiting at lane lines and accepting traffic gaps one lane at a time, known as rolling gap behavior. Pedestrians waiting at a lane line may be more likely to accept a small traffic gap, whereas pedestrians waiting at the curb may be less likely to do so. Another complex scenario is crossing a two-way road, which is both physically and cognitively demanding because pedestrians must consider vehicles from both directions. Likewise, crossing dense continuous traffic at intersections is challenging, as pedestrians need to anticipate gaps upstream in the traffic and trade off safety against time efficiency. It is generally believed that as waiting time increases, pedestrians tend to accept riskier crossing opportunities; however, recent evidence suggests that pedestrians who are inclined to wait are more cautious and less likely to accept risky gaps. Regarding pedestrian heterogeneity, ANN and LR models have been applied to characterize the influence of age on crossing decisions. Distractions, such as cell phone use, may also affect crossing decisions, and ANNs have been applied to model the impact of phone use. In addition, pedestrians often cross the road in groups, exhibiting group behavior, described as the tendency of group members to maintain a certain distance from the center of the group. EA models have been used to characterize information cascades in group decision-making, accounting for the influence of previous road users' decisions.

Initiation and movement. Crossing initiation time (CIT) represents the time pedestrians need to initiate a crossing and reflects the dynamic nature of their decision-making. Generally, CIT is the interval between the moment a crossing opportunity becomes available and the moment the pedestrian begins to move. Drift-diffusion theory holds that CIT reflects the accumulation of noisy evidence in the cognitive system and thus the efficiency of pedestrians' cognitive and motor systems. Various factors may affect CIT, including vehicle motion, age, gender, and distraction. Faced with higher vehicle speeds, pedestrians tend to initiate crossings more slowly. Additionally, female pedestrians tend to start crossing faster than men, and older people tend to start crossing earlier than younger pedestrians. The impact of distraction depends on its components.

When a pedestrian faces a vehicle that does not yield, the collision risk increases as the distance between vehicle and pedestrian decreases, so pedestrians often make quick decisions by evaluating "snapshots" of the approaching vehicle. In these cases, the distribution of CIT is usually unimodal and right-skewed. Response time models, such as the ex-Gaussian (exponentially modified Gaussian) model and the shifted Wald (SW) distribution, are used to model CIT under these conditions; for example, CIT has been modeled as a variable following an SW distribution (Fig. 7a below).

In vehicle-yielding scenarios, as noted above, CIT exhibits a bimodal distribution. For the early-crossing group, the distribution is similar to that in the no-yield scenario because pedestrians adopt a similar decision strategy. For the late group, however, the distribution is more complex and cannot be described by a standard response-time distribution. EA models with time-varying evidence have been proposed to address this, allowing CIT distributions of complex shape to be generated (Fig. 7b below). The CIT in yielding scenarios has also been modeled using a joint distribution of response-time models, and an RL model has been applied to learn pedestrians' crossing initiation patterns.

After pedestrians initiate a crossing, they need to cross the road. Walking is a key part of crossing behavior and is influenced by many factors such as the presence of nearby vehicles, infrastructure, pedestrian age and distraction. Pedestrians adjust their walking trajectories to avoid vehicles. In multi-lane crossings, they tend to move to and wait on the lane lines, accepting traffic gaps in each lane in turn. When crossing, pedestrians often walk faster than normal walking speeds in other scenarios. Although gender did not have a significant effect on walking speed, adolescents and older adults walked slower. Distractions, such as cell phone use, may slow pedestrians down.

Walking behavior can be simulated using microscopic pedestrian motion models, including cellular automata (CA) models, social force (SF) models, and learning-based methods. CA models are discrete in space, time, and state, making them well suited to simulating complex dynamic systems such as pedestrian-vehicle interactions. SF models, based on Newton's second law, have been used to simulate pedestrian-vehicle interactions and large-scale pedestrian flows (Fig. 7c below), as well as the crossing behavior of pedestrian groups in complex interaction scenarios involving low-speed vehicles.
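
A minimal sketch of an SF-style update for a single pedestrian (the force terms and coefficients are illustrative assumptions, not a calibrated model):

```python
import numpy as np

def social_force_step(pos, vel, goal, vehicle_pos, dt=0.1,
                      desired_speed=1.4, relax_time=0.5, A=3.0, B=2.0):
    """One Euler step of a toy social-force model (Newton's second law, unit mass).

    A driving force relaxes the pedestrian's velocity toward the goal at the desired
    speed; a repulsive force pushes the pedestrian away from a nearby vehicle.
    """
    to_goal = goal - pos
    desired_vel = desired_speed * to_goal / (np.linalg.norm(to_goal) + 1e-9)
    f_drive = (desired_vel - vel) / relax_time

    away = pos - vehicle_pos
    dist = np.linalg.norm(away) + 1e-9
    f_repulse = A * np.exp(-dist / B) * away / dist   # exponential repulsion from the vehicle

    acc = f_drive + f_repulse                          # unit mass: a = F
    vel = vel + acc * dt
    pos = pos + vel * dt
    return pos, vel

pos, vel = np.array([0.0, 0.0]), np.array([0.0, 0.0])
goal, car = np.array([0.0, 7.0]), np.array([5.0, 3.5])
for _ in range(50):                                    # simulate 5 s of crossing
    pos, vel = social_force_step(pos, vel, goal, car)
print("pedestrian position after 5 s:", np.round(pos, 2))
```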

In contrast to the white-box models above, there are also black-box, learning-based models that learn pedestrian walking behavior from naturalistic datasets or predefined environments. For example, artificial neural networks (ANNs) have been used to learn pedestrian walking behavior from the relative spatial and motion relationships between pedestrians and other objects extracted from video. In another approach, the output of an SF model is fed into an ANN to simulate a variety of pedestrian walking behaviors. A long short-term memory (LSTM) pedestrian trajectory prediction model has also been proposed (Figure 7d below). In addition, RL and IRL models have been used: an RL model was applied to learn the walking behavior of multiple pedestrians in an SF environment, and an IRL model was developed to learn pedestrian walking behavior from a video dataset.
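
As a rough PyTorch sketch of an LSTM trajectory predictor of this general kind (the architecture and sizes are assumptions for illustration, not the model proposed in the cited work):

```python
import torch
import torch.nn as nn

class TrajectoryLSTM(nn.Module):
    """Encode an observed (x, y) history and predict the next `horizon` displacements."""

    def __init__(self, hidden=64, horizon=12):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon * 2)

    def forward(self, history):                # history: (batch, T_obs, 2)
        _, (h, _) = self.encoder(history)      # final hidden state summarizes the history
        out = self.head(h[-1])                 # (batch, horizon * 2)
        return out.view(-1, self.horizon, 2)   # predicted future (x, y) offsets

# Toy usage: 8 observed positions per pedestrian, predict 12 future steps
model = TrajectoryLSTM()
obs = torch.randn(4, 8, 2)                     # batch of 4 pedestrians
pred = model(obs)
print(pred.shape)                              # torch.Size([4, 12, 2])
```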

[Figure 7: (a) shifted-Wald model of crossing initiation time; (b) evidence-accumulation model with time-varying evidence; (c) social force model of pedestrian-vehicle interaction; (d) LSTM pedestrian trajectory prediction model]
Applications involving self-driving cars

In recent years, interest in studying the interaction between autonomous vehicles and pedestrians has gradually grown. This interest has led to a large number of studies applying theories and models of pedestrian crossing behavior to enhance or evaluate the performance of autonomous vehicles in these interactions (Table 2 below).

[Table 2: Applications of pedestrian behavior theories and models in autonomous vehicle research]

A common approach is to use learning-based methods that learn pedestrian intentions and trajectories from real-world datasets to aid autonomous vehicle decision-making. For example, a graph convolutional neural network based pedestrian trajectory prediction model has been proposed that uses past pedestrian trajectories to predict deterministic and probabilistic future trajectories for autonomous vehicle use cases. Other models aim to improve prediction accuracy by taking the social context of interactions into account; for example, an LSTM pedestrian trajectory prediction model has been proposed that uses past trajectories, pedestrian head orientation, and the distance to approaching vehicles as inputs. There are also studies aimed at predicting pedestrian crossing intentions, applying SVM, LSTM, and ANN models for this purpose.

Learning methods have proven effective at predicting pedestrian trajectories and intentions. However, these models require large amounts of data to achieve strong performance and struggle with interaction cases for which data are scarce. In addition, their black-box nature makes the generated trajectories and intentions difficult to interpret, which is a challenge for modeling decision-making in autonomous vehicles. To address these problems, expert models have been developed. For example, the SF model has been modified to predict pedestrian trajectories for autonomous vehicles by incorporating additional interaction details such as TTC and the interaction angle between vehicle and pedestrian. SF and CA models have also been embedded in autonomous vehicle decision-making modules to represent pedestrian crossing behavior and guide the vehicle's decisions in interactions with pedestrians.

Crossing decision models have also been applied in autonomous vehicle research. For example, a critical gap model has been adopted to characterize pedestrian crossing decisions within an autonomous vehicle's decision module, and a speed-distance model has been used to design defensive and competitive interaction behaviors for autonomous vehicles. An LR model has been used as the pedestrian crossing-decision model in a proposed self-driving-car decision-making module. To capture the dynamic and interactive nature of crossing decisions, game-theoretic models have also been used to simulate crossing decisions when pedestrians negotiate the right of way with an autonomous vehicle. Researchers have also tried to use pedestrian perception theories and models to design decision-making strategies for autonomous vehicles; for example, coupled autonomous vehicle-pedestrian behavior has been simulated using control laws based on the visual cue τ and the azimuth angle, and the yielding behavior of autonomous vehicles and pedestrians has been modeled using azimuth angles.

Interaction modeling

Interaction modeling techniques are critical for a variety of autonomous driving tasks, from traffic prediction to planning and decision-making. Understanding and modeling social interactions in autonomous driving is essential for predicting scene dynamics and ensuring safe behavior: accurate predictions improve safety, while misunderstood behavior can lead to accidents. Understanding the social impact of the ego vehicle's behavior can also be used to influence surrounding traffic, for example by stopping early to encourage pedestrians to cross. Since interaction modeling techniques can be applied across task domains, the authors categorize them by modeling technique rather than by the specific driving task for which they were designed.

First, a distinction can be made between learning methods and model-based methods. Extensive research has been conducted in the field of autonomous driving, utilizing machine learning and deep learning techniques. In learning methods, models are learned from large data sets. This family of methods does not require any prior knowledge of the system. Data-driven methods are trained on example data sets and then used to make predictions or decisions. In contrast, model-based approaches start with a theoretical understanding of the system. This prior knowledge is used to create a mathematical model of the system. Empirical data are then used to validate the model or adjust its parameters to minimize the difference between the model predictions and the data.

Another distinction is based on whether an approach explicitly exploits cognitive features of the human mind to explain human behavior, or only implicitly simulates interactions by mapping environmental inputs to decisions or behaviors. The human behavior studies presented in Section 3 can guide the development of explicit methods. For example, game-theoretic approaches are explicit, viewing traffic participants as rational agents who actively consider each other's actions. As an example of a non-cognitive approach, the social force approach takes a more empirical perspective, capturing the influence of participants on each other's behavior without explicitly describing the reasoning processes of road users during interactions. The authors therefore propose to differentiate existing modeling approaches based on whether they simulate interactions explicitly or implicitly.

Based on these two criteria, the authors identified four major interaction modeling categories, which are shown in Figure 8 below.

[Figure 8: The four categories of interaction modeling approaches]

Learning-based implicit methods

These methods rely on machine learning or deep learning techniques. Interactions are modeled implicitly, which means that road user behavior cannot be explained by the model. The model only learns input-output mappings from the data. Model learning can be achieved by leveraging interactive model architectures. Generally speaking, deep learning methods that use neural network architectures specialized for interaction fall into this category.

In this type of approach, the goal is to learn a probabilistic generative model that predicts the future behavior of a road user. The model is a probability distribution conditional on the state of the environment x, which includes the states of surrounding road users, and a set of learnable parameters θ.

y ~ p(y | x; θ),    (1)

where y denotes the road user's future behavior.
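
One minimal way to realize such a conditional distribution (purely illustrative, not a specific architecture from the survey) is a network that outputs the mean and variance of a Gaussian over the next displacement and is trained with the negative log-likelihood of observed behavior:

```python
import torch
import torch.nn as nn

class GaussianPredictor(nn.Module):
    """p(y | x; θ): a diagonal Gaussian over the next 2D displacement y, conditioned on x."""

    def __init__(self, state_dim=16, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 4))   # [mean_x, mean_y, logvar_x, logvar_y]

    def forward(self, x):
        out = self.net(x)
        return out[:, :2], out[:, 2:]                    # mean, log-variance

def nll_loss(mean, logvar, y):
    """Negative log-likelihood of y under the predicted Gaussian (up to a constant)."""
    return 0.5 * (logvar + (y - mean) ** 2 / logvar.exp()).sum(dim=1).mean()

model = GaussianPredictor()
x = torch.randn(32, 16)          # encoded environment state (ego + surrounding road users)
y = torch.randn(32, 2)           # observed future displacement of the target road user
mean, logvar = model(x)
loss = nll_loss(mean, logvar, y)
loss.backward()                  # θ is adjusted so the model explains the observed behavior
```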

Learning-based methods with cognitive features

These methods rely on explicitly handcrafted interaction features that are used as input to the learning system. Such interaction features can include time-to-collision (TTC), relative distances, and similar quantities reflecting parts of the cognitive process behind human reasoning. For example, an LSTM exploiting inter-vehicle interactions has been developed to classify the lane-changing intentions of surrounding vehicles; its interaction features consist of a risk matrix containing the worst-case TTC and relative distance of vehicles in the surrounding lanes. Graph convolutional networks also fall into this category, because interaction features can be explicitly modeled in the adjacency matrix of the graph.
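
A sketch of how such handcrafted interaction features might be computed before being fed to a learning model; the per-lane worst-case matrix below is an illustrative simplification of the risk-matrix idea, not the cited paper's exact definition:

```python
import numpy as np

def interaction_features(ego, others):
    """Per-lane worst-case TTC and relative distance with respect to the ego vehicle.

    ego    -- dict with 's' (longitudinal position [m]), 'v' (speed [m/s]), 'lane'
    others -- list of dicts with the same keys
    Returns a (3, 2) matrix: rows = adjacent/same/adjacent lane, columns = [min TTC, min distance].
    """
    risk = np.full((3, 2), np.inf)
    for veh in others:
        lane_idx = veh["lane"] - ego["lane"] + 1          # relative lane -1/0/+1 -> row 0/1/2
        if lane_idx not in (0, 1, 2):
            continue
        gap = veh["s"] - ego["s"]
        closing = ego["v"] - veh["v"] if gap > 0 else veh["v"] - ego["v"]
        ttc = abs(gap) / closing if closing > 0 else np.inf
        risk[lane_idx, 0] = min(risk[lane_idx, 0], ttc)
        risk[lane_idx, 1] = min(risk[lane_idx, 1], abs(gap))
    return risk

ego = {"s": 0.0, "v": 25.0, "lane": 1}
others = [{"s": 40.0, "v": 20.0, "lane": 1}, {"s": -15.0, "v": 30.0, "lane": 0}]
print(interaction_features(ego, others))   # this matrix would be one input I(x) to the model
```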

In this type of approach, the goal is again to learn a probabilistic generative model that predicts the future behavior of road users, similar to (1). In this case, the probability distribution is conditioned on the environment state x and on explicitly handcrafted interaction features I(x).

y ~ p(y | x, I(x); θ)    (2)

Model-based non-cognitive approaches

Modeling in these approaches is non-cognitive in the sense that the models do not explicitly reason about the cognitive processes underlying road user behavior. This group includes social force and potential field methods, where interactions are described by potential functions (or social forces) containing a set of parameters that can be tuned from empirical data. Another group comprises driving-risk-field methods, based on the assumption that driver behavior is driven by a risk-based field. The advantage of model-based non-cognitive methods is that they are easy to interpret and can embed domain knowledge, such as traffic rules and scene context. Some models define a potential field and take road user actions to be proportional to the gradient of this field:

a ∝ −∇F(x)    (3)

Otherwise, the force can be modeled directly as a = F(x), eliminating the need for the gradient operation.
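
As a small numerical illustration of the field-gradient idea (the scalar field below, combining goal attraction and obstacle repulsion, is an assumed toy example):

```python
import numpy as np

def field(pos, goal, obstacle, k_goal=0.05, k_obs=5.0):
    """Toy scalar field F(x): quadratic attraction to the goal plus repulsion from an obstacle."""
    return (k_goal * np.sum((pos - goal) ** 2)
            + k_obs / (np.linalg.norm(pos - obstacle) + 1e-3))

def action_from_field(pos, goal, obstacle, eps=1e-4):
    """Acceleration command proportional to the negative numerical gradient of F."""
    grad = np.zeros(2)
    for i in range(2):
        step = np.zeros(2)
        step[i] = eps
        grad[i] = (field(pos + step, goal, obstacle)
                   - field(pos - step, goal, obstacle)) / (2 * eps)
    return -grad                                  # a ∝ -∇F(x): move downhill in the field

pos = np.array([0.0, 0.0])
print(action_from_field(pos, goal=np.array([50.0, 0.0]), obstacle=np.array([10.0, 1.0])))
```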

Model-based cognitive approaches

Model-based cognitive approaches describe the reasoning processes behind human decision-making. Two main categories of methods can be distinguished: utility maximization models and cognitive models.

In utility maximization approaches, humans are modeled as optimizers, choosing their actions to maximize their future utility.

a* = argmax_a E[U(x, a)]    (4)

These methods include game theory and Markov decision processes (MDPs). In game-theoretic approaches, road users are modeled as players who compete or cooperate with each other and thus take into account how the others will react. The game-theoretic framework provides a transparent and unambiguous way to model dynamic interactions between human drivers, allowing a clear explanation of the decision-making process. However, its computational complexity grows quickly with the number of road users, making computational tractability difficult. Another option is to model human road users as agents in MDPs, which provide an excellent framework for modeling decisions whose outcomes depend both on chance and on the decision maker. MDPs can be solved with learning methods, such as DRL algorithms or Monte Carlo tree search, or with dynamic programming techniques.

The second group of methods aims to capture the behavioral motivations behind road user behavior using psychological cognitive processes. This set of methods can include:

  • Stimulus-response models, in which the behavior of a driver or pedestrian depends on visual stimuli on the retina;

  • Evidence accumulation models, in which decisions are described as the result of accumulated evidence;

  • Theory of mind, which suggests that humans use their understanding of the thoughts and actions of others to make decisions. By predicting the actions of others and inferring their knowledge, humans can drive efficiently and safely.


In the following sections, each category of interaction modeling is analyzed in more detail. Cognitive and non-cognitive learning-based methods are discussed in the next section. Model-based approaches, which include social forces and potential fields, driving risk field models, theory of mind, stimulus-response models, and evidence accumulation models, were discussed above. Later sections cover utility-based methods, including MDPs and game theory.

Learning-based approaches

Machine learning (ML) methods are widely used across autonomous driving tasks, including object detection, scene understanding, path planning, and control. By learning from large amounts of data, ML methods can learn to make more accurate and efficient decisions than humans. This section covers the implicit and explicit learning-based methods identified in the previous section and presents related papers in more detail. Figure 9 below shows an overview of some learning-based methods.

[Figure 9: Overview of learning-based methods]

Thanks to recent advances in neural network representation learning, it is now possible to use end-to-end driving approaches that take raw sensor data as input and output control commands such as steering and throttle, thereby addressing path planning and control together. However, learning the entire driving task from high-dimensional raw perception data (e.g., LiDAR point clouds, camera images) is challenging because it requires learning perception and decision-making simultaneously. In most works, the process of learning how to act assumes that scene representations are already available to the motion planning and decision-making modules. In practice, this splits end-to-end driving into two main modules: one in which the self-driving car learns to see, and another in which it learns to act.

There are two main approaches to end-to-end autonomous driving planning and control tasks (learning how to act):

  • Imitation learning: the agent learns to imitate the behavior of an expert.

  • Deep reinforcement learning (DRL): the agent learns how to act through trial and error in a simulated environment. DRL methods are analyzed in more detail in later sections.

Imitation learning is a machine learning paradigm in which an agent learns to perform a task by imitating the behavior of an expert demonstrator, making it a valuable method for training autonomous systems and robots. In [151], interaction features are learned with a graph attention network (GAT); the inputs to the network include kinematic information about surrounding road users and feature vectors encoding a bird's-eye-view scene representation, and the model is trained on synthetic data generated by expert drivers in the CARLA simulator. Imitation learning methods often perform well in scenarios similar to the training data but tend to fail when the scenario deviates from the training distribution. Algorithms like Dataset Aggregation (DAgger) can improve imitation learning policies by adding human-labeled data for unseen situations; however, asking experts to label new training samples can be expensive and infeasible.

Deep neural networks have been widely used for scene understanding and motion prediction. The authors of [127] proposed a social pooling operation in their neural network architecture to account for surrounding neighbors in crowd motion prediction. Similarly, a star-topology network with a max-pooling operation has been used to capture interaction features in multi-agent prediction. CIDNN uses LSTMs to track the movement of each pedestrian in a crowd and weights each pedestrian's motion features by their proximity to the target pedestrian for location prediction. The study in [129] created a dataset and proposed a framework named VP-LSTM to predict the trajectories of vehicles and pedestrians in crowded mixed scenarios, using different LSTM architectures for heterogeneous road users. Generative adversarial networks (GANs) were applied in [130] to generate plausible predictions for any road user in the scene. The common feature of these methods is the use of recurrent neural networks combined with pooling operations to capture spatiotemporal interaction features: during social pooling, the hidden states of surrounding road users become features used to predict the current road user's motion. Diffusion models are another group of deep learning techniques gaining popularity for modeling spatiotemporal trajectories and can be used to predict pedestrian and vehicle trajectories.

Graph convolutional networks (GCNs) have been widely used in trajectory prediction tasks with interacting road users. In these methods, the road structure is represented as a graph, where each node represents a traffic actor. Each node can carry information such as the category of the traffic participant (car, truck, pedestrian, etc.), its location or speed. Explicit interactions can be modeled in the adjacency matrix of the graph, while the implicit part consists of graph convolutional layers. GCNs are widely used in traffic prediction and more recently in combination with DRL in motion planning.

Other machine learning techniques that can be used to model interactions include Gaussian processes and probabilistic graphical models, including hidden Markov models.

Utility-based approaches

Utility-based agents use utility functions to guide decision-making, assigning values to possible world states and selecting the action that yields the highest utility. Unlike goal-based agents, which evaluate states only in terms of goal satisfaction, utility-based agents can handle multiple goals and take into account the probabilities and costs of actions. Utility-based methods include Markov decision processes (MDPs) and game-theoretic models.

Markov decision process

An MDP is a mathematical framework for modeling decision problems in which the outcome is partly random and partly controlled by the decision maker. The MDP modeling framework is shown in Figure 10 below. There are two main approaches to solving MDP problems: dynamic programming and reinforcement learning. In general, the latter is more suitable for autonomous driving because it scales better to high-dimensional state spaces.

[Figure 10: The MDP modeling framework]
Reinforcement learning

Reinforcement learning (RL) uses the MDP formalism to model complex environments and comprises a family of algorithms for learning policies that maximize expected reward. Traditionally, dynamic programming has been a reliable way to achieve this, iteratively computing the value of each state by working backward from terminal states. This works well when the state space is small, but it becomes computationally burdensome for RL problems with large state spaces, such as autonomous driving. More commonly, RL combined with deep neural networks (DRL) is used. DRL algorithms can outperform dynamic programming in sample efficiency and scalability, but they are also more complex and harder to train. For more detailed treatments of DRL applied to autonomous driving, the reader is referred to dedicated surveys.
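
A minimal value-iteration sketch over a tiny tabular MDP illustrates the dynamic-programming idea (the MDP itself is a made-up example; real driving problems are far too large for this tabular treatment, which is why DRL is used instead):

```python
import numpy as np

# Toy MDP: 4 states, 2 actions. P[s, a, s'] = transition probability, R[s, a] = reward.
n_states, n_actions, gamma = 4, 2, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # random but valid transitions
R = rng.uniform(-1.0, 1.0, size=(n_states, n_actions))

V = np.zeros(n_states)
for _ in range(1000):                    # Bellman optimality backups until convergence
    Q = R + gamma * P @ V                # Q[s, a] = R[s, a] + γ Σ_s' P[s, a, s'] V[s']
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)                # greedy policy with respect to the converged values
print("optimal values:", np.round(V, 3), "policy:", policy)
```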

DRL solutions in autonomous driving can be classified according to the scenario addressed, the state-space representation, the action space, and the algorithm used. The state representations commonly used in DRL are shown in Figure 11 below:

  • Vector-based representation: In this representation type, information about surrounding vehicles, such as position and velocity, is contained in fixed-length vectors;

  • Bird's Eye View (BEV): A 2D image representation of the surrounding environment of the vehicle from a top perspective;

  • Occupancy grid representation: similar to the BEV image, this is a discrete representation of the environment around the ego vehicle: a 2D or 3D grid of cells, with each cell assigned a probability of being occupied by an obstacle, along with segmentation information about the type of entity occupying the cell.

  • Graph representation: This represents the state of the environment around the autonomous vehicle as a graph. Nodes represent objects in the environment, such as vehicles, pedestrians, and traffic lights, while edges represent relationships between objects, such as distance or the likelihood of a potential collision. Graph representations are compact and efficient and are a promising way to represent environmental states (a minimal sketch of building such a graph state follows this list).
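
A minimal sketch of building such a graph state (the node features and the distance-based edge rule are illustrative assumptions):

```python
import numpy as np

def build_graph_state(actors, edge_radius=30.0):
    """Build node features and an adjacency matrix from a list of traffic actors.

    actors -- list of dicts with 'x', 'y', 'vx', 'vy', 'type' (0 = ego, 1 = vehicle, 2 = pedestrian)
    Nodes carry kinematics and type; edges connect actors closer than `edge_radius` metres.
    """
    n = len(actors)
    nodes = np.array([[a["x"], a["y"], a["vx"], a["vy"], a["type"]] for a in actors])
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.hypot(actors[i]["x"] - actors[j]["x"], actors[i]["y"] - actors[j]["y"])
            if dist < edge_radius:
                adj[i, j] = adj[j, i] = 1.0
    return nodes, adj

actors = [
    {"x": 0.0, "y": 0.0, "vx": 10.0, "vy": 0.0, "type": 0},   # ego vehicle
    {"x": 20.0, "y": 3.5, "vx": 8.0, "vy": 0.0, "type": 1},   # nearby vehicle
    {"x": 25.0, "y": -2.0, "vx": 0.0, "vy": 1.3, "type": 2},  # crossing pedestrian
]
nodes, adj = build_graph_state(actors)
print(nodes.shape)   # (3, 5) node-feature matrix
print(adj)           # adjacency matrix consumed by a graph network or GCN
```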

[Figure 11: State representations used in DRL]

Vector-based representations describe objects compactly and efficiently, but they sacrifice traffic information by restricting the input to a fixed-size subset of surrounding vehicles. BEV images and occupancy grids provide a simple, fixed-size representation of the environment that can be updated easily, but they may be inaccurate in highly cluttered or uncertain environments. Graph representations can express relationships between road users compactly; on the other hand, updating the graph can become complex and computationally expensive as the number of surrounding road users grows.

The action space can be continuous or discrete. Continuous actions usually include the ego vehicle's longitudinal acceleration and steering angle. Discrete actions typically depend on the specific task being solved; in a lane-change scenario, for example, the discrete actions are changing lanes to the left, staying in the current lane, or changing lanes to the right, with lower-level controllers regulating steering and acceleration to execute the chosen action.

Although most DRL papers focus on vehicle-only traffic scenarios, a smaller number deal with mixed traffic or vehicle-pedestrian interactions. Some research addresses crowd navigation for mobile robots: in [174], DRL is used to navigate a robot in a multi-agent environment, and in [175] the model of [174] was improved with attention-based neural networks and social pooling. In [176], an automatic braking system was developed using a DQN agent; the authors implemented a trauma memory, similar to prioritized experience replay (PER), for sampling from crash scenes. In [178], a DQN agent was trained to avoid collisions with crossing pedestrians and was further used to develop an ADAS system that assists drivers in pedestrian collision-avoidance scenarios. Deshpande et al. used a four-layer grid state representation. In a similar scenario, the authors of [180] developed a SAC agent with continuous actions; by integrating a social value orientation (SVO) term in the reward function, the vehicle can be trained to exhibit different socially consistent behaviors, ranging from prosocial to more aggressive.
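
The SVO idea can be sketched as a reward that trades off the ego agent's own utility against that of surrounding road users via an orientation angle (the cos/sin weighting is the usual SVO formulation; the utilities in the example are placeholders, not values from [180]):

```python
import numpy as np

def svo_reward(ego_utility, others_utility, svo_angle_deg):
    """Social-value-orientation weighted reward.

    svo_angle_deg ≈ 0  -> egoistic (only the agent's own utility matters)
    svo_angle_deg ≈ 45 -> prosocial (own and others' utility weighted equally)
    """
    phi = np.deg2rad(svo_angle_deg)
    return np.cos(phi) * ego_utility + np.sin(phi) * others_utility

# Example: the ego gains 1.0 by not yielding, but the pedestrian loses 0.8 in comfort/safety
for angle in (0.0, 30.0, 60.0):
    print(f"SVO {angle:>4.0f} deg: reward = {svo_reward(1.0, -0.8, angle):+.2f}")
```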

Deploying deep reinforcement learning (DRL) in real-world scenarios faces significant challenges and is an open research area. Some studies deploy DRL policies directly in practical applications without additional fine-tuning, demonstrating their effectiveness in scenarios such as unsignalized intersections. Transfer learning, a subfield of deep learning, explores how to transfer knowledge from simulation to the real world. Two main techniques are domain adaptation and domain randomization. Domain randomization aims to make the training distribution broad enough that the real world appears as just another variation of it, while domain adaptation aims to learn a model from the source distribution that performs well on the target distribution.

Another issue with DRL is that learning-based policies are costly to train and difficult to interpret semantically. Recently, some researchers have focused on interpretable learning algorithms and lifelong learning algorithms to address these shortcomings.

Multi-agent reinforcement learning

When multiple RL agents are deployed in the real world and interact with each other, the problem becomes one of multi-agent reinforcement learning (MARL). Several approaches can be used to handle multi-agent systems. The first is a centralized controller that manages the entire fleet: by enlarging the state to include all vehicles and using a joint action vector, the problem again becomes a single-agent problem. The disadvantage is that the dimensionality of the state and action spaces grows, which can make learning more complex; recently, graph representations have been used to mitigate this curse of dimensionality. Another approach, inspired by level-k game theory, is to use a single DRL learner but replace some surrounding road users with earlier copies of the learned policy, a technique similar to self-play in competitive DRL. The final approach is to formulate the problem directly as MARL, with multiple learners operating in parallel. The authors of [187] proposed a multi-agent deep deterministic policy gradient (MADDPG) method that learns a separate centralized critic for each agent, so that each agent can have a different reward function. Extensive surveys of MARL, and of its applications in autonomous driving, are available in the literature.

Partially Observable Markov Decision Process

Partially Observable Markov Decision Processes (POMDPs) generalize MDPs: an MDP is partially observable if the decision maker cannot directly observe the process state s. POMDPs are computationally expensive but provide a general framework that can model a wide variety of real-life decision-making processes, and thanks to hardware improvements, POMDP applications in autonomous driving are becoming increasingly popular. In [190], a POMDP was used to navigate a mobile robot through crowds, with the robot maintaining beliefs about pedestrians' possible future goals. POMDPs are also used for vehicle decision making in the presence of pedestrians: road users around the ego vehicle are modeled as part of the environment and their intentions are represented by belief vectors. In [189], the authors developed a multi-agent interaction-aware decision-making strategy in which the problem was modeled as a POMDP and an attention-based neural network was used to model the interactions. POMDPs have also been used for decision making under occlusion at intersections. For other applications of POMDPs in interactive decision making, see [193][194]. Traditional control methods typically handle sensor uncertainty and planning sequentially: a state estimator deals with sensor noise and uncertainty, and a deterministic policy then selects actions based on the estimated state. A POMDP makes no such separation; the policy is determined directly from the belief state. Surrounding road users can thus be modeled explicitly as decision makers (MARL) or as part of the environment in which a single agent operates (RL or DRL).
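The belief-tracking idea can be made concrete with a small sketch (an illustrative discrete Bayes filter over pedestrian intentions, not the actual planner of [189] or [190]; the transition and observation probabilities are made up): after each observation, the belief over the hidden intention is updated, and the policy acts on this belief vector rather than on a point estimate of the state.

```python
import numpy as np

# Hidden pedestrian intention: 0 = "will cross", 1 = "will yield" (illustrative).
T = np.array([[0.95, 0.05],    # intentions are fairly persistent from step to step
              [0.05, 0.95]])
# Observation model P(o | s'): o = 0 "steps toward curb", o = 1 "stands still".
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])

def belief_update(belief: np.ndarray, obs: int) -> np.ndarray:
    """One step of a discrete Bayes filter: predict with T, correct with O, renormalize."""
    predicted = T.T @ belief               # prediction through the transition model
    corrected = O[:, obs] * predicted      # weight by likelihood of the observation
    return corrected / corrected.sum()

belief = np.array([0.5, 0.5])              # initially uncertain about the intention
for obs in [0, 0, 1, 0]:                   # a short observation sequence
    belief = belief_update(belief, obs)
    print(belief)                          # the policy would act on this belief vector
```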


Game theory models

Game theory is a mathematical framework for studying strategic interactions between rational agents. It is used mainly in economics, but it also appears in autonomous driving, where dynamic non-cooperative game theory is particularly relevant. A game is dynamic if it involves multiple decisions and the order of decisions matters; it is non-cooperative if each player pursues its own interests, which partially conflict with the interests of the others. Dynamic non-cooperative game theory includes both discrete-time and continuous-time games and provides a natural extension of optimal control to multi-agent settings.

Game theory studies equilibrium solutions under the assumption that players act optimally, and several equilibrium concepts are applicable to trajectory games. Based on the information available to the players, dynamic games are divided into open-loop and feedback games: in an open-loop game each player knows only the initial state of the game, while in a feedback game each player has access to the current state. Although the feedback setting describes autonomous driving more accurately, open-loop solutions are often preferred for their simplicity. Common equilibria in autonomous driving include the open-loop Nash, open-loop Stackelberg, closed-loop Nash, and closed-loop Stackelberg equilibria. For more details on this topic, see [197].
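For reference, the standard textbook conditions behind these equilibrium concepts can be written compactly (generic notation, with J_i denoting player i's cost and u_i its open-loop strategy; see [197] for the precise definitions):

```latex
% Open-loop Nash equilibrium: no player can improve by unilaterally deviating.
u_i^* \in \arg\min_{u_i} J_i\left(u_i, u_{-i}^*\right), \qquad i = 1, \dots, N

% Open-loop Stackelberg equilibrium (leader L, follower F): the leader anticipates
% the follower's best response BR_F and optimizes against it.
u_L^* \in \arg\min_{u_L} J_L\left(u_L, \mathrm{BR}_F(u_L)\right), \qquad
\mathrm{BR}_F(u_L) = \arg\min_{u_F} J_F(u_L, u_F)
```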

When a road user's dynamics must satisfy a set of constraints, for example to avoid collisions, the equilibrium is known as a generalized equilibrium; numerical solutions to generalized equilibrium problems are studied in [220]. A disadvantage of the open-loop Nash formulation is that players cannot reason about how their behavior affects the behavior of surrounding road users. A first step in this direction is the open-loop Stackelberg equilibrium, applied for example in [203] in the context of autonomous drone racing. In a Stackelberg game, the leader acts first and is followed in turn by the remaining players, allowing players with higher priority to account for how lower-priority players will plan their actions. In [207], the authors proposed a sequential bimatrix-game method for autonomous racing based on the open-loop Stackelberg game; other applications of the Stackelberg formulation can also be found in the literature. A recipe for solving the generalized feedback Nash equilibrium problem is given in [223]. Sadigh et al. model the interaction between an autonomous vehicle and a human as a partially observable stochastic game in a Stackelberg setting: the human estimates the autonomous car's plan and acts accordingly, while the autonomous car optimizes its own actions, exploiting this indirect influence over the human's behavior.

In general, game-theoretic approaches face the following problems: (1) the computational complexity grows exponentially with the number of road users and the time horizon; (2) they assume that the utility functions explaining the behavior of the other road users around the ego vehicle are known and that road users act rationally according to these reward functions, although it is well known from behavioral game theory that humans often do not act rationally; (3) the behavior of road users may be stochastic, and computing mixed or behavioral strategies is considerably harder. At the same time, game theory has the great advantage of capturing behavioral interdependencies and, for some problems, providing exact solutions. Many papers on game-theoretic autonomous driving try to alleviate these problems by simplifying the problem further or by finding approximate solutions. We now look at some papers in this area and analyze their simplifying assumptions.


Level-k theory departs from the rational-expectations logic of Nash equilibrium and assumes that people believe others are less sophisticated than themselves. In Level-k reasoning, the iterative process stops after k steps: a Level-k road user assumes that all other road users are Level-(k-1), predicts their behavior under this assumption, and reacts accordingly. In [219], Level-k reasoning is applied to roundabout scenarios. The approach was also incorporated into an RL framework in [206]: the authors restricted the problem to two interacting road users and solved the resulting two-vehicle Markov game with a DQN-based RL approach. In [218], Level-k reasoning is used to resolve conflicts at intersections. The authors show that conflicts are easily resolved when the ego vehicle is a Level-k road user and all surrounding vehicles are Level-(k-1) or lower. However, when both road users are of the same level, the number of collisions increases, indicating the need for further improvements to handle same-level interactions, which is crucial when multiple autonomous vehicles meet.
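Level-k reasoning can be sketched in a few lines (an illustrative implementation on a toy two-player matrix game, not the DQN-based approach of [206]; the payoff numbers are made up): level-0 is a non-strategic uniform policy, and each higher level simply best-responds to the level below it.

```python
import numpy as np

def best_response(payoff: np.ndarray, opponent_policy: np.ndarray) -> np.ndarray:
    """Pure best response to an opponent mixed policy on a matrix game payoff[i, j]."""
    expected = payoff @ opponent_policy          # expected payoff of each own action
    policy = np.zeros(len(expected))
    policy[np.argmax(expected)] = 1.0
    return policy

def level_k_policy(payoff_self: np.ndarray, payoff_other: np.ndarray, k: int) -> np.ndarray:
    """Level-0 is uniform (non-strategic); level-k best-responds to level-(k-1)."""
    if k == 0:
        return np.full(payoff_self.shape[0], 1.0 / payoff_self.shape[0])
    other = level_k_policy(payoff_other, payoff_self, k - 1)   # model the other as level k-1
    return best_response(payoff_self, other)

# Toy merging game: actions = {yield, go}; payoff values are illustrative, not calibrated.
ego = np.array([[ 0.0, -0.5],    # yield vs (other yields, other goes)
                [ 1.0, -5.0]])   # go    vs (other yields, other goes): both go -> collision
other = ego.copy()               # symmetric game: same structure from the other's perspective
print(level_k_policy(ego, other, k=2))   # ego's level-2 action distribution
```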

To keep the computational complexity under control, the number of road users can be reduced by selecting the subset of road users that actually interact with the ego vehicle. The time horizon can also be limited by using receding-horizon controllers or by adopting hierarchical game planning. The latter combines a short-horizon tactical planner with a long-horizon strategic planner: the former models the dynamics of the problem accurately, while the latter determines the strategy using approximate dynamics.

Iterative linear-quadratic (LQ) methods are increasingly common in robotics and control. The authors of [201] formulated the problem as a general-sum differential game with nonlinear system dynamics, and in [202] they extended the method to systems with feedback-linearizable dynamics. Another way to solve game-theoretic problems is to use iterative best response (IBR) to compute pure Nash equilibria, that is, Nash equilibria in pure strategies. The authors of [216] proposed a "sensitivity-enhanced" iterative best response solver. In [204], an online game-theoretic trajectory planner based on IBR is proposed; the planner is suitable for online planning and exhibits complex behavior in competitive racing scenarios. Williams et al. proposed an IBR algorithm, together with an information-theoretic planner, for controlling two ground vehicles in close proximity.
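The iterative best response loop behind several of these planners can be sketched as follows (a heavily simplified 1-D longitudinal example with quadratic costs and a crude finite-difference descent; the cited solvers use far richer dynamics, constraints, and optimizers): each vehicle in turn re-optimizes its own trajectory while the other's latest plan is held fixed, and the outer loop repeats until the plans stop changing.

```python
import numpy as np

H, dt = 20, 0.2                     # horizon steps and time step (illustrative)

def cost(traj, other_traj, ref_speed):
    """Quadratic cost: track a reference speed, stay smooth, keep a gap to the other."""
    speed = np.diff(traj) / dt
    track = np.sum((speed - ref_speed) ** 2)
    smooth = np.sum(np.diff(speed) ** 2)
    gap = np.sum(np.exp(-((traj - other_traj) ** 2) / 4.0))   # soft collision penalty
    return track + smooth + 50.0 * gap

def best_response(traj, other_traj, ref_speed, iters=200, lr=5e-4):
    """Descend the cost w.r.t. one vehicle's trajectory, the other held fixed."""
    traj = traj.copy()
    eps = 1e-4
    for _ in range(iters):
        grad = np.zeros_like(traj)
        for k in range(1, len(traj)):            # traj[0] (current position) stays fixed
            bump = np.zeros_like(traj); bump[k] = eps
            grad[k] = (cost(traj + bump, other_traj, ref_speed)
                       - cost(traj - bump, other_traj, ref_speed)) / (2 * eps)
        traj -= lr * grad
    return traj

# Two vehicles approaching the same stretch of road (1-D positions over the horizon).
a = np.linspace(0.0, 20.0, H + 1)
b = np.linspace(2.0, 22.0, H + 1)
for _ in range(5):                               # IBR outer loop
    a = best_response(a, b, ref_speed=5.0)       # A re-plans against B's latest plan
    b = best_response(b, a, ref_speed=5.0)       # B re-plans against A's updated plan
```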

In [13], Schwarting et al. proposed an alternative to iterative best response for solving Nash equilibrium problems, based on reformulating the coupled optimization problems as a single local optimization using the Karush–Kuhn–Tucker (KKT) conditions. In [137], game theory is used to model the decision making of other vehicles: the authors proposed a Parallel Game Interaction Model (PGIM) for producing positive, socially compliant driving interactions. To handle environmental uncertainty, the Nash equilibrium concept has also been extended to POMDPs. In [215], the authors account for uncertainty in the intentions of other road users by maintaining multiple hypotheses about their goals and constraints.
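At a high level, a KKT-based reformulation of this kind stacks every agent's first-order optimality conditions into a single system (generic notation, not the exact formulation of [13]): each agent i minimizes its own cost J_i subject to shared constraints g, and a local generalized Nash equilibrium must satisfy all agents' conditions simultaneously, which a single nonlinear solver can then attack:

```latex
% Each agent i solves  min_{u_i} J_i(x, u)  subject to  g(x, u) \le 0  (shared constraints).
% Stationarity of every agent's Lagrangian, plus feasibility and complementarity:
\nabla_{u_i}\Big( J_i(x,u) + \lambda_i^{\top} g(x,u) \Big) = 0, \quad i = 1,\dots,N,
\qquad g(x,u) \le 0, \quad \lambda_i \ge 0, \quad \lambda_i^{\top} g(x,u) = 0
```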

Discussion and future challenges

This comprehensive survey has introduced two key components critical to the progress of autonomous driving: research on human behavior and interaction modeling. These components form the basis for understanding and optimizing the complex interactions that arise in autonomous driving scenarios. This section highlights the challenges and research directions for interactive scenarios in future autonomous driving research.

Human behavior research

Driven by society's strong interest in autonomous driving, the study of human behavior has again become a hot topic in recent years, especially in the context of autonomous vehicles. Many challenges still need to be overcome to better understand pedestrian behavior during interactions with autonomous vehicles.

Overall, the exploration of driver behavior models is a promising area of research that promises substantial improvements in the safety and efficiency of transportation systems. However, considerable work remains in developing and validating these models. Future research should prioritize more comprehensive models that cover a wider range of factors, including the driver's psychological state, the surrounding environment, and interactions with others on the road.

For pedestrian behavior research, an important challenge is communication. First, although most researchers agree on the effectiveness of eHMI, there is still no consensus on its content, form, and perspective. An open question is whether eHMI should be anthropomorphic or non-anthropomorphic, and similar questions arise for text versus non-text eHMI. Furthermore, because multiple pedestrians are usually present on the road, current eHMIs, which are mainly designed for one-to-one encounters, may mislead other pedestrians. Many similar problems hinder the standardization of eHMI. On the other hand, implicit signals such as vehicle kinematics are widely accepted, familiar, and reliable, so their critical role cannot be ignored. Although researchers have attempted to influence pedestrians by manipulating implicit signals such as deceleration rate, lateral distance, and vehicle pitch, these efforts have not been sufficient to ensure safe and effective communication, and they lack the theoretical support needed to show that the intended message is transmitted accurately and effectively. In terms of research methods, including the design of vehicle driving behavior and of subjective and objective experiments, the lack of reliable research paradigms is another problem. Finally, how to combine eHMI and implicit signals effectively and smoothly, so as to exploit the strengths of both, is an interesting research direction.

Another challenge lies in the scope of pedestrian behavior research itself. Pedestrians' decision-making and behavioral patterns are affected by the interaction situation, the traffic environment, and the diversity of participants, yet these aspects currently receive insufficient research attention. Existing studies often focus on specific, simple interaction situations in order to control variables or reduce complexity. Real traffic, however, involves a large number of complex scenarios, including crossings on multi-lane, two-way, or unstructured roads, crossings facing dense continuous traffic flows, and scenarios where multiple pedestrians cross together. In addition, pedestrian heterogeneity, such as gender, age, distraction, and group effects, plays an important role in interactions. Notably, there is still no consensus on many influencing factors, such as waiting time and distraction. Because sufficient and reliable results are lacking, research conclusions often rely on assumptions, highlighting how little is understood about the basic mechanisms of pedestrian road behavior.

Regarding pedestrian behavior modeling, learning-based methods have become increasingly attractive in recent years. End-to-end deep neural networks can effectively capture complex behavioral mechanisms and have made significant progress in pedestrian intention prediction and trajectory prediction. However, their black-box nature cannot be ignored. These methods require large amounts of data to achieve robust performance, which limits their applicability to rare, data-scarce cases, and they struggle to explain their decision-making and behavioral logic, which creates new problems for modeling. In contrast, expert models, such as social force models, evidence accumulation models, or game-theoretic models, rest on solid psychological and behavioral foundations, and their decision logic is explicit and explainable. However, most of these models have only been validated on limited data sets or are still at the laboratory validation stage and lack extensive engineering practice. The theory behind expert models therefore needs further refinement and extensive verification on large real-world data sets. Since expert models and data-driven models have complementary strengths, a possible future trend is to find a balance point at which the two can be used together.
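As an illustration of the expert-model family, an evidence accumulation (drift diffusion) model of a crossing decision fits in a few lines (all parameters are invented for illustration; published models fit them to empirical gap-acceptance data): noisy evidence about whether the approaching gap is safe accumulates over time, and the pedestrian commits to crossing or waiting once a decision threshold is reached.

```python
import random

def crossing_decision(gap_time_s: float, threshold: float = 1.0,
                      noise: float = 0.3, dt: float = 0.05, max_t: float = 5.0):
    """Drift-diffusion sketch: drift grows with the approaching vehicle's time gap.

    Returns ("cross" | "wait" | "undecided", decision_time_s).
    All parameters are illustrative, not fitted to data.
    """
    drift = 0.4 * (gap_time_s - 3.0)          # gaps above ~3 s push evidence toward crossing
    evidence, t = 0.0, 0.0
    while t < max_t:
        evidence += drift * dt + random.gauss(0.0, noise) * (dt ** 0.5)
        t += dt
        if evidence >= threshold:
            return "cross", t
        if evidence <= -threshold:
            return "wait", t
    return "undecided", t

print(crossing_decision(gap_time_s=5.0))      # large gap: usually decides to cross quickly
print(crossing_decision(gap_time_s=2.0))      # small gap: usually decides to wait
```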

Finally, given that only a small portion of the autonomous driving literature explicitly considers pedestrian behavior, there is a need to broaden the applications of pedestrian behavior models, which may include, but are not limited to, pedestrian behavior prediction, autonomous vehicle behavior design, and virtual validation of autonomous vehicles.

Interaction modeling

As autonomous driving technology continues to develop, research on interaction modeling will play a key role in solving challenges and promoting the development of safer and more reliable autonomous vehicles.

One prominent approach that has attracted attention in autonomous driving research is the use of learning-based methods. These approaches offer the appeal of end-to-end solutions that directly map sensory inputs and destination knowledge to autonomous vehicle behavior. However, such systems may behave as black boxes, leading to interpretability problems in the event of failures and difficulties in validating the model. In addition, learning the entire driving task end to end is itself a formidable challenge. Current research efforts therefore decompose the task into subtasks, including route planning, perception, motion planning, and control, and apply learning-based methods to these partial problems.

Interest in learning interactive behaviors through imitation learning or through simulation-based deep reinforcement learning (DRL) is also growing. However, challenges remain. Most deep learning-based decision systems assume an ideal road scenario and perfect perception of the surrounding environment, whereas real-world conditions often involve occlusions, sensor noise, and environmental anomalies. Maintaining system performance when information is partial or noisy during these rare events is an ongoing research challenge. Uncertainty arises from the unpredictable behavior of surrounding traffic participants as well as from sensor noise and vehicle models. Furthermore, models trained in simulation, such as DRL models, raise the question of how to bridge the gap between simulation and reality. Several strategies have been proposed, including making simulations more realistic, domain randomization, and domain adaptation. These methods are designed to help models cope with the unpredictability and complexity of the real world, ensuring that what they learn transfers effectively to the road.

An alternative to learning-based methods is model-based methods. This group includes game-theoretic models, behavioral models (discussed in the previous section), social force models, and potential fields.

Game theory provides the flexibility and adaptability to handle a variety of situations effectively without relying on a specific data distribution. One of its key advantages is that it handles planning for the ego vehicle and prediction of other road users jointly within a given scenario. However, there is a computational trade-off: as the number of road users and the time horizon increase, so does the computational burden. Researchers have proposed several strategies to make game-theoretic solutions tractable, including hierarchical game formulations, restricting the optimization to the relevant surrounding road users, approximate solutions, Level-k game theory, and improved nonlinear optimization solvers.

On the other hand, social force or potential field methods provide a fast computational solution. They can be used to predict the behavior of surrounding road users and can also be used to control autonomous vehicles. Social force models rely on simplifying assumptions about human behavior. They typically treat pedestrians as particles or road users with fixed characteristics, ignoring the cognitive aspects of human decision-making, which may lead to unrealistic representations of complex and dynamic human behaviors. Future research directions for these methods include integrating cognitive elements or contextual information such as road rules and traffic signals. Exploring the integration of machine learning techniques to improve the adaptability and predictive capabilities of social force models is also a possible future research direction.
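A minimal social force sketch (the generic textbook form with made-up parameters, not a specific published calibration): the pedestrian is pulled toward a goal and pushed away from a nearby vehicle, and the resulting acceleration is integrated to predict the next position.

```python
import numpy as np

def social_force_step(pos, vel, goal, vehicle_pos, dt=0.1,
                      desired_speed=1.4, relax_time=0.5, A=5.0, B=2.0):
    """One Euler step of a basic social force model (parameters are illustrative).

    Driving force: relax toward the desired velocity pointing at the goal.
    Repulsive force: exponential push-off from the vehicle, decaying with distance.
    """
    to_goal = goal - pos
    desired_vel = desired_speed * to_goal / (np.linalg.norm(to_goal) + 1e-9)
    f_drive = (desired_vel - vel) / relax_time

    away = pos - vehicle_pos
    dist = np.linalg.norm(away) + 1e-9
    f_repel = A * np.exp(-dist / B) * (away / dist)

    acc = f_drive + f_repel
    vel = vel + acc * dt
    pos = pos + vel * dt
    return pos, vel

# Pedestrian crossing toward the far curb while a vehicle approaches along the road.
pos, vel = np.array([0.0, 0.0]), np.array([0.0, 0.0])
goal, vehicle = np.array([0.0, 8.0]), np.array([-10.0, 4.0])
for _ in range(50):
    vehicle = vehicle + np.array([1.0, 0.0]) * 0.1        # vehicle moving along the road
    pos, vel = social_force_step(pos, vel, goal, vehicle)  # predicted pedestrian motion
```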

Existing research focuses mainly on vehicle-vehicle interaction, which undoubtedly plays a key role in autonomous driving. However, there is an urgent need for methods that can handle interactions with human road users, especially pedestrians. As the field continues to develop, theories and models that capture the communication and interaction between autonomous vehicles and the various road users around them will become increasingly important and are expected to improve safety and efficiency in autonomous driving scenarios.

References

[1] Crosato, L., Tian, K., Shum, H.P.H., Ho, E.S.L., Wang, Y. and Wei, C. (2023), Social Interaction-Aware Dynamical Models and Decision-Making for Autonomous Vehicles. Adv. Intell. Syst. 2300575. https://doi.org/10.1002/aisy.202300575

