Crowd-Robot Interaction paper reading

Paper information

Title : Crowd-Robot Interaction: Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning
Authors : Changan Chen, Yuejiang Liu
Code Address : https://github.com/vita-epfl/CrowdNav
Source : arXiv
Time : 2019

Abstract

Moving in an efficient and socially compliant manner is an important but challenging task for robots operating in crowded spaces. Recent work has shown the power of deep reinforcement learning techniques for learning socially cooperative policies. However, their cooperation ability deteriorates as the crowd grows, because they usually relax the problem into a one-way Human-Robot interaction problem. In this work, we want to go beyond first-order Human-Robot interaction and model Crowd-Robot Interaction (CRI) more explicitly.

We propose to
(i) rethink pairwise interactions with a self-attention mechanism, and
(ii) jointly model Human-Robot as well as Human-Human interactions in a deep reinforcement learning framework. Our model captures the Human-Human interactions occurring in dense crowds, which indirectly affect the robot's anticipation capability.

Introduction

Navigating a crowd in a socially compliant manner is a challenging task for robots.

Since communication between agents (e.g., humans) is limited, robots need to perceive and predict the evolution of the crowd, which may involve complex interactions (e.g., repulsion/attraction). Research on trajectory prediction has proposed several hand-crafted and data-driven approaches to model these interactions between agents [12]–[15]. However, integrating such predictive models into the decision-making process remains challenging.

As an alternative, reinforcement learning frameworks have been used to train computationally efficient policies that implicitly encode the interactions and cooperation between agents. Despite significant progress in recent work [19]–[22], existing models are still limited in two respects:
i) the collective influence of the crowd is usually modeled by a simplified aggregation of pairwise interactions, such as a maximin operation [19] or an LSTM [22], which may not fully represent all interactions;
ii) most methods focus on one-way Human-Robot interaction and ignore the interactions within the crowd that may indirectly affect the robot.
These limitations degrade the performance of cooperative planning in complex and congested scenarios.

Background

Related Work

Early work mainly utilizes well-designed interaction models to enhance the social awareness of robot navigation.
A seminal work is the Social Force model [23]–[25], which has been successfully applied to autonomous robots in simulation and real-world environments [26]–[28].

Another method, Interacting Gaussian Processes (IGP), models the trajectory of each agent as an individual Gaussian process and proposes an interaction potential term to couple the individual GPs for interaction [18], [29], [30]. In multi-agent settings where the same policy is applied to all agents, reactive methods such as RVO [5] and ORCA [6] seek joint obstacle-avoiding velocities under reciprocal assumptions. The main challenge for these models is that they rely heavily on hand-crafted features and do not generalize well to various scenarios of crowd cooperation.

Another line of work uses imitation learning to learn policies from demonstrations of desired behaviors. Navigation policies that map various inputs (e.g., depth images, lidar measurements, and local maps) to control commands are developed in [31]–[33] by directly imitating expert demonstrations. Beyond behavior cloning, inverse reinforcement learning has been used in [10], [11], [34] to learn the underlying cooperative features from human data using the maximum entropy method. The learning outcomes in these works depend heavily on the scale and quality of the demonstrations, which is not only resource-consuming but also constrains the learned policy by the quality of the demonstrated behavior. In our work, we exploit imitation learning to warm-start our model training.

Reinforcement learning (RL) methods have been studied intensively in the past few years and applied in various domains since they started to achieve excellent performance in video games [35]. In the field of robot navigation, recent work has used RL to learn sensorimotor policies in static and dynamic environments from raw observations [21], [36], and to learn socially cooperative policies from agent-level state information [19], [20], [22]. To handle a varying number of neighbors, the method reported in [19] adapts from the two-agent to the multi-agent case through a maximin operation that picks the best action against the worst-case crowd scenario. A later extension uses an LSTM model to process the states of the neighbors sequentially in reverse order of their distance to the robot [22]. In contrast to these simplifications, we propose a novel neural network model to explicitly capture the collective influence of the crowd.

Problem Formulation

We consider a navigation task in which a robot moves towards a goal through a crowd of n people. This can be formulated as a sequential decision problem in a reinforcement learning framework [19], [20], [22].
For each agent (robot or human), the position $p = [p_x, p_y]$, velocity $v = [v_x, v_y]$, and radius $r$ can be observed by the other agents. The robot is also aware of its unobservable state, including the goal position $p_g$ and the preferred speed $v_{pref}$. We assume the robot's velocity $v_t$ can be achieved immediately after the action command $a_t$, i.e., $v_t = a_t$. Let $s_t$ denote the state of the robot and $w_t = [w_t^1, w_t^2, \ldots, w_t^n]$ denote the states of the humans at time $t$. The joint state for robot navigation is defined as $s_t^{jn} = [s_t, w_t]$.
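In the spirit of [19], [20], the objective can be written, roughly, as finding the optimal value function $V^*$ of the joint state and acting greedily with respect to it, with $\Delta t$ the duration of a decision step and the discount $\gamma \in (0,1)$ scaled by $v_{pref}$ to normalize across agents with different preferred speeds:

$$
V^*(s^{jn}_t) = \sum_{t'=t}^{T} \gamma^{\,t' \cdot \Delta t \cdot v_{pref}} \, R_{t'}\!\big(s^{jn}_{t'}, \pi^*(s^{jn}_{t'})\big)
$$

$$
\pi^*(s^{jn}_t) = \operatorname*{arg\,max}_{a_t} \; R(s^{jn}_t, a_t) \;+\; \gamma^{\,\Delta t \cdot v_{pref}} \int_{s^{jn}_{t+\Delta t}} P\big(s^{jn}_{t+\Delta t} \mid s^{jn}_t, a_t\big)\, V^*(s^{jn}_{t+\Delta t})\, \mathrm{d}s^{jn}_{t+\Delta t}
$$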

We follow the formulation of the reward function defined in [19], [20], which awards task accomplishment while penalizing collisions or uncomfortable distances.
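That reward, as defined in [19], [20], penalizes collisions, penalizes coming within a 0.2 m comfort distance of a human, and rewards reaching the goal; roughly:

$$
R_t(s^{jn}_t, a_t) =
\begin{cases}
-0.25 & \text{if } d_t < 0 \\
-0.1 + d_t / 2 & \text{else if } d_t < 0.2 \\
1 & \text{else if } p_t = p_g \\
0 & \text{otherwise,}
\end{cases}
$$

where $d_t$ is the minimum separation distance between the robot and the humans during the time step.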

Value Network Training

The value network is trained with the temporal-difference method, standard experience replay, and fixed target network techniques [19], [35]. As described in Algorithm 1, the model is first initialized via imitation learning using a set of demonstrator experiences (lines 1–3), and subsequently refined from experience of interactions (lines 4–14). One difference from previous work [19], [20] is that the next state $s^{jn}_{t+1}$ in line 7 is obtained by querying the environment's true value instead of approximating it with a linear motion model, which alleviates the issue of system dynamics in training. During deployment, the transition probability can be approximated by trajectory prediction models [12], [13], [15].
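A minimal sketch of this training scheme, assuming a gym-style environment object and that joint states are PyTorch tensors; `env.action_space` and `env.one_step_lookahead` are hypothetical placeholders, the imitation-learning warm start (lines 1–3) is omitted, and the discount is simplified to a plain $\gamma$ instead of the $\gamma^{\Delta t \cdot v_{pref}}$ used in the paper:

```python
import copy
import random

import torch
import torch.nn as nn
import torch.optim as optim


def train_value_network(env, value_net, episodes=10_000, gamma=0.9,
                        batch_size=100, lr=1e-3, target_sync_every=50):
    """Sketch of the RL phase of Algorithm 1: temporal-difference learning
    with experience replay and a fixed target network."""
    target_net = copy.deepcopy(value_net)            # fixed target network
    optimizer = optim.Adam(value_net.parameters(), lr=lr)
    criterion = nn.MSELoss()
    replay = []                                      # experience replay buffer

    for episode in range(episodes):
        # epsilon decays linearly from 0.5 to 0.1 over the first 5k episodes
        epsilon = max(0.1, 0.5 - 0.4 * episode / 5_000)
        state, done = env.reset(), False
        while not done:
            if random.random() < epsilon:            # epsilon-greedy exploration
                action = random.choice(env.action_space)
            else:                                    # greedy one-step lookahead on V
                action = max(env.action_space,
                             key=lambda a: env.one_step_lookahead(state, a, value_net))
            next_state, reward, done, _ = env.step(action)
            replay.append((state, reward, next_state, done))
            state = next_state

        if len(replay) >= batch_size:                # one TD update per episode
            batch = random.sample(replay, batch_size)
            states = torch.stack([s for s, _, _, _ in batch])
            rewards = torch.tensor([[r] for _, r, _, _ in batch], dtype=torch.float32)
            next_states = torch.stack([ns for _, _, ns, _ in batch])
            not_done = torch.tensor([[0.0] if d else [1.0] for _, _, _, d in batch])
            with torch.no_grad():                    # target uses the frozen network
                targets = rewards + not_done * gamma * target_net(next_states)
            loss = criterion(value_net(states), targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        if episode % target_sync_every == 0:         # refresh the fixed target
            target_net.load_state_dict(value_net.state_dict())
```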

To effectively solve problem (1), the value network needs to accurately approximate the optimal value function V* that implicitly encodes the social cooperation among agents. Previous works on this track did not fully model crowd interactions, which degrades the accuracy of value estimation in densely populated scenes. In the following sections, we present a novel Crowd-Robot Interaction model that can effectively learn to navigate in crowded spaces.

Approach

When humans walk in a densely populated scene, they cooperate with others by anticipating the behaviors of their neighbors, especially those who are likely to be involved in some future interactions. This motivates us to design a model that can compute the relative importance and encode the collective impact of neighboring agents for socially compliant navigation. Inspired by social pooling [13], [15] and attention models [14], [44]–[48], we introduce a socially attentive network consisting of three modules:
• Interaction module: explicitly models Human-Robot interactions and encodes Human-Human interactions through coarse-grained local maps.
• Pooling module: aggregates the interactions into a fixed-length embedding vector through a self-attention mechanism.
• Planning module: estimates the value of the joint state of the robot and the crowd for social navigation.


Parameterization

We follow the robot-centric parameterization in [19], [22], in which the robot is located at the origin and the x-axis points toward the robot's goal. The states of the robot and the humans are transformed into this frame.
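A rough reconstruction of that transformed parameterization, with $d_g = \lVert p - p_g \rVert_2$ the robot's distance to the goal, $d_i = \lVert p - p_i \rVert_2$ its distance to neighbor $i$, and $\theta$ the robot's heading; the exact ordering of the terms may differ slightly from the paper:

$$
s = [\,d_g,\; v_{pref},\; v_x,\; v_y,\; r,\; \theta\,], \qquad
w_i = [\,p^i_x,\; p^i_y,\; v^i_x,\; v^i_y,\; r_i,\; d_i,\; r_i + r\,]
$$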

Interaction Module

Each person has an influence on the robot and is, in turn, influenced by his/her neighbors. Explicitly modeling all pairs of interactions among humans leads to O(N²) complexity [14], which is computationally prohibitive for scaling the policy to dense scenes. We address this problem by introducing a pairwise interaction module that explicitly models Human-Robot interactions while using local maps as a coarse-grained representation of Human-Human interactions.

Given a neighborhood of size L, we construct an L × L × 3 map tensor $M_i$ centered at each person i to encode the presence and velocities of the neighbors, referred to as the local map in Fig. 3.
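A minimal sketch of how such a local map tensor could be built, assuming positions and velocities are given as NumPy arrays; the channel layout (occupancy plus two velocity channels) and the indicator-based cell assignment are our own illustration and may differ from the paper's exact definition:

```python
import numpy as np

def build_local_map(i, positions, velocities, cell_size=1.0, L=4):
    """Build an L x L x 3 local map M_i centered on person i.
    Channel 0 marks occupancy of a cell by a neighbor; channels 1-2
    accumulate the neighbors' velocities falling in that cell."""
    M = np.zeros((L, L, 3))
    center = positions[i]
    for j, (p, v) in enumerate(zip(positions, velocities)):
        if j == i:
            continue
        # relative position of neighbor j in the map frame of person i
        rel = (p - center) / cell_size + L / 2.0
        a, b = int(np.floor(rel[0])), int(np.floor(rel[1]))
        if 0 <= a < L and 0 <= b < L:          # neighbor lies inside the neighborhood
            M[a, b, 0] += 1.0                  # presence channel
            M[a, b, 1:] += v                   # velocity channels
    return M

# usage: three humans, map centered on human 0
positions = np.array([[0.0, 0.0], [0.8, 0.5], [3.0, 3.0]])
velocities = np.array([[1.0, 0.0], [0.0, -1.0], [0.5, 0.5]])
M0 = build_local_map(0, positions, velocities)
```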
We use a multi-layer perceptron (MLP) to embed the state of person i, the map tensor $M_i$, and the state of the robot into a fixed-length vector $e_i$.
The embedding vector $e_i$ is then fed into a subsequent MLP to obtain the pairwise interaction feature $h_i$ between the robot and person i.
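A minimal PyTorch sketch of this interaction module; the hidden sizes (150, 100) for $\phi_e$ and (100, 50) for $\psi_h$ follow the Implementation Details below, while the input dimensions and the exact layer layout are assumptions:

```python
import torch
import torch.nn as nn

def mlp(dims):
    """Plain MLP with ReLU activations between the given layer sizes."""
    layers = []
    for i in range(len(dims) - 1):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < len(dims) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

class InteractionModule(nn.Module):
    def __init__(self, robot_dim=6, human_dim=7, map_dim=4 * 4 * 3):
        super().__init__()
        in_dim = robot_dim + human_dim + map_dim
        self.phi_e = mlp([in_dim, 150, 100])   # embedding MLP  -> e_i
        self.psi_h = mlp([100, 100, 50])       # pairwise MLP   -> h_i

    def forward(self, robot_state, human_states, local_maps):
        # robot_state: (1, robot_dim); human_states: (n, human_dim);
        # local_maps: (n, L*L*3) flattened map tensors M_i
        n = human_states.size(0)
        joint = torch.cat([robot_state.expand(n, -1), human_states, local_maps], dim=1)
        e = self.phi_e(joint)                  # fixed-length embeddings e_i
        h = self.psi_h(e)                      # pairwise interaction features h_i
        return e, h
```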

Pooling Module

Since the number of surrounding people can vary greatly in different scenes, we need a model that can process an arbitrary number of inputs into a fixed-size output.

Everett et al. [22] proposed feeding the states of all humans into an LSTM [49] in descending order of their distance to the robot. However, the underlying assumption that the closest neighbor has the strongest influence does not always hold. Some other factors, such as speed and direction, are also essential for correctly estimating the importance of a neighbor, which reflects how this neighbor could potentially affect the robot's goal acquisition. Leveraging recent progress in self-attention mechanisms, where the attention of an item in a sequence is obtained by looking at the other items in the sequence [44], [46], [50], we propose a socially attentive pooling module to learn the relative importance of each neighbor and the collective impact of the crowd in a data-driven manner.

The interaction embedding $e_i$ is transformed into an attention score $\alpha_i$.
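In this computation, each score is produced by the MLP $\psi_\alpha$ from the individual embedding $e_i$ together with a mean pooling of all embeddings, which serves as a fixed-length global summary of the crowd; roughly:

$$
e_m = \frac{1}{n}\sum_{k=1}^{n} e_k, \qquad \alpha_i = \psi_\alpha(e_i, e_m; W_\alpha)
$$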
Given the pairwise interaction vector $h_i$ of each neighbor $i$ and the corresponding attention score $\alpha_i$, the final representation of the crowd, $c$, is a weighted linear combination of all the pairs.
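A minimal PyTorch sketch of this attentive pooling, i.e. $c = \sum_i \mathrm{softmax}(\alpha)_i \, h_i$; the (100, 100) hidden sizes for $\psi_\alpha$ follow the Implementation Details below, and the rest of the layout is an assumption:

```python
import torch
import torch.nn as nn

class AttentivePooling(nn.Module):
    def __init__(self, embed_dim=100):
        super().__init__()
        # psi_alpha scores each neighbor from [e_i, mean(e)]
        self.psi_alpha = nn.Sequential(
            nn.Linear(2 * embed_dim, 100), nn.ReLU(),
            nn.Linear(100, 100), nn.ReLU(),
            nn.Linear(100, 1),
        )

    def forward(self, e, h):
        # e: (n, embed_dim) interaction embeddings, h: (n, feature_dim) pairwise features
        e_mean = e.mean(dim=0, keepdim=True).expand_as(e)        # global embedding e_m
        scores = self.psi_alpha(torch.cat([e, e_mean], dim=1))   # attention scores alpha_i
        weights = torch.softmax(scores, dim=0)                   # normalize over neighbors
        return (weights * h).sum(dim=0)                          # crowd representation c
```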

Planning Module

Based on the compact representation of the crowd, c, we build a planning module that estimates the state value v for cooperative planning.
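A minimal sketch of this planning head $f_v$; the (150, 100, 100) hidden sizes follow the Implementation Details below, and taking the concatenation $[s, c]$ as input is an assumption:

```python
import torch
import torch.nn as nn

class PlanningModule(nn.Module):
    def __init__(self, robot_dim=6, crowd_dim=50):
        super().__init__()
        # f_v maps the joint representation [s, c] to a scalar state value v
        self.f_v = nn.Sequential(
            nn.Linear(robot_dim + crowd_dim, 150), nn.ReLU(),
            nn.Linear(150, 100), nn.ReLU(),
            nn.Linear(100, 100), nn.ReLU(),
            nn.Linear(100, 1),
        )

    def forward(self, robot_state, crowd_repr):
        return self.f_v(torch.cat([robot_state, crowd_repr], dim=-1))
```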

Implementation Details

The local map is a 4 × 4 grid centered at each person, with a cell side length of 1 m. The hidden units of the functions $\phi_e(\cdot)$, $\psi_h(\cdot)$, $\psi_\alpha(\cdot)$, and $f_v(\cdot)$ are (150, 100), (100, 50), (100, 100), and (150, 100, 100), respectively.

We implement the policy in PyTorch [51] and train it with a batch size of 100 using Adam [52]. For imitation learning, we collect 3k episodes of demonstrations using ORCA and train the policy for 50 epochs with a learning rate of 0.01. For reinforcement learning, the learning rate is 0.001 and the discount factor γ is 0.9. The exploration rate of the ε-greedy policy decays linearly from 0.5 to 0.1 over the first 5k episodes and stays at 0.1 for the remaining 5k episodes. The RL training takes about 10 hours on an i7-8700 CPU.

This work assumes holonomic kinematics for the robot, i.e., it can move in any direction. The action space consists of 80 discrete actions: 5 speeds exponentially spaced between (0, v_pref] and 16 headings evenly spaced between [0, 2π).
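A small sketch of how such an action space can be enumerated; the exponential spacing formula is our own choice of one reasonable parameterization:

```python
import numpy as np

def build_action_space(v_pref, n_speeds=5, n_headings=16):
    """80 discrete holonomic actions: exponentially spaced speeds x uniform headings."""
    # speeds in (0, v_pref], denser near zero: (e^(k/n) - 1) / (e - 1) * v_pref
    speeds = [(np.exp(k / n_speeds) - 1) / (np.e - 1) * v_pref
              for k in range(1, n_speeds + 1)]
    headings = np.linspace(0, 2 * np.pi, n_headings, endpoint=False)
    return [(v * np.cos(t), v * np.sin(t)) for v in speeds for t in headings]

actions = build_action_space(v_pref=1.0)
assert len(actions) == 80      # 5 speeds x 16 headings
```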

Experiments


Origin blog.csdn.net/qin_liang/article/details/132131969