DeepGPS: Deep Learning Enhanced GPS Positioning in Urban Canyons

IEEE TMC|GPS Positioning in Urban Canyons Enhanced by Deep Learning
https://github.com/bducgroup/DeepGPS

Abstract

Global Positioning System (GPS) has benefited many new applications in daily life, such as navigation, car sharing, and location-based services. While GPS works well in most places, its performance in urban canyons is notoriously poor due to signal reflections from non-line-of-sight satellites. Significant efforts have been made to mitigate the effects of non-line-of-sight signals, but previous work has largely relied on accurate proprietary 3D city models or other third-party sources, which are not readily available.

In this paper, we introduce DeepGPS, a deep learning-enhanced GPS positioning system that can correct GPS estimates by considering only some simple contextual information. DeepGPS uses environmental factors, including building height and road distribution around the initial GPS position, together with satellite status to describe the positioning context, and it utilizes an encoder-decoder network model to implicitly learn the complex relationship between the positioning context and GPS estimates from a large number of labeled GPS samples. Thus, given the positioning context, the model can accurately predict the correct location for each erroneous GPS estimate. We further improve the model with a new constraint mask that filters out invalid candidate locations and a simple mobility model for continuous localization.

Using a large bus trajectory dataset and on-site GPS measurements, a prototype system is implemented and experimentally evaluated. Experimental results show that DeepGPS significantly improves GPS performance in urban canyons, for example, effectively correcting 90.1% of GPS estimates on average and improving accuracy by 64.6%.

1 INTRODUCTION

Global Navigation Satellite Systems (GNSS), such as the well-known Global Positioning System (GPS), have benefited many intelligent applications, including navigation [65], instant delivery [69], ride-sharing [40], autonomous driving [12] and location-based service [34]. Precise positioning is critical for these applications to provide effective and efficient services. While GPS generally works well in most places, in urban canyons, GPS errors can be greater than 50 m [28].

Because satellite signals are reflected by high-rise buildings [33], GPS errors in urban canyons mainly come from multipath interference or non-line-of-sight (NLOS) reception. Although some hardware or software designs [18], [27], [44] can deal well with multipath interference, NLOS satellite signals at GPS receivers are still the main source of positioning errors in urban canyons. In principle, a GPS receiver needs to receive signals from at least four satellites to accurately determine its position [33]. In urban canyon scenarios, signals from NLOS satellites may be reflected by one or even multiple high-rise buildings, which makes the so-called pseudoranges (approximations of the distances between target satellites and GPS receivers) larger and results in positioning errors.

Earlier work has made considerable efforts to mitigate the effects of NLOS reception. An early approach [21] applied a ray-tracing algorithm to satellite signals to compute pseudorange errors; this required reconstructing the reflective surfaces of street-side buildings with dedicated hardware, such as panoramic cameras [54] and lidar [57]. Based on proprietary 3D city models, most recent work exploits building geometry to compare either the signal paths [29], [39], [45], [48] or the satellite visibility [23], [49], [60], [63], [68] predicted at candidate locations around the initial GPS location against the receiver's observations, and outputs the best-matching point as the solution. However, these works largely rely on accurate proprietary 3D city models or other third-party resources, such as panoramic images [39], which are not easily accessible and thus largely limit their practical application.

Despite its limitations, previous work suggests that there should exist some mapping function f that maps a GPS estimate and its surroundings to the ground-truth position where the receiver is actually located. On the one hand, for a given location represented by a pair of longitude and latitude, the influence of its surrounding environmental factors on satellite signals is consistent over a long period of time. On the other hand, GPS satellites orbit with a regular period, so their positions are predictable. Of course, it is impossible to enumerate all localization contexts to find the mapping function f because of the huge computation and storage costs. Instead, we can build a deep neural network that approximates f, taking advantage of the powerful representation ability of deep learning to enhance GPS positioning [35]. GPS trajectories [70] provide rich training data for learning deep neural network models for a given area. A well-trained model can provide fast and accurate GPS positioning to users in the area.

In this paper, we propose a deep learning-enhanced GPS positioning system, DeepGPS, which enables precise positioning in urban canyons. We build a deep neural network to implicitly capture the complex relationship between localization context and ground-truth positions from a large number of labeled GPS samples. The trained model acts as a black box that can effectively convert wrong GPS estimates into correct locations, thus greatly improving localization accuracy in urban canyons.

However, instantiating DeepGPS needs to address three key challenges.

  1. First, multiple factors (such as the layout and height of buildings, satellite distribution, and human dynamics) affect GPS accuracy in urban canyons, and these factors exist in different forms with different dimensions. How to correctly represent and fuse the relevant localization context as input to deep neural networks without losing information is a challenge. To address this issue, we transform all contextual factors into matrices of the same size and combine them into an overall multi-source data input. Such a representation allows us to take advantage of recent advances in 2D image-based techniques in computer vision [32], [36] that help capture spatial correlations between environmental factors surrounding an initial GPS location.

  2. Second, predicting the correct location based on GPS estimates and other inputs can be modeled as a regression problem. Since locations are usually represented as a pair of latitude and longitude, it is difficult to train regression models due to the huge search space of floating point numbers. Therefore, how to model the position correction problem is crucial, as it also determines the architecture of the deep neural network. To address this challenge, we analyze the distribution of GPS errors in urban canyons and generate a set of candidate cells around the initial GPS position. We then cast the position correction problem as a classification problem, with the goal of predicting which cell is most likely to contain the ground-truth position. To tackle this classification problem, we employ an encoder-decoder network. The model consists of an encoder that extracts features from the input matrices and two decoders, a distance decoder and a position decoder, that take the feature maps generated by the encoder and predict localization errors and correct positions, respectively. More importantly, the two decoders share information from the same encoder and constrain each other to predict the most likely error and location, which best match the observations. Furthermore, we embed environmental context into a constraint mask, which can improve model accuracy by filtering out inaccessible cells.

  3. Third, most location-based applications need to track objects of interest, which requires continuous and accurate localization. Although we could use the model to correct each GPS estimate individually, this approach ignores the temporal correlation between user movements and is therefore inefficient. We could employ advanced techniques such as particle filtering [59] to track the user's location; however, these solutions require additional equipment to collect movement data. Therefore, making the model support continuous localization well remains a challenge. To this end, we propose a simple continuous localization method based on a mobility model. We update the user's mobility model with the latest corrected location, which helps to calculate the user's reachable area given the time interval between two localization instances. The reachable area assigns different confidence levels to the candidate cells and helps us accurately and efficiently identify the cell containing the correct location.

In summary, this paper makes the following contributions:

  • To our knowledge, we are the first team to improve GPS performance in urban canyons by learning the relationship between different localization environments and GPS estimates.
  • We design and implement DeepGPS, built on an encoder-decoder network architecture, which can accurately predict positioning errors and correct positions from erroneous GPS estimates and multi-source data.
  • We incorporate domain knowledge into a new constraint mask that further improves the neural network and propose a mobility model for continuous localization.
  • A prototype system was developed and experimentally evaluated. Extensive evaluation results based on a large bus GPS trajectory dataset and on-site GPS measurements demonstrate the effectiveness of DeepGPS. On average, our system can effectively correct 90.1% of GPS estimates, improving positioning accuracy by 64.6%.

The remainder of this paper is organized as follows. We present background and motivation in Section 2. The design of DeepGPS is elaborated and implemented in Sections 3 and 4, respectively. Performance evaluation is carried out in Section 5. We review related work in Section 6 and present the Discussion in Section 7. Finally, Section 8 concludes the paper.

2 BACKGROUND AND MOTIVATION

In this section, we introduce the background of GPS, discuss the performance of GPS in urban canyons, and then analyze previous work to motivate our design of DeepGPS.

2.1 A Primer on GPS

The GPS navigation system consists of three components: the satellite constellation, the ground station, and the user's GPS receiver. Specifically, the satellite constellation consists of 32 satellites, each orbiting the Earth every 12 hours and continuously broadcasting its position and other metadata over the Earth [33]. Metadata includes various properties of each satellite, such as the satellite's position. The ground station monitors and manages all satellites by sending them parameters that control their orbits and trajectories.

In addition, the satellites carry stable atomic clocks that are precisely synchronized with the clocks of ground stations. A GPS receiver receives signals from available satellites and calculates how far each signal has traveled, known as a pseudorange, by multiplying the speed of light by the signal's propagation delay. Given the positions of the observed satellites and their pseudoranges, the receiver uses the least squares method to estimate its position $(x, y, z)$ under the following constraints:

$$\sqrt{(x-x_i)^2 + (y-y_i)^2 + (z-z_i)^2} = \ell_i$$

for $i = 1, 2, \cdots, m$, where $m$ is the number of satellites visible to the receiver, $(x_i, y_i, z_i)$ is the position of the $i$-th satellite, and $\ell_i$ is the measured distance between the receiver and the $i$-th satellite. In practice, the receiver's clock is not synchronized with the satellites' clocks, so the receiver must observe at least four satellites in order to determine its position and the unknown time offset between the receiver and the satellites.
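As a rough illustration of the least-squares step, the following Python sketch estimates a receiver position from pseudoranges with Gauss-Newton iterations. It adds a clock-bias unknown (expressed in meters) to each constraint, which is an assumption suggested by the remark on unsynchronized clocks rather than something the equation above states. The function names `solve_linear` and `estimate_position` are hypothetical, and real receivers use considerably more elaborate models.

```python
import math


def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def estimate_position(sats, pseudoranges, iterations=10):
    """Return [x, y, z, b]: receiver position and clock bias, all in meters.

    sats: list of (x_i, y_i, z_i) satellite positions in meters.
    pseudoranges: list of measured distances l_i in meters (needs at least four).
    Assumed model: sqrt((x-x_i)^2 + (y-y_i)^2 + (z-z_i)^2) + b = l_i.
    """
    est = [0.0, 0.0, 0.0, 0.0]  # start from the coordinate origin
    for _ in range(iterations):
        jac, res = [], []
        for (sx, sy, sz), rho in zip(sats, pseudoranges):
            dx, dy, dz = est[0] - sx, est[1] - sy, est[2] - sz
            dist = math.sqrt(dx * dx + dy * dy + dz * dz)
            jac.append([dx / dist, dy / dist, dz / dist, 1.0])
            res.append(dist + est[3] - rho)
        # Normal equations of the linearized problem: (J^T J) step = -J^T r
        jtj = [[sum(row[i] * row[j] for row in jac) for j in range(4)] for i in range(4)]
        jtr = [-sum(row[i] * e for row, e in zip(jac, res)) for i in range(4)]
        step = solve_linear(jtj, jtr)
        est = [value + delta for value, delta in zip(est, step)]
    return est
```

With error-free pseudoranges from five well-spread satellites, the iteration should recover the position and bias; an NLOS reflection would enlarge one `pseudoranges` entry and pull the estimate away from the true position.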

A stand-alone GPS receiver is subject to many error sources that can severely impact GPS positioning performance [33]. For example, inconsistent atmospheric conditions can affect the propagation speed of GPS signals, causing ionospheric and tropospheric delays. The Doppler effect due to the rotation of the earth and the velocity of satellites also affects the propagation time of GPS signals and introduces positioning errors. In addition, both multipath interference (the same signal is received directly from the satellite and also through reflections) and non-line-of-sight reception (the signal is received only through reflections) affect pseudorange estimates. Finally, it is worth noting that although satellite clocks are very accurate, they are still not perfect, with a clock error of about 8.64 to 17.28 nanoseconds per day [3]. For the sake of simplicity, we only mention a few main error sources in GPS positioning; more analysis of GPS errors can be found in [3], [33].

Various techniques, such as differential GPS [25], exist to compensate for errors caused by inconsistent atmospheric conditions and the Doppler effect in modern GPS receivers [39]. Receiver clock errors can be fixed by receiving more satellite signals, whereas satellite clock errors can be corrected using fitted polynomial models [52]. The two other sources of error, multipath interference and non-line-of-sight satellites, remain major challenges for GPS positioning, especially in urban canyons [16], [20], [23].

2.2 How GPS Performs in Urban Canyons

In urban canyons, satellite signals are often reflected or blocked by densely packed buildings. As a result, GPS receivers fail to receive enough readings or incorrectly estimate pseudoranges, resulting in larger positioning errors, such as more than 50 m [28]. As shown in Figure 1, the user's receiver receives the direct signal (i.e., the green line) and the reflected signal (i.e., the blue line) from satellite A. These two signals reach the receiver over multiple paths and interfere with its positioning. Because satellite B is blocked by buildings, the receiver can only receive the reflected signal from satellite B, so satellite B is called a non-line-of-sight satellite for this receiver. Compared to the direct signal, the reflected signal travels a greater distance, say 100 meters more [39], resulting in a larger estimated pseudorange. Unfortunately, GPS receivers cannot reliably distinguish between direct and reflected signals. Therefore, multipath and NLOS reception are the main causes of GPS errors in urban areas.

To understand the real-world performance of GPS, we analyze two real-world GPS datasets collected from regularly operating buses equipped with GPS sensors and from several modern smartphones. See Section 5.1 for more details on our GPS datasets. The positioning error statistics of these GPS data are shown in Figure 2. As the buses travel along predetermined routes within the downtown area, we find that the GPS positioning error is relatively large, with an 80th percentile error of 13.0 meters and an average error of 9.1 meters. Although modern smartphones have enhanced their localization components with advanced techniques, such as wireless signal-based mobile localization [24], [30], we still see considerable errors in Figure 2; for example, the average and 80th percentile errors are 7.0 m and 11.9 m, respectively. Figure 3 further shows a concrete example, where the trajectory observed by the smartphone GPS receiver deviates substantially from the actual trajectory, with an average positioning error of up to 12.3 meters. Therefore, effective techniques are urgently needed to improve GPS accuracy in urban canyons.

2.3 Motivation

The root cause of GPS errors in urban canyons is the complex and dynamic urban environment, especially satellite signal reflections caused by high-rise buildings. Therefore, we assume that there is some relationship between GPS performance and the environment around the GPS receiver. This assumption is reasonable: on the one hand, for a given location represented by a pair of latitude and longitude, the influence of its surrounding environmental factors on satellite signals is consistent over a long period of time. On the other hand, GPS satellites orbit with a regular period, so their positions can be predicted [33]. Thus, for a position $p$, we can enumerate all possible combinations of available satellites, including satellite identities and their positions in the sky, and for each combination of satellites there will be an observation $\hat p$ of the position estimated by the GPS receiver. The relationship between the actual position $p$ and the GPS estimate $\hat p$ can be modeled by a mapping function $f$, defined as:

$$f(p, c, s) \rightarrow \hat p,$$

where $c$ encodes the urban environment surrounding $p$, and $s$ represents the distribution of satellites when GPS positioning is invoked. In practice, we want to infer the ground-truth position $p$ from the estimate $\hat p$, so the function is expressed as $f(\hat p, c, s) \rightarrow p$.

Existing works [22], [47], [58] usually assume that the GPS error, i.e., the difference between $p$ and $\hat p$, follows a Gaussian distribution, while some studies [64] report that GPS errors actually follow a Rayleigh distribution. In fact, both Gaussian and Rayleigh distributions lack generality for modeling practical GPS estimates over large areas. For example, Wu et al. [58] had to fit a separate Gaussian distribution to the GPS estimates for each small road segment. These statistical models omit important localization context, such as buildings and satellites, and thus are not sufficient for modeling the mapping function $f$.

There is indeed a large body of work that considers building and satellite information to improve GPS performance. These works use a proprietary 3D city model to approximate the function $f$, either by implicitly tying the position $p$ to the visible satellites or by explicitly tracing the signal paths between $p$ and the observed satellites.

  • Work in the former category studies and extends the idea of shadow matching [23], [49], [60], [63], [68] to improve localization accuracy. The shadow matching algorithm determines the user's location from candidate locations by comparing predicted satellite visibility with GPS receiver measurements [23]. Given an initial GPS position $\hat p$, a certain area around $\hat p$ is evenly divided into a grid, and each grid cell serves as a candidate for the actual location. For each candidate location, the algorithm predicts satellite visibility based on the 3D city model and the satellite positions. A given satellite is visible if its direct signal to the candidate location is not blocked by obstacles such as buildings. At the same time, satellite visibility can also be estimated from the signals received by the GPS receiver. Since reflected non-line-of-sight signals can lead to incorrect visibility estimates, shadow matching algorithms typically assume that satellites with received signal strengths greater than a predefined threshold are visible. Finally, the candidate location whose predicted satellite visibility best matches the signal-based visibility estimate is taken as the shadow matching solution.

  • Work in the latter category either corrects pseudoranges by recovering a "virtual direct path" [39] (e.g., restoring the real path, shown as the red dotted line for satellite B in Fig. 1), or selects the candidate location by comparing the similarity between predicted and observed signal paths [29], [45].
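The shadow-matching comparison described in the first item can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the per-candidate visibility predictions, which in the cited work come from a 3D city model, are simply passed in as a dictionary; the 35 dB threshold and the function names `observed_visibility` and `shadow_match` are hypothetical.

```python
def observed_visibility(snr_by_sat, threshold_db=35.0):
    """Treat a satellite as visible when its received SNR exceeds the threshold.

    The 35 dB default is an arbitrary placeholder, not a value from the paper.
    """
    return {sat: snr > threshold_db for sat, snr in snr_by_sat.items()}


def shadow_match(predicted, snr_by_sat, threshold_db=35.0):
    """Return the candidate whose predicted visibility best matches the observation.

    predicted: dict mapping candidate location -> {satellite id: bool}, assumed
               to have been derived beforehand from a 3D city model.
    snr_by_sat: dict mapping satellite id -> measured SNR at the receiver.
    """
    observed = observed_visibility(snr_by_sat, threshold_db)

    def score(candidate):
        visibility = predicted[candidate]
        # Count the satellites on which prediction and observation agree.
        return sum(visibility[sat] == seen for sat, seen in observed.items())

    return max(predicted, key=score)
```

The sketch makes the two practical difficulties visible: `predicted` cannot be built without a city model, and the result depends directly on the choice of `threshold_db`.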

However, both shadow-matching-based and satellite-signal-path-based methods rely heavily on proprietary 3D city models, which are not readily available. Worse, it is difficult to correctly set the threshold for determining satellite visibility in shadow matching, and recalculating pseudoranges can incur a huge computational overhead, which largely limits the usefulness of these methods on ordinary devices such as smartphones.

Inspired by the theory of deep reinforcement learning, which uses deep neural networks as function approximators to learn the Q-function [13], we propose to use a powerful deep neural network to approximate the mapping function $f$ and thereby better describe the complex relationship between GPS estimates and the localization context, including the urban environment and satellite distribution. It is feasible and beneficial to build such a deep neural network model.

  • First, due to the powerful representation capability of deep learning [35], we can feed coarse contextual information $c$ and the satellite distribution $s$ into the deep neural network instead of a proprietary 3D city model.
  • Second, we can have abundant GPS data to train the model well. Due to the widespread use of GPS devices in vehicles (such as taxis and buses) and smart devices (such as smartphones), we can accumulate a large number of GPS measurements as training samples [41].
  • Finally, although training a deep learning model can be time-consuming, inference is extremely fast [53]. Compared with previous work [23], [39], [45], [49], which requires a large amount of storage for 3D city models and a large amount of real-time computation, a deep learning-based solution can train and deploy the model in the cloud and quickly respond to each request to correct a GPS estimate.

Challenges. Despite its great advantages, implementing a deep learning-enhanced GPS positioning system is not trivial due to the following challenges.

First, multiple key factors, such as building height and satellite status, may affect GPS performance in urban canyons, but they come in different modalities and dimensions. Therefore, how to represent them and feed them correctly into deep neural networks without losing information is a challenge.

Second, locations are usually expressed as a pair of latitude and longitude, which is difficult to predict due to the large search space. Therefore, how to design the network architecture, including the form of input and target, and exploit more opportunities to reduce the search space remains to be explored.

Third, location-based applications often require tracking objects of interest. Although we could invoke the model to correct each individual GPS estimate, this approach is inefficient and may have low accuracy. Therefore, how to make deep neural networks support continuous localization is a challenge.

3 DESIGN OF DeepGPS

In this section, we present the design of DeepGPS in detail and discuss how to extend DeepGPS for continuous positioning.

3.1 System Overview

Figure 4 shows the system architecture of DeepGPS, which consists of four main modules: input representation, encoder, distance decoder, and position decoder. At a high level, DeepGPS incorporates multiple factors affecting GPS positioning in urban canyons and represents them in a matrix of the same size. Based on the encoder-decoder network design, DeepGPS extracts latent features from the input matrix, and then simultaneously predicts the positioning error and the correct position given the GPS estimate.

  • The input representation module (§3.2) takes into account the satellite status, fix time, and surrounding environment (i.e., road and building information around the raw coordinates output by the GPS receiver) and represents them as matrices. DeepGPS uses an encoder-decoder network model to implement a deep neural network that approximates the mapping function $f$ (§3.3).
  • The encoder module is designed to extract high-level features from the input matrix.
  • The distance decoder and position decoder modules use the exported feature maps to generate distance probability vectors and position probability matrices, respectively. The distance probability vector indicates possible positioning errors, while the position probability matrix indicates the correct position for the current GPS fix instance.
  • In addition, DeepGPS builds a constraint mask from the environment matrix, filters out impossible locations from the location probability matrix, and outputs a final matrix where the location with the largest value is the final solution.
  • Furthermore, DeepGPS maintains a movement model learned from the user's latest movements and utilizes this model to optimize the output for efficient continuous localization (§3.4).

3.2 Multi-Source Data Fused Input Representation

In addition to inconsistent atmospheric conditions and Doppler effects, multiple factors may affect GPS performance in urban canyons and should be considered as inputs for DeepGPS. As mentioned in Section 2.3, GPS errors in urban canyons are mainly caused by satellite signal reflections. In principle, whether an individual signal is reflected depends on the satellite position and the height of buildings around the GPS receiver. In practice, people usually walk or drive on roads in urban areas, so road information is important for constraining the possible locations of users. In addition, the specific time of the GPS fix instance should also be considered for the following reasons. First, human mobility is regular and consistent [66], so time can be an implicit indicator of urban dynamics, which can affect GPS positioning. Second, navigation satellites orbit with a regular period, so time can also be used to indirectly encode information about satellite distribution. In summary, we take the satellite status, time, and surrounding environment as the input for DeepGPS. In particular, we describe the environment using the information of roads and buildings near the raw coordinates provided by the receiver.

We prefer multi-source data fusion as input instead of building deep neural network models for each input data source individually. This avoids information loss from the original data and the cumbersome training of multiple models. To this end, we represent multi-source data in matrices of the same size, which can capture the spatial relationship between objects of interest well. We detail how each data source is represented as follows.

(1) Representation of surrounding environment

Surrounding environment converted to environment matrix

When a fix is required, the GPS receiver provides an estimated position $\hat p$ in the form of [latitude, longitude], together with an error value that describes the uncertainty of the estimated position. In principle, the actual position $p$ lies within a circle that is centered at the estimated position $\hat p$ and whose radius is the GPS error. Therefore, we consider the surrounding environment, especially the height and layout of buildings and the geographical distribution of roads, to search for the actual position $p$.

We construct an environment matrix $M_e$ to represent these environmental factors that may affect the GPS positioning instance. We first choose a square area centered at $\hat p$ with side length $2R$, and then divide this area into cells of size $c \times c$. These cells serve as candidate positions where $p$ may be located. Each cell in the square area corresponds to an element of the matrix $M_e$. For each element in the matrix, its value is set according to the following rules:

  • i) if the cell is part of a building, its value is set to the building height;
  • ii) if the cell is part of a road, its value is zero;
  • iii) otherwise, its value is $-1$, which means the corresponding cell is inaccessible.

According to our analysis of a large amount of GPS data, we conservatively set $R = 50$ meters, which is greater than 99% of the observed GPS errors, as shown in Fig. 2. Furthermore, we set the cell size to $c = 1$ m to achieve fine-grained position correction. Therefore, the size of the matrix $M_e$ is $100 \times 100$. Figure 5(a) shows how we divide the target area into candidate cells, and Figure 5(b) shows the corresponding environment matrix $M_e$.
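The three rules can be illustrated with a small Python sketch that rasterizes a toy scene into $M_e$. The input format is an assumption made for the example: roads and buildings are given as axis-aligned rectangles in meters east and north of $\hat p$, a cell is classified by its center point, and buildings take precedence over roads where they overlap. The function name `build_environment_matrix` is hypothetical.

```python
R = 50.0    # half of the square's side length in meters (value from the text)
CELL = 1.0  # cell size in meters (value from the text)
N = int(2 * R / CELL)  # matrix side: 100


def build_environment_matrix(roads, buildings):
    """Build the N x N environment matrix as a list of rows.

    roads: list of (x0, y0, x1, y1) rectangles, meters east/north of p_hat.
    buildings: list of (x0, y0, x1, y1, height_m) rectangles.
    Rectangles are half-open: x0 <= x < x1 and y0 <= y < y1 (an assumption).
    Row 0 is the northern edge; column 0 is the western edge.
    """
    matrix = [[-1.0] * N for _ in range(N)]  # rule iii: inaccessible by default

    def covered_cells(x0, y0, x1, y1):
        for row in range(N):
            north = R - (row + 0.5) * CELL   # north offset of the cell center
            for col in range(N):
                east = -R + (col + 0.5) * CELL  # east offset of the cell center
                if x0 <= east < x1 and y0 <= north < y1:
                    yield row, col

    for rect in roads:
        for row, col in covered_cells(*rect):
            matrix[row][col] = 0.0           # rule ii: road cells are zero
    for x0, y0, x1, y1, height in buildings:
        for row, col in covered_cells(x0, y0, x1, y1):
            matrix[row][col] = height        # rule i: building height
    return matrix
```

For example, `build_environment_matrix([(-50, -5, 50, 5)], [(10, 5, 30, 25, 40.0)])` describes a 10 m wide east-west road through $\hat p$ with a 40 m tall building on its north side.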

(2) Representation of satellite statuses

Satellite state conversion to sky map matrix

In addition to the estimated position, we can also obtain satellite metadata from the receiver, which includes the azimuth, elevation, and signal-to-noise ratio (i.e., SNR) of each satellite. Given the received satellite metadata, a sky map can be drawn to illustrate the satellite geometry at a given ground station [43]. Figure 6(a) shows an example sky map drawn with satellite metadata collected in our field experiments. Each circle represents a satellite detected by the GPS receiver, and the number serves as the satellite identification. The color of each circle indicates the satellite's signal strength: green is very good, yellow is fair, and red is not good.

Sky maps are an effective representation of satellite status, and previous work usually utilizes sky maps to filter non-line-of-sight satellites [50], [60] or to perform shadow-matching-based position correction with the help of 3D city models [23], [49], [68]. Therefore, we convert the sky map into a matrix as part of the input for DeepGPS. We call this matrix the skymap matrix $M_s$, because it encodes the relative positions and SNR values of all detected satellites. We set the matrix $M_s$ to the same size as $M_e$ and embed the information of each satellite on the sky map into $M_s$. First, we align the center of the sky map with the center of $M_s$. Then, for each detected satellite, we use the satellite's elevation and azimuth to map its position from the sky map to $M_s$. Specifically, the elevation angle of a satellite determines the distance between its position in $M_s$ and the center of $M_s$, while the azimuth of the satellite determines the angle measured clockwise from the "up" direction. If a position in $M_s$ "has" a satellite, its value is set to the SNR value of the corresponding satellite; otherwise, its value is zero. Figure 6(b) shows the skymap matrix for the example sky map in Figure 6(a).
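A minimal Python sketch of this mapping follows. The text does not give the exact relation between elevation and radial distance, so the sketch assumes the common linear sky-plot convention: a satellite at 90° elevation lands at the center and a satellite at 0° elevation lands on the outermost ring. The function name `build_skymap_matrix` and the input tuple layout are hypothetical, and elevations are assumed to lie in [0°, 90°].

```python
import math


def build_skymap_matrix(satellites, n=100):
    """Build an n x n skymap matrix as a list of rows.

    satellites: iterable of (elevation_deg, azimuth_deg, snr) tuples.
    Row 0 is the "up" direction; azimuth is measured clockwise from "up".
    """
    matrix = [[0.0] * n for _ in range(n)]
    center = n / 2.0
    max_radius = n / 2.0 - 0.5  # keeps horizon satellites inside the matrix
    for elevation_deg, azimuth_deg, snr in satellites:
        # Assumed linear mapping: zenith -> center, horizon -> outer ring.
        radius = (90.0 - elevation_deg) / 90.0 * max_radius
        theta = math.radians(azimuth_deg)
        row = int(center - radius * math.cos(theta))
        col = int(center + radius * math.sin(theta))
        matrix[row][col] = snr  # cells without a satellite stay zero
    return matrix
```

Two satellites that fall into the same cell would overwrite each other in this sketch; how the original system resolves such collisions is not stated in the text.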

(3) Representation of timestamp

Convert Timestamp to Timestamp Matrix

Typically, the timestamp of a GPS fix instance is represented as a number. According to the operating rules of GPS satellites, each satellite circles the earth every 12 hours, which means that in theory it periodically returns to the same place. However, because the Earth rotates at the same time, the relative position of the satellite with respect to a receiver at the same location may be different after one orbital period. Recent studies [11], [61] report that the revisit period of GPS satellites is variable. Specifically, the revisit periods of the GPS satellites differ slightly from one another, with each satellite's revisit period being 240 s to 250 s shorter than one day. Furthermore, the average satellite revisit period is 246 seconds less than one day [11]. For simplicity, we set the revisit period for all satellites to 86154 s (i.e., 24 × 3600 − 246), which means that after such a revisit period, a receiver at a fixed location will "see" the same GPS satellites. Based on the revisit period of GPS satellites, we generate the timestamp matrix $M_t$ from the timestamp $t$.

First, we convert the given timestamp $t$ into a 7-dimensional vector $V_t$. Each element of $V_t$ is derived by a specific operation, as shown in Table 1. Specifically, $V_t[0]$ and $V_t[1]$ represent $t$ with respect to the Earth's rotation period and the satellite revisit period, respectively. Note that the total number of seconds in the revisit period is set to 86154 seconds, and the total number of seconds in a calendar day (i.e., 24 hours) is 86400 seconds. The third to sixth elements of $V_t$ are the results of applying sine and cosine to $V_t[0]$ and $V_t[1]$, which helps the model find the periodicity quickly [10]. Additionally, the element $V_t[6]$ is set to 1 if $t$ falls before noon of the day and 0 otherwise.
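Since the exact operations of Table 1 are not reproduced in this text, the following Python sketch shows only one plausible reading of the description. The normalized-phase definitions of the first two elements, the $2\pi$ scaling inside the sine and cosine, and the ordering of the four trigonometric elements are all assumptions, as is the function name `timestamp_vector`; the timestamp is assumed to be given in seconds of local time.

```python
import math

DAY_SECONDS = 24 * 3600             # 86400 s, one calendar day
REVISIT_SECONDS = DAY_SECONDS - 246  # 86154 s, revisit period used in the text


def timestamp_vector(t):
    """Map a timestamp t (seconds, assumed local time) to a 7-dimensional vector.

    Assumed layout (Table 1 is not available here):
      v[0]: phase of t within the day, in [0, 1)
      v[1]: phase of t within the satellite revisit period, in [0, 1)
      v[2], v[3]: sine and cosine of the day phase
      v[4], v[5]: sine and cosine of the revisit phase
      v[6]: 1.0 if t falls before noon of its day, else 0.0
    """
    day_phase = (t % DAY_SECONDS) / DAY_SECONDS
    revisit_phase = (t % REVISIT_SECONDS) / REVISIT_SECONDS
    before_noon = 1.0 if (t % DAY_SECONDS) < DAY_SECONDS / 2 else 0.0
    return [
        day_phase,
        revisit_phase,
        math.sin(2 * math.pi * day_phase),
        math.cos(2 * math.pi * day_phase),
        math.sin(2 * math.pi * revisit_phase),
        math.cos(2 * math.pi * revisit_phase),
        before_noon,
    ]
```

Under this reading, two fixes taken exactly one revisit period apart share the same `v[1]`, `v[4]`, and `v[5]`, which is the property the revisit period is meant to expose to the network.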

Then, we apply a simple multilayer perceptron (MLP) to convert the vector $V_t$ into an embedding vector. The MLP consists of five layers: an input layer, three hidden layers, and an output layer. Figure 7 shows its structure. Specifically, the input layer takes the 7-dimensional vector $V_t$ as input; the three hidden layers have 100, 1000, and 2000 nodes, respectively; and the output layer generates an embedding vector of size 10000. We adopt the rectified linear unit (ReLU) as the activation function. The resulting embedding vector is reshaped into a 100 × 100 matrix, the same size as $M_e$ and $M_s$. For a given timestamp $t$, we call this matrix the timestamp matrix $M_t$, which captures the periodic features of the GPS positioning time.
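A quick arithmetic check of these layer sizes is sketched below. It assumes plain fully connected layers with bias terms, which the text does not state explicitly; the helper name `mlp_parameter_count` is hypothetical.

```python
def mlp_parameter_count(layer_sizes):
    """Count weights plus biases of a fully connected network (assumed form)."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Layer widths from the text: input, three hidden layers, output.
layers = [7, 100, 1000, 2000, 10000]
total_parameters = mlp_parameter_count(layers)

# The output embedding must reshape exactly into a 100 x 100 matrix.
assert layers[-1] == 100 * 100
```

Under these assumptions the MLP has 22,113,800 parameters, of which 20,010,000 belong to the final 2000-to-10000 layer, so the timestamp branch is dominated by its output layer.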

3.3 Deep Neural Network Design

Given these input matrices, a deep neural network could be built to predict the correct location directly. However, since both latitude and longitude are floating-point numbers, the search space is infinite, which makes such a model difficult to train and use in practice. Instead, we set the target output to be a matrix and configure the model to predict which cell contains the correct location. The center of the predicted cell is taken as the corrected location. At the same time, we do not actually need to correct all positioning results, because many GPS estimates may be accurate enough for upper-level applications. Therefore, we want our model to also predict the positioning error, so that we can decide whether to correct the current GPS estimate, depending on the requirements of the location-based application.

To this end, we design the deep neural network as an encoder-decoder architecture in which two decoders share a common encoder, as shown in Figure 4. Specifically, the encoder $E$ learns a feature map $M$ from the input matrices (i.e., $M_e$, $M_s$, and $M_t$), and $M$ is fed to the distance decoder $D_{dist}$ and the position decoder $D_{posi}$ to predict the positioning error and the corrected position, respectively. An additional advantage of the dual-decoder design is that, given the same feature map $M$, the position decoder and the distance decoder constrain each other, so that their predictions best match the receiver's observations and the positioning context.

Model structure

Encoder: Since the inputs to our model are image-like matrices, we construct the encoder $E$ from a series of 2D convolutional layers to extract spatial features from the positioning context represented by the environment matrix, skymap matrix, and timestamp matrix. In addition, we adopt 4 and 6 ResNet blocks [20] in the encoder $E$ and the position decoder $D_{posi}$, respectively, to form a deep neural network that can explore a sufficiently large feature space while avoiding vanishing gradients.

Distance decoder: We propose a distance decoder, $D_{dist}: M \rightarrow V_{dist}$, to predict the positioning error given the feature map $M$. $D_{dist}$ applies two 2D convolutional layers to $M$, then flattens the intermediate result and feeds it into four linear layers to derive the final output $V_{dist}$. We prefer to predict an error interval rather than a precise error value, which would have to be modeled as a regression problem. Therefore, we define the output $V_{dist}$ as a probability vector in which each element corresponds to an error interval and its value indicates the probability of that interval. In our implementation, we set the interval width to 2 meters and the size of $V_{dist}$ to 25, because more than 99% of the GPS errors are less than 50 meters. Thus, the $i$-th element of $V_{dist}$ means that the error falls in $(2i, 2(i+1)]$ meters.

Position decoder: Similarly, the position decoder, $D_{posi}: M \rightarrow R_{posi}$, is designed to predict the corrected position $p'$ from the feature map $M$. As shown in Figure 4, $D_{posi}$ has a structure similar to that of the encoder $E$ but in reverse order: it first applies 6 ResNet blocks to the feature map $M$, then uses two 2D transposed convolutional layers to rescale the intermediate result, and ends with a 2D convolutional layer. The output $R_{posi}$ is a matrix of the same size as the input matrices, so each element of $R_{posi}$ corresponds to a candidate cell defined in the environment matrix $M_e$. Instead of labeling the cell containing the ground-truth location as 1 and the rest as 0, we represent the target location as a Gaussian peak. Therefore, $R_{posi}$ is a probability matrix in which the element with the greatest probability indicates the ground-truth position. Compared with one-hot encoding, the Gaussian peak representation helps to avoid vanishing gradients during model training [14].
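A Gaussian-peak target of this kind might be built as in the sketch below. The peak width `sigma` and the unnormalized peak height of 1 are assumptions, since the text gives neither; `gaussian_target` is a hypothetical helper name.

```python
import math

def gaussian_target(row, col, size=100, sigma=2.0):
    """Return a size x size matrix with a Gaussian peak at (row, col).

    sigma (in cells) and the peak value of 1.0 are illustrative assumptions.
    """
    return [
        [math.exp(-((r - row) ** 2 + (c - col) ** 2) / (2.0 * sigma ** 2))
         for c in range(size)]
        for r in range(size)
    ]
```

Unlike a one-hot matrix, cells next to the ground-truth cell receive non-zero target values, so a prediction that is off by one cell is penalized less than one that is far away.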

Constraint mask: The position decoder $D_{posi}$ may predict any cell as the target; however, cells occupied by buildings or other obstacles clearly cannot contain the correct location. Therefore, we propose the constraint mask $C_{env}$, which embeds prior knowledge of the surrounding environment to constrain the output of $D_{posi}$. Specifically, an element of $C_{env}$ is set to 0 if the corresponding cell is not accessible; otherwise, it is set to 1 and the corresponding cell is a valid candidate.
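The effect of the constraint mask can be sketched as an element-wise product between the decoder output and a 0/1 accessibility matrix. The helper names below are hypothetical, and treating the masking as a plain product is an assumption consistent with the later definition of $R_{mob}$.

```python
def apply_constraint_mask(r_posi, c_env):
    """Zero out scores of inaccessible cells (element-wise product)."""
    return [[score * allowed for score, allowed in zip(score_row, mask_row)]
            for score_row, mask_row in zip(r_posi, c_env)]

def best_cell(matrix):
    """Return (row, col) of the largest element."""
    return max(((r, c) for r in range(len(matrix))
                for c in range(len(matrix[r]))),
               key=lambda rc: matrix[rc[0]][rc[1]])

# Toy 2 x 2 example: the highest raw score sits on a building cell.
raw_scores = [[0.1, 0.9],
              [0.5, 0.2]]
accessible = [[1, 0],
              [1, 1]]
masked_scores = apply_constraint_mask(raw_scores, accessible)
```

In this toy case the raw argmax is cell (0, 1), but after masking the winner becomes cell (1, 0), the best-scoring accessible cell.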

Loss function

We use one-hot encoding to prepare the target vector $V_{true}$ for the distance decoder $D_{dist}$. For each GPS sample, we use the GPS estimate and the actual position to calculate its true positioning error $e$, and then compute the index of the error interval in which $e$ falls, $j = \lfloor e/2 \rfloor$. We then prepare $V_{true}$ by setting its $j$-th element to 1 and the remaining elements to 0. Since we model positioning error prediction as a classification problem, we employ a cross-entropy loss to measure the distance between the output $V_{dist}$ of $D_{dist}$ and the target $V_{true}$, which is defined as:
$$L_{dist} = -\sum_{i=1}^{n} b_i \log(p_i)$$
where $b_i$ is a binary indicator (1 if the $i$-th interval is the true one, 0 otherwise), $p_i$ is the softmax probability of the $i$-th interval, and $n$ is the size of $V_{dist}$. By default, we set $n$ to 25.
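The target construction and the cross-entropy loss can be sketched in plain Python as follows. The index follows the floor formula $j = \lfloor e/2 \rfloor$; clamping errors of 50 m or more into the last interval is an assumption, because the text does not say how such rare samples are handled. All function names are hypothetical.

```python
import math

INTERVAL_M = 2.0     # interval width from the text
NUM_INTERVALS = 25   # size of V_dist from the text

def error_to_target(error_m):
    """One-hot target V_true for a true positioning error in meters."""
    # Assumption: errors beyond the last interval are clamped into it.
    j = min(int(error_m // INTERVAL_M), NUM_INTERVALS - 1)
    return [1.0 if i == j else 0.0 for i in range(NUM_INTERVALS)]

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, probabilities):
    """L_dist = -sum_i b_i * log(p_i); only the true interval contributes."""
    return -sum(b * math.log(p)
                for b, p in zip(target, probabilities) if b > 0)
```

For a decoder that outputs identical logits for all 25 intervals, the loss equals $\log 25$ regardless of the true interval, which is a convenient reference value when monitoring training.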

For the position decoder, we use the mean squared error loss (i.e., L2 norm loss) to measure the distance between the augmented output $R_{env}$ and the target $R_{true}$, which is the Gaussian representation of the ground-truth position. Therefore, the loss function of the position decoder $D_{posi}$ is defined as:
$$L_{posi} = \|R_{true} - R_{env}\|^2$$
where $\|\cdot\|^2$ is the L2 norm loss.

Finally, the total loss function of our model is the weighted sum of $L_{dist}$ and $L_{posi}$, i.e.,
$$L_{overall} = \lambda \times L_{dist} + L_{posi}$$
where $\lambda$ is a regularization parameter that balances the influence of the distance decoder and the position decoder on the encoder. To determine a suitable setting for $\lambda$, we train $D_{dist}$ and $D_{posi}$ separately to observe their losses, and then set $\lambda$ so that the losses of the two decoders are well balanced. Based on our experiments, we finally set $\lambda = 0.001$, which achieves the best prediction performance for both decoders.
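The position loss and the weighted total can be illustrated with the short sketch below. It computes the squared L2 norm as a plain sum of squared differences; dividing by the number of cells would give the mean and only changes the loss by a constant factor. Which of the two the authors use is not stated, so this choice is an assumption, and the function names are hypothetical.

```python
def squared_l2(r_true, r_env):
    """Sum of squared element-wise differences between two matrices."""
    return sum((a - b) ** 2
               for row_true, row_env in zip(r_true, r_env)
               for a, b in zip(row_true, row_env))

def overall_loss(l_dist, l_posi, lam=0.001):
    """L_overall = lambda * L_dist + L_posi, with lambda = 0.001 as in the text."""
    return lam * l_dist + l_posi

# Toy values: a 2 x 2 target and prediction, plus a distance loss of 3.0.
l_posi_example = squared_l2([[1.0, 0.0], [0.0, 0.0]],
                            [[0.5, 0.0], [0.0, 0.5]])
l_overall_example = overall_loss(3.0, l_posi_example)
```

With such a small $\lambda$, the cross-entropy term only contributes noticeably when it is several orders of magnitude larger than the position loss, which matches the stated goal of balancing the two decoders.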

Model training

We train DeepGPS using a large number of labeled GPS samples, together with road network, building survey, and satellite data. For each GPS positioning instance and its ground-truth position, we construct the environment matrix $M_e$ from the road network and building height information, the timestamp matrix $M_t$ from the positioning time, and the skymap matrix $M_s$ from the corresponding sky map. Furthermore, we construct the 25-dimensional vector $V_{true}$ by one-hot encoding, and the matrix $R_{true}$, which labels the cell containing the actual location as a Gaussian peak. The vector $V_{true}$ and the matrix $R_{true}$ are the target outputs of the distance decoder $D_{dist}$ and the position decoder $D_{posi}$, respectively.

With a proper setting of $\lambda$ and the loss function $L_{overall}$, we train the network as a whole using the labeled GPS samples. Since the distance loss and the position loss are added together to update the common encoder, the two decoders can benefit from each other's information, leading to more accurate predictions. It is worth noting that we treat the MLP, which converts the time $t$ into the timestamp matrix $M_t$, as part of the encoder $E$, so we train the MLP together with the encoder.

3.4 Extend to Continuous Localization

With enough positioning instances, the user's mobility information can be inferred and used to further improve position correction by reducing location uncertainty. Specifically, DeepGPS introduces a mobility model, as shown in Figure 4, which roughly estimates the user's moving speed and uses this speed to calculate a reachable area that constrains future positioning.

Mobility model

DeepGPS uses the latest $k$ corrected positioning instances, in particular their positions $(p'_{i-1}, \ldots, p'_{i-k})$ and the corresponding positioning times $(t_{i-1}, \ldots, t_{i-k})$, to infer the user's movement. For simplicity and generality, DeepGPS only estimates the user's average moving speed $v$, because other movement information, such as the direction of movement, requires additional sensor data. For a given user, DeepGPS continuously updates the average speed $v$ as
$$v = \frac{\sum_{j=1}^{k-1} \frac{L\left(p'_{i-j},\, p'_{i-j-1}\right)}{t_{i-j} - t_{i-j-1}}}{k-1}$$
where $L(p'_{i-j}, p'_{i-j-1})$ calculates the distance between positions $p'_{i-j}$ and $p'_{i-j-1}$. During initial use, when there are not enough positions for the speed calculation, we set $k$ to the number of positions available.
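The speed update above amounts to averaging the $k-1$ segment speeds between consecutive corrected positions. A minimal sketch follows, assuming positions are already projected to planar coordinates in meters (so that $L$ is a Euclidean distance) and that timestamps are strictly increasing; both are assumptions, and `average_speed` is a hypothetical name.

```python
import math

def average_speed(positions, times):
    """Mean of per-segment speeds over the given corrected positions.

    positions: list of (x, y) in meters, oldest first (planar coordinates
    are an assumption; the paper's distance function L is not specified).
    times: matching timestamps in seconds, strictly increasing.
    """
    if len(positions) < 2:
        return 0.0  # not enough history to estimate a speed
    segment_speeds = [
        math.dist(p_old, p_new) / (t_new - t_old)
        for p_old, p_new, t_old, t_new
        in zip(positions, positions[1:], times, times[1:])
    ]
    return sum(segment_speeds) / len(segment_speeds)
```

Passing fewer than $k$ positions simply averages over the segments that exist, which mirrors the rule of setting $k$ to the number of available positions during initial use.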

Mobility improved position fixing


Once the average moving speed $v$ is ready, we can derive an additional mobility constraint for position correction. For the GPS estimated position $\hat{p}_i$, on the one hand, we construct a square area of side length $2R$ centered at $\hat{p}_i$ and divide it into cells of size $c \times c$. On the other hand, we construct a circle centered at $p'_{i-1}$, the corrected position of the last positioning instance, with radius $S = v \times \Delta t$, where $\Delta t$ is the difference between the time $t_i$ of position $\hat{p}_i$ and the time $t_{i-1}$ of position $\hat{p}_{i-1}$. The circle indicates the area the user can reach given the recent moving speed. Combining the GPS position and the reachable area, the cells in the intersection of the square and the circle are the most likely to contain the correct location, while the other cells in the square are less likely. Based on this intuition, we construct a matrix called the motion mask $C_{mob}$, each element of which corresponds to a cell in the square area. Specifically, if a cell is covered (fully or partly) by the circle, its value is set to 1; otherwise, its value is set to $\rho$ ($0 \le \rho < 1$), which means that the cell may contain the correct position with confidence $\rho$. It is worth noting that setting $\rho = 1$ effectively disables mobility-improved positioning. Figure 8 illustrates an example of using mobility information and a GPS estimate to determine a motion mask.
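One way to build such a motion mask is sketched below. It assumes planar coordinates in meters with row 0 at the top (north) edge of the square, and tests partial coverage by measuring the distance from the circle center to the nearest point of each cell. The default half side of 50 m is inferred from a 100 × 100 matrix with 1 m cells; these conventions and the name `motion_mask` are assumptions.

```python
def motion_mask(gps_xy, prev_xy, speed, dt, half_side=50.0, cell=1.0, rho=0.4):
    """Build C_mob for a square of side 2 * half_side centered at gps_xy.

    A cell gets 1.0 if the reachable circle (center prev_xy, radius
    speed * dt) touches any part of it, and rho otherwise.
    """
    n = int(round(2 * half_side / cell))
    reach = speed * dt
    left_edge = gps_xy[0] - half_side
    top_edge = gps_xy[1] + half_side
    mask = []
    for r in range(n):
        row = []
        for c in range(n):
            x_min = left_edge + c * cell
            x_max = x_min + cell
            y_max = top_edge - r * cell
            y_min = y_max - cell
            # Nearest point of the cell rectangle to the circle center.
            nearest_x = min(max(prev_xy[0], x_min), x_max)
            nearest_y = min(max(prev_xy[1], y_min), y_max)
            dist_sq = ((nearest_x - prev_xy[0]) ** 2
                       + (nearest_y - prev_xy[1]) ** 2)
            row.append(1.0 if dist_sq <= reach ** 2 else rho)
        mask.append(row)
    return mask
```

Calling the function with `rho=1.0` yields an all-ones mask, which reproduces the remark that $\rho = 1$ disables the mobility constraint.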

The motion mask is applied after the constraint mask, rather than applying both masks together to the raw output of the position decoder. The reason is that this keeps the mobility model independent of the deep neural network, thus reducing the overhead of model training. Therefore, we take the element-wise product of the motion mask $C_{mob}$ and $R_{env}$, the result of applying the constraint mask to the raw output of the position decoder, and derive the final probability matrix $R_{mob} = R_{env} \odot C_{mob}$, which is corrected based on the mobility information. Finally, the center of the cell with the maximum value in $R_{mob}$ is output as the corrected position of DeepGPS.
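The final selection step can be sketched as follows: compute $R_{mob}$ cell by cell, find its maximum, and convert that cell's center back to coordinates. Planar coordinates in meters, row 0 at the top of the square, and the helper name `corrected_position` are assumptions made for illustration.

```python
def corrected_position(r_env, c_mob, gps_xy, cell=1.0):
    """Return the center of the cell maximizing R_mob = R_env * C_mob.

    The square grid is assumed to be centered at gps_xy, with row 0 at
    the top edge and column 0 at the left edge.
    """
    rows, cols = len(r_env), len(r_env[0])
    best_value, best_r, best_c = float("-inf"), 0, 0
    for r in range(rows):
        for c in range(cols):
            value = r_env[r][c] * c_mob[r][c]
            if value > best_value:
                best_value, best_r, best_c = value, r, c
    x = gps_xy[0] - cols * cell / 2 + (best_c + 0.5) * cell
    y = gps_xy[1] + rows * cell / 2 - (best_r + 0.5) * cell
    return (x, y)
```

Because the motion mask only rescales scores, a cell outside the reachable circle can still win if its decoder score exceeds the best reachable score by more than a factor of $1/\rho$.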

In fact, our mobility model can be further enhanced with richer mobility data and more advanced tracking techniques. For example, if inertial measurement unit (IMU) sensor data from the user's smartphone are available, we can obtain more information about the user's movement [62], such as moving speed and direction, and use more advanced techniques such as particle filters [59] to track the user's movement, thereby refining the reachable area and greatly reducing location uncertainty. We leave this study as future work.

4 IMPLEMENTATION

We designed and implemented a prototype on the cloud (as server) and several Android smartphones (as clients), as shown in Figure 9. The Android client collects raw GPS measurements and uploads them to the cloud server, which is responsible for predicting the error distance and correct position. We detail the implementation of each component and the entire workflow as follows.

Client on Android. We implemented the client component on smartphones running the Android operating system, which gives easy access to raw GPS measurements (including GPS estimated positions and errors, satellite metadata, etc.) via the Android API [2]. The client can log raw GPS measurements for data analysis. If the user needs a more accurate location in an urban canyon, she can press the "LOCATION UPDATE" button to correct the GPS estimate, powered by the DeepGPS model on the cloud. If necessary, users can also enable automatic location updates by turning on the "AUTO" option, which allows DeepGPS to analyze the user's rough mobility information for continuous positioning.

Server on the cloud. We implemented the DeepGPS model in PyTorch 1.8.1 (CUDA 11.1) [7] on a server with an Intel(R) Core(TM) i7-10700K 3.80 GHz CPU, an RTX 3090 GPU, and 48 GB of memory. To train our model, we use Adam as the optimizer and set the learning rate to α = 1e−5 and the batch size to 128. In addition, we deployed the DeepGPS model on the cloud so that users can access the location service anytime and anywhere. Cloud deployment also alleviates the computational and storage overhead of smartphone clients [37].

On the server side, we also maintain a spatial database implemented in PostgreSQL [6] using the spatial database extension program PostGIS [5] for efficient data query. A spatial database is used to store the test city's road network and building measurements (for example, building heights and boundaries).

The spatial database interacts with the DeepGPS model for data retrieval and logging. On the one hand, given an estimated location from a client, DeepGPS can retrieve road network and building information around the location to construct an environment matrix. On the other hand, when the user enables continuous positioning, DeepGPS will record the user's historical positioning data into the database to dynamically update the user's movement model.

Workflow. As shown in Figure 9, DeepGPS handles each positioning request in the following steps.

① When precise positioning is required, the client first obtains the GPS estimated position $\hat{p}$ and other satellite metadata through the Android API, and sends these data (including $\hat{p}$, the timestamp $t$, and the satellite metadata) together with the error threshold $\delta$ to the server. Note that the threshold $\delta$ is optional; without it, DeepGPS always corrects the GPS estimate. If necessary, $\delta$ can be specified by location-based applications according to their requirements on positioning accuracy.

② After receiving the request, the DeepGPS model retrieves the road network and building data around the position $\hat{p}$ from the spatial database, and then constructs the environment matrix $M_e$, the timestamp matrix $M_t$, and the skymap matrix $M_s$ from the respective data sources. Once these input matrices are ready and $\delta$ is provided, the model first uses the distance decoder $D_{dist}$ to predict the likely error $e$. If $e$ is less than the threshold $\delta$, DeepGPS takes $\hat{p}$ as the correct position $p'$, because $\hat{p}$ is accurate enough for the user's application. Otherwise, DeepGPS calls the position decoder $D_{posi}$ to predict the correct position $p'$. Since we have no way of knowing the ground-truth position, DeepGPS takes no further action after predicting the correct position. If $\delta$ is not provided, DeepGPS directly uses $D_{posi}$ to infer $p'$.

③ Once the positioning request has been processed, DeepGPS stores a record $\langle t, \hat{p}, p' \rangle$ in the database. This record indicates that the requester visited position $p'$ at time $t$, and it is appended to the requester-specific file. These records help DeepGPS dynamically update the mobility model of the given user.

④ Finally, the server sends the position $p'$ back to the client.
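The server-side decision in steps ② to ④ can be summarized in a short sketch. The two callables stand in for the distance decoder and the position decoder and are purely hypothetical placeholders, as are the function name `handle_request` and the in-memory `history` list used in place of the database.

```python
def handle_request(t, gps_xy, features, predict_error, predict_position,
                   history, delta=None):
    """Return the position sent back to the client for one request.

    predict_error / predict_position: hypothetical stand-ins for D_dist
    and D_posi, each taking the prepared input features.
    history: list receiving (t, gps_xy, corrected) records, standing in
    for the spatial database.
    """
    if delta is not None and predict_error(features) < delta:
        corrected = gps_xy  # the GPS estimate is already accurate enough
    else:
        corrected = predict_position(features)
    history.append((t, gps_xy, corrected))
    return corrected
```

Passing `delta=0`, as in the experiments reported later, means no predicted error can fall below the threshold, so every GPS estimate goes through the position decoder.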

5 PERFORMANCE EVALUATION

In this section, we evaluate the performance of DeepGPS using a large bus GPS trajectory dataset and real-time GPS measurements collected using an Android client.

5.1 Experiment Setup

We conducted all experiments in Shenzhen, China, which has the second most skyscrapers in the world [9].

Dataset

We collected five different types of data for performance evaluation. Specifically, road network and building measurement data are used to represent the positioning environment, and satellite data describe the distribution and status of satellites. Furthermore, we use bus trajectory data and on-site GPS measurements as localization instances for model training and testing.

  1. Road network. We download the road network of Shenzhen from OpenStreetMap (OSM) [4]. The OSM file contains all roads and points of interest (POIs) in our test city, such as lakes and meadows, in the form of nodes (i.e., points), ways (i.e., roads), and relations (i.e., attributes of POIs). In particular, we can use relations to determine whether a given cell is accessible or not.

  2. Building survey data. We obtain building survey data from collaborators. The survey file is similar in format to the OSM file and contains the layout, height, and attributes of all buildings in Shenzhen. Specifically, node lists are connected end to end to form the outline of each building, while the height and other building attributes (such as the name) are labeled with relations. This data helps us identify building heights and less-traveled areas in the city.

  3. Bus trajectory data. Public buses equipped with GPS devices regularly report their status to the operations center. Generally speaking, buses in Shenzhen run on fixed routes and schedules and send a report every 5 seconds [56]. Each report includes a timestamp, GPS location, driving speed, direction, status, and more. We prepared a bus GPS trajectory dataset collected from 16690 buses on June 12, 2020, covering 1845 routes. These bus routes cover most of the urban areas of Shenzhen. In total, we have 41,540,968 bus reports.

  4. Real-field GPS measurements. With our Android client installed on their smartphones (see Section 4), five volunteers drove their private cars along planned routes in the city center of Nanshan District, Shenzhen, collecting GPS measurements every 5 seconds. Their smartphones include the Huawei Mate 10 Pro, Mate 30 Pro, Mate 40 Pro, and Samsung Galaxy Note 5. In total, we collected 16814 valid positioning samples.

  5. Satellite data. Our Android client collects satellite metadata along with each positioning sample, so the real-field GPS dataset already contains satellite information. The bus reports, however, do not include satellite metadata. To compensate for this, we accessed CelesTrak [1] to retrieve historical satellite metadata for the reported timestamp and actual position of each bus report. Thus, we can supplement all bus reports with the corresponding satellite data.

Ground truth collection

Similar to previous work [22], [53], [58], we use an advanced hidden Markov model-based map matching algorithm [47] to map the GPS sequences of the bus trajectories and the real-field GPS measurements to road segments. Since both the public buses and the test vehicles follow planned routes, we can use this prior knowledge to validate the map matching results and manually correct wrong matches. Finally, for each GPS estimated position $\hat{p}$, we treat its projection onto the matched road segment as the ground truth $p$.

Testing regions and model training

Since our goal is to improve GPS performance in urban canyons, we chose the downtown areas of three districts in Shenzhen, namely Nanshan, Futian, and Bao'an, as our test areas. These three areas have the highest density of high-rise buildings in Shenzhen, and we denote them as Area N, Area F, and Area B, respectively. We also collected real-field GPS measurements in Area N.

We keep the bus reports belonging to these three areas for the experiments. Considering that different areas have different environments, we train a dedicated DeepGPS model for each area using that area's own bus reports. For each area, we use 70% of the bus reports to train its model and reserve the remaining 30% for testing. All real-field GPS data are used for testing only. Furthermore, we later compare these custom models with a unified model trained on data from all three areas.

Performance metrics.

We define the following three metrics to evaluate the performance of DeepGPS.

  • Accuracy. Prediction accuracy is defined as the average distance between the location output by the model and the ground truth:
    $$accuracy = \frac{1}{N}\sum_{i=1}^{N} L(p_i, p'_i),$$
    where $N$ is the number of GPS samples, $p_i$ and $p'_i$ are the $i$-th ground-truth position and corrected position, and the function $L(p_i, p'_i)$ returns the distance between them.

  • Effective ratio. In addition to accuracy, we define the effective ratio to evaluate the positioning improvement of DeepGPS over raw GPS estimates. Specifically, the effective ratio is the proportion of samples for which the corrected position predicted by DeepGPS is closer to the ground truth than the position estimated by GPS, that is,
    $$ratio = \frac{\sum_{i=1}^{N} \mathbb{I}\left(L(p_i, p'_i) < L(p_i, \hat{p}_i)\right)}{N} \times 100\%$$
    where the indicator function $\mathbb{I}(a)$ returns 1 if condition $a$ is true and 0 otherwise.

  • Prediction error. Since our model can predict the positioning error distance, we define the average prediction error to evaluate DeepGPS as follows:
    $$error = \frac{\sum_{i=1}^{N} \left| \arg\max\left(\mathbf{V}_{dist}^{i}\right) - \arg\max\left(\mathbf{V}_{true}^{i}\right) \right|}{N} \times 2$$
    where $\arg\max(\mathbf{V})$ returns the index of the largest element in vector $\mathbf{V}$. Recall that each interval in the vector corresponds to 2 meters, so we multiply the mean index offset by 2 to obtain the prediction error in meters.
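The three metrics can be computed directly from their definitions, as in the sketch below. Positions are assumed to be planar coordinates in meters so that $L$ is a Euclidean distance; that assumption and the function names are not from the paper.

```python
import math

def accuracy(truth, corrected):
    """Mean distance between ground-truth and corrected positions (meters)."""
    return sum(math.dist(p, q) for p, q in zip(truth, corrected)) / len(truth)

def effective_ratio(truth, corrected, gps):
    """Percentage of samples where the correction beats the raw GPS estimate."""
    improved = sum(1 for p, q, g in zip(truth, corrected, gps)
                   if math.dist(p, q) < math.dist(p, g))
    return improved / len(truth) * 100.0

def prediction_error(v_dist_list, v_true_list, interval_m=2.0):
    """Mean absolute offset between predicted and true intervals, in meters."""
    def argmax(vector):
        return max(range(len(vector)), key=lambda i: vector[i])
    offsets = [abs(argmax(pred) - argmax(true))
               for pred, true in zip(v_dist_list, v_true_list)]
    return sum(offsets) / len(offsets) * interval_m
```

Note that a lower value is better for accuracy and prediction error, since both are distances, while a higher value is better for the effective ratio.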

We evaluate the position decoder $D_{posi}$ using the accuracy and the effective ratio, and the distance decoder $D_{dist}$ using the prediction error. In the experiments below, we set $\lambda = 0.001$ and $\delta = 0$ to force the position decoder to correct every GPS estimate. By default, we set $k = 5$ and $\rho = 0.4$ for the mobility model and $c = 1$ m for the cell size.

5.2 Evaluation on Bus Data


Overall performance

For each test area, we train a custom model using GPS samples collected in that area. Furthermore, we train a unified model based on the training data of the three regions. To study the generalization ability of DeepGPS, we also apply the model of one region to the other two regions for cross-validation. In this subsection, we have disabled the mobility model for the experiments.

As shown in Table 2, the unified model achieves high positioning accuracy in the three areas, with accuracies of 4.2 m, 4.1 m, and 3.2 m, respectively. Compared with the unified model, the accuracy of the custom models is much higher. Specifically, each custom model achieves the best accuracy in its own area; for example, the best accuracies of the models trained on the datasets of Area N, Area F, and Area B are 3.6 m, 2.3 m, and 2.3 m, respectively. Nevertheless, we still observe relatively high accuracy when a model is applied across areas. For example, the model trained for Area N achieves an average accuracy of 4.8 m on the test data of Area F, which is much better than the average GPS error in Area F (i.e., 7.2 m). These results show that the DeepGPS model generalizes well and achieves the best performance when trained specifically on samples from the target area. Compared with the original GPS error, the verification rate of DeepGPSim for GPS positioning is 57.6%,

Figure 10 demonstrates a concrete example where DeepGPS fixes the GPS estimate correctly. Due to the influence of surrounding buildings, GPS incorrectly locates the user on adjacent road segments, as shown in Figure 10(a). We further manually investigate the localization environment, as shown in Figure 10(b), which is a typical street canyon in our test city.


We observe similar results for the effective ratio and the prediction error, as shown in Tables 3 and 4, respectively. The results in Table 3 show that although the unified model achieves a high effective ratio in the three areas, 85.3% on average, each custom model achieves the highest effective ratio in its target area, with an average of 90.1%. Even when a custom model trained on samples from one area is used to correct GPS estimates in other areas, DeepGPS still performs well, with an effective ratio greater than 72.0%. The results in Table 3 show that DeepGPS obtains better corrected positions than the raw GPS output in most cases.

Table 4 gives the evaluation results of the DeepGPS distance decoder $D_{dist}$. We find that the prediction error is small, ranging from 1.4 to 4.1 meters. In particular, if we train and test the DeepGPS model on the same area, the average prediction error is only 1.7 meters, which corresponds to an offset of only one interval in the 25-dimensional vector $V_{dist}$. Thus, $D_{dist}$ performs quite well. Based on the above experimental results, we find that DeepGPS has good generalizability and positioning performance. On average, DeepGPS improves GPS positioning accuracy by 64.6% and effectively corrects GPS estimates in 90.1% of cases.

Effect of building heights


We use the environment matrix $M_e$, which contains building height information, to explore the effect of building height on positioning performance in urban canyons. For each GPS sample, we compute the building height around the GPS receiver as the average of the elements in $M_e$ that have a value greater than zero. Figure 11 compares the positioning accuracy of GPS and DeepGPS at different building heights. Note that each value $x$ on the x-axis indicates a range of building heights, $[x, x+20)$ meters. In general, taller buildings are more likely to reflect or even block satellite signals, significantly affecting positioning. The accuracy of both GPS and DeepGPS deteriorates when there are tall buildings around the receiver. However, DeepGPS still performs better than GPS, with much smaller positioning errors.
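The per-sample building height and its x-axis bin can be computed as in the following sketch. It follows the text in averaging all elements of $M_e$ that are greater than zero; treating those values as heights in meters and returning 0 when no element is positive are assumptions, and both helper names are hypothetical.

```python
def mean_building_height(m_e):
    """Average of the elements of M_e that are greater than zero."""
    heights = [value for row in m_e for value in row if value > 0]
    # Assumption: a sample with no positive cells is reported as height 0.
    return sum(heights) / len(heights) if heights else 0.0

def height_bin(height_m, width_m=20):
    """Lower bound x of the [x, x + 20) meter range containing height_m."""
    return int(height_m // width_m) * width_m
```

For instance, a sample whose positive cells are 30 and 50 has a mean height of 40 and falls into the bin labeled 40, which covers heights from 40 up to but not including 60 meters.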

As shown in Figure 12, we observe that building height also affects the performance of DeepGPS on the effective ratio and prediction error metrics. In general, taller buildings lead to a lower effective ratio of DeepGPS while increasing prediction error.

Effect of positioning time

We also investigate the effect of time on the accuracy of GPS and DeepGPS and plot the results in Figure 13. The accuracy of the two systems varies slightly at different times of the day, and DeepGPS always works much better than GPS. In addition, we observe in Figure 15 that the effective ratio and the prediction error follow similar trends. From these two figures, we observe a slight drop in the performance of DeepGPS at certain hours (e.g., 11 am and 11 pm). However, the reason remains to be further explored.

5.3 Evaluation on Real-field Data

In this subsection, we further evaluate DeepGPS using the real-field GPS measurements. Specifically, we investigate the performance of each model component and the impact of some important parameters.

Visualization


We collect raw GPS measurements using modern smartphones in street canyons in Area N and visualize some of the GPS data in Figure 14. Compared to the ground-truth positions shown in Figure 14(a), the smartphone's GPS estimates are very noisy, with large deviations from the ground truth. In contrast, DeepGPS can effectively correct these erroneous GPS estimates, as shown in Figure 14(c). In addition to the comparison of discrete locations, Figure 3 also compares three complete trajectories, namely the ground-truth trajectory, the smartphone trajectory, and the corrected trajectory from DeepGPS. Figure 3 shows that DeepGPS improves GPS performance by reducing the error from 12.3 meters to 5.2 meters.

Impact of different model components


To understand how the input data sources and functional components affect the performance of DeepGPS, we conduct various ablation experiments. We use GPS performance as the baseline and compare it with different system designs. In the following experiments, we remove one component at a time and train the remaining model. All variant models are trained and tested using the same training and testing datasets, respectively. The results are shown in Figure 16, where "wo" is the abbreviation of "without" and "dist.dec." stands for the distance decoder.

In general, we find that the input data, namely building heights, road information, and timestamps, have a much greater impact on the performance of DeepGPS than other modules such as the constraint mask and the mobility model. If we omit the building or road information from the input, the performance of DeepGPS drops severely: the accuracy falls to 6.9 meters, which is similar to the GPS error, and the effective ratio is less than 60%. This demonstrates that the environment is an important factor affecting GPS positioning performance in urban canyons.

From Figure 16, we are surprised to find that time has the greatest impact on localization performance. Without the timestamp matrix as input, the error of DeepGPS rises to 7.5 meters, which is even larger than the raw GPS error (7.2 meters), while the effective rate is as low as 39.3%. This phenomenon can be explained as follows. In urban canyons, multipath and non-line-of-sight satellites are the main causes of GPS performance degradation. For a GPS receiver at a specific location, whether a received GPS signal is reflected depends mainly on the surrounding buildings and the satellite distribution, that is, the satellite geometry that the receiver can observe. Because GPS satellites orbit periodically, the satellite geometry repeats approximately, so multipath and non-line-of-sight conditions at this location are inherently repeatable as long as the environment around the receiver remains unchanged [19]. Figure 16 shows that the sky map also affects the performance of DeepGPS. However, its impact is limited compared with the other input data sources: without it, the accuracy and the effective rate degrade to 4.4 m and 73.4%, respectively. Timestamps are therefore a more critical factor than sky maps, as they capture the important relationship between satellite geometry and positioning performance.
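The repeatability argument can be made concrete with a small sketch. GPS satellites complete two orbits per sidereal day, so the geometry seen from a fixed point repeats approximately once per sidereal day (about 23 h 56 min 4 s), i.e. roughly four minutes earlier each solar day. The helper names and the 300-second tolerance below are illustrative assumptions. How DeepGPS actually encodes time in its timestamp matrix is not described in this section, so this is not the authors' encoding.

```python
SIDEREAL_DAY_S = 86164.0905  # one sidereal day in seconds (~23 h 56 min 4 s)

def geometry_phase(unix_time_s):
    """Fraction in [0, 1) of the sidereal day elapsed at a given Unix time."""
    return (unix_time_s % SIDEREAL_DAY_S) / SIDEREAL_DAY_S

def same_geometry(t1_s, t2_s, tolerance_s=300.0):
    """True if two timestamps fall at nearly the same point of the repeat cycle.

    tolerance_s is an arbitrary illustrative threshold, not a value from the paper.
    """
    diff = abs(t1_s - t2_s) % SIDEREAL_DAY_S
    return min(diff, SIDEREAL_DAY_S - diff) <= tolerance_s

# The same wall-clock time one week later is already ~27.5 minutes out of phase.
print(same_geometry(0.0, 7 * 86400.0))        # False
print(same_geometry(0.0, 7 * SIDEREAL_DAY_S)) # True
```

A model that sees the timestamp can, in principle, learn this phase implicitly, which is one plausible reading of why removing the timestamp input hurts so much.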

In our deep neural network model, we use the distance decoder and the position decoder to train the encoder jointly. The results in Figure 16 demonstrate the effectiveness of the distance decoder, which improves the accuracy from 4.3 m to 3.6 m. This confirms that the distance decoder influences the position decoder by adding implicit constraints on the position prediction.

We propose two kinds of feature masks, namely the constraint mask and the mobility mask, to further improve DeepGPS. The constraint mask uses environmental information in a way similar to map matching: it reduces localization errors by filtering out impossible cells, improving accuracy by 10%. In addition, the mobility mask further improves the accuracy of DeepGPS by about 5%.
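The filtering idea can be sketched as an element-wise product between the predicted probability matrix and a binary mask, followed by a search for the most probable remaining cell. This is a minimal illustration under the assumption that the mask holds 1 for available cells and 0 for unavailable ones. The function names and the toy matrices are hypothetical, and the real system may combine the mask with the network output differently.

```python
def apply_constraint_mask(prob, mask):
    """Zero out cells that the mask marks as unavailable (mask value 0)."""
    return [[p * m for p, m in zip(prow, mrow)] for prow, mrow in zip(prob, mask)]

def best_cell(prob):
    """Return the (row, col) index of the most probable cell."""
    cells = ((r, c) for r in range(len(prob)) for c in range(len(prob[0])))
    return max(cells, key=lambda rc: prob[rc[0]][rc[1]])

# Toy 2 x 3 output matrix; the top-middle cell is covered by a building.
prob = [[0.10, 0.50, 0.05],
        [0.05, 0.20, 0.10]]
mask = [[1, 0, 1],
        [1, 1, 1]]

masked = apply_constraint_mask(prob, mask)
print(best_cell(prob))    # (0, 1): the unmasked prediction lands inside the building
print(best_cell(masked))  # (1, 1): the mask redirects it to a valid cell
```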

Impact of k and ρ

The mobility model uses the latest k corrected fixes to calculate the user's moving speed and assigns a confidence ρ to the cells that are unreachable at that speed. We therefore conduct experiments to study the influence of k and ρ on localization accuracy. As shown in Figure 17, for a given ρ the accuracy generally improves when we use more corrected positions, i.e., a larger k, for the velocity calculation. However, when k ≥ 5, the accuracy improvement brought by more localization samples is negligible.

As the confidence ρ increases, the accuracy improves for the settings with k < 5; for the other k values, the positioning accuracy improves while ρ ≤ 0.4 but becomes worse as ρ increases further. When the velocity calculation is not accurate enough (for example, with k < 5 samples), DeepGPS tends to treat all candidate cells equally, so a larger ρ gives better positioning accuracy. When we can accurately estimate the movement velocity (i.e., with k ≥ 5 samples), the user's actual location is likely to lie in the cells covered by the reachable area (i.e., the green cells), so we need a small ρ to filter out unreachable cells. However, too small a value of ρ (e.g., 0.1) means that DeepGPS blindly trusts the mobility model and may erroneously filter out the correct cells for positioning instances that occasionally fall in unreachable cells. Too small a ρ is therefore overly aggressive and hurts the localization accuracy. On the other hand, a larger value of ρ weakens the filtering ability of the mobility mask and allows some unreachable cells to be wrongly selected as output, which also reduces the positioning accuracy. From Figure 17, we conclude that k = 5 and ρ = 0.4 are good settings that achieve the best accuracy while avoiding extra computation.
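The roles of k and ρ can be illustrated with a small sketch. It assumes that the speed is the average over the last k corrected fixes, that the reachable area is a disc around the previous corrected cell, and that reachable cells receive weight 1 while unreachable cells receive weight ρ. These modelling choices, the function names, and the local metric coordinates are assumptions made for illustration and are not confirmed details of the DeepGPS mobility model.

```python
import math

def estimate_speed(fixes):
    """Average speed (m/s) over time-ordered (t_s, x_m, y_m) fixes."""
    dist = sum(math.hypot(b[1] - a[1], b[2] - a[2]) for a, b in zip(fixes, fixes[1:]))
    elapsed = fixes[-1][0] - fixes[0][0]
    return dist / elapsed if elapsed > 0 else 0.0

def mobility_mask(rows, cols, cell_m, last_cell, speed, dt_s, rho):
    """Weight 1.0 for cells reachable from last_cell within dt_s, rho otherwise."""
    reach_m = speed * dt_s
    mask = []
    for r in range(rows):
        row = []
        for c in range(cols):
            d = math.hypot(r - last_cell[0], c - last_cell[1]) * cell_m
            row.append(1.0 if d <= reach_m else rho)
        mask.append(row)
    return mask

# Use only the latest k corrected fixes for the speed estimate.
k = 5
history = [(t, 3.0 * t, 4.0 * t) for t in range(8)]  # synthetic track moving at 5 m/s
speed = estimate_speed(history[-k:])
mask = mobility_mask(rows=3, cols=3, cell_m=1.0, last_cell=(1, 1),
                     speed=1.0, dt_s=1.0, rho=0.4)
```

In this form, ρ = 1 disables the mask entirely, while ρ close to 0 discards every cell outside the reachable disc, which mirrors the trade-off described above.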

Processing time

We evaluate the efficiency of DeepGPS and give the average processing time of the four key modules in Table 5. To process each request, the input module takes the most time, 34.1 ms, to build the three matrices, while the encoder and each decoder take very little time, less than 4 ms. Furthermore, it takes only milliseconds to derive the correct position by searching for the element with the greatest probability in the final output matrix. The total processing time for DeepGPS to correct a GPS estimate is about 43 milliseconds. For comparison, the state-of-the-art method Gnome [39], which recalculates pseudoranges based on 3D building models and satellite positions, takes several seconds to correct a GPS estimate (not including the time for offline precomputation). Training a deep learning model to infer the correct position is therefore an efficient approach.

Impact of cell size

Figure 18 shows the localization accuracy and end-to-end processing time under various settings of the cell size c. In general, larger cells reduce the size of the input matrix and thus the overall processing time. However, they also introduce larger positioning errors. For example, a 4 m × 4 m cell results in a positioning error of 8.6 m, which is larger than the GPS error. Conversely, a 1 m × 1 m cell improves accuracy to 3.6 m at the cost of only 10 ms of added latency. Since the increase in processing time is small, DeepGPS adopts 1 m × 1 m cells for better localization performance.
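The size side of this trade-off follows from simple geometry: for a fixed square window, the number of cells grows with the inverse square of the cell size. The sketch below assumes a hypothetical 100 m window, since the window size is not stated in this section. It also computes the largest distance between a point and the centre of its cell, which is only the floor contributed by the grid itself and does not account for the model's prediction error.

```python
import math

def grid_shape(window_m, cell_m):
    """Number of rows and columns needed to tile a square window."""
    n = math.ceil(window_m / cell_m)
    return n, n

def quantization_floor_m(cell_m):
    """Largest distance from any point in a square cell to the cell centre."""
    return cell_m / math.sqrt(2)

WINDOW_M = 100.0  # hypothetical window side length, for illustration only
for cell_m in (1.0, 2.0, 4.0):
    rows, cols = grid_shape(WINDOW_M, cell_m)
    print(cell_m, rows * cols, round(quantization_floor_m(cell_m), 2))
```

Going from 4 m cells to 1 m cells multiplies the number of matrix elements by 16 under this assumption, which is consistent with the modest increase in processing time reported above only if the per-cell cost is small.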

6 RELATED WORK

Significant efforts have been made to improve GPS accuracy in urban areas. A vehicle can map its position to a road segment by combining GPS estimates with techniques such as map matching [47], [53] and dead reckoning [31]. Pedestrian users can utilize cellular/WiFi signals [24], [30], inertial sensors [15], [62], the magnetic compass [55] and the barometer [26] on their smartphones to enhance localization. Other techniques, such as cooperative GPS [17] and differential GPS [25], improve GPS accuracy by sharing positioning information among multiple receivers. For example, Chen et al. [17] proposed BikeGPS to achieve precise positioning of shared bicycles in urban canyons by sharing GPS reception among a group of bicycles. However, these efforts require additional sensors or cooperation between multiple GPS receivers. Moreover, they do not address the root cause of GPS errors in urban canyons, namely non-line-of-sight satellite signals.

Previous work explored mitigation schemes for non-line-of-sight signal reception, which can be divided into three categories, namely ray-tracing-based methods [21], [67], shadow-matching-based methods [20], [23], [49], [50], [60], [63], [68] and satellite-signal-path-based methods [29], [39], [45], [48]. The first class of studies [21], [67] applied ray-tracing algorithms to satellite signals to correct pseudorange errors. For example, Zhang et al. [67] proposed a 3D-map-database-assisted GNSS collaborative positioning method, which uses a ray-tracing algorithm to correct the NLOS pseudoranges of each GPS receiver and employs factor graph optimization to jointly optimize positioning among multiple receivers. However, this approach relies on 3D building models and on collaboration among multiple GPS receivers. For high-quality ray tracing, some methods require reconstruction of the reflective surfaces of street buildings using specialized hardware, such as panoramic cameras [54] or lidar [57].

In Section 2.3, we discuss shadow-matching-based methods and satellite-signal-path-based methods, which exploit proprietary 3D city models and real-time satellite information to explicitly correct GPS estimates. As a representative work of the former category, Ng et al. [49] implemented shadow matching for smartphones using a machine learning classifier to distinguish between line-of-sight and non-line-of-sight satellites. However, shadow-matching-based methods mainly rely on accurate 3D city models, and removing non-line-of-sight satellites may reduce the number of available satellite readings and make it impossible to calculate receiver positions. A recent work in the latter category, Gnome [39], uses panoramic images from Google Street View to adjust the building heights of 3D city models, and then estimates ground-truth locations from candidate grids by exploiting these building data. Gnome relies heavily on third-party resources that are not widely available, and it incurs a huge computational overhead. Therefore, the limited availability of 3D building models largely limits the practical application of these methods.

With the widespread availability of GPS trajectory data [70], many works have attempted to measure and calibrate GPS errors from a statistical perspective [28], [42], [46], [51], [58]. For example, Ma et al. [42] used historical bus GPS trajectory data to evaluate the GPS environmental friendliness of urban road segments. Wu et al. [58] assumed that GPS errors follow a Gaussian distribution and localized a single GPS location to a road segment based on a statistical model specifically learned from GPS data for that road segment. Different from these works, we utilize a large number of GPS samples to train a deep neural network that translates GPS estimates to correct locations and can serve large urban areas.

7 DISCUSSION

In this section, we discuss the performance comparison, implementation, update and data release of DeepGPS.

Accuracy comparison with other work. Previous work relies heavily on accurate proprietary 3D building models, which are not easily accessible, so we cannot make direct performance comparisons with existing methods. Instead, we summarize their average localization accuracy from their reported experimental results and compare DeepGPS with the three representative works [39], [49], [67] mentioned in the previous section. As shown in Table 6, DeepGPS significantly outperforms these existing methods, with accuracy improvements of 53.8%, 40.0%, and 41.9%, respectively.

Deployment on smartphones. When deploying DeepGPS on smartphones, we have to address two key challenges. First, the execution of DeepGPS depends on a trained model, which is fairly large (about 170 MB), and on other resources, such as road network and building survey data, which are usually large as well. For example, the road network file for Shenzhen is about 344 MB, and the building survey data is about 420 MB. Therefore, the deployment of DeepGPS consumes at least 934 MB of storage, which is a considerable overhead for ordinary smartphones. Second, DeepGPS incurs considerable computational cost. Given a positioning request, the system needs to query the road network and building survey data to retrieve the environment information around the initial GPS position, and then calculate the environment matrix, sky map matrix and timestamp matrix. These three matrices are then fed into the model to infer the localization error and the correct position. Currently, we leave these calculations to powerful servers rather than smartphones.

In the future, we will investigate how to compress the deep neural network model while maintaining its performance, using advanced model compression techniques [38]. In addition, we found that GPS actually performs well in most areas, except those with a high concentration of tall buildings. Thus, we can survey the distribution of GPS errors across the city and identify the areas where GPS performs poorly. For each such region, we prepare an environment package that contains only the road network and building survey data for that region. Compared with city-scale road network and building survey data, such district-level environment packages would be much smaller and more smartphone-friendly, and users can download only the packages they need. However, more effort is needed to reduce the computational overhead.

System resilience and renewal. DeepGPS remains effective even when the urban environment changes, for example through the construction of new buildings and/or the demolition of old ones. This is because our deep neural network model captures the general mapping relationship between GPS estimates and localization context. Once urban environmental changes are recorded in the road network files or building survey data, this information can be immediately encoded in the environment matrices. Furthermore, we propose the constraint mask $C_{env}$, which embeds prior knowledge of the surrounding environment (such as buildings) to constrain the prediction of the correct location. Thus, if an area is occupied by a new building, the cells covered by this building are marked as unavailable in the constraint mask $C_{env}$. With these designs, our system can efficiently correct GPS estimates and is resilient to environmental changes.

Despite the novel design described above, we still recommend retraining the model periodically using the latest source data, including newly collected GPS samples, the latest road network and building measurements. Periodic model retraining aims to update the mapping relationship between GPS estimation and positioning context in time. Specifically, after a given period of time, say three months, we can retrain the model with the latest resource data. Generally, the retraining process can be completed within a few hours. For example, it takes about 6.6 hours to train a model for Shenzhen, China. During model retraining, existing models can still be used to serve location requests. Once the model retraining is complete, we will replace the old model with the updated model to provide more accurate positioning services.

Reproducibility and data release. We share the source code of the DeepGPS implementation and real-world GPS samples [8] so that the community can replicate our results and build on them in future research.

8 CONCLUSION

This paper proposes DeepGPS, which utilizes an encoder-decoder network model to predict the correct position for each erroneous GPS estimate. More specifically, DeepGPS incorporates multiple factors that affect GPS accuracy in urban canyons (building heights, road distribution, time, and GPS satellite status) and uses two parallel decoders to predict the positioning error and the corrected position. We further enhance DeepGPS with a novel constraint mask that filters out invalid candidate locations, and achieve continuous localization using a simple yet effective mobility model. The system has been implemented and evaluated. Extensive experiments based on a large-scale bus trajectory dataset and on-site GPS measurements demonstrate that DeepGPS can significantly enhance GPS localization in urban canyons, e.g., effectively correcting 90.1% of GPS estimates on average and improving accuracy by 64.6%.

Origin blog.csdn.net/Sky_QiaoBa_Sum/article/details/130784415