python - geo binning - averaging values within a geo boundary

user3206440 :

With data like below, which captures measurements at various close locations:

Lat Long    val
35.611053   139.628525  -72.82
35.61105336 139.6285236 -78.04
35.61105373 139.6285223 -72.99
35.61105409 139.6285209 -69.04
35.61105445 139.6285195 -65.4
35.61105482 139.6285182 -66.68
35.61105518 139.6285168 -65.82
35.61105555 139.6285155 -64.47
35.61105591 139.6285141 -71.26
35.61105627 139.6285127 -68.36
35.61105664 139.6285114 -74.48
35.611057   139.62851   -74.27
35.61105736 139.62851   -77.97
35.61105773 139.62851   -68.66
35.61105809 139.62851   -70.21
35.61105845 139.62851   -76.05
35.61105882 139.62851   -88.83
35.61105918 139.62851   -73.17
35.61105955 139.62851   -67.63
35.61105991 139.62851   -71.85
35.61106027 139.62851   -77.42
35.61106064 139.62851   -71.08
35.611061   139.62851   -79.27

I need to perform a binning operation on this data, that is, to compute the mean of all the values in val over every 0.1 m × 0.1 m cell. One approach could be to find the edges (the NW, SW, NE and SE corners), divide the area into a set of 0.1 m × 0.1 m grid cells, look up the values within each cell, compute their average, and attribute it to the lat/long at the center of the cell, so that we get results like below.

Lat Long    Mean_val    Sample_count

While the proposed approach may be naive, I also wanted to know whether there is an approach based on pandas.
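For illustration, a rough sketch of this naive grid idea in pandas could look like the following (this assumes the points span only a few meters, so a local flat-earth approximation is acceptable; the file name measurements.txt and the variable cell_size_m are placeholders):

import numpy as np
import pandas as pd

df = pd.read_csv("measurements.txt", sep=r"\s+")  # columns: Lat, Long, val

cell_size_m = 0.1
lat0 = df['Lat'].mean()
# approximate meters per degree near lat0 (good enough over a few meters)
m_per_deg_lat = 111_320.0
m_per_deg_lon = 111_320.0 * np.cos(np.radians(lat0))

# integer cell indices in a local metric grid
df['ix'] = np.floor(df['Long'] * m_per_deg_lon / cell_size_m).astype(int)
df['iy'] = np.floor(df['Lat'] * m_per_deg_lat / cell_size_m).astype(int)

# for brevity the mean sample location per cell is reported instead of the exact cell center
out = (df.groupby(['ix', 'iy'])
         .agg(Lat=('Lat', 'mean'), Long=('Long', 'mean'),
              Mean_val=('val', 'mean'), Sample_count=('val', 'size'))
         .reset_index(drop=True))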

Omar Aldakar :

A simple solution to average the data over 0.1 m × 0.1 m areas

To do that, you must convert your latitude/longitude coordinates into x, y coordinates.

Here I use the utm module:

import utm
x, y, _, _ = utm.from_latlon(latitude, longitude)  # returns (easting, northing, zone number, zone letter)
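As a quick illustration, the first sample point from the question falls in UTM zone 54, band letter 'S':

easting, northing, zone_number, zone_letter = utm.from_latlon(35.611053, 139.628525)
print(zone_number, zone_letter)   # 54 S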

After that, you can create a new column which represents your x, y coordinates in decimeters:

import numpy as np

def apply_fun(raw):
    # cell key: UTM easting/northing rounded to the nearest decimeter (0.1 m)
    x, y, _, _ = utm.from_latlon(raw['Lat'], raw['Long'])
    return str(np.round(x * 10)) + "|" + str(np.round(y * 10))

Then add it to your dataframe:

x = df.apply(apply_fun, axis=1)
df.insert(3, 'Group', x)

and you apply the groupby function:

# average Lat, Long and val per cell, and count the samples per cell
gdf = df.groupby(['Group']).agg({"Lat": ["mean"], "Long": ["mean", "count"], "val": ["mean"]})
gdf = gdf.reset_index().drop(columns=['Group'], level=0)
# flatten the MultiIndex columns into names like "Lat mean", "Long count", "val mean"
gdf.columns = [' '.join(col) for col in gdf.columns]

And we are done! :)
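To match the column names requested in the question (Mean_val, Sample_count), one optional final rename could be (assuming the flattened names produced above):

gdf = gdf.rename(columns={"Lat mean": "Lat", "Long mean": "Long",
                          "val mean": "Mean_val", "Long count": "Sample_count"})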

Generalization of the previous solution

To group the data by k1 m × k2 m cells, just modify this function:

def apply_fun(raw):
    # cell key: UTM coordinates divided by the cell size (k1 m along easting, k2 m along northing)
    x, y, _, _ = utm.from_latlon(raw['Lat'], raw['Long'])
    return str(np.round(x / k1)) + "|" + str(np.round(y / k2))
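For instance (purely illustrative), k1 = k2 = 0.1 is equivalent to the 0.1 m grid above, while k1 = k2 = 5 averages over 5 m × 5 m cells:

k1, k2 = 5, 5                              # 5 m x 5 m cells; use 0.1, 0.1 to match the first solution
df['Group'] = df.apply(apply_fun, axis=1)  # recompute the cell key with the new sizes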

Criticism of the previous solution

As I indicated previously, to solve this problem we have to convert the lat/long into x, y coordinates.

In the previous solution I converted the lat/long to UTM coordinates. The UTM system is a cartographic projection that divides the Earth into 120 areas: 60 zones for the northern hemisphere and 60 for the southern hemisphere. So when we do:

x, y, zone_number, zone_letter = utm.from_latlon(raw['Lat'], raw['Long'])

(x, y) is our position within the (zone_number, zone_letter) zone. We can conclude that our solution works if and only if all our sensors are in the same UTM zone.
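A quick way to check that assumption on a given dataset (an illustrative sketch, reusing the df from above) is to look at the zone returned for every point:

zones = df.apply(lambda row: utm.from_latlon(row['Lat'], row['Long'])[2:], axis=1)
assert zones.nunique() == 1, "points span more than one UTM zone"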

We could also do this conversion using an ECEF conversion, which directly converts lat/long (plus altitude) into x, y, z coordinates. I do not know the precision of these methods, and since we are asked for a precision of a tenth of a meter, I prefer the UTM conversion, which looks more accurate.

If you want to use the ECEF method, it is done like this:

import pyproj
def gps_to_ecef_pyproj(lat, lon, alt):
    ecef = pyproj.Proj(proj='geocent', ellps='WGS84', datum='WGS84')
    lla = pyproj.Proj(proj='latlong', ellps='WGS84', datum='WGS84')
    x, y, z = pyproj.transform(lla, ecef, lon, lat, alt, radians=False)

    return x, y, z

x,y,z = gps_to_ecef_pyproj(raw['Lat'],raw['Long'],0)

(I took the code from here: https://gis.stackexchange.com/questions/230160/converting-wgs84-to-ecef-in-python)
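Note that pyproj.transform is deprecated in modern pyproj, so with a recent version the same conversion would normally go through the Transformer API. A sketch, assuming a reasonably recent pyproj (2.3+); the function name gps_to_ecef_transformer is just illustrative:

import pyproj

# EPSG:4979 = WGS 84 3D geographic (lat, lon, height), EPSG:4978 = WGS 84 geocentric (ECEF)
ecef_transformer = pyproj.Transformer.from_crs("EPSG:4979", "EPSG:4978", always_xy=True)

def gps_to_ecef_transformer(lat, lon, alt):
    # always_xy=True means the expected input order is (lon, lat, alt)
    return ecef_transformer.transform(lon, lat, alt)

x, y, z = gps_to_ecef_transformer(raw['Lat'], raw['Long'], 0)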
