A simple pandas application in Python: data processing

There are two data sets. The first is drug use data for the five states WV, OH, PA, VA, and KY from 2010 to 2017. The second contains information on cities across the United States, including latitude, longitude, and population; only the latitude and longitude are used here.
The task is to look up the latitude and longitude in data set 2 for each FIPS code in data set 1, then compute the pairwise distances between all matched locations from their coordinates and write the result to a CSV file.
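
The lookup step amounts to a join on the FIPS code. Here is a minimal sketch on made-up rows, assuming the column names that the code in this post relies on: `FIPS_Combined` in data set 1 and `county_fips`, `lat`, `lng` in data set 2. The FIPS codes and coordinates are illustrative values only, not taken from the real files.

```python
import pandas as pd

# Toy stand-in for data set 1 (drug reports); values are illustrative only.
drug = pd.DataFrame({"FIPS_Combined": [1001, 1002, 1003]})

# Toy stand-in for data set 2 (city coordinates); values are illustrative only.
cities = pd.DataFrame({
    "county_fips": [1001, 1003, 1004],
    "lat": [40.0, 38.5, 37.0],
    "lng": [-83.0, -80.0, -79.0],
})

# Inner join (the default): keep only FIPS codes present in both tables.
matched = pd.merge(drug, cities, left_on="FIPS_Combined", right_on="county_fips")
print(matched[["FIPS_Combined", "lat", "lng"]])
```

Because the join is an inner join, FIPS codes with no match in data set 2 are dropped without warning, so the result may cover fewer counties than data set 1.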
Code

import numpy as np
import pandas as pd
from math import radians, cos, sin, asin, sqrt

def geodistance(lng1, lat1, lng2, lat2):  # great-circle distance from longitude/latitude (haversine formula)
    # lng1, lat1, lng2, lat2 = (120.12802999999997, 30.28708, 115.86572000000001, 28.7427)
    lng1, lat1, lng2, lat2 = map(radians, [float(lng1), float(lat1), float(lng2), float(lat2)])  # degrees to radians
    dlon = lng2 - lng1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    distance = 2*asin(sqrt(a))*6371*1000  # mean Earth radius is 6371 km; result in meters
    distance = round(distance/1000, 3)  # convert to kilometers, rounded to 3 decimals
    return distance

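df2 = pd.read_csv('E:/Data/simplemaps/uscities.csv')  # read the latitude/longitude data
df2 = df2.drop_duplicates('county_fips')  # keep one row per county FIPS code
df2 = df2[df2['state_id'].isin(['WV','OH','PA','VA','KY'])]  # keep only the five required states
df2 = df2.sort_values(by='county_fips')  # sort by FIPS code (sort_values returns a new DataFrame)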

df = pd.read_excel('E:/Data/MCM_NFLIS_Data.xlsx', engine='openpyxl', sheet_name='Data')  # read the drug use data
#df=pd.DataFrame(data)
df = df.sort_values(by='FIPS_Combined')  # sort by FIPS code
df = df.drop_duplicates('FIPS_Combined')  # keep one row per FIPS code

df3 = pd.merge(df, df2, left_on="FIPS_Combined", right_on="county_fips")  # join the two tables on FIPS code
df_loc = df3[['FIPS_Combined','lat','lng']]  # coordinates that could be matched
fips = df_loc['FIPS_Combined'].values
lat = df_loc['lat'].values
lng = df_loc['lng'].values
#df_dis=pd.DataFrame(columns=['src','des','distance'])
dis = []
for i in range(len(lat)):
    for j in range(len(lng)):
        if i != j:
            # geodistance expects longitude first, then latitude
            dis.append([fips[i], fips[j], geodistance(lng[i], lat[i], lng[j], lat[j])])
df_dis = pd.DataFrame(dis, columns=['src', 'des', 'distance'])
df_dis.to_csv('E:/Data/DistanceData.csv')  # write to a CSV file
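
The double loop calls the distance function once per pair, which gets slow as the number of counties grows. As an alternative sketch (not the approach used in the code above), the same haversine formula can be evaluated for all pairs at once with NumPy broadcasting. The function name `pairwise_haversine_km` and the sample coordinates are made up for illustration.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, the same value geodistance uses

def pairwise_haversine_km(lat_deg, lng_deg):
    """Return an (n, n) matrix of great-circle distances in kilometers."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lng = np.radians(np.asarray(lng_deg, dtype=float))
    # Broadcasting a column against a row gives every pairwise difference.
    dlat = lat[:, None] - lat[None, :]
    dlng = lng[:, None] - lng[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlng / 2) ** 2)
    # Clipping guards against rounding that pushes a slightly outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Illustrative coordinates only (not taken from the data files).
lat = [0.0, 0.0, 90.0]
lng = [0.0, 90.0, 0.0]
dist = pairwise_haversine_km(lat, lng)
print(np.round(dist, 3))
```

The result is a symmetric matrix with zeros on the diagonal. To get the same long format as the CSV, the off-diagonal entries would still need to be paired with their source and destination FIPS codes.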

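Calculated result

The output CSV contains one row per ordered pair of distinct counties: the source FIPS code, the destination FIPS code, and the distance between them in kilometers.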

Origin blog.csdn.net/u011612364/article/details/113277792