There are two data sets:
The first data set is drug use data for five states (WV, OH, PA, VA, and KY) from 2010 to 2017.
The second data set (uscities.csv from simplemaps) holds information on cities across the United States, including latitude, longitude, and population. Mainly the latitude and longitude are used here.
The task is to look up the latitude and longitude in the second data set using the FIPS code from the first, compute the distance between every pair of locations from those coordinates, and write the result to a CSV file.
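The lookup step amounts to a join on the FIPS code. Here is a minimal sketch with made-up rows: the FIPS codes and coordinates are illustrative placeholders, not values from either data set, and only the column names match the code in this post.

```python
import pandas as pd

# Toy stand-ins for the two data sets; every value here is illustrative only.
drug = pd.DataFrame({"FIPS_Combined": [1001, 1003, 1005]})
cities = pd.DataFrame({
    "county_fips": [1001, 1003],
    "lat": [40.0, 41.0],
    "lng": [-80.0, -81.0],
})

# Inner join: keeps only the FIPS codes that have coordinates in the second data set.
located = pd.merge(drug, cities, left_on="FIPS_Combined", right_on="county_fips")

# FIPS codes that appear in the drug data but have no coordinates.
missing = sorted(set(drug["FIPS_Combined"]) - set(located["FIPS_Combined"]))

print(located[["FIPS_Combined", "lat", "lng"]])
print("no coordinates for:", missing)
```

Because the join is an inner join, counties without a match drop out silently, so it may be worth checking the `missing` list before computing distances.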
Code
import numpy as np
import pandas as pd
from math import radians, cos, sin, asin, sqrt

def geodistance(lng1, lat1, lng2, lat2):
    """Haversine distance in kilometres between two (longitude, latitude) points."""
    # Example: lng1, lat1, lng2, lat2 = (120.12802999999997, 30.28708, 115.86572000000001, 28.7427)
    lng1, lat1, lng2, lat2 = map(radians, [float(lng1), float(lat1), float(lng2), float(lat2)])  # degrees to radians
    dlon = lng2 - lng1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    distance = 2 * asin(sqrt(a)) * 6371 * 1000  # mean Earth radius is 6371 km; result in metres
    distance = round(distance / 1000, 3)  # back to kilometres, rounded to 3 decimals
    return distance

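df2 = pd.read_csv('E:/Data/simplemaps/uscities.csv')  # read the latitude/longitude data
df2 = df2.drop_duplicates('county_fips')  # keep one row per county FIPS code
df2 = df2[df2['state_id'].isin(['WV', 'OH', 'PA', 'VA', 'KY'])]  # keep only the five states of interest
df2 = df2.sort_values(by='county_fips')  # sort by FIPS code (the result must be assigned back)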
df = pd.read_excel(r'E:\Data\MCM_NFLIS_Data.xlsx', engine='openpyxl', sheet_name='Data')  # raw string keeps the backslashes literal
# df = pd.DataFrame(data)
df = df.sort_values(by='FIPS_Combined')
df = df.drop_duplicates('FIPS_Combined')
df3 = pd.merge(df, df2, left_on="FIPS_Combined", right_on="county_fips")  # join on the FIPS code
df_loc = df3[['FIPS_Combined', 'lat', 'lng']]  # coordinates that could be matched
lat = df_loc['lat'].values
lng = df_loc['lng'].values
# df_dis = pd.DataFrame(columns=['src', 'des', 'distance'])
dis = []
for i in range(len(lat)):
    for j in range(len(lng)):
        if i != j:
            # geodistance expects longitude first, then latitude
            dis.append([df_loc.iloc[i]['FIPS_Combined'], df_loc.iloc[j]['FIPS_Combined'], geodistance(lng[i], lat[i], lng[j], lat[j])])
df_loc = pd.DataFrame(dis, columns=['src', 'des', 'distance'])
df_loc.to_csv(r'E:\Data\DistanceData.csv')  # write to a CSV file
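
The nested loop above calls the distance function once per ordered pair, which gets slow as the number of counties grows. One possible alternative is to compute the whole distance matrix at once with NumPy broadcasting. This is only a sketch, not part of the original script: the function name `pairwise_haversine_km` and the sample coordinates are made up for illustration.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, the same value used in geodistance

def pairwise_haversine_km(lat_deg, lng_deg):
    """Return an (n, n) matrix of great-circle distances in kilometres."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lng = np.radians(np.asarray(lng_deg, dtype=float))
    # Broadcasting a column against a row yields every pairwise difference.
    dlat = lat[:, None] - lat[None, :]
    dlng = lng[:, None] - lng[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlng / 2) ** 2)
    # Clipping guards against rounding pushing a slightly outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Illustrative coordinates only: two points on the equator one degree apart,
# plus the north pole.
dist = pairwise_haversine_km([0.0, 0.0, 90.0], [0.0, 1.0, 0.0])
print(np.round(dist, 3))
```

The matrix is symmetric with zeros on the diagonal, so each pair appears twice, just as in the loop output. Turning it into `src`/`des`/`distance` rows would still require pairing the off-diagonal entries with the FIPS codes.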
Calculated result