How to find the densest location among points on a map

  Recently I ran into a small requirement at work: given a set of points, show on the map the location where the points are densest. I didn't think of a good method at first, so I used a very simple strategy: averaging the coordinates of all points. This works most of the time, because in most cities the points spread outward from a single center. But once we put it into production, we hit two special cases.

  The first is an irregular point distribution, such as dumbbell-shaped data concentrated at two ends. Averaging then lands in the middle, where the data is sparsest. This is what we saw with the Chengdu data, shown below; the red point is the center computed by averaging.
[Figure: Chengdu points with the averaged center marked in red]
  The other abnormal case is a ring-shaped distribution, such as the Beijing data. The center of Beijing is the Forbidden City, where we cannot have any points. If we simply take the average, the computed center falls near the Forbidden City, exactly where the data is sparsest, as shown in the figure below.
[Figure: Beijing points with the averaged center near the Forbidden City]
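  Both failure modes are easy to reproduce. The following is a minimal sketch on synthetic coordinates (not the real Chengdu or Beijing data); the helper names `mean_center` and `nearest_distance` are made up for this illustration. It averages the points and then measures how far the result is from the closest actual data point.

```python
import math  # math.dist requires Python 3.8+

def mean_center(points):
    """Average the coordinates of all points (the naive strategy)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def nearest_distance(center, points):
    """Distance from the center to the closest data point."""
    return min(math.dist(center, p) for p in points)

# Dumbbell: two tight clusters about 0.2 apart, nothing in between.
dumbbell = [(0.001 * i, 0.0) for i in range(5)] + \
           [(0.2 + 0.001 * i, 0.0) for i in range(5)]

# Ring: 12 points evenly spaced on a circle of radius 0.1 around (0, 0).
ring = [(0.1 * math.cos(2 * math.pi * k / 12),
         0.1 * math.sin(2 * math.pi * k / 12)) for k in range(12)]

dumbbell_center = mean_center(dumbbell)
ring_center = mean_center(ring)
print(dumbbell_center, nearest_distance(dumbbell_center, dumbbell))
print(ring_center, nearest_distance(ring_center, ring))
```

  In both cases the averaged center lands in empty space: roughly halfway between the two clusters for the dumbbell, and at the middle of the circle for the ring, a full radius away from every point.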
  Later I did some reading and learned that kernel density estimation can solve both problems. In our experiments it worked well, so I'm sharing it here. The idea is simple: for every point, sum the kernel density contributed by all the other points, take the average, and pick the point with the highest average density. For a given point, another point that is close contributes a high density value, and one that is far away contributes a low value; the average of these contributions is the final density of that point. We could use the reciprocal of the distance directly as the kernel function, but it falls off only slowly with distance, and the final result is not much different from my average.
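  Written as a formula, the procedure looks roughly like this. The notation is my own shorthand rather than anything from a library: x_1 ... x_n are the points, K is whichever kernel function is chosen, rho_i is the average density at point i, and i* is the index of the point that gets displayed.

```latex
\rho_i = \frac{1}{n} \sum_{j=1}^{n} K(x_i, x_j),
\qquad
i^{\ast} = \arg\max_{1 \le i \le n} \rho_i
```

  For a kernel where K(x, x) is the same constant for every x, including the term j = i only shifts every rho_i by the same amount, so it does not change which point wins.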

  To improve on this: the farther away a point is, the smaller its density contribution should be, and it should shrink quickly. Others before me had the same thought, which is why there are many nonlinear kernel functions; I ended up using the Gaussian kernel. Once the bandwidth of the kernel is set, the density contributed by other points decays with distance following the shape of a normal distribution, as shown in the figure below: the farther a point is from the vertical axis, the lower the value. The sigma in the figure is the bandwidth of our kernel function.
[Figure: Gaussian kernel value versus distance, with bandwidth sigma]
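  To make the decay concrete, here is a small sketch of the Gaussian kernel in its common form, exp(-d^2 / (2 * sigma^2)). I am assuming this is the form behind the figure; the function name `gaussian_kernel` is just for this example. It prints the kernel value at distances of 0, 1, 2 and 3 bandwidths.

```python
import math  # math.dist requires Python 3.8+

def gaussian_kernel(a, b, sigma):
    """exp(-||a - b||^2 / (2 * sigma^2)): 1 at distance 0, falling toward 0."""
    return math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2))

sigma = 0.02
origin = (0.0, 0.0)
for multiple in (0, 1, 2, 3):
    other = (multiple * sigma, 0.0)
    print(multiple, round(gaussian_kernel(origin, other, sigma), 4))
```

  The values drop from 1 at distance zero to about 0.61 at one sigma, 0.14 at two sigma and 0.01 at three sigma, so points more than a few bandwidths away contribute almost nothing.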
  Next, let's look at the computation and the results. Our system is written in Java, so my final implementation calls the Smile library from Java. The complete code is as follows:

	// Uses smile.math.kernel.MercerKernel and smile.math.kernel.GaussianKernel.
	private double[] getHotpot(double[][] data) {
		// Create a Gaussian kernel with bandwidth (sigma) 0.02
		MercerKernel<double[]> kernel = new GaussianKernel(0.02);

		// Compute the kernel density estimate at every point
		double[] densities = new double[data.length];
		for (int i = 0; i < data.length; i++) {
			for (int j = 0; j < data.length; j++) {
				densities[i] += kernel.k(data[i], data[j]);
			}
			// Average density at point i
			densities[i] /= data.length;
		}

		// Find the point with the highest density
		int maxDensityIndex = 0;
		for (int i = 1; i < densities.length; i++) {
			if (densities[i] > densities[maxDensityIndex]) {
				maxDensityIndex = i;
			}
		}
		return data[maxDensityIndex];
	}

  I used 0.02 for the bandwidth (the sigma of the Gaussian kernel), a value arrived at after several rounds of tuning. If it is too large, the computed density values approach the global average; if it is too small, the result can be a handful of points that happen to sit close together with nothing else around them. Let's go back to the two abnormal cases above and see how the kernel density method does. First, the dumbbell-shaped Chengdu data.
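[Figure: Chengdu dumbbell-shaped data with the center found by kernel density]
  Next is the ring-shaped Beijing data.
[Figure: Beijing ring-shaped data with the center found by kernel density]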
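  The bandwidth trade-off can also be reproduced on synthetic data. The sketch below is a plain-Python stand-in rather than the production code: `densest_point` is a hypothetical helper that averages Gaussian kernel values and returns the best point, and the layout (a loose cluster of six points, one isolated point, and three points almost on top of each other) is invented for illustration.

```python
import math  # math.dist requires Python 3.8+

def densest_point(points, sigma):
    """Return the point with the highest average Gaussian kernel value."""
    def kernel(a, b):
        return math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2))

    def density(p):
        return sum(kernel(p, q) for q in points) / len(points)

    # max() keeps the first point on ties, like a strict '>' in a manual loop.
    return max(points, key=density)

# Synthetic layout (not real city data).
cluster = [(0.005 * i, 0.0) for i in range(6)]       # 6 points, loosely grouped
lone = [(0.1, 0.0)]                                  # 1 isolated point
tight = [(0.2, 0.0), (0.2, 0.0001), (0.2, -0.0001)]  # 3 nearly coincident points
points = cluster + lone + tight

for sigma in (0.0001, 0.02, 10.0):
    print(sigma, densest_point(points, sigma))
```

  With this layout, the tiny bandwidth picks the middle of the three nearly coincident points even though only three points are there, 0.02 picks a point inside the six-point cluster, and the very large bandwidth picks the isolated point, which is simply the data point nearest the overall mean.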
  For the maps above, I used sklearn in Python to compute the kernel density and folium to draw the map. The complete code is posted below for your reference.

# -*- coding: utf-8 -*-
import folium
import pandas as pd
from sklearn.neighbors import KernelDensity
import numpy as np

def getCenterPoint(sites):
    points = sites[['latitude', 'longitude']].values

    # Instantiate the KernelDensity object
    kde = KernelDensity(kernel='gaussian', bandwidth=0.02)

    # Fit the model to the data
    kde.fit(points)

    # Evaluate the (log) density at every point with the fitted model
    log_densities = kde.score_samples(points)

    # The densest point is the one with the largest log_densities value
    highest_density_point = points[np.argmax(log_densities)]

    print(highest_density_point.tolist())
    return highest_density_point.tolist()

# cityDf needs 'latitude', 'longitude' and 'resblock' columns. The original
# post does not show how it is loaded; 'city_points.csv' is a hypothetical file.
cityDf = pd.read_csv('city_points.csv')

centerPoint = getCenterPoint(cityDf)

# Create a map centered on the densest point, with initial zoom level 14
m = folium.Map(location=centerPoint, zoom_start=14)

for i, s in cityDf.iterrows():
    # Add a marker for each point
    folium.Marker(
        location=[s['latitude'], s['longitude']],  # latitude, longitude
        popup=s['resblock'],
    ).add_to(m)

# Mark the densest point in red
folium.Marker(
    location=centerPoint,  # latitude, longitude
    popup='Center point',  # popup text
    icon=folium.Icon(color="red", icon="info-sign")
).add_to(m)

# Save as an HTML file
m.save('map.html')

Origin blog.csdn.net/xindoo/article/details/132515004