How to efficiently pick points that lower the mean distance to known points?

Victor :

So, you have a set of "explored" points in space, and a set of "unexplored" points. You want to pick K unexplored points to explore, so that the mean distance from the unexplored points to their nearest explored point is minimized.

[Figure: sketch of the problem]

Could this be done more efficiently than by brute force, i.e. trying the unexplored points one by one and measuring the resulting mean distance for each candidate?

I have the Python function below that gets the job done, but it is not feasible for large sets because it gets very slow. I want to use this on sets of at least hundreds of thousands of unexplored points, so it needs to be much more efficient. I do not need an optimal solution; a good approximation will do!

Could this somehow be done without the nested for-loops?

Or could the evaluation somehow be restricted to only the most promising points?

All ideas will be highly appreciated!

import numpy as np

explored = np.random.rand(100,3)
unexplored = np.random.rand(100000,3)

def k_anchors(explored, unexplored, K):
    # Greedy brute force: for each of the K picks, try every unexplored
    # point as a candidate anchor and measure the resulting mean
    # (Manhattan) distance from all unexplored points to their nearest
    # explored point.

    anchors = np.empty((K, unexplored.shape[1]))

    for j in range(K):
        proximity_sum = np.zeros((len(unexplored),))

        for k in range(len(unexplored)):
            temp_results = np.concatenate(( explored, unexplored[k].reshape((-1,3)) ))
            proximity = np.zeros((len( unexplored ),))

            for i in range(len( unexplored )):
                i_prox = (abs((unexplored[i,:] - temp_results))).sum(axis=1)
                proximity[i] = i_prox.min()

            proximity_sum[k] = proximity.sum()

        idx = np.argmin( proximity_sum )
        anchors[j,:] = unexplored[ idx ]
        # Append the chosen point to the explored set *before* deleting it
        # from unexplored: the original order indexed a stale row after the
        # delete, and the 1-D row must be reshaped for concatenate.
        explored = np.concatenate(( explored, unexplored[ idx ].reshape(1, -1) ))
        unexplored = np.delete(unexplored, idx, 0)

    return anchors

print( k_anchors(explored, unexplored, 5) )
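One way to drop the nested loops entirely is a cheap greedy heuristic rather than the exact objective. The sketch below (my own approximation, not the accepted answer) uses farthest-point sampling: query a `scipy.spatial.cKDTree` once for every unexplored point's distance to its nearest explored point, then repeatedly pick the point that is currently farthest from coverage and update the distance array against that single new anchor. Each pick costs O(n) instead of O(n²). The function name `k_anchors_greedy` is mine; the Manhattan metric (`p=1`) matches the original code's `abs(...).sum(axis=1)`.

```python
import numpy as np
from scipy.spatial import cKDTree

def k_anchors_greedy(explored, unexplored, K):
    """Approximate anchor selection via farthest-point sampling."""
    unexplored = np.asarray(unexplored, dtype=float)
    # Manhattan distance from each unexplored point to its nearest
    # explored point, computed once with a KD-tree.
    d = cKDTree(explored).query(unexplored, p=1)[0]
    chosen = []
    for _ in range(K):
        idx = np.argmax(d)            # the currently worst-covered point
        anchor = unexplored[idx]
        chosen.append(anchor)
        # Only the new anchor can improve coverage, so one vectorized
        # pass against it updates every distance.
        d = np.minimum(d, np.abs(unexplored - anchor).sum(axis=1))
    return np.array(chosen)
```

This does not minimize the mean distance directly (it chases the maximum instead), but every pick is guaranteed not to increase the mean, and it scales to hundreds of thousands of points.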

SOLUTION

The problem was solved with a variation of the K-means algorithm proposed by Barış Can Tayiz, and it worked like a charm.

In short, I initialized the explored points as centroids, along with K random points. Only the K random points were then updated when fitting the data. For me, the number K did not need optimizing, since every time the function is called I already know how many points I will be able to explore.
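The accepted approach, as described above, might look like the sketch below: Lloyd-style iterations where the explored points are frozen centroids and only the K extra centroids move. The final snap-to-nearest-unexplored-point step and all names (`constrained_kmeans`, `iters`) are my assumptions; the author did not post code.

```python
import numpy as np

def constrained_kmeans(explored, unexplored, K, iters=20, seed=0):
    """K-means where `explored` are fixed centroids and only K extra
    centroids (seeded from random unexplored points) are updated."""
    rng = np.random.default_rng(seed)
    fixed = np.asarray(explored, dtype=float)
    free = unexplored[rng.choice(len(unexplored), K, replace=False)].astype(float)
    for _ in range(iters):
        centroids = np.vstack([fixed, free])
        # Assign every unexplored point to its nearest centroid.
        d = np.linalg.norm(unexplored[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update only the free centroids; fixed ones never move.
        for j in range(K):
            members = unexplored[labels == len(fixed) + j]
            if len(members):
                free[j] = members.mean(axis=0)
    # Centroids are averages, not actual candidates, so snap each free
    # centroid to the nearest real unexplored point.
    snap = np.linalg.norm(unexplored[:, None, :] - free[None, :, :], axis=2).argmin(axis=0)
    return unexplored[snap]
```

Note the full pairwise-distance array is O(n × centroids) memory; for very large sets the assignment step would need chunking or a KD-tree.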

Thanks to everyone who took precious time to discuss and answer this question!

Barış Can Tayiz :

You can use unsupervised learning algorithms for this purpose. For example, if you select k = 3 for k-means, the points closest to the cluster centers should be explored. Selecting k is another problem; you can read about it in this article: https://medium.com/analytics-vidhya/how-to-determine-the-optimal-k-for-k-means-708505d204eb. For the Within-Cluster Sum of Squared Errors (WSS), you can use the ratio (WSS(n+1) − WSS(n)) / (WSS(n) − WSS(n−1)); tracking this ratio while measuring the WSS will give the best k.
