1. Algorithm principle
MiniBatchKMeans is a variant of the KMeans algorithm that uses mini-batches to reduce computation time while still optimizing the same objective function. A mini-batch is a subset of the input data, sampled randomly at each training iteration. These mini-batches greatly reduce the amount of computation required to converge to a local solution. In contrast to other algorithms that reduce the convergence time of KMeans, mini-batch KMeans produces results that are generally only slightly worse than those of the standard algorithm.
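As a quick illustration, here is a minimal sketch of using scikit-learn's `MiniBatchKMeans` (the toy dataset and parameter values below are arbitrary choices for demonstration, not part of the original text):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Toy data: two well-separated Gaussian blobs (arbitrary example data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (500, 2)),
               rng.normal(5, 0.5, (500, 2))])

# batch_size controls the number of samples drawn at each iteration.
mbk = MiniBatchKMeans(n_clusters=2, batch_size=100, random_state=0)
labels = mbk.fit_predict(X)
print(mbk.cluster_centers_)
```

On data this cleanly separated, the learned centers land near the two blob means, at a fraction of the cost of running full KMeans on every sample each iteration.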
The algorithm iterates between two steps. In the first step, samples are drawn at random from the dataset to form a mini-batch, and each is assigned to its nearest centroid. In the second step, the centroids are updated. Unlike KMeans, this update is done on a per-sample basis: for each sample in the mini-batch, the assigned centroid is updated by taking the streaming average of that sample and all previous samples assigned to that centroid. This has the effect of decreasing the rate of change of a centroid over time.
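The two steps above can be sketched in plain NumPy. This is an illustrative sketch only (function and parameter names are our own, not scikit-learn's); the per-sample learning rate `1 / count` implements the streaming average, so each centroid is the running mean of all samples ever assigned to it and changes more slowly as its count grows:

```python
import numpy as np

def minibatch_kmeans(X, k, batch_size=32, n_iter=100, seed=0, init=None):
    """Illustrative mini-batch k-means with per-sample streaming updates."""
    rng = np.random.default_rng(seed)
    if init is None:
        # Initialize centroids from k distinct random samples.
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    else:
        centers = np.asarray(init, dtype=float).copy()
    counts = np.zeros(k, dtype=int)  # samples assigned to each centroid so far

    for _ in range(n_iter):
        # Step 1: draw a random mini-batch and assign each sample
        # to its nearest centroid.
        batch = X[rng.choice(len(X), size=batch_size, replace=False)]
        dists = np.linalg.norm(batch[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # Step 2: per-sample streaming-average update of the assigned centroid.
        for x, c in zip(batch, labels):
            counts[c] += 1
            eta = 1.0 / counts[c]  # learning rate decays as the count grows
            centers[c] = (1.0 - eta) * centers[c] + eta * x
    return centers
```

Because `eta` shrinks with every assignment, early samples move a centroid a lot and later samples barely move it, which is exactly the decreasing rate of change described above.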