Algorithm study notes

clustering

K-means algorithm

A type of clustering algorithm.

Clustering: given a set of data, group similar data together. Each group is called a "cluster".

How similarity is measured varies by algorithm; a common criterion is the distance between data points.

The K-means algorithm classifies each data point by its distance from the center point of each cluster.

First, prepare the data to be clustered. Then, decide on the number of clusters.

A characteristic of the K-means algorithm is that the number of clusters must be decided in advance.

This time we will use 3 clusters. Randomly set 3 points as the center points of the clusters.

 

For each data point, calculate which cluster center point is closest.

Each data point is classified into the cluster of its closest center point.
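The classification step can be sketched roughly as follows. This is a minimal illustration, not the author's code: the function name assign_to_clusters, the sample points, and the choice of straight-line (Euclidean) distance are all assumptions.

```python
import math

def assign_to_clusters(points, centers):
    """Return, for each point, the index of its closest center."""
    labels = []
    for p in points:
        # Assumption: "distance" means Euclidean distance.
        distances = [math.dist(p, c) for c in centers]
        labels.append(distances.index(min(distances)))
    return labels

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centers = [(0.0, 0.0), (10.0, 10.0)]
print(assign_to_clusters(points, centers))  # [0, 0, 1, 1]
```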

Compute the center of gravity of the data points in each cluster and move the cluster's center point there.
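A possible sketch of this step is shown below. The name move_centers and the sample data are made up for illustration, and keeping the old center when a cluster has no points is an assumption, since the notes do not cover that case.

```python
def move_centers(points, labels, centers):
    """Move each center to the mean (center of gravity) of its assigned points."""
    new_centers = []
    for k, old_center in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(old_center)
            continue
        # Average each coordinate separately.
        new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return new_centers

points = [(1.0, 1.0), (2.0, 3.0), (8.0, 8.0), (10.0, 10.0)]
labels = [0, 0, 1, 1]
print(move_centers(points, labels, [(0.0, 0.0), (9.5, 9.5)]))  # [(1.5, 2.0), (9.0, 9.0)]
```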

As the center points move, the center point closest to each data point may change.

Calculate the closest center point for each data point again and reclassify each data point into a cluster.

Repeat "classify each data point into a cluster" and "move each center point to the center of gravity of its cluster" until the center points converge.

 

 

The center points have converged, so the operation ends.

It is mathematically proven that repeating these operations makes the center points converge somewhere, although not necessarily at the best possible grouping.

This completes the clustering.

We can see that the data points have been properly grouped with other similar points.
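Putting the steps together, the whole procedure might look like the sketch below. It is one plain way to write K-means under stated assumptions, not a reference implementation: the function name kmeans, its parameters, the Euclidean distance, and the choice of random data points as starting centers are all illustrative. The example passes fixed starting centers so that the result is reproducible.

```python
import math
import random

def kmeans(points, k, initial_centers=None, seed=0, max_iter=100):
    """Return (centers, labels) once the assignments stop changing."""
    if initial_centers is None:
        # Assumption: pick k data points at random as the starting centers.
        initial_centers = random.Random(seed).sample(points, k)
    centers = list(initial_centers)
    labels = None
    for _ in range(max_iter):
        # Classify each data point into the cluster whose center is closest.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the centers have converged
        labels = new_labels
        # Move each center to the center of gravity of its cluster.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, labels

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 9), (1, 9)]
centers, labels = kmeans(points, k=3, initial_centers=[(0, 0), (10, 10), (0, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Stopping when no data point changes cluster is equivalent to the center points no longer moving, because the centers are computed only from the assignments.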

Let's see what happens when we perform K-means on the same data with the number of clusters set to 2.

Randomly set the 2 center points.

Repeat "classify each data point into a cluster" and "move each center point to the center of gravity of its cluster" until the center points converge.

This time, the two data blocks located on the left and bottom are classified as one cluster.

Since the K-means method requires the number of clusters to be decided in advance, it may not produce meaningful results when the chosen number is inappropriate.

There are several ways to estimate an appropriate number of clusters, such as analyzing the data beforehand, or running the K-means algorithm several times with different numbers of clusters.
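When trying several cluster counts, some measure is needed to compare the results. The notes do not name one, so the following is an assumption: a commonly used measure is the sum of squared distances from each data point to the center of its cluster. The labels and centers below are hypothetical converged results written out by hand, not the output of an actual run.

```python
def within_cluster_cost(points, labels, centers):
    """Sum of squared distances from each point to the center of its cluster."""
    total = 0
    for p, label in zip(points, labels):
        total += sum((a - b) ** 2 for a, b in zip(p, centers[label]))
    return total

points = [(0, 0), (0, 2), (10, 0), (10, 2), (5, 20), (5, 22)]

# Hypothetical converged results for the same data with 2 and with 3 clusters.
cost_k2 = within_cluster_cost(points, [0, 0, 0, 0, 1, 1], [(5, 1), (5, 21)])
cost_k3 = within_cluster_cost(points, [0, 0, 1, 1, 2, 2], [(0, 1), (10, 1), (5, 21)])
print(cost_k2, cost_k3)  # 106 6
```

Adding clusters can always lower this cost, so a smaller value alone does not mean a better cluster count. What is usually looked for is the count beyond which the cost stops dropping sharply.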

Next, let's use the same data, change the initial positions of the cluster center points, and see what happens when we perform K-means.

Repeat "classify each data point into a cluster" and "move each center point to the center of gravity of its cluster" until the center points converge.

The center point has converged.

Unlike before, the two data blocks in the upper and lower right corners are classified as one cluster.

This shows a characteristic of the K-means method: the resulting clusters depend on the positions of the initial center points, which are set randomly.
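This dependence on the starting positions can be reproduced with a tiny made-up data set. The sketch below is an illustration under assumptions (the helper run_kmeans, the four points, and the starting centers are not from the original figures): the same four points are clustered twice, and each run converges to a different grouping.

```python
import math

def run_kmeans(points, centers, max_iter=100):
    """Run K-means from the given starting centers and return the final labels."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers have converged
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

# Four corners of a rectangle that is 3 wide and 2 tall.
points = [(0, 0), (0, 2), (3, 0), (3, 2)]
print(run_kmeans(points, [(0, 1), (3, 1)]))      # [0, 0, 1, 1]: left pair and right pair
print(run_kmeans(points, [(1.5, 0), (1.5, 2)]))  # [0, 1, 0, 1]: bottom pair and top pair
```

Both results are stable, so repeating the steps further would not change either of them.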

list search

linear search

An algorithm for searching for an element in an array.

Let's search for the number 6.

First, check the leftmost number in the array.

Compare it with 6. If it matches, the search ends; if not, check the next number to the right.

Repeat the comparison until 6 is found.

We found 6, so the search ends.

Linear search is a simple method that repeats the comparison sequentially from the beginning of the array.

When the amount of data is large, the number of comparisons grows in proportion to it, so the search takes time.
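A minimal sketch of linear search is shown below. The function name, the sample array, and the choice of returning -1 when nothing is found are illustrative assumptions; the array is not the one from the original figures.

```python
def linear_search(values, target):
    """Return the index of target in values, or -1 if it is not present."""
    for index, value in enumerate(values):
        if value == target:
            return index  # found, so the search ends
    return -1

numbers = [3, 9, 8, 2, 6, 4, 1]
print(linear_search(numbers, 6))  # 4
```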

binary search 

An algorithm for searching for an element in a sorted array.

Let's search for the number 6.

First, check the number in the center of the array. Here it is 5.

Compare 5 with the 6 we will be searching for, and since 5 is less than 6, we can see that 6 is to the right of 5.

Remove the numbers that can no longer be 6 (5 and everything to its left) from the candidates.

Check the number in the center of the remaining array, this time it is 7. 

Comparing 7 and 6, since 6 is smaller than 7, we can see that 6 is to the left of 7.

Remove the numbers that can no longer be 6 (7 and everything to its right) from the candidates.

 

Check the number in the center of the remaining array, this time it is 6.

6 = 6, so we have found the number.

Binary search takes advantage of the array being sorted to search efficiently, halving the search range with each comparison.
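The walkthrough above can be sketched as follows. The array 1 to 9 is an assumption chosen to be consistent with the numbers checked in the text (5, then 7, then 6). The function name and the extra list of checked values are for illustration only.

```python
def binary_search(sorted_values, target):
    """Return (index of target or -1, list of values checked along the way)."""
    low, high = 0, len(sorted_values) - 1
    checked = []
    while low <= high:
        mid = (low + high) // 2  # center of the remaining candidates
        checked.append(sorted_values[mid])
        if sorted_values[mid] == target:
            return mid, checked
        if sorted_values[mid] < target:
            low = mid + 1   # target is to the right; drop the left half
        else:
            high = mid - 1  # target is to the left; drop the right half
    return -1, checked

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(binary_search(numbers, 6))  # (5, [5, 7, 6])
```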

 

 

 

 


Origin blog.csdn.net/m0_62110645/article/details/129880998