Carambola Python Machine Learning 6 - kNN Algorithm 1: Euclidean Distance Formula

My CSDN blog column: https://blog.csdn.net/yty_7

Github address: https://github.com/yot777/


An intuitive approach for the scatter plot: which points are closer?

Recall that at the end of the previous section (Carambola Python Machine Learning 5), we obtained the following scatter plot:

The blue dots (label 1) appear to be concentrated in the lower left of the figure, and the orange dots (label 0) in the upper right.

Let's add two more points A and B to the scatterplot. Think about what their labels should be.

Students may have guessed the answer:

Point A is closer to the blue points, so it should be labeled 1.

Point B is closer to the orange points, so it should be labeled 0.

But "closer" is only an intuitive impression. Is there a more rigorous way to calculate it? Of course there is! That is the Euclidean distance formula.

One-dimensional distance measurement

Let us first take a one-dimensional straight line as an example, as shown in the figure:

As shown in the figure, there are two points, A and B, on the one-dimensional line, with point A at position 0 and B at position 5. The distance between point A and point B is: | AB | = | 5-0 | = 5

Now add a point C at position 2. Which point, A or B, is C closer to?

The answer is obvious, but we can confirm it by calculating the distances between the points.

The distance from point C to point A is: | CA | = | 2-0 | = | 2 | = 2

The distance from point C to point B is: | CB | = | 2-5 | = | -3 | = 3 (absolute value)

|CA| < |CB|, so point C is closer to point A.

In general, the distance between two points x_{1} and x_{2} in one dimension is |x_{1}-x_{2}|.
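As a minimal Python sketch of the one-dimensional formula, the absolute value of the difference is all that is needed. The function name `distance_1d` is hypothetical, chosen only for illustration.

```python
def distance_1d(x1, x2):
    """Distance between two points on a line: |x1 - x2|."""
    return abs(x1 - x2)

# Points from the example above: A at 0, B at 5, C at 2
A, B, C = 0, 5, 2
print(distance_1d(A, B))  # 5
print(distance_1d(C, A))  # 2
print(distance_1d(C, B))  # 3, the absolute value of -3
```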

Two-dimensional distance measurement

Now we will make the problem a bit more complicated, extending from a one-dimensional line to a two-dimensional plane, as shown in the figure:

Point A is at (0, 0) and point B is at (3, 4). What is the distance |AB|, that is, the length of the red line?

Some students may have spotted it right away: isn't this the Pythagorean theorem? Legs of 3 and 4, hypotenuse of 5! Exactly right! The calculation is as follows:

 | AB| = \sqrt{\left ( x_{B}- x_{A} \right )^{2}+\left ( y_{B}- y_{A} \right )^{2}}=\sqrt{\left ( 3- 0 \right )^{2}+\left ( 4- 0 \right )^{2}}=\sqrt{25}=5

In other words, to find the distance between two points on a two-dimensional plane: square the difference of the x-coordinates (abscissas), add the square of the difference of the y-coordinates (ordinates), and then take the square root.
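The two-dimensional formula translates almost word for word into Python. This is a minimal sketch; the function name `distance_2d` and the tuple representation of points are assumptions made for illustration.

```python
import math

def distance_2d(p, q):
    """Euclidean distance between two 2D points p = (x, y) and q = (x, y)."""
    dx = q[0] - p[0]  # difference of the x-coordinates (abscissa)
    dy = q[1] - p[1]  # difference of the y-coordinates (ordinate)
    return math.sqrt(dx ** 2 + dy ** 2)

A = (0, 0)
B = (3, 4)
print(distance_2d(A, B))  # 5.0
```

Python's standard library also provides `math.hypot(dx, dy)`, which computes the same square root of the sum of squares.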

Now add a point C at (5, 2) to the figure above, and find its distances to points A and B (the lengths of the blue and orange lines).

Substituting the coordinates of A, B, and C into the two-dimensional distance formula gives:

| CA|=\sqrt{\left ( x_{C}- x_{A} \right )^{2}+\left ( y_{C}- y_{A} \right )^{2}}=\sqrt{\left ( 5- 0 \right )^{2}+\left ( 2- 0 \right )^{2}}=\sqrt{29}

| CB|=\sqrt{\left ( x_{C}- x_{B} \right )^{2}+\left ( y_{C}- y_{B} \right )^{2}}=\sqrt{\left ( 5- 3 \right )^{2}+\left ( 2- 4 \right )^{2}}=\sqrt{8}

Since \sqrt{8} < \sqrt{29}, |CB| < |CA|, so point C is closer to point B.
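The comparison for point C can be checked numerically. A short sketch, again using a hypothetical helper `distance_2d` and representing points as (x, y) tuples:

```python
import math

def distance_2d(p, q):
    # square root of the summed squared coordinate differences
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

A, B, C = (0, 0), (3, 4), (5, 2)
CA = distance_2d(C, A)  # sqrt(29), about 5.385
CB = distance_2d(C, B)  # sqrt(8), about 2.828
print(f"|CA| = {CA:.3f}, |CB| = {CB:.3f}")
print("C is closer to", "B" if CB < CA else "A")
```

This prints `|CA| = 5.385, |CB| = 2.828` followed by `C is closer to B`.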

Three-dimensional distance measurement

Here is the formula directly; interested students can sketch the figure themselves:

| AB| = \sqrt{\left ( x_{B}- x_{A} \right )^{2}+\left ( y_{B}- y_{A} \right )^{2}+\left ( z_{B}- z_{A} \right )^{2}}

Note that the three-dimensional distance formula simply adds a vertical dimension z to the two-dimensional formula.

For each dimension we still take the square of the difference; we then sum these squares over all dimensions and finally take the square root.

Don't assume that 3D means taking a cube root!
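A minimal sketch of the three-dimensional formula. The points P and Q are hypothetical, chosen so that the sum of squares (1 + 4 + 4 = 9) has a whole-number square root, which also shows that the result is a square root, not a cube root.

```python
import math

def distance_3d(p, q):
    """Euclidean distance between 3D points: add the z term, still take a square root."""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2 + (q[2] - p[2]) ** 2)

# Hypothetical points: squared differences are 1, 4 and 4, summing to 9
P = (0, 0, 0)
Q = (1, 2, 2)
print(distance_3d(P, Q))  # 3.0, the square root of 9 (not a cube root)
```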

n-dimensional distance measure: Euclidean distance formula

Beyond three dimensions, distance is hard to show in an intuitive drawing. How do we calculate it then?

First, we abstract the coordinates of points in 1D, 2D, and 3D into vectors, i.e.

\boldsymbol{a}=\begin{pmatrix} a_{1}\\ a_{2}\\ a_{3}\\ ...\\ a_{n}\\ \end{pmatrix}   , \boldsymbol{b}=\begin{pmatrix} b_{1}\\ b_{2}\\ b_{3}\\ ...\\ b_{n}\\ \end{pmatrix}

A reminder: vectors are denoted by lowercase letters and matrices by uppercase letters!

In the earlier one-dimensional example, \boldsymbol{a}=\begin{pmatrix} a_{1}\end{pmatrix}=(0), \boldsymbol{b}=\begin{pmatrix} b_{1}\end{pmatrix}=(5), and \boldsymbol{c}=\begin{pmatrix} c_{1}\end{pmatrix}=(2):

| AB| = \sqrt{\left ( a_{1}- b_{1} \right )^{2}}=\sqrt{\left ( 0- 5 \right )^{2}}=5

In the earlier two-dimensional example, \boldsymbol{a}=\begin{pmatrix} a_{1}\\ a_{2}\end{pmatrix}=\begin{pmatrix} 0\\ 0\end{pmatrix}, \boldsymbol{b}=\begin{pmatrix} b_{1}\\ b_{2}\end{pmatrix}=\begin{pmatrix} 3\\ 4\end{pmatrix}, and \boldsymbol{c}=\begin{pmatrix} c_{1}\\ c_{2}\end{pmatrix}=\begin{pmatrix} 5\\ 2\end{pmatrix}:

| AB| = \sqrt{\left ( a_{1}- b_{1} \right )^{2}+\left ( a_{2}- b_{2} \right )^{2}}=\sqrt{\left ( 0- 3 \right )^{2}+\left ( 0- 4 \right )^{2}}=5

In three dimensions:

| AB| = \sqrt{\left ( a_{1}- b_{1} \right )^{2}+\left ( a_{2}- b_{2} \right )^{2}+\left ( a_{3}- b_{3} \right )^{2}}

Generalized to n dimensions, the distance formula is:

| AB| = \sqrt{\left ( a_{1}- b_{1} \right )^{2}+\left ( a_{2}- b_{2} \right )^{2}+\left ( a_{3}- b_{3} \right )^{2}+...+\left ( a_{n}- b_{n} \right )^{2}}=\sqrt{\sum_{i=1}^{n}\left ( a_{i}- b_{i} \right )^{2}}

This is the Euclidean distance formula.
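The n-dimensional formula covers all the earlier cases with a single function. The sketch below is one possible implementation; the name `euclidean_distance` and the choice to raise `ValueError` on mismatched lengths are assumptions for illustration.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two n-dimensional vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension n")
    # sum of (a_i - b_i)^2 over i = 1..n, then the square root
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# The same function handles 1D, 2D and higher-dimensional vectors
print(euclidean_distance([0], [5]))                    # 5.0
print(euclidean_distance([0, 0], [3, 4]))              # 5.0
print(euclidean_distance([0, 0, 0, 0], [1, 1, 1, 1]))  # 2.0
```

Since Python 3.8, the standard library's `math.dist(p, q)` computes the same Euclidean distance between two points of equal dimension.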

Returning to the classification problem at the start of this section:

We can use the Euclidean distance formula to compute the distance from point A to every point in the scatter plot.
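A sketch of this step, computing only the distances. The scatter plot's actual data is not reproduced here, so the points, labels, and coordinates of A below are hypothetical values chosen to mimic the picture (label 1 in the lower left, label 0 in the upper right).

```python
import math

# Hypothetical training points (x, y) with labels 1 (blue) or 0 (orange);
# these are NOT the data from the previous section's scatter plot.
points = [(1.0, 2.0), (2.0, 1.5), (1.5, 1.0), (6.0, 7.0), (7.0, 6.5), (6.5, 8.0)]
labels = [1, 1, 1, 0, 0, 0]

A = (2.0, 2.0)  # hypothetical coordinates for the new point A

# Euclidean distance from A to every point in the scatter plot
distances = [math.sqrt(sum((ai - pi) ** 2 for ai, pi in zip(A, p))) for p in points]

for p, label, d in zip(points, labels, distances):
    print(f"point {p}, label {label}: distance {d:.3f}")
```

With this made-up data, every label-1 point is nearer to A than any label-0 point, which matches the intuition from the figure.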

What do we do with these distances? That is explained in the next section.


Summary

Intuitive approach for the scatter plot: find the closer points.

Euclidean distance formula:

| AB| = \sqrt{\left ( a_{1}- b_{1} \right )^{2}+\left ( a_{2}- b_{2} \right )^{2}+\left ( a_{3}- b_{3} \right )^{2}+...+\left ( a_{n}- b_{n} \right )^{2}}=\sqrt{\sum_{i=1}^{n}\left ( a_{i}- b_{i} \right )^{2}}

In one dimension:

|AB| = |a - b|

In two dimensions:

| AB| = \sqrt{\left ( a_{x}- b_{x} \right )^{2}+\left ( a_{y}- b_{y} \right )^{2}}

In three dimensions:

| AB| = \sqrt{\left ( a_{x}- b_{x} \right )^{2}+\left ( a_{y}- b_{y} \right )^{2}+\left ( a_{z}- b_{z} \right )^{2}}


If you found this chapter helpful, please follow, comment, and like! On GitHub, follows and stars are welcome!


Origin blog.csdn.net/yty_7/article/details/105464008