Introduction to Neural Networks (2)

Continuing from the previous section, here we introduce how the computer finds an appropriate linear equation, that is, how the computer learns.

Let's look at a simple example. As shown in the figure below, there are three blue points and three red points, and we want a straight line that separates them. A computer might start by picking a linear equation at random (as in the right-hand picture below). This line divides the plane into two regions, a blue region and a red region. Clearly this line classifies the points poorly, so we need to move it, that is, bring it closer to the two misclassified points (the red point in the blue region and the blue point in the red region), so that the classification result gradually improves.

The following shows how to move the straight line closer to a target point. As shown in the figure below, suppose the equation of the line is 3x_{1} + 4x_{2} - 10 = 0, and the misclassified red point has coordinates (4, 5). To move the line toward the red point, we can do the following: take the coefficients of the line equation, append a bias unit of 1 to the red point's coordinates so that they become (4, 5, 1), subtract the two element-wise, and use the resulting values as the new coefficients of the line equation.
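The figure is not reproduced here, but the arithmetic of this step is just an element-wise subtraction: (3 - 4, 4 - 5, -10 - 1) = (-1, -1, -11), which would give the new line equation -x_{1} - x_{2} - 11 = 0.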


But! A line updated in this way moves too far, which may actually make the classification worse when there are more data points, so we would rather have the line move only slightly toward the red point. This is where the concept of the learning rate comes in. From the analysis above, the learning rate should be a small number; assume here that it is 0.1. Multiply the red point's coordinates (including the bias unit) by 0.1 and then subtract, giving the new line equation 2.6x_{1} + 3.5x_{2} - 10.1 = 0. At this point you will be surprised to find that the line has moved closer to the misclassified red point! That's right, it really is that simple and that magical.
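As a quick check, this single update can be written in a few lines of NumPy (a minimal sketch using the numbers from the example above):

import numpy as np

line = np.array([3.0, 4.0, -10.0])    # coefficients of 3x1 + 4x2 - 10 = 0
point = np.array([4.0, 5.0, 1.0])     # misclassified red point (4, 5) with bias unit 1
learn_rate = 0.1
new_line = line - learn_rate * point  # subtract, because the point is red
print(new_line)                       # the new coefficients: 2.6, 3.5 and -10.1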

Similarly, if there is a blue point (1, 1) in the red region, you can move the line closer to it in the same way, except that the new line parameters are obtained by adding rather than subtracting. Keep this method in mind; you will use it frequently later on.
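Continuing, for illustration, from the line obtained above: adding 0.1 times (1, 1, 1) gives (2.6 + 0.1, 3.5 + 0.1, -10.1 + 0.1) = (2.7, 3.6, -10), that is, 2.7x_{1} + 3.6x_{2} - 10 = 0, which lies slightly closer to the blue point.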


Let's summarize the above (the update rules are sketched in code right after this list, since the pseudocode figure is not reproduced here). For n-dimensional data, the weights and bias are first assigned randomly, and then the prediction for every point is computed. For each misclassified point we do the following:

1. A point predicted as 0, i.e. a blue point that falls in the red region: the new weights are the current weights plus the learning rate times the point's coordinates, and the new bias is the current bias plus the learning rate.

2. A point predicted as 1, i.e. a red point that falls in the blue region: the new weights are the current weights minus the learning rate times the point's coordinates, and the new bias is the current bias minus the learning rate.
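A minimal sketch of these update rules for a single misclassified point might look like the following (the helper name perceptron_update is used here only for illustration; the full program, applied to the practice data, appears further below):

import numpy as np

def perceptron_update(W, b, x, y, y_hat, learn_rate=0.1):
    # Apply one perceptron-trick step to the misclassified point x.
    if y == 1 and y_hat == 0:      # blue point that fell in the red region
        W = W + learn_rate * x
        b = b + learn_rate
    elif y == 0 and y_hat == 1:    # red point that fell in the blue region
        W = W - learn_rate * x
        b = b - learn_rate
    return W, b

W, b = perceptron_update(np.array([3.0, 4.0]), -10.0, np.array([4.0, 5.0]), y=0, y_hat=1)
# W is now [2.6, 3.5] and b is -10.1, matching the earlier example.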

Now let's practice: use the perceptron algorithm to classify the data below.

0.78051,-0.063669,1
0.28774,0.29139,1
0.40714,0.17878,1
0.2923,0.4217,1
0.50922,0.35256,1
0.27785,0.10802,1
0.27527,0.33223,1
0.43999,0.31245,1
0.33557,0.42984,1
0.23448,0.24986,1
0.0084492,0.13658,1
0.12419,0.33595,1
0.25644,0.42624,1
0.4591,0.40426,1
0.44547,0.45117,1
0.42218,0.20118,1
0.49563,0.21445,1
0.30848,0.24306,1
0.39707,0.44438,1
0.32945,0.39217,1
0.40739,0.40271,1
0.3106,0.50702,1
0.49638,0.45384,1
0.10073,0.32053,1
0.69907,0.37307,1
0.29767,0.69648,1
0.15099,0.57341,1
0.16427,0.27759,1
0.33259,0.055964,1
0.53741,0.28637,1
0.19503,0.36879,1
0.40278,0.035148,1
0.21296,0.55169,1
0.48447,0.56991,1
0.25476,0.34596,1
0.21726,0.28641,1
0.67078,0.46538,1
0.3815,0.4622,1
0.53838,0.32774,1
0.4849,0.26071,1
0.37095,0.38809,1
0.54527,0.63911,1
0.32149,0.12007,1
0.42216,0.61666,1
0.10194,0.060408,1
0.15254,0.2168,1
0.45558,0.43769,1
0.28488,0.52142,1
0.27633,0.21264,1
0.39748,0.31902,1
0.5533,1,0
0.44274,0.59205,0
0.85176,0.6612,0
0.60436,0.86605,0
0.68243,0.48301,0
1,0.76815,0
0.72989,0.8107,0
0.67377,0.77975,0
0.78761,0.58177,0
0.71442,0.7668,0
0.49379,0.54226,0
0.78974,0.74233,0
0.67905,0.60921,0
0.6642,0.72519,0
0.79396,0.56789,0
0.70758,0.76022,0
0.59421,0.61857,0
0.49364,0.56224,0
0.77707,0.35025,0
0.79785,0.76921,0
0.70876,0.96764,0
0.69176,0.60865,0
0.66408,0.92075,0
0.65973,0.66666,0
0.64574,0.56845,0
0.89639,0.7085,0
0.85476,0.63167,0
0.62091,0.80424,0
0.79057,0.56108,0
0.58935,0.71582,0
0.56846,0.7406,0
0.65912,0.71548,0
0.70938,0.74041,0
0.59154,0.62927,0
0.45829,0.4641,0
0.79982,0.74847,0
0.60974,0.54757,0
0.68127,0.86985,0
0.76694,0.64736,0
0.69048,0.83058,0
0.68122,0.96541,0
0.73229,0.64245,0
0.76145,0.60138,0
0.58985,0.86955,0
0.73145,0.74516,0
0.77029,0.7014,0
0.73156,0.71782,0
0.44556,0.57991,0
0.85275,0.85987,0
0.51912,0.62359,0

The program code is as follows:

import numpy as np
# Setting the random seed, feel free to change it and see different solutions.
np.random.seed(42)

def stepFunction(t):
    if t >= 0:
        return 1
    return 0

def prediction(X, W, b):
    return stepFunction((np.matmul(X,W)+b)[0])

# The perceptron trick: receive the data X, the labels y, the weights W
# (as an array) and the bias b, and update W and b according to the
# perceptron algorithm for every misclassified point, returning W and b.
def perceptronStep(X, y, W, b, learn_rate = 0.01):
    for i in range(len(X)):
        y_hat = prediction(X[i],W,b)
        if y[i]-y_hat == 1:      # label 1 predicted as 0: a blue point in the red region
            W[0] += X[i][0]*learn_rate
            W[1] += X[i][1]*learn_rate
            b += learn_rate
        elif y[i]-y_hat == -1:   # label 0 predicted as 1: a red point in the blue region
            W[0] -= X[i][0]*learn_rate
            W[1] -= X[i][1]*learn_rate
            b -= learn_rate
    return W, b
    
# This function runs the perceptron algorithm repeatedly on the dataset,
# and returns a few of the boundary lines obtained in the iterations,
# for plotting purposes.
# Feel free to play with the learning rate and the num_epochs,
# and see your results plotted below.
def trainPerceptronAlgorithm(X, y, learn_rate = 0.01, num_epochs = 25):
    x_min, x_max = min(X.T[0]), max(X.T[0])
    y_min, y_max = min(X.T[1]), max(X.T[1])
    W = np.array(np.random.rand(2,1))
    b = np.random.rand(1)[0] + x_max
    # These are the solution lines that get plotted below.
    boundary_lines = []
    for i in range(num_epochs):
        # In each epoch, we apply the perceptron step.
        W, b = perceptronStep(X, y, W, b, learn_rate)
        boundary_lines.append((-W[0]/W[1], -b/W[1]))
    return boundary_lines
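
To run this on the practice data, a possible usage sketch (assuming the data listed above has been saved to a file named data.csv, a filename chosen here only for illustration) is:

import numpy as np

data = np.loadtxt('data.csv', delimiter=',')   # the comma-separated values listed above
X = data[:, :2]                                # the two input features
y = data[:, 2]                                 # the labels: 1 (blue) or 0 (red)
boundary_lines = trainPerceptronAlgorithm(X, y, learn_rate=0.01, num_epochs=25)
slope, intercept = boundary_lines[-1]          # final boundary: x2 = slope*x1 + intercept
print(slope, intercept)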

Experimental results:

The green dashed lines are the boundary lines obtained after each update, and the black line is the final boundary line.
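The figure itself is not reproduced here; a minimal matplotlib sketch that draws a similar plot, continuing from X, y, and boundary_lines in the usage sketch above, might be:

import numpy as np
import matplotlib.pyplot as plt

plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='blue')   # label-1 points
plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='red')    # label-0 points
xs = np.array([0.0, 1.0])
for slope, intercept in boundary_lines[:-1]:
    plt.plot(xs, slope * xs + intercept, 'g--', alpha=0.3)    # intermediate boundary lines
slope, intercept = boundary_lines[-1]
plt.plot(xs, slope * xs + intercept, 'k-')                    # final boundary line
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.show()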


This article mainly introduced the classification of linearly separable data. So how do we classify non-linear data? See you in the next section.


Origin blog.csdn.net/ting_qifengl/article/details/100577181