Machine Learning Algorithms Summary 2: Perceptron

The perceptron was proposed by Rosenblatt in 1957 and is the foundation of neural networks and support vector machines. The perceptron is a linear classification model for binary classification: its input is the feature vector of an instance, and its output is the class of the instance, taking the two values +1 and -1. It belongs to the family of discriminative models, and its goal is to find a separating hyperplane that linearly divides the training data.
1. Model:
Assuming the data set is linearly separable, the perceptron is the following decision function from the input space to the output space:
$$f(x) = \operatorname{sign}(w \cdot x + b)$$
Here w is the weight (or weight vector), b is the bias, w · x denotes the inner product of w and x, and sign is the sign function, namely:
$$\operatorname{sign}(x) = \begin{cases} +1, & x \ge 0 \\ -1, & x < 0 \end{cases}$$
1. Vector inner product (dot product, scalar product): multiply the elements of the two vectors position by position and sum the results; the result of the dot product is a scalar.
2. Vector outer product (cross product, vector product): the result of the cross product is a vector, perpendicular to the plane spanned by the two input vectors.
See the referenced blog post for more on these products. Matrix multiplication likewise comes in three kinds; see the referenced blog post as well. A small NumPy sketch of the decision function and of both vector products follows below.
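To make the notation concrete, here is a minimal Python/NumPy sketch of the decision function f(x) = sign(w · x + b) and of the inner/cross product distinction. The values of w, b, x, u, v are made-up illustrative numbers, not from the original post:

```python
import numpy as np

# Illustrative (made-up) parameters and input.
w = np.array([2.0, -1.0])          # weight vector
b = 0.5                            # bias
x = np.array([1.0, 3.0])           # input feature vector

score = np.dot(w, x) + b           # inner product w·x plus bias -> a scalar
label = 1 if score >= 0 else -1    # sign function, with sign(0) taken as +1
print(label)                       # -1, since 2*1 + (-1)*3 + 0.5 = -0.5 < 0

# Inner vs. cross product on 3-D vectors:
u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])
print(np.dot(u, v))                # 0.0 -> a scalar
print(np.cross(u, v))              # [0. 0. 1.] -> perpendicular to both u and v
```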
2. Strategy:
The loss function of the perceptron is based on the total distance from all misclassified points to the separating hyperplane S:
$$-\frac{1}{\|w\|} \sum_{x_i \in M} y_i \,(w \cdot x_i + b)$$
Norm: a measure of the length or size of the vectors in a vector space (or of matrices).
(1) L0 norm: the number of nonzero elements of the vector;
(2) L1 norm: the sum of the absolute values of the vector's elements;
(3) L2 norm: the square root of the sum of the squares of the vector's elements. (See the referenced blog post; a small NumPy check of the three norms follows below.)
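As a quick sanity check, here is a sketch of the three norms on a made-up vector; np.linalg.norm covers the L1 and L2 cases directly:

```python
import numpy as np

v = np.array([3.0, 0.0, -4.0])     # illustrative vector

l0 = np.count_nonzero(v)           # L0 "norm": count of nonzero elements -> 2
l1 = np.linalg.norm(v, ord=1)      # L1 norm: sum of absolute values -> 7.0
l2 = np.linalg.norm(v, ord=2)      # L2 norm: sqrt of sum of squares -> 5.0
print(l0, l1, l2)
```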
A loss defined simply as the number of misclassified points would not be a continuous, differentiable function of the parameters w and b; starting from the total distance above and dropping the constant factor 1/‖w‖, the loss function of the perceptron is derived as follows:
$$L(w, b) = -\sum_{x_i \in M} y_i \,(w \cdot x_i + b)$$
where M is the set of misclassified points.
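The loss is straightforward to compute directly. Below is a minimal sketch, assuming made-up data X, y and parameters w, b; points with a non-positive margin are treated as misclassified:

```python
import numpy as np

def perceptron_loss(w, b, X, y):
    margins = y * (X @ w + b)        # y_i * (w·x_i + b) for every point
    misclassified = margins <= 0     # the set M of misclassified points
    return -np.sum(margins[misclassified])

X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, -1.0]])   # illustrative data
y = np.array([1.0, -1.0, -1.0])
w = np.array([0.5, 0.5])
b = 0.0
print(perceptron_loss(w, b, X, y))   # 0.5: only the second point is misclassified
```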
3. Algorithm:
The optimization method is stochastic gradient descent. The perceptron algorithm has two forms: the original (primal) form and the dual form.
(1) Original form
Gradient descent is used to repeatedly minimize the objective function. Note: the minimization does not take a gradient step over all misclassified points in M at once; instead, a single misclassified point is selected at random and one gradient descent step is taken on it.
$$\nabla_w L(w, b) = -\sum_{x_i \in M} y_i x_i, \qquad \nabla_b L(w, b) = -\sum_{x_i \in M} y_i$$

For a randomly chosen misclassified point $(x_i, y_i)$, the updates are

$$w \leftarrow w + \eta\, y_i x_i, \qquad b \leftarrow b + \eta\, y_i$$

where $\eta$ ($0 < \eta \le 1$) is the learning rate (step size).
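The original post contains no code; the following is a minimal sketch of the original form under the updates above. The toy data (taken to be linearly separable), the learning rate eta, the iteration cap, and the random seed are all illustrative assumptions:

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, max_iters=1000, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_iters):
        mis = np.where(y * (X @ w + b) <= 0)[0]   # indices of misclassified points
        if mis.size == 0:                         # none left: converged
            break
        i = rng.choice(mis)                       # pick one misclassified point at random
        w = w + eta * y[i] * X[i]                 # w <- w + eta * y_i * x_i
        b = b + eta * y[i]                        # b <- b + eta * y_i
    return w, b

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])   # toy separable data
y = np.array([1.0, 1.0, -1.0])
w, b = train_perceptron(X, y)
print(w, b)   # one valid separating hyperplane; the result depends on the visiting order
```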
(2) Dual form
The idea of the dual form is to express w and b as linear combinations of the training instances. Setting $\alpha_i = n_i \eta$, where $n_i$ is the number of updates triggered by instance $i$:

$$w = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad b = \sum_{i=1}^{N} \alpha_i y_i$$

so the decision function becomes

$$f(x) = \operatorname{sign}\left(\sum_{j=1}^{N} \alpha_j y_j \, x_j \cdot x + b\right)$$

and, when an instance $(x_i, y_i)$ is misclassified, the updates are $\alpha_i \leftarrow \alpha_i + \eta$ and $b \leftarrow b + \eta\, y_i$.
In the dual form, the training instances appear only in the form of inner products, so the inner products between all pairs of training instances can be computed in advance and stored in a matrix; this matrix is the Gram matrix:
$$G = \left[\, x_i \cdot x_j \,\right]_{N \times N}$$
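Here is a hedged sketch of the dual form on the same toy data as above, with the Gram matrix precomputed once; the misclassification test touches the instances only through G:

```python
import numpy as np

def train_perceptron_dual(X, y, eta=1.0, max_passes=100):
    N = X.shape[0]
    G = X @ X.T                    # Gram matrix: G[i, j] = x_i · x_j
    alpha = np.zeros(N)            # alpha_i = n_i * eta
    b = 0.0
    for _ in range(max_passes):
        updated = False
        for i in range(N):
            # misclassification test: y_i * (sum_j alpha_j y_j (x_j·x_i) + b) <= 0
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += eta
                b += eta * y[i]
                updated = True
        if not updated:            # a full pass with no update: converged
            break
    w = (alpha * y) @ X            # recover w = sum_i alpha_i y_i x_i
    return w, b

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])   # same toy data as above
y = np.array([1.0, 1.0, -1.0])
print(train_perceptron_dual(X, y))
```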
