Perceptron Learning Algorithm
1. Conditions
The sample set is linearly separable
2. Principle
Find a hyperplane (a line in two dimensions) that separates the two classes of samples:
$h(x_{i1},x_{i2},\cdots,x_{id}) = \text{sign}\left(\sum\limits_{j=1}^d w_jx_{ij} -\theta\right)$, $i=1,2,\cdots,n$
$w_j$ can be regarded as the weight of a biological neuron, $x_{ij}$ as the stimulation the neuron receives, and $\theta$ as the threshold. When $\sum\limits_{j=1}^d w_jx_{ij}>\theta$ the neuron is excited; when $\sum\limits_{j=1}^d w_jx_{ij}<\theta$ the neuron is inhibited. Therefore, the method is called the perceptron learning algorithm.
Let $x_{i0}=1$ and $w_0=-\theta$ for $i = 1,2,\cdots,n$, that is, $\vec x_i = \begin{bmatrix} 1&x_{i1}&x_{i2}&\cdots&x_{id} \end{bmatrix}^T$ and $\vec w = \begin{bmatrix} -\theta&w_1&w_2&\cdots&w_d \end{bmatrix}^T$. The threshold is then absorbed into the weight vector, and $h(\vec x_i)=\text{sign}(\vec w^T\cdot\vec x_i)$.
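The hypothesis can be sketched in a few lines of plain Python. This is only an illustration: the function names `augment`, `sign`, and `h`, the example weights, and the threshold value are all hypothetical, and mapping a score of exactly 0 to -1 is an assumed convention. The sketch stores $-\theta$ as the first weight so that, with $x_{i0}=1$, the dot product equals $\sum\limits_{j=1}^d w_jx_{ij}-\theta$.

```python
def augment(x):
    """Prepend the constant feature x_0 = 1 to a sample given as a list."""
    return [1.0] + [float(v) for v in x]


def sign(score):
    # Assumed convention: a score of exactly 0 is mapped to -1.
    return 1 if score > 0 else -1


def h(w, x):
    """Hypothesis h(x) = sign(w^T x) on the augmented sample.

    w[0] holds -theta, so w^T x equals sum_j w_j * x_j - theta.
    """
    return sign(sum(wj * xj for wj, xj in zip(w, augment(x))))


# Hypothetical example: weights (w_1, w_2) = (1, 1) and threshold theta = 1.5.
theta = 1.5
w = [-theta, 1.0, 1.0]
samples = [[0, 0], [1, 0], [0, 1], [1, 1]]
predictions = [h(w, x) for x in samples]
print(predictions)  # [-1, -1, -1, 1]
```

With these example values the neuron is excited only when both inputs are 1, since that is the only sample whose weighted sum exceeds the threshold.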
- Construct a loss function
  - $L(h) = \sum\limits_{i=1}^n\mathbb{I}(h(\vec x_i) \neq y_i)$ is the number of samples misclassified under the current hypothesis. However, this function is not continuous, so it is difficult to minimize by analytical methods.
  - $L(\vec w) = -\sum\limits_{\vec x_i\in M}y_i\vec w^T \cdot\vec x_i$, where $M$ denotes the set of misclassified samples. When a sample is misclassified, $y_i$ and $\vec w^T\cdot\vec x_i$ have different signs, so every term $-y_i\vec w^T\cdot\vec x_i$ is positive and $L(\vec w)\geq 0$, with equality when no sample is misclassified.
- Find the hypothesis $h$ for which the loss function attains its minimum value of 0.
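To make the two loss functions concrete, here is a minimal Python sketch that evaluates both on a tiny data set. The helper names, the data, and the two weight vectors are hypothetical, the samples are assumed to be already augmented with a leading 1, and a sample whose score is exactly 0 is counted as misclassified (an assumed convention).

```python
def score(w, x):
    """Compute w^T x for an augmented sample x (x[0] == 1)."""
    return sum(wj * xj for wj, xj in zip(w, x))


def is_misclassified(w, x, y):
    # Label and score disagree in sign; a score of 0 is counted as an error
    # (assumed convention).
    return y * score(w, x) <= 0


def zero_one_loss(w, X, Y):
    """L(h): the number of misclassified samples."""
    return sum(1 for x, y in zip(X, Y) if is_misclassified(w, x, y))


def perceptron_loss(w, X, Y):
    """L(w): minus the sum of y * w^T x over the misclassified samples."""
    return -sum(y * score(w, x) for x, y in zip(X, Y) if is_misclassified(w, x, y))


# Hypothetical augmented samples [1, x_1, x_2] with labels in {-1, +1}.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
Y = [-1, -1, -1, 1]

w_good = [-1.5, 1.0, 1.0]  # separates the data
w_bad = [-0.5, 1.0, 1.0]   # misclassifies the two middle samples

print(zero_one_loss(w_good, X, Y), perceptron_loss(w_good, X, Y))  # 0 0
print(zero_one_loss(w_bad, X, Y), perceptron_loss(w_bad, X, Y))    # 2 1.0
```

Both losses are 0 for the separating weights. For the other weight vector the count jumps in whole steps, while the second loss varies continuously with the weights, which is what makes it easier to minimize.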