Perceptron algorithm and BP neural network

Description: The perceptron was proposed in 1957 and is arguably one of the oldest classification methods. It is the ancestor of many algorithms, such as the BP neural network. Although its generalization ability is weak compared with many of today's classification models, its principle is well worth studying. Learning the perceptron algorithm is also a great help for the later study of neural networks and deep learning.

First, Perceptron Model

(1) Definition of a hyperplane

Let w1, w2, ..., wn and v be real numbers (in R), with at least one wi nonzero. The set of all points X = [x1, x2, ..., xn] satisfying the linear equation

w1*x1 + w2*x2 + ... + wn*xn = v

is called a hyperplane of the space R^n.

From the definition it can be seen that a hyperplane is a set of points: the points X whose inner product with the vector w = [w1, w2, ..., wn] equals v.

In particular, if v equals 0, then for a point X of the training set:

if w * X = w1*x1 + w2*x2 + ... + wn*xn > 0, X is labeled as one class;

if w * X = w1*x1 + w2*x2 + ... + wn*xn < 0, X is labeled as the other class.
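As a quick illustration of this sign rule, here is a tiny MATLAB check with an assumed weight vector w and two made-up points in R^2 (the values are hypothetical, chosen only so the two inner products have opposite signs):

% check the sign rule w*x > 0 / w*x < 0 for two assumed points
w = [1 2];            % hyperplane normal vector (assumed values)
x_pos = [2; 1];       % w*x_pos = 1*2 + 2*1 =  4 > 0  -> one class
x_neg = [-1; -2];     % w*x_neg = -1 - 4   = -5 < 0  -> the other class
sign(w * x_pos)
sign(w * x_neg)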

(2) Linearly separable data sets

For a data set T = {(x1, y1), (x2, y2), ..., (xN, yN)}, where xi belongs to R^n and yi belongs to {-1, +1}, i = 1, 2, ..., N:

If there exists a hyperplane S: w * x = 0

that correctly classifies all the sample points of the data set, then T is called a linearly separable data set.

"Correctly classified" means: if w * xi > 0, then the sample point (xi, yi) has yi equal to +1;

if w * xi < 0, then the sample point (xi, yi) has yi equal to -1.

Thus, given the hyperplane w * x = 0, every point (xi, yi) of the data set T satisfies yi * (w * xi) > 0, which means all the sample points of T are correctly classified.

If there is a point (xi, yi) such that yi * (w * xi) < 0, the hyperplane w * x = 0 fails to classify that point, and it is called a misclassified point.
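The condition yi * (w * xi) > 0 is easy to verify numerically. Below is a small MATLAB sketch that checks it for a made-up two-dimensional data set under an assumed hyperplane (both are hypothetical, for illustration only):

% check yi*(w*xi) > 0 for every sample of a toy data set
w = [1 1];                       % assumed hyperplane normal vector
X = [2 1; 1 2; -1 -1; -2 -1];    % one sample per row (made-up data)
y = [1; 1; -1; -1];              % labels in {-1, +1}
margins = y .* (X * w')          % functional margins yi*(w*xi)
all(margins > 0)                 % 1 means every point is correctly classified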

(3) The perceptron model

f(x) = sign(w * x + b), where sign is the sign function.

The perceptron model corresponds to a hyperplane w * x + b = 0 with parameters (w, b), where w is the normal vector of the hyperplane and b is its intercept.

Our goal is to find a (w, b) that correctly classifies all the sample points of a linearly separable data set T into the two categories.
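Written out in MATLAB, the decision function is a one-liner; the weights and bias below are assumed values, used only to show the shape of the model:

% perceptron decision function f(x) = sign(w*x + b), with assumed parameters
w = [2 -1];  b = 0.5;
f = @(x) sign(w * x + b);
f([1; 1])     %  2 - 1 + 0.5 =  1.5  ->  1
f([-1; 2])    % -2 - 2 + 0.5 = -3.5  -> -1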

 

Second, Perceptron strategy

The key of the strategy is to define a loss function and then minimize it. For the perceptron, the loss function is taken over the misclassified points: L(w, b) = -sum of yi * (w * xi + b) over all misclassified points (xi, yi). This loss is non-negative and equals 0 exactly when no point is misclassified.
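A short MATLAB sketch of this loss on a made-up data set (w, b and the samples are hypothetical; the point is only how the misclassified terms are collected):

% perceptron loss: -sum of yi*(w*xi + b) over misclassified points
w = [1 -1];  b = 0;
X = [1 2; 2 1; -1 -2];      % one sample per row (made-up data)
y = [1; 1; -1];
m = y .* (X * w' + b);      % functional margins
L = -sum(m(m <= 0))         % only misclassified points (margin <= 0) contribute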

 

Third, Perceptron algorithm

The input to the algorithm is m samples, each with n-dimensional features and a binary output category of 1 or -1: (x1(0), x2(0), ..., xn(0), y0), (x1(1), x2(1), ..., xn(1), y1), ..., (x1(m), x2(m), ..., xn(m), ym).

    The output is the coefficient vector θ of the separating hyperplane model.

    The steps of the algorithm are as follows:

    (1) Define x0 = 1 for all samples, so that the bias is absorbed into θ. Choose an initial value for the vector θ and for the step size α; for example, θ can be set to the zero vector and the step size to 1. Note that since the perceptron solution is not unique, both of these initial values affect the final iterated θ.

    (2) Select a misclassified point (x1(i), x2(i), ..., xn(i), yi) from the training set, written as a vector (x(i), y(i)); such a point satisfies y(i) θ ∙ x(i) ≤ 0.

    (3) Update the vector θ with a stochastic gradient descent step: θ = θ + α y(i) x(i).

    (4) Check whether the training set still contains misclassified points. If not, the algorithm ends and the current θ is the final result; if it does, go back to step (2). A plain-MATLAB sketch of this loop is given below.
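The following is a minimal sketch of steps (1)-(4) in plain MATLAB, without the Neural Network Toolbox. The four samples and their labels are made up and assumed to be linearly separable (otherwise the loop would not terminate):

% perceptron training loop, steps (1)-(4)
X = [2 1; 1 3; -1 -2; -2 -1];       % toy samples, one per row (made-up data)
y = [1; 1; -1; -1];                 % labels in {-1, +1}
m = size(X, 1);
Xa = [ones(m, 1) X];                % step (1): prepend x0 = 1 so the bias is part of theta
theta = zeros(size(Xa, 2), 1);      % initial theta = zero vector
alpha = 1;                          % step size
done = false;
while ~done
    done = true;
    for i = 1:m
        if y(i) * (Xa(i, :) * theta) <= 0              % step (2): misclassified point
            theta = theta + alpha * y(i) * Xa(i, :)';  % step (3): gradient descent update
            done = false;                              % step (4): another pass is needed
        end
    end
end
theta                                % coefficients [b; w1; w2] of the separating hyperplane

On this toy data the loop converges in a few passes, and the resulting theta defines the separating line b + w1*x1 + w2*x2 = 0.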

 

Fourth, Perceptron code implementation (MATLAB Neural Network Toolbox)

net=newp([0 2],1);
inputweights=net.inputweights{1,1};
biases=net.biases{1};

net=newp([-2 2;-2 2],1);
net.IW{1,1}=[-1 1];
net.IW{1,1}
net.b{1}=1;
net.b{1}
p1=[1;1],a1=sim(net,p1)
p2=[1;-1],a2=sim(net,p2)
p3={[1;1] [1;-1]},a3=sim(net,p3)
p4=[1 1;1 -1],a4=sim(net,p4)
net.IW{1,1}=[3,4];
net.b{1}=[1];
a1=sim(net,p1)

net=init(net);
wts=net.IW{1,1}
bias=net.b{1}
net.inputweights{1,1}.initFcn='rands';
net.biases{1}.initFcn='rands';
net=init(net);
bias=net.b{1}
wts=net.IW{1,1}
a1=sim(net,p1)

net=newp([-2 2;-2 2],1);
net.b{1}=[0];
w=[1 -0.8]
net.IW{1,1}=w;
p=[1;2];
t=[1];
a=sim(net,p)
e=t-a
help learnp
dw=learnp(w,p,[],[],[],[],e,[],[],[],[],[])
w=w+dw
net.IW{1,1}=w;
a=sim(net,p)


P=[-0.5 1 0.5 -0.1;-0.5 1 -0.5 1];
T=[1 1 0 1]
net=newp([-1 1;-1 1],1);
plotpv(P,T);
plotpc(net.IW{1,1},net.b{1});
%hold on;
%plotpv(P,T);
net=adapt(net,P,T);
net.IW{1,1}
net.b{1}
plotpv(P,T);
plotpc(net.IW{1,1},net.b{1})
net.adaptParam.passes=3;
net=adapt(net,P,T);
net.IW{1,1}
net.b{1}
plotpc(net.IW{1},net.b{1})
net.adaptParam.passes=6;
net=adapt(net,P,T)
net.IW{1,1}
net.b{1}
plotpv(P,T);
plotpc(net.IW{1},net.b{1})

plotpc(net.IW{1},net.b{1})
a=sim(net,p);
plotpv(p,a)

p=[0.7;1.2]
a=sim(net,p);
plotpv(p,a);
hold on;
plotpv(P,T);
plotpc(net.IW{1},net.b{1})

P=[-0.5 -0.5 0.3 -0.1 -40;-0.5 0.5 -0.5 1.0 50]
T=[1 1 0 0 1];
net=newp([-40 1;-1 50],1);
plotpv(P,T);
hold on;
linehandle=plotpc(net.IW{1},net.b{1});
E=1;
net.adaptParam.passes=3;
while (sse(E))
    [net,Y,E]=adapt(net,P,T);
    linehandle=plotpc(net.IW{1},net.b{1},linehandle);
    drawnow;
end;
axis([-2 2 -2 2]);
net.IW{1}
net.b{1}
net=init(net);
net.adaptParam.passes=3;
net=adapt(net,P,T);
plotpc(net.IW{1},net.b{1});
axis([-2 2 -2 2]);
net.IW{1}
net.b{1}

net=newp([-40 1;-1 50],1,'hardlim','learnpn');
plotpv(P,T);
linehandle=plotpc(net.IW{1},net.b{1});
e=1;
net.adaptParam.passes=3;
net=init(net);
linehandle=plotpc(net.IW{1},net.b{1});
while (sse(e))
    [net,Y,e]=adapt(net,P,T);
    linehandle=plotpc(net.IW{1},net.b{1},linehandle);
end;
axis([-2 2 -2 2]);
net.IW{1}
net.b{1}

net=newp([-40 1;-1 50],1);
net.trainParam.epochs=30;
net=train(net,P,T);
pause;
linehandle=plotpc(net.IW{1},net.b{1});
hold on;
plotpv(P,T);
linehandle=plotpc(net.IW{1},net.b{1});
axis([-2 2 -2 2]);

p=[1.0 1.2 2.0 -0.8; 2.0 0.9 -0.5 0.7]
t=[1 1 0 1;0 1 1 0]
plotpv(p,t);
hold on;
net=newp([-0.8 1.2; -0.5 2.0],2);
linehandle=plotpc(net.IW{1},net.b{1});
net=newp([-0.8 1.2; -0.5 2.0],2);
linehandle=plotpc(net.IW{1},net.b{1});
e=1;
net=init(net);
while (sse(e))
    [net,y,e]=adapt(net,p,t);
    linehandle=plotpc(net.IW{1},net.b{1},linehandle);
    drawnow;
end;

 

MATLAB run results:

Figure 1

 

Fifth, BP Neural Network

  (1) The basic idea

The BP neural network, also called the back-propagation feedforward neural network, is a typical neural network. "Back propagation" describes the learning algorithm used in training, which is a supervised learning process; "feedforward" describes the network architecture, and Figure 2 shows a typical feedforward neural network. This network has a clear structure, is easy to use, and is very efficient, so it has been widely appreciated and applied. The back-propagation algorithm iteratively adjusts the connection weights between neurons so that the error between the final output and the expected result is minimized. It is widely used in classification systems and involves two stages: training and use.

Figure 2

 

(2) algorithmic process

The flowchart and pseudocode for the training phase of the BP neural network algorithm are shown below (a plain-MATLAB sketch of the same steps follows the list):

Figure 3

Step 1. Initialize the network weights.

Step 2. Propagate the input forward (feedforward network).

Step 3. Back-propagate the error.

Step 4. Adjust the network weights and neuron biases.

Step 5. Check the termination condition.
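As referenced above, here is a minimal plain-MATLAB sketch of steps 1-5 for a small 2-3-1 network. It assumes logistic (sigmoid) activations in both layers, a squared-error loss, and a made-up XOR-style data set; whether it reaches the error goal within 5000 epochs depends on the random initialization. It is an illustration of the algorithm, not the toolbox code used below:

% minimal back-propagation sketch for a 2-3-1 network (toy data)
X = [0 0; 0 1; 1 0; 1 1]';        % inputs, one sample per column (made-up data)
T = [0 1 1 0];                    % targets
sigm = @(z) 1 ./ (1 + exp(-z));   % logistic activation
% Step 1: initialize weights and biases with small random values
W1 = rand(3, 2) - 0.5;  b1 = rand(3, 1) - 0.5;   % input -> hidden
W2 = rand(1, 3) - 0.5;  b2 = rand(1, 1) - 0.5;   % hidden -> output
eta = 0.5;                                        % learning rate
for epoch = 1:5000
    for i = 1:size(X, 2)
        x = X(:, i);  t = T(i);
        % Step 2: forward propagation
        h = sigm(W1 * x + b1);
        o = sigm(W2 * h + b2);
        % Step 3: back-propagate the error (deltas of the squared-error loss)
        d2 = (o - t) .* o .* (1 - o);
        d1 = (W2' * d2) .* h .* (1 - h);
        % Step 4: adjust weights and biases
        W2 = W2 - eta * d2 * h';   b2 = b2 - eta * d2;
        W1 = W1 - eta * d1 * x';   b1 = b1 - eta * d1;
    end
    % Step 5: stop when the total squared error is small enough
    O = sigm(W2 * sigm(W1 * X + b1 * ones(1, size(X, 2))) + b2);
    if sum((O - T).^2) < 1e-3, break; end
end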

 

(3) BP neural network code implementation

% BP network
net=newff([-1 2;0 5],[3,1],{'tansig','purelin'},'traingd')
net.IW{1}
net.b{1}

p=[1;2];
a=sim(net,p)
net=init(net);
net.IW{1}
net.b{1}
a=sim(net,p)
%net.IW{1}*p+net.b{1}
p2=net.IW{1}*p+net.b{1}
a2=sign(p2)
a3=tansig(a2)
a4=purelin(a3)
net.b{2}
net.b{1}

net.IW{1}
net.IW{2}
0.7616+net.b{2}
a-net.b{2}
(a-net.b{2})/ 0.7616
help purelin

p1=[0;0];
a5=sim(net,p1)
net.b{2}
net=newff([-1 2;0 5],[3,1],{'tansig','purelin'},'traingd')
net.IW{1}
net.b{1}
%p=[1;];
p=[1;2];
a=sim(net,p)
net=init(net);
net.IW{1}
net.b{1}
a=sim(net,p)
net.IW{1}*p+net.b{1}
p2=net.IW{1}*p+net.b{1}
a2=sign(p2)
a3=tansig(a2)
a4=purelin(a3)
net.b{2}
net.b{1}

P=[1.2;3;0.5;1.6]
W=[0.3 0.6 0.1 0.8]
net1=newp([0 2;0 2;0 2;0 2],1,'purelin');
net2=newp([0 2;0 2;0 2;0 2],1,'logsig');
net3=newp([0 2;0 2;0 2;0 2],1,'tansig');
net4=newp([0 2;0 2;0 2;0 2],1,'hardlim');

net1.IW{1}
net2.IW{1}
net3.IW{1}
net4.IW{1}
net1.b{1}
net2.b{1}
net3.b{1}
net4.b{1}
net1.IW{1}=W;
net2.IW{1}=W;
net3.IW{1}=W;
net4.IW{1}=W;
a1=sim(net1,P)
a2=sim(net2,P)
a3=sim(net3,P)
a4=sim(net4,P)
init(net1);
net1.b{1}
help tansig
p=[-0.1 0.5]
t=[-0.3 0.4]
w_range=-2:0.4:2;
b_range=-2:0.4:2;

ES=errsurf(p,t,w_range,b_range,'logsig');
pause(0.5);
hold off;
net=newp([-2,2],1,'logsig');
net.trainparam.epochs=100;
net.trainparam.goal=0.001;
figure(2);
[net,tr]=train(net,p,t);
title('Dynamic approximation')
weight=net.iw{1}
bias=net.b
pause;
close;
p=[-0.2 0.2 0.3 0.4]
t=[-0.9 -0.2 1.2 2.0]
h1=figure(1);
net=newff([-2,2],[5,1],{'tansig','purelin'},'trainlm');
net.trainparam.epochs=100;
net.trainparam.goal=0.0001;
net=train(net,p,t);
a1=sim(net,p)
pause;
h2=figure(2);
plot(p,t,'*');
title('Samples');
xlabel('Input');
ylabel('Output');
pause;
hold on;
ptest1=[0.2 0.1]
ptest2=[0.2 0.1 0.9]
a1=sim(net,ptest1);
a2=sim(net,ptest2);

net.iw{1}
net.LW{2,1}   % layer weights from layer 1 to layer 2 (net.iw{2} would be empty)
net.b{1}
net.b{2}

 

MATLAB run results:

Figure 4

 


Source: www.cnblogs.com/twzh123456/p/11611878.html