Manual implementation of multi-layer perceptron (BPNet) based on weka

1. BP network

1.1 Single layer perceptron

A single-layer perceptron has only one layer of neurons, and its model structure is as follows [1]:

(figure: single-layer perceptron structure)

To update the weight $w_i$, we use the following formula:

$$w_i = w_i + \Delta w_i,\qquad \Delta w_i=\eta\,(y-\hat{y})\,x_i \tag{1}$$

where $y$ is the label, $\hat{y}$ is the model output, $\eta$ is the learning rate, and $x_i$ is the $i$-th input to the model.

In this way, we obtain the parameter update rule for a single-layer perceptron.
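As a minimal sketch, update rule (1) can be written in plain Java as follows (the class and method names here are illustrative and independent of the Weka implementation in Section 2):

class PerceptronSketch {
    // Step-activated output y^ for the current weights
    static double predict(double[] w, double[] x) {
        double sum = 0;
        for (int i = 0; i < w.length; i++) {
            sum += w[i] * x[i];
        }
        return sum >= 0 ? 1.0 : 0.0;
    }

    // One application of rule (1): w_i = w_i + eta * (y - yHat) * x_i
    static void update(double[] w, double eta, double y, double yHat, double[] x) {
        for (int i = 0; i < w.length; i++) {
            w[i] += eta * (y - yHat) * x[i];
        }
    }
}

Looping update over the training examples until every one is classified correctly is the classic perceptron training procedure.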

However, since the single-layer perceptron is linear, its fitting capacity is very limited, and it cannot handle problems that are not linearly separable (such as XOR).

1.2 Multilayer Perceptron

For a multilayer perceptron, parameter adjustment is more complicated and requires the error backpropagation algorithm. The network structure is shown in the figure below; here we consider only a network with a single hidden layer:

(figure: single-hidden-layer network; $v_{ih}$ are the input-to-hidden weights, $w_{hj}$ the hidden-to-output weights, $\gamma_h$ the hidden-layer thresholds, and $\theta_j$ the output-layer thresholds)

The specific parameters of the network are shown in the figure above.

Its error on the $k$-th sample is:

$$E_k=\frac{1}{2}\sum_{j=1}^{l}\left(\hat{y}_j^k-y_j^k\right)^2$$

Any parameter $v$ is updated as:

$$v = v+\Delta v$$

For the parameter $w_{hj}$:

$$\Delta w_{hj}=-\eta\,\frac{\partial E_k}{\partial w_{hj}}$$

$$\frac{\partial E_k}{\partial w_{hj}}=\frac{\partial E_k}{\partial \hat{y}_j^k}\cdot\frac{\partial \hat{y}_j^k}{\partial \beta_j}\cdot\frac{\partial \beta_j}{\partial w_{hj}}$$

$\beta_j$ is the input to the $j$-th output neuron; by its definition, $\beta_j=\sum_{h=1}^{q}w_{hj}\,b_h$, so

$$\frac{\partial \beta_j}{\partial w_{hj}}=b_h$$

where $b_h$ is the output of the $h$-th hidden neuron.

Because the output of each layer passes through an activation function, and here we use the sigmoid function, which has a very convenient property:

$$f'(x)=f(x)\bigl(1-f(x)\bigr)$$
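This can be verified directly from $f(x)=\frac{1}{1+e^{-x}}$:

$$f'(x)=\frac{e^{-x}}{(1+e^{-x})^2}=\frac{1}{1+e^{-x}}\cdot\left(1-\frac{1}{1+e^{-x}}\right)=f(x)\bigl(1-f(x)\bigr)$$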

So we have:

$$\begin{aligned} g_j &= -\frac{\partial E_k}{\partial \hat{y}_j^k}\cdot\frac{\partial \hat{y}_j^k}{\partial \beta_j} \\ &= -(\hat{y}_j^k-y_j^k)\,f'(\beta_j-\theta_j) \\ &= \hat{y}_j^k\,(1-\hat{y}_j^k)\,(y_j^k-\hat{y}_j^k) \end{aligned}$$

where $\theta_j$ is the threshold (bias) of the $j$-th output neuron.

In this way, we get the update amount for $w_{hj}$ (note: no negative sign is needed here, because it has already been absorbed into $g_j$):

$$\Delta w_{hj}=\eta\,g_j\,b_h$$

The remaining parameters are updated in the same way:

$$\Delta \theta_j=-\eta\,g_j,\qquad \Delta v_{ih}=\eta\,e_h\,x_i,\qquad \Delta \gamma_h=-\eta\,e_h$$
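Here $e_h$ is the error term of the $h$-th hidden neuron. Applying the same chain rule one layer further back, with $\alpha_h=\sum_{i=1}^{d}v_{ih}\,x_i$ denoting the input to hidden neuron $h$ and $b_h=f(\alpha_h-\gamma_h)$ its output, gives:

$$e_h=-\frac{\partial E_k}{\partial b_h}\cdot\frac{\partial b_h}{\partial \alpha_h}=b_h\,(1-b_h)\sum_{j=1}^{l}w_{hj}\,g_j$$

which is exactly what the hidden-layer error computation in the code below implements.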

In the BP algorithm, assuming we have only a three-layer network (one input layer, one hidden layer, and one output layer), the number of parameters is $d\times q+q\times l+q+l$, where $d$, $q$, and $l$ are the numbers of neurons in the input, hidden, and output layers respectively. We can iteratively update all of them according to the formulas above.

How should the number of hidden-layer neurons be set for different input data? I used an empirical formula here, following a reference on the empirical formula for the number of BP hidden-layer nodes, which suggests:

$$q=(d+l)\times\frac{2}{3}$$
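For example, on a dataset with $d=4$ binary input attributes and $l=3$ class values (the numbers are illustrative), integer arithmetic as used in the code below gives:

$$q=(4+3)\times\frac{2}{3}=4,\qquad d\times q+q\times l+q+l=16+12+4+3=35$$

so the network would have 4 hidden neurons and 35 trainable parameters.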


2. Implementation based on Weka

The full classifier, following the derivation above, is given below.

package weka.classifiers.myf;

import weka.classifiers.Classifier;
import weka.core.*;
import weka.core.matrix.Matrix;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.NominalToBinary;

/**
 * @author YFMan
 * @Description Custom BP neural network classifier
 * @Date 2023/6/7 11:30
 */
public class myBPNet extends Classifier {

    /*
     * @Author YFMan
     * @Description Main function
     * @Date 2023/6/7 19:14
     * @Param [argv command-line arguments]
     * @return void
     **/
    public static void main(String[] argv) {
        runClassifier(new myBPNet(), argv);
    }

    // Training data
    private Instances m_instances;

    // The instance currently being trained on
    private Instance m_currentInstance;

    // Total number of training epochs
    private final int m_numEpochs;

    // Filter that converts nominal attributes to binary ones, so they can be fed into the network
    private final NominalToBinary m_nominalToBinaryFilter;

    // Learning rate
    private final double m_learningRate;

    // Current training error
    private double m_error;

    // Input dimension (number of input-layer neurons)
    private int m_numInputs;

    // Hidden dimension (number of hidden-layer neurons)
    private int m_numHidden;

    // Output dimension (number of output-layer neurons)
    private int m_numOutputs;

    // Output of the input layer
    private Matrix m_input;

    // Weight matrix from the input layer to the hidden layer
    private Matrix m_weightsInputHidden;

    // Weight matrix from the hidden layer to the output layer
    private Matrix m_weightsHiddenOutput;

    // Hidden-layer thresholds
    private Matrix m_hiddenThresholds;

    // Output-layer thresholds
    private Matrix m_outputThresholds;

    // Output of the hidden layer
    private Matrix m_hiddenOutput;

    // Output of the output layer
    private Matrix m_output;

    // Labels corresponding to the output layer
    private Matrix m_labels;

    /*
     * @Author YFMan
     * @Description BP network constructor
     * @Date 2023/6/7 19:04
     * @Param []
     * @return
     **/
    public myBPNet() {
        // Training data
        m_instances = null;
        // Current instance
        m_currentInstance = null;
        // Training error
        m_error = 0;
        // Filter that converts nominal attributes to binary ones
        m_nominalToBinaryFilter = new NominalToBinary();
        // Total number of training epochs
        m_numEpochs = 500;
        // Learning rate
        m_learningRate = .3;
    }

    /*
     * @Author YFMan
     * @Description Sigmoid function
     * @Date 2023/6/7 19:53
     * @Param [x]
     * @return double
     **/
    double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /*
     * @Author YFMan
     * @Description Forward propagation
     * @Date 2023/6/7 19:47
     * @Param []
     * @return void
     **/
    void forwardPropagation() {
        // Initialize the output of the input layer
        for (int i = 0; i < m_numInputs; i++) {
            m_input.set(0, i, m_currentInstance.value(i));
        }

        // Compute the hidden-layer output
        m_hiddenOutput = m_input.times(m_weightsInputHidden);
        // Subtract the thresholds
        m_hiddenOutput = m_hiddenOutput.minus(m_hiddenThresholds);
        // Apply the activation function
        for (int i = 0; i < m_numHidden; i++) {
            m_hiddenOutput.set(0, i, sigmoid(m_hiddenOutput.get(0, i)));
        }

        // Compute the output-layer output
        m_output = m_hiddenOutput.times(m_weightsHiddenOutput);
        // Subtract the thresholds
        m_output = m_output.minus(m_outputThresholds);
        // Apply the activation function
        for (int i = 0; i < m_numOutputs; i++) {
            m_output.set(0, i, sigmoid(m_output.get(0, i)));
        }
    }

    /*
     * @Author YFMan
     * @Description Backpropagation
     * @Date 2023/6/7 20:20
     * @Param []
     * @return void
     **/
    void backPropagation() {
        // Get the label of the current instance
        m_labels = new Matrix(1, m_numOutputs);
        for (int i = 0; i < m_numOutputs; i++) {
            // Convert the label into a matrix (the trailing binary attributes encode the class label)
            m_labels.set(0, i, m_currentInstance.value(m_numInputs + i));
        }

        // Compute g = m_output * (1 - m_output) * (m_labels - m_output)
        // Output-layer error term
        Matrix m_outputError = m_output.copy();
        m_outputError = m_outputError.times(-1);
        // All-ones matrix used to form 1 - m_output
        Matrix one = new Matrix(1, m_numOutputs, 1);
        m_outputError = m_outputError.plus(one);
        m_outputError = m_outputError.arrayTimes(m_output);
        m_outputError = m_outputError.arrayTimes(m_labels.minus(m_output));

        // Increment of the hidden-to-output weight matrix: deltaWeightsHiddenOutput = m_learningRate * g * b_h
        Matrix m_deltaWeightsHiddenOutput = new Matrix(m_numHidden, m_numOutputs);
        for (int h = 0; h < m_numHidden; h++) {
            for (int j = 0; j < m_numOutputs; j++) {
                m_deltaWeightsHiddenOutput.set(h, j, m_learningRate * m_outputError.get(0, j) * m_hiddenOutput.get(0, h));
            }
        }

        // Increment of the output-layer thresholds: deltaOutputThresholds = - m_learningRate * g
        Matrix m_deltaOutputThresholds = m_outputError.times(-m_learningRate);

        // Hidden-layer error term: hiddenError = b_h * (1 - b_h) * (g * m_weightsHiddenOutput.transpose())
        Matrix one1 = new Matrix(1, m_numHidden, 1);
        Matrix m_hiddenError = m_hiddenOutput.copy();
        m_hiddenError = m_hiddenError.times(-1);
        m_hiddenError = m_hiddenError.plus(one1);
        m_hiddenError = m_hiddenError.arrayTimes(m_hiddenOutput);
        m_hiddenError = m_hiddenError.arrayTimes(m_outputError.times(m_weightsHiddenOutput.transpose()));

        // Increment of the input-to-hidden weight matrix: deltaWeightsInputHidden = m_learningRate * hiddenError * x_i
        Matrix m_deltaWeightsInputHidden = new Matrix(m_numInputs, m_numHidden);
        for (int i = 0; i < m_numInputs; i++) {
            for (int h = 0; h < m_numHidden; h++) {
                m_deltaWeightsInputHidden.set(i, h, m_learningRate * m_hiddenError.get(0, h) * m_input.get(0, i));
            }
        }

        // Increment of the hidden-layer thresholds: deltaHiddenThresholds = - m_learningRate * hiddenError
        Matrix m_deltaHiddenThresholds = m_hiddenError.times(-m_learningRate);

        // Update the hidden-to-output weight matrix
        m_weightsHiddenOutput = m_weightsHiddenOutput.plus(m_deltaWeightsHiddenOutput);
        // Update the output-layer thresholds
        m_outputThresholds = m_outputThresholds.plus(m_deltaOutputThresholds);

        // Update the input-to-hidden weight matrix
        m_weightsInputHidden = m_weightsInputHidden.plus(m_deltaWeightsInputHidden);
        // Update the hidden-layer thresholds
        m_hiddenThresholds = m_hiddenThresholds.plus(m_deltaHiddenThresholds);

        // Accumulate the training error
        m_error += calculateError();
    }


    /*
     * @Author YFMan
     * @Description Compute the training error
     * @Date 2023/6/7 21:04
     * @Param []
     * @return double
     **/
    double calculateError() {
        // Squared error E_k = 1/2 * sum_j (y_j - yHat_j)^2
        double error = 0;
        for (int i = 0; i < m_numOutputs; i++) {
            error += Math.pow(m_labels.get(0, i) - m_output.get(0, i), 2);
        }
        error /= 2;
        return error;
    }

    /*
     * @Author YFMan
     * @Description Train the BP network
     * @Date 2023/6/7 19:44
     * @Param []
     * @return void
     **/
    void train() {
        // Train for at most m_numEpochs epochs
        for (int i = 0; i < m_numEpochs; i++) {
            // Reset the training error
            m_error = 0;
            // Iterate over the training data
            for (int j = 0; j < m_instances.numInstances(); j++) {
                // Get the current training instance
                m_currentInstance = m_instances.instance(j);
                // Forward propagation
                forwardPropagation();
                // Backpropagation
                backPropagation();
            }
            // Print the training error
            System.out.println("Epoch " + (i + 1) + " error: " + m_error);
            // Stop training once the error drops below 0.01
            if (m_error < 0.01) {
                break;
            }
        }
    }

    /*
     * @Author YFMan
     * @Description Train the BP network on the given training data
     * @Date 2023/6/7 19:12
     * @Param [instances training data]
     * @return void
     **/
    public void buildClassifier(Instances instances) throws Exception {
        // Training data
        m_instances = instances;
        // Convert nominal attributes to binary ones
        m_nominalToBinaryFilter.setInputFormat(m_instances);
        m_instances = Filter.useFilter(m_instances, m_nominalToBinaryFilter);
        m_numInputs = 0;
        // Iterate over the attributes to determine the input and output dimensions
        for (int i = 0; i < instances.numAttributes(); i++) {
            if (i != instances.classIndex()) {
                // Input dimension: an attribute with at most two values contributes one input;
                // a nominal attribute with k > 2 values is binarized into k inputs
                if (instances.attribute(i).numValues() <= 2) {
                    m_numInputs += 1;
                } else {
                    m_numInputs += instances.attribute(i).numValues();
                }
            } else {
                // Output dimension: one output neuron for a binary class,
                // otherwise one neuron per class value
                if (instances.attribute(i).numValues() <= 2) {
                    m_numOutputs = 1;
                } else {
                    m_numOutputs = instances.attribute(i).numValues();
                }
            }
        }
        // Hidden dimension: (input dimension + output dimension) * 2 / 3
        m_numHidden = (m_numInputs + m_numOutputs) * 2 / 3;
        // Initialize the input layer
        m_input = new Matrix(1, m_numInputs);

        // Initialize the weight matrices with random values
        m_weightsInputHidden = Matrix.random(m_numInputs, m_numHidden);
        m_weightsHiddenOutput = Matrix.random(m_numHidden, m_numOutputs);

        // Initialize the hidden-layer threshold vector
        m_hiddenThresholds = new Matrix(1, m_numHidden);
        // Initialize the output-layer threshold vector
        m_outputThresholds = new Matrix(1, m_numOutputs);
        // Training error
        m_error = 0;
        // Train
        train();
    }

    /*
     * @Author YFMan
     * @Description Classify an instance, returning the probability of each class
     * @Date 2023/6/7 19:11
     * @Param [instance the input instance]
     * @return double[] probability of each class
     **/
    public double[] distributionForInstance(Instance instance) throws Exception {
        // Convert the nominal attributes of the input instance to binary ones
        m_nominalToBinaryFilter.input(instance);
        m_currentInstance = m_nominalToBinaryFilter.output();

        // Forward propagation
        forwardPropagation();
        // If the output dimension is 1, return the two class probabilities directly
        double[] result;
        if (m_numOutputs == 1) {
            result = new double[2];
            result[0] = 1 - m_output.get(0, 0);
            result[1] = m_output.get(0, 0);
        } else {
            // Otherwise, normalize the outputs so they sum to 1
            result = m_output.getRowPackedCopy();
            double sum = 0;
            for (double v : result) {
                sum += v;
            }
            for (int i = 0; i < result.length; i++) {
                result[i] /= sum;
            }
        }
        return result;
    }
}
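To try the classifier, you can use Weka's standard evaluation harness. Below is a minimal sketch, assuming the class is on the classpath and a dataset file such as data/iris.arff exists (the path is illustrative); it uses the Weka 3.6-era API that the classifier above is written against:

import java.io.FileReader;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.myf.myBPNet;
import weka.core.Instances;

public class MyBPNetDemo {
    public static void main(String[] args) throws Exception {
        // Load a dataset; the path is illustrative
        Instances data = new Instances(new FileReader("data/iris.arff"));
        // Assume the class attribute is the last one
        data.setClassIndex(data.numAttributes() - 1);

        // 10-fold cross-validation of the custom classifier
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new myBPNet(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}

Since main already delegates to runClassifier, the same evaluation can also be run from the command line with Weka's standard -t option, e.g. java weka.classifiers.myf.myBPNet -t data/iris.arff.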

[1] Zhou Zhihua, "Machine Learning".

Origin: blog.csdn.net/myf_666/article/details/131253903