Neural network case analysis

A neural network is a data model that simulates the way the human brain thinks. There are many types of neural networks, including BP neural networks, convolutional neural networks, and the multi-layer perceptron (MLP). The most classic is the multi-layer perceptron (MLP, Multi-Layer Perceptron), which SPSSAU uses by default. As with other machine learning models (such as decision trees, random forests, and support vector machines, SVM), when building a neural network model the data is first divided into a training set and a test set: the training set is used to train the model, and the test set is used to evaluate how good the model is. A trained neural network model can then be used for feature importance identification, data prediction, or deployment in engineering applications.

Neural network case

1  Background

This part uses the 'iris classification data set' for the case demonstration. It has 150 samples in total, including 4 feature attributes (4 independent variables X) and a label (dependent variable Y) for the iris species, covering three categories: Iris setosa, Iris versicolor, and Iris virginica (hereinafter referred to as categories A, B, and C).

2  Theory

The principle of a neural network can be seen in the figure below.

In principle, the feature items X (the independent variables) are first fed into the model. The neural network then uses an activation function to construct 'pseudo feature items' from them, that is, feature items that do not actually exist in the data, are constructed entirely by the model, and have no direct interpretation. As for the specific construction: with a linear activation function, for example, it can be understood intuitively as a function like y = 1 + 2*x1 + 3*x2 + 4*x3 + …. The construction of 'pseudo feature items' can have multiple levels (that is, there can be multiple layers of 'hidden layer neurons'; the default is 1 layer), and each layer can have multiple neurons (the default is 100). Finally, a mathematical optimization algorithm computes the weights and produces the output, that is, the prediction.
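To make the 'pseudo feature item' idea concrete, here is a minimal sketch in Python (using NumPy; the weights are random placeholders rather than trained values, and the layer sizes follow this case's defaults) of how one hidden layer transforms the 4 input features into 100 constructed features:

import numpy as np

def relu(z):
    return np.maximum(0, z)  # the relu activation function

rng = np.random.default_rng(0)
x = rng.normal(size=4)           # 4 input feature items, as in the iris case
W1 = rng.normal(size=(100, 4))   # one hidden layer with 100 neurons (the default)
b1 = rng.normal(size=100)
h1 = relu(W1 @ x + b1)           # 100 'pseudo feature items'
W2 = rng.normal(size=(3, 100))   # output layer: 3 iris categories
b2 = rng.normal(size=3)
scores = W2 @ h1 + b2            # raw outputs; a softmax would turn these into class probabilities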

In layman's terms, the model takes in the feature items X and outputs predictions of the label Y. Based on the principle of the neural network, the following parameters are involved:

The activation function is the mathematical function through which each neuron's output is obtained from its inputs; it is usually nonlinear, and relu is the most commonly used. There are three weight optimization methods available, namely lbfgs, sgd, and adam; adam, a stochastic-gradient-based method, is used by default, and the weight optimization method is what computes the optimal weight values. The 'L2 regularization penalty coefficient' is used to prevent over-fitting: the smaller the value, the more closely the model can fit the training data, but the higher the risk of 'over-fitting' (that is, the model fits the training data well but fits the test data poorly); larger values impose a stronger penalty and guard against over-fitting. The maximum number of iterations and the optimization tolerance are the internal criteria for ending the algorithm.
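For reference, these parameters map directly onto arguments of sklearn's MLPClassifier (a sketch; the values shown are sklearn's own defaults, not necessarily SPSSAU's):

from sklearn.neural_network import MLPClassifier

model = MLPClassifier(
    activation='relu',  # activation function: 'identity', 'logistic', 'tanh', or 'relu'
    solver='adam',      # weight optimization method: 'lbfgs', 'sgd', or 'adam'
    alpha=1e-4,         # L2 regularization penalty coefficient; larger = stronger penalty
    max_iter=200,       # maximum number of iterations
    tol=1e-4,           # optimization tolerance for ending the algorithm
)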

Regarding the hidden layer neurons: the more layers there are, the more complex the model and the longer the calculation time, but in theory the better the model can fit. SPSSAU defaults to one layer. As for the number of neurons per layer (i.e., the number of 'pseudo feature items'), the larger this value, the easier it is to obtain a better model fit, but the more likely 'over-fitting' becomes. Generally speaking, it is recommended that the number of neurons be no more than 2 times the number of feature items; for example, in this case there are only 4 feature items, so the number of neurons would be set to at most 8. The more hidden layers and the more neurons per layer, the more complex the model becomes and the longer the calculation takes. When the number of feature items is large, it is recommended to make a comprehensive trade-off by reducing the number of layers and the number of neurons (SPSSAU defaults to one layer with 100 neurons).
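In sklearn terms, both the number of hidden layers and the number of neurons per layer are controlled by the single hidden_layer_sizes tuple; a sketch following the 'at most 2 times the feature count' suggestion above:

from sklearn.neural_network import MLPClassifier

# One hidden layer of 8 neurons (2 x 4 feature items, per the guideline above)
model_small = MLPClassifier(hidden_layer_sizes=(8,))

# Three hidden layers of 100 neurons each: a more complex, slower-to-train model
model_deep = MLPClassifier(hidden_layer_sizes=(100, 100, 100))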

In addition, when the weight optimization method is sgd or adam, the following three parameter values may be involved (they are not used when the weight optimization method is lbfgs, a quasi-Newton method), as follows:

The initial learning rate is the step size for moving toward the optimal solution during the internal iteration process. The larger the value, the faster the calculation, but the easier it is to skip past the optimal solution; the smaller the value, the slower the calculation, but the more likely it is to find the optimal solution. In addition, the learning rate itself can be adjusted during training; there are three schedule options, and the constant method is used by default.
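As a sketch of how these settings look in sklearn (note that in sklearn the learning-rate schedule is only consulted by the sgd solver, while the initial learning rate applies to both sgd and adam):

from sklearn.neural_network import MLPClassifier

model = MLPClassifier(
    solver='sgd',              # the schedule below is used by the sgd solver
    learning_rate_init=1e-3,   # initial learning rate (step size); sklearn's default
    learning_rate='constant',  # 'constant' (default), 'invscaling', or 'adaptive'
)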

Batch size refers to the number of data points used for each training step in the internal mathematical algorithm. For example, if there are 1,000 training samples and the batch size is set to 100, the internal algorithm of the neural network will first train on 100 of the samples, then on the next 100, and so on until the training data is used up. The smaller this value, the less machine memory is used, but usually the longer the neural network takes to run. The default value is the smaller of 200 and the number of training samples. If the training data is small, it is recommended to set this parameter to a smaller value yourself; for example, when there are only 100 training samples, a value between 2 and 20 is suggested. However, a batch size that is too small can cause problems such as overly slow convergence, so in actual use it is recommended to try several different batch size values and compare, as in the sketch below.
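A hedged sketch of such a comparison on the iris data (the candidate values and random_state are illustrative assumptions, not recommendations):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.8, random_state=0)

# Try several batch sizes and compare test-set accuracy
for bs in (2, 10, 20, 50):
    model = MLPClassifier(batch_size=bs, max_iter=500, random_state=0)
    model.fit(x_train, y_train)
    print(bs, model.score(x_test, y_test))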

3  Operations

This example operates as follows:

The training set proportion keeps its default of 0.8, i.e., 80% (150*0.8 = 120 samples) for training the neural network, and the remaining 20% (30 samples of test data) are used for model verification. To keep the data dimensions uniform, the 'normal standardization' (z-score) method was selected. Note, however, that this case has only 150 samples, of which only 120 are used for training, which is very few, so certain parameter values need to be set specially.
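A sketch of the equivalent split and standardization with sklearn (the random_state is an arbitrary assumption here; SPSSAU's internal split may differ):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

x, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# 80/20 split: 120 training samples, 30 test samples
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.8, random_state=0)

# z-score standardization, fitted on the training set only
scaler = StandardScaler().fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)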

The first is the batch size. Since 120 samples is very little data for training the model, it is better to set the batch size to 10 (or compare with 20, etc.). Other parameters were left at their defaults at first, but the first model that came out was very poor: the f1-score on the training set was only 0.58, meaning the model was not feasible. Next, consider the important parameter of the hidden layer neurons. Since there are very few data samples and very few feature items, one can consider 'making the model more complex', that is, increasing the number of neuron layers. This time it was set to 3 layers with 100 neurons each. The final evaluation on the training data is good, and the evaluation on the test data is also good, which means there is no 'over-fitting' in the model and the model is usable.
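Continuing from the split above, a sketch of comparing the two configurations described (the exact scores, such as 0.58, depend on SPSSAU's internal settings and need not reproduce here):

from sklearn.metrics import f1_score
from sklearn.neural_network import MLPClassifier

# First attempt: batch_size=10, everything else at its default (one hidden layer of 100)
m1 = MLPClassifier(batch_size=10, random_state=0).fit(x_train, y_train)

# Second attempt: three hidden layers of 100 neurons each
m2 = MLPClassifier(batch_size=10, hidden_layer_sizes=(100, 100, 100),
                   random_state=0).fit(x_train, y_train)

for m in (m1, m2):
    print(f1_score(y_train, m.predict(x_train), average='macro'),
          f1_score(y_test, m.predict(x_test), average='macro'))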

The parameter settings for this case are as follows:

4  SPSSAU output results

SPSSAU outputs a total of 5 results: the basic information summary, the model evaluation results for the training and test sets, the confusion matrix for the test set, the model summary table, and the model code, as explained below:

Among the above outputs, the basic information summary shows the classification distribution of the label item (dependent variable Y). The model evaluation results (for both the training set and the test set) are used to judge the fitting effect of the model, especially on the test set. The confusion matrix of the test set data is also provided; the model summary table collects the various parameter values, and the core code for building the neural network model is appended at the end.

5 Text Analysis

Next, the most important model fitting conditions are explained, as shown in the following table:

The above table provides four evaluation indicators for the training set and test set respectively, namely precision, recall, f1-score, and accuracy, as well as averages and sample sizes. The f1-score is 0.97 on the training data, and the test set also maintains a high score of 0.94. The two are close, which means there should be no 'over-fitting' and the model is good.
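These indicators correspond to sklearn's classification_report; a sketch, assuming the fitted model and the train/test split from the sketches above:

from sklearn.metrics import classification_report

# Precision, recall, f1-score and support per class, plus averages
print(classification_report(y_train, model.predict(x_train)))
print(classification_report(y_test, model.predict(x_test)))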

Then look further at the 'confusion matrix' of the test data, which cross-tabulates the model's predictions against the actual categories, as shown below:

In the confusion matrix, the larger the values on the main diagonal, the better, since these are the samples whose predicted category is completely consistent with the true category. In the figure above, only 2 samples of category B are misjudged as category C and the rest are all correct, which means this neural network performs well on the test data.
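Such a matrix can be reproduced with sklearn's confusion_matrix (a sketch under the same assumptions as the earlier sketches):

from sklearn.metrics import confusion_matrix

# Rows are true categories, columns are predicted categories;
# the main diagonal counts the correctly classified samples
print(confusion_matrix(y_test, model.predict(x_test)))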

Finally, SPSSAU outputs the model summary table, which shows the model's parameter settings, along with the core code that uses the sklearn package in Python to construct this neural network, as follows:

from sklearn.neural_network import MLPClassifier

model = MLPClassifier(activation='relu', alpha=1.0E-4, hidden_layer_sizes=(100, 100, 100),
                      learning_rate='constant', learning_rate_init=1.0E-4, batch_size=20,
                      max_iter=200, solver='adam', tol=0.001)
model.fit(x_train, y_train)
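A brief usage sketch of the fitted model (x_train, y_train, x_test, y_test as in the split shown earlier):

y_pred = model.predict(x_test)       # predicted categories for the test set
proba = model.predict_proba(x_test)  # predicted probability of each category
print(model.score(x_test, y_test))   # mean accuracy on the test set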

6  Analysis

The following key points are involved:

  • Is standardization required for neural networks?
    The general recommendation is to standardize, because neural networks involve weight and distance calculations, so the features should be on comparable scales. Normal (z-score) standardization is usually sufficient.
  • Saving predicted values
    When saving predicted values, SPSSAU generates a new title (column) that stores the category predicted by the model. The meaning of its coding is consistent with the coding of the label item (dependent variable Y) in the model.

  • When constructing a neural network in SPSSAU, how should categorical data among the independent variables X be handled? A neural network works with the various feature weight values, and the actual meaning of the original features is already lost after the hidden-layer transformations, so neural networks usually do not pay attention to data types. If it must be processed, it is recommended to convert the categorical data into dummy variables before putting them in (see the pandas sketch after the link below). You can click to view dummy variables.

SPSS Online_SPSSAU_Dummy Variable: spssau.com/front/spssau/helps/otherdocuments/dummy.html
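A minimal sketch of the dummy-variable conversion with pandas (the 'color' feature here is a hypothetical example, not part of the iris data):

import pandas as pd

# One categorical feature converted into 0/1 dummy variables
df = pd.DataFrame({'color': ['red', 'blue', 'red', 'green']})
dummies = pd.get_dummies(df['color'], prefix='color')
print(dummies)  # columns: color_blue, color_green, color_red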

  • What are the criteria for judging whether a neural network model in SPSSAU is acceptable?
    In machine learning, the training data is used to train the model first, and the test data is then used to check the model's performance. The usual criterion is that the model fits well on the training data and also fits well on the test data. Machine learning models are prone to 'over-fitting', that is, deceptively good results, so it is necessary to focus on the fitting effect on the test data. For a single model, the parameters can be varied and tuned; at the same time, multiple machine learning models can be used, such as decision trees, random forests, support vector machines, and neural networks, to compare comprehensively and select the optimal model.
  • More references on neural networks?
    More information about neural networks can be found in the sklearn official manual; see the link below.

1.17. Neural network models (supervised): scikit-learn.org/stable/modules/neural_networks_supervised.html

  • Neural network model parameter settings?

When using a neural network model, parameter settings are very important. It is recommended to customize the batch size (when the weight optimization method is sgd or adam, choose a custom batch size and set the value yourself), and to set the number of hidden layers and the number of neurons in each layer. Increasing the number of hidden layers or the number of neurons per layer helps the model fit, but the calculation takes longer, the model becomes more complex, and 'over-fitting' becomes more likely; under normal circumstances it is recommended that the number of hidden layers be no more than 3. If 'over-fitting' occurs, the L2 regularization penalty coefficient can be set to a larger value. In addition, the initial learning rate can be adjusted to speed up the calculation. See the grid-search sketch below for one way to compare parameter combinations systematically.
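A sketch of such a grid search, assuming x_train and y_train exist as in the earlier split (the candidate values are illustrative, not recommendations):

from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

param_grid = {
    'hidden_layer_sizes': [(8,), (100,), (100, 100, 100)],
    'batch_size': [10, 20, 50],
    'alpha': [1e-4, 1e-2, 1.0],  # larger alpha = stronger L2 penalty against over-fitting
}
search = GridSearchCV(MLPClassifier(max_iter=500, random_state=0),
                      param_grid, scoring='f1_macro')
search.fit(x_train, y_train)
print(search.best_params_, search.best_score_)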
