Deep learning methods to improve model accuracy


Suppose we have collected a dataset, built a neural network, and trained the model, but the final accuracy on the validation and test sets is below 90%, or falls short of what the business expects (which may demand close to 100%).

The following sections list strategies and techniques for improving the model's performance metrics, and in particular its accuracy.


Data

Use more data

The simplest approach is to enlarge the dataset. Low accuracy can also be read as a sign that the model does not generalize and only predicts well on patterns it has seen in the training set. Adding more data makes the samples more diverse; it also helps to add some negative samples.

As for data augmentation, you need to understand what you are doing. Simple operations such as resizing are universally safe. Rotation, however, is clearly unsuitable for images of people: people are not normally upside down, and an aggressive crop can remove the subject entirely. The point of augmentation is to generate additional samples that stay as close to reality as possible. Color transformations are also common, but be careful when a color change would alter the class of the object. Refer to the figures below and choose the augmentation operators that suit your own data.

(Figures: examples of common data augmentation operators.)
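As a minimal sketch, assuming torchvision is available, the pipeline below combines a few augmentations that are usually safe for natural images; the commented-out vertical flip marks the kind of operator that would not suit images of people.

```python
import torchvision.transforms as T

# A conservative augmentation pipeline for natural images (e.g. people).
# Pick operators that match your own data.
train_transform = T.Compose([
    T.Resize((256, 256)),            # resizing is almost always safe
    T.RandomCrop(224),               # random crop that keeps the subject plausible
    T.RandomHorizontalFlip(p=0.5),   # horizontal flip is fine for most scenes
    # T.RandomVerticalFlip(),        # avoid: people are rarely upside down
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.ToTensor(),
])

eval_transform = T.Compose([         # no random operators at evaluation time
    T.Resize((224, 224)),
    T.ToTensor(),
])
```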

To compare the behaviour of different augmentation methods, four experiments were run for 5000 training rounds each: experiment 1 used only image cropping, experiment 2 only image flipping, experiment 3 only image whitening, and experiment 4 applied all three augmentations together. The resulting loss curves, training-set accuracy curves, and validation-set accuracy curves are compared in the figure below.

(Figure: loss, training accuracy, and validation accuracy curves for the four experiments.)

Change image size

When preprocessing images for training and evaluation, it is worth experimenting with the image size.
If the chosen size is too small, the model cannot pick up the salient features that distinguish the classes, because the low resolution blurs the objects in the image.

Conversely, if the images are too large, the required compute grows, and the model may not be complex enough to exploit the extra detail.

Common image sizes include 64x64, 128x128, 28x28 (MNIST), and 224x224 (VGG-16).

Keep in mind that many preprocessing routines do not preserve the aspect ratio of the image, so a naive resize can squash the image along one axis.

If the model is trained at a low resolution and is then asked to predict on high-resolution images, those images are first downscaled to the training resolution; many pixels are discarded in the process, and some important content is inevitably lost.
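A common way to control image size while keeping the aspect ratio is to resize the shorter side and then crop a fixed square, as in the torchvision sketch below (the file path is hypothetical).

```python
from PIL import Image
import torchvision.transforms as T

# Resize the shorter side to 256 while keeping the aspect ratio,
# then crop the central 224x224 patch (the input size used by VGG-16 / ResNet).
preprocess = T.Compose([
    T.Resize(256),        # a single int resizes the shorter edge, preserving aspect ratio
    T.CenterCrop(224),
    T.ToTensor(),
])

img = Image.open("example.jpg")   # hypothetical file path
x = preprocess(img)               # tensor of shape (3, 224, 224)
```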

Reduce color channels

The color channels determine the depth of the image array. Most color (RGB) images have three channels, while grayscale images have only one.
The more channels the input has, the larger the data and the longer it takes to train the model.
If color is not an important factor for your task, you can convert the color images to grayscale.

You can even consider other color spaces like HSV and Lab.
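A minimal sketch of dropping the color channels with torchvision and PIL, assuming color carries little information for the task (the file path is hypothetical):

```python
from PIL import Image
import torchvision.transforms as T

# Convert RGB to a single grayscale channel when colour is not informative;
# this reduces the input from 3 channels to 1 and speeds up training.
to_gray = T.Compose([
    T.Grayscale(num_output_channels=1),
    T.ToTensor(),
])

img = Image.open("example.jpg")        # hypothetical path
x = to_gray(img)                       # shape (1, H, W) instead of (3, H, W)

# Other colour spaces can be obtained directly from PIL, e.g. HSV:
hsv_img = img.convert("HSV")
```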

Algorithm

Model improvements

  1. Weight decay: add a regularization term to the objective function to penalize large weights. This is a way to prevent overfitting; it is essentially the L2 regularization familiar from classical machine learning, renamed "weight decay" when used in neural networks.
  2. Dropout: during each training step, randomly deactivate some feature detectors, i.e. each neuron is kept inactive with a certain probability. This also counters overfitting and improves generalization.
  3. Batch normalization: normalize the inputs of each layer of the network so that their distribution stays more even during training, avoiding the situation where a neuron is activated by all inputs or by none. It is a data normalization technique that improves the model's fitting ability (see the sketch after this list).
  4. LRN (local response normalization): imitates the lateral inhibition mechanism of biological nervous systems by creating competition among the activities of neighbouring neurons, so that relatively large responses stand out, which improves generalization.
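A minimal PyTorch sketch combining weight decay, Dropout, and batch normalization; the layer sizes and the assumed 32x32 RGB input are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical small network illustrating the three regularisation ideas above.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),              # 3. batch normalization after the conv layer
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 128),    # assumes 32x32 input images
    nn.ReLU(),
    nn.Dropout(p=0.5),               # 2. dropout before the classifier
    nn.Linear(128, 10),
)

# 1. weight decay is passed to the optimizer as the L2 penalty coefficient
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
```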

Increase training rounds

An epoch is one full pass of the entire dataset through the neural network. Increase the number of epochs incrementally, for example in steps of +25 or +100.
Increasing the epoch count mainly pays off when the dataset is large; eventually the model reaches a point where additional epochs no longer improve accuracy.
At that point, consider adjusting the learning rate. This small hyperparameter influences whether the model reaches a good (ideally global) minimum of the loss, the ultimate goal of training, or gets stuck in a poor local minimum.
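One common way to combine more epochs with learning-rate adjustment is a scheduler that lowers the rate when the validation loss plateaus. The sketch below assumes `model`, `train_one_epoch`, and `evaluate` are defined elsewhere; they are placeholders, not part of any particular library.

```python
import torch

# Train for many epochs and reduce the learning rate when the validation
# loss stops improving.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=5)

for epoch in range(100):                 # try +25 / +100 epochs incrementally
    train_one_epoch(model, optimizer)    # hypothetical helper
    val_loss = evaluate(model)           # hypothetical helper
    scheduler.step(val_loss)             # lower the lr when val loss plateaus
```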

transfer learning

Transfer learning means using a pre-trained model, such as YOLO or ResNet, as the starting point for most computer vision and natural language processing tasks.
Pretrained models are state-of-the-art deep learning models trained on millions of samples, often over months of compute, and they are remarkably good at detecting subtle differences between images.
They can serve as the backbone of your own model; in most cases the pretrained convolution and pooling layers can be reused as-is, so you only need to replace and train the final layers.
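A minimal transfer-learning sketch with torchvision's pretrained ResNet-18: freeze the backbone and train only a new classification head (the number of classes is hypothetical, and the `weights` argument requires a recent torchvision).

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pretrained on ImageNet and freeze its backbone.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for param in model.parameters():
    param.requires_grad = False          # keep the pretrained features fixed

num_classes = 5                          # hypothetical number of classes
model.fc = nn.Linear(model.fc.in_features, num_classes)  # only this layer trains
```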

Add more layers

Adding more layers gives the model the capacity to learn the characteristics of the dataset in more depth, so it can pick up subtle differences that a human might not notice.
Whether this trick helps depends on the nature of the task to be solved.

For complex tasks, like distinguishing between cat and dog breeds, it makes sense to add more layers, as your model will be able to learn the subtle features that distinguish a poodle from a shih tzu.

For simple tasks like classifying cats and dogs, a simple model with few layers will do.

An even better option is to add residual connections. Residual networks largely solve the problem of gradient attenuation, which is what allows very deep networks to train at all: as the number of layers grows, gradients decay progressively during error back-propagation, and the cross-layer shortcut connections reduce this attenuation so that deep networks can be trained successfully.
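A minimal residual block in PyTorch, showing how the skip connection adds the input back to the output so gradients can flow through the identity path:

```python
import torch
import torch.nn as nn

# A basic residual block: two conv layers plus a skip (identity) connection.
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)       # skip connection: output + identity

x = torch.randn(1, 64, 32, 32)
y = ResidualBlock(64)(x)                 # same shape as the input
```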

Tune hyperparameters

The tips above give you a basis for optimizing the model. To really tune it, consider the various hyperparameters and functions involved: the learning rate (as described above), the activation function, the loss function, and even the batch size are all important parameters to adjust.

A very deep network, an unsuitable loss function, or an unsuitable learning rate can all lead to vanishing or exploding gradients.
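A minimal grid-search sketch over the learning rate and batch size; `train_and_validate` is a hypothetical helper that trains a model with the given settings and returns its validation accuracy.

```python
import itertools

# Exhaustively try each combination of two hyperparameters and keep the best.
learning_rates = [1e-2, 1e-3, 1e-4]
batch_sizes = [32, 64, 128]

best_acc, best_cfg = 0.0, None
for lr, bs in itertools.product(learning_rates, batch_sizes):
    acc = train_and_validate(lr=lr, batch_size=bs)   # hypothetical helper
    if acc > best_acc:
        best_acc, best_cfg = acc, (lr, bs)

print("best config:", best_cfg, "val accuracy:", best_acc)
```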

Do not judge a model by accuracy alone; choose evaluation metrics that fit the business problem

Consider the example below.

A company wants to sell 50 products and has built two models to select which customers to target. Their confusion matrices are shown in the figure below. Which model should be chosen?
(Figure: confusion matrices for models A and B.)
Looking only at accuracy, model A appears to be the better choice. But with model A we would have to contact 75 customers to sell 50 items (50/0.667 = 75, since only 66.7% of the customers predicted to buy actually buy; that is, its precision is 0.667). With model B, whose precision is 0.833, contacting 60 customers (50/0.833) is enough to sell the 50 items, so the promotion cost is lower. In this scenario we only care about the customers to whom a sale actually succeeds; the customers who would not buy and are correctly predicted as non-buyers do raise accuracy, but they are of little value to us. Precision is therefore a more appropriate metric than accuracy for judging these models.
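To make the distinction concrete, the sketch below computes accuracy and precision from two hypothetical confusion matrices chosen so that model A wins on accuracy while model B wins on precision (0.667 vs 0.833); the counts are illustrative, not taken from the figure.

```python
# Hypothetical confusion matrices (counts are illustrative only): TP, FP, FN, TN.
models = {
    "A": dict(tp=50, fp=25, fn=5,  tn=920),
    "B": dict(tp=40, fp=8,  fn=30, tn=922),
}

for name, m in models.items():
    accuracy = (m["tp"] + m["tn"]) / sum(m.values())
    precision = m["tp"] / (m["tp"] + m["fp"])   # of predicted buyers, how many buy
    print(f"model {name}: accuracy={accuracy:.3f}, precision={precision:.3f}")

# model A: accuracy=0.970, precision=0.667
# model B: accuracy=0.962, precision=0.833
```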

When you report accuracy for a project, remember that it is the accuracy measured on the dataset you tested; it does not represent the accuracy on other datasets.

Summary

  • To save effort: if a ready-made project exists, fine-tune it on your own dataset via transfer learning, and start improving from the data side
  • Model overfitting: tune the hyperparameters, or apply regularization, weight decay, and Dropout
  • Model underfitting: use a more complex model and increase the number of epochs
  • Training takes too long: add batch normalization


Origin blog.csdn.net/weixin_42010722/article/details/127494720