YOLOv5: how to judge the reason for poor recognition

After training a YOLOv5 model, I found that recognition on the test images was poor. What should I do at this point?
Is it overfitting or underfitting, and how can I tell?

Underfitting
An important topic in machine learning is the generalization ability of a model; a model with strong generalization ability is a good model. If a trained model performs poorly on the training set, it will also perform poorly on the test set, and this may be caused by underfitting. Underfitting means the model has not learned the key features well enough, so the misrecognition rate is high on both the training set and the test set.

Underfitting solutions

  1. The simplest fix is to increase the number of positive samples in the data set, especially samples showing the main features.

  2. Increase the number of training epochs. You may simply have stopped training before the network had learned the features.

  3. Add other features. Sometimes a model underfits because it has too few features to work with, and adding features solves the problem well. For example, "combination", "generalization" and "correlation" features are important means of feature engineering; applying the same pattern in almost any scenario tends to give unexpectedly good results. Besides these, "context features", "platform features" and so on can also be candidates.

  4. Add polynomial features. This is very common in machine learning algorithms, e.g. adding quadratic or cubic terms to a linear model to make it more expressive.

  5. Reduce the regularization parameters. The purpose of regularization is to prevent overfitting, but if the model is underfitting, the regularization strength should be reduced (a rough sketch of adjusting these settings follows this list).
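As a rough illustration of points 2 and 5, here is a minimal sketch that lowers the weight-decay term in a copy of YOLOv5's hyperparameter file and trains for more epochs. The hyp file path, dataset config and numeric values are assumptions for illustration; check them against your own YOLOv5 version.

```python
import subprocess
import yaml  # PyYAML

# Load a copy of the stock hyperparameter file (the path differs between YOLOv5 versions).
with open("data/hyps/hyp.scratch-low.yaml") as f:
    hyp = yaml.safe_load(f)

# Point 5: weaken regularization when the model underfits (the default is about 0.0005).
hyp["weight_decay"] = 0.0001

with open("hyp.underfit.yaml", "w") as f:
    yaml.safe_dump(hyp, f)

# Point 2: train longer so the network has a chance to learn the features.
subprocess.run([
    "python", "train.py",
    "--img", "640",
    "--batch", "16",
    "--epochs", "300",            # more epochs than the run that underfit
    "--data", "my_dataset.yaml",  # hypothetical dataset config
    "--weights", "yolov5s.pt",
    "--hyp", "hyp.underfit.yaml",
], check=True)
```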

Overfitting
If the performance on the training set is very good but the performance on the test set is poor, this may be due to overfitting.
For example, the cans in the training set can be identified, but cans in the test set cannot; this is typical overfitting behavior.

Overfitting solutions

  1. The easiest fix is to add more samples covering other features and retrain the network.

  2. Re-clean the data. One cause of overfitting is impure data (noisy or mislabeled samples), so if overfitting occurs, the data needs to be cleaned again.

  3. Increase the amount of training data. Another cause is that the amount of data used for training is too small, or that the training data is too small a proportion of the total data.

  4. Adopt regularization. Regularization methods include L0, L1 and L2 regularization; regularization generally means adding a norm term to the objective function. In practice, machine learning mostly uses L2 regularization; the reasons are discussed below.

The L0 norm is the number of non-zero elements in a vector. The L1 norm is the sum of the absolute values of the elements of a vector, also called "sparse regularization" (Lasso regularization). Both can achieve sparsity. Since L0 can achieve sparsity, why use L1 instead of L0? My personal understanding is, first, that the L0 norm is hard to optimize (it is an NP-hard problem), and second, that the L1 norm is the best convex approximation of the L0 norm and is much easier to optimize. That is why everyone turned their attention (and affection) to the L1 norm.

The L2 norm is the square root of the sum of the squares of the elements of a vector. It makes every element of w small and close to 0, but unlike the L1 norm it does not drive them exactly to 0, only close to it. The L2 penalty therefore shrinks the parameters w; why does that prevent overfitting? A popular explanation is that smaller parameter values w mean lower model complexity, so the model fits the training data "just right" (Occam's razor) instead of memorizing it, and thus generalizes better. Some people also say the L2 norm helps with matrix inversion when the condition number is bad (I don't fully understand this point). A short PyTorch sketch of both penalties follows.
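This is a minimal, generic PyTorch sketch (not YOLOv5's own loss code): L2 regularization usually comes for free as the optimizer's weight_decay argument, while an L1 penalty is added to the loss by hand. The model, data and lambda values are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                 # placeholder model
criterion = nn.CrossEntropyLoss()

# L2 regularization: weight_decay adds an ||w||^2 penalty to the update.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=5e-4)

x = torch.randn(8, 10)                   # dummy batch
y = torch.randint(0, 2, (8,))
l1_lambda = 1e-5

for _ in range(10):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    # L1 regularization: add lambda * sum(|w|) to the loss manually.
    l1_penalty = sum(p.abs().sum() for p in model.parameters())
    (loss + l1_lambda * l1_penalty).backward()
    optimizer.step()
```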

  5. Use dropout. This method is very commonly used in neural networks; it became widely known through the ImageNet (AlexNet) work. In plain terms, dropout makes each neuron inactive with a certain probability during training. See the figure below:
    (Figure: (a) a standard neural network without dropout; (b) the same network trained with dropout.)

As shown in the figure above, (a) on the left is a standard neural network without dropout, and (b) on the right is the same network during training with dropout, i.e. each neuron is skipped with a certain probability p while training.
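A minimal dropout sketch in PyTorch (again generic, not YOLOv5 code; the layer sizes and p = 0.5 are arbitrary). model.train() enables the random dropping, while model.eval() disables it at inference time.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # each activation is zeroed with probability 0.5 during training
    nn.Linear(64, 10),
)

x = torch.randn(4, 128)

model.train()            # dropout active: repeated forward passes give different outputs
out_train = model(x)

model.eval()             # dropout disabled: deterministic output at inference time
out_eval = model(x)
```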


The following content is only loosely related to the topic above; it is some practical experience from my own work.

After long-term exploration and experimentation, I have summarized the following points.

At present, I feel that if a visual neural network is to reach a practically usable stage, the first thing to pay attention to is the management of the training data set.
Before discussing that, we should understand roughly how current neural networks work. I won't go into detail about how they are implemented; a rough picture is enough.

The main function of a modern visual neural network is to build a mapping model between the data set and the standard answers given by humans.
How is this model produced? A visual neural network does this in stages. The first stage extracts features, splitting the picture into different small features such as corners, contours, colors, shapes and brightness. The second stage uses the extracted features to build a relationship with the standard answers through a parameterized (roughly polynomial-like) formula; the parameters are adjusted continuously until the predictions match the labeled results.
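A toy sketch of this two-part picture in PyTorch, purely illustrative and far simpler than YOLOv5's real architecture: a convolutional part extracts features, and a small fully connected head maps the features to class scores whose parameters are adjusted during training.

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Stage 1: feature extraction (edges, textures, shapes, ...).
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Stage 2: a parameterized mapping from features to the "standard answer".
        self.head = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.features(x).flatten(1)
        return self.head(f)

model = TinyClassifier()
scores = model(torch.randn(1, 3, 64, 64))   # raw class scores for one image
```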

Moreover, this model has a certain generalization ability; in plain terms, it can draw inferences from the examples it has seen. This is indeed possible, because in principle it weights each feature to classify the input; the answer is probabilistic, not a deterministic question like 1 + 1 = 2. But can it truly reason by analogy? At present it cannot.
Since the answer is probabilistic, there is always some chance that the answer is wrong, which means the recognition is wrong.

To briefly summarize: the data set must contain the knowledge (features) to be learned, there must not be too little data for the main features, and there must not be only one or two main features.

The first point is that the data set images must contain the main features that characterize the items to be identified. Take bottles and cans: a bottle generally has a small opening on top and a larger body, while a can generally has a large opening on top and a body of roughly the same width. The training data for these two items should not be mixed up. Humans do sometimes call a straight-sided can a "bottle" (for example, a Nutrition Express bottle has a very wide mouth, and canned peaches are sometimes said to come in a "bottle"), but for the computer this only adds uncertainty and can prevent the model from fitting. In other words, the positive samples in the data set must clearly show the main characteristics of the item.

The second point is that the main features of the data set should not appear in too few pictures. The more examples of the main features there are, the easier they are to extract, and they should be photographed from as many angles as possible. Current neural networks cannot reason about three-dimensional structure; they only look for common features and patterns in flat images, with no ability to mentally rotate an object. Having many varied examples is what ensures an object can be recognized in all kinds of situations. For example, neural networks are now very good at detecting people, because the data sets contain a very large number of samples taken from the front or the side; no matter how well such a model is trained, it will not recognize a person photographed from an unusual angle such as directly below, unless you add a lot of such pictures, in which case it can still learn them. This is what I mean by the knowledge contained in the data set: find the patterns, find the features, that's all. (A small script for counting labeled examples per class is sketched after these three points.)

The third point is that the images of an item in the data set must not show only one feature; they should include as many of the item's features as possible. An item should have at least five or so distinct features before it generalizes well. For example, how useful is "furry" on its own as the main feature? Far too many things are furry: toys, cats, dogs, hair, down jackets.
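As a quick sanity check on the second point, the sketch below counts how many labeled instances each class has in a YOLO-format label directory (each .txt label file holds one "class x_center y_center width height" line per object). The paths and class names are placeholders.

```python
from collections import Counter
from pathlib import Path

label_dir = Path("datasets/my_dataset/labels/train")   # hypothetical path
class_names = ["bottle", "can"]                         # hypothetical classes

counts = Counter()
for label_file in label_dir.glob("*.txt"):
    for line in label_file.read_text().splitlines():
        if line.strip():
            class_id = int(line.split()[0])   # first field is the class index
            counts[class_id] += 1

for class_id, n in sorted(counts.items()):
    name = class_names[class_id] if class_id < len(class_names) else str(class_id)
    print(f"{name}: {n} labeled instances")
```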

Summary
After calm analysis and thought, it becomes clear that if the network is to learn the knowledge and achieve good results, you first need a large and preferably high-quality data set: the labeling must be correct and reasonable (do not call a deer a horse), the main features must be present (note that there should be several of them), plus a few edge-case samples.

These requirements on the data set can be understood by reasoning alone; I think everyone can see them without proof.

Too little data easily leads to underfitting, which makes it hard to distinguish different items; for example, with Wanglaoji beverage bottles, the color red also ends up being treated as an important feature of Wanglaoji. The model also must not be trained too much: too many training epochs easily lead to overfitting (a simple early-stopping sketch follows).
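One common guard against training for too long is early stopping: keep the weights from the epoch with the best validation metric and stop when it has not improved for a while. The sketch below is generic logic, not YOLOv5's own implementation (recent YOLOv5 releases expose a similar --patience option, but verify against your version); train_one_epoch and evaluate are placeholders.

```python
def train_with_early_stopping(train_one_epoch, evaluate, max_epochs=300, patience=20):
    """Stop when the validation metric has not improved for `patience` epochs.

    train_one_epoch() runs one epoch of training; evaluate() returns a
    validation metric where higher is better (e.g. mAP). Both are placeholders.
    """
    best_metric = float("-inf")
    best_epoch = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        metric = evaluate()
        if metric > best_metric:
            best_metric, best_epoch = metric, epoch
            # save_checkpoint() would go here
        elif epoch - best_epoch >= patience:
            print(f"No improvement for {patience} epochs, stopping at epoch {epoch}.")
            break
    return best_metric
```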

The training process is a process of constant trial-and-error fitting: the model keeps adjusting its parameters to match the expected results. It does not know what any particular parameter means, so it can end up with all kinds of messy results.

To solve the problem of insufficient data, my feeling is: photograph each item from all around (a 360-degree set of shots) as the basic item photos, then composite them onto a variety of background images to form many different photos; this can already achieve a fairly good recognition result. Then add some photos of real data later, and the model can basically become effective and practical. A rough compositing sketch follows.
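Here is a very rough sketch of that compositing idea using Pillow. The file names are placeholders, the object image is assumed to be a cut-out PNG with an alpha channel that is smaller than the background, and the generated YOLO-format label matches the pasted position.

```python
import random
from pathlib import Path
from PIL import Image

def composite(obj_path, bg_path, out_img, out_label, class_id=0):
    obj = Image.open(obj_path).convert("RGBA")   # cut-out object with transparency
    bg = Image.open(bg_path).convert("RGB")

    # Random position that keeps the object fully inside the background
    # (assumes the object is smaller than the background).
    x = random.randint(0, bg.width - obj.width)
    y = random.randint(0, bg.height - obj.height)
    bg.paste(obj, (x, y), mask=obj)              # alpha channel used as paste mask
    bg.save(out_img)

    # YOLO label: class x_center y_center width height, all normalized to [0, 1].
    xc = (x + obj.width / 2) / bg.width
    yc = (y + obj.height / 2) / bg.height
    w = obj.width / bg.width
    h = obj.height / bg.height
    Path(out_label).write_text(f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}\n")

composite("bottle_cutout.png", "background.jpg", "synth_0001.jpg", "synth_0001.txt")
```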

Computing power becomes very important here. At present, the result of each training run can be used to speed up subsequent training; later training should be faster than at the beginning, because the parameters have already been adjusted close to where they need to be.

Adding item categories later is also possible, but the data set must stay consistent with the model: each new category must be appended after the existing categories, otherwise the previously trained network has to be retrained (see the small sketch below).
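A small illustration of that point with hypothetical class lists: appending keeps the existing class indices stable, while inserting or reordering would shift them and invalidate what the old model learned.

```python
# Existing classes and their indices, as the trained model knows them.
names = ["bottle", "can"]          # 0 -> bottle, 1 -> can

# Safe: append the new category at the end; old indices are unchanged.
names.append("cup")                # 0 -> bottle, 1 -> can, 2 -> cup

# Unsafe: inserting at the front would shift every existing index,
# so the old model's outputs would no longer match the labels.
# names.insert(0, "cup")           # 0 -> cup, 1 -> bottle, 2 -> can  (don't do this)
```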

Origin: blog.csdn.net/phker/article/details/109456539