A summary of tips for implementing algorithms

Based on the blogger's own experience and on summaries by other bloggers.

During interviews, interviewers particularly dislike candidates who simply grab the SOTA model from paperswithcode: if the model works, they use it directly without thinking about why it suits the task at hand. Pure trial and error is not a good approach, and avoiding it is the original motivation for this post.

1. Get familiar with the data

Andrew Ng: 80% data + 20% model = better AI
For a new task, the first step is to get familiar with the data. For a detection task, for example, you can write a small visualization script to check whether the labels are reasonable, inspect the size distribution of the objects to be detected (convenient for anchor settings), check the image sizes, and check the class distribution (is there a long-tail phenomenon?), and so on.
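As a minimal sketch of such a data check, the helper below summarizes box sizes and class balance for a list of COCO-style annotation dicts. The `bbox = [x, y, w, h]` / `category_id` schema here is an assumption for illustration; adapt it to your own label format.

```python
from collections import Counter

def summarize_annotations(annotations):
    """Summarize COCO-style annotations: box size range and class balance.

    Each annotation is assumed to be a dict with 'bbox' = [x, y, w, h]
    and 'category_id' (a hypothetical minimal schema for illustration).
    """
    widths = [a["bbox"][2] for a in annotations]
    heights = [a["bbox"][3] for a in annotations]
    class_counts = Counter(a["category_id"] for a in annotations)
    return {
        "num_boxes": len(annotations),
        "mean_wh": (sum(widths) / len(widths), sum(heights) / len(heights)),
        "min_wh": (min(widths), min(heights)),
        "max_wh": (max(widths), max(heights)),
        # Inspect class_counts for long-tail skew before training.
        "class_counts": dict(class_counts),
    }
```

The `min_wh`/`max_wh` range is a quick sanity check for anchor settings, and `class_counts` makes a long-tail distribution visible at a glance.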

2. Algorithm selection

When you receive a task in a new field, you need to survey the algorithms in related areas. For example, the blogger once worked on action detection without being familiar with the field, so the initial survey took a long time. This is where patience is required: although the survey costs some time, it saves experiments during algorithm selection, which is very cost-effective! For action detection, for example: OpenPose, SlowFast, HRNet, etc. The mmdet series is highly recommended!
Approaches to avoid:

  1. Being too fixated on metrics: for example, when a metric underperforms on your own dataset, immediately switching to another algorithm or another backbone. Instead, carefully analyze which metric is weak: AP or AR? Is there a problem with your own training (e.g. the learning rate), is the current data simply not suited to the algorithm, or is the evaluation metric itself unreasonable?
  2. Skipping the survey and directly using the SOTA algorithm: for example, the current task targets small objects while the SOTA algorithm is biased toward detecting medium and large ones. Another example: the SOTA model lacks optimizations for your own scene data, or the current task requires high FPS while the SOTA model is typically a heavy network.
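To make the FPS concern above concrete, a rough throughput check like the sketch below can be run before committing to a heavy SOTA model. The `run_inference` callable is a hypothetical hook: wrap one forward pass of your actual model in it.

```python
import time

def measure_fps(run_inference, n_warmup=5, n_iters=20):
    """Rough FPS benchmark for a model's inference callable.

    `run_inference` is any zero-argument function that performs one
    forward pass (a hypothetical hook; wrap your real model call).
    Warmup iterations are excluded so one-time setup costs (caching,
    kernel compilation) do not skew the measurement.
    """
    for _ in range(n_warmup):
        run_inference()
    t0 = time.perf_counter()
    for _ in range(n_iters):
        run_inference()
    elapsed = time.perf_counter() - t0
    return n_iters / elapsed
```

If the measured FPS is far below the task's requirement, that is a strong signal to pick a lighter model rather than tune the heavy one.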

3. Optimize the algorithm based on an existing implementation

After selecting a suitable algorithm for the task, it is best to start from a well-starred open-source project on GitHub that reproduces the algorithm.
The purposes of doing this:

  • It is easier to understand the specific details of the algorithm in depth. A paper can never be fully specific about the entire model (including the extra tricks used); you need to read the source code.
  • You can quickly gauge the algorithm's baseline behavior, such as the approximate running speed and accuracy of the reproduced model.
  • You avoid redoing work that has already been done.

Ideas for improving on the open-source model:

  • Does the code implement all the tricks from the paper? If not, you can try adding them.
  • Papers usually analyze their experimental results, followed by the authors' own commentary; they may explain in which cases the algorithm performs poorly.
  • Some papers describe their possible future work, which is also a source of improvement ideas.
  • Visually inspect the experimental results (especially on your own dataset). The failure modes may differ from those the authors show on public datasets; analyze the reasons for the poor results.
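For the last point, a simple way to surface failure cases on your own dataset is to list the ground-truth boxes that no prediction matches, then visualize those images. The IoU-based sketch below assumes `[x1, y1, x2, y2]` boxes and is only illustrative.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def find_misses(gt_boxes, pred_boxes, thr=0.5):
    """Ground-truth boxes with no prediction above IoU thr.

    These are the images/objects worth inspecting visually to see why
    the model fails on your data.
    """
    return [g for g in gt_boxes if all(iou(g, p) < thr for p in pred_boxes)]
```

Grouping the missed boxes by size or class often reveals whether the problem is small objects, a rare class, or a labeling issue.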

4. Reproducing an algorithm from scratch

Some advice:

  • Test every detail, from the data interface and the model to the loss output and the final evaluation code. Keep every part under control.
  • Test the data interface starting from a single process with batch size 1, which makes it easy to print values for comparison.
  • Do not randomize carelessly; make sure problems can be reproduced. For example, leave out random data augmentation at first, and fix the model's random seed.
  • Use a small amount of data so experiments run quickly and the model can overfit fast. If the model can overfit, you can roughly conclude that it is able to learn something.
  • Stick to the original paper as closely as possible. Before the reproduction works, do not add too many ideas of your own: training parameters, model backbone, data augmentation methods, etc. should first follow the paper. If anything is unclear, try emailing the authors or discussing it in the relevant communities.
  • Print logs in full. For example, when the loss goes to NaN, you need to know whether it was caused by the forward pass or by backpropagation.
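The reproducibility advice above can be sketched as a seed-fixing helper. The `numpy` and `torch` parts use guarded imports, since which frameworks are installed depends on your environment.

```python
import os
import random

def seed_everything(seed=42):
    """Fix random seeds so that a failing run can be reproduced exactly."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Deterministic cuDNN kernels trade some speed for repeatability.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass
```

Call `seed_everything()` once at the top of the training script, before building the model or the data pipeline.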

5. Some useful training advice

  • Make sure the data is reliable.
  • If a pre-trained model is available, definitely use it!
  • A learning rate below 1e-5 is usually basically useless; e.g. with cosine or step schedules, let the final learning rate be 1e-5. Of course, special tasks differ.
  • Remember to enable BN statistics updates during training (TensorFlow users in particular tend to miss this); otherwise you may see the loss drop rapidly during training while at test time the model seems not to have converged.
  • SGD is great, but in experiments Adam may converge faster.
  • If you want to squeeze out an algorithm's performance, first make sure the current model reaches its expected baseline, rather than blindly swapping modules and tuning parameters like crazy, which may just waste time.
  • Do not trust your own parameter-tuning skills too much. Without a better baseline, tuning will not bring a qualitative leap (unless the previous parameters caused some kind of bug).
  • When data is scarce, use a pre-trained model, remember to freeze the parameters of the first few layers, and consider a smaller learning rate.
  • Repeated training may gain a few points: after training a model, load it as the pre-trained model and continue training with the same set of parameters.
  • Unlike classical machine learning, DL is not backed by as many formulas; many choices only make sense after an experiment verifies them. So read as many papers as possible and study other people's experiments, which reduces unnecessary experiments of your own.
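As an illustration of the learning-rate advice above, a cosine schedule that decays to a final rate of 1e-5 can be written as below. The base rate of 1e-3 is just an assumed example value.

```python
import math

def cosine_lr(step, total_steps, base_lr=1e-3, final_lr=1e-5):
    """Cosine decay from base_lr down to final_lr.

    The floor of 1e-5 reflects the advice above that rates below roughly
    1e-5 are basically useless for most tasks.
    """
    progress = min(step / total_steps, 1.0)
    return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))
```

The schedule starts at `base_lr` at step 0 and reaches exactly `final_lr` at `total_steps`, decaying smoothly in between.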


Origin blog.csdn.net/weixin_45074568/article/details/124850999