What tricks help deep learning models reach SOTA?

Link: https://www.zhihu.com/question/540433389

Editor: Deep Learning and Computer Vision

Statement: For academic sharing only; will be removed on request if it infringes

Author: Gordon Lee
https://www.zhihu.com/question/540433389/answer/2549775065

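1. R-Drop: run two forward passes on the same input (with different dropout masks) and add a KL-divergence loss that constrains the two outputs to agree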

2. MLM: continue pre-training (post-training) with the MLM objective on a domain corpus

3. EFL: in few-shot settings, convert the classification problem into a matching problem and structure the input in the form of an NSP task.

4. Mixed precision (fp16): speeds up training and reduces memory use; accuracy is usually comparable to fp32 rather than higher

5. With multi-GPU DDP training and gradient accumulation, use no_sync to skip gradient synchronization on the accumulation steps that do not need it, which speeds up training.

6. When the validation or test set is very large, try multi-GPU inference. Use dist.all_gather to collect the results; for non-tensor objects, use all_gather_object.

7. PET: in few-shot settings, convert classification into prediction at a masked position and construct a verbalizer; see PET (EACL 2021).

8. ArcFaceLoss: for two-tower sentence matching, change the loss from NT-Xent to an arccos (angular) form; see ArcCSE (ACL 2022).

9. Data augmentation for zero-shot cross-lingual transfer: code-switching, machine translation, and so on. Remember to add a consistency loss as well; see "Consistency Regularization for Cross-Lingual Fine-Tuning".

10. SimCSE: continue pre-training with SimCSE on a domain corpus

11. Focal loss: handles class imbalance

12. Two-tower late interaction with the maxsim operation: compute the similarity between each query token and each doc token, take the maximum over doc tokens for every query token, then sum over the query tokens. It strikes a good balance between speed and accuracy; see ColBERT.

13. Continual learning with less forgetting: EWC plus a strong pre-trained model works very well. It simply adds a regularization term so that important parameters do not drift too far, with importance measured by Fisher information.

14. Adversarial training: FGM and PGD can raise scores, but training is slow.

15. Memory bank: enlarges the effective batch size (bsz), although it sometimes feels of little use.

16. PolyLoss: -log(pt) + eps * (1 - pt). The benefit is doubtful: it had no effect when I tried it, though some people report good results.
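As a rough illustration of item 1 (R-Drop), here is a minimal pure-Python sketch for a single example. It assumes the two logit vectors come from two forward passes of the same input with different dropout masks. The function names and the `alpha` weight are illustrative choices, not taken from the answer, and a real implementation would work on batched tensors in a deep learning framework.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(p, q):
    # KL(p || q) for two discrete distributions with positive entries.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def r_drop_loss(logits1, logits2, label, alpha=1.0):
    # logits1 / logits2: outputs of two forward passes on the same input
    # (assumed to differ only because of dropout).
    p1, p2 = softmax(logits1), softmax(logits2)
    # Negative log-likelihood of the gold label under both passes.
    nll = -(math.log(p1[label]) + math.log(p2[label]))
    # Symmetric KL term that pulls the two predictions together.
    kl = 0.5 * (kl_div(p1, p2) + kl_div(p2, p1))
    return nll + alpha * kl
```

When the two passes agree exactly, the KL term vanishes and only the two likelihood terms remain; the more the passes disagree, the larger the penalty.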
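Items 11 and 16 both reweight cross-entropy through pt, the probability the model assigns to the correct class. The sketch below compares them for a single example in pure Python. The PolyLoss line follows the formula given in item 16; the focal-loss form with `gamma=2.0` and no class-weighting factor is an assumption made for brevity.

```python
import math

def cross_entropy(pt):
    # pt: predicted probability of the correct class, 0 < pt <= 1.
    return -math.log(pt)

def focal_loss(pt, gamma=2.0):
    # Down-weights easy examples (pt close to 1) by the factor (1 - pt)^gamma.
    # The optional per-class weight is omitted here (assumption).
    return -((1.0 - pt) ** gamma) * math.log(pt)

def poly1_loss(pt, eps=1.0):
    # Formula from item 16: cross-entropy plus eps * (1 - pt).
    return -math.log(pt) + eps * (1.0 - pt)

for pt in (0.1, 0.5, 0.9):
    print(pt, cross_entropy(pt), focal_loss(pt), poly1_loss(pt))
```

With `gamma=0` or `eps=0`, both functions reduce to plain cross-entropy, which makes them easy to switch off when checking whether they help on a given task.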
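The maxsim operation in item 12 can be sketched in a few lines of pure Python. The sketch assumes each token is already encoded as a vector and uses the dot product as the similarity (equal to cosine similarity if the vectors are L2-normalized beforehand); `maxsim_score` and the toy vectors are made up for illustration.

```python
def dot(u, v):
    # Dot product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_vecs, doc_vecs):
    # For each query token, keep its best-matching doc token,
    # then sum those maxima over all query tokens.
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy example: 2 query tokens, 2 doc tokens, 2-dimensional embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.5, 0.5]]
print(maxsim_score(query, doc))  # 1.0 (first token) + 0.5 (second token) = 1.5
```

Because the doc token vectors do not depend on the query, they can be computed and stored offline, which is where the speed advantage of late interaction comes from.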
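The regularization term described in item 13 (EWC) can be written as a quadratic penalty on how far each parameter has moved from its value after the previous task, weighted by its Fisher information. This is a minimal sketch on flat lists of floats; the name `ewc_penalty`, the weight `lam`, and the assumption that the diagonal Fisher values have already been estimated are all illustrative.

```python
def ewc_penalty(params, old_params, fisher, lam=1.0):
    # (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2
    # params:     current parameter values
    # old_params: parameter values after training on the previous task
    # fisher:     diagonal Fisher information estimates (assumed given)
    return 0.5 * lam * sum(
        f * (p - o) ** 2 for p, o, f in zip(params, old_params, fisher)
    )

# Moving a parameter with high Fisher information is penalized more heavily.
old = [1.0, 1.0]
fisher = [5.0, 0.5]
print(ewc_penalty([2.0, 1.0], old, fisher))  # 2.5
print(ewc_penalty([1.0, 2.0], old, fisher))  # 0.25
```

During training on the new task, this penalty would be added to the task loss, so parameters that mattered for the old task stay close to their old values while the rest remain free to change.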

Author: Luo Yin
https://www.zhihu.com/question/540433389/answer/2669605576

I quite agree with the top-voted answer: the biggest impact on performance actually comes from data.

When I was in school, I also liked collecting tricks for squeezing out extra points, thinking I would use them in competitions later. In competitions this really is necessary: the data is given, so all that is left is to try every trick that works and push the score up by whatever means.

After I started working, I found that what matters most in real business work is the data. If performance is poor, I simply label more data: either the business side labels it or I label it myself. I really have labeled dozens or hundreds of records at a time, and performance improves noticeably once the new data goes in. Of course, labeling has its own method: you do not label just any data. The basic procedure is to analyze the patterns in the model's bad cases and then find similar data to label. The end result is that you pick the batch of data that is most valuable for the model in its current state, which is essentially active learning done by hand. I have never tried the mainstream active learning methods myself, so I do not know whether they are effective; readers who know more are welcome to weigh in :-)

That said, in real business work, labeling data gives a higher overall return than tricks, because data is itself a precious resource, labeled data especially. Labeling is a case of one generation planting trees so that later generations can enjoy the shade, and once enough accumulates it can also become a competitive barrier. Tricks, by contrast, may not work, may not be orthogonal to one another, and may not generalize. Their overall uncertainty is high and their rate of return is relatively low, so it is better to just label data, simple and crude as that is.

Author: AI Advanced Artificial Intelligence
https://www.zhihu.com/question/540433389/answer/2601363270

A tourist attraction is an area devoted mainly to tourism and related activities. It typically offers visitors sightseeing, vacationing, and leisure-time exercise, has supporting facilities, and is an independently managed area that provides the corresponding tourism services. Tourist attractions are mostly built around landscapes, temples, parks, and the like. Abnormal behavior is a common phenomenon in social life. It is divided into general abnormal behavior, such as traffic violations, and serious abnormal behavior, such as illegal and criminal acts. More specifically, abnormal behavior refers to an unusual action that occurs at a normal place and time in a given scene, or an action that occurs at an abnormal place or time. Such behavior is usually carried out by people and includes violent behavior (such as punching, kicking, running, and stamping) and non-violent behavior (such as painting, sitting quietly, and walking). This type of behavior is usually initiated by a single person, and the action lasts a long time.

An abnormal behavior recognition system analyzes surveillance video content using computer vision, image and video processing, and artificial intelligence recognition techniques, and then controls the system according to the analysis results.

In tourist attractions, an abnormal behavior recognition system analyzes surveillance video from the scenic area to identify specific abnormal behaviors by tourists in real time and issue early warnings, so as to meet regulatory requirements.

☆ END ☆

Origin blog.csdn.net/woshicver/article/details/131950443