[Study Notes] Deep learning knowledge points, part 2

1. CLS

CLS is short for "classification": a special token whose output embedding can be used for downstream classification tasks. It is a feature vector that represents the semantics of the entire text, that is, a sentence-level representation rather than the meaning of any single word. Once extracted, it can be fed directly to a classifier.
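A minimal sketch of how the CLS vector is used in practice. The encoder output here is random data standing in for the output of a real Transformer encoder (e.g. BERT, where position 0 holds the [CLS] token); the classifier weights are purely illustrative.

```python
import numpy as np

# Hypothetical encoder output for a batch of 2 sentences, 8 tokens each,
# hidden size 16. In a real model this would be the Transformer's
# last hidden states, with the [CLS] token at position 0.
rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((2, 8, 16))

# The sentence-level representation is simply the vector at position 0.
cls_vector = hidden_states[:, 0, :]   # shape: (2, 16)

# A linear classification head on top of [CLS] (weights are illustrative).
W = rng.standard_normal((16, 3))      # 3 classes
logits = cls_vector @ W               # shape: (2, 3)
pred = logits.argmax(axis=-1)
print(cls_vector.shape, logits.shape)
```

The key point is the `[:, 0, :]` slice: the whole sentence is summarized by one token's embedding.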

2. Token in Transformer

Class token, patch token: in NLP, each word (or subword) is called a token, and the special token that summarizes the sentence semantics is CLS. In CV (for example, in Vision Transformers), the image is cut into a sequence of non-overlapping patches, and each patch is in fact a token.
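The patch-to-token step can be sketched in a few lines. This cuts a toy 32x32 single-channel image into non-overlapping 8x8 patches and flattens each patch into a token vector, as in ViT's patch embedding before the linear projection (the sizes are illustrative):

```python
import numpy as np

image = np.arange(32 * 32, dtype=np.float32).reshape(32, 32)
P = 8                 # patch size
H, W = image.shape

# Split rows and columns into blocks of P, then flatten each P x P patch.
patches = (image.reshape(H // P, P, W // P, P)
                .swapaxes(1, 2)          # (row_block, col_block, P, P)
                .reshape(-1, P * P))     # (num_patches, P*P)
print(patches.shape)  # (16, 64): 16 patch tokens, each of dimension 64
```

Each row of `patches` is one token; a real ViT would then project each row to the model dimension and prepend a learnable class token.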

3. FLOPs

Strictly speaking, FLOPS (capital S) stands for "Floating Point Operations Per Second" and measures the computing performance of computers and other hardware: the number of floating-point operations that can be performed per unit time. FLOPs (lowercase s), as used for deep learning models, instead denotes the total number of floating-point operations a model performs, and is used to measure the model's computational complexity.
Generally speaking, the smaller the FLOPs, the lighter the model and the more suitable it is for embedded scenarios such as mobile devices. However, a model with too few FLOPs may perform worse; conversely, too many FLOPs increase the computational cost and reduce training and inference speed. An appropriate model and parameter budget should therefore be chosen according to the specific scenario and requirements.
Common units are GFLOPs (10^9 floating-point operations) and TFLOPs (10^12 floating-point operations).
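As a rough back-of-the-envelope sketch, FLOPs for common layers can be counted by hand (conventions vary: some papers count multiply-adds as one operation rather than two, so numbers may differ by a factor of 2):

```python
def linear_flops(in_features, out_features, batch=1):
    # One multiply + one add per weight -> 2 * in * out per sample.
    return 2 * in_features * out_features * batch

def conv2d_flops(c_in, c_out, k, h_out, w_out):
    # Each output element needs c_in * k * k multiply-adds (x2 for mul + add).
    return 2 * c_in * k * k * c_out * h_out * w_out

# Example: a 3x3 conv, 64 -> 128 channels, on a 56x56 output feature map.
flops = conv2d_flops(64, 128, 3, 56, 56)
print(f"{flops / 1e9:.2f} GFLOPs")  # 0.46 GFLOPs
```

Summing this over all layers gives the total FLOPs figure usually reported for a model.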

4. mAP

mAP is an abbreviation for mean Average Precision, a metric used to evaluate object detection models. For each class, Average Precision (AP) is the average of the precision values at different recall levels; mAP is the mean of AP over all classes. Precision is the ratio of true positives to the total number of predicted positives, and recall is the ratio of true positives to the total number of actual positives. A high mAP score indicates that the model detects objects with both high precision and high recall.
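A minimal sketch of the AP computation for one class, using all-point integration of the precision-recall curve. It assumes, for simplicity, that every ground-truth object is matched by some detection (so the number of positives equals `labels.sum()`); real evaluators also handle unmatched ground truths and IoU matching:

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one class. scores: detection confidences;
    labels: 1 if the detection matches a ground truth (TP), else 0 (FP)."""
    order = np.argsort(-np.asarray(scores))   # sort by confidence, descending
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)
    fp = np.cumsum(1 - labels)
    recall = tp / labels.sum()                # assumes all GTs are detected
    precision = tp / (tp + fp)
    # Integrate precision over recall, step by step.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)
        prev_r = r
    return ap

# Toy example: 4 detections for a class with 3 ground-truth objects.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1])
print(round(ap, 4))  # 0.8056
```

mAP is then the mean of this quantity over all classes (and, for COCO, over IoU thresholds as well).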

5. COCO MASK

COCO mask AP is one of the metrics used to evaluate the performance of instance segmentation models on the COCO dataset. Mask AP is a variant of Average Precision (AP) computed for the instance segmentation task. The calculation is similar to the traditional AP calculation, but the overlap (IoU) is measured between segmentation masks rather than bounding boxes. Specifically, AP values are computed at a range of IoU thresholds, from 0.5 to 0.95 in steps of 0.05 (AP at 0.5 and 0.75 is also often reported separately), and these AP values are then averaged to give the final mask AP. On the COCO dataset, mask AP is one of the most important indicators of instance segmentation performance.
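The ingredient that distinguishes mask AP from box AP is the mask IoU. A small sketch on toy binary masks:

```python
import numpy as np

def mask_iou(a, b):
    """IoU between two binary segmentation masks."""
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

# Two overlapping 5x5 square masks on a 10x10 grid.
m1 = np.zeros((10, 10)); m1[0:5, 0:5] = 1
m2 = np.zeros((10, 10)); m2[2:7, 2:7] = 1
iou = mask_iou(m1, m2)   # intersection = 9, union = 41
print(round(iou, 4))     # 0.2195

# COCO-style mask AP averages AP over these IoU thresholds:
thresholds = np.arange(0.5, 1.0, 0.05)   # 0.50, 0.55, ..., 0.95
```

A detection counts as a true positive at a given threshold only if its mask IoU with a ground-truth mask exceeds that threshold.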

6. Fine-grained and coarse-grained

In deep learning, convolutional neural networks (CNNs) are usually used to extract image features, and the convolutional layers of a CNN can be viewed as extracting features at different levels of the image. The level of the feature representation can therefore be controlled by adjusting the number of convolutional layers and the size of the convolution kernels. In practice, different levels of feature representation are obtained by choosing different CNN architectures and tuning their hyperparameters. In general, fine-grained feature representations preserve more detail and local information, typically coming from shallower, higher-resolution feature maps with small receptive fields, while coarse-grained feature representations capture more global, abstract information at the cost of detail, typically coming from deeper feature maps with large receptive fields.
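One way to quantify "fine-grained vs coarse-grained" is the receptive field: how large a region of the input a single output feature sees. A sketch of the standard recursive computation, for a stack of convolutions given as (kernel size, stride) pairs:

```python
def receptive_field(layers):
    """Receptive field at the output of a stack of convolutions.
    layers: list of (kernel_size, stride) tuples, input to output."""
    rf, jump = 1, 1          # jump = cumulative stride so far
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# Three 3x3 convs with stride 1: small receptive field, fine-grained features.
print(receptive_field([(3, 1)] * 3))       # 7
# Two 7x7 convs with stride 2: much larger receptive field, coarser features.
print(receptive_field([(7, 2), (7, 2)]))   # 19
```

More layers and larger strides grow the receptive field quickly, which is why deeper stages produce coarser, more global representations.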

7. feature level

In computer vision, feature levels are used to describe different levels of feature representation: different feature maps, different convolutional layers, different network stages, and so on. Since the convolutional layers of a CNN extract features at different levels of the image, the level of a feature representation is controlled by the depth at which it is taken from the network. Two features at the same level have passed through the same number of convolutions, although the convolution kernels themselves are not necessarily the same.
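A common concrete instance of feature levels is the backbone pyramid used by detectors (FPN-style C2..C5), where each level halves the spatial resolution of the previous one. A shape-only sketch, with illustrative channel counts:

```python
import numpy as np

H = W = 256
levels = {}
for i, c in enumerate([64, 128, 256, 512], start=2):  # levels C2..C5
    s = 2 ** i                                        # total stride at level i
    levels[f"C{i}"] = np.zeros((c, H // s, W // s))   # (channels, h, w)

for name, fmap in levels.items():
    print(name, fmap.shape)
```

C2 is the finest level (64x64 here, small receptive field, local detail); C5 is the coarsest (8x8, large receptive field, global semantics).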

8. Super-resolution technology

Super-resolution is an important research direction in the field of computer vision; its goal is to reconstruct high-resolution images from low-resolution ones. Super-resolution technology is applied in many fields, such as image processing, video processing, and medical imaging. In image processing, it can improve the clarity and detail of images, thereby improving their quality and usability. In video processing, it can improve the clarity and smoothness of videos, improving the viewing experience. In medical imaging, it can improve the resolution and clarity of medical images, thereby improving the accuracy and reliability of medical diagnosis.
(Figure: the left image is the original low-resolution image; the right image is the high-resolution version restored by AI.)
The existing mainstream image super-resolution methods can usually be divided into two types: methods based on image interpolation and methods based on deep learning.
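The interpolation-based family can be sketched directly. Below is a naive bilinear upscaler, the classical non-learning baseline that deep-learning SR methods are compared against (a toy implementation for illustration, not an optimized one):

```python
import numpy as np

def bilinear_upscale(img, scale):
    """Upscale a 2D grayscale image by an integer factor with bilinear
    interpolation (align-corners=False style sample positions)."""
    h, w = img.shape
    H, W = h * scale, w * scale
    ys = np.clip((np.arange(H) + 0.5) / scale - 0.5, 0, h - 1)
    xs = np.clip((np.arange(W) + 0.5) / scale - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int); x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

low = np.arange(16, dtype=np.float32).reshape(4, 4)
high = bilinear_upscale(low, 2)
print(high.shape)   # (8, 8)
```

Deep-learning methods replace this fixed interpolation kernel with a learned network, which is why they recover sharper detail.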
You can refer to this article to learn more about super-resolution technology: Take you to read "Super-resolution Technology" in 3 minutes

9. Cross-layer Attention

Cross-layer attention is an attention mechanism used with Transformer models that helps the model exchange and fuse information between feature representations at different levels. Specifically, it takes feature representations from different levels as input and computes attention weights to capture the correlations between levels, thereby realizing feature interaction and fusion. In practice, cross-layer attention is used in many tasks, such as natural language processing and computer vision.
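A minimal single-head sketch of the idea: queries come from one feature level, while keys and values come from another, so each token in the first level aggregates information from the second. All projection weights and inputs below are random placeholders, purely for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats, d):
    """Tokens of one feature level (queries) attend to tokens of
    another level (keys/values). Weights are random for illustration."""
    rng = np.random.default_rng(0)
    Wq = rng.standard_normal((q_feats.shape[-1], d))
    Wk = rng.standard_normal((kv_feats.shape[-1], d))
    Wv = rng.standard_normal((kv_feats.shape[-1], d))
    Q, K, V = q_feats @ Wq, kv_feats @ Wk, kv_feats @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d))   # (n_q, n_kv) attention weights
    return attn @ V                        # fused features, shape (n_q, d)

rng = np.random.default_rng(1)
shallow = rng.standard_normal((64, 32))   # 64 tokens from a shallow level
deep = rng.standard_normal((16, 32))      # 16 tokens from a deeper level
fused = cross_attention(shallow, deep, d=32)
print(fused.shape)   # (64, 32)
```

The only difference from ordinary self-attention is that Q and K/V come from different levels, which is what enables the cross-level information exchange described above.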


Origin blog.csdn.net/qq_45746168/article/details/129662488