Target detection piecemeal notes

Li Mu Video-Target Detection

compiler + interpreter

The difference between imperative (interpretive) programming and symbolic programming is as follows:

  • Imperative programming is easier to use. In Python, most of the code for imperative programming is straightforward. Imperative programming is also easier to debug, because it's easier to get and print all intermediate variable values, or use Python's built-in debugging tools;

  • Symbolic programming runs more efficiently and is easier to port. Symbolic programming makes it easier to optimize code during compilation, while also being able to port programs into a Python-independent format, allowing programs to run in non-Python environments, avoiding any potential performance issues associated with the Python interpreter.

As mentioned above, PyTorch is based on imperative programming and uses dynamic computation graphs. In order to be able to take advantage of the portability and efficiency of symbolic programming, developers think about whether the advantages of these two programming models can be combined, so torchscript is produced. torchscript allows users to develop and debug using purely imperative programming, while being able to convert most programs to symbolic for use when production-grade computing performance and deployment are required

summary

  • Imperative programming makes the design of new models easier, because code can be written according to the control flow, and it has a relatively mature Python software ecosystem.
  • Symbolic programming requires us to define and compile the program first, and then execute the program, which has the advantage of improving computing performance.

Sequence data is strongly correlated with time and space

How to predict? 1. Truncate 2. Keep past predictions

image-20230412094108733

image-20230414154753253

You can think that RPN is a small target detection algorithm so you can see that Faster R-CNN is two predictions

[External link picture transfer failed, the source site may have an anti-leeching mechanism, it is recommended to save the picture and upload it directly (img-5JQCHtAx-1682575606947)(null)]

Mask-RCNN requires pixel-level labels. Unmanned vehicles use Mask-RCNN more often.

SSD

[External link image transfer failed, the source site may have an anti-leeching mechanism, it is recommended to save the image and upload it directly (img-WriWZXlB-1682575599692)(…/…/Library/Application%20Support/typora-user-images/image-20230412095439937 .png)]

Core idea: I only do single stage once

image-20230412103425432

ssd is a relatively early network and has not been updated. The author went home and became a civil servant.

image-20230412153926850

YOLO

Just fast!

image-20230412154153385

Quantitative change Target detection Surveillance camera ex: The real edge box has certain rules, too many are used and poorly written

image-20230412154516978

How to extract the key frames for license plate recognition depends on the genre. I do this. The effect transformer can replace CNN as long as it is tuned well. I don’t have to say how to do it.

SSD implementation

image-20230417104209449v

Summary¶ _

  • At multiple scales, we can generate anchor boxes of different sizes to detect objects of different sizes.
  • By defining the shape of the feature map, we can decide the centers of uniformly sampled anchor boxes on any image.
  • We use the information of the input image in a certain receptive field region to predict the anchor box category and offset on the input image that is close to the region.
  • We can perform multi-scale object detection on hierarchical image representations at multiple levels through deep learning.

Guess you like

Origin blog.csdn.net/sinat_34626178/article/details/130405274