Li Mu Video-Target Detection
compiler + interpreter
The difference between imperative (interpretive) programming and symbolic programming is as follows:
-
Imperative programming is easier to use. In Python, most of the code for imperative programming is straightforward. Imperative programming is also easier to debug, because it's easier to get and print all intermediate variable values, or use Python's built-in debugging tools;
-
Symbolic programming runs more efficiently and is easier to port. Symbolic programming makes it easier to optimize code during compilation, while also being able to port programs into a Python-independent format, allowing programs to run in non-Python environments, avoiding any potential performance issues associated with the Python interpreter.
As mentioned above, PyTorch is based on imperative programming and uses dynamic computation graphs. In order to be able to take advantage of the portability and efficiency of symbolic programming, developers think about whether the advantages of these two programming models can be combined, so torchscript is produced. torchscript allows users to develop and debug using purely imperative programming, while being able to convert most programs to symbolic for use when production-grade computing performance and deployment are required
summary
- Imperative programming makes the design of new models easier, because code can be written according to the control flow, and it has a relatively mature Python software ecosystem.
- Symbolic programming requires us to define and compile the program first, and then execute the program, which has the advantage of improving computing performance.
Sequence data is strongly correlated with time and space
How to predict? 1. Truncate 2. Keep past predictions
You can think that RPN is a small target detection algorithm so you can see that Faster R-CNN is two predictions
[External link picture transfer failed, the source site may have an anti-leeching mechanism, it is recommended to save the picture and upload it directly (img-5JQCHtAx-1682575606947)(null)]
Mask-RCNN requires pixel-level labels. Unmanned vehicles use Mask-RCNN more often.
SSD
[External link image transfer failed, the source site may have an anti-leeching mechanism, it is recommended to save the image and upload it directly (img-WriWZXlB-1682575599692)(…/…/Library/Application%20Support/typora-user-images/image-20230412095439937 .png)]
Core idea: I only do single stage once
ssd is a relatively early network and has not been updated. The author went home and became a civil servant.
YOLO
Just fast!
Quantitative change Target detection Surveillance camera ex: The real edge box has certain rules, too many are used and poorly written
How to extract the key frames for license plate recognition depends on the genre. I do this. The effect transformer can replace CNN as long as it is tuned well. I don’t have to say how to do it.
SSD implementation
v
Summary¶ _
- At multiple scales, we can generate anchor boxes of different sizes to detect objects of different sizes.
- By defining the shape of the feature map, we can decide the centers of uniformly sampled anchor boxes on any image.
- We use the information of the input image in a certain receptive field region to predict the anchor box category and offset on the input image that is close to the region.
- We can perform multi-scale object detection on hierarchical image representations at multiple levels through deep learning.