CS231n Study Notes -- 11. Detection and Segmentation

1. Computer Vision Tasks


2. Semantic Segmentation

2.1 Characteristics:
a. Label each pixel in the image with a category label
b. Don’t differentiate instances, only care about pixels

2.2 Approaches:

a. Semantic Segmentation Idea: Sliding Window


b. Semantic Segmentation Idea: Fully Convolutional


2.3 Upsampling:

Max Unpooling


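This kind of upsampling works because the goal is not to produce a good-looking super-resolution image, but to preserve the spatial structure of the pixels as far as possible: max unpooling writes each value back to the position recorded by the corresponding max-pooling layer and fills the remaining positions with zeros.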
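As a rough illustration of the idea, here is a minimal pure-Python sketch of 2x2 max pooling that remembers where each maximum came from, followed by the matching unpooling step. The function names `max_pool_2x2` and `max_unpool_2x2` are made up for this note, and the sketch assumes a single-channel input with even height and width.

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2; also returns the (row, col) of each max."""
    pooled, indices = [], []
    for i in range(0, len(x), 2):
        pooled_row, index_row = [], []
        for j in range(0, len(x[0]), 2):
            # Candidates in this 2x2 window as (value, row, col).
            window = [(x[r][c], r, c) for r in (i, i + 1) for c in (j, j + 1)]
            value, r, c = max(window, key=lambda t: t[0])
            pooled_row.append(value)
            index_row.append((r, c))
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(y, indices, out_h, out_w):
    """Place each value of y at its remembered position; all other cells are 0."""
    out = [[0] * out_w for _ in range(out_h)]
    for y_row, index_row in zip(y, indices):
        for value, (r, c) in zip(y_row, index_row):
            out[r][c] = value
    return out


x = [[1, 2, 6, 3],
     [3, 5, 2, 1],
     [1, 2, 2, 1],
     [7, 3, 4, 8]]
pooled, indices = max_pool_2x2(x)
restored = max_unpool_2x2([[1, 2], [3, 4]], indices, 4, 4)
```

In a real network the values being unpooled come from a later layer, not from the pooling output itself; only the positions are shared between the paired pooling and unpooling layers.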

Transpose Convolution

How the algorithm works:


1D Example:
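A small sketch of a 1D transpose convolution may help here: each input element scales a copy of the filter, the copies are written into the output `stride` positions apart, and overlapping positions are summed. The function name `conv_transpose_1d` is chosen for this note, and the sketch assumes no padding or output cropping.

```python
def conv_transpose_1d(x, w, stride=2):
    """1D transpose convolution without padding.

    Output length is (len(x) - 1) * stride + len(w).
    """
    out = [0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for k, wk in enumerate(w):
            # Each input value contributes a scaled copy of the filter;
            # contributions that land on the same output cell are added.
            out[i * stride + k] += xi * wk
    return out


# Input [a, b] = [2, 3], filter [x, y, z] = [1, 10, 100], stride 2
# gives [ax, ay, az + bx, by, bz].
result = conv_transpose_1d([2, 3], [1, 10, 100], stride=2)
```

With a filter of length 3 and stride 2, only the middle output cell receives contributions from both inputs, which is the overlap that gets summed.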


3. Classification + Localization

Diagram:
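In the lecture, classification + localization is trained as a multitask problem: a softmax loss on the class scores plus an L2 loss on the box coordinates. The sketch below shows one way to combine the two for a single example. The function names and the `box_weight` hyperparameter are assumptions made for illustration, not something specified in these notes.

```python
import math


def softmax_cross_entropy(scores, label):
    """Softmax log loss for one example, computed in a numerically stable way."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]


def l2_box_loss(pred_box, true_box):
    """Sum of squared differences between predicted and true (x, y, w, h)."""
    return sum((p - t) ** 2 for p, t in zip(pred_box, true_box))


def multitask_loss(scores, label, pred_box, true_box, box_weight=1.0):
    """Classification loss plus weighted localization loss (weight is a hyperparameter)."""
    cls = softmax_cross_entropy(scores, label)
    loc = l2_box_loss(pred_box, true_box)
    return cls + box_weight * loc


loss = multitask_loss([0.0, 0.0], 0, (0, 0, 1, 1), (0, 0, 1, 2))
```

The relative weight between the two terms usually has to be tuned, since it changes the scale of the loss itself.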


Human Pose Estimation

Goal:


Diagram:


4. Object Detection as Classification

Problem with the sliding-window search:
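The usual concern is cost: the CNN has to be applied to a very large number of locations and scales. A back-of-the-envelope count, using a hypothetical helper `count_windows` and made-up image and window sizes, gives a feel for the numbers.

```python
def count_windows(img_h, img_w, win_h, win_w, stride=1):
    """Number of window positions that fit fully inside the image."""
    if win_h > img_h or win_w > img_w:
        return 0
    rows = (img_h - win_h) // stride + 1
    cols = (img_w - win_w) // stride + 1
    return rows * cols


# One window size on a 224x224 image, stride 1.
single_scale = count_windows(224, 224, 64, 64)

# Several square window sizes; different aspect ratios would add even more.
multi_scale = sum(count_windows(224, 224, s, s) for s in (32, 64, 128))
```

Each of these positions would need its own forward pass through the classifier, which is why the lecture moves on to region proposals.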


Region Proposals:


How R-CNN works:


R-CNN: Problems

  1. Ad hoc training objectives
    • Fine-tune network with softmax classifier (log loss)
    • Train post-hoc linear SVMs (hinge loss)
    • Train post-hoc bounding-box regressions (least squares)
  2. Training is slow (84h), takes a lot of disk space
  3. Inference (detection) is slow
    • 47s / image with VGG16 [Simonyan & Zisserman. ICLR15]
    • Fixed by SPP-net [He et al. ECCV14]

Fast R-CNN

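The RoIs are extracted after the feature map of the whole image has been computed, so the convolutional features are shared across regions and a large amount of repeated feature computation is avoided.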



Faster R-CNN: RoI Pooling
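RoI pooling, used in both Fast and Faster R-CNN, turns a region of arbitrary size on the feature map into a fixed-size grid by splitting it into bins and max-pooling each bin. The following is a minimal single-channel sketch. The name `roi_max_pool`, the half-open `(r0, c0, r1, c1)` RoI convention, and the floor/ceil bin edges are choices made for this note; real implementations differ in how they quantize the bins.

```python
import math


def roi_max_pool(feature, roi, out_h=2, out_w=2):
    """Max-pool one RoI of a 2D feature map into an out_h x out_w grid.

    feature: 2D list of numbers.
    roi: (r0, c0, r1, c1) in feature-map cells, end-exclusive.
    """
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    out = []
    for i in range(out_h):
        # Floor for the bin start and ceil for the bin end, so no bin is empty
        # (neighbouring bins may overlap when h is not divisible by out_h).
        rs = r0 + math.floor(i * h / out_h)
        re = r0 + math.ceil((i + 1) * h / out_h)
        row = []
        for j in range(out_w):
            cs = c0 + math.floor(j * w / out_w)
            ce = c0 + math.ceil((j + 1) * w / out_w)
            row.append(max(feature[r][c]
                           for r in range(rs, re)
                           for c in range(cs, ce)))
        out.append(row)
    return out


feature = [[1, 2, 3, 4],
           [5, 6, 7, 8],
           [9, 10, 11, 12],
           [13, 14, 15, 16]]
full = roi_max_pool(feature, (0, 0, 4, 4))
partial = roi_max_pool(feature, (1, 0, 4, 3))
```

Both RoIs end up as 2x2 outputs even though they cover 4x4 and 3x3 cells, which is what lets the fully connected layers after the pooling step accept regions of any size.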


A Region Proposal Network (RPN) is inserted after the convolutional layers to predict RoIs from the feature map:


Detection without Proposals: YOLO / SSD

Localization and classification are done together in a single pass over the image:
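In the lecture's description, the image is divided into a grid (7 x 7 in the example). Each cell predicts, for each of B base boxes, 5 numbers (dx, dy, dh, dw, confidence) plus scores for C classes, giving an output of size 7 x 7 x (5 * B + C). The sketch below only does this bookkeeping. The helper names and the memory layout of a cell vector (all boxes first, then class scores) are assumptions for illustration.

```python
def detection_output_shape(grid=7, boxes_per_cell=2, num_classes=20):
    """Shape of the single-pass output tensor: grid x grid x (5 * B + C)."""
    return (grid, grid, 5 * boxes_per_cell + num_classes)


def split_cell(cell, boxes_per_cell, num_classes):
    """Split one cell's vector into B boxes of 5 numbers and C class scores.

    Assumed layout: [box_0 (5), ..., box_{B-1} (5), class scores (C)].
    """
    if len(cell) != 5 * boxes_per_cell + num_classes:
        raise ValueError("cell vector has the wrong length")
    boxes = [tuple(cell[5 * b:5 * b + 5]) for b in range(boxes_per_cell)]
    class_scores = list(cell[5 * boxes_per_cell:])
    return boxes, class_scores


shape = detection_output_shape()
boxes, scores = split_cell(list(range(13)), boxes_per_cell=2, num_classes=3)
```

The default values B = 2 and C = 20 are example settings, not values taken from these notes.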


Object Detection: Lots of variables …


Aside: Object Detection + Captioning = Dense Captioning


Architecture:


Mask R-CNN

Adds a mask branch:



Mask R-CNN Also does pose


Results:



Reposted from blog.csdn.net/u012554092/article/details/78235775