Fei-Fei Li's Computer Vision Course: A Study Summary (with links to classic papers)

Contents

Lesson 1 - Introduction

  1. What is computer vision? In short, the study of visual data
    1.1 Definition from Baidu Zhidao

    Computer vision uses computers and related equipment to simulate biological vision. Its main task is to process captured images or video in order to obtain three-dimensional information about the corresponding scene. Computer vision is both an engineering discipline and a challenging, important research area in science. It is an interdisciplinary subject that has attracted researchers from many fields, including computer science and engineering, signal processing, physics, applied mathematics and statistics, neurophysiology, and cognitive science.

  2. Why a computer vision course matters
  3. The massive volume of visual data (YouTube, for example)
  4. Instructors and related courses (CS131, etc.)

Lesson 2 - The history of vision

  1. The evolution of animal vision
  2. History of machine vision - the concept of target segmentation

    Block World: the visual world reduced to simple geometric shapes

  3. History of computer vision - classic papers and real-time face detection with AdaBoost

    "SIFT" & Object Recognition, David Lowe, 1999
    Spatial Pyramid Matching, Lazebnik, Schmid & Ponce, 2006
    Histogram of Gradients (HoG), Dalal & Triggs, 2005
    Deformable Part Model, Felzenszwalb, McAllester & Ramanan, 2009
    https://pan.baidu.com/s/1B06-0quirEwrxhdrwbgSbg (Baidu Cloud link)

  4. Two well-known datasets

    ImageNet / PASCAL VOC datasets (to address the problem of overfitting)

  5. Study guide

    a. Understand the conclusion of the cat vision experiments (electrophysiological studies using electrodes): visual processing begins with simple structures of the visual world

    b. Understand the concept of target segmentation (Note 1)

    c. Learn how the AdaBoost algorithm performs face detection in real time (Note 2)

    d. Learn the types and categories of images included in the ImageNet and PASCAL VOC datasets (Note 3)

Lesson 3 - The development of convolutional neural networks

  1. Image tasks

    Image classification, object detection, image captioning

  2. Models used in the ImageNet competition

    Traditional feature extraction -> SVM (Note 4)
    Neural networks (AlexNet in 2012 was the big breakthrough)
    The trend is toward more and more layers
    The first use of a CNN was by LeCun, for handwritten digit recognition

  3. Outlook: future directions of computer vision (open problems)
  4. Key points:
    The breakthrough event was AlexNet in 2012
    Conditions for the rapid development of neural networks: GPUs and data

Homework

  1. What are the main sources of image data? (Name a few.)

    ImageNet, PASCAL VOC, LabelMe, COCO, etc.
    https://blog.csdn.net/u012966194/article/details/79676516 (link)

  2. What can SIFT features be used for? What can the pyramid matching idea be used for? What can HOG features be used for?

    2.1 SIFT, the scale-invariant feature transform, is a descriptor used in image processing. It is a scale-invariant local feature descriptor that can detect keypoints in an image. SIFT is used to detect the local feature points of an image.
    https://baike.baidu.com/item/SIFT/1396275?fr=aladdin (SIFT on Baidu Baike)
    2.2 An image pyramid is a multi-resolution representation of an image. The original image is repeatedly downsampled to produce N images of different resolutions. The highest-resolution image sits at the bottom and the resolution decreases toward the top, like a pyramid; in the traditional form this continues until the top image contains only a single pixel. More broadly, the image pyramid is a way of working across different spatial scales, and it can also be used in optical flow, SLAM, accelerating model pose estimation, matching, and so on.
    2.3 Histogram of Oriented Gradients, abbreviated HOG, is a local image descriptor that is very commonly used in computer vision and pattern recognition. The name is straightforward: first compute the gradient values in different directions within a region of the image, then accumulate them into a histogram. That histogram represents the region, i.e. it serves as a feature that can be fed to a classifier. HOG is used for detection, mainly pedestrian detection, vehicle detection, tracking, etc.
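
The HOG steps in 2.3 (compute gradients, then accumulate them into per-region histograms) can be sketched roughly as follows. This is a simplified illustration, not the full Dalal & Triggs pipeline: it assumes a grayscale image, unsigned orientations in 9 bins, 8x8 cells, and a single global L2 normalization instead of overlapping block normalization. The name hog_descriptor and its parameters are made up for this sketch.

```python
import numpy as np

def hog_descriptor(image, cell_size=8, n_bins=9):
    """Simplified HOG: one magnitude-weighted orientation histogram per cell."""
    image = np.asarray(image, dtype=float)
    # Gradients along rows (y) and columns (x)
    gy, gx = np.gradient(image)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    bin_width = 180.0 / n_bins
    # Clip guards against an angle that rounds up to exactly 180
    bins = np.minimum((angle // bin_width).astype(int), n_bins - 1)

    rows, cols = image.shape
    histograms = []
    for r in range(0, rows - cell_size + 1, cell_size):
        for c in range(0, cols - cell_size + 1, cell_size):
            cell_bins = bins[r:r + cell_size, c:c + cell_size].ravel()
            cell_mag = magnitude[r:r + cell_size, c:c + cell_size].ravel()
            # Accumulate gradient magnitude into the orientation bins
            histograms.append(
                np.bincount(cell_bins, weights=cell_mag, minlength=n_bins)
            )
    feature = np.concatenate(histograms)
    norm = np.linalg.norm(feature)
    return feature / norm if norm > 0 else feature

# Usage: a 16x16 horizontal intensity ramp gives 4 cells x 9 bins = 36 values
ramp = np.tile(np.arange(16.0), (16, 1))
print(hog_descriptor(ramp).shape)
```

The resulting vector is the kind of feature that would then be passed to a classifier such as an SVM.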

  3. Neural networks have existed for a long time, so why have they taken off only recently? (Hint: consider data and hardware.)

    3.1 From the hardware perspective: computer hardware has improved by orders of magnitude
    3.2 From the data perspective: large-scale data, and the artificial intelligence methods it makes possible, have changed the technology dramatically

  4. What image tasks are there, and what kind of image problem does each solve? (e.g., image classification determines what object is in the picture.)

    Common image tasks include image segmentation, image classification (determining what objects are in the image), object detection (finding the position of an object in a given picture), pose detection, semantic segmentation (identifying what is present in the image and where), and instance segmentation (outlining each individual object at the pixel level). Together they address image problems at different scales and in different scenarios.

Notes

Note 1:
In video processing, what do target segmentation, target recognition, target detection, and target tracking each refer to?

  1. Target segmentation

    The English term is Target Segmentation. It is a form of data/image segmentation whose task is to split off the part that corresponds to a target. For ordinary optical images, segmenting pixels is the most common case: extracting the pixels that depict a known target. This segmentation can be posed as a classification problem, in which every pixel is labeled and the pixels carrying the label of interest are extracted. It can also be posed as a clustering problem, in which the labels are unknown but some optimality criterion must be met, such as minimizing the correlation between clusters. Targets can also be segmented in other kinds of data: in hyperspectral data the task may be to pick out the frequency channels that correspond to the target, and in a video stream the time interval that corresponds to it.
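
To make the clustering view concrete, here is a minimal sketch that groups pixels by intensity with k-means. k-means is swapped in purely for illustration; it minimizes within-cluster variance, which is only one possible optimality criterion and not necessarily the one the note has in mind. The function name kmeans_segment and the toy image are assumptions of this sketch.

```python
import numpy as np

def kmeans_segment(image, k=2, n_iters=20):
    """Cluster pixel intensities into k groups; returns (label map, centers)."""
    image = np.asarray(image, dtype=float)
    pixels = image.ravel()
    # Deterministic initialization: spread centers between min and max intensity
    centers = np.linspace(pixels.min(), pixels.max(), k)
    for _ in range(n_iters):
        # Assign each pixel to its nearest center
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of its assigned pixels
        for j in range(k):
            members = pixels[labels == j]
            if members.size:
                centers[j] = members.mean()
    # Final assignment against the final centers
    labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
    return labels.reshape(image.shape), centers

# Usage: a dark background with a bright 2x2 "target"
img = np.full((6, 6), 10.0)
img[2:4, 2:4] = 200.0
label_map, centers = kmeans_segment(img, k=2)
print(label_map)
```

No labels are given in advance here; the target pixels are separated only because they form their own intensity cluster.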

  2. Target recognition

    The English term is Target Recognition. This is a recognition problem based on classification: given all the data, decide which samples are the target and which are not. Taking images as the example again, the classification is often not at the pixel level but over given segments, defined objects, or the picture as a whole.

  3. Target detection

    The English term is Target Detection. Signal detection was probably first proposed and studied in depth by radar engineers. The simplest task is to pick out the information-bearing pattern from a signal that looks random and is full of interference and noise. As the simplest example, given a random radar echo you can set a threshold and, whenever the echo exceeds it, declare that a target has been detected, such as a large high-speed aircraft producing a strong return. How to design that threshold involves a trade-off between false alarms and missed detections. It is also often necessary to find the best transform domain in which to analyze the signal.
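
The threshold trade-off can be illustrated with a small sketch. The echo values and ground-truth flags below are made-up numbers for illustration, not real radar data, and the function name detection_rates is hypothetical.

```python
import numpy as np

def detection_rates(echo, target_present, threshold):
    """Return (false alarm rate, miss rate) for a simple threshold detector."""
    echo = np.asarray(echo, dtype=float)
    target_present = np.asarray(target_present, dtype=bool)
    detected = echo > threshold
    # False alarm: detector fires on a noise-only sample
    false_alarms = np.sum(detected & ~target_present)
    # Miss: detector stays silent although a target is present
    misses = np.sum(~detected & target_present)
    n_noise = np.sum(~target_present)
    n_target = np.sum(target_present)
    false_alarm_rate = false_alarms / n_noise if n_noise else 0.0
    miss_rate = misses / n_target if n_target else 0.0
    return false_alarm_rate, miss_rate

# Made-up echo strengths and whether a target was really there
echo = [0.2, 0.9, 1.4, 0.5, 2.1, 1.1, 0.3, 1.8]
present = [0, 0, 1, 0, 1, 0, 0, 1]
for threshold in (1.0, 1.5):
    print(threshold, detection_rates(echo, present, threshold))
```

On this toy data, the lower threshold catches every target but lets one noise sample through, while the higher threshold removes the false alarm at the cost of missing one target.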

  4. Target tracking

    The English term is Target Tracking. A key part of this task is target locating, and the task generally involves time-series (temporal) data. Typically, once the target has first been identified, the algorithm or system must quickly and efficiently relocate it in the subsequent data of the sequence. The task must distinguish between similar targets, avoid double counting, make good use of temporal correlation, and be robust to simple changes such as rotation, occlusion, scaling, and linear or nonlinear motion blur.

Note 1 Source: https://www.zhihu.com/question/36500536

Note 2:
How the AdaBoost algorithm performs face detection in real time
https://blog.csdn.net/guyuealian/article/details/70995333

Note 3:
Standard computer vision datasets: the PASCAL VOC dataset
https://blog.csdn.net/xingwei_09/article/details/79142558

Note 4:
Support Vector Machine (SVM)

A support vector machine (SVM) is a generalized linear classifier that performs binary classification of data by supervised learning. Its decision boundary is the maximum-margin hyperplane fitted to the training samples.
An SVM uses the hinge loss to compute the empirical risk and adds a regularization term to the objective to optimize the structural risk; it is a sparse and robust classifier [2]. Through the kernel method an SVM can perform non-linear classification, which makes it one of the common kernel learning methods.
The SVM was proposed in 1964 and developed rapidly after the 1990s, giving rise to a series of improved and extended algorithms. It has been applied to pattern recognition problems such as face recognition and text classification [5-6].

https://baike.baidu.com/item/%E6%94%AF%E6%8C%81%E5%90%91%E9%87%8F%E6%9C%BA/9683835?fromtitle=SVM&fromid=4385807&fr=aladdin
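
The objective described in Note 4 (hinge loss as the empirical risk plus a regularization term) can be sketched for the linear case as below. This is a minimal illustration trained by plain subgradient descent, not the solver used by any particular library; the function names, the learning rate, the regularization strength lam, and the toy data are all assumptions of this sketch, and kernels are not covered.

```python
import numpy as np

def hinge_objective(w, b, X, y, lam):
    """Mean hinge loss (empirical risk) plus an L2 regularization term."""
    margins = y * (X @ w + b)
    return np.mean(np.maximum(0.0, 1.0 - margins)) + 0.5 * lam * np.dot(w, w)

def train_linear_svm(X, y, lam=0.01, lr=0.1, n_epochs=200):
    """Fit a linear SVM by subgradient descent; labels y must be +1 or -1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0  # samples inside the margin or misclassified
        # Subgradient of the objective: regularizer plus active hinge terms
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Usage on a tiny linearly separable toy set
X = np.array([[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)
w, b = train_linear_svm(X, y)
print(np.sign(X @ w + b))
```

Only samples with margin below 1 contribute to the hinge subgradient, which reflects the sparsity mentioned in the note: points well outside the margin do not influence the boundary.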


Origin www.cnblogs.com/feng2019/p/11962177.html