Object Recognition and Detection Related Concepts

1. Clarification of several concepts:

1.  Target Segmentation: the task is to segment out the region of the image corresponding to the target.

This is a pixel-level foreground/background classification problem in which the background is removed.

Example: (taking the tracking of Xiao Ming in a video as an example, the processing steps are listed below)

The first step is target segmentation: capture the first frame of the video. Because the skin color of a human face is yellowish, the face can be segmented from the background by color features.
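As a rough sketch of the color-based segmentation described above, the following NumPy snippet thresholds the color channels of an RGB image to produce a foreground mask. The threshold values and the "reddish skin" heuristic are illustrative assumptions, not tuned constants from the original text.

```python
import numpy as np

def skin_color_mask(image, r_min=95, g_min=40, b_min=20):
    """Very rough skin-color segmentation by per-channel thresholds.

    `image` is an (H, W, 3) uint8 RGB array; the thresholds are
    illustrative assumptions, not tuned values.
    """
    r = image[..., 0].astype(int)
    g = image[..., 1].astype(int)
    b = image[..., 2].astype(int)
    # Heuristic: skin pixels are reddish, with R brighter than G and B.
    mask = (r > r_min) & (g > g_min) & (b > b_min) & (r > g) & (r > b)
    return mask

# Tiny demo: a 2x2 image with one "skin-like" pixel at (0, 0).
img = np.array([[[200, 120, 90], [10, 200, 10]],
                [[0, 0, 255], [30, 30, 30]]], dtype=np.uint8)
print(skin_color_mask(img))
```

In practice one would segment in a color space less sensitive to lighting (e.g., YCbCr or HSV) rather than raw RGB, but the pixel-level foreground/background classification idea is the same.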

2.  Target Detection:

Locate the target: determine its position and size, and detect whether the target is present at all.

Detection has a clear purpose: collect samples of what needs to be detected, train a model from them, and then match the model directly against the image, which is essentially a recognition process.

Example: The second step is target detection. The segmented image may contain not only faces but also other yellowish objects in some environments. At this point, all the faces in the image can be identified by certain shape features, located exactly, and their positions and extents determined.

3.  Target Recognition: characterize the target qualitatively, i.e., determine the specific category (pattern) of the target.

Example: The third step is target recognition. All faces in the image are compared with Xiao Ming's facial features to find the best match, thereby determining which one is Xiao Ming.

4.  Target Tracking: track the target's motion trajectory.

Example: The fourth step is target tracking. In each subsequent frame, there is no need to detect Xiao Ming over the whole image as in the first frame. Instead, a motion model can be built from Xiao Ming's trajectory, and the model used to predict his location in the next frame, improving tracking efficiency.

 

 2.  Target recognition

Reference blog: http://blog.csdn.net/liuheng0111/article/details/52348874

(1) The task of target recognition:

Identify what object is in the image and report its position and orientation in the scene the image represents. For a given picture, first determine whether a target is present. If there is no target, detection and recognition end there. If there is, further determine how many targets there are and where they are located, and then segment the targets to determine which pixels belong to them.

 

(2) The process of target recognition:
1. Object Recognition Framework

Object recognition generally includes the following stages: preprocessing, feature extraction, feature selection, modeling, matching, and localization. Current object recognition methods can be classified along two lines: model-based versus context-based methods, and two-dimensional versus three-dimensional object recognition. As evaluation criteria for object recognition methods, Grimson summarized the four criteria most researchers accept: robustness, correctness, efficiency, and scope.

 

2. Creation of training samples

The training samples include positive samples and negative samples. Positive samples are samples of the target to be detected (such as faces or cars), and negative samples are arbitrary pictures that contain no target (such as backgrounds). All samples are normalized to the same size (e.g., 20x20).
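The "normalize to the same size" step above can be sketched with a minimal nearest-neighbour resize; this is a stand-in for a library resizer (e.g., `cv2.resize`), not the method prescribed by the original text.

```python
import numpy as np

def resize_nearest(image, size=(20, 20)):
    """Nearest-neighbour resize so all training samples share one size.

    `image` is an (H, W) or (H, W, C) array; `size` is (new_h, new_w).
    """
    h, w = image.shape[:2]
    new_h, new_w = size
    rows = np.arange(new_h) * h // new_h     # source row for each output row
    cols = np.arange(new_w) * w // new_w     # source col for each output col
    return image[rows][:, cols]

sample = np.arange(40 * 60).reshape(40, 60)  # a fake 40x60 grayscale crop
normalized = resize_nearest(sample)
print(normalized.shape)   # (20, 20)
```

Every positive and negative crop would be passed through the same resize so the classifier always sees fixed-length feature vectors.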

 

3. Preprocessing

Preprocessing usually includes five basic operations:

(1) Coding: produce an effective description of the pattern, suitable for computer operations.

(2) Thresholding or filtering: select the components that are needed and suppress the others.

(3) Pattern improvement: eliminate or correct errors in the pattern, or remove unnecessary values.

(4) Normalization: adapt certain parameter values to a standard value, or a standard value range.

(5) Discrete pattern operations: operations specific to processing patterns in discrete form.
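Step (4) above, normalization to a standard value range, can be sketched as a min-max rescaling of pixel values to [0, 1]; the epsilon guard is my addition to avoid division by zero on constant images.

```python
import numpy as np

def normalize(image, eps=1e-8):
    """Min-max normalization to [0, 1] (preprocessing step (4) above).

    Maps arbitrary pixel ranges onto a standard range so later stages
    see comparable inputs regardless of exposure or encoding.
    """
    image = image.astype(np.float64)
    lo, hi = image.min(), image.max()
    return (image - lo) / (hi - lo + eps)

img = np.array([[10, 60], [110, 210]], dtype=np.uint8)
out = normalize(img)
print(out.min(), out.max())
```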


 4. Feature extraction

Generally, the space composed of the original data is called the measurement space, and the space on which classification and recognition are based is called the feature space, in which patterns are represented. Feature extraction is the first step of object recognition and an important part of any recognition method. Good image features give different objects better separation in the high-dimensional feature space, which effectively lightens the burden on the subsequent steps of the recognition algorithm, achieving twice the result with half the effort. The following are some commonly used feature extraction methods:

(1) Color features. Color features describe the surface properties of the scene corresponding to an image or image region. Commonly used color features include image patch features and color-channel histogram features.

(2) Texture features. Texture is usually defined as a local property of an image, or a measure of the relationships among pixels in a local region. An effective texture feature extraction method is based on the gray-level spatial dependence matrix, i.e., the co-occurrence matrix. Other methods include feature extraction based on the gray-level difference histogram and on the gray-level co-occurrence matrix.

(3) Shape features. Shape is one of the basic features for describing objects, and distinguishing objects by shape is very intuitive. Retrieving images by shape features can improve the accuracy and efficiency of retrieval, and shape feature analysis plays an important role in pattern recognition and visual inspection. In general, there are two types of shape-feature representations: shape contour features and shape region features. Shape contour features mainly include line-segment descriptions, spline-fitted curves, Fourier descriptors, interior-angle histograms, and Gaussian parameter curves. Shape region features mainly include shape invariant moments, region area, and shape aspect ratio.

(4) Spatial features. Spatial features refer to the mutual spatial positions or relative directional relationships among the multiple targets segmented from an image. These include relative position information (such as up, down, left, and right) as well as absolute position information. The basic idea of the common extraction methods is to segment the image and extract features first, and then index these features.

The more popular features are the Haar feature, LBP feature, HOG feature, and SIFT feature, among others; each has its own merits, and the choice depends on the kind of target you want to detect.
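To make the feature-extraction idea concrete, here is a heavily simplified HOG-style descriptor: a single gradient-orientation histogram over one patch. Real HOG divides the window into cells and normalizes over blocks; this sketch is my own reduction of that idea, not the full algorithm.

```python
import numpy as np

def orientation_histogram(patch, bins=9):
    """Simplified HOG-style descriptor: one gradient-orientation
    histogram for a whole patch, weighted by gradient magnitude.
    (Real HOG uses a grid of cells plus block normalization.)
    """
    patch = patch.astype(np.float64)
    gy, gx = np.gradient(patch)                  # image gradients
    mag = np.hypot(gx, gy)                       # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 180   # unsigned orientation
    hist, _ = np.histogram(ang, bins=bins, range=(0, 180), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# A vertical step edge: its gradients point horizontally (angle near 0).
patch = np.zeros((8, 8))
patch[:, 4:] = 1.0
h = orientation_histogram(patch)
print(np.argmax(h))   # 0 -> all the energy lands in the near-0-degree bin
```

A descriptor like this is what would be fed to the classifier in the training and matching steps below.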

 

5. Feature selection (optional step)

6. Modeling

7. Train the classifier

Training a classifier can be understood as the classifier (a "brain") observing (learning from) positive and negative samples until it gains the ability to detect the target (that is, to recognize the target in the future).
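As a toy illustration of this learn-from-positives-and-negatives step, the following trains a simple perceptron on made-up two-dimensional feature vectors. Real detectors of this era typically used AdaBoost over Haar features or an SVM over HOG features; the perceptron here is just the smallest self-contained stand-in.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Toy linear classifier trained on feature vectors X with
    labels y in {-1, +1} (positive vs. negative samples)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified -> nudge the boundary
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Fake features: positives cluster near (1, 1), negatives near (-1, -1).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1, 0.2, (20, 2)), rng.normal(-1, 0.2, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
w, b = train_perceptron(X, y)
preds = np.sign(X @ w + b)
print((preds == y).mean())   # training accuracy
```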

8. Matching

After obtaining the training result (often a set of parameter values in descriptive, generative, or discriminative models, or a set of acquired and stored features in other models), the next task is to use the current model to identify which class of object a new image contains and, if possible, to give the boundary separating the object from the rest of the image. Generally, once the model is determined, the matching algorithm follows naturally. In descriptive models, each class of object is usually modeled separately, and maximum likelihood or Bayesian inference then yields the category; generative models are roughly similar, except that the values of the latent variables are usually estimated first, or the latent variables are integrated out, a step that often incurs a huge computational cost; discriminative models are simpler, obtaining the result by feeding the feature values directly into the classifier.

The general matching process is as follows: a scanning sub-window shifts and slides continuously across the image to be detected. Each time the sub-window reaches a position, the features of that region are computed, and the trained classifier then filters those features to decide whether the region is a target. Because the size of a target in the image may differ from the sample size used to train the classifier, the scanning sub-window must also be enlarged or shrunk (or the image shrunk) and slid across the image again for further matching.
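The scan-and-rescale loop just described can be sketched as a sliding window over an image pyramid. The `dummy_classifier` below is a hypothetical stand-in for the trained classifier; the nearest-neighbour downsampling and the scale factor are my illustrative choices.

```python
import numpy as np

def sliding_windows(image, win=(20, 20), step=4):
    """Yield (row, col, patch) for every position of the scanning
    sub-window described above."""
    H, W = image.shape[:2]
    wh, ww = win
    for r in range(0, H - wh + 1, step):
        for c in range(0, W - ww + 1, step):
            yield r, c, image[r:r + wh, c:c + ww]

def pyramid(image, scale=0.8, min_size=20):
    """Repeatedly shrink the image so a fixed-size window can match
    targets larger than the training samples."""
    while min(image.shape[:2]) >= min_size:
        yield image
        new_h = int(image.shape[0] * scale)
        new_w = int(image.shape[1] * scale)
        rows = np.arange(new_h) * image.shape[0] // new_h
        cols = np.arange(new_w) * image.shape[1] // new_w
        image = image[rows][:, cols]          # nearest-neighbour downsample

def dummy_classifier(patch):
    """Hypothetical stand-in for the trained classifier: fires on
    bright patches."""
    return patch.mean() > 0.5

img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0                       # a bright "target"
hits = [(r, c) for level in pyramid(img)
        for r, c, p in sliding_windows(level) if dummy_classifier(p)]
print(len(hits) > 0)   # True: the window finds the bright region
```

In a real detector the overlapping hits would then be merged (e.g., by non-maximum suppression) into a single detection.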

9. Target recognition

Object recognition uses various matching algorithms to find, from the features extracted from the image, the best match in the object model library. Its input is the image and the model library of objects to be recognized; its output is the object's name, pose, position, and so on.

 

 

3. Target recognition using TensorFlow

Goal: use TensorFlow to build a CNN model for image classification based on a subset of the ImageNet database. Training the CNN requires using TensorFlow to process images and understanding how convolutional neural networks work.

The dataset is Stanford's Dogs Dataset (a subset of ImageNet), which contains images of dogs of different breeds together with their breed labels. The modeling goal is to accurately predict the breed of the dog contained in a given image.

Convolutional neural network (CNN): suited to dense inputs, where most input components are nonzero.

Filters / convolution kernels (the training objective is to adjust the kernel weights until they accurately match the training data).

CNN architecture: image input (image_batch) -> convolutional layer (tf.nn.conv2d) -> rectified linear unit / activation function (tf.nn.relu) -> pooling layer (tf.nn.max_pool) -> fully connected layer (tf.matmul(x, W) + b).

The activation function introduces nonlinearity into the network, allowing it to describe more complex patterns.

The pooling layer reduces overfitting and improves performance by shrinking the input, i.e., downsampling it.

Convolution

Input and convolution kernel:

Input tensor format: [batch_size, image_height, image_width, image_channels]

Strides:

The kernel skips a fixed number of elements of the input, reducing the dimensionality of the output. The strides parameter has the same format as the input tensor.

Boundary padding (padding=SAME or VALID): when the kernel does not fit the image size exactly, SAME zero-pads the input while VALID uses only positions where the kernel fits entirely inside the image.
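To show what the conv -> relu -> pool -> fully-connected pipeline above actually computes, here is a minimal NumPy sketch of a single forward pass (stride 1, VALID padding, one channel). It mirrors the roles of tf.nn.conv2d, tf.nn.relu, tf.nn.max_pool, and tf.matmul without depending on TensorFlow; the kernel and layer sizes are arbitrary illustrative choices.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2-D convolution (cross-correlation, as in tf.nn.conv2d) with
    stride 1 and VALID padding on a single-channel image."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)                  # tf.nn.relu

def max_pool(x, size=2):
    """size x size max pooling with matching stride (tf.nn.max_pool)."""
    H, W = x.shape
    H2, W2 = H // size, W // size
    return x[:H2 * size, :W2 * size].reshape(H2, size, W2, size).max(axis=(1, 3))

# Forward pass: conv -> relu -> pool -> fully connected (tf.matmul(x, W) + b).
image = np.random.default_rng(1).random((8, 8))
kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # a vertical-edge detector
features = max_pool(relu(conv2d_valid(image, kernel)))
x = features.reshape(1, -1)                    # flatten for the dense layer
W = np.zeros((x.shape[1], 10)); b = np.zeros(10)
logits = x @ W + b                             # 10 hypothetical class scores
print(features.shape, logits.shape)            # (3, 3) (1, 10)
```

Note the shape bookkeeping: an 8x8 input convolved with a 2x2 kernel under VALID padding gives 7x7, and 2x2 pooling then gives 3x3, which is exactly the dimensionality reduction the strides and pooling discussion above describes.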

 

Supplement: recurrent neural networks (RNN)
