Computer Vision - Final Review (Short Answer Questions)

1. The difference between computer vision and machine vision

Computer vision uses computers to realize human visual functions, that is, the perception, processing, and interpretation of three-dimensional scenes in the objective world; it focuses on the theory and methods of scene analysis and image interpretation. Machine vision pays more attention to acquiring images of the environment through visual sensors, building systems with visual perception, and implementing algorithms to detect and recognize objects.

2. The difference between Euclidean distance, city-block distance, and chessboard distance

Euclidean distance gives the most accurate result, but computing it requires squaring and square-root operations, so its computational cost is high. City-block distance and chessboard distance are non-Euclidean distances; they require no squaring or root extraction and are cheap to compute, but their results carry some error. All of these distances consider only the positions of the two pixels in the image, not their gray values.
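
A minimal Python sketch of the three distances (the function name and example points are only illustrative):

```python
import numpy as np

def pixel_distances(p, q):
    """Distances between two pixel coordinates p and q.

    Euclidean:  sqrt(dx^2 + dy^2)   most accurate, needs square/sqrt
    City-block: |dx| + |dy|         cheap, overestimates diagonal moves
    Chessboard: max(|dx|, |dy|)     cheap, underestimates diagonal moves
    """
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    return np.sqrt(dx ** 2 + dy ** 2), dx + dy, max(dx, dy)

# Example: (0, 0) vs (3, 4) -> (5.0, 7, 4)
print(pixel_distances((0, 0), (3, 4)))
```
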
3. The principle of the USAN judgment method

Each pixel within the circular template is compared with the gray level of the nucleus pixel at the center of the template; the pixels whose gray levels are similar to the nucleus form the USAN (Univalue Segment Assimilating Nucleus) area. The USAN area is largest when the circular template lies entirely inside a light area (target area) or entirely inside a dark area (background area). When only about 1/4 of the circular template shares the nucleus gray level (a corner point), the USAN is smallest. When about 1/2 of the template shares the nucleus gray level (an edge point), the USAN is half of the maximum value. When the USAN area is more than half of the template, the nucleus pixel lies in a gray-level-consistent region of the image.
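
A rough sketch of computing the USAN area at one pixel (the template radius and similarity threshold t are illustrative choices, and the pixel is assumed to lie at least radius away from the image border):

```python
import numpy as np

def usan_area(img, r, c, radius=3, t=27):
    """Count template pixels whose gray level is within t of the nucleus.
    Large area -> uniform region; ~half of max -> edge; smallest -> corner."""
    nucleus = float(img[r, c])
    area = 0
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if dr * dr + dc * dc <= radius * radius:      # circular template
                if abs(float(img[r + dr, c + dc]) - nucleus) <= t:
                    area += 1
    return area
```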

4. The basic principle of Hough transform

Using the duality between points and lines, a given curve in the original image space is mapped, via its parametric expression, to a point in the parameter space. The problem of detecting the given curve in the original image is thus transformed into the problem of finding a peak in the parameter space; that is, the detection of a global characteristic is transformed into the detection of a local characteristic. It applies to straight lines, circles, ellipses, arcs, and other parametric curves.

5. Principle of straight line detection

A straight line is determined by two points A and B (in Cartesian coordinates); writing y = kx + q as a function of (k, q) gives q = -kx + y. The transformed space is the Hough space; that is, a straight line in the Cartesian coordinate system corresponds to a point in Hough space, and vice versa. Steps: (1) Quantize the possible value ranges of the parameters p and q, construct an accumulator array A(pmin : pmax, qmin : qmax) according to the quantization, and initialize it to zero. (2) For each given point (x, y) in the XY space, let p take all its possible values, compute q = -px + y, and accumulate A according to the resulting values of p and q, i.e., A(p, q) = A(p, q) + 1. (3) The values of p and q corresponding to the maximum of A after accumulation determine a straight line y = px + q in the XY space.
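
A minimal accumulator sketch of these steps in Python (the parameter ranges and quantization counts are arbitrary illustrative values):

```python
import numpy as np

def hough_line_slope_intercept(points, p_range=(-5, 5), q_range=(-100, 100),
                               n_p=101, n_q=201):
    """Vote for lines y = p*x + q through the given points.
    For each point (x, y) and each quantized slope p, the intercept
    q = -p*x + y selects the accumulator cell to increment."""
    ps = np.linspace(p_range[0], p_range[1], n_p)
    qs = np.linspace(q_range[0], q_range[1], n_q)
    A = np.zeros((n_p, n_q), dtype=int)                 # step (1)
    for x, y in points:                                 # step (2)
        for i, p in enumerate(ps):
            q = -p * x + y
            j = int(round((q - q_range[0]) / (q_range[1] - q_range[0]) * (n_q - 1)))
            if 0 <= j < n_q:
                A[i, j] += 1
    i, j = np.unravel_index(A.argmax(), A.shape)        # step (3)
    return ps[i], qs[j]

# Collinear points on y = 2x + 1 should recover (p, q) close to (2, 1)
pts = [(x, 2 * x + 1) for x in range(10)]
print(hough_line_slope_intercept(pts))
```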

6. Position histogram detection principle (horizontal histogram)

Position histogram: the histogram obtained by projecting the image onto one or more axes and summing the gray levels of the pixels. Assuming there are several separate targets in the image, projecting and accumulating each target in the horizontal direction and the vertical direction yields a horizontal histogram and a vertical histogram, respectively. By back-projecting the two histograms onto the target regions, the position of each target can be determined. In practice, the targets in the image can be detected from the two histogram distributions.
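
For instance, a small NumPy sketch of the two projections on a binary target image (the toy image is made up for the example):

```python
import numpy as np

def position_histograms(binary):
    """Project a binary target image onto the two axes.
    Rows/columns with nonzero sums indicate where targets lie;
    back-projecting both histograms localizes each target."""
    h_hist = binary.sum(axis=1)  # horizontal histogram: one sum per row
    v_hist = binary.sum(axis=0)  # vertical histogram: one sum per column
    return h_hist, v_hist

img = np.zeros((6, 8), dtype=int)
img[1:3, 1:4] = 1   # first target
img[4:6, 5:8] = 1   # second target
print(position_histograms(img))
```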

7. Three factors of the most stable region

Most stable region: starting from a target seed, a sequence of nested surrounding regions is generated using a region-growing strategy; a region is most stable when the gap between successive regions is large, their areas are close, and the region has a certain minimum size. Factors: (1) whether there is high contrast between this region and its outer surroundings; (2) whether the interior of this region has only low contrast (i.e., is very smooth); (3) whether the area of this region is not too small, since otherwise noise affects it too much.
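
These factors are embodied in practice by MSER-style detectors; a minimal sketch using OpenCV's MSER (the input file name is a placeholder, and default stability parameters are assumed):

```python
import cv2

# Hypothetical grayscale input; the detected regions satisfy the three
# stability factors: outer contrast, inner smoothness, minimum area.
gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
mser = cv2.MSER_create()
regions, boxes = mser.detectRegions(gray)
print(len(regions), "stable regions found")
```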

8. The purpose of target segmentation, the difference between semantic segmentation and instance segmentation

Target segmentation refers to separating and extracting the target region of interest from an image, i.e., image segmentation. In computer vision, image segmentation is the process of subdividing a digital image into multiple sub-regions (sets of pixels), in order to simplify or change the representation of the image and make it easier to understand and analyze. Image segmentation is often used to locate objects and boundaries (lines, curves, etc.) in an image; it is a process of labeling each pixel so that pixels with the same label share certain visual characteristics. Semantic segmentation assigns a class label to every pixel and treats all objects of the same class as one region, while instance segmentation additionally distinguishes the individual object instances within the same class.

9. The steps and principle of the active contour model

A target segmentation method based on edge information. Main principle: an energy functional is constructed and, driven by the minimization of this energy function, the contour curve gradually approaches the edge of the object to be detected until the target is segmented. First, an initial curve is created in the image; its shape is not restricted, but the contour of the target object must lie inside it. Then an "energy equation" is established, consisting of an "internal energy" that regularizes the shape of the curve and an "external energy" that measures how close the curve is to the contour of the target object. During the computation, minimizing the internal energy makes the curve keep shrinking inward while remaining smooth; minimizing the external energy makes the curve keep approaching the contour of the target object until they coincide.
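
A small snake example, assuming scikit-image's active_contour API; alpha and beta weight the internal energy (elasticity and smoothness), while the image term supplies the external energy:

```python
import numpy as np
from skimage import data, filters
from skimage.segmentation import active_contour

img = filters.gaussian(data.astronaut()[..., 0], sigma=3)  # smoothed gray image
theta = np.linspace(0, 2 * np.pi, 400)
init = np.column_stack([100 + 100 * np.sin(theta),   # initial circle (rows, cols)
                        220 + 100 * np.cos(theta)])  # must enclose the target
snake = active_contour(img, init, alpha=0.015, beta=10, gamma=0.001)
print(snake.shape)   # (400, 2): contour points after energy minimization
```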

11. Mean shift to determine the cluster center

First, randomly select an initial region of interest (an initial point) and determine its center of gravity. Next, search around this region for a region of interest with higher point density and determine its center of gravity, then move the window to the position determined by the new center of gravity; that is, the window is shifted by the displacement vector between the original center of gravity and the new one. Repeat this process, continuously shifting the mean, until convergence. The final center-of-gravity position corresponds to a local density maximum, i.e., a mode of the local probability density function.
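
A bare-bones mean-shift iteration on 2-D points (bandwidth, tolerance, and the sample data are illustrative):

```python
import numpy as np

def mean_shift_mode(points, start, bandwidth=1.0, tol=1e-4, max_iter=100):
    """Shift a window center toward the local density maximum (mode).
    Each step replaces the center with the mean (center of gravity)
    of the points inside the current window, until convergence."""
    center = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        in_window = points[np.linalg.norm(points - center, axis=1) <= bandwidth]
        if len(in_window) == 0:
            break
        new_center = in_window.mean(axis=0)
        if np.linalg.norm(new_center - center) < tol:
            break
        center = new_center
    return center  # converged mode of the local density

rng = np.random.default_rng(0)
cluster = rng.normal(loc=[5, 5], scale=0.5, size=(200, 2))
print(mean_shift_mode(cluster, start=[4, 4], bandwidth=2.0))  # near [5, 5]
```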

13. The difference between target expression and description

Expression and description of the target: realize the qualitative or quantitative representation and description of the geometric properties of the targets of interest in the regions obtained by image segmentation. The expression of a target focuses on its data structure; the description of a target focuses on the regional characteristics of the target and on the connections and differences between different regions.

14. LBP: meaning, detection process, and core idea

Idea: use the gray value of the central pixel as a threshold, compare it with its neighbors, and obtain a binary code that represents the local texture feature.

Process: for each neighbor of a pixel, determine a bit value of 1 or 0 by comparing the neighbor with the center pixel (neighbor value >= center value gives 1, otherwise 0); integrate the neighborhood information of each pixel into a binary code formed from the bits of its neighbors; then compute a histogram over the codes of all pixels in a block to obtain the LBP feature.
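
A basic 3x3 LBP sketch in NumPy (the bit order is a convention; any fixed clockwise order works):

```python
import numpy as np

def lbp_image(gray):
    """Basic 3x3 LBP: threshold the 8 neighbors against the center pixel
    (neighbor >= center -> 1, else 0) and pack the bits into one byte."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # Neighbor offsets in clockwise order, each assigned one bit weight
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = gray[1:h - 1, 1:w - 1]
    for bit, (dr, dc) in enumerate(offsets):
        neighbor = gray[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        out |= (neighbor >= center).astype(np.uint8) << bit
    # Histogram of the codes over the block = the LBP texture feature
    hist = np.bincount(out.ravel(), minlength=256)
    return out, hist
```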

15. Principle of Pattern Classification

For an unknown pattern x, substitute it into all decision functions; if di(x) yields the largest value, then x belongs to the i-th class. If, for some x, di(x) = dj(x), we obtain the decision boundary separating class i from class j: dij(x) = di(x) - dj(x) = 0. If dij(x) > 0, the pattern belongs to class si; otherwise, it belongs to class sj. Classification maps the data set to given categories by constructing a classification function or classification model; that is, samples with given features are assigned to classes by a classifier.

Get information-->preprocessing-->feature extraction-->build classifier-->decision output
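
A toy illustration of the maximum-decision-function rule (the two linear decision functions are made up for the example):

```python
import numpy as np

def classify(x, d_funcs):
    """Assign pattern x to the class whose decision function is largest.
    The boundary between classes i and j is where d_i(x) - d_j(x) = 0."""
    scores = [d(x) for d in d_funcs]
    return int(np.argmax(scores))

# Two hypothetical linear decision functions d_i(x) = w_i . x + b_i
d = [lambda x: np.dot([1.0, 0.0], x) - 1.0,   # class 0
     lambda x: np.dot([0.0, 1.0], x) - 1.0]   # class 1
print(classify(np.array([3.0, 0.5]), d))      # d0 = 2.0 > d1 = -0.5 -> 0
```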

Motion detection using image difference: in an image sequence, the difference between two successive frames is computed directly by pixel-by-pixel comparison; assuming the lighting conditions remain essentially unchanged across the frames, the difference between the images can be attributed to motion.

Frame difference method

Basic idea: extract the contour of the moving target by performing difference operations on two (or more) frames in the video sequence.

Principle: (1) Compute the absolute value of the difference between corresponding pixels (gray-value difference) in the two frames to obtain the frame difference image.

(2) Binarize the frame difference image: if the gray-value difference of a pixel between the two frames is greater than the set threshold, the pixel is judged as foreground (moving target), i.e., it changed between the two moments; if the difference is less than the threshold, the pixel is judged as background, i.e., no change occurred. All the pixels judged to belong to the moving target form a relatively complete target on the current observation frame, giving its shape, size, and location.
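
A minimal OpenCV frame-difference loop following these two steps (the file name and threshold T are placeholders):

```python
import cv2

cap = cv2.VideoCapture("video.mp4")   # hypothetical input video
ok, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
T = 30                                # illustrative threshold
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev)                           # step (1)
    _, fg = cv2.threshold(diff, T, 255, cv2.THRESH_BINARY)   # step (2)
    prev = gray          # fg: 255 = moving target, 0 = background
cap.release()
```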

Keyframe extraction

Principle: the difference between two frames is computed to obtain the average pixel intensity of the difference image, which measures how much the two frames change. Based on the average inter-frame difference intensity, whenever a frame differs greatly in content from the previous frame, it is considered a key frame and extracted.

Process: read the video, compute the inter-frame difference between each pair of consecutive frames in turn, and obtain the average inter-frame difference intensity; then extract key frames (by ranking the difference intensities, by a difference-intensity threshold, or by taking local maxima of the difference intensity).
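
A sketch of the threshold strategy on a list of grayscale frames (the threshold value is a placeholder; ranking or local-maxima selection would replace the comparison):

```python
import numpy as np

def keyframes_by_threshold(frames, thresh):
    """Mark frame t as a key frame when the mean absolute difference
    from frame t-1 exceeds thresh (difference-intensity threshold)."""
    keys = [0]  # keep the first frame by convention
    for t in range(1, len(frames)):
        diff = np.abs(frames[t].astype(float) - frames[t - 1].astype(float))
        if diff.mean() > thresh:
            keys.append(t)
    return keys
```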

Background subtraction method (a general method for motion segmentation of static scenes)

The currently acquired image frame and the background image are differenced to obtain a grayscale image of the target motion area, and this grayscale image is thresholded to extract the motion area.

To avoid the influence of changes in ambient light, the background image is updated according to the currently acquired frame. Because it is disturbed by scene changes, illumination, weather, and other external factors, the background model needs to be updated regularly, in real time, or according to certain rules; the update strategy of the background model is therefore one of the key links of background subtraction.

Principle: (1) Establish a background model image for the video sequence. (2) Compute the pixel gray-value difference between the current frame and the corresponding pixels of the background model image. (3) Binarize the difference image: if the gray-value difference of a pixel at corresponding positions in the two images is greater than the set threshold, the pixel is judged as foreground (moving object); if it is smaller than the threshold, the pixel is judged as background. All pixels judged as foreground constitute relatively complete target shape and position information on the current (observed) frame. (4) Update the background model according to certain rules.
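
A hedged sketch of the four steps with a running-average background model (the file name, threshold T, and learning rate alpha are placeholders):

```python
import cv2

cap = cv2.VideoCapture("video.mp4")   # hypothetical input video
ok, frame = cap.read()
bg = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype("float32")   # step (1)
T, alpha = 25, 0.05
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, cv2.convertScaleAbs(bg))            # step (2)
    _, fg = cv2.threshold(diff, T, 255, cv2.THRESH_BINARY)       # step (3)
    cv2.accumulateWeighted(gray, bg, alpha)                      # step (4)
cap.release()
```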

Background modeling

1) Single frame extraction method

A method that directly extracts a certain frame from the video sequence as the background model image B(x, y). It is typically used when that frame can serve as a background reference image over a certain period, so that temporarily appearing moving objects can be detected; it is mostly used in scenes where the background does not change for a period of time.

2) Multi-frame statistical averaging method

The statistical averaging method takes consecutive multi-frame images from the video stream and obtains a new image as the background model image B(x, y) by averaging the gray values of each pixel across the frames. It assumes that, although some background points are occasionally occluded by foreground objects, most of the time the background part of the image can be considered constant or only gradually changing.
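
For example, a one-line per-pixel average over a frame stack:

```python
import numpy as np

def mean_background(frames):
    """Background model: per-pixel average gray level over N frames,
    assuming the background is constant or only slowly varying."""
    return np.mean(np.stack(frames, axis=0), axis=0).astype(np.uint8)
```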

3) Median method

Take consecutive multi-frame images from the video stream, sort the gray values of the pixels at the same position across the frames, and take the median as the gray value of the corresponding pixel in the background image; that is, the gray value of each pixel of the background image is determined by the median of the gray values of the corresponding pixel across the image sequence.
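
And the median variant, which is more robust to brief occlusions by foreground objects:

```python
import numpy as np

def median_background(frames):
    """Background model: per-pixel median over a stack of N frames.
    A pixel briefly occluded by a moving object keeps its background
    gray level, since the occluded values are outliers to the median."""
    stack = np.stack(frames, axis=0)          # shape (N, H, W)
    return np.median(stack, axis=0).astype(np.uint8)
```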

4) Model-based approach

Model-based methods are divided into the single-modal Gaussian background model method and the multi-modal (mixture-of-Gaussians) background model method.
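
For the multi-modal case, OpenCV ships a mixture-of-Gaussians subtractor; a minimal usage sketch (the video file name and parameter values are placeholders):

```python
import cv2

cap = cv2.VideoCapture("video.mp4")   # hypothetical input video
mog2 = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = mog2.apply(frame)   # 255 = foreground, 0 = background
cap.release()
```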
