python-opencv study notes (7): sliding window and image pyramid

introduction

This section is based on an experiment I did in the laboratory building. Combining it with OpenCV's official examples and explanations of these two methods, I want to write up some notes of my own, which can also serve as a more systematic study of how YOLO operates.

sliding window

In object detection, a sliding window is used to locate the target (an object, an animal, etc.) in the picture. In computer vision, a sliding window is a rectangular box that slides over the picture from left to right and from top to bottom, so that every region of the picture is extracted in turn. The figure below is an example of a sliding window: a green rectangle slides over the picture from left to right and from top to bottom. For each region the box passes over, we use a classifier to determine whether there is an object in that region.

insert image description here

The code for the sliding window is relatively simple. First, import the required packages:

import cv2
from matplotlib import pyplot as plt
from IPython import display
%matplotlib inline

Then we define a function sliding_window to get the sliding window. This function has three parameters: image, window and step.

  • The first parameter, image, is the input image over which we slide the rectangle.
  • The second parameter, window, is a tuple giving the width and height of the sliding rectangle.
  • The third parameter, step, is how many pixels the rectangle moves each time; we call it the step size.

In the figure below, we use three colors to show the same rectangle at different positions. In the figure, n means that the rectangle moves n pixels each time. A step that is too small or too large has a negative impact on detection: too small a step produces many overlapping windows and slows detection down, while too large a step can skip over the target. Generally, this value is set between 4 and 8.

So the code is:

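def sliding_window(image, window, step):
    # window is (width, height); image.shape is (rows, cols, channels)
    for y in range(0, image.shape[0] - window[1], step):
        for x in range(0, image.shape[1] - window[0], step):
            # yield the top-left corner and the region under the window
            yield (x, y, image[y:y + window[1], x:x + window[0]])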

To slide the rectangle over the picture, we need to know its position after each move. We use two for loops to get all the coordinate positions of the rectangle. The first for loop moves the rectangle up and down the picture in steps of step, and the second for loop moves it left and right in steps of step. Finally, the generator returns a tuple through yield: the first element x and the second element y are the coordinates of the upper left corner of the rectangle, and the third element image[y:y + window[1], x:x + window[0]] is the region of the picture covered by the rectangle at that position.
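
To see what the generator yields, here is a minimal sketch that runs the same function on a small synthetic array instead of a photo. The 6x8 array, the 4x4 window and the step of 2 are made-up values chosen only so the result is easy to check by hand.

```python
import numpy as np

def sliding_window(image, window, step):
    # window is (width, height); image.shape is (rows, cols, ...)
    for y in range(0, image.shape[0] - window[1], step):
        for x in range(0, image.shape[1] - window[0], step):
            yield (x, y, image[y:y + window[1], x:x + window[0]])

# Synthetic single-channel "image": 6 rows (height) by 8 columns (width).
image = np.arange(6 * 8).reshape(6, 8)

# Collect the top-left corners and the shapes of all yielded windows.
positions = [(x, y) for (x, y, patch) in sliding_window(image, (4, 4), 2)]
shapes = {patch.shape for (_, _, patch) in sliding_window(image, (4, 4), 2)}

print(positions)  # [(0, 0), (2, 0)]
print(shapes)     # {(4, 4)}
```

Note that the window flush with the right edge (x = 4) is not visited here, because range excludes its stop value.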

Next we use the cv2.imread function to read the picture; pets.jpg is the name of the picture to be read. Then we define the width window_w and the height window_h of the sliding window, both 400 pixels.

image = cv2.imread("pets.jpg")
(window_w, window_h) = (400, 400)

Next we use a for loop to iterate over each sliding window. We need to pass three arguments to sliding_window.

  • The first argument, image, is the image we read.
  • The second argument, (window_w, window_h), is the width and height of the sliding window.
  • The third argument, 200, means the sliding window moves 200 pixels each time (note that the window size and the step size are set large here for the convenience of demonstration; in actual use it is not recommended to set them too large or too small).

Then we use an if statement to check whether the window we obtained has the size we set. If either dimension of the region cut out by the sliding window differs from (window_w, window_h), we execute continue to skip that window.

for (x, y, window) in sliding_window(image, (window_w, window_h), 200):
    # window.shape is (height, width, channels); skip windows of the wrong size
    if window.shape[0] != window_h or window.shape[1] != window_w:
        continue

    clone = image.copy()
    cv2.rectangle(clone, (x, y), (x + window_w, y + window_h), (0, 255, 0), 2)
    clone = clone[:,:,::-1]  # BGR -> RGB for Matplotlib
    plt.imshow(clone)
    plt.pause(0.1)
    display.clear_output(wait=True)

Inside the loop we draw each sliding window on the image. We first make a copy with the copy function, because the drawing operation modifies the image it draws on, and then we use cv2.rectangle on clone to draw the sliding window. Because OpenCV stores images in B, G, R (blue, green, red) channel order while Matplotlib expects R, G, B, we use the slice clone[:,:,::-1] to reverse the channel order, and then use plt.imshow to render the result in the page.
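
As a small illustration of the channel reversal, the sketch below applies the same slice to a hand-made 1x2 array. The two pixel values are made up for the example; only NumPy is needed.

```python
import numpy as np

# A 1x2 "image" in OpenCV's B, G, R order: one pure-blue and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reverse the last axis (the channel axis) to get R, G, B order.
rgb = bgr[:, :, ::-1]

print(rgb[0, 0].tolist())  # blue pixel as [R, G, B]: [0, 0, 255]
print(rgb[0, 1].tolist())  # red pixel as [R, G, B]: [255, 0, 0]
```

cv2.cvtColor(clone, cv2.COLOR_BGR2RGB) performs the same conversion if you prefer an explicit call.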

Because we draw multiple pictures in the for loop, we use plt.pause(0.1) to pause on each picture for a while; the argument 0.1 means pausing for 0.1 seconds. Finally, we use display's clear_output(wait=True) method to clear the displayed image in preparation for the next one.

In PyCharm, you can see the animation, with the corresponding group of images for each frame on the right:
insert image description here

image pyramid

Simply put, an image pyramid is a representation of one image at multiple sizes. As shown in the figure below, the leftmost image is the original, and the image shrinks from left to right until its size reaches a threshold. This threshold is the minimum size the image may reach after repeated reduction. A stack of images like this, whose size gradually increases or decreases, is called an image pyramid, and each image of a different size is called a layer of the pyramid. The purpose of the image pyramid is to find targets of different sizes (objects, animals, etc.) that appear in the picture.

insert image description here

There are generally two types of image pyramids:

  • Gaussian pyramid: used for downsampling; it is the main image pyramid.
  • Laplacian pyramid: used to reconstruct the unsampled upper-layer image from a lower-layer image of the pyramid. In digital image processing it is also the prediction residual, which allows the image to be restored to the greatest extent, and it is used together with the Gaussian pyramid.
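
The "prediction residual" idea can be sketched in a few lines. This is a simplified illustration, not OpenCV's implementation: the hypothetical helpers down and up use 2x2 averaging and pixel repetition in place of the Gaussian filtering that cv2.pyrDown and cv2.pyrUp perform, and the 8x8 input is random data.

```python
import numpy as np

def down(img):
    # Simplified stand-in for downsampling: average each 2x2 block.
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(img):
    # Simplified stand-in for upsampling: repeat each pixel 2x2.
    return img.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)
g0 = rng.random((8, 8))        # full-resolution layer
g1 = down(g0)                  # half-resolution layer

residual = g0 - up(g1)         # Laplacian-style layer: the detail lost by downsampling
restored = up(g1) + residual   # adding the residual back recovers g0

print(g1.shape)                    # (4, 4)
print(np.allclose(restored, g0))   # True
```

Storing the small layer together with the residual is what lets the full-resolution image be restored.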

OpenCV wraps this in two API functions, cv2.pyrUp() and cv2.pyrDown(), which we can use to generate images like the ones above:
insert image description here

The code used is as follows:

img = cv2.imread('pets.jpg')
up_img = cv2.pyrUp(img)  # upsampling
img_1 = cv2.pyrDown(img)  # downsampling
img_2 = cv2.pyrDown(img_1)
# cv2.imshow('up_img', up_img)
cv2.imshow('img', img)
cv2.imshow('img_1', img_1)
cv2.imshow('img_2', img_2)
cv2.waitKey(0)
cv2.destroyAllWindows()
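
To get a feel for the sizes involved, the sketch below computes the default output shapes without loading an image. By default cv2.pyrDown halves each side, rounding up, and cv2.pyrUp doubles each side. The helper names and the 375x500 input size are assumptions made for illustration.

```python
def pyr_down_shape(height, width):
    # Default output size of cv2.pyrDown: each side halved, rounding up.
    return ((height + 1) // 2, (width + 1) // 2)

def pyr_up_shape(height, width):
    # Default output size of cv2.pyrUp: each side doubled.
    return (height * 2, width * 2)

img_shape = (375, 500)                       # hypothetical (height, width)
img_1_shape = pyr_down_shape(*img_shape)     # after one pyrDown
img_2_shape = pyr_down_shape(*img_1_shape)   # after two pyrDown calls
up_shape = pyr_up_shape(*img_shape)          # after one pyrUp

print(img_1_shape, img_2_shape, up_shape)  # (188, 250) (94, 125) (750, 1000)
```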

To show the principle of the pyramid, we can write a pyramid function that generates the image pyramid. This function has three parameters, as shown below.

  • The first parameter, image, is the original image on which to build the image pyramid.
  • The second parameter, top, is the minimum size to which the image will be reduced. We give it a default value of (128, 128): the first 128 is the height of the image, and the second 128 is the width.
  • The third parameter, ratio, is the factor by which the image is reduced each time. We give it a default value of 1.2.

def pyramid(image, top = (128, 128), ratio = 1.2):
    yield image
    
    while True:
        (w, h) = (int(image.shape[1] / ratio), int(image.shape[0] / ratio))
        image = cv2.resize(image, (w, h), interpolation = cv2.INTER_AREA)
        
        if w < top[1] or h < top[0]:
            break
        
        yield image

Inside the function we first use yield to return the original image, because the bottom of the image pyramid is the original image. Then we use a while loop to keep shrinking the image until the reduced image is smaller than the top parameter. Inside the loop, (w, h) is the width and height of the previous layer reduced by a factor of ratio. We use cv2.resize to scale the previous layer, passing (w, h) as the second argument of the function to give the width and height of the scaled image.

As the image keeps shrinking, we use an if statement to check whether it has reached the minimum size: we compare the width and height after each scaling with the minimum size top, and if either is smaller, we use break to end the loop. Otherwise, we use yield to return the scaled image. At this point, the image pyramid function is complete.
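
The stopping rule is easier to follow with concrete numbers. The sketch below mirrors the loop of pyramid but tracks only the sizes, so it runs without an image. The helper name pyramid_sizes and the 1000x800 starting size are assumptions for illustration.

```python
def pyramid_sizes(width, height, top=(128, 128), ratio=1.2):
    # Mirrors the loop in pyramid(), but tracks only (width, height).
    sizes = [(width, height)]
    while True:
        width, height = int(width / ratio), int(height / ratio)
        # top is (min_height, min_width), as in pyramid()
        if width < top[1] or height < top[0]:
            break
        sizes.append((width, height))
    return sizes

sizes = pyramid_sizes(1000, 800, ratio=1.5)
print(sizes)
# [(1000, 800), (666, 533), (444, 355), (296, 236), (197, 157)]
```

The next layer would be 131x104, and its height is below 128, so the loop stops after five layers.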

for i in pyramid(image, ratio = 1.5):
    i = i[:,:,::-1]
    plt.imshow(i)
    plt.pause(0.3) 
    display.clear_output(wait=True)

insert image description here
You can see that the coordinate axes change from frame to frame, and the image gets blurrier because of downsampling. If you want a group image like the one above, you can collect a copy of each resized image in a container and display them after the for loop. I won't demonstrate that here.

Image pyramid combined with sliding window

In traditional object detection, the image pyramid and the sliding window are combined to detect objects at different positions and of different sizes in the picture. With the sliding window alone, the size of the rectangle is fixed, so if the target is too large or too small relative to the rectangle, we cannot detect it. We can solve this by running the sliding window on each layer of the image pyramid. On the left of the picture below, the rectangle cannot enclose the whole dog and covers only part of its face. On the right, with the image pyramid and sliding window combined, the size of the rectangle does not change, but in the reduced image the dog is completely enclosed by it.

So, with sliding window + image pyramid, the size of the rectangle stays the same, but as the picture keeps shrinking, the rectangle gradually wraps the whole target.

for i in pyramid(image, ratio = 1.5):
    for (x, y, window) in sliding_window(i, (window_w, window_h), 100):
        # window.shape is (height, width, channels); skip windows of the wrong size
        if window.shape[0] != window_h or window.shape[1] != window_w:
            continue

        clone = i.copy()
        cv2.rectangle(clone, (x, y), (x + window_w, y + window_h), (0, 255, 0), 2)
        clone = clone[:,:,::-1]  # BGR -> RGB for Matplotlib
        plt.imshow(clone)
        plt.pause(0.01)
        display.clear_output(wait=True)

insert image description here
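
A window found on a reduced layer usually has to be reported in the coordinates of the original picture. The notes above do not cover this step, so the following is only a sketch under one assumption: both axes are scaled by the same factor, taken as the ratio of the original width to the layer width. The helper name and the numbers are hypothetical.

```python
def to_original_box(x, y, window_w, window_h, layer_width, original_width):
    # Scale factor between this pyramid layer and the original image.
    scale = original_width / float(layer_width)
    # Return (x1, y1, x2, y2) in original-image pixels.
    return (int(x * scale), int(y * scale),
            int((x + window_w) * scale), int((y + window_h) * scale))

# Hypothetical case: a 400x400 window at (100, 0) in a layer 666 px wide,
# taken from an original image 1000 px wide.
box = to_original_box(100, 0, 400, 400, 666, 1000)
print(box)  # (150, 0, 750, 600)
```

The fixed 400x400 window on the reduced layer therefore covers a 600x600 region of the original picture, which is how the same rectangle can enclose a larger target.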

Origin blog.csdn.net/submarineas/article/details/123347906