Introduction
This section records an experiment I ran in the lab. Drawing on the official OpenCV examples and explanations of these two methods, I want to summarize my own notes, which can serve as a more systematic study of how YOLO operates.
Sliding window
In object detection, a sliding window is used to locate the target (an object, an animal, etc.) in the picture. In computer vision, a sliding window is a rectangular box that slides over the picture from left to right and from top to bottom, extracting each region of the picture in turn. The figure below is an example of a sliding window: a green rectangle slides over the picture from left to right and from top to bottom. For each region the box covers, we use a classifier to determine whether that region contains an object.
The code for the sliding window is relatively simple. First, import the required packages:
import cv2
from matplotlib import pyplot as plt
from IPython import display
%matplotlib inline
Then we define a function sliding_window to produce the sliding windows. This function has three parameters: image, window and step.
- The first parameter image is the input image, over which we slide the rectangle.
- The second parameter window is a tuple giving the width and height of the sliding rectangle, in that order.
- The third parameter step is the number of pixels the rectangle moves each time; we call it the step size.
In the figure below, three colors show the same rectangle at different positions. In the figure, n means that the rectangle moves n pixels each time. If step is too small, the number of windows to classify grows quickly; if it is too large, the window can skip over the target. Generally, this value is set between 4 and 8.
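To get a feel for how the step size drives the amount of work, the number of window positions can be counted directly. The following is a minimal sketch, not part of the original experiment: the helper names count_positions and count_windows and the 640x480 image size are made up for illustration, and the count assumes that a window sitting flush with the right or bottom edge is included.

```python
def count_positions(image_size, window_size, step):
    """Number of window positions along one axis (edge-flush position included)."""
    if window_size > image_size:
        return 0
    return (image_size - window_size) // step + 1


def count_windows(image_w, image_h, window_w, window_h, step):
    """Total number of full-size windows over the whole image."""
    return (count_positions(image_w, window_w, step)
            * count_positions(image_h, window_h, step))


# Hypothetical 640x480 image scanned with a 64x64 window.
for step in (1, 4, 8, 32):
    print(step, count_windows(640, 480, 64, 64, step))
```

With these assumed sizes, a step of 1 gives 240609 windows while a step of 8 gives 3869, which is why very small steps are expensive when every window goes through a classifier.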
So the code is:
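def sliding_window(image, window, step):
    # window is (width, height); image.shape is (rows, cols[, channels]).
    # The + 1 keeps the last position, where the window sits flush with the edge.
    for y in range(0, image.shape[0] - window[1] + 1, step):
        for x in range(0, image.shape[1] - window[0] + 1, step):
            # Yield the top-left corner and the image patch under the window.
            yield (x, y, image[y:y + window[1], x:x + window[0]])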
To slide the rectangle over the picture, we need to know its position at each step. We use two for loops to enumerate all positions of the rectangle. The first for loop moves the rectangle up and down the picture in steps of step, and the second for loop moves it left and right in steps of step. Finally, the yield statement returns a tuple from the generator: the first element x and the second element y are the coordinates of the upper-left corner of the rectangle, and the third element image[y:y + window[1], x:x + window[0]] is the region of the picture under the rectangle at that position.
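The order in which the generator visits positions is easiest to see on a tiny array. The sketch below re-declares the generator so that it runs on its own and uses a synthetic 6x8 NumPy array in place of a real photo. One assumption is made here: the range bounds include + 1 so that windows flush with the edge are yielded.

```python
import numpy as np


def sliding_window(image, window, step):
    """Yield (x, y, patch) for each position; window is (width, height)."""
    for y in range(0, image.shape[0] - window[1] + 1, step):
        for x in range(0, image.shape[1] - window[0] + 1, step):
            yield (x, y, image[y:y + window[1], x:x + window[0]])


# Synthetic "image" with 6 rows and 8 columns instead of a real photo.
image = np.arange(48).reshape(6, 8)

results = list(sliding_window(image, (4, 4), 2))
positions = [(x, y) for (x, y, patch) in results]

# x changes fastest (left to right), then y advances (top to bottom).
print(positions)  # [(0, 0), (2, 0), (4, 0), (0, 2), (2, 2), (4, 2)]
print(results[0][2].shape)  # (4, 4)
```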
Next we use the cv2.imread function to read the picture; pets.jpg is the name of the picture to read. Then we set both the width window_w and the height window_h of the sliding window to 400 pixels. We use n to denote the number of sliding windows.
image = cv2.imread("pets.jpg")
(window_w, window_h) = (400, 400)
Next we use a for loop to iterate over each sliding window. We need to pass three parameters to sliding_window.
- The first parameter image is the image we read.
- The second parameter (window_w, window_h) is the width and height of the sliding window.
- The third parameter 200 means the sliding window moves 200 pixels at each step (note that, for ease of demonstration, the window size and the step size are set to large values here; in actual use, do not set them too large or too small).
Then we use an if statement to check whether the extracted window has the same size as the window we set. If either dimension of the region extracted by the sliding window differs from (window_w, window_h), we execute continue to skip that window.
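The reason a size check can be useful is that NumPy does not raise an error when a slice reaches past the edge of an array. It silently returns a smaller patch. Here is a small sketch on a synthetic 5x5 array; the helper is_full_window is a name invented here. Note that shape is ordered (height, width), so the height is compared with window_h and the width with window_w. For a square window the order makes no difference.

```python
import numpy as np

image = np.zeros((5, 5), dtype=np.uint8)   # synthetic 5x5 image

full = image[0:3, 0:3]      # window fully inside the image
clipped = image[3:6, 3:6]   # window reaching past the bottom-right corner

print(full.shape)     # (3, 3)
print(clipped.shape)  # (2, 2): the slice is clipped, no error is raised


def is_full_window(patch, window_w, window_h):
    """True when the patch has the requested size; shape is (height, width)."""
    return patch.shape[0] == window_h and patch.shape[1] == window_w


print(is_full_window(full, 3, 3), is_full_window(clipped, 3, 3))  # True False
```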
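for (x, y, window) in sliding_window(image, (window_w, window_h), 200):
    # window.shape is (height, width, channels); skip windows that are not full size.
    if window.shape[0] != window_h or window.shape[1] != window_w:
        continue
    clone = image.copy()  # draw on a copy so the source image stays unchanged
    cv2.rectangle(clone, (x, y), (x + window_w, y + window_h), (0, 255, 0), 2)
    clone = clone[:, :, ::-1]  # BGR (OpenCV) -> RGB (Matplotlib)
    plt.imshow(clone)
    plt.pause(0.1)
    display.clear_output(wait=True)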
Inside the loop we draw each sliding window on the image. We first use the copy method to duplicate the image, because the drawing operation that follows would otherwise modify the source image, and then we use cv2.rectangle to draw the sliding window on clone. Because OpenCV stores images in B, G, R (blue, green, red) channel order while Matplotlib expects R, G, B, we use the slice clone[:,:,::-1] to reverse the channel order, and then use plt.imshow to render the result in the page.
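The effect of the [:,:,::-1] slice can be checked on a two-pixel array. This is only an illustration with synthetic data, not an image loaded through OpenCV:

```python
import numpy as np

# Synthetic 1x2 image in BGR order, the layout cv2.imread returns:
# the first pixel is pure blue, the second is pure red.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

rgb = bgr[:, :, ::-1]  # reverse the last (channel) axis: B, G, R -> R, G, B

print(rgb[0, 0].tolist())  # [0, 0, 255]  blue expressed in RGB order
print(rgb[0, 1].tolist())  # [255, 0, 0]  red expressed in RGB order
```

OpenCV's cv2.cvtColor(clone, cv2.COLOR_BGR2RGB) produces the same channel order; the slice simply avoids an extra function call.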
Because we draw multiple pictures inside the for loop, we use plt.pause(0.1) to keep each picture on screen for a moment; the argument 0.1 means a pause of 0.1 seconds. Finally, we use the clear_output(wait=True) function of display to clear the displayed picture in preparation for the next one.
In PyCharm, you can see the animated image and the corresponding group image of each frame on the right:
Image pyramid
In simple terms, an image pyramid is a representation of one image at several different sizes. As shown in the figure below, the leftmost image is the original, and the image shrinks from left to right until its size reaches a threshold. This threshold is the minimum size below which the image is no longer reduced. A stack of pictures like this, whose size gradually increases or decreases, is called an image pyramid, and each picture of a different size is called a layer of the pyramid. The purpose of the image pyramid is to find targets (objects, animals, etc.) of different sizes in the picture.
There are generally two types of image pyramids:
- Gaussian pyramid: used for downsampling; it is the main image pyramid.
- Laplacian pyramid: used to reconstruct an image as it was before downsampling from the smaller image in the pyramid. In digital image processing it is also known as the prediction residual; it allows the image to be restored as closely as possible and is used together with the Gaussian pyramid.
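The idea of a prediction residual can be sketched without OpenCV. In the example below, 2x2 block averaging and pixel repetition are simplified stand-ins for cv2.pyrDown and cv2.pyrUp (which apply Gaussian filtering); the function names downsample and upsample are chosen here for illustration only.

```python
import numpy as np


def downsample(img):
    """Halve each dimension by averaging 2x2 blocks (stand-in for cv2.pyrDown)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(img):
    """Double each dimension by repeating pixels (stand-in for cv2.pyrUp)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)


# Synthetic 4x4 grayscale image with even dimensions.
image = np.arange(16, dtype=np.float64).reshape(4, 4)

small = downsample(image)              # Gaussian-style layer, 2x2
residual = image - upsample(small)     # Laplacian-style layer: detail lost by downsampling
restored = upsample(small) + residual  # adding the residual back recovers the original

print(small.shape, residual.shape)     # (2, 2) (4, 4)
print(np.allclose(restored, image))    # True
```

The small layer alone cannot reproduce the original; it is the stored residual that makes the reconstruction exact, which is why the two pyramids are used together.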
OpenCV wraps these operations in two functions, cv2.pyrUp() and cv2.pyrDown(), which we can use to generate images like those above.
The code used is as follows:
img = cv2.imread('pets.jpg')
up_img = cv2.pyrUp(img)  # upsampling
img_1 = cv2.pyrDown(img)  # downsampling
img_2 = cv2.pyrDown(img_1)
# cv2.imshow('up_img', up_img)
cv2.imshow('img', img)
cv2.imshow('img_1', img_1)
cv2.imshow('img_2', img_2)
cv2.waitKey(0)
cv2.destroyAllWindows()
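The sizes of up_img, img_1 and img_2 follow from the documented default output sizes of the two functions: cv2.pyrDown halves each dimension (rounding up) and cv2.pyrUp doubles it. The sketch below only does this arithmetic; the helper names and the 641x480 input size are hypothetical, since the real size of pets.jpg is not stated.

```python
def pyr_down_size(width, height):
    """Default output size of cv2.pyrDown: each dimension halved, rounded up."""
    return ((width + 1) // 2, (height + 1) // 2)


def pyr_up_size(width, height):
    """Default output size of cv2.pyrUp: each dimension doubled."""
    return (width * 2, height * 2)


# Hypothetical 641x480 input image.
size = (641, 480)
img_1_size = pyr_down_size(*size)
img_2_size = pyr_down_size(*img_1_size)
up_size = pyr_up_size(*size)

print(img_1_size, img_2_size, up_size)  # (321, 240) (161, 120) (1282, 960)
```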
To see how the pyramid works, we can write a pyramid function that generates the image pyramid. This function has three parameters, as shown below.
- The first parameter image is the original image on which the pyramid is built.
- The second parameter top is the minimum size to which the image will be reduced. We give it a default value of (128, 128), where the first 128 is the height of the image and the second 128 is the width.
- The third parameter ratio is the factor by which the image is reduced at each step. We give it a default value of 1.2.
def pyramid(image, top = (128, 128), ratio = 1.2):
    # top is (min_height, min_width); ratio must be greater than 1.
    yield image  # the original image is the bottom layer of the pyramid
    while True:
        (w, h) = (int(image.shape[1] / ratio), int(image.shape[0] / ratio))
        image = cv2.resize(image, (w, h), interpolation = cv2.INTER_AREA)
        if w < top[1] or h < top[0]:
            break
        yield image
Inside the function we first use yield to return the original image, because the bottom of the image pyramid is the original image. Then a while loop keeps shrinking the image until the reduced image is smaller than the top parameter. Inside the loop, (w, h) is the width and height of the previous layer of the pyramid reduced by a factor of ratio. We use the cv2.resize function to scale the previous layer, passing (w, h) as its second argument to specify the width and height of the scaled image.
As the image keeps shrinking, an if statement checks whether it has reached the minimum size: the width and height after each scaling are compared with the minimum size top, and if either is smaller, break ends the loop. Otherwise, yield returns the scaled image. At this point, the image pyramid function is complete.
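The stopping rule can be traced without any image data by following only the sizes. The sketch below mirrors the arithmetic of the loop described above; pyramid_sizes is a helper name made up here, the 600x400 input is a hypothetical size, and the guard against ratio <= 1 is an addition that the original function does not have.

```python
def pyramid_sizes(width, height, top=(128, 128), ratio=1.2):
    """Return the (width, height) of every layer the pyramid loop would yield.

    top is (min_height, min_width), the same order as in the text.
    """
    if ratio <= 1:
        raise ValueError("ratio must be greater than 1, otherwise the loop never ends")
    sizes = [(width, height)]          # the original image is always the first layer
    while True:
        width, height = int(width / ratio), int(height / ratio)
        if width < top[1] or height < top[0]:
            break                      # below the minimum size: stop
        sizes.append((width, height))
    return sizes


# Hypothetical 600x400 image with a ratio of 1.5.
print(pyramid_sizes(600, 400, ratio=1.5))  # [(600, 400), (400, 266), (266, 177)]
```

The next candidate layer would be 177x118, whose height falls below 128, so the loop stops after three layers.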
for i in pyramid(image, ratio = 1.5):
    i = i[:, :, ::-1]  # BGR -> RGB for Matplotlib
    plt.imshow(i)
    plt.pause(0.3)
    display.clear_output(wait=True)
You can see the axis ranges change from frame to frame, and the picture looks blurrier because downsampling lowers the resolution. If you want to produce a group image like the one above, you can append a copy of each layer to a container and display them all after the for loop. I will not demonstrate that here.
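One possible way to build such a group image is to pad every layer to the height of the tallest one and join them side by side. This is a sketch under assumptions: tile_layers is a hypothetical helper, and three constant-color arrays stand in for real pyramid layers.

```python
import numpy as np


def tile_layers(layers):
    """Place layers side by side on one canvas, padding shorter ones with black."""
    max_h = max(layer.shape[0] for layer in layers)
    padded = []
    for layer in layers:
        canvas = np.zeros((max_h, layer.shape[1], layer.shape[2]), dtype=layer.dtype)
        canvas[:layer.shape[0]] = layer   # copy the layer into the top of its column
        padded.append(canvas)
    return np.hstack(padded)              # join along the width axis


# Synthetic stand-ins for three pyramid layers, shaped (height, width, channels).
layers = [np.full((8, 12, 3), 255, dtype=np.uint8),
          np.full((5, 8, 3), 128, dtype=np.uint8),
          np.full((3, 5, 3), 64, dtype=np.uint8)]

group = tile_layers(layers)
print(group.shape)  # (8, 25, 3)
```

In the notebook, the container could be filled with layers = [i.copy() for i in pyramid(image, ratio = 1.5)], and the result shown once with plt.imshow(tile_layers(layers)[:, :, ::-1]).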
Image pyramid combined with sliding window
In traditional object detection, the image pyramid and the sliding window are combined to detect objects at different positions and of different sizes in the picture. With the sliding window alone, the size of the rectangle is fixed, so a target that is much larger or much smaller than the rectangle cannot be detected. We can solve this by running the sliding window over every layer of the image pyramid. On the left side of the figure below, the rectangle cannot enclose the whole dog and covers only part of its face; on the right side, where the image pyramid is combined with the sliding window, the rectangle keeps the same size, but in the reduced image it encloses the dog completely.
Combining the two methods, sliding window + image pyramid, the rectangle keeps a fixed size, and as the picture shrinks it gradually comes to enclose the target.
for i in pyramid(image, ratio = 1.5):
    for (x, y, window) in sliding_window(i, (window_w, window_h), 100):
        # window.shape is (height, width, channels); skip windows that are not full size.
        if window.shape[0] != window_h or window.shape[1] != window_w:
            continue
        clone = i.copy()
        cv2.rectangle(clone, (x, y), (x + window_w, y + window_h), (0, 255, 0), 2)
        clone = clone[:, :, ::-1]
        plt.imshow(clone)
        plt.pause(0.01)
        display.clear_output(wait=True)
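The coordinates (x, y) in this loop are relative to the reduced layer, not to the original picture. If a window on a reduced layer is to be reported on the original image, its corners have to be scaled back. The following sketch shows one way to do that; to_original_coords is a hypothetical helper and the sizes are made up for illustration.

```python
def to_original_coords(x, y, window_w, window_h, original_w, layer_w):
    """Map a window found on a reduced layer back to original-image pixels."""
    scale = original_w / layer_w      # how much the layer was shrunk
    return (int(x * scale), int(y * scale),
            int((x + window_w) * scale), int((y + window_h) * scale))


# Hypothetical case: the original is 1200 pixels wide, the layer is 800 pixels
# wide, and a 400x400 window sits at (100, 200) on that layer.
box = to_original_coords(100, 200, 400, 400, original_w=1200, layer_w=800)
print(box)  # (150, 300, 750, 900)
```

Inside the loop above, the scale would be image.shape[1] / i.shape[1]. On the first layer, which is the original image, the scale is 1 and the coordinates are unchanged.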