Chapter 17: Image Segmentation and Extraction

In image processing, we often need to extract foreground objects from an image. For example, in video surveillance the camera watches a fixed background; we are not interested in the background itself, but in the vehicles, pedestrians, or other objects that appear in front of it. We want to extract these objects from the video and ignore the frames in which no objects enter the scene.

1. Using the watershed algorithm for image segmentation and extraction:

Image segmentation is a very important operation in image processing. The watershed algorithm treats an image as a topographic surface, as in geography, to achieve segmentation. The algorithm is very effective.

1. Algorithm principle:

Any grayscale image can be regarded as a topographic surface: areas with high gray values can be regarded as mountain peaks, and areas with low gray values can be regarded as valleys. In the figure below, the left image is the original image and the right image is its corresponding "topographic surface".

[Figure: a grayscale image (left) and its corresponding topographic surface (right)]

If we "flood" each valley with water of a different color (this is the wording used on the OpenCV official website; Gonzalez describes it as punching a hole in each valley and letting the water rise through the holes at a uniform rate), then, as the water level rises, water from different valleys will begin to merge. To prevent the water from different valleys from merging, we build dams where the waters would meet. This process separates the image into two different sets: the catchment basins and the watershed lines. The dams we build are the watershed lines, and they are the segmentation of the original image. This is the watershed algorithm.

The left image in the figure below is the original image, and the right image is the segmentation result obtained with the watershed algorithm. The CMM website provides not only the sample image but also an animated demonstration; interested readers can visit the site to have a look.

[Figure: original image (left) and its watershed segmentation result (right)]

However, due to noise and other factors, this basic watershed algorithm often over-segments the image. Over-segmentation divides the image into dense, tiny independent blocks and renders the segmentation meaningless. The figure below shows an over-segmented image: the left image is an electrophoresis image, and the right image is the over-segmented result. The over-segmentation is clearly severe.

[Figure: electrophoresis image (left) and its severely over-segmented watershed result (right)]

To improve the segmentation result, an improved, marker-based watershed algorithm was proposed. It lets the user mark the parts of the image that he considers to belong to the same segmented region (the marked part is called a mask). When the watershed algorithm runs, it then treats the marked parts as the same segmented region.

In the figure below, the left image is the original image with our annotations: the three small dark color blocks mark regions whose contents will be assigned to the same segment when the marker-based watershed algorithm runs. The segmentation result obtained with the marker-based watershed algorithm is shown on the right.

[Figure: annotated original image (left) and the marker-based watershed segmentation result (right)]

Applying the improved, marker-based watershed algorithm to the electrophoresis image on the left side of the figure yields the segmentation result on the right side. The result is clearly much improved.

[Figure: electrophoresis image (left) and its marker-based watershed segmentation result (right)]

2. Related functions:

In OpenCV, the watershed algorithm is implemented by the function cv2.watershed(). In practice, completing an image segmentation also requires morphological functions, the distance transform function cv2.distanceTransform(), and the labeling function cv2.connectedComponents(). These functions are briefly described below.

  • Before applying the watershed algorithm, the image must be preprocessed; the preprocessing steps are described below.
  • The watershed algorithm implemented by the watershed function in OpenCV is a marker-based segmentation algorithm, designed to solve the over-segmentation problem of the traditional watershed algorithm.

(1) Review of morphological functions:

Before segmenting an image with the watershed algorithm, some simple morphological processing is needed. First, review the basic morphological operations.

  • Opening operation: an erosion followed by a dilation, which removes noise from an image. In the figure below, eroding the left image yields the middle image, and dilating the middle image yields the right image. Before the watershed algorithm is applied, the noise in the image should be removed with an opening operation, so that it cannot interfere with the segmentation (see the sketch after this list).

    [Figure: noisy original image (left), the eroded image (middle), and the opened image after dilation (right)]

  • Obtaining image boundaries: the boundaries of an image can be obtained by combining a morphological operation with a subtraction. In the figure below, the left image is the original, the middle image is its eroded version, and the right image is the difference between the two. It can be seen that the right image is the boundary of the left image.

    [Figure: original image (left), eroded image (middle), and their difference, i.e. the boundary (right)]

    From the above analysis, the boundary information of an image can be obtained with a morphological operation and a subtraction. However, this only works for relatively simple images: if the foreground objects in an image are connected to each other, the boundary of each sub-object cannot be obtained accurately with morphological operations alone.
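
The two operations above map directly onto OpenCV calls. The following minimal sketch, assuming a binary image stored in the hypothetical file img.jpg, shows both the opening operation and boundary extraction:

import cv2
import numpy as np

img = cv2.imread('img.jpg', cv2.IMREAD_GRAYSCALE)  # hypothetical file name
kernel = np.ones((3, 3), np.uint8)

# Opening = erosion followed by dilation; it removes small white noise specks
opening = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
# The same result, built from the two primitive operations
opening2 = cv2.dilate(cv2.erode(img, kernel), kernel)

# Boundary = original image minus its eroded version
boundary = cv2.subtract(img, cv2.erode(img, kernel))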

(2) The distance transform function cv2.distanceTransform():

When the sub-objects in an image are not connected, the foreground objects can be determined directly with a morphological erosion; but when the sub-objects are connected to each other, determining the foreground becomes difficult. In that case, the distance transform function cv2.distanceTransform() makes it easy to extract the foreground objects.

The distance transform function cv2.distanceTransform() calculates, for every point in a binary image, the distance to the nearest background point (that is, the distance from each non-zero pixel to the nearest zero-valued pixel). If the value of a pixel is itself 0, this distance is 0.

The result of cv2.distanceTransform() therefore reflects the distance relationship between each pixel and the background (the pixels with value 0). Usually:

  • Pixels near the center (centroid) of a foreground object are far from the zero-valued pixels and therefore receive larger values.
  • Pixels near the edge of a foreground object are close to the zero-valued pixels and therefore receive smaller values.

If this result is thresholded, information such as the centers and skeletons of the foreground objects can be obtained. The distance transform function cv2.distanceTransform() can thus be used to compute object centers, refine contours, obtain the foreground of an image, and more.

The syntax format of the distance transform function cv2.distanceTransform() is:

  • dst = cv2.distanceTransform(src, distanceType, maskSize[, dstType])

    • src: an 8-bit single-channel binary image.

    • distanceType: the distance type parameter; its values and meanings are shown below.

      Common values include:

      • cv2.DIST_USER: a user-defined distance.
      • cv2.DIST_L1: distance = |x1 - x2| + |y1 - y2|.
      • cv2.DIST_L2: the simple Euclidean distance.
      • cv2.DIST_C: distance = max(|x1 - x2|, |y1 - y2|).
      • cv2.DIST_L12, cv2.DIST_FAIR, cv2.DIST_WELSCH, cv2.DIST_HUBER: other distance metrics defined by OpenCV.

    • maskSize: the size of the mask; its possible values are shown below. Note that when distanceType = cv2.DIST_L1 or cv2.DIST_C, maskSize is forced to 3 (because setting it to 3 gives the same result as setting it to 5 or larger).

      The possible values are cv2.DIST_MASK_3 (a 3×3 mask), cv2.DIST_MASK_5 (a 5×5 mask), and cv2.DIST_MASK_PRECISE.

    • dstType: the type of the output image; the default value is CV_32F.

    • dst: the return value, the computed output image; it is an 8-bit image or a 32-bit floating-point image of the same size as src.
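
Before the full example below, a tiny numeric sketch may make the behavior concrete; the 7×7 array is made-up toy data:

import cv2
import numpy as np

# A 7x7 binary image containing a 5x5 white square (toy data)
a = np.zeros((7, 7), dtype=np.uint8)
a[1:6, 1:6] = 255
d = cv2.distanceTransform(a, cv2.DIST_L2, 3)
print(np.round(d, 1))
# Pixels deep inside the square receive larger values (the center is about 3
# away from the nearest zero pixel); edge pixels receive about 1; zeros stay 0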

Example: use the distance transform function cv2.distanceTransform() to compute the determined foreground of an image and observe the effect.

import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread('img.jpg')
# img = cv2.imread('../sugar.tiff')
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Threshold processing (Otsu)
rst, thresh = cv2.threshold(gray_img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Opening operation to remove noise
kernel = np.ones((3, 3), np.uint8)
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)

# Distance transform
dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)

# Threshold to get the centers of the foreground objects (the determined foreground)
rst, front = cv2.threshold(dist_transform, 0.7 * dist_transform.max(), 255, 0)

plt.subplot(161)
plt.imshow(rgb_img)
plt.title('img')
plt.axis('off')

plt.subplot(162)
plt.imshow(gray_img, cmap='gray')
plt.title('gray_img')
plt.axis('off')

plt.subplot(163)
plt.imshow(thresh, cmap='gray')
plt.title('thresh')
plt.axis('off')

plt.subplot(164)
plt.imshow(opening, cmap='gray')
plt.title('opening')
plt.axis('off')

plt.subplot(165)
plt.imshow(dist_transform, cmap='gray')
plt.title('dist_transform')
plt.axis('off')

plt.subplot(166)
plt.imshow(front, cmap='gray')
plt.title('front')
plt.axis('off')

plt.show()

[Figure: img, gray_img, thresh, opening, dist_transform, and front, shown left to right]

(3) Determining the unknown region:

Morphological dilation "inflates" the foreground of an image. As the foreground expands, the background is "compressed", so the background that remains after dilation must be smaller than the actual background and is guaranteed to contain no foreground at all; it is the "determined background". Hereinafter, the determined background is denoted B for convenience of description.

The distance transform function cv2.distanceTransform() finds the "center" of the foreground, which yields the "determined foreground". For convenience of description, the determined foreground is denoted F.

Once the determined foreground F and the determined background B are known, the remaining region is the unknown region UN. This is the region that the watershed algorithm must further resolve.

For an image O, the unknown region UN can be obtained through the following relationship:

  • Unknown region UN = image O - determined background B - determined foreground F

Rearranging the expression gives:

  • Unknown region UN = (image O - determined background B) - determined foreground F

The term "image O - determined background B" in the formula above is exactly what the morphological dilation of the foreground produces: dilated foreground = image O - determined background B.

Example: Annotate the determined foreground, determined background and unknown regions of an image

import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread('img.jpg')
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Threshold segmentation (Otsu)
rst, thresh = cv2.threshold(gray_img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
kernel = np.ones((3, 3), dtype=np.uint8)
# Opening operation
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)
# Dilation: bg is "image O - determined background B"
bg = cv2.dilate(opening, kernel, iterations=3)

# Distance transform
dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
rst, fore = cv2.threshold(dist_transform, 0.7 * dist_transform.max(), 255, 0)
fore = np.uint8(fore)
un = cv2.subtract(bg, fore)

plt.subplot(221)
plt.imshow(rgb_img)
plt.title('img')
plt.axis('off')

plt.subplot(222)
plt.imshow(bg)
plt.title('bg')
plt.axis('off')

plt.subplot(223)
plt.imshow(fore)
plt.title('fore')
plt.axis('off')

plt.subplot(224)
plt.imshow(un)
plt.title('un')
plt.axis('off')

plt.show()

[Figure: img (top left), bg (top right), fore (bottom left), and un (bottom right)]

Note that, for the image bg in the upper right corner of the figure:

  • The small circles in the foreground are the "image O - determined background B" part, not the "determined background" itself.
  • The background of the bg image is the "determined background".

(4) Object labeling with the function cv2.connectedComponents():

Once the determined foreground is obtained, it can be labeled. In OpenCV, the function cv2.connectedComponents() performs the labeling: it marks the background as 0 and marks the other objects with positive integers starting from 1.

The syntax format of the function cv2.connectedComponents() is:

  • retval, labels = cv2.connectedComponents(image)
    • image: the 8-bit single-channel image to be labeled.
    • retval: the number of labels returned.
    • labels: the labeled result image.

Example: use the function cv2.connectedComponents() to label the determined foreground and observe the result:
import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread('img.jpg')
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

rst, thresh = cv2.threshold(gray_img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
kernel = np.ones((3, 3), dtype=np.uint8)
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)

sure_bg = cv2.dilate(opening, kernel, iterations=3)
dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
ret, fore = cv2.threshold(dist_transform, 0.7*dist_transform.max(), 255, 0)
fore = np.uint8(fore)

# Label the foreground objects (background = 0, objects = 1, 2, ...)
rst, markers = cv2.connectedComponents(fore)
print(markers)

plt.subplot(131)
plt.imshow(rgb_img)
plt.title('img')
plt.axis('off')

plt.subplot(132)
plt.imshow(fore)
plt.title('fore')
plt.axis('off')

plt.subplot(133)
plt.imshow(markers)
plt.title('markers')
plt.axis('off')

plt.show()

[Figure: img (left), fore (middle), and markers (right)]

  • The left image is the original image.
  • The middle image, fore, shows the center points of the foreground objects obtained from the distance transform.
  • The right image, markers, is the result of labeling the center-point image fore.

It can be seen that the center points of the foreground objects receive different labels.

When the function cv2.connectedComponents() labels an image, it marks the background as 0 and marks the other objects with positive integers starting from 1. Specifically:

  • A value of 0 represents the background region.
  • Values starting from 1 represent different foreground regions.

In the watershed algorithm, however, a label value of 0 represents the unknown region. Therefore, the result returned by cv2.connectedComponents() must be adjusted: add 1 to every label. After this adjustment:

  • A value of 1 represents the background region.
  • Values starting from 2 represent different foreground regions.

To apply the watershed algorithm, the unknown region computed earlier must also be marked in the label image, with the value 0:

ret, markers = cv2.connectedComponents(fore)
markers = markers + 1
markers[unknown == 255] = 0  # unknown is the unknown region computed earlier

Example: Correct the labeling result of cv2.connectedComponents()

import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread('img.jpg')
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

rst, thresh = cv2.threshold(gray_img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
kernel = np.ones((3, 3), dtype=np.uint8)
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)

sure_bg = cv2.dilate(opening, kernel, iterations=3)
dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
ret, fore = cv2.threshold(dist_transform, 0.7 * dist_transform.max(), 255, 0)
fore = np.uint8(fore)
rst, markers = cv2.connectedComponents(fore)

# Correct the labeling result: add 1 to all labels and mark the unknown region as 0
fore_adv = fore.copy()
unknown = cv2.subtract(sure_bg, fore_adv)
ret2, markers2 = cv2.connectedComponents(fore_adv)
markers2 += 1
markers2[unknown == 255] = 0

plt.subplot(141)
plt.imshow(rgb_img)
plt.title('img')
plt.axis('off')

plt.subplot(142)
plt.imshow(fore)
plt.title('fore')
plt.axis('off')

plt.subplot(143)
plt.imshow(markers)
plt.title('markers')
plt.axis('off')

plt.subplot(144)
plt.imshow(markers2)
plt.title('markers2')
plt.axis('off')

plt.show()

[Figure: img, fore, markers, and markers2, shown left to right]

  • The markers image is the result of labeling the image directly with the function cv2.connectedComponents().
  • The markers2 image is the corrected labeling result.

Comparing the two rightmost images shows that markers2 additionally marks the edges (the unknown region) of the foreground objects, so each determined foreground object has a dark border, which is the marked unknown region.

(5) Function cv2.watershed():

As noted earlier, the watershed algorithm implemented by the watershed function in OpenCV is a marker-based segmentation algorithm, designed to solve the over-segmentation problem of the traditional watershed algorithm.

After the preprocessing above is complete, the watershed algorithm can be used to segment the preprocessed image. In OpenCV, the function that implements the watershed algorithm is cv2.watershed(), and its syntax format is:

  • markers = cv2.watershed(image, markers)

    • image: the input image; it must be an 8-bit three-channel image.

    • markers: a 32-bit single-channel labeling image; it must have the same size as image.

      • Before processing an image with the cv2.watershed() function, the image must be preprocessed, and the expected segmented regions must be roughly outlined in markers with positive integers: each known region is labeled 1, 2, 3, and so on, while undetermined regions are marked 0. The labeled regions can be understood as the "seed" regions from which the watershed segmentation grows.

      • In the returned markers, each pixel either keeps the seed value of the region it was assigned to, or is set to -1 if it lies on a boundary between regions.

      • The algorithm uses the regions passed in via markers as seeds (the so-called water-injection points), classifies every other pixel according to the rules of the watershed algorithm, and determines the region each pixel belongs to until all pixels have been processed. The values of the pixels on the boundaries between regions are set to -1 to distinguish them.

3. An example of image segmentation with the watershed algorithm

When using the watershed algorithm for image segmentation, the basic steps are:

  1. Denoise the original image O with a morphological opening operation.
  2. Obtain the "determined background B" through a dilation operation. Note that obtaining "image O - determined background B" is sufficient here.
  3. Apply the distance transform function cv2.distanceTransform() to the opened image and threshold the result to obtain the "determined foreground F".
  4. Compute the unknown region UN (UN = O - B - F).
  5. Label the determined foreground F with the function cv2.connectedComponents().
  6. Correct the labeling result of the function cv2.connectedComponents().
  7. Segment the image with the function cv2.watershed().

Steps 1-6 are image preprocessing: the unknown region is marked 0 and the known regions are marked 1, 2, 3, and so on; that is, the seed regions are annotated. Step 7 then segments the image with the watershed algorithm according to these annotations.

Example:

import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread('img.jpg')
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
new_img = rgb_img.copy()

# Otsu threshold, then opening to remove noise
rst, thresh = cv2.threshold(gray_img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
kernel = np.ones((3, 3), np.uint8)
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)
# Dilate to obtain "image O - determined background B"
sure_bg = cv2.dilate(opening, kernel, iterations=3)
# Distance transform + threshold to obtain the determined foreground F
dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
ret, sure_fg = cv2.threshold(dist_transform, 0.7 * dist_transform.max(), 255, 0)
sure_fg = np.uint8(sure_fg)
# Unknown region UN = (O - B) - F
unknown = cv2.subtract(sure_bg, sure_fg)
# Label the seeds: background becomes 1; objects become 2, 3, ...; unknown becomes 0
ret, markers = cv2.connectedComponents(sure_fg)
markers += 1
markers[unknown == 255] = 0
# Watershed: boundary pixels are set to -1; paint them green
markers = cv2.watershed(new_img, markers)
new_img[markers == -1] = [0, 255, 0]

plt.subplot(121)
plt.imshow(rgb_img)
plt.title('img')
plt.axis('off')

plt.subplot(122)
plt.imshow(new_img)
plt.title('rst')
plt.axis('off')

plt.show()

[Figure: original image (left) and the watershed segmentation result with boundaries drawn in green (right)]

2. Interactive foreground extraction

Classical foreground extraction techniques rely mainly on texture (color) information, as in the magic wand tool, or on edge (contrast) information, as in intelligent scissors. In 2004, Rother et al. of Microsoft Research Cambridge proposed an interactive foreground extraction technique in the paper GrabCut: Interactive Foreground Extraction Using Iterated Graph Cuts. Their algorithm can accurately extract the foreground with only a small amount of interaction.

To start the extraction, a rectangle is first used to specify the approximate extent of the foreground region, and the algorithm then segments iteratively until the best result is reached. The extraction may still be imperfect after this: some foreground may be missing, or some background may be extracted as foreground. In that case, the user intervenes: on a copy of the original image (or on any image of the same size), the user marks the regions that must be kept as foreground in white and the regions that must be treated as background in black. The annotated image is then used as a mask, and the algorithm continues iterating to extract the foreground and obtain the final result.

For example, for the left image in the figure below, the foreground to be extracted (Lena) is first framed with a rectangle, and the foreground and background are then marked with white and black respectively. After the annotation is complete, the interactive foreground extraction algorithm produces the result image shown on the right.

[Figure: annotated Lena image (left) and the extracted foreground (right)]

Let's look at the specific implementation process of the GrabCut algorithm.

  1. Mark the approximate location of the foreground with a rectangle. Note that at this point the rectangle only outlines the rough position of the foreground and contains both foreground and background, so the region inside it is actually undetermined. The region outside it, however, is treated as "determined background".

  2. Distinguish foreground from background within the rectangle based on the "determined background" data outside it.

  3. Model the foreground and the background with Gaussian Mixture Models (GMMs). The GMMs learn and create new pixel distributions from the user input, and unclassified pixels (possibly background or foreground) are classified according to their relationship to the already-classified (foreground and background) pixels.

  4. Generate a graph from the pixel distribution. The nodes of the graph are the pixels, plus two additional nodes: a foreground node and a background node. All foreground pixels are connected to the foreground node, and all background pixels are connected to the background node. The weight of the edge connecting each pixel to the foreground node or background node is determined by the probability that the pixel is foreground or background.

  5. Besides being connected to the foreground node or the background node, the pixels in the graph are also connected to each other. The weight of the edge between two pixels is determined by their similarity: the closer their colors, the larger the weight.

  6. After the nodes are connected, the problem becomes cutting a connected graph: the graph is cut according to the weights of the edges, separating the pixels into those belonging to the foreground node and those belonging to the background node.

  7. Repeat the above process until the classification converges.
    More detailed information is available at http://www.cs.ru.ac.za/research/g02m1682/, the resource referenced by the OpenCV documentation; interested readers can learn more there.

In OpenCV, the function for implementing interactive foreground extraction is cv2.grabCut(), and its syntax format is:

  • mask, bgdModel, fgdModel = cv2.grabCut(img, mask, rect, bgdModel, fgdModel, iterCount[, mode])

    • img: the input image; it is required to be an 8-bit three-channel image.

    • mask: the mask image, required to be 8-bit single-channel. It indicates the determined foreground region, determined background region, and undetermined regions, and can be set to one of four values:

      • cv2.GC_BGD: determined background; can also be represented by the value 0.
      • cv2.GC_FGD: determined foreground; can also be represented by the value 1.
      • cv2.GC_PR_BGD: possible background; can also be represented by the value 2.
      • cv2.GC_PR_FGD: possible foreground; can also be represented by the value 3.

      Note that mask is not only an input parameter: after the grabCut function runs, it also holds the resulting mask, and the foreground object is extracted according to this result mask.

      When the mask is finally used to extract the foreground, the values 0 and 2 are merged into the background (both treated as 0), and the values 1 and 3 are merged into the foreground (both treated as 1). Usually, we mark the mask image with a white brush and a black brush, and then convert: white pixels are set to 1 (determined foreground) and black pixels to 0 (determined background); see the sketch after this list.

    • rect: the region that contains the foreground object; everything outside this region is treated as "determined background". Make sure the foreground lies within the range specified by rect; otherwise, any foreground outside rect will not be extracted. The parameter rect is meaningful only when mode is set to the rectangle mode cv2.GC_INIT_WITH_RECT. Its format is (x, y, w, h), giving the x and y coordinates of the region's upper-left pixel and the region's width and height. If the foreground extends toward the lower right and you do not want to work out the original image's size, you can simply use large values for w and h. When using mask mode, set rect to None.

    • bgdModel: an array used internally by the algorithm; just create a numpy.float64 array of zeros of size (1, 65).

    • fgdModel: an array used internally by the algorithm; also a numpy.float64 array of zeros of size (1, 65).

    • iterCount: Indicates the number of iterations.

    • mode: the iteration mode. Its possible values and meanings are listed below.

      The possible values include cv2.GC_INIT_WITH_RECT (initialize with the rectangle rect), cv2.GC_INIT_WITH_MASK (initialize with the provided mask; it can be combined with cv2.GC_INIT_WITH_RECT), and cv2.GC_EVAL (the default mode; simply run more iterations with the existing model).

    • The function returns mask, bgdModel, and fgdModel.
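
As mentioned in the description of the mask parameter above, a hand-painted template is mapped onto the grabCut constants before the function is called in mask mode. A minimal sketch, in which m0 is a hypothetical grayscale template painted with pure white and pure black strokes, and mask, img, bgdModel, and fgdModel come from an earlier rectangle-mode call:

mask[m0 == 255] = cv2.GC_FGD  # white strokes: determined foreground (1)
mask[m0 == 0] = cv2.GC_BGD    # black strokes: determined background (0)
cv2.grabCut(img, mask, None, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_MASK)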

Example 1: Use the GrabCut algorithm to extract the foreground of an image, and observe the extraction effect.

import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread('../lena512color.tiff')
rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
mask = np.zeros(img.shape[:2], dtype=np.uint8)
bgd_model = np.zeros((1, 65), dtype=np.float64)
fgd_model = np.zeros((1, 65), dtype=np.float64)
rect = (50, 50, 500, 500)

# The function returns mask, bgdModel, fgdModel; mask is also modified in place
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)
print(mask)
# Merge values {0, 2} into background (0) and {1, 3} into foreground (1)
mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')

rst = img * mask2[:, :, np.newaxis]
rst = cv2.cvtColor(rst, cv2.COLOR_BGR2RGB)

plt.subplot(121)
plt.imshow(rgb_img)
plt.title('img')
plt.axis('off')

plt.subplot(122)
plt.imshow(rst)
plt.title('rst')
plt.axis('off')

plt.show()

[Figure: original image (left) and the rectangle-mode GrabCut result with an incompletely extracted hat (right)]

It can be seen that with the default all-zero mask, the result of the function cv2.grabCut() is not ideal: when extracting the foreground of the left image, the character's hat is not completely extracted. For some images, part of the background may also be extracted incorrectly.

To obtain a complete foreground object, some refinement is needed. Here, the original image is annotated: the parts that need to be kept are painted white, and the background that needs to be removed is painted black. The annotated image is then used as a mask, and the function cv2.grabCut() is called again to complete the foreground extraction.
This process mainly includes the following steps:

  1. Use the function cv2.grabCut() in cv2.GC_INIT_WITH_RECT mode to perform a preliminary foreground extraction; the main purpose is to obtain a preliminary mask.
  2. Open the image to be processed (e.g. lena) in a paint program, such as the brush tool that comes with Windows.
  3. Mark the foreground regions to be kept with a white brush.
  4. Mark the background regions to be removed with a black brush.
  5. Save the annotated image as the template image m0.
  6. Map the white and black values of the template image m0 into a mask m: white pixels (value 255) in m0 become the determined foreground (value 1) in m, and black pixels (value 0) in m0 become the determined background (value 0) in m.
  7. Use m as the mask parameter of the function cv2.grabCut() to complete the foreground extraction.

Note that the brush-annotated template image m0 cannot be used directly as the mask parameter. The function cv2.grabCut() requires every value of mask to be one of cv2.GC_BGD (determined background), cv2.GC_FGD (determined foreground), cv2.GC_PR_BGD (possible background), or cv2.GC_PR_FGD (possible foreground), i.e. one of the values 0, 1, 2, 3. The template image m0 contains values throughout [0, 255], which does not meet this requirement, so its white and black values must first be mapped into the mask m; only then can m be passed to cv2.grabCut() as the mask parameter.

import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread('lena512color.tiff')
rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Preliminary extraction of the foreground object to obtain an initial mask
mask = np.zeros(img.shape[:2], dtype=np.uint8)
bgd = np.zeros((1, 65), dtype=np.float64)
fgd = np.zeros((1, 65), dtype=np.float64)
rect = (50, 50, 500, 500)
cv2.grabCut(img, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)

# Read the template image and use it to update the initial mask
mask2 = cv2.imread('m.tiff')
rgb_mask2 = cv2.cvtColor(mask2, cv2.COLOR_BGR2RGB)
gray_mask2 = cv2.cvtColor(mask2, cv2.COLOR_BGR2GRAY)
mask[gray_mask2 == 0] = 0
mask[gray_mask2 == 255] = 1

# Extract the foreground object again with the corrected mask
cv2.grabCut(img, mask, None, bgd, fgd, 5, cv2.GC_INIT_WITH_MASK)
mask = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')

# Extract the foreground object according to the resulting mask
new_img = rgb_img.copy()
rst = new_img * mask[:, :, np.newaxis]

plt.subplot(131)
plt.imshow(rgb_img)
plt.title('img')
plt.axis('off')

plt.subplot(132)
plt.imshow(rgb_mask2)
plt.title('m')
plt.axis('off')

plt.subplot(133)
plt.imshow(rst)
plt.title('rst')
plt.axis('off')

plt.show()

[Figure: original image (left), template image m (middle), and the final extraction result rst (right)]
