A practical example of image segmentation with YOLOv5

How do you train YOLOv5 for segmentation? In short, the process consists of three steps:

  • Prepare a dataset for image segmentation

  • Train YOLOv5 on a custom dataset

  • Use YOLOv5 for inference

Prepare the dataset

First, you need to prepare your dataset in the appropriate format, which is very similar to the YOLOv5 format used for detection. Create a directory structure like the following:

[Figure: dataset directory structure]
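If you prefer to create this layout from a script, the sketch below may help. It assumes a hypothetical root folder with images/ and labels/ subfolders, each split into train and val. YOLOv5 locates the label file for an image by replacing the images directory in the image path with labels and changing the extension to .txt. The folder names and the helpers make_yolo_dirs and label_path_for are illustrative, so adapt them to the structure shown in the figure.

```python
from pathlib import Path
from typing import List, Tuple


def make_yolo_dirs(root: str, splits: Tuple[str, ...] = ("train", "val")) -> List[Path]:
    """Create images/<split> and labels/<split> under root (assumed layout)."""
    created = []
    for split in splits:
        for kind in ("images", "labels"):
            path = Path(root) / kind / split
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created


def label_path_for(image_path: str) -> Path:
    """Map .../images/<split>/name.jpg to .../labels/<split>/name.txt."""
    parts = list(Path(image_path).parts)
    # Replace the last "images" component with "labels"
    idx = len(parts) - 1 - parts[::-1].index("images")
    parts[idx] = "labels"
    return Path(*parts).with_suffix(".txt")


# Hypothetical paths, for illustration only
print(label_path_for("dataset/images/train/img_001.jpg"))
```

Keeping images and labels in parallel trees means no explicit mapping file is needed: the label path is derived from the image path alone.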

Now let's look inside the data.yaml file. It has the same structure as for the detection task, shown in the figure below:

[Figure: The structure of the data.yaml file]

train - path to your training images
val - path to your validation images
nc - number of classes
names - class names
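As an illustration, the four fields can be generated from Python. This is a minimal sketch: make_data_yaml is a hypothetical helper, the paths and the class name "whale" are placeholders, and names is written as a JSON list, which YAML also parses as a list. It uses only the standard library, so no YAML package is required.

```python
import json
from typing import List


def make_data_yaml(train: str, val: str, names: List[str]) -> str:
    """Build the text of a data.yaml file with the train, val, nc and names fields."""
    lines = [
        f"train: {train}",
        f"val: {val}",
        f"nc: {len(names)}",  # nc must equal the number of class names
        f"names: {json.dumps(names)}",  # a JSON list is also a valid YAML flow list
    ]
    return "\n".join(lines) + "\n"


# Placeholder paths and class name, for illustration only
text = make_data_yaml("dataset/images/train", "dataset/images/val", ["whale"])
print(text)
```

Deriving nc from names avoids the common mistake of the two fields disagreeing.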

Next, let's look inside a label .txt file.

[Figure: contents of a label .txt file]

The first element, "0", is the class index. The remaining values are the x and y coordinates of the polygon's vertices, normalized by the width and height of the original image. To view an image with this polygon, you can use the two functions below: the first reads the image and its label file, and the second displays the image with the mask.

from typing import Tuple

import cv2
import matplotlib.pyplot as plt
import numpy as np


def read_image_label(path_to_img: str, path_to_txt: str, normalize: bool = False) -> Tuple[np.ndarray, np.ndarray]:

    # Read the image and convert OpenCV's BGR channel order to RGB
    image = cv2.imread(path_to_img)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    img_h, img_w = image.shape[:2]

    # Read the .txt label file for this image (only the first line, i.e. the first object)
    with open(path_to_txt, "r") as f:
        txt_file = f.readline().split()
        cls_idx = int(txt_file[0])
        coords = txt_file[1:]
        # Convert the flat list of coordinates into an (N, 2) array of (x, y) points
        polygon = np.array([[float(x), float(y)] for x, y in zip(coords[0::2], coords[1::2])])

    # If normalize=True, the file holds normalized coordinates: scale them to pixels
    if normalize:
        polygon[:, 0] = polygon[:, 0] * img_w
        polygon[:, 1] = polygon[:, 1] * img_h
    # cv2.fillPoly expects int32 points (the np.int alias was removed in NumPy 1.24)
    return image, polygon.astype(np.int32)


def show_image_mask(img: np.ndarray, polygon: np.ndarray, alpha: float = 0.7):

    # Create an empty single-channel mask
    mask = np.zeros((img.shape[0], img.shape[1]), dtype=np.uint8)
    overlay = img.copy()

    # Draw the polygon in white on the mask and in red on a copy of the image
    cv2.fillPoly(mask, pts=[polygon], color=255)
    cv2.fillPoly(overlay, pts=[polygon], color=(255, 0, 0))
    # Blend: original image weighted by alpha, red overlay by 1 - alpha
    blended = cv2.addWeighted(img, alpha, overlay, 1 - alpha, 0)

    # Plot the image with the mask next to the mask itself
    fig = plt.figure(figsize=(22, 18))
    axes = fig.subplots(nrows=1, ncols=2)
    axes[0].imshow(blended)
    axes[1].imshow(mask, cmap="Greys_r")
    axes[0].set_title("Original image with mask")
    axes[1].set_title("Mask")

    plt.show()

Running the code above produces the following result:

[Figure: original image with the polygon overlaid, and the corresponding mask]
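When you only need the numbers, the same conversion can be done without OpenCV or NumPy. The sketch below (parse_label_line is a hypothetical helper) parses one label line, checks that the coordinates come in pairs and lie in [0, 1], and scales them back to pixels. Each line of a label file describes one object, so a file with several objects can be processed line by line.

```python
from typing import List, Tuple


def parse_label_line(line: str, img_w: int, img_h: int) -> Tuple[int, List[Tuple[float, float]]]:
    """Split '<class> x1 y1 x2 y2 ...' into a class index and pixel (x, y) points."""
    values = line.split()
    cls_idx = int(values[0])
    coords = [float(v) for v in values[1:]]
    if len(coords) % 2:
        raise ValueError("expected an even number of coordinates")
    if any(not 0.0 <= c <= 1.0 for c in coords):
        raise ValueError("normalized coordinates must lie in [0, 1]")
    # x is scaled by the image width, y by the image height
    points = [(x * img_w, y * img_h) for x, y in zip(coords[0::2], coords[1::2])]
    return cls_idx, points


# Hypothetical label line and image size, for illustration only
cls_idx, points = parse_label_line("0 0.25 0.5 0.75 0.5 0.5 0.9", img_w=640, img_h=480)
print(cls_idx, points)
```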

In some cases you may have binary masks rather than polygons, so a function that converts a binary mask into polygons is useful. An example is shown below:

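from typing import List

import cv2
import numpy as np


def mask_to_polygon(mask: np.ndarray, report: bool = False) -> List[int]:
    # mask: single-channel uint8 binary mask (OpenCV 4.x returns contours, hierarchy)
    contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    polygons = []
    for contour in contours:
        coords = []
        # Each contour point has shape (1, 2): flatten to x1, y1, x2, y2, ...
        for point in contour:
            coords.append(int(point[0][0]))
            coords.append(int(point[0][1]))
        polygons.append(coords)

    if report:
        # len(polygons[0]) counts coordinate values, i.e. twice the number of points
        print(f"Number of points: {len(polygons[0])}")

    # Concatenate all contours into one flat list (intended for single-object masks;
    # np.array(...).ravel() fails on contours of different lengths in recent NumPy)
    return [c for coords in polygons for c in coords]


# mask: a binary uint8 mask of the object
polygons = mask_to_polygon(mask, report=True)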

The result is as follows:

Number of points: 1444

This counts coordinate values: 1444 values correspond to 722 (x, y) points.
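Note that mask_to_polygon returns pixel coordinates, whereas the label files described earlier store the class index followed by coordinates normalized by the image width and height. A minimal sketch of that final conversion step is shown below; polygon_to_label_line and its six-decimal precision are illustrative choices, not part of YOLOv5.

```python
from typing import List


def polygon_to_label_line(cls_idx: int, flat_coords: List[int], img_w: int, img_h: int,
                          precision: int = 6) -> str:
    """Turn [x1, y1, x2, y2, ...] in pixels into '<class> x1 y1 ...' normalized to [0, 1]."""
    if len(flat_coords) % 2:
        raise ValueError("expected an even number of coordinates")
    normalized = []
    for x, y in zip(flat_coords[0::2], flat_coords[1::2]):
        # x is divided by the image width, y by the image height
        normalized.append(f"{x / img_w:.{precision}f}")
        normalized.append(f"{y / img_h:.{precision}f}")
    return " ".join([str(cls_idx)] + normalized)


# Hypothetical polygon and image size, for illustration only
line = polygon_to_label_line(0, [160, 240, 480, 240, 320, 432], img_w=640, img_h=480)
print(line)
```

Each returned line would then be written to the .txt file that shares its name with the image, one line per object.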

At this point we could already train a model, but I would like to show one more function, which reduces the number of points obtained when converting a mask to a polygon. This is useful when you don't want to outline an object with too many points.

from time import time
from typing import List

import matplotlib.pyplot as plt
import numpy as np


def reduce_polygon(polygon: np.ndarray, angle_th: int = 0, distance_th: int = 0) -> np.ndarray:
    angle_th_rad = np.deg2rad(angle_th)
    points_removed = [0]
    # Repeat until a full pass removes no points
    while len(points_removed):
        points_removed = list()
        # Step by 2 so that two neighbouring points are never removed in the same pass;
        # for i = 0, polygon[i - 1] wraps around to the last point (closed polygon)
        for i in range(0, len(polygon) - 2, 2):
            v01 = polygon[i - 1] - polygon[i]
            v12 = polygon[i] - polygon[i + 1]
            d01 = np.linalg.norm(v01)
            d12 = np.linalg.norm(v12)
            # Remove a point whose neighbours are both closer than distance_th
            if d01 < distance_th and d12 < distance_th:
                points_removed.append(i)
                continue
            if d01 == 0 or d12 == 0:  # duplicate points: the angle is undefined
                continue
            # Remove a point where the direction changes by less than angle_th
            cos_angle = np.clip(np.sum(v01 * v12) / (d01 * d12), -1.0, 1.0)
            if np.arccos(cos_angle) < angle_th_rad:
                points_removed.append(i)
        polygon = np.delete(polygon, points_removed, axis=0)
    return polygon


def show_result_reducing(polygon: List[int]) -> np.ndarray:
    # Turn the flat list [x1, y1, x2, y2, ...] into an (N, 2) array of points
    original_polygon = np.array([[x, y] for x, y in zip(polygon[0::2], polygon[1::2])])

    tic = time()
    reduced_polygon = reduce_polygon(original_polygon, angle_th=1, distance_th=20)
    toc = time()

    fig = plt.figure(figsize=(16, 5))
    axes = fig.subplots(nrows=1, ncols=2)
    axes[0].scatter(original_polygon[:, 0], original_polygon[:, 1], label=f"{len(original_polygon)}", c="b", marker="x", s=2)
    axes[1].scatter(reduced_polygon[:, 0], reduced_polygon[:, 1], label=f"{len(reduced_polygon)}", c="b", marker="x", s=2)
    # Image y coordinates grow downwards, so flip the y axis to match the picture
    axes[0].invert_yaxis()
    axes[1].invert_yaxis()

    axes[0].set_title("Original polygon")
    axes[1].set_title("Reduced polygon")
    axes[0].legend()
    axes[1].legend()

    plt.show()

    print(f"\n\nOriginal_polygon length: {len(original_polygon)}")
    print(f"Reduced_polygon length: {len(reduced_polygon)}")
    print(f"Running time: {round(toc - tic, 4)} seconds")

    return reduced_polygon

The output of the function looks like this:

[Figure: original polygon vs. reduced polygon]

The original polygon has 722 points; after reduction, 200 points remain.

With the data ready, we can move on to training the model.

Train YOLOv5 on a custom dataset

First, clone the repository and install its requirements:

git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip install -r requirements.txt

Once the repository is cloned and the requirements are installed, you can start training. Here we start from the pre-trained yolov5s-seg.pt weights:

python3 segment/train.py \
--data "/Users/vladislavefremov/Downloads/Instance_Segm_2/data.yaml" \
--weights yolov5s-seg.pt \
--img 640 \
--batch-size 2 \
--epochs 50

[Figure: training output]

After training, you can look at the results on the validation set:

[Figure: Model predictions on the validation set]

For more details on the YOLOv5 parameters, see the official repository (https://github.com/ultralytics/yolov5).

Inference with YOLOv5

Now that the model is trained, we can run inference on images, directories of images, videos, directories of videos, and more.

Let's perform inference on a video and see the final result.

python3 segment/predict.py \
--weights "/home/user/Disk/Whales/weights/whale_3360/weights/best.pt" \
--source "/home/user/Disk/Whales/Video" \
--imgsz 1280 \
--name video_whale

The resulting video can be viewed here: https://youtu.be/_j8sA6VUil4

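What does the inference output look like? For each object, the result is a polygon: the class index followed by the absolute x and y coordinates of its points. These .txt files are written only when segment/predict.py is run with the --save-txt flag.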

[Figure: example of the saved inference output]

Conclusion

In this article, we covered how to prepare segmentation data for YOLOv5, including a function that quickly converts a binary mask into polygons and another that reduces the number of polygon points. We then trained YOLOv5 on a custom dataset and ran inference with the trained model.


Origin blog.csdn.net/weixin_38739735/article/details/128016330#comments_28080417