How do you train YOLOv5 for segmentation? In short, the process has three steps:
Prepare dataset for image segmentation
Train YOLOv5 on a custom dataset
Using YOLOv5 for inference
Prepare dataset
As a first step, you need to prepare your dataset in an appropriate format. This format is very similar to the YOLOv5 format used for detection. You need to create a directory similar to the following:
Let's take a look inside the data.yaml file. It has the same structure as the file used for the detection task, as shown in the figure below:
The structure of the data.yaml file
train - path to your train images
val - path to your validation images
nc - number of classes
names - class names
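The four keys above can also be generated from Python. The following is only a minimal sketch: the helper name build_data_yaml, the placeholder directories and the class name "whale" are assumptions made for illustration, not values taken from the original project.

```python
from typing import List


def build_data_yaml(train_dir: str, val_dir: str, names: List[str]) -> str:
    """Return the text of a data.yaml file with the keys train, val, nc and names."""
    lines = [
        f"train: {train_dir}",
        f"val: {val_dir}",
        # nc is derived from the class list so the two can never disagree
        f"nc: {len(names)}",
        "names: [" + ", ".join(f"'{n}'" for n in names) + "]",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder paths and class name; replace them with your own.
    print(build_data_yaml("images/train", "images/val", ["whale"]))
```

Deriving nc from the list of names avoids a common mistake where the class count and the class list drift apart.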
Let's take a look inside the .txt file.
The first element is "0", which is the class index of the object. The remaining values are the x and y coordinates of the polygon vertices, normalized by the width and height of the original image. To view an image together with its polygon, you can use the two functions below: the first reads the image and its label file, the second displays the image and the mask.
import cv2
import matplotlib.pyplot as plt
import numpy as np
from typing import Tuple


def read_image_label(path_to_img: str, path_to_txt: str, normalize: bool = False) -> Tuple[np.ndarray, np.ndarray]:
    # Read the image and convert it from OpenCV's BGR order to RGB
    image = cv2.imread(path_to_img)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    img_h, img_w = image.shape[:2]
    # Read the .txt label file for this image (only the first object is used)
    with open(path_to_txt, "r") as f:
        txt_file = f.readlines()[0].split()
    cls_idx = txt_file[0]
    coords = txt_file[1:]
    # Convert the flat list of coordinates to an (N, 2) NumPy array
    polygon = np.array([[float(x), float(y)] for x, y in zip(coords[0::2], coords[1::2])])
    # Set normalize=True when the label file stores normalized coordinates:
    # they are then scaled to pixel coordinates of the image
    if normalize:
        polygon[:, 0] = polygon[:, 0] * img_w
        polygon[:, 1] = polygon[:, 1] * img_h
    # cv2.fillPoly expects 32-bit integer points
    return image, polygon.astype(np.int32)


def show_image_mask(img: np.ndarray, polygon: np.ndarray, alpha: float = 0.7):
    # Create a zero array for the mask
    mask = np.zeros((img.shape[0], img.shape[1]), dtype=np.uint8)
    overlay = img.copy()
    # Draw the polygon on the mask and on the image (img is modified in place)
    cv2.fillPoly(mask, pts=[polygon], color=(255, 255, 255))
    cv2.fillPoly(img, pts=[polygon], color=(255, 0, 0))
    # Blend the untouched copy with the filled image to get a translucent overlay
    cv2.addWeighted(overlay, alpha, img, 1 - alpha, 0, img)
    # Plot the image with the overlay next to the binary mask
    fig = plt.figure(figsize=(22, 18))
    axes = fig.subplots(nrows=1, ncols=2)
    axes[0].imshow(img)
    axes[1].imshow(mask, cmap="Greys_r")
    axes[0].set_title("Original image with mask")
    axes[1].set_title("Mask")
    plt.show()
The result after processing the above code is as follows:
In some cases you may have binary masks rather than polygon annotations, so it is useful to have a function that converts a binary mask to a polygon. One way to do this is as follows:
from typing import List


def mask_to_polygon(mask: np.ndarray, report: bool = False) -> List[int]:
    contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    polygons = []
    for contour in contours:
        coords = []
        for point in contour:
            coords.append(int(point[0][0]))
            coords.append(int(point[0][1]))
        polygons.append(coords)
    if report:
        # This is the number of values (x and y together) in the first contour
        print(f"Number of points: {len(polygons[0])}")
    # Flatten to [x0, y0, x1, y1, ...]; this assumes the mask contains a single contour
    return [value for coords in polygons for value in coords]


polygons = mask_to_polygon(mask, report=True)
The result is as follows:
Number of points: 1444
The list holds 1444 values, that is, 722 x coordinates and 722 y coordinates, or 722 points.
In principle, we could go straight to training the model from here, but I would like to show one more function, which reduces the number of points obtained after converting a mask to a polygon. This is useful when you don't want to outline an object with too many points.
from time import time
from rich import print  # rich's print renders the [bold black] markup used below


def reduce_polygon(polygon: np.ndarray, angle_th: int = 0, distance_th: int = 0) -> np.ndarray:
    angle_th_rad = np.deg2rad(angle_th)
    points_removed = [0]
    # Repeat until a full pass removes no points
    while len(points_removed):
        points_removed = list()
        for i in range(0, len(polygon) - 2, 2):
            # Vectors from the previous point to this one and from this one to the next
            v01 = polygon[i - 1] - polygon[i]
            v12 = polygon[i] - polygon[i + 1]
            d01 = np.linalg.norm(v01)
            d12 = np.linalg.norm(v12)
            # Drop points that are very close to both neighbours
            if d01 < distance_th and d12 < distance_th:
                points_removed.append(i)
                continue
            # Drop points where the contour barely changes direction;
            # clip guards arccos against rounding slightly outside [-1, 1]
            angle = np.arccos(np.clip(np.sum(v01 * v12) / (d01 * d12), -1.0, 1.0))
            if angle < angle_th_rad:
                points_removed.append(i)
        polygon = np.delete(polygon, points_removed, axis=0)
    return polygon


def show_result_reducing(polygon: List[int]) -> np.ndarray:
    # Turn the flat [x0, y0, x1, y1, ...] list into an (N, 2) array of points
    original_polygon = np.array([[x, y] for x, y in zip(polygon[0::2], polygon[1::2])])

    tic = time()
    reduced_polygon = reduce_polygon(original_polygon, angle_th=1, distance_th=20)
    toc = time()

    fig = plt.figure(figsize=(16, 5))
    axes = fig.subplots(nrows=1, ncols=2)
    axes[0].scatter(original_polygon[:, 0], original_polygon[:, 1], label=f"{len(original_polygon)}", c='b', marker='x', s=2)
    axes[1].scatter(reduced_polygon[:, 0], reduced_polygon[:, 1], label=f"{len(reduced_polygon)}", c='b', marker='x', s=2)
    # Image coordinates grow downwards, so flip the y axis
    axes[0].invert_yaxis()
    axes[1].invert_yaxis()
    axes[0].set_title("Original polygon")
    axes[1].set_title("Reduced polygon")
    axes[0].legend()
    axes[1].legend()
    plt.show()

    print("\n\n", f'[bold black] Original_polygon length[/bold black]: {len(original_polygon)}\n',
          f'[bold black] Reduced_polygon length[/bold black]: {len(reduced_polygon)}\n'
          f'[bold black]Running time[/bold black]: {round(toc - tic, 4)} seconds')

    return reduced_polygon
The output of the function looks like this:
The original polygon has 722 points. After reduction, 200 points remain.
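Before such a polygon can be used for training, it has to be written in the label format described earlier: a class index followed by x and y values normalized by the image size. A minimal sketch of that step follows; the helper name polygon_to_label_line and the six-decimal formatting are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def polygon_to_label_line(
    polygon: Sequence[Tuple[float, float]],
    img_w: int,
    img_h: int,
    cls_idx: int = 0,
) -> str:
    """Format pixel-space (x, y) points as 'cls x1 y1 x2 y2 ...' with normalized values."""
    values: List[str] = [str(cls_idx)]
    for x, y in polygon:
        # x is divided by the image width, y by the image height
        values.append(f"{x / img_w:.6f}")
        values.append(f"{y / img_h:.6f}")
    return " ".join(values)


if __name__ == "__main__":
    # A triangle in a hypothetical 640x480 image
    print(polygon_to_label_line([(0, 0), (320, 0), (320, 240)], img_w=640, img_h=480))
```

Each object goes on its own line of the .txt file that belongs to the image.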
With the data prepared, we can move on to training the model.
Train YOLOv5 on a custom dataset
Here you need to perform the following steps:
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip install -r requirements.txt
Once the repository is cloned and the dependencies are installed, you can start training. Training starts from the pre-trained yolov5s-seg.pt weights.
python3 segment/train.py \
    --data "/Users/vladislavefremov/Downloads/Instance_Segm_2/data.yaml" \
    --weights yolov5s-seg.pt \
    --img 640 \
    --batch-size 2 \
    --epochs 50
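If you prefer to launch training from a Python script or a notebook, the same command can be assembled as an argument list. This is only a sketch: build_train_command is a hypothetical helper, it uses only the flags shown in the command above, and "data.yaml" is a placeholder path.

```python
import subprocess
import sys
from typing import List


def build_train_command(
    data_yaml: str,
    weights: str = "yolov5s-seg.pt",
    img: int = 640,
    batch_size: int = 2,
    epochs: int = 50,
) -> List[str]:
    """Build the argument list for segment/train.py with the flags used above."""
    return [
        sys.executable, "segment/train.py",
        "--data", data_yaml,
        "--weights", weights,
        "--img", str(img),
        "--batch-size", str(batch_size),
        "--epochs", str(epochs),
    ]


if __name__ == "__main__":
    cmd = build_train_command("data.yaml")  # placeholder path
    print(" ".join(cmd))
    # Uncomment to start training; run this from inside the cloned yolov5 directory.
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.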
After training, you can look at the results on the validation set:
Model predictions on the validation set
If you want to know more about the YOLOv5 parameters, see the official repository (https://github.com/ultralytics/yolov5).
Inference with YOLOv5
We have trained the model, and now we can run inference on photos, directories of photos, videos, directories of videos, and so on.
Let's perform inference on a video and see the final result.
python3 segment/predict.py \
    --weights "/home/user/Disk/Whales/weights/whale_3360/weights/best.pt" \
    --source "/home/user/Disk/Whales/Video" \
    --imgsz 1280 \
    --name video_whale
The resulting video is available at: https://youtu.be/_j8sA6VUil4
What form does the inference result take? A polygon with a class index and absolute x and y coordinates.
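Assuming each result is stored as one text line in that layout (a class index followed by x and y pairs), it can be read back with a few lines of Python. This is a minimal sketch: the helper name parse_polygon_line and the sample line are illustrative and are not output from the model trained above.

```python
from typing import List, Tuple


def parse_polygon_line(line: str) -> Tuple[int, List[Tuple[float, float]]]:
    """Split 'cls x1 y1 x2 y2 ...' into a class index and a list of (x, y) points."""
    parts = line.split()
    # At least one point is needed, and coordinates must come in pairs
    if len(parts) < 3 or (len(parts) - 1) % 2 != 0:
        raise ValueError("expected a class index followed by x y pairs")
    cls_idx = int(parts[0])
    coords = [float(v) for v in parts[1:]]
    points = list(zip(coords[0::2], coords[1::2]))
    return cls_idx, points


if __name__ == "__main__":
    # Illustrative line with made-up coordinates
    print(parse_polygon_line("0 10 20 30 40 50 60"))
```

The function does not care whether the coordinates are absolute or normalized; it only separates the class index from the points.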
Conclusion
In this article, we looked at how to prepare data for segmentation with YOLOv5, including a function that quickly converts a binary mask into a polygon. We also covered how to train YOLOv5 and how to run inference with the trained model.