Code source: https://github.com/milesial/Pytorch-UNet
1. Build the environment
Before building the environment, be sure to read the README carefully.
I chose the "Without Docker" route, so I followed its requirements to configure the environment:
Install CUDA
Official website: https://developer.nvidia.com/cuda-toolkit-archive
You can check the highest CUDA version your GPU driver supports with the command nvidia-smi.
In my case the highest supported CUDA version is 11.7, so the next step is to pick a CUDA version no higher than that from the official website. I chose 10.2 at first but ran into problems during installation, and eventually settled on version 11.3; the reason is explained later. I recommend reading through this tutorial before choosing a CUDA version.
After selecting the version, download the exe that matches your configuration.
Run the exe to start the installation (you can customize the installation path).
Keep clicking Next until the installation completes.
Install cuDNN
Official website: https://developer.nvidia.com/rdp/cudnn-archive#a-collapse51b
Select the version corresponding to your own CUDA
Downloading directly requires a registered account. Instead, you can expand the version you want, right-click its link to copy the address, and paste it into a download manager such as Thunder (Xunlei) to download without registering.
After the download completes, unzip it and copy the three extracted folders (bin, include, lib) into the corresponding CUDA installation folder. That completes the configuration.
Install Anaconda
There are many tutorials online for this part, so I won't go into details. (Frankly, I'm too lazy to take screenshots.)
Because different projects require different environments, we can create virtual environments to run our projects:
conda create -n pytorch python=3.8   # create a virtual environment named pytorch with Python 3.8
conda activate pytorch               # activate the virtual environment
conda deactivate                     # exit the virtual environment
conda remove -n pytorch --all        # delete the virtual environment
Install PyTorch
Note: according to the README, version 1.12 or above is required.
Corresponding version installation instructions: https://pytorch.org/get-started/previous-versions/
Enter the virtual environment we just created and enter the corresponding command:
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
Test whether the installation succeeded: press Ctrl+R, type cmd, and press Enter to open a command prompt, then start Python inside the virtual environment.
If torch.cuda.is_available() returns True, the installation succeeded!
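The check shown in the (omitted) screenshot can be reproduced as follows; the version string is what this setup should report, yours may differ:

```python
import torch

print(torch.__version__)          # e.g. 1.12.0 for this setup
print(torch.cuda.is_available())  # True means the GPU build is installed and sees the GPU
```

If this prints False, see the pitfall record below before reinstalling anything.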
Pitfall record!!!
When I first reached this step, the check always returned False. I assumed it was an environment problem, but after deleting and reinstalling several times it was still False. After trying many fixes found online, I discovered the likely cause: the version conda downloaded was not the GPU build at all!
Run conda list; the pytorch entry should show a CUDA build (e.g. py3.8_cuda11.3_cudnn8_0), like this:
If conda list shows a CPU build of pytorch, you have fallen into a conda trap. With the Tsinghua mirror configured as the default channel, conda downloads pytorch from that mirror, and if it cannot find the exact build you specified, it silently falls back to a default CPU build. My simple (if crude) solution was to check which GPU builds the mirror actually provides and then install the matching CUDA version, which is why I switched to CUDA 11.3.
Python 3.8 + CUDA 11.3 + cudnn8_0 are matching versions, so there will be no errors.
Mirror listing: https://mirrors.bfsu.edu.cn/anaconda/cloud/pytorch/win-64/
Install dependencies
You can simply run pip install -r requirements.txt as instructed in the README.
document content:
matplotlib==3.6.2
numpy==1.23.5
Pillow==9.3.0
tqdm==4.64.1
wandb==0.13.5
But this can be very slow, so it is recommended to use a mirror source, either per package:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple matplotlib==3.6.2
or by appending -i https://pypi.tuna.tsinghua.edu.cn/simple to the pip install -r requirements.txt command.
Note that these commands are executed in the virtual environment we just created.
2. Data preparation
Reference blog: https://blog.csdn.net/ECHOSON/article/details/122914826
Prepare two folders: one for the original images and one for the labeled masks.
The labeling software used is labelme
You can install it from the command line: activate the virtual environment and enter:
pip install labelme   # a mirror source can also be used here
Then simply enter labelme on the command line to launch it.
The JSON files produced by labelme need to be converted to PNG format before use. The conversion code:
from __future__ import print_function
import os
import os.path as osp
import shutil

import cv2


def json2png(json_folder, png_save_folder):
    if osp.isdir(png_save_folder):
        shutil.rmtree(png_save_folder)
    os.makedirs(png_save_folder)
    json_files = os.listdir(json_folder)
    for json_file in json_files:
        json_path = osp.join(json_folder, json_file)
        # let labelme convert the json into a dataset folder
        os.system("labelme_json_to_dataset {}".format(json_path))
        label_path = osp.join(json_folder, json_file.split(".")[0] + "_json/label.png")
        png_save_path = osp.join(png_save_folder, json_file.split(".")[0] + ".png")
        # read the label as grayscale and binarize it to 0/255
        label_png = cv2.imread(label_path, 0)
        label_png[label_png > 0] = 255
        cv2.imwrite(png_save_path, label_png)
        # shutil.copy(label_path, png_save_path)
        # break


if __name__ == '__main__':
    # NOTE: the json folder must contain only json files, nothing else
    json2png(json_folder="D:/Project/testData/jsons/",
             png_save_folder="D:/Project/testData/jsons/labels/")
The final file structure is as follows:
Pytorch-UNet/
└── data/
    ├── imgs/    # original images
    └── masks/   # labeled masks
The original images go in imgs and the labeled masks in masks. Note that image and mask names must correspond one-to-one; this part is covered in detail in the reference blog.
What I mainly want to discuss is data augmentation and the pitfalls I hit.
Because the amount of original data was small, training results were poor, so I expanded the dataset with data augmentation.
Using Augmentor for semantic-segmentation data augmentation
Create a virtual environment named Augmentor, activate it, and install Augmentor:
conda create -n Augmentor python=3.8
conda activate Augmentor
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple Augmentor
Create two new folders, test1 and test2.
import Augmentor

# Set the original-image path and the mask path; replace "\" with "/"
p = Augmentor.Pipeline("D:/Project/Augmentor/test1")  # original images
p.ground_truth("D:/Project/Augmentor/test2")          # labeled masks

# Horizontal/vertical flips, each applied with probability 0.5
p.flip_left_right(probability=0.5)
p.flip_top_bottom(probability=0.5)

# Random brightness change; min_factor/max_factor set the range of the effect
p.random_brightness(probability=1, min_factor=0.7, max_factor=1.2)

# Random color/contrast change
# p.random_color(probability=1, min_factor=0.0, max_factor=1)
p.random_contrast(probability=1, min_factor=0.7, max_factor=1.2)

# Random flip (flip_random)
p.flip_random(probability=1)

# Number of samples to generate; change 1000 to 100, etc., as needed
p.sample(1000)
The generated images are written to the output folder; the original images and masks then need to be separated manually.
To prepare for training, the image names need fixing: first, each original image and its mask must share the same name; second, the generated file names contain two concatenated names, which makes splitting names during training awkward.
The code for batch-renaming images is below; adapt it to your needs:
# Batch-rename files
import os

path = 'D:/Project/Pytorch-UNet-master/data/imgs'  # folder to process
list_path = os.listdir(path)  # file names in the folder
for index in list_path:
    # keep the part before the first '.' and force a .png extension
    name = index.split('.')[0] + '.png'
    print(name)
    os.rename(os.path.join(path, index), os.path.join(path, name))
At this point we have the 1000 augmented images and their corresponding masks. A new problem arises: for training I only need two classes, similar to the image below, with only pixel values 0 and 255:
However, the masks produced by augmentation may contain many different pixel values, so we need a simple fix to make the pixel values match the training requirement (C++ implementation):
#include <io.h>
#include <cstring>
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

void getFiles(string path, vector<string>& files);

int main()
{
    vector<string> files;
    string path = "D:\\Project\\Augmentor\\mask";
    getFiles(path, files);
    // iterate over every file in the folder
    for (int i = 0; i < files.size(); i++)
    {
        Mat src = imread(files[i]);
        for (int r = 0; r < src.rows; r++) {
            for (int c = 0; c < src.cols; c++) {
                // threshold at 50: bright pixels become white, the rest black
                if (src.at<cv::Vec3b>(r, c)[0] > 50)
                {
                    src.at<cv::Vec3b>(r, c)[0] = 255;
                    src.at<cv::Vec3b>(r, c)[1] = 255;
                    src.at<cv::Vec3b>(r, c)[2] = 255;
                }
                else
                {
                    src.at<cv::Vec3b>(r, c)[0] = 0;
                    src.at<cv::Vec3b>(r, c)[1] = 0;
                    src.at<cv::Vec3b>(r, c)[2] = 0;
                }
            }
        }
        imwrite(files[i], src);
    }
    return 0;
}

void getFiles(string path, vector<string>& files)
{
    // file handle
    long long hFile = 0;
    // file info
    struct _finddata_t fileinfo;
    string p;
    if ((hFile = _findfirst(p.assign(path).append("\\*").c_str(), &fileinfo)) != -1)
    {
        do
        {
            // recurse into subdirectories; otherwise add the file to the list
            if ((fileinfo.attrib & _A_SUBDIR))
            {
                if (strcmp(fileinfo.name, ".") != 0 && strcmp(fileinfo.name, "..") != 0)
                    getFiles(p.assign(path).append("\\").append(fileinfo.name), files);
            }
            else
            {
                files.push_back(p.assign(path).append("\\").append(fileinfo.name));
            }
        } while (_findnext(hFile, &fileinfo) == 0);
        _findclose(hFile);
    }
}
At this point I was very close to success, but when I fed the images into training I hit one more problem: an error saying the two inputs have different dimensions. Investigation showed the cause: the original masks are 8-bit images while the augmented masks are 24-bit, so the bit depth has to be converted:
# Convert 24-bit images to 8-bit
import os

import cv2

path = 'D:/Project/Augmentor/mask'       # source folder
path1 = 'D:/Project/Augmentor/masktest'  # destination folder
list_path = os.listdir(path)  # file names in the folder
for index in list_path:
    p1 = os.path.join(path, index)
    p2 = os.path.join(path1, index)
    print(p1)
    print(p2)
    img = cv2.imread(p1)                         # image to convert
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # 3-channel BGR -> single-channel grayscale
    cv2.imwrite(p2, img)  # if writing to the same folder, avoid name clashes
At this point, all of the image processing is done.
3. Start training
Modify the parameters as appropriate, including the number of classes you want to segment. img_scale is the ratio by which images are resized; if the images are too large and you hit an out-of-memory error during training, try lowering this value.
Then you can start training! If you run it from the command line, remember to switch to the correct drive and activate the virtual environment: on the wrong drive the path will not resolve, and in the wrong environment the script will not run.
Execute the command, for example (see the repo's README for the full argument list):
python train.py --epochs 5 --batch-size 1 --scale 0.5