PyTorch Deep Learning Practical Tutorial (1): Semantic Segmentation Foundations and Environment Setup

PyTorch basics && an explanation of semantic segmentation algorithms

Let's start with the simplest semantic segmentation foundation and development environment setup.

2. Semantic segmentation
What is semantic segmentation?

Semantic segmentation: labeling every pixel of an image with its target category according to the "semantics" of the scene, so that different kinds of objects can be distinguished in the image. It can be understood as a classification task at the pixel level; put plainly, we classify each pixel.

In short, our goal is: given an RGB color image (height x width x 3) or a grayscale image (height x width x 1), output a segmentation map in which each pixel carries a category label (height x width x 1). The details are shown in the following figure:
[Figure: input image and its pixel-level segmentation map]
Note: For visual clarity, the above prediction map is a low-resolution map. In practical applications, the resolution of the segmentation annotation needs to be the same as the resolution of the original image.

The image here is divided into five categories: Person (person), Purse (bag), Plants/Grass (plant/grass), Sidewalk (sidewalk), and Building/Structures (building).

Similar to how standard classification targets are handled, we create a one-hot encoded category label, essentially one output channel per category. Because there are 5 categories in the figure above, the network outputs 5 channels as well, as shown in the following figure:

[Figure: one output channel per category]

The prediction can then be collapsed into a single segmentation map by taking the argmax across channels for each pixel. Furthermore, we can easily inspect each target by overlaying its channel on the original image.

The argmax step is easy to understand. As shown in the figure above, each channel contains only 0s and 1s. Taking the Person channel as an example, the red 1s mark Person pixels and every other pixel is 0; the same holds for the other channels, and no pixel is 1 in more than one channel. Argmax therefore finds, for each pixel, the index of the channel with the maximum value. The final result:

[Figure: segmentation map obtained by per-pixel argmax]

When a single channel is overlaid on the original image, we call it a mask; it indicates only the region where a particular category appears.
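The one-hot channels and per-pixel argmax described above can be sketched in a few lines. The example below uses NumPy for brevity; torch.argmax over the channel dimension behaves the same way, and the class assignments are just illustrative:

```python
import numpy as np

# Toy 2x2 prediction with 3 one-hot channels (channels-first: C x H x W).
probs = np.array([
    [[1, 0], [0, 0]],  # channel 0, e.g. Person
    [[0, 1], [1, 0]],  # channel 1, e.g. Plants/Grass
    [[0, 0], [0, 1]],  # channel 2, e.g. Building
])

# argmax over the channel axis collapses C x H x W into an H x W label map:
# each pixel receives the index of the channel that holds its 1.
seg_map = probs.argmax(axis=0)  # -> [[0, 1], [1, 2]]

# A mask for a single category is one overlaid channel, e.g. class 1:
mask = (seg_map == 1).astype(np.uint8)  # -> [[0, 1], [1, 0]]
```

In a real network the channels hold continuous scores rather than exact 0/1 values, but the argmax works the same way.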

The high resolution result is shown in the figure below, with different colors representing different categories:

[Figure: high-resolution segmentation result]
3. Data set
Common semantic segmentation algorithms belong to supervised learning, so a well-labeled data set is essential.

There are many public semantic segmentation data sets. At present, there are mainly three benchmarks (data sets) in the academic community for model training and testing.

The first commonly used data set is the Pascal VOC series. The most popular in this series is VOC2012, and similar data sets such as Pascal Context are also in common use.

The second commonly used data set is Microsoft COCO. COCO has a total of 80 categories. Although there are detailed pixel-level annotations, there is no official evaluation of semantic segmentation. This data set is mainly used for instance-level segmentation and image description. Therefore, the COCO data set is often regarded as an additional training data set for model training.

The third data set is Cityscapes, built for assisted (autonomous) driving environments; evaluation uses the 19 most common categories.

There are many data sets that can be used for semantic segmentation training:

Pascal VOC 2012: common object classification, 21 categories in total;
MS COCO: sponsored by Microsoft, it has almost become the "standard" data set for evaluating image semantic understanding algorithms, 80 categories in total;
Cityscapes: street scenes from 50 European cities, covering 33 classes of labeled objects across different scenes, backgrounds, and seasons;
Pascal-Context: an extension of the PASCAL VOC 2010 recognition competition, 59 categories in total;
KITTI: one of the most popular data sets for mobile robotics and autonomous driving research, 11 categories in total;
NYUDv2: a 2.5D data set containing 1449 indoor RGB-D images captured with a Microsoft Kinect;
SUN-RGBD: captured by four RGB-D sensors, containing 10,000 RGB-D images at the same size as PASCAL VOC;
ADE20K_MIT: a new data set for scene understanding that can be downloaded for free, 151 categories in total.
There are many data sets to choose from. This tutorial series is not limited to a specific data set and may also use data sets from Kaggle competitions. How to handle each data set, what format it uses, and which data sets later articles will rely on will all be explained in detail.
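Whichever data set is chosen, training code typically reads it through one common interface. Below is a minimal sketch of such a dataset class, assuming images and masks are stored as parallel lists of file paths; the class name and file names are illustrative, and the PyTorch import is guarded so the sketch also runs where PyTorch is not installed:

```python
try:
    from torch.utils.data import Dataset  # the real base class when PyTorch is present
except ImportError:
    Dataset = object  # fallback so this sketch runs without PyTorch

class SegmentationDataset(Dataset):
    """Pairs each image with its pixel-level annotation mask."""

    def __init__(self, image_paths, mask_paths):
        assert len(image_paths) == len(mask_paths), "one mask per image"
        self.image_paths = image_paths
        self.mask_paths = mask_paths

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # In practice: load both files with PIL/OpenCV, apply the same
        # geometric transforms to image and mask, and return tensors.
        return self.image_paths[idx], self.mask_paths[idx]

ds = SegmentationDataset(["img0.png"], ["mask0.png"])
print(len(ds), ds[0])
```

Each concrete data set (VOC, COCO, Cityscapes, ...) then only differs in how files are located and how labels are decoded.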

4. GPU machine
For semantic segmentation tasks, it is necessary to have a machine with a high-end GPU graphics card. If not, the training convergence will be very slow.

The best development environment is Linux: day-to-day work in industry is mostly model development on Linux cloud servers, so it pays to get used to the Linux operating system in advance.

For students, if your laboratory does deep learning research and is well resourced, GPU servers should be available, and there is nothing to worry about.

However, if conditions are limited and the laboratory has no GPU server, there are three ways to still learn deep learning:

1. Google Colab, a free cloud server
A barely workable free option is Google Colab, a free GPU service provided by Google. The GPU compute it offers is decent, but its main drawbacks are that access from China requires getting around the network restrictions, and the storage space is small. Colab's storage comes from mounting Google Drive, which provides only 15 GB for free; expanding it costs money.

If you want to use the free Google Colab service, tutorials are easy to find online.

2. Alibaba Cloud paid GPU cloud servers
Alibaba Cloud provides GPU cloud server resources with two payment modes: monthly subscription and pay-as-you-go. There are P4 servers and even V100 servers. The performance is strong, and the price can be described in two words: very expensive. I do not recommend them for individual users. Besides Alibaba Cloud, Tencent, Baidu, and Huawei all offer corresponding GPU cloud services, but they are all costly.

3. Build a desktop machine
You can build a desktop yourself, which counts as an investment in yourself. A machine good enough for deep learning training costs about 6000 yuan.

Deep learning training depends heavily on graphics card performance, so you need a reasonably good N card, that is, an NVIDIA graphics card. The trick to choosing one is to consult a graphics card tier ("ladder") chart:

[Figure: graphics card tier chart]

This chart ranks the consumer graphics cards commonly available on the market, excluding cards like the V100 that cost on the order of 100,000 yuan.

The higher a card sits on the chart, the better its performance. Do not choose the AMD cards on the right: although their performance is good, A cards do not support CUDA.

Choose a graphics card according to your budget, and try to get one with at least 8 GB of video memory: deep learning model training consumes a lot of it.

I bought MSI's RTX 2060 Super at 3399 yuan. Graphics cards do not hold their value, and prices only drop over time.

There is much more that could be said about configuring a computer, such as choosing the CPU, motherboard, power supply, memory, and cooler, so I won't expand on it here. If you don't have the energy to assemble a desktop yourself, you can buy a pre-built one with the corresponding graphics card, though it costs more than assembling it yourself.

5. Development environment construction
If conditions permit, it is recommended to use Ubuntu to set up the development environment. Ubuntu is a Linux distribution that is suitable for beginners, with a friendly interface and simple operation.

Since the motherboard I purchased does not support installing a Linux-based system, I will use Windows as the development environment from here on; this does not affect the explanation of the algorithm principles or the code.

My desktop configuration:

CPU:Intel i7 9700k

Graphics: RTX 2060 Super

System: Windows 10

After installing Windows and the necessary drivers, the tools that need to be installed are: CUDA, Anaconda3, cuDNN, PyTorch (GPU version), and Fluent Terminal (optional).

1. CUDA
CUDA is a computing platform launched by the graphics card manufacturer NVIDIA. Choose the CUDA version supported by your graphics card model; for example, the RTX 2060 Super supports CUDA 10. The installer can be downloaded from NVIDIA's website.

[Figure: CUDA installer]
The installation itself is a simple, guided click-through.

After installation, configure the system environment variables: Computer -> right-click -> Properties -> Advanced system settings -> Environment Variables -> Path:

[Figure: editing the Path environment variable]

Add your NVSMI path to the environment variables; I used the default installation location:

[Figure: NVSMI path added to Path]

Once configured, you can run the nvidia-smi command in cmd to inspect the graphics card.

2. Anaconda3
Anaconda is a Python distribution that ships with the conda package and environment manager, which makes it easy to install third-party libraries for Python.

It can be downloaded from the Anaconda website.
[Figure: Anaconda download page]
Choose the Python 3.7 version. Installation is also very simple: just click through the installer.

After installation, you need to add system environment variables in the same way as when installing CUDA:

D:\Anaconda
D:\Anaconda\Scripts

The path can be changed to the Anaconda path installed by yourself.

After the configuration is complete, if running conda -V in cmd prints version information without an error, the configuration succeeded.

3. Install cuDNN and PyTorch
cuDNN is a GPU acceleration library for deep neural networks. It emphasizes performance, ease of use and low memory overhead.

After installing Anaconda, you can use conda to install cuDNN and PyTorch.

Open Anaconda Prompt, the command line tool that ships with Anaconda, and use it to create the environment. If you use the system's built-in cmd directly, you may run into strange problems such as CondaHTTPError. In Anaconda Prompt, enter:


conda create -n your_name jupyter notebook

This command creates a virtual environment named your_name and additionally installs the jupyter notebook third-party library into it. You can change your_name to any name you like, such as jack; it becomes the name of your virtual environment.

Then enter y to install:

[Figure: conda creating the environment]

After the installation completes, you can list the existing environments with the command conda info -e:

[Figure: output of conda info -e]

As the figure shows, there are two environments: base, the built-in default, and our newly created one named jack. The point of creating a new environment is that each configured environment can be managed separately.

After setting up the environment, we can activate the jack environment and install cuDNN and the GPU version of PyTorch. Activate the environment named jack:


activate jack

[Figure: prompt changes from base to jack]
As you can see, our environment has changed from base to jack. Install cuDNN in the jack environment:

conda install cudnn

After cuDNN is installed, install PyTorch. Open the PyTorch official website:

[Figure: PyTorch install selector]

Choose the options that match your environment, and the page will automatically generate the command to run. You may need to check your Python and CUDA versions first.

To check the Python version, simply enter python on the command line and the version is displayed:

[Figure: Python version shown in the interpreter banner]

To check the CUDA version, enter nvidia-smi on the command line:

[Figure: nvidia-smi output showing the CUDA version]

After confirming the versions, you can install the GPU version of PyTorch with the command provided by the PyTorch official website.
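Once everything is installed, a quick sanity check is to ask PyTorch whether it can see the GPU. The import is guarded here so the snippet also runs on machines where PyTorch is absent:

```python
# Verify that the GPU build of PyTorch is working.
try:
    import torch
    cuda_ok = torch.cuda.is_available()  # True only if the CUDA build sees a GPU
    print("PyTorch", torch.__version__, "| CUDA available:", cuda_ok)
except ImportError:
    cuda_ok = None
    print("PyTorch is not installed in this environment")
```

If this prints False on a machine with an NVIDIA card, the CPU-only build was likely installed; rerun the install command generated by the PyTorch website for your CUDA version.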

At this point, the basic environment setup has been completed, congratulations.

4. Fluent Terminal
The basic environment is now configured, which is enough for normal use.

But those who care about aesthetics may find both the command line tool that comes with Windows and the one provided by Anaconda too ugly.

Is there a terminal that is both good-looking and easy to use? Yes, but you need to configure it yourself, and there are some pitfalls to work through along the way.

One example is Fluent Terminal, a modern terminal tool that I recommend: a very good-looking terminal emulator built on the Windows platform with UWP technology. First, see how it looks:

[Figure: Fluent Terminal screenshot]

Origin blog.csdn.net/weixin_44517301/article/details/114965350