Semantic segmentation environment construction

1. Environment installation and configuration

Going back to its roots, PyTorch comes from Torch, but Torch uses the niche Lua language, while PyTorch is Python-based. PyTorch has also updated many of the framework's design ideas. We intend to use the PyTorch framework to train our semantic segmentation model.

Installing PyTorch

Before using the PyTorch framework, it must be installed. The process is fairly simple. My platform here is: Ubuntu 18.04 + Python 3.6.

First try the following installation command:

pip3 install torch torchvision

However, the download was very slow and the install rarely succeeded. I later found the following command online, which installs quickly by using the Tsinghua mirror:

python3 -m pip install --upgrade torch torchvision -i https://pypi.tuna.tsinghua.edu.cn/simple

After installation, verify by checking that the version numbers of torch and torchvision print correctly.
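A minimal version of that check, run in the Python interpreter:

import torch
import torchvision

# both imports succeeding and printing version numbers confirms the install
print(torch.__version__)
print(torchvision.__version__)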


A follow-up note:

The commands above install the CPU version of PyTorch, so model training will be very slow. The following explains how to install the GPU version.

First download torch-1.1.0-cp36-cp36m-linux_x86_64.whl and torchvision-0.3.0-cp36-cp36m-manylinux1_x86_64.whl from https://download.pytorch.org/whl/cu90/torch_stable.html, then run:

pip3 uninstall torch

pip3 uninstall torchvision

sudo pip3 install xxx/torch-1.1.0-cp36-cp36m-linux_x86_64.whl xxx/torchvision-0.3.0-cp36-cp36m-manylinux1_x86_64.whl

During verification, import torchvision may raise the following error:

>>> import torchvision
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
    from . import functional as F
  File "/home/xxx/.local/lib/python3.6/site-packages/torchvision/transforms/functional.py", line 5, in <module>
    from PIL import Image, ImageOps, ImageEnhance, PILLOW_VERSION
ImportError: cannot import name 'PILLOW_VERSION'

The fix is to install an older version of Pillow (the PILLOW_VERSION constant was removed in Pillow 7.0):

pip3 install Pillow==6.2.2

Once torch and torchvision import successfully, use the following command to confirm that PyTorch's CUDA support is working.
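A minimal version of the check, in the Python interpreter:

import torch

# True means the GPU build of PyTorch can see a working CUDA device
print(torch.cuda.is_available())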


If the CUDA status is normal, the printed value is True. At this point, the GPU version of PyTorch is installed.

Installing other dependency packages

pip3 install matplotlib pillow tensorboardX tqdm

These packages are small, so the command above should download and install quickly.

In addition, you may need to install the following two packages:

Cython installation command:

python3 -m pip install --upgrade Cython -i https://pypi.tuna.tsinghua.edu.cn/simple

The installation command for pycocotools is as follows:

pip3 install pycocotools

At this point, the PyTorch framework and its dependency packages are basically all installed.

2. Semantic segmentation

What is semantic segmentation?

Semantic segmentation labels every pixel in an image with the category of the object it belongs to, according to its "semantics", so that different kinds of things are distinguished in the image. It can be understood as a pixel-level classification task; put plainly, it classifies every pixel.

In short, given an RGB color image (height × width × 3) or a grayscale image (height × width × 1), our goal is to output a segmentation map containing the category label of each pixel (height × width × 1), as shown below:

[Figure: an input image and its low-resolution pixel-level segmentation map]

Note: for visual clarity, the prediction image above is shown at low resolution. In practice, the segmentation annotation must have the same resolution as the original image.

The image contains five categories: Person, Purse (bag), Plants/Grass, Sidewalk, and Building/Structures.

Similar to standard classification, the target category labels are one-hot encoded here; essentially, an output channel is created for each category. Since the image above has 5 categories, the network also outputs 5 channels, as shown in the following figure:

[Figure: the one-hot encoded target, one binary channel per category]
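To make the idea concrete, here is a minimal NumPy sketch with a hypothetical 2×3 label map, the class indices 0-4 standing in for the five categories above:

import numpy as np

# hypothetical tiny label map, shape (H=2, W=3), values are class indices 0..4
label = np.array([[0, 1, 2],
                  [3, 4, 0]])
num_classes = 5

# one-hot encode: one binary channel per category, result shape (H, W, C)
one_hot = np.eye(num_classes, dtype=np.uint8)[label]
print(one_hot.shape)  # (2, 3, 5)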

As shown above, the prediction can be collapsed into a single segmentation map by taking the argmax of each pixel along the depth (channel) dimension. Each target can then be observed easily by overlaying the channels on the original image.

The argmax step is easy to understand. In the figure, each channel contains only 0s and 1s. Taking the Person channel as an example, the red 1s mark the pixels belonging to Person, and all other pixels are 0. The same holds for the other channels, and no pixel is 1 in two or more channels. Argmax therefore finds, for each pixel, the index of the channel holding the maximum value. The final result is:

[Figure: the channels collapsed by argmax into a single segmentation map]
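Continuing the sketch above, argmax collapses the one-hot channels back into a single-channel map:

# the index of the maximum channel at each pixel recovers the label map
seg_map = one_hot.argmax(axis=-1)
print(seg_map.shape)             # (2, 3)
print((seg_map == label).all())  # True: the round trip is lossless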

When a single channel is overlaid on the original image, we call it a mask: it indicates only the region where a particular category is present.
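For example, still using the hypothetical class index 0 for Person from the sketch above:

# boolean mask marking where class 0 (Person) is present
person_mask = (seg_map == 0)
# to overlay it, tint those pixels on an RGB image img of shape (H, W, 3):
# img[person_mask] = (255, 0, 0)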

The high-resolution results are shown in the following figure. Different colors represent different categories:

[Figure: high-resolution segmentation result; different colors denote different categories]

3. Datasets

Common semantic segmentation algorithms are supervised, so labeled datasets are essential.

There are many public datasets for semantic segmentation. At present, three benchmarks dominate academic training and evaluation.

The first commonly used dataset is the Pascal VOC series; within this series, VOC2012 and similar datasets such as Pascal Context are popular.

The second is Microsoft COCO, with 80 categories in total. Although it has very detailed pixel-level annotations, it has no official semantic segmentation evaluation; the dataset is aimed mainly at instance-level segmentation and image captioning. COCO is therefore often used as additional training data.

The third is Cityscapes, for assisted (autonomous) driving environments, evaluated on 19 common categories.

Many more datasets can be used for semantic segmentation training:

  • Pascal VOC 2012: common object classes, 21 categories in total;
  • MS COCO: sponsored by Microsoft, it has almost become the "standard" dataset for evaluating image semantic understanding algorithms; 80 categories in total;
  • Cityscapes: street scenes from 50 European cities across varied scenes, backgrounds, and seasons; 33 annotated object classes;
  • Pascal-Context: an extension of the PASCAL-VOC 2010 recognition competition, 59 categories in total;
  • KITTI: one of the most popular datasets for mobile robotics and autonomous driving research, 11 categories in total;
  • NYUDv2: a 2.5D dataset of 1449 indoor RGB-D images captured with a Microsoft Kinect;
  • SUN-RGBD: captured with four RGB-D sensors, containing 10,000 RGB-D images at the same size as PASCAL VOC;
  • ADE20K_MIT: a new scene-understanding dataset, free to download, with 151 categories.


With so many datasets available, this tutorial series is not tied to any specific one; you may also use datasets from Kaggle competitions and the like. How each dataset is handled, what format it uses, and which datasets are used will all be explained in detail in subsequent articles.

4. GPU machine

Semantic segmentation tasks still require a machine with a high-end GPU; without one, training converges slowly.

The best development environment is Linux: in day-to-day company work, model development is basically done on Linux cloud servers, so adapting to Linux in advance pays off.

For students, if your lab does deep learning research and is well resourced, there should be a GPU server available, and this is not a problem you need to worry about.

If, however, your lab has no GPU server and you still want to study deep learning, there are three options:

1. Free cloud server Google Colab

Google Colab is just about usable. It is a free GPU service provided by Google, and the GPU compute it offers is decent, but it has two main problems: it requires a proxy to access from China, and storage is limited. Colab gets its storage from Google Drive, which provides only 15 GB for free; expanding it costs money.

If you want to use Google Colab, tutorials are easy to find online.

2. Alibaba Cloud paid GPU cloud server

Alibaba Cloud offers GPU cloud servers with two payment models: monthly subscription and pay-as-you-go. There are P4 and even V100 servers; the performance is strong, and the price can be summed up in one word: expensive. It is not recommended for individual users. Besides Alibaba Cloud, Tencent, Baidu, and Huawei all offer similar GPU services, and they are all costly.

3. Build your own desktop machine

You can also build a desktop yourself, which is an investment in yourself. A machine good enough for deep learning training costs about 6000 yuan.

Deep learning training depends on GPU performance, so you need a decent N card, that is, an NVIDIA graphics card. The trick to choosing one is to consult a graphics card ladder chart:

[Figure: graphics card ladder chart]

The ladder chart covers the graphics cards commonly found on the market; it excludes cards priced around 100,000 yuan such as the V100.

The higher a card sits on the chart, the better its performance. Do not choose the AMD cards on the right: their performance is good, but A cards do not support CUDA.

Choose a card within your budget, with as much video memory as possible, preferably 8 GB or more; deep learning training is memory hungry.

I bought an MSI RTX 2060 Super for 3399 yuan. Graphics cards do not hold their value; prices only drop over time.

A lot more could be written about building the machine, such as choosing the CPU, motherboard, power supply, memory, and cooling, but I won't expand on that here. If you don't have the energy to assemble a desktop yourself, you can buy a prebuilt machine with the corresponding graphics card, though it will cost more than one you assemble.

5. Development environment construction

If conditions allow, it is recommended to configure the development environment on Ubuntu. Ubuntu is a Linux distribution that is friendly to beginners, with a pleasant interface and simple operation.

Since the motherboard I purchased does not support installing a Linux-based system, Windows will be used as the development environment from here on; this does not affect the explanation of the algorithm principles or the code.

My desktop configuration:

CPU: Intel i7 9700K

Graphics: RTX 2060 Super

System: Windows 10

After installing Windows and the necessary drivers, the tools to install are: CUDA, Anaconda3, cuDNN, PyTorch (GPU version), and Fluent Terminal (optional).

1. CUDA

CUDA is a parallel computing platform launched by the graphics card manufacturer NVIDIA. Choose the CUDA version supported by your graphics card model; for example, the RTX 2060 Super supports CUDA 10:

[Figure: CUDA versions supported by each graphics card model]

Installation is a simple point-and-click affair.

After installation, configure the system environment variables: Computer → right-click → Properties → Advanced system settings → Environment Variables → Path.


Add your NVSMI path to the Path variable; I used the default installation location (typically C:\Program Files\NVIDIA Corporation\NVSMI).


After configuration, you can run the nvidia-smi command in cmd to check the graphics card.

2. Anaconda3

Anaconda is a package manager and environment manager for Python; it makes installing third-party Python libraries convenient.


Choose the Python 3.7 version. Installation is also simple; just keep clicking Next.

After installation, add system environment variables in the same way as for CUDA:

D:\Anaconda
D:\Anaconda\Scripts

Change the path to the Anaconda path you installed.

After configuration, run conda -V in cmd; if no error is reported and version information is printed, the configuration succeeded.

3. cuDNN and PyTorch installation

cuDNN is a GPU-accelerated library for deep neural networks that emphasizes performance, ease of use, and low memory overhead.

After installing Anaconda, you can use conda to install cuDNN and PyTorch.

Open Anaconda Prompt, the command-line tool that ships with Anaconda. Create the environment with this tool; using the system's cmd directly may run into odd problems such as CondaHTTPError. In Anaconda Prompt, enter:

conda create -n your_name jupyter notebook

This creates a virtual environment named your_name and additionally installs the jupyter notebook packages into it. Replace your_name with any name you like for the environment, for example jack.

Then enter y to confirm the installation.


After installation, use conda info -e to list the existing environments.

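The output looks roughly like this (a sketch; the paths assume the D:\Anaconda install location from earlier):

# conda environments:
#
base                  *  D:\Anaconda
jack                     D:\Anaconda\envs\jack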

As the listing shows, there are two environments: base, the built-in basic environment, and our newly created environment named jack. The point of creating a new environment is that we can manage our own configuration separately.

With the environment created, we can activate the jack environment and install cuDNN and the GPU version of PyTorch. Activate the environment named jack:

activate jack


As you can see, the environment has switched from base to jack. Install cuDNN in the jack environment:

conda install cudnn

After cuDNN is installed, install PyTorch. Open the PyTorch official website (https://pytorch.org).


Make the selections that match your own environment; the page then automatically generates the command to run. Note that you may need to check your Python and CUDA versions first.

To check the Python version, simply enter python on the command line; the version is printed in the startup banner.


To view the CUDA version, enter nvidia-smi on the command line.


After confirming the version, you can install the GPU version of Pytorch through the instructions provided on the Pytorch official website.
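For the setup described here (conda on Windows with CUDA 10.0), the generated command took roughly this form (an illustration, not a verbatim copy of the page):

conda install pytorch torchvision cudatoolkit=10.0 -c pytorch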

At this point, the basic environment has been built, congratulations.

4. Fluent Terminal

The basic environment is now configured and is sufficient for normal use.

However, those with an eye for aesthetics may find the command-line tools that ship with Windows and Anaconda rather ugly.

Is there a terminal that is both good-looking and pleasant to use? Yes, but you have to configure it yourself, and there are some pitfalls to work through.

Fluent Terminal, for example, is a modern terminal tool that I recommend. It is a Windows-exclusive terminal emulator built with UWP technology, and it looks great.


There are many beautification tools of this kind to explore on your own. Since this article is not specifically about terminal beautification, I won't devote more space to them; those who enjoy tinkering can search for what they need.
