Defect detection with U-Net

This is a placeholder post for now: a record of the defect-detection work done earlier, to be filled in properly when there is time.

U-Net

The U-Net network was published in 2015. It was originally designed for medical cell image segmentation, but it has also shown good performance on other segmentation problems [2]. The network structure is shown in the figure below; the overall pipeline is an encoder-decoder process.
[Figure: U-Net network architecture]
U-Net is a classic fully convolutional network. In the paper, the input is a 572×572 image. The left side of the network is called the contracting path, a downsampling path built from convolutions and max pooling.

The contracting path consists of 4 blocks; each block contains two 3×3 convolutions followed by one max-pooling operation, and after each downsampling step the number of feature maps is doubled. At the bottom of the network the feature map reaches a size of 32×32.

The right part of the network is called the expansive path and, like the contracting path, consists of 4 blocks. At the start of each block, a deconvolution (transposed convolution) doubles the spatial size of the feature map and halves the number of feature channels; the result is then merged with the feature maps from the symmetric block of the contracting path on the left. Because the contracting-path and expansive-path feature maps differ in size, U-Net crops the contracting-path feature maps to the same size as the expansive-path feature maps before concatenating them.

The final feature map is 388×388. Since the task in the paper is binary classification, two output feature maps are produced. In total, U-Net has 23 convolutional layers, with 4 downsampling and 4 upsampling steps.
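To make the encoder-decoder structure concrete, here is a minimal Keras sketch of one contracting block and one expansive block with a skip connection. It uses 'same' padding for simplicity, so no cropping is needed; the original paper uses unpadded convolutions and crops the contracting-path feature maps instead. The function names and filter handling are illustrative, not code from the paper or this project.

```python
import tensorflow as tf
from tensorflow.keras import layers

def down_block(x, filters):
    # Two 3x3 convolutions, then 2x2 max pooling. The unpooled tensor `c`
    # is kept as the skip connection for the symmetric expansive block.
    c = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    c = layers.Conv2D(filters, 3, padding='same', activation='relu')(c)
    p = layers.MaxPooling2D(pool_size=2)(c)
    return c, p

def up_block(x, skip, filters):
    # A transposed convolution doubles the spatial size and halves the channels,
    # then the contracting-path features are concatenated back in.
    u = layers.Conv2DTranspose(filters, 2, strides=2, padding='same')(x)
    u = layers.concatenate([u, skip])
    u = layers.Conv2D(filters, 3, padding='same', activation='relu')(u)
    u = layers.Conv2D(filters, 3, padding='same', activation='relu')(u)
    return u
```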

Algorithm implementation

The implementation environment is TensorFlow 2.0; the Keras API integrated into TensorFlow (tf.keras) is used for the training process, without GPU acceleration. In practice, a virtual environment was created in Anaconda, and model training and validation were carried out in a Jupyter notebook.
[Figure: structure of the algorithm pipeline used in this project]

  1. Reading images and preprocessing
    The figure above shows the structure of the algorithm pipeline used here. The images and label images are stored in the dataset folder. First, the images and mask images are read, 1,000 of each, and stored in the images and annotations variables in the code. Since this is a binary segmentation task, the ideal training mask is a single-channel binary image. In practice, however, the training masks contain not only the values 0 and 255 but also some scattered gray values in between, so the masks need to be decoded and then binarized to a 0/1 distribution, which is stored in the anno variable. After that, images and anno are randomly shuffled, making sure that the shuffled order still keeps them in one-to-one correspondence. The dataset can then be built.
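A minimal sketch of this reading-and-shuffling step. The folder names and file extension are hypothetical; here anno holds the shuffled mask file paths, and the binarization described above is done later inside load_dataset.

```python
import glob
import numpy as np

# Hypothetical folder layout and extension; adjust to the actual dataset.
images = sorted(glob.glob('dataset/images/*.png'))
annotations = sorted(glob.glob('dataset/annotations/*.png'))

# Shuffle both lists with the same permutation so each image stays paired
# with its mask after shuffling.
index = np.random.permutation(len(images))
images = np.array(images)[index]
anno = np.array(annotations)[index]
```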

  2. Building the dataset
    The Dataset API is a module introduced in TensorFlow 1.3 that mainly serves data reading and building input pipelines. Previously, there were two common ways to read data in TensorFlow: feeding in-memory data through placeholders, and reading data from disk through queues. The Dataset API supports reading from both memory and disk, provides a large number of practical methods for processing data, and has a more concise, easier-to-understand syntax. Google's official class diagram of the Dataset API is shown below. A Dataset can be regarded as an ordered list of elements of the same type; in practice, a single element can be a vector, a string, an image, a tuple, or a dict.
    [Figure: class diagram of the Dataset API]
    First, tf.data.Dataset.from_tensor_slices((image, anno)) is used to load the data; at this point an element of the dataset is an (image, anno) pair. Typing dataset in a Jupyter notebook cell shows its content as <TensorSliceDataset shapes: ((), ()), types: (tf.string, tf.string)>.

    Then dataset.map(load_dataset) is called. map receives a function object; each element of the Dataset is passed to this function, and the return values form the new Dataset. The load_dataset function used here resizes and normalizes the images. After the call, the dataset looks like <ParallelMapDataset shapes: ((256, 256, 3), (256, 256, 1)), types: (tf.float32, tf.int32)>, still holding 1,000 examples.
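A sketch of what load_dataset might look like given the shapes and types above. PNG decoding, the 127 binarization threshold, and AUTOTUNE parallelism are assumptions, not details from the original post.

```python
import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE

def load_dataset(image_path, mask_path):
    # Decode the RGB image, resize it to 256x256 and scale it to [0, 1].
    image = tf.io.decode_png(tf.io.read_file(image_path), channels=3)
    image = tf.image.resize(image, (256, 256))
    image = tf.cast(image, tf.float32) / 255.0

    # Decode the single-channel mask, resize it and binarize it to {0, 1}.
    mask = tf.io.decode_png(tf.io.read_file(mask_path), channels=1)
    mask = tf.image.resize(mask, (256, 256))
    mask = tf.cast(mask > 127, tf.int32)
    return image, mask

dataset = tf.data.Dataset.from_tensor_slices((images, anno))
dataset = dataset.map(load_dataset, num_parallel_calls=AUTOTUNE)
```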

    After that, the dataset is split: 800 examples are randomly taken as the training set dataset_train and 200 as the validation set dataset_val. The data in dataset_train is then transformed with the dataset interface functions in tf: batch() groups the elements into batches of size 32 (the last batch may contain fewer than 32), shuffle() reshuffles the elements, and repeat() repeats the whole sequence multiple times.
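A sketch of the split and batching. The take/skip split relies on the earlier shuffle of the file lists; the post only says the 800/200 split is random, so this is one way to realize it.

```python
train_count, val_count = 800, 200
BATCH_SIZE = 32

# The file lists were already shuffled, so take/skip gives a random split.
dataset_train = dataset.take(train_count)
dataset_val = dataset.skip(train_count)

dataset_train = (dataset_train
                 .shuffle(buffer_size=train_count)
                 .repeat()
                 .batch(BATCH_SIZE))
dataset_val = dataset_val.batch(BATCH_SIZE)
```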

  3. Training and validating the model
    Since Keras is used, the model is built first: the U-Net construction function create_model() is called to create the network, and the result is assigned to the model variable. Then model.compile() specifies Adam as the optimizer; because this is binary segmentation, the loss is defined as binary_crossentropy, and the performance metrics tracked during training are specified in metrics as accuracy and mIoU (the mIoU metric is constructed with the number of classes for the task).
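A sketch of this compile step; create_model() is the project's own U-Net builder and is assumed to be defined elsewhere.

```python
model = create_model()  # U-Net builder defined elsewhere in the project

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy', tf.keras.metrics.MeanIoU(num_classes=2)],
)
```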

    Finally, model.fit() is called to train the U-Net model. Because training runs on the CPU, it takes a long time: the whole training process lasts about 18 hours.
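A sketch of the training call. The epoch count is not given in the post, so the value below is hypothetical; steps_per_epoch is needed because dataset_train repeats indefinitely.

```python
EPOCHS = 10  # hypothetical value; the post does not state the epoch count

history = model.fit(
    dataset_train,
    epochs=EPOCHS,
    steps_per_epoch=train_count // BATCH_SIZE,
    validation_data=dataset_val,
    validation_steps=val_count // BATCH_SIZE,
)
```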

  4. Training results
    The loss and val_loss curves are shown below:
    [Figure: training loss and validation loss curves]
    After training, the model is saved as unet_model.h5, and some images are randomly taken from the test set to inspect the segmentation results shown below (note that the pixels of the prediction map are mapped from the 0~1 range back to a binary 0/255 distribution). From left to right: the original image, the given mask, and the predicted segmentation.
    [Figure: original image, ground-truth mask, and predicted segmentation (left to right)]
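A sketch of saving the model and producing predictions for inspection. The 0.5 threshold and the use of the validation set here are stand-ins for details the post does not spell out.

```python
model.save('unet_model.h5')

# Predict on one batch and map the sigmoid outputs back to a 0/255 mask.
for image_batch, mask_batch in dataset_val.take(1):
    pred = model.predict(image_batch)
    pred_binary = (pred > 0.5).astype('uint8') * 255
```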
References
[1] Shelhamer E, Long J, Darrell T. Fully Convolutional Networks for Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
[2] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI, 2015.
[3] Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556, 2014.
[4] He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[5] Han Hui. Research on Industrial Defect Detection Methods Based on Deep Learning. Dissertation, 2019.
[6] Liu Cong. Research on Surface Defect Detection Technology of Tiny Parts Based on Convolutional Neural Networks. Dissertation, 2019.

I will definitely fill in this placeholder, for sure...

Source: blog.csdn.net/moumde/article/details/108004712