Defect detection based on a deep learning recognition model

1. Introduction

Defect detection is widely used in cloth inspection, workpiece surface quality inspection, aerospace, and other fields. Traditional algorithms work well when the defects follow regular patterns and the scene is relatively simple. In recent years, recognition algorithms based on deep learning have matured considerably, and many companies have begun trying to apply deep learning to industrial settings.

2. Defect data

As shown in the figure below, cloth data is used as the example here. There are three common defect types: wear, white spots, and multiple threads.

[Figure: examples of the three cloth defect types]
How do we make the training data? Small patches are cropped from the original image. For example, the image above is 512x512, and I crop it into 64x64 patches. Taking the first defect type as an example, the data is made as follows.

[Figure: cropping 64x64 defect patches from the 512x512 image]

Note: When making defect data, the defect area should cover at least 2/3 of the cropped patch; otherwise the patch is discarded and not used as a defect image.
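As a concrete illustration, here is a minimal patch-cropping sketch. It assumes a binary defect mask aligned with the image; the mask format, the stride, and the exact bookkeeping are my assumptions, not the author's code.

import numpy as np

def crop_patches(image, defect_mask, size=64, stride=32):
    # keep a patch as a defect sample only if the defect covers
    # at least 2/3 of its area, per the note above
    defects, backgrounds = [], []
    h, w = image.shape[:2]
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patch = image[y:y + size, x:x + size]
            ratio = defect_mask[y:y + size, x:x + size].mean()
            if ratio >= 2.0 / 3.0:
                defects.append(patch)
            elif ratio == 0:
                backgrounds.append(patch)  # clean background sample
            # partially covered patches are discarded
    return defects, backgrounds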

Generally speaking, there is far less defect data than background data; there is no way around this. For data augmentation, please refer to my other blog post on image data enhancement: https://blog.csdn.net/qq_29462849/article/details/83241797. After augmentation, defect : background = 1:1, with around 1000 samples per class~
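One possible way to do the augmentation in Keras is sketched below; these particular transforms are my assumptions, see the linked post for the author's actual methods.

from keras.preprocessing.image import ImageDataGenerator

aug = ImageDataGenerator(rotation_range=90,
                         width_shift_range=0.1,
                         height_shift_range=0.1,
                         horizontal_flip=True,
                         vertical_flip=True)
# defect_patches: an Nx64x64x3 array; flow() yields augmented batches
gen = aug.flow(defect_patches, batch_size=32)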

3. Network structure

The specific network structure is as follows. The input size is 64x64x3, matching the cropped patch size. Every Conv layer is followed by a BN layer. The layer parameters are listed below, with a code sketch after the structure figure.
Conv1: 64 filters, 3x3
Conv2: two each of ResNetBlock and DenseNetBlock, 128 filters, 3x3. For details, please refer to ResNet and DenseNet.
Add: element-wise addition, on corresponding feature maps, of the residual-branch output and the DenseNetBlock output, in the same way as in a residual module. Note that this combination is simply for better feature extraction; it does not have to be residual module + DenseNetBlock, it could also be Inception or something else.
Conv3: 128 filters, 3x3
Maxpool: stride=2, size=2x2
FC1: 4096
Dropout1: 0.5
FC2: 1024
Dropout2: 0.5
Softmax: one output per class; here it is binary classification.

[Figure: network structure]
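To make the structure concrete, here is one way the layer list could be wired up in Keras. This is my reconstruction from the description above, not the author's code; the block internals and the 1x1 projection for matching channel counts are assumptions.

from keras import backend as K
from keras.layers import (Input, Conv2D, BatchNormalization, Activation, Add,
                          Concatenate, MaxPooling2D, Flatten, Dense, Dropout)
from keras.models import Model

def conv_bn_relu(x, filters, kernel=(3, 3)):
    x = Conv2D(filters, kernel, padding='same')(x)
    x = BatchNormalization()(x)
    return Activation('relu')(x)

def resnet_block(x, filters=128):
    # identity shortcut, with a 1x1 projection when channel counts differ
    shortcut = x
    if K.int_shape(x)[-1] != filters:
        shortcut = Conv2D(filters, (1, 1), padding='same')(x)
    y = conv_bn_relu(x, filters)
    y = Conv2D(filters, (3, 3), padding='same')(y)
    y = BatchNormalization()(y)
    return Activation('relu')(Add()([shortcut, y]))

def densenet_block(x, filters=128, growth=32):
    # dense connectivity: concatenate input with new features, then compress
    y = Concatenate()([x, conv_bn_relu(x, growth)])
    return conv_bn_relu(y, filters)

inputs = Input(shape=(64, 64, 3))
x = conv_bn_relu(inputs, 64)                  # Conv1: 64 filters, 3x3
res = resnet_block(resnet_block(x))           # two ResNet blocks, 128 maps
den = densenet_block(densenet_block(x))       # two DenseNet blocks, 128 maps
x = Add()([res, den])                         # element-wise Add of both branches
x = conv_bn_relu(x, 128)                      # Conv3: 128 filters, 3x3
x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(x)
x = Flatten()(x)
x = Dropout(0.5)(Dense(4096, activation='relu')(x))   # FC1 + Dropout1
x = Dropout(0.5)(Dense(1024, activation='relu')(x))   # FC2 + Dropout2
outputs = Dense(2, activation='softmax')(x)           # two-class softmax
model = Model(inputs, outputs)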

Regarding the final loss function, Focal Loss is recommended, a well-known work from Kaiming He and colleagues. The code is as follows:

import tensorflow as tf
from keras import backend as K

def focal_loss(y_true, y_pred, gamma=2.0):
    # pt_1 = predicted probability for the positive class (1 elsewhere)
    pt_1 = tf.where(tf.equal(y_true, 1), y_pred, tf.ones_like(y_pred))
    # clip to avoid log(0)
    return -K.sum(K.pow(1. - pt_1, gamma) * K.log(K.clip(pt_1, K.epsilon(), 1.)))
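To use it, pass the function when compiling the model, for example (the optimizer choice here is my assumption):

model.compile(optimizer='adam', loss=focal_loss, metrics=['accuracy'])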

With the data ready, you can start training~

4. Defect detection on the whole image

The training network above takes a 64x64x3 input, but the whole image is 512x512, which does not match the model input. What to do? In fact, you can extract the trained model's parameters and assign them to a new model whose input is changed to 512x512. The feature map after the conv3+maxpool stage is then correspondingly larger, and it can be mapped back to the original image. For example, after the last maxpool of the original model, the output feature map is 8x8x128, where 128 is the number of channels; with a 512x512 input, the output becomes 64x64x128, and each 8x8 block of it corresponds to a 64x64 region of the original image. So an 8x8 window can slide over the 64x64x128 feature map to crop features, and each crop is flattened and fed into the fully connected layers. The details are shown in the figure below.

The fully connected layers also need to be rebuilt as a model of their own, whose input is the flattened features and whose output is the softmax layer. It is a simple little model.

[Figure: sliding an 8x8 window over the feature map and classifying each crop with the fully connected model]

Here is code that loads the trained model's parameters into another model:

from keras.layers import (Input, Conv2D, BatchNormalization, Activation,
                          MaxPooling2D, Flatten, Dense, Dropout, concatenate)
from keras.models import Model
from keras import regularizers

# Large model that extracts features
def read_big_model(inputs):
    # first convolution and max-pooling layers
    X = Conv2D(16, (3, 3), name="conv2d_1")(inputs)
    X = BatchNormalization(name="batch_normalization_1")(X)
    X = Activation('relu', name="activation_1")(X)
    X = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name="max_pooling2d_1")(X)
    # GoogLeNet-style Inception module
    conv_1 = Conv2D(32, (1, 1), padding='same', name='conv2d_2')(X)
    conv_1 = BatchNormalization(name='batch_normalization_2')(conv_1)
    conv_1 = Activation('relu', name='activation_2')(conv_1)
    conv_2 = Conv2D(32, (3, 3), padding='same', name='conv2d_3')(X)
    conv_2 = BatchNormalization(name='batch_normalization_3')(conv_2)
    conv_2 = Activation('relu', name='activation_3')(conv_2)
    conv_3 = Conv2D(32, (5, 5), padding='same', name='conv2d_4')(X)
    conv_3 = BatchNormalization(name='batch_normalization_4')(conv_3)
    conv_3 = Activation('relu', name='activation_4')(conv_3)
    pooling_1 = MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding='same', name='max_pooling2d_2')(X)
    # Keras 2 API: concatenate() replaces the old merge(mode='concat')
    X = concatenate([conv_1, conv_2, conv_3, pooling_1], name='merge_1')
    X = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='max_pooling2d_3')(X)  # size becomes 16x16x112 here
    X = Conv2D(64, (3, 3), kernel_regularizer=regularizers.l2(0.01), padding='same', name='conv2d_5')(X)
    X = BatchNormalization(name='batch_normalization_5')(X)
    X = Activation('relu', name='activation_5')(X)
    X = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='max_pooling2d_4')(X)  # size becomes 8x8x64 here
    X = Conv2D(128, (3, 3), padding='same', name='conv2d_6')(X)
    X = BatchNormalization(name='batch_normalization_6')(X)
    X = Activation('relu', name='activation_6')(X)
    X = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same', name='max_pooling2d_5')(X)  # size becomes 4x4x128 here

    return X

# Small classification model (the fully connected head)
def read_big_model_classify(inputs_sec):
    X_ = Flatten(name='flatten_1')(inputs_sec)
    X_ = Dense(256, activation='relu', name="dense_1")(X_)
    X_ = Dropout(0.5, name="dropout_1")(X_)
    predictions = Dense(2, activation='softmax', name="dense_2")(X_)
    return predictions

# Build the new model with a 512x512 input
inputs = Input(shape=(512, 512, 3))
X = read_big_model(inputs)  # read the trained model's network parameters
# build the first model
model = Model(inputs=inputs, outputs=X)
model.load_weights('model_halcon.h5', by_name=True)
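To complete the picture, here is a sketch of building the small fully connected model and sliding the 8x8 window over the feature map, following the prose above. The 64x64x128 shape comes from the text; the window bookkeeping, the image_512 variable, and the defect-class index are my assumptions, not the author's released code.

import numpy as np

feat_inputs = Input(shape=(8, 8, 128))
small_model = Model(inputs=feat_inputs, outputs=read_big_model_classify(feat_inputs))
small_model.load_weights('model_halcon.h5', by_name=True)  # FC layers match by name

feature_map = model.predict(image_512[np.newaxis])[0]  # image_512: a 512x512x3 array
defect_windows = []
step = 2  # 2 feature-map cells = 16 pixels on the original image
for i in range(0, feature_map.shape[0] - 8 + 1, step):
    for j in range(0, feature_map.shape[1] - 8 + 1, step):
        crop = feature_map[i:i + 8, j:j + 8, :][np.newaxis]
        prob = small_model.predict(crop)[0]
        if np.argmax(prob) == 1:                   # assume class 1 = defect
            defect_windows.append((j * 8, i * 8))  # top-left corner in original pixels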

5. Recognition and localization results

The sliding-window scheme above maps back to the original image: an 8x8 window on the feature map corresponds to a 64x64 region of the original. Since the window slides in steps of 16 pixels both horizontally and vertically, a defect is usually located by more than one window, which raises the question of localization accuracy. The voting scheme used here counts, for each pixel of the original image, how many windows marked it as defective; when that count exceeds a specified threshold, the pixel is judged to be a defect pixel.
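A minimal sketch of the voting step, continuing from the defect_windows list above (the threshold value is an assumption to be tuned, not the author's setting):

import numpy as np

votes = np.zeros((512, 512), dtype=np.int32)
for (x, y) in defect_windows:        # 64x64 windows flagged as defect
    votes[y:y + 64, x:x + 64] += 1   # every covered pixel gets one vote

threshold = 3                        # assumed value; tune on validation images
defect_mask = votes >= threshold     # pixels judged defective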

The recognition results are shown in the figures below:

[Figure: recognition and localization results on sample defect images]

6. Some Tricks

For the case above, the 64x64 localization boxes are not precise enough. You can also train a 32x32 model, apply it in the same way as the 64x64 model, and then vote over both the 32x32 and the 64x64 locations. The catch is that the running time increases considerably, so use this with caution.

When the background and foreground differ little, the network should not be too deep: in an overly deep network the later layers learn essentially the same things and lose discriminative power. That is also why I do not use object detection here; those detection networks usually have 50+ layers, and even with residual modules as the backbone, the results are not good.

However, when the background and foreground differ greatly, you can choose a deeper network, and that is when object detection methods come in handy.

7. About the source code

The code here is not open-sourced, because the design is a trade secret. If you are interested, you can implement it yourself; it is not difficult~

Origin: blog.csdn.net/qq_29462849/article/details/84763421