[Deep Learning Application] · DC competition bearing fault detection open-source baseline (based on Keras 1D convolution, val_acc: 0.99780)

Copyright notice: Copyright · Xiao Song · yansongsong.cn · reprints welcome https://blog.csdn.net/xiaosongshine/article/details/89007098

Website ->  http://www.yansongsong.cn

Github project address ->  https://github.com/xiaosongshine/bearing_detection_by_conv1d

 

Competition Introduction

Bearings are among the key components widely used in mechanical equipment. Because of overload, fatigue, wear, and corrosion, bearings are easily damaged while a machine is running. In fact, more than 50% of rotating machinery faults are related to bearing failure. A bearing failure can cause violent equipment vibration, downtime, production stoppage, and even casualties. Early, weak bearing faults are generally complex and hard to detect. Monitoring and analyzing bearing condition is therefore very important: it can reveal weak early faults and prevent the losses a failure would cause. Bearing fault detection and diagnosis has long received attention, and among the diagnosis methods, vibration signal analysis is one of the most important and useful tools. In this competition, we provide a real bearing vibration signal dataset, and participants need to use machine learning techniques to determine the operating state of the bearing.

Competition website

 

Data presentation

There are three kinds of bearing fault: outer ring fault, inner ring fault, and ball fault, plus the normal working condition. As shown in Table 1, combined with the three bearing diameters (diameter 1, diameter 2, diameter 3), the operating state of the bearing falls into 10 categories:

Participants need to design a model that classifies the working condition of a bearing from its vibration signal.

 

1. train.csv, the training set. Columns 1-6000 hold the vibration signal values, sampled consecutively in time order. Each row is one sample, and there are 792 samples in total. The first column, id, is the sample number. The last column, label, is the bearing working condition, represented by a number from 0 to 9.

2. test_data.csv, the test set, with 528 samples in total. Apart from having no label column, its fields are the same as in the training set. In short, once id and label are removed, each row is a bearing vibration signal over time, and participants need to predict the working-condition label from these signals.

Note: values in the same column of different samples are not necessarily taken at the same sampling instant, so do not treat each column as a feature.
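
To make the column layout concrete, here is a minimal sketch that splits one row into id, signal, and label. It assumes the layout described above (id first, label last). The tiny in-memory CSV, its header names, and the helper name split_row are made up for illustration, with 4 signal values standing in for the real 6000.

```python
import csv
import io

# Hypothetical miniature of train.csv: id, 4 signal values (6000 in the real file), label.
SAMPLE_CSV = """id,1,2,3,4,label
1,0.12,-0.03,0.40,0.08,3
2,-0.51,0.22,0.10,-0.07,0
"""

def split_row(row):
    """Split one CSV row into (sample id, signal values, label)."""
    sample_id = int(row[0])
    signal = [float(v) for v in row[1:-1]]  # everything between id and label
    label = int(float(row[-1]))             # label may be stored as "3" or "3.0"
    return sample_id, signal, label

reader = csv.reader(io.StringIO(SAMPLE_CSV))
header = next(reader)                        # skip the header line
rows = [split_row(r) for r in reader]

for sample_id, signal, label in rows:
    print(sample_id, len(signal), label)
```

The signal is kept as one ordered sequence rather than as independent columns, which matches the note that a column is not a feature.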

 

Download Data

 

 * Steps for downloading the data:

P.S. Register and log in before downloading.

 


Grading

Scoring algorithm
binary-classification

The score is the arithmetic mean of the per-class F1 values, where each F1 is the harmonic mean of that class's Precision and Recall.

Here Pi is the Precision of the i-th class and Ri is the Recall of the i-th class.
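
The metric described here can be sketched in a few lines of plain Python. This is only an illustration of "mean of per-class F1". The function name macro_f1 and the toy labels are made up, and scoring a class with no correct predictions as 0 is an assumption, since the post does not say how the platform handles that case.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Arithmetic mean of per-class F1 (assumption: an undefined F1 counts as 0)."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0   # Pi
        recall = tp / (tp + fn) if tp + fn else 0.0      # Ri
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / num_classes

# Toy example with 3 classes instead of 10.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2]
score = macro_f1(y_true, y_pred, num_classes=3)
print(round(score, 4))
```

In the toy example the per-class F1 values are 2/3, 4/5, and 1, so the mean is 37/45.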

 

Competition task analysis

Put simply, this competition is a 10-class classification problem. The input has shape (-1, 6000), and the network output for each sample has shape (1, 10) (one-hot form is used here).
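
As a small illustration of the one-hot form mentioned above, the sketch below encodes a label from 0 to 9 as a 10-element vector and decodes it again. The helper names one_hot and decode are hypothetical. The post's own code does the encoding with numpy in convert2oneHot.

```python
NUM_CLASSES = 10

def one_hot(label, num_classes=NUM_CLASSES):
    """Return a list of zeros with a single 1 at position `label`."""
    vec = [0.0] * num_classes
    vec[int(label)] = 1.0
    return vec

def decode(vec):
    """Inverse of one_hot: index of the largest entry (argmax)."""
    return max(range(len(vec)), key=lambda i: vec[i])

encoded = one_hot(3)
print(encoded)
print(decode(encoded))
```

The same decode step applies to a softmax output: the predicted class is the position of the largest probability.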

The task is a typical classification problem, and the solution should include the following steps:

  1. Read and process the data
  2. Build a network model
  3. Train the model
  4. Apply the model and submit predictions

 

Hands-on implementation

Following the analysis above, we break the task into four small tasks. The first step is:

1. Data reading and processing

The data is a CSV file. Columns 1-6000 are the vibration signal values, sampled consecutively in time order. Each row is one sample, 792 samples in total. The first column, id, is the sample number, and the last column, label, is the bearing working state, represented by a number from 0 to 9.

Data processing functions:


import math

import keras
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from keras.layers import (Conv1D, Dense, Dropout, GlobalAveragePooling1D,
                          MaxPooling1D, Reshape)
from keras.models import Sequential, load_model
from keras.optimizers import Adam

MANIFEST_DIR = "Bear_data/train.csv"
Batch_size = 20
Long = 792    # total number of rows in train.csv
Lens = 640    # first 640 rows are used for training, the rest for validation

# Convert a class index to a one-hot vector
def convert2oneHot(index, num_classes):
    hot = np.zeros((num_classes,))
    hot[int(index)] = 1
    return hot

# Endless batch generator over train.csv: the first `Lens` rows form the
# training split (train=True), the remaining rows the validation split.
def xs_gen(path=MANIFEST_DIR, batch_size=Batch_size, train=True, Lens=Lens):
    img_list = np.array(pd.read_csv(path))
    if train:
        img_list = img_list[:Lens]
        print("Found %s train items." % len(img_list))
    else:
        img_list = img_list[Lens:]
        print("Found %s test items." % len(img_list))
    print("list 1 is", img_list[0, -1])
    steps = math.ceil(len(img_list) / batch_size)    # number of batches per epoch
    while True:
        for i in range(steps):
            batch_list = img_list[i * batch_size : i * batch_size + batch_size]
            np.random.shuffle(batch_list)    # shuffles the rows within this batch only
            batch_x = np.array(batch_list[:, 1:-1])    # drop the id and label columns
            batch_y = np.array([convert2oneHot(label, 10) for label in batch_list[:, -1]])

            yield batch_x, batch_y

TEST_MANIFEST_DIR = "Bear_data/test_data.csv"

# Endless batch generator over test_data.csv (no label column, no shuffling)
def ts_gen(path=TEST_MANIFEST_DIR, batch_size=Batch_size):
    img_list = np.array(pd.read_csv(path))
    print("Found %s test items." % len(img_list))
    steps = math.ceil(len(img_list) / batch_size)    # number of batches per pass
    while True:
        for i in range(steps):
            batch_list = img_list[i * batch_size : i * batch_size + batch_size]
            batch_x = np.array(batch_list[:, 1:])    # drop the id column, keep row order

            yield batch_x

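Display one sample read from the data:

Show_one = True    # set to False to skip the preview plot

if __name__ == "__main__":
    if Show_one:
        show_iter = xs_gen()
        for x, y in show_iter:
            x1 = x[0]    # first signal of the first batch
            y1 = y[0]    # its one-hot label
            break
        print(y)         # one-hot labels of the whole batch
        print(x1.shape)
        plt.plot(x1)
        plt.show()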

 

From the output above we can see that each sample consists of 6000 points. You can think of it as an MNIST image flattened into a one-dimensional shape.

 

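Label processing

# TXT_DIR is the path to the raw tab-separated file. It is not defined
# anywhere in this post, so it has to be passed in explicitly.
def create_csv(TXT_DIR):
    lists = pd.read_csv(TXT_DIR, sep="\t", header=None)
    lists = lists.sample(frac=1)    # shuffle the rows
    lists.to_csv(MANIFEST_DIR, index=None)
    print("Finish save csv")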

 

I read the data with a generator, which loads it batch by batch and speeds up training. You can also load everything at once; it is a matter of personal habit. For an introduction to generators, see this blog post of mine:

[Development Tips] · A concise guide to using generators in deep learning to speed up data loading and training (TensorFlow, pytorch, keras)
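
The generator pattern used by xs_gen and ts_gen can be reduced to the following sketch. It is a simplified stand-in, not the post's code: batch_generator and the toy list are hypothetical, and shuffling and label handling are left out.

```python
import math

def batch_generator(samples, batch_size):
    """Yield consecutive batches forever, wrapping around at the end."""
    steps = math.ceil(len(samples) / batch_size)  # batches per pass over the data
    while True:
        for i in range(steps):
            yield samples[i * batch_size:(i + 1) * batch_size]

data = list(range(7))                       # 7 toy "samples"
gen = batch_generator(data, batch_size=3)
first_epoch = [next(gen) for _ in range(3)]
wrapped = next(gen)                         # the generator starts over
print(first_epoch)
print(wrapped)
```

Because the generator never stops, the caller has to say how many batches make up one epoch. With this post's numbers, 640 training rows at a batch size of 20 give 32 batches, and 528 test rows give 27 batches, the last one holding only 8 rows.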

 

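2. Building the network model

With the data processed, the next step is to build the model. I use keras, which is simple and convenient. tf, pytorch, or sklearn work too, according to your preference.

For the network you can choose a CNN, RNN, or Attention structure, or a fusion of several models. As a starting point, this baseline uses a one-dimensional CNN (1D CNN tutorial link).

Building the model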

TIME_PERIODS = 6000
def build_model(input_shape=(TIME_PERIODS,),num_classes=10):
    model = Sequential()
    model.add(Reshape((TIME_PERIODS, 1), input_shape=input_shape))
    model.add(Conv1D(16, 8,strides=2, activation='relu',input_shape=(TIME_PERIODS,1)))

    model.add(Conv1D(16, 8,strides=2, activation='relu',padding="same"))
    model.add(MaxPooling1D(2))

    model.add(Conv1D(64, 4,strides=2, activation='relu',padding="same"))
    model.add(Conv1D(64, 4,strides=2, activation='relu',padding="same"))
    model.add(MaxPooling1D(2))
    model.add(Conv1D(256, 4,strides=2, activation='relu',padding="same"))
    model.add(Conv1D(256, 4,strides=2, activation='relu',padding="same"))
    model.add(MaxPooling1D(2))
    model.add(Conv1D(512, 2,strides=1, activation='relu',padding="same"))
    model.add(Conv1D(512, 2,strides=1, activation='relu',padding="same"))
    model.add(MaxPooling1D(2))

    model.add(GlobalAveragePooling1D())
    model.add(Dropout(0.3))
    model.add(Dense(num_classes, activation='softmax'))
    return(model)

The network printed by model.summary() is:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
reshape_1 (Reshape)          (None, 6000, 1)           0
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2997, 16)          144
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 1499, 16)          2064
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 749, 16)           0
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 375, 64)           4160
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 188, 64)           16448
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 94, 64)            0
_________________________________________________________________
conv1d_5 (Conv1D)            (None, 47, 256)           65792
_________________________________________________________________
conv1d_6 (Conv1D)            (None, 24, 256)           262400
_________________________________________________________________
max_pooling1d_3 (MaxPooling1 (None, 12, 256)           0
_________________________________________________________________
conv1d_7 (Conv1D)            (None, 12, 512)           262656
_________________________________________________________________
conv1d_8 (Conv1D)            (None, 12, 512)           524800
_________________________________________________________________
max_pooling1d_4 (MaxPooling1 (None, 6, 512)            0
_________________________________________________________________
global_average_pooling1d_1 ( (None, 512)               0
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5130
=================================================================
Total params: 1,143,594
Trainable params: 1,143,594
Non-trainable params: 0
_________________________________________________________________
None

There are relatively few trainable parameters. Feel free to change the architecture as you see fit.
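
The output shapes and parameter counts in the summary can be checked by hand. The sketch below is not part of the original project. It uses the usual Conv1D rules (output length ceil(L / stride) for "same" padding, floor((L - kernel) / stride) + 1 for "valid", and (kernel * in_channels + 1) * filters parameters) to reproduce the table. The helper name conv1d and the LAYERS list are made up for illustration.

```python
import math

def conv1d(length, in_ch, out_ch, kernel, stride, padding):
    """Return (output length, parameter count) of a 1D convolution layer."""
    if padding == "same":
        out_len = math.ceil(length / stride)
    else:  # "valid"
        out_len = (length - kernel) // stride + 1
    params = (kernel * in_ch + 1) * out_ch  # weights plus one bias per filter
    return out_len, params

# (filters, kernel, stride, padding, MaxPooling1D(2) after this layer?)
LAYERS = [
    (16, 8, 2, "valid", False),
    (16, 8, 2, "same", True),
    (64, 4, 2, "same", False),
    (64, 4, 2, "same", True),
    (256, 4, 2, "same", False),
    (256, 4, 2, "same", True),
    (512, 2, 1, "same", False),
    (512, 2, 1, "same", True),
]

length, channels, total = 6000, 1, 0
for filters, kernel, stride, padding, pool in LAYERS:
    length, params = conv1d(length, channels, filters, kernel, stride, padding)
    channels = filters
    total += params
    print("conv", length, channels, params)
    if pool:
        length //= 2  # pooling with window 2 halves the length, rounding down
        print("pool", length, channels)

total += (channels + 1) * 10  # final Dense layer: 512 weights + 1 bias per class
print("total params:", total)
```

The printed lengths (2997, 1499, 749, ...) and the total of 1,143,594 match the summary above, which is a quick way to sanity-check any change to kernel sizes or strides before training.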

3. Training the network model

Model training

Show_one = True

Train = True

if __name__ == "__main__":
    if Show_one == True:
        show_iter = xs_gen()
        for x,y in show_iter:
            x1 = x[0]
            y1 = y[0]
            break
        print(y)
        print(x1.shape)
        plt.plot(x1)
        plt.show()


    if Train == True:
        train_iter = xs_gen()
        val_iter = xs_gen(train=False)

        # Save a checkpoint whenever val_loss improves
        ckpt = keras.callbacks.ModelCheckpoint(
            filepath='best_model.{epoch:02d}-{val_loss:.4f}.h5',
            monitor='val_loss', save_best_only=True,verbose=1)

        model = build_model()
        opt = Adam(0.0002)
        model.compile(loss='categorical_crossentropy',
                    optimizer=opt, metrics=['accuracy'])
        print(model.summary())

        model.fit_generator(
            generator=train_iter,
            steps_per_epoch=Lens//Batch_size,
            epochs=50,
            initial_epoch=0,
            validation_data = val_iter,
            validation_steps = (Long - Lens)//Batch_size,    # validation batches per epoch
            callbacks=[ckpt],
            )
        model.save("finishModel.h5")
    else:
        test_iter = ts_gen()
        # Use the name of the checkpoint file that your own training run produced
        model = load_model("best_model.49-0.00.h5")
        pres = model.predict_generator(generator=test_iter,steps=math.ceil(528/Batch_size),verbose=1)
        print(pres.shape)
        ohpres = np.argmax(pres,axis=1)    # class index with the highest probability
        print(ohpres.shape)
        #img_list = pd.read_csv(TEST_MANIFEST_DIR)
        df = pd.DataFrame()
        df["id"] = np.arange(1,len(ohpres)+1)
        df["label"] = ohpres
        df.to_csv("submmit.csv",index=None)

Training output (best result: 32/32 [==============================] - 1s 33ms/step - loss: 0.0098 - acc: 0.9969 - val_loss: 0.0172 - val_acc: 0.9924)

Epoch 46/50
32/32 [==============================] - 1s 33ms/step - loss: 0.0638 - acc: 0.9766 - val_loss: 0.2460 - val_acc: 0.9242

Epoch 00046: val_loss did not improve from 0.00354
Epoch 47/50
32/32 [==============================] - 1s 33ms/step - loss: 0.0426 - acc: 0.9859 - val_loss: 0.0641 - val_acc: 0.9848

Epoch 00047: val_loss did not improve from 0.00354
Epoch 48/50
32/32 [==============================] - 1s 33ms/step - loss: 0.0148 - acc: 0.9969 - val_loss: 0.0072 - val_acc: 1.0000

Epoch 00048: val_loss did not improve from 0.00354
Epoch 49/50
32/32 [==============================] - 1s 34ms/step - loss: 0.0061 - acc: 0.9984 - val_loss: 0.0404 - val_acc: 0.9857

Epoch 00049: val_loss did not improve from 0.00354
Epoch 50/50
32/32 [==============================] - 1s 33ms/step - loss: 0.0098 - acc: 0.9969 - val_loss: 0.0172 - val_acc: 0.9924

 

The last step is prediction and submission. The code is above, and you can run it yourself.
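
For readers who want to see the submission step in isolation, here is a dependency-free sketch of what the prediction branch does: take the argmax of each probability row and write id and label columns. The function name to_submission and the fake probabilities are hypothetical. The ids starting at 1 follow the np.arange(1, len(ohpres)+1) call in the code above.

```python
import csv
import io

def to_submission(probabilities):
    """Turn per-class probabilities into CSV text with id and label columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "label"])
    for sample_id, probs in enumerate(probabilities, start=1):
        label = max(range(len(probs)), key=lambda i: probs[i])  # argmax
        writer.writerow([sample_id, label])
    return buffer.getvalue()

# Two fake softmax outputs over 3 classes (the real model outputs 10).
fake_probs = [
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
]
submission_text = to_submission(fake_probs)
print(submission_text)
```

This only works if the predictions are in the same row order as test_data.csv, which is why ts_gen does not shuffle.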

Prediction results

Leaderboard: rank 24, f1score 0.99780

 

 

 

 

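Outlook

This baseline uses the simplest one-dimensional convolution and reaches 99.8% test accuracy, which shows how well 1D convolution works on one-dimensional time series.

hope this helps

Personal website -> http://www.yansongsong.cn

Project GitHub address: https://github.com/xiaosongshine/bearing_detection_by_conv1d

Fork and Star are welcome. If you find this useful, a little encouragement would be appreciated ><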
