[1DCNN] Training a model on a self-made audio data set

This article trains a deep learning model on a self-made watermelon data set.
Audio of a hand tapping a watermelon is analyzed with a fast Fourier transform (FFT) to extract frequency-domain features, and a one-dimensional convolutional neural network (1DCNN) is trained on those features to build a watermelon ripeness detection model.

1. Dataset preprocessing

1. Data collection

   I bought the watermelons and recorded the audio data set used for model training myself.
   During recording, the watermelons were tapped in three different ways: patting (P), flicking (T), and knocking (Q).

Information about the watermelons:
A total of 939 audio recordings were collected.

The data set covers 158 watermelons, each with several audio recordings.
86 watermelons are used for the training set, 36 for the test set, and 36 for the validation set, a ratio of roughly 6:2:2.

Ripeness                 Medium rare   Mature   Overripe   Partially born   Total
Number of watermelons    25            76       31         26               158
Number of audio clips    145           440      192        162              939
Label                    0             1        2          3

Here "medium rare" means about 80% ripe and "partially born" means underripe.

2. Data preprocessing

   Because the data set was self-collected, each recording lasts between 6 and 30 seconds and contains multiple knock signals, so the data must be preprocessed.

2.1 Endpoint detection

For details on endpoint detection, see the related article on endpoint detection with the double-threshold method in MATLAB.

Endpoint detection: determining the start and end points of speech (here, of each tap) within a signal that contains it.
Significance: it increases the number of samples, removes stretches that would otherwise waste computation during network training, and improves the accuracy of the trained model.

Figure: endpoint detection results

Figure: two-parameter double-threshold endpoint detection results

   The figures show that after endpoint detection the number of samples increases and the length of each signal is greatly reduced.

Points to note:
1. The audio clips must not be too short.
   Assuming a sampling frequency of 16 kHz, and that each sample is represented by the first 1000 points of its frequency-domain signal, each extracted clip must last at least 1000 / 16000 = 0.0625 s. (Each clip must of course also contain at least one excitation signal.)

Sampling frequency = number of sampling points per unit time = number of sampling points / sampling time
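The relation above can be checked with a couple of lines. This is only an illustrative sketch; the helper name min_duration_seconds is made up for this example, and the 16 kHz rate and 1000 points are the values assumed in the text.

```python
def min_duration_seconds(n_points: int, sample_rate_hz: int) -> float:
    """Shortest clip (in seconds) that still contains n_points samples.

    Follows directly from: sampling frequency = number of points / time.
    """
    return n_points / sample_rate_hz


fs = 16000        # sampling frequency in Hz, as assumed in the text
n_points = 1000   # number of points kept per clip, as in the text

print(min_duration_seconds(n_points, fs))  # 0.0625
```

Note that the spectrum of a real-valued signal is mirror-symmetric, so a clip of exactly 1000 samples yields only about 500 independent frequency points. Clips comfortably longer than this minimum are the safer choice.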

2. The thresholds must be set sensibly.
   There are too many recordings to process by hand, so batch processing is required, and the thresholds must be chosen sensibly before it is run.
   Suppose only the single-parameter double-threshold method is used, i.e. the one based on short-time energy. The two thresholds (amp1 and amp2) are then set from the short-time average energy of the audio (or another parameter) so that every excitation signal is found.
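The original processing was done in MATLAB and is not reproduced here. As a rough illustration of the single-parameter double-threshold idea, here is a minimal NumPy sketch. It assumes that amp1 is the high threshold and amp2 the low one, and the frame length, hop size, threshold ratios, and synthetic test signal are all made-up values for the example.

```python
import numpy as np


def short_time_energy(signal, frame_len=256, hop=128):
    """Energy (sum of squared samples) of each frame."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.array([
        np.sum(signal[i * hop: i * hop + frame_len].astype(np.float64) ** 2)
        for i in range(n_frames)
    ])


def detect_segments(energy, amp1, amp2):
    """Double-threshold detection on a frame-energy curve (assumes amp1 > amp2).

    A segment is confirmed when the energy exceeds the high threshold amp1;
    its boundaries are then extended outwards for as long as the energy
    stays above the low threshold amp2.
    Returns a list of (start_frame, end_frame) pairs.
    """
    segments = []
    i, n = 0, len(energy)
    while i < n:
        if energy[i] > amp1:
            start = i
            while start > 0 and energy[start - 1] > amp2:
                start -= 1
            end = i
            while end + 1 < n and energy[end + 1] > amp2:
                end += 1
            segments.append((start, end))
            i = end + 1
        else:
            i += 1
    return segments


# Synthetic test signal: 1 s of silence at 16 kHz with two artificial "taps"
fs = 16000
signal = np.zeros(fs)
signal[2000:3000] = 1.0
signal[9000:10000] = 1.0

energy = short_time_energy(signal)
amp1 = 0.5 * energy.max()   # high threshold (illustrative choice)
amp2 = 0.1 * energy.max()   # low threshold (illustrative choice)
segments = detect_segments(energy, amp1, amp2)
print(segments)             # one (start_frame, end_frame) pair per tap
```

Each frame pair can be mapped back to samples with start * hop and end * hop + frame_len, which gives one short clip per tap. If the thresholds are too high, taps are missed; if they are too low, background noise is cut out as if it were a tap.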

Figure: thresholds set appropriately, so that the effective excitation signals are found

2.2 Data enhancement

   If there is still not enough data after endpoint detection, further data augmentation can be applied. (Endpoint detection itself also counts as data augmentation.)

   When augmenting, make only small changes so that the augmented data differs slightly from the source data. Do not change the structure of the original data, otherwise "dirty data" is produced. Augmenting the audio data helps the model avoid overfitting and generalize better.

The following changes are applied to the audio: noise addition, time stretching, and pitch shifting.

Noise addition
The added noise is Gaussian white noise with a mean of 0 and a standard deviation of 1, scaled by a noise factor of 0.02.

##### Add noise #####
def add_noise(data):
    # 0.02 is the noise factor
    wn = np.random.normal(0, 1, len(data))
    # Samples that are exactly zero are left untouched
    return np.where(data != 0.0, data.astype('float64') + 0.02 * wn, 0.0).astype(np.float32)

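Time stretching
Time stretching changes the speed/duration of a sound without affecting its pitch.

##### Time stretch #####
def time_stretch(x, rate):
    # rate: stretch factor
    # rate > 1 speeds the audio up
    # rate < 1 slows it down
    # rate is passed by keyword, as recent librosa versions require
    return librosa.effects.time_stretch(x, rate=rate)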

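Pitch shifting
Pitch shifting changes only the pitch, without affecting the speed of the sound.

##### Pitch shift #####
def pitch_shifting(x, sr, n_steps, bins_per_octave=12):
    # sr: audio sampling rate
    # n_steps: number of steps to shift by
    # bins_per_octave: number of steps per octave (12 means semitones)
    # sr and n_steps are passed by keyword, as recent librosa versions require
    return librosa.effects.pitch_shift(x, sr=sr, n_steps=n_steps, bins_per_octave=bins_per_octave)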

Example

import librosa
import numpy as np
import matplotlib.pyplot as plt

fs = 16000
# librosa.load returns the samples and the sampling rate
wav_data, sr = librosa.load(r"C:\Users\Administrator\Desktop\test\10_28_1190001.wav", sr=fs, mono=True)

##### 1. Add noise #####
def add_noise(data):
    # 0.02 is the noise factor
    wn = np.random.normal(0, 1, len(data))
    return np.where(data != 0.0, data.astype('float64') + 0.02 * wn, 0.0).astype(np.float32)

##### 2. Time stretch #####
def time_stretch(x, rate):
    # rate: stretch factor
    # rate > 1 speeds the audio up
    # rate < 1 slows it down
    return librosa.effects.time_stretch(x, rate=rate)

##### 3. Pitch shift #####
def pitch_shifting(x, sr, n_steps, bins_per_octave=12):
    # sr: audio sampling rate
    # n_steps: number of steps to shift by
    # bins_per_octave: number of steps per octave (12 means semitones)
    return librosa.effects.pitch_shift(x, sr=sr, n_steps=n_steps, bins_per_octave=bins_per_octave)

data_noise = add_noise(wav_data)
data_stretch = time_stretch(wav_data, rate=2)
data_pitch2 = pitch_shifting(wav_data, fs, n_steps=-6, bins_per_octave=12)   # shift down a tritone (six steps when bins_per_octave is 12)

# Plot
plt.subplot(2, 2, 1)
plt.title("Waveform", fontsize=15)
time = np.arange(0, len(wav_data)) * (1.0 / fs)
plt.plot(time, wav_data)
plt.xlabel('Time / s', fontsize=15)
plt.ylabel('Amplitude', fontsize=15)

plt.subplot(2, 2, 2)
plt.title("Noise added", fontsize=15)
plt.plot(time, data_noise)
plt.xlabel('Time / s', fontsize=15)
plt.ylabel('Amplitude', fontsize=15)

# Pitch shifting keeps the length, so the original time axis is reused
plt.subplot(2, 2, 4)
plt.title("Pitch shifted", fontsize=15)
plt.plot(time, data_pitch2)
plt.xlabel('Time / s', fontsize=15)
plt.ylabel('Amplitude', fontsize=15)

# Time stretching changes the length, so the time axis is recomputed
plt.subplot(2, 2, 3)
plt.title("Time stretched", fontsize=15)
time = np.arange(0, len(data_stretch)) * (1.0 / fs)
plt.plot(time, data_stretch)
plt.xlabel('Time / s', fontsize=15)
plt.ylabel('Amplitude', fontsize=15)

plt.tight_layout()
plt.show()

 

Figure: audio data augmentation results

 

2.3 Fast Fourier Transform (FFT)

 
   A fast Fourier transform is applied to the signal obtained after endpoint detection (or data augmentation) to obtain its amplitude-frequency characteristics. The first 1000 points (or more) of the frequency-domain signal are extracted to represent the signal.

   From a physical point of view, the Fourier transform lets us move from the traditional analysis of a signal in the time domain to analysis in the frequency domain. A three-dimensional diagram helps to picture this change of viewpoint:

Figure: three-dimensional diagram of the time-domain and frequency-domain views of a signal

   After decomposition by the Fourier transform, the time-domain signal at the front becomes a superposition of sine waves of different frequencies. Analyzing the frequencies of these sine waves is what transforms the signal into the frequency domain. The characteristics of some signals are hard to see in the time domain but easy to see in the frequency domain, which is why so much signal analysis uses the FFT. The FFT also yields the spectrum of a signal, so it is widely used in spectrum analysis.

   A computer can only process discrete, finite-length data; the other types of Fourier transform are only usable in mathematical derivations. On a computer only the DFT can be used, and the FFT is simply a fast algorithm for computing the DFT.
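To make the "superposition of sine waves" idea concrete, the following sketch builds a signal from two sine waves and recovers their frequencies from the FFT. The 440 Hz and 1000 Hz tones, their amplitudes, and the 0.1 s length are arbitrary values chosen for this example, not data from the watermelon recordings.

```python
import numpy as np

fs = 16000                  # sampling frequency in Hz (assumed for the example)
n = 1600                    # number of samples, i.e. 0.1 s of signal
t = np.arange(n) / fs

# Time-domain signal: the sum of two sine waves
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

spectrum = np.abs(np.fft.fft(x)) / n    # two-sided amplitude spectrum
half = spectrum[: n // 2]               # the spectrum is symmetric, keep one half
freqs = np.arange(n // 2) * fs / n      # frequency of each point, in Hz

# The two largest points of the spectrum sit at the two sine frequencies
peak_bins = np.argsort(half)[-2:]
peaks = sorted(float(f) for f in freqs[peak_bins])
print(peaks)  # [440.0, 1000.0]
```

The two components are hard to tell apart in the raw waveform, but they appear as two isolated peaks in the frequency domain.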

As for the implementation, NumPy provides an FFT module, numpy.fft.
A concrete example program follows:

import os

import numpy as np
import pandas as pd
from scipy.io import wavfile
from sklearn import preprocessing


# Apply the FFT to the raw data, then standardize the data of each
# category as a whole and store it in a csv file
def myfft(sourceDir, targetDir):
    # List the files and folders in the source directory
    for file in os.listdir(sourceDir):
        # Build the full path
        sourceFile = os.path.join(sourceDir, file)
        for files in os.listdir(sourceFile):
            sourceFile1 = os.path.join(sourceFile, files)
            data = []
            for files1 in os.listdir(sourceFile1):
                # Apply the FFT to each audio file
                try:
                    rate, data1 = wavfile.read(f'{sourceDir}/{file}/{files}/{files1}')
                    xf = np.fft.fft(data1)  # fast Fourier transform
                    xff = np.abs(xf)  # modulus of the complex values (two-sided spectrum)
                    n = 2 ** 15  # n > 2 * rate, i.e. > 32000, so 2^15 = 32768 is used
                    y = xff / n  # normalization
                    y1 = y[0:(int(n / 2))]  # the spectrum is symmetric, so keep only half
                    data.append(y1[0:1000])  # keep the first 1000 data points
                except Exception as e:
                    # Skip files that cannot be read or transformed
                    print(f'skipped {files1}: {e}')
                    continue
            data = np.array(data)
            print(data.shape)
            # Standardize all of the data (z-score)
            zscore_scaler = preprocessing.StandardScaler()
            data2 = zscore_scaler.fit_transform(data)
            test = pd.DataFrame(data=data2)
            # Save as a csv file
            test.to_csv(f'{targetDir}/{file}{files}.csv', encoding='utf-8', header=False, index=False)


if __name__ == "__main__":
    # Run once per category; this produces 4 .csv files when finished
    myfft("F:/文件/文件/watermelon/watermelon_data/split_data",
          "F:/文件/文件/watermelon/make_DataSet1")

 

Figure: layout of the original folder, and the target folder with the four per-category .csv files

Figure: FFT of a single audio clip

2.4 Data set production

   There are a little over 900 original recordings, which grow to more than 60,000 clips after endpoint detection and data augmentation.

Watermelon ripeness and corresponding labels

Maturity          Label
Medium rare       0
Mature            1
Overripe          2
Partially born    3

Data set partitioning

Why divide the data set?
   Deep learning uses a large number of linear or non-linear classifiers, differentiable or non-differentiable activation functions, and pooling layers to extract the features of the observed object automatically.

Such strong classification ability brings two problems:
   In a complex network, the many weights W have long lost their statistical meaning as weights; they have no clear physical interpretation and cannot be usefully reverse-engineered.
   The network can learn a great deal, including the noise or special-case information contained in the samples.

The causes of overfitting and the ways to prevent it are therefore:
Cause: too few samples to generalize their common features, and so many parameters that extremely complex feature content can be fitted.
Remedy: increase the number of samples; in theory, the more the better.
Check: hold out some samples for verification. All available sample data is usually divided into three sets.

Training set: the sample set used for learning. Its vectors determine the specific coefficients of the network, i.e. it is used to train the model.

Validation set: the sample set used to tune the classifier. During training, the model is checked on the validation set as it goes, and we watch how it performs there: is the loss decreasing, is the accuracy improving? It is used to adjust the hyperparameters.

Test set: the part of the data set aside to test the trained model, mainly its classification ability. It is used to measure the accuracy of the model and evaluate its generalization ability.

   Since this is a small-scale data set, the data is divided into a training set, a test set, and a validation set following a 7 : 3 : 1 scheme: the data is first split into 70% for training and 30% for testing, and 10% of the training portion is then set aside as the validation set.
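The two-stage split described above can be sketched as follows. This is an illustration only, not the article's own splitting code; the function name split_indices, the seed, and the sample count of 1000 are assumptions made for the example.

```python
import random


def split_indices(n_samples, test_ratio=0.3, val_ratio=0.1, seed=42):
    """Shuffle sample indices, take test_ratio of them as the test set,
    then take val_ratio of the remaining training part as the validation set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    n_test = round(n_samples * test_ratio)
    test = indices[:n_test]
    train_full = indices[n_test:]

    n_val = round(len(train_full) * val_ratio)
    val = train_full[:n_val]
    train = train_full[n_val:]
    return train, val, test


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 630 70 300
```

With augmented audio it is worth splitting at the watermelon level first, as in the Data collection section, so that clips cut from the same watermelon do not end up in both the training and the test set.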

 

Processing summary
   After endpoint detection and data augmentation of the original audio, a fast Fourier transform is applied and the first 1000 data points of each clip are taken as its feature values. The feature values of each maturity level are then collected and stored in csv format for subsequent model training.

Read the csv data of the 4 categories and label it accordingly.
The program code is as follows:

import numpy as np
import pandas as pd


# Collect all of the data and create the labels
def makeall(file0, file1, file2, file3, name, savefile):
    eight_medium = pd.read_csv(file0, header=None)  # medium rare (about 80% ripe)
    mature = pd.read_csv(file1, header=None)  # mature
    overripe = pd.read_csv(file2, header=None)  # overripe
    underripe = pd.read_csv(file3, header=None)  # partially born (underripe)

    bfs_data = np.asarray(eight_medium)
    cs_data = np.asarray(mature)
    gs_data = np.asarray(overripe)
    ps_data = np.asarray(underripe)

    data = np.concatenate((bfs_data, cs_data, gs_data, ps_data))
    # One label per row: 0, 1, 2, 3 in the same order as the data
    bfs_label = np.zeros((bfs_data.shape[0], 1))
    cs_label = np.ones((cs_data.shape[0], 1))
    gs_label = 2 * np.ones((gs_data.shape[0], 1))
    ps_label = 3 * np.ones((ps_data.shape[0], 1))

    label = np.concatenate((bfs_label, cs_label, gs_label, ps_label))

    variable = pd.DataFrame(label)  # convert to a DataFrame
    variable.to_csv(f'{savefile}/{name}_label.csv', header=False, index=False)
    variable = pd.DataFrame(data)  # convert to a DataFrame
    variable.to_csv(f'{savefile}/{name}_data.csv', header=False, index=False)  # csv without header or index


if __name__ == "__main__":
    savefile = './makelabel/z-score_fft_alldata_0'
    name = 'train'
    savename = 'train_all'
    t0_train = f'{savefile}/{name}0.csv'
    t1_train = f'{savefile}/{name}1.csv'
    t2_train = f'{savefile}/{name}2.csv'
    t3_train = f'{savefile}/{name}3.csv'
    makeall(t0_train, t1_train, t2_train, t3_train, savename, savefile)

Figure: the two .csv files obtained after processing
 

2. Model training

1. Model design

The proposed model design is based on a simple convolutional neural network, with additions and improvements made to its structure.

TensorFlow
   TensorFlow is a deep learning framework. Whether on a server, an edge device, or the web, it makes models easy to train and deploy, and advanced models can be built and trained without sacrificing speed or performance.

Keras
   tf.keras is the high-level API of TensorFlow 2.0. It brings a new style and new design patterns to TensorFlow code, greatly improving its simplicity and reusability, and it is the officially recommended way to design and develop models.
   The model here is defined with the Keras functional model (Model). A functional model can describe very complex networks of arbitrary topology. The sequential model (Sequential) can only be built linearly by adding one layer after another, whereas the functional model lets you construct the network structure and set the relationships between layers far more flexibly.
   The functional interface is how users define complex models such as multi-output models, directed acyclic graphs, or models with shared layers. In other words, whenever your model is not a straight stack of layers like VGG, or needs more than one output, you should choose the functional model. The functional model is the most general type of model; the sequential model (Sequential) is just a special case of it.

CNN
   A convolutional neural network (CNN) is a feed-forward neural network commonly used for data with a grid-like structure, such as images or sound. CNNs excel at image and speech recognition because they extract features automatically, without manual intervention.

   The core of a CNN is the convolutional layer, which slides a small window (the convolution kernel) over the input and performs a convolution to extract local features. A convolutional layer is usually followed by a pooling layer, which reduces the size of the feature map and the amount of computation.

Convolutional networks have two major characteristics:
① There is at least one convolutional layer, used to extract features.
② The convolutional layers work through weight sharing, which greatly reduces the number of weights W, so that training converges significantly faster than a fully connected BP network reaching the same recognition rate.
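A quick count shows how much weight sharing saves. The sketch below compares a 1D convolutional layer (kernel size 3, 1 input channel, 8 filters) with a hypothetical fully connected layer that maps the same 1000-point input to an output of the same size (1000 × 8 values). The dense layer is purely a thought experiment for comparison.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Each filter has kernel_size * in_channels weights plus one bias,
    and the same weights are reused at every position of the input."""
    return kernel_size * in_channels * filters + filters


def dense_params(n_inputs, n_outputs):
    """Every output has its own weight for every input, plus one bias."""
    return n_inputs * n_outputs + n_outputs


print(conv1d_params(3, 1, 8))         # 32
print(dense_params(1000, 1000 * 8))   # 8008000
```

The convolutional layer needs the same 32 parameters no matter how long the input is, because the kernel is shared across all positions.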

The following one-dimensional convolutional neural network (1DCNN) is built on a simple convolutional neural network.
The model design is as follows.

Specific parameters for each layer of the model

Layer (type)             Kernel size   Filters   Output shape
conv1d                   3×1           8         (1000, 8)
max_pooling1d                                    (1000, 8)
batch_normalization                              (1000, 8)
conv1d_1                 3×1           16        (1000, 16)
max_pooling1d_1                                  (1000, 16)
batch_normalization_1                            (1000, 16)
conv1d_2                 3×1           32        (1000, 32)
max_pooling1d_2                                  (500, 32)
batch_normalization_2                            (500, 32)
flatten                                          16000
dropout                                          16000
dense                                  32        32
dense_2                                4         4
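The output shapes in the table follow from the padding and stride settings. With "same" padding, the output length of a 1D convolution or pooling layer is ceil(input length / stride). The sketch below walks through the layers with the strides used in the model code (1 everywhere except the last pooling layer, which uses 2); the helper name same_padding_length is made up for this example.

```python
import math


def same_padding_length(length, stride):
    """Output length of a 1D conv/pool layer with padding='same'."""
    return math.ceil(length / stride)


# (layer name, stride) in the order the layers are applied
layer_strides = [
    ("conv1d", 1), ("max_pooling1d", 1),
    ("conv1d_1", 1), ("max_pooling1d_1", 1),
    ("conv1d_2", 1), ("max_pooling1d_2", 2),
]

length = 1000
for name, stride in layer_strides:
    length = same_padding_length(length, stride)
    print(name, length)

flatten_size = length * 32   # 32 filters in the last convolutional layer
print(flatten_size)          # 16000
```

Only the last pooling layer halves the length (1000 to 500), which is why the flattened vector has 500 × 32 = 16000 values.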

The code is as follows

from tensorflow import keras
from tensorflow.keras import layers


def mymodel():
    inputs = keras.Input(shape=(1000, 1))  # 1000 frequency points, 1 channel
    h1 = layers.Conv1D(filters=8, kernel_size=3, strides=1, padding='same', activation='relu')(inputs)
    h1 = layers.MaxPool1D(pool_size=2, strides=1, padding='same')(h1)
    h1 = layers.BatchNormalization()(h1)

    h2 = layers.Conv1D(filters=16, kernel_size=3, strides=1, padding='same', activation='relu')(h1)
    h2 = layers.MaxPool1D(pool_size=2, strides=1, padding='same')(h2)
    h2 = layers.BatchNormalization()(h2)

    h2 = layers.Conv1D(filters=32, kernel_size=3, strides=1, padding='same', activation='relu')(h2)
    h2 = layers.MaxPool1D(pool_size=2, strides=2, padding='same')(h2)
    h2 = layers.BatchNormalization()(h2)

    h3 = layers.Flatten()(h2)  # flatten so the data can be fed to the fully connected layers
    h4 = layers.Dropout(0.2)(h3)  # dropout layer, drops 20% of the neurons

    h4 = layers.GaussianNoise(0.005)(h4)
    h4 = layers.Dense(32, activation='relu')(h4)  # fully connected layer with 32 outputs

    outputs = layers.Dense(4, activation='softmax')(h4)  # final fully connected layer, 4 classes
    deep_model = keras.Model(inputs, outputs, name='1DCNN')  # assemble the layers into the 1DCNN model
    return deep_model

2. Model training hyperparameter settings

   Hyperparameters are parameter values that are set before the training step of a machine learning algorithm begins. They usually cannot be learned by the algorithm itself, in contrast to the parameters that are learned during training, such as the weights w and biases b.

Batch training parameters
   When training a neural network, two parameters need to be set: the batch size and the number of epochs.
   The first is the batch size, the number of samples in one batch, i.e. the number of samples used for one update of the network. Feeding all the data into the network at once requires too much computation, so the data is generally divided into batches that are passed to the network one at a time, with the parameters updated after each batch. This has two advantages. First, all the samples in a batch jointly determine the direction of the gradient step, so the descent is less likely to go astray and randomness is reduced. Second, a batch contains far fewer samples than the whole data set, so the computation per step stays small.
   In a convolutional neural network a larger batch usually lets the network converge faster, but memory is limited, and a batch that is too large may exhaust memory or crash the program kernel. batch_size is usually chosen from [16, 32, 64, 128].
   The second is the epoch: passing all of the sample data through the network once for training is called 1 epoch.
   Suppose there are 1,000 samples and the batch size is 10, i.e. 10 samples are read per training step; one epoch then takes 100 steps.
   In this design, the number of epochs is set to 100 and the batch size to 128.
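The relation between samples, batch size, and steps can be written down directly. The sketch below reproduces the example from the text (1,000 samples, batch size 10) and also evaluates the batch size of 128 used in this design on the same hypothetical 1,000 samples. The last batch of an epoch may be smaller than the others, hence the rounding up.

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of parameter updates needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


print(steps_per_epoch(1000, 10))          # 100
print(steps_per_epoch(1000, 128))         # 8
print(100 * steps_per_epoch(1000, 128))   # 800 updates over 100 epochs
```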

Loss function
   The loss function measures the difference between the model's predictions and the true results. In machine learning it is minimized to optimize the model's parameters so that the model fits the data better. Common loss functions include mean squared error and cross-entropy. This design uses sparse categorical cross-entropy (SparseCategoricalCrossentropy) as the loss and "sparse_categorical_accuracy" as the evaluation metric.
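As a rough illustration of what a sparse categorical cross-entropy loss and the matching accuracy metric compute, here is a plain-Python sketch. "Sparse" means the true labels are integer class indices (0 to 3 here) rather than one-hot vectors. The probabilities below are made-up values, and the clipping constant eps is an assumption of this sketch.

```python
import math


def sparse_categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -log(probability assigned to the true class)."""
    losses = []
    for label, probs in zip(y_true, y_prob):
        p = min(max(probs[label], eps), 1.0 - eps)  # avoid log(0)
        losses.append(-math.log(p))
    return sum(losses) / len(losses)


def sparse_categorical_accuracy(y_true, y_prob):
    """Fraction of samples whose most probable class is the true class."""
    hits = sum(1 for label, probs in zip(y_true, y_prob)
               if probs.index(max(probs)) == label)
    return hits / len(y_true)


y_true = [0, 1, 2, 3]            # integer class labels
y_prob = [                       # made-up softmax outputs, one row per sample
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.8, 0.05, 0.05],
    [0.2, 0.2, 0.5, 0.1],
    [0.4, 0.3, 0.2, 0.1],        # wrong: class 0 is predicted instead of 3
]

loss = sparse_categorical_crossentropy(y_true, y_prob)
acc = sparse_categorical_accuracy(y_true, y_prob)
print(round(loss, 4), acc)
```

The loss is what the optimizer minimizes, while the accuracy is only reported; a confident wrong prediction (the last row) contributes far more to the loss than a correct one.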

Learning rate
   The learning rate (lr) is the size of the step by which the optimization algorithm updates the network weights. It is one of the hyperparameters that most affects performance, and adjusting it is a more effective way to control the effective capacity of the model than adjusting other hyperparameters. Choosing a good learning rate is therefore one of the key decisions when training a neural network.
   The learning rate can be constant, gradually decreasing, momentum-based, or adaptive, depending on the optimization algorithm. If it is too large, the model may fail to converge and the loss keeps oscillating; if it is too small, the model converges slowly and needs a longer training time. Typical values are [0.01, 0.001, 0.0001].
   The learning rate of this design is set to 0.001.
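A gradually decreasing learning rate can be as simple as inverse time decay, lr_t = lr_0 / (1 + decay × step). The sketch below uses the initial rate 0.001 from this design together with decay = 1e-3, the value passed to the optimizer in the training code later in the article; the function name is made up for this example.

```python
def inverse_time_decay(initial_lr, decay, step):
    """Learning rate after `step` parameter updates."""
    return initial_lr / (1.0 + decay * step)


for step in (0, 100, 1000, 10000):
    print(step, inverse_time_decay(0.001, 1e-3, step))
```

With these values the learning rate has halved after 1000 updates, so early steps are large and later steps become progressively more careful.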

Optimizer
   Once the data, model, and loss function are fixed, the mathematical model of the task is fixed, and a suitable optimizer must be chosen to optimize it.
   The optimizer base class provides methods for computing the gradients of the loss and applying them to the variables. The most commonly used optimizers are SGDM and Adam. Plain SGD converges slowly, but adding momentum speeds it up, and SGD with momentum often reaches a better optimum, i.e. the model is more accurate once it has converged. SGDM is widely used in CV, while Adam dominates NLP, RL, GANs, speech synthesis, and other fields. Adam converges quickly and is in common use; in NLP, for example, classic models such as Transformer and BERT use Adam or its variant AdamW.
   This design therefore uses Adam, an optimization algorithm with adaptive learning rates.
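To show what "adaptive" means here, the following is a minimal scalar sketch of the Adam update rule, applied to the toy function f(w) = (w - 3)^2. It is not the TensorFlow implementation; the toy function, the learning rate of 0.1, the starting point, and the step count are all assumptions chosen so that the example converges quickly.

```python
import math


def adam_minimize(grad_fn, w0, lr=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-7, steps=1000):
    """Minimize a function of one variable with the Adam update rule."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g        # running mean of the gradient
        v = beta2 * v + (1 - beta2) * g * g    # running mean of the squared gradient
        m_hat = m / (1 - beta1 ** t)           # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # step scaled per parameter
    return w


# Gradient of f(w) = (w - 3)^2 is 2 * (w - 3); the minimum is at w = 3
w = adam_minimize(lambda w: 2 * (w - 3.0), w0=0.0, lr=0.1, steps=500)
print(round(w, 3))
```

The division by the square root of v_hat is what adapts the step size: parameters with consistently large gradients take relatively smaller steps, and vice versa.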

3. Model performance evaluation indicators

   Measuring the quality of a model is a key issue in deep learning. Model evaluation judges how good the model fitted by the neural network is. The quality of a model is often hard to judge at a glance, so many evaluation indicators have been devised, and the confusion matrix is one of them. It is the most basic, most intuitive, and simplest way to compute the accuracy of a classification model: it counts how many observations the model assigns to the wrong class and to the right class, and displays the results in a table. For a binary classification model, the confusion matrix has the form shown in the figure.

Figure: confusion matrix
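The usual metrics can be read straight off a confusion matrix. The sketch below computes accuracy and per-class precision, recall, and F1-score for a four-class matrix. The counts are made-up numbers for illustration only, not results from this article.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j.
    Returns a list of (precision, recall, f1), one tuple per class."""
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                          # true class k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp     # predicted k, true class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append((precision, recall, f1))
    return metrics


def accuracy(cm):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Made-up counts: rows are true labels 0..3, columns are predicted labels 0..3
cm = [
    [50, 2, 0, 3],
    [1, 90, 4, 0],
    [0, 3, 60, 0],
    [4, 0, 0, 40],
]

print(round(accuracy(cm), 4))
for label, (p, r, f1) in enumerate(per_class_metrics(cm)):
    print(label, round(p, 4), round(r, 4), round(f1, 4))
```

The F1-scores reported later in the article are this kind of per-class value, the harmonic mean of precision and recall.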

 

4. Model training

 
Once the model design and the training hyperparameters are fixed, model training can begin.

The implementation code is as follows

"""
This code trains the model.
"""
import os

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
# os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import numpy as np
import pandas as pd
import tensorflow as tf
from matplotlib import pyplot as plt
from numpy import random
from sklearn.metrics import classification_report, confusion_matrix
from tensorflow import keras
from tensorflow.keras import layers


# Fix the random seeds so that every training run gives the same result.
# After calling seed_tensorflow, model.fit must also use shuffle=False and workers=1.
def seed_tensorflow(seed=42):
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)
    os.environ['TF_DETERMINISTIC_OPS'] = '1'


seed_tensorflow(42)




def read_csv_file(train_data_file_path, train_label_file_path):
    """
    Read the csv files and join the data with its labels.
    :param train_data_file_path: path of the data file
    :param train_label_file_path: path of the label file
    :return: the joined, shuffled data set (labels in the last column)
    """
    # Read the data from csv
    train_data = pd.read_csv(train_data_file_path, header=None)
    train_label = pd.read_csv(train_label_file_path, header=None)

    # Join the data with its labels
    dataset_train = pd.concat([train_data, train_label], axis=1)
    # Shuffle the rows
    dataset = pd.concat([dataset_train], axis=0).sample(frac=1, random_state=0).reset_index(drop=True)
    return dataset


def get_train_test(dataset, data_ndim=1):
    # Separate the data and the labels
    X_train = dataset.iloc[:, :-1]
    y_train = dataset.iloc[:, -1]

    # Shuffle the data set to improve the generalization ability of the model
    index = list(range(len(X_train)))
    random.seed(42)
    random.shuffle(index)
    X_train = np.array(X_train)[index]
    y_train = np.array(y_train)[index]

    # Reshape the data to (samples, length, channels)
    X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], data_ndim)

    print("X shape: ", X_train.shape)

    return X_train, y_train

# Save the best model
class CustomModelCheckpoint(keras.callbacks.Callback):  # callback that watches the training statistics and saves the best weights
    def __init__(self, model, path):  # custom initialization
        super().__init__()
        self.model = model
        self.path = path
        self.best_loss = np.inf  # start from +infinity so that the first epoch always counts as an improvement

    def on_epoch_end(self, epoch, logs=None):  # called at the end of every epoch; the keys in logs depend on the method
        val_loss = logs['val_loss']  # logs is a dictionary
        if val_loss < self.best_loss:
            print("\nValidation loss decreased from {} to {}, saving model".format(self.best_loss, val_loss))
            self.model.save_weights(self.path, overwrite=True)  # overwrite=True replaces the existing file; only the weights are saved, not the whole model
            self.best_loss = val_loss



def mymodel():
    inputs = keras.Input(shape=(1000, 1))  # 1000 frequency points, 1 channel
    h1 = layers.Conv1D(filters=8, kernel_size=3, strides=1, padding='same', activation='relu')(inputs)
    h1 = layers.MaxPool1D(pool_size=2, strides=1, padding='same')(h1)
    h1 = layers.BatchNormalization()(h1)

    h2 = layers.Conv1D(filters=16, kernel_size=3, strides=1, padding='same', activation='relu')(h1)
    h2 = layers.MaxPool1D(pool_size=2, strides=1, padding='same')(h2)
    h2 = layers.BatchNormalization()(h2)

    h2 = layers.Conv1D(filters=32, kernel_size=3, strides=1, padding='same', activation='relu')(h2)
    h2 = layers.MaxPool1D(pool_size=2, strides=2, padding='same')(h2)
    h2 = layers.BatchNormalization()(h2)

    h3 = layers.Flatten()(h2)  # flatten so the data can be fed to the fully connected layers
    h4 = layers.Dropout(0.2)(h3)  # dropout layer, drops 20% of the neurons

    h4 = layers.GaussianNoise(0.005)(h4)
    h4 = layers.Dense(32, activation='relu')(h4)  # fully connected layer with 32 outputs

    outputs = layers.Dense(4, activation='softmax')(h4)  # final fully connected layer, 4 classes

    deep_model = keras.Model(inputs, outputs, name='1DCNN')  # assemble the layers into the 1DCNN model

    return deep_model



def build(X_train, y_train, X_test, y_test, X_val, y_val, batch_size=128, epochs=100):
    """
    Build the network and train it.
    :param X_train: training data
    :param y_train: training labels
    :param X_test: test data
    :param y_test: test labels
    :param X_val: validation data
    :param y_val: validation labels
    :param batch_size: batch size
    :param epochs: number of epochs
    :return: None; the accuracy/loss curves and the confusion matrix are plotted
    """

    model = mymodel()
    # Equivalent of the older Adam(lr=0.001, decay=1e-3) arguments, which newer
    # TensorFlow 2.x releases reject: lr_t = 0.001 / (1 + 1e-3 * step)
    lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
        initial_learning_rate=0.001, decay_steps=1, decay_rate=1e-3)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
                  metrics=['sparse_categorical_accuracy'])

    history = model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size,
                        validation_data=(X_val, y_val),
                        workers=1,
                        callbacks=[CustomModelCheckpoint(model, r'mybestcnn_minmax_fft_4.h5')])
    keras.models.save_model(model, './mycnn.h5')
    model.summary()
    # Accuracy and loss curves of the training and validation sets
    acc = history.history['sparse_categorical_accuracy']
    val_acc = history.history['val_sparse_categorical_accuracy']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    # Evaluate the model on the test set
    scores = model.evaluate(X_test, y_test, verbose=1)
    print('%s: %.2f%%' % (model.metrics_names[1], scores[1] * 100))
    y_predict = model.predict(X_test, verbose=1)  # class probabilities
    y_pred_int = np.argmax(y_predict, axis=1)  # predicted class indices
    y_test_int = y_test.astype(int)  # labels are read from csv as floats
    print(classification_report(y_test_int, y_pred_int, digits=4))

    # Plot the accuracy curves
    plt.subplot(1, 2, 1)
    plt.plot(acc, label='Training Accuracy')
    plt.plot(val_acc, label='Validation Accuracy')
    plt.title('Training and Validation Accuracy')
    plt.legend()

    # Plot the loss curves
    plt.subplot(1, 2, 2)
    plt.plot(loss, label='Training Loss')
    plt.plot(val_loss, label='Validation Loss')
    plt.title('Training and Validation Loss')
    plt.legend()
    plt.show()

    # Plot the confusion matrix (rows: true labels, columns: predicted labels).
    # Using integer labels on both sides gives the 4x4 matrix directly.
    con_mat = confusion_matrix(y_test_int, y_pred_int)
    classes = sorted(set(y_train.astype(int)))
    plt.imshow(con_mat, cmap='Blues')
    indices = range(len(con_mat))
    plt.xticks(indices, classes)
    plt.yticks(indices, classes)
    plt.colorbar()
    plt.title('Confusion Matrix')
    plt.xlabel('guess')
    plt.ylabel('true')
    # Write each count into its cell (x is the column, y is the row)
    for first_index in range(len(con_mat)):
        for second_index in range(len(con_mat[first_index])):
            plt.text(first_index, second_index, con_mat[second_index][first_index], va='center', ha='center')
    plt.show()


if __name__ == "__main__":

    """
    Frequency-domain data set
    """
    # Training set
    x_train_csv_path = './makelabel/train_all_data.csv'
    y_train_csv_path = './makelabel/train_all_label.csv'
    dataset1 = read_csv_file(x_train_csv_path, y_train_csv_path)
    X_train, y_train = get_train_test(dataset=dataset1, data_ndim=1)
    # Test set
    x_test_csv_path = './makelabel/test_all_data.csv'
    y_test_csv_path = './makelabel/test_all_label.csv'
    dataset2 = read_csv_file(x_test_csv_path, y_test_csv_path)
    X_test, y_test = get_train_test(dataset=dataset2, data_ndim=1)
    # Validation set
    x_val_csv_path = './makelabel/val_all_data.csv'
    y_val_csv_path = './makelabel/val_all_label.csv'
    dataset3 = read_csv_file(x_val_csv_path, y_val_csv_path)
    X_val, y_val = get_train_test(dataset=dataset3, data_ndim=1)
    # Train the model
    build(X_train, y_train, X_test, y_test, X_val, y_val)

 

5. Model training results and analysis

5.1 Training results of three types of data sets

Figure: model training results

As the figure shows, validation accuracy reaches 99% on all three data sets,
with the Q data set the highest.
All three data sets exceed 98% after 20 epochs of training.

The confusion matrix obtained by testing on the test set of each data set is used to analyze how well the model detects each maturity level, as shown in the table.
The per-class recognition rates of all three data sets reach 99%, as does the average accuracy,
which indicates that the classification model is robust and detects watermelon ripeness well.

Training results per data set (F1-score)

Data set   Medium rare   Mature   Overripe   Partially born   Average
Q          0.9958        0.9972   0.9975     0.9954           0.9965
P          0.9950        0.9942   0.9948     0.9926           0.9941
T          0.9924        0.9955   0.9945     0.9925           0.9937
ALL        0.9935        0.9983   0.9973     0.9948           0.9960

 
To test how well the model recognizes ripeness when the different tapping methods are mixed, all the data is merged and the model is trained again.
The training curves are shown in the figure.
After 100 epochs, the validation accuracy of the model reaches 99.66% and the loss converges to 0.02.

Figure: training curves on the merged data set

The next figure shows the confusion matrix obtained by testing on the test set of the merged data.
It shows that the model still makes some misjudgments.
From the confusion matrix, the average accuracy of the model on the test set is 99.78%.

Figure: confusion matrix on the merged test set

5.2 Comparing model performance

   To compare performance, several classic convolutional neural network models are selected for comparative experiments.
   These models are all two-dimensional and were adapted to one dimension.
   The models selected are ResNet18, MobileNet, AlexNet, VGGNet, and GoogLeNet. All experiments use the same training set, learning rate, batch training parameters (epochs, batch size), optimizer, and loss function, and follow the same training procedure.

The recognition results of each model are shown in the table.

Comparison of model training results (per-class and average scores)

Model       Parameters    Medium rare   Mature   Overripe   Partially born   Average
1DCNN       514,516       0.9935        0.9983   0.9973     0.9948           0.9960
VGG16       41,619,780    0.9855        0.9967   0.9972     0.9855           0.9912
AlexNet     12,360,900    0.9757        0.9928   0.9913     0.9797           0.9849
GoogLeNet   3,527,476     0.9792        0.9945   0.9938     0.9803           0.9870
ResNet18    3,856,772     0.9403        0.9741   0.9730     0.9587           0.9615
MobileNet   487,711       0.8765        0.9610   0.9607     0.9344           0.9332

 

Finally,
if there are any errors in the article, please feel free to point them out in the comments or by private message.
I am still a beginner and still learning, so please bear with me~


Origin blog.csdn.net/tenju/article/details/131980298