Using ChatGPT to quickly implement business scenarios

/*
Prompt:
I need to classify a piece of audio: given an audio clip as input, the output is one of 1. Laughter, 2. Applause, 3. Unknown. Please write the program in TensorFlow 2.0, use a convolutional neural network model, use MFCC features for recognition, and use softmax as the evaluation function. I have only two kinds of audio data: one is laughter and the other is applause. Please give the program and the dataset directory structure.
Give the complete code, including the following parts:
1. Load and preprocess data
2. Build and compile the model
3. Train the model and save the model file
4. Load the model file
5. Use the model for prediction
*/

OK. Based on your requirements, here is a Python program written with TensorFlow 2.0 that classifies audio. It uses a convolutional neural network model on MFCC features. The program is as follows:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import ModelCheckpoint
import librosa
import math
import os
import numpy as np

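# Class labels (the dataset described in the prompt has folders only for the first two)
classes = ['laugh', 'applause', 'unknown']

# MFCC parameters
num_mfcc = 20
n_fft = 2048
hop_length = 512
num_segments = 5

# Length of audio, in seconds, taken from each file and split into num_segments segments
segment_duration = 1

# Sampling rate in Hz
sample_rate = 44100

# Dataset directories
train_dir = './data/train'
val_dir = './data/val'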

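# Load and preprocess the data
def load_data(data_dir):
  # Initialise the data and label lists
  data = []
  labels = []

  # Process the folder of each class
  for i, cls in enumerate(classes):
    cls_dir = os.path.join(data_dir, cls)
    # 'unknown' has no folder in the dataset described in the prompt, so skip missing folders
    if not os.path.isdir(cls_dir):
      continue
    for audio_file in os.listdir(cls_dir):
      file_path = os.path.join(cls_dir, audio_file)

      # Load the audio file, resampled to sample_rate
      signal, sr = librosa.load(file_path, sr=sample_rate)

      # Split the audio into segments
      # With the defaults: 44100 * 1 / 5 = 8820 samples and ceil(8820 / 512) = 18 MFCC frames per segment
      samples_per_segment = int(segment_duration * sample_rate / num_segments)
      num_mfcc_vectors_per_segment = math.ceil(samples_per_segment / hop_length)

      for j in range(num_segments):
        start_sample = samples_per_segment * j
        end_sample = start_sample + samples_per_segment

        # Recent librosa releases require the signal to be passed by keyword (y=...)
        mfcc = librosa.feature.mfcc(y=signal[start_sample:end_sample], sr=sr,
                                     n_fft=n_fft, n_mfcc=num_mfcc,
                                     hop_length=hop_length)
        mfcc = mfcc.T

        # Keep only full-length segments so every sample has the same shape
        if len(mfcc) == num_mfcc_vectors_per_segment:
          data.append(mfcc.tolist())
          labels.append(i)

  # Convert the label list to a one-hot matrix
  labels = to_categorical(labels, num_classes=len(classes))

  return np.array(data), np.array(labels)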

# Load and preprocess the training data
train_data, train_labels = load_data(train_dir)

# Load and preprocess the validation data
val_data, val_labels = load_data(val_dir)

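# Build and compile the model
model = Sequential()
# The MFCC arrays are 2-D (frames, coefficients); Conv2D also needs a channel axis
model.add(tf.keras.Input(shape=train_data.shape[1:]))
model.add(tf.keras.layers.Reshape(train_data.shape[1:] + (1,)))
# padding='same' keeps the small feature map from shrinking to zero before the last pooling layer
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(classes), activation='softmax'))

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])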

# Train the model and save the model file (only when validation accuracy improves)
checkpoint = ModelCheckpoint('model.h5', monitor='val_accuracy', save_best_only=True, verbose=1)
history = model.fit(train_data, train_labels, epochs=50, batch_size=16,
                    validation_data=(val_data, val_labels),
                    callbacks=[checkpoint])

# Load the saved model file
model = tf.keras.models.load_model('model.h5')

# Use the model for prediction
def predict(filename):
  signal, sr = librosa.load(filename, sr=sample_rate)

  # Split the audio into segments
  samples_per_segment = int(segment_duration * sample_rate / num_segments)
  num_mfcc_vectors_per_segment = math.ceil(samples_per_segment / hop_length)

  segments = []
  for j in range(num_segments):
    start_sample = samples_per_segment * j
    end_sample = start_sample + samples_per_segment

    # Recent librosa releases require the signal to be passed by keyword (y=...)
    mfcc = librosa.feature.mfcc(y=signal[start_sample:end_sample], sr=sr,
                                 n_fft=n_fft, n_mfcc=num_mfcc,
                                 hop_length=hop_length)
    mfcc = mfcc.T

    if len(mfcc) == num_mfcc_vectors_per_segment:
      segments.append(mfcc.tolist())

  # No full-length segment (for example, a clip shorter than segment_duration)
  if not segments:
    return 'unknown'

  # Convert to a NumPy array
  segments = np.array(segments)

  # Predict each segment, then average the class probabilities over all segments
  predictions = model.predict(segments)
  mean_probabilities = predictions.mean(axis=0)
  return classes[int(np.argmax(mean_probabilities))]

The above code includes the following parts:

  1. Load and preprocess data function: This function loads an audio dataset and preprocesses it by splitting each file into segments and extracting MFCC features from every segment.
  2. Model definition and compilation: This section defines the convolutional neural network model, whose final layer uses softmax to output class probabilities, and compiles it with the Adam optimizer and categorical cross-entropy loss.
  3. Model training and saving: This section trains the model and, through the ModelCheckpoint callback, saves the model file whenever the validation accuracy improves at the end of an epoch, so model.h5 holds the best model.
  4. Model loading: This section loads the best model file that was saved.
  5. predict function: This function accepts the path of an audio file and uses the loaded model to predict its category.
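
The prompt also asks for an Unknown result, but the dataset contains only laughter and applause, so a softmax model trained on it never sees an example of that class. One common workaround, sketched here as an assumption and not as part of ChatGPT's answer, is to report unknown when the averaged softmax confidence falls below a cut-off. The function name classify_with_threshold and the 0.8 threshold are hypothetical; the threshold would need tuning on held-out audio.

```python
def classify_with_threshold(segment_probs, class_names, threshold=0.8):
    """Average per-segment softmax outputs and fall back to 'unknown'.

    segment_probs: list of per-segment probability lists, for example
                   model.predict(segments).tolist()
    class_names:   class names in the same order as the model outputs
    threshold:     hypothetical cut-off, to be tuned on held-out audio
    """
    if not segment_probs:
        return 'unknown'  # no usable segment was extracted

    n_classes = len(class_names)
    # Mean probability of each class across all segments.
    mean_probs = [
        sum(seg[k] for seg in segment_probs) / len(segment_probs)
        for k in range(n_classes)
    ]
    # Index of the most probable class.
    best = max(range(n_classes), key=lambda k: mean_probs[k])
    if mean_probs[best] < threshold:
        return 'unknown'  # the model is not confident enough
    return class_names[best]


# Confident segments keep their label; ambiguous ones become 'unknown'.
print(classify_with_threshold([[0.95, 0.05], [0.90, 0.10]], ['laugh', 'applause']))
print(classify_with_threshold([[0.55, 0.45], [0.40, 0.60]], ['laugh', 'applause']))
```

Inside predict, the return value could then be classify_with_threshold(model.predict(segments).tolist(), classes). A softmax confidence is only a rough signal for unfamiliar sounds, so collecting real examples of other audio for the unknown class is likely to work better.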

The directory structure of the dataset should look like this:

  • data
    • train
      • laugh
        • audio1.wav
        • audio2.wav
        • ...
      • applause
        • audio1.wav
        • audio2.wav
        • ...
    • val
      • laugh
        • audio1.wav
        • audio2.wav
        • ...
      • applause
        • audio1.wav
        • audio2.wav
        • ...

where laugh and applause represent the two audio categories, and train and val represent the training set and validation set, respectively. Each category folder contains multiple audio files.
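
Before training, it can help to confirm that the folders on disk match this layout. The following is a minimal sketch; the function name count_audio_files and the .wav-only filter are assumptions made for illustration and are not part of the generated program.

```python
import os


def count_audio_files(data_dir, class_names, extensions=('.wav',)):
    """Return {split: {class_name: number_of_audio_files}} for train and val."""
    counts = {}
    for split in ('train', 'val'):
        counts[split] = {}
        for cls in class_names:
            cls_dir = os.path.join(data_dir, split, cls)
            if not os.path.isdir(cls_dir):
                # A missing folder is reported as zero files instead of raising.
                counts[split][cls] = 0
                continue
            # Count only files whose name ends with one of the extensions.
            counts[split][cls] = sum(
                1 for name in os.listdir(cls_dir)
                if name.lower().endswith(extensions)
            )
    return counts


# Example: inspect the layout under ./data for the two recorded classes.
print(count_audio_files('./data', ['laugh', 'applause']))
```

A count of zero for any class or split points to a missing or misnamed folder, which is worth fixing before calling load_data.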

Within a few seconds, ChatGPT produced the working code, an explanation of the code, and the dataset layout. What do you think after seeing this process? And it can do far more than this!

Origin blog.csdn.net/huapeng_guo/article/details/130638295