Project Interpretation_v2

1. Project introduction

  1. Taking task2-1 as an example, confirm that running process.py calls the function preprocess_ast_wav2vec(wav, fr).

1.1 Mission introduction

This is the first open-source pediatric breath sound dataset; it was annotated by 11 invited doctors;

The sampling frequency and quantization resolution of the digital stethoscope are 8 kHz and 16 bits respectively.

Breath sounds are weaker in pediatric participants than in adults. In addition, when collected from the chest, breath sounds are strongly affected by heart sounds. Therefore, respiratory sounds were acquired at four dorsal locations: left posterior, left lateral, right posterior, and right lateral (Fig. 4). The recording at each location lasted over 9 seconds to ensure at least two respiratory cycles.


292 participants, 8.2 hours in total.

  • At record level there are 2683 recording files in total, in which 9089 breath sound events are annotated (for comparison, ICBHI 2017 has 920 recording files);

  • Each recording is annotated at event level for task 1 and at record level for task 2;

There are two major categories of tasks, as follows:

# Important Assumption (used in model/metric.py)
# Normal is always index 0
# PQ, if exists, is index 1

def resp_classes(task, level):
    assert task in (1,2), 'Task has to be either 1 or 2.'
    assert level in (1,2), 'Level has to be either 1 or 2.'
    if task==1:
        if level==1:
            CLASSES = ('Normal', 'Adventitious')  # 2 class
        elif level==2:          # 7 class
            CLASSES = ('Normal', 'Rhonchi', 'Wheeze', 'Stridor', 'Coarse Crackle', 'Fine Crackle', 'Wheeze & Crackle') 
    elif task==2:
        if level==1:   # 3 class;
            CLASSES = ('Normal', 'Poor Quality', 'Adventitious')
        elif level==2:    # 5 class;
            CLASSES = ('Normal', 'Poor Quality', 'CAS', 'DAS', 'CAS & DAS')
    return CLASSES

The mean durations of respiratory sound events and records are 1.3 s and 11 s, respectively.

For task 1 (event-level audio), the training set contains 6656 audio clips in total;

task1-1: binary classification. Normal: 5159, Adventitious: 1497. The Adventitious samples are randomly oversampled until they match the number of Normal samples;

task1-2: 7-class classification. The numbers of Normal, Rhonchi, Wheeze, Stridor, Coarse Crackle, Fine Crackle, and Wheeze & Crackle events are 6,887, 53, 865, 17, 66, 1,167, and 34, respectively. Note that these counts sum to 9089, so they cover all annotated events (training and test), not only the 6656 training events;

For task 2 (record-level audio), the training set contains 1949 recordings in total;

task2-1: 3-class classification. Normal: 1303, Adventitious: 469, Poor Quality: 177. The samples of the abnormal class are randomly oversampled until they match the number of Normal samples;

task2-2: 5-class classification. Normal: 1303, Poor Quality: 177, CAS: 126, DAS: 248, CAS & DAS: 95.
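
The random expansion of the under-represented class described above could be sketched as follows. This is a minimal illustration, not the project's own code: the function name `oversample_to_majority`, the fixed seed, and the `(item, label)` list layout are assumptions, and the sketch balances every class against the largest one.

```python
import random
from collections import defaultdict


def oversample_to_majority(samples, seed=0):
    """Randomly duplicate samples until every class matches the largest class.

    `samples` is a list of (item, label) pairs; this layout is hypothetical.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))

    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw the missing samples with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Toy data: 10 Normal and 3 Adventitious samples.
toy = [(f"n{i}", "Normal") for i in range(10)] + [(f"a{i}", "Adventitious") for i in range(3)]
balanced = oversample_to_majority(toy)
counts = {name: sum(1 for _, lab in balanced if lab == name) for name in ("Normal", "Adventitious")}
print(counts)  # {'Normal': 10, 'Adventitious': 10}
```

Sampling with replacement keeps every original sample and only adds duplicates, so no recorded data is discarded.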

[Figure: ICBHI dataset (image unavailable)]

task1, event-level classification:

Training set: 6656 audio events;

Test set: 2433 audio events;

task2, record-level classification:

Training set: 1949 recordings (note that this is reduced to 1772 recordings by the subsequent task2 filtering);

Test set: 734 recordings;

1.2 Data preprocessing

preprocess.py performs the data preprocessing; see Section 9 for the detailed analysis;

The input_dir option under data_loader in task_config.json selects the preprocessed data folder: processed_wav2vec for task1 or processed_ast_wav2vec for task2;

Depending on the task, the preprocess() function calls a different preprocessing function, processed_wav2vec() or processed_ast_wav2vec().

1.3 Dataset creation

Create a subclass of Dataset to build the dataset;

__getitem__() produces one training sample together with its label;

Note that the training sample here can be the raw audio data;

Alternatively, it can be precomputed features that are fed directly into the network for training;

Applying data augmentation inside __getitem__() lets every batch be augmented differently, because the augmentation is re-drawn each time a sample is fetched.

# location: data/SPRSound/Dataset.py
from os.path import join

import pandas as pd
import torch
import torchaudio
from torch.utils.data import Dataset

# RespDataLoader instantiates this class, RespDataset().
class RespDataset(Dataset):
    def __init__(self, data_dir, task, input_dir=None):
        assert task in (1, 2)
        self.task = task
        task_file_name = 'task1.csv' if task == 1 else 'task2_filtered.csv'
        # task_file_name = f'task{task}.csv'
        self.csv = pd.read_csv(join(data_dir, task_file_name))
        self.input_dir = input_dir
        if input_dir is None:       # note: use the originally split audio files
            if task == 1:           # no input_dir given: 'clip' holds the event-level audio of task1
                self.dir = join(data_dir, 'clip')
            else:                   # task2: 'wav' holds the record-level audio
                self.dir = join(data_dir, 'wav')
        else:                       # note: a custom (preprocessed) folder
            self.dir = join(data_dir, input_dir)

    def __len__(self):
        return len(self.csv)

    def __getitem__(self, index):   # returns the audio (or features) and the corresponding label
        entry = self.csv.iloc[index]
        wav_name = entry['wav_name']
        target = (entry[f'label_{self.task}1'], entry[f'label_{self.task}2'])
        if self.input_dir is None:
            wav, _ = torchaudio.load(join(self.dir, wav_name))
        else:
            wav = torch.load(join(self.dir, wav_name), map_location='cpu')
            # # normalize
            # wav = (wav-37.3)/(2.3*2)
        return wav, target
        
  
    

1.4 Project process

train.py drives the entire project execution flow;

The order is:

  1. Instantiate the data loaders: training set and validation set;
  2. Instantiate the model;
  3. Set the loss function and evaluation metrics;
  4. Configure the learnable parameters, the optimizer and the learning-rate scheduler;
  5. Instantiate the training class;
  6. Call the train() function of the training class to start training;

2. Instantiation of the DataLoader

The training set loader data_loader and the validation set loader valid_data_loader are created by the following two calls, respectively;

data_loader = config.init_obj('data_loader', module_data)
valid_data_loader =  data_loader.split_validation()

2.0 Inheritance relationship between the three classes

RespDataLoader(BaseDataLoader) inherits from BaseDataLoader(DataLoader),

BaseDataLoader(DataLoader) inherits from PyTorch's DataLoader().

2.1 class BaseDataLoader()

Note: when the subclass RespDataLoader() below calls super().__init__(), it runs the initializer of this parent class, BaseDataLoader(). Note in particular that the arguments passed to super().__init__() include the custom collate_fn() function.

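# location: base/base_data_loader.py
from torch.utils.data import DataLoader
from torch.utils.data.dataloader import default_collate

# Splits the dataset passed in by RespDataLoader into a training set and a validation set.
class BaseDataLoader(DataLoader):
    def __init__(self, dataset, bt, shuffle, validation_split, num_workers, collate_fn=default_collate):
        # Store the training/validation split ratio.
        self.validation_split = validation_split

        # Get the sample indices of the training set and of the validation set.
        self.sampler, self.valid_sampler = self._split_sampler(self.validation_split)

        # Note: these arguments are passed in again by the subclass RespDataLoader;
        # in particular, collate_fn is overridden there.
        self.init_kwargs = {
            'dataset': dataset,
            'batch_size': bt,
            'shuffle': shuffle,  # DataLoader rejects shuffle=True together with a sampler
            'collate_fn': collate_fn,
            'num_workers': num_workers,
        }
        # Initialize the PyTorch DataLoader with the training sampler.
        super().__init__(sampler=self.sampler, **self.init_kwargs)

    def _split_sampler(self, split):
        # Split the whole dataset into a training part and a validation part,
        # and return the samplers (sample indices) of each part.
        ...

    def split_validation(self):
        # Build the validation loader by passing the stored validation sampler to DataLoader().
        return DataLoader(sampler=self.valid_sampler, **self.init_kwargs)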
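
The index split performed by _split_sampler() can be pictured with the pure-Python sketch below. It is an illustration under assumptions, not the project's implementation: the helper name `split_indices`, the fixed seed, the example split value 0.1, and treating validation_split as a fraction are all hypothetical choices.

```python
import random


def split_indices(n_samples, validation_split, seed=0):
    """Shuffle 0..n_samples-1 and cut the result into (train_idx, valid_idx)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_valid = int(n_samples * validation_split)
    valid_idx = indices[:n_valid]
    train_idx = indices[n_valid:]
    return train_idx, valid_idx


# 1772 is the number of task2 training recordings after filtering.
train_idx, valid_idx = split_indices(n_samples=1772, validation_split=0.1)
print(len(train_idx), len(valid_idx))  # 1595 177
```

In a PyTorch loader, each index list would then typically back a sampler such as torch.utils.data.SubsetRandomSampler, so that the training and validation loaders draw from disjoint parts of the same dataset.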
	

2.2 class RespDataLoader()

# location: data_loader/data_loaders.py

def resp_classes(task, level):
    # Return the class labels that belong to the given task and level.
    ...


from data.SPRSound import Datasets

class RespDataLoader(BaseDataLoader):
    def __init__(self, ...):
        # Initialize the class-label attributes of the current task.
        dataset = Datasets.RespDataset(data_dir, task=task, input_dir=input_dir)
        # Initialize the parent class BaseDataLoader with the attributes of this class,
        # i.e. run the parent's __init__() with these arguments.
        super().__init__(dataset, bt, shuffle, validation_split, num_workers, collate_fn=self.collate_fn)

    def collate_fn(self, batch):
        tensors, targets = [], []
        # Collect the tensors of one batch and their labels.
        # To be clarified: is each tensor here a feature-level tensor that is fed
        # directly into the network, or does it still hold raw audio data?
        return tensors, targets
                  

2.3 Instantiation of the training data_loader:

In data_loader = config.init_obj('data_loader', module_data), the 'data_loader' key refers to the class RespDataLoader specified in the JSON configuration file. Instantiating this class initializes its parent classes one by one, ending with the PyTorch base class DataLoader(). The process is as follows:

  • data_loader = config.init_obj('data_loader', module_data)

  • --> RespDataLoader(BaseDataLoader), which performs two steps:

  1. Build the full dataset of the current task, dataset = Datasets.RespDataset();
  2. Obtain the sample indices of the training set and validation set by initializing its parent class. Specifically, super().__init__(dataset, bt, shuffle, validation_split, num_workers, collate_fn=self.collate_fn) passes these arguments to the parent class BaseDataLoader() and enters its initializer;

  • --> BaseDataLoader(DataLoader), whose initialization has three steps:

  1. self.sampler, self.valid_sampler = self._split_sampler(self.validation_split) generates the sample indices of the training set and the validation set;

  2. super().__init__(sampler=self.sampler, **self.init_kwargs) initializes the parent class DataLoader(), where **self.init_kwargs contains the custom collate_fn method passed in by the subclass;

  3. Because the training-set indices self.sampler and the collate_fn function were passed to DataLoader() in the previous step, the resulting loader iterates over the training set only;

With DataLoader() covered, the next topic is the collate_fn function.

Batch processing function collate_fn

The batch processing function collate_fn is responsible for processing the samples of each sampled batch. The default collate_fn performs the following operations:

  • Add a new dimension as the batch dimension;
  • Automatically convert NumPy arrays and Python numbers into PyTorch tensors;
  • Keep the original data structure; for example, if the input is a dictionary, the output is a dictionary with the same keys whose values are replaced by batched tensors (where possible).

For example, if each sample is a 3-channel image and an integer class label, i.e. (image, class_index), the default collate_fn converts a list of such tuples into a single tuple of a batched image tensor and a batched class-label tensor.

We can also pass in a hand-written collate_fn to process the data in a custom way, for example to pad variable-length samples to a common length.
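
A hand-written collate function that pads samples could look like the sketch below. It is only an illustration: plain Python lists stand in for tensors, and the name `pad_collate` and the returned `lengths` list are assumptions, not part of the project.

```python
def pad_collate(batch, pad_value=0.0):
    """Pad variable-length sequences to the longest sequence in the batch.

    `batch` is a list of (sequence, label) pairs, as a Dataset would return them.
    """
    sequences, labels = zip(*batch)
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    return padded, list(labels), lengths


batch = [([0.1, 0.2, 0.3], 0), ([0.5], 2), ([0.7, 0.8], 1)]
padded, labels, lengths = pad_collate(batch)
print(padded)   # [[0.1, 0.2, 0.3], [0.5, 0.0, 0.0], [0.7, 0.8, 0.0]]
print(lengths)  # [3, 1, 2]
```

With PyTorch, a function of this shape would be passed as the collate_fn argument of DataLoader and would return stacked tensors instead of nested lists.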

Reference reading: https://transformers.run/intro/2021-12-14-transformers-note-3/#dataloaders

2.4 Instantiation of valid_dataLoader:

valid_data_loader =  data_loader.split_validation()

This calls the split_validation() method of BaseDataLoader();

Inside this method, the sample indices of the validation set are passed in as the sampler, and the collate_fn() function is passed in through **self.init_kwargs;

The validation loader is then obtained by calling PyTorch's DataLoader(): DataLoader(sampler=self.valid_sampler, **self.init_kwargs).

3. Load the model

model = config.init_obj('arch', module_arch)

The keyword arch is used to look up the model architecture in the JSON configuration file, together with:

  1. The number of classes of the current task;

  2. The shape of the model input;

Then getattr(module, module_name)(*args, **module_args) calls the initializer of the selected model:

class ASTModel(nn.Module):
    def __init__(self, ...):
        # initialize the model
        ...
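
The config-driven instantiation pattern behind config.init_obj can be sketched in a few lines. This is a simplified stand-in written for illustration, not the project's actual helper; the standalone function `init_obj`, the config layout with "type" and "args" keys, and the use of `collections.Counter` in place of a model class are assumptions.

```python
import collections


def init_obj(config, name, module, *args, **kwargs):
    """Look up config[name]['type'] in `module` and call it with the configured args."""
    module_name = config[name]["type"]
    module_args = dict(config[name]["args"])
    # Keyword arguments given at the call site are merged into the configured ones.
    module_args.update(kwargs)
    return getattr(module, module_name)(*args, **module_args)


# Hypothetical config entry; collections.Counter stands in for a model class.
config = {"arch": {"type": "Counter", "args": {"wheeze": 2}}}
model = init_obj(config, "arch", collections)
print(type(model).__name__, model["wheeze"])  # Counter 2
```

The benefit of this design is that swapping the architecture only requires editing the "type" and "args" entries of the JSON file, not the training script.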

3.1 Light CNN

3.2 Pre-trained ResNet18

3.3 Pre-trained AST Model

The pre-trained Audio Spectrogram Transformer (AST) model;

AST has demonstrated its performance on audio classification on AudioSet, an audio classification dataset of 10-second YouTube video clips [23].

In this project, AST is expected to learn better breath sound features for audio classification than image-based classifiers.

4. Setting of loss function and evaluation index

The loss function and evaluation metrics for the current task are also set in the JSON file;

    "loss": {
        "type": "cross_entropy",
        "args": {
            "weight": [0.2, 0.5, 0.3]
        }
    },
    "metrics": [
        "accuracy", "specificity", "sensitivity_task2", "score_task2"
    ],
# The metrics cover four aspects: accuracy, specificity, sensitivity, and score.
criterion = config.init_ftn('loss', module_loss, device=device)
metrics = [getattr(module_metric, met) for met in config['metrics']]
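
To make the effect of the "weight" entry concrete, the sketch below computes a class-weighted cross-entropy by hand. It assumes that the project's cross_entropy loss follows the usual weighted-mean convention (each sample's loss is scaled by the weight of its target class and the sum is divided by the summed weights); the function name `weighted_cross_entropy` and the example logits are made up for illustration.

```python
import math


def weighted_cross_entropy(logits_batch, targets, weight):
    """Weighted mean of -log softmax(logits)[target], normalised by the summed weights."""
    total, weight_sum = 0.0, 0.0
    for logits, target in zip(logits_batch, targets):
        # Numerically stable log-sum-exp of the logits.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        nll = log_sum_exp - logits[target]
        total += weight[target] * nll
        weight_sum += weight[target]
    return total / weight_sum


weight = [0.2, 0.5, 0.3]  # one weight per class, as in the JSON config
logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
targets = [0, 1]
loss = weighted_cross_entropy(logits, targets, weight)
print(round(loss, 4))
```

A larger weight makes mistakes on that class cost more, which is one common way to counter class imbalance without resampling.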

5. Optimizer and learning rate configuration

Collect the learnable parameters, then build the optimizer and the learning-rate scheduler;

trainable_params = filter(lambda p: p.requires_grad, model.parameters())

# The optimizer config holds the optimizer type, learning rate, learnable parameters, etc.
optimizer = config.init_obj('optimizer', torch.optim, trainable_params)
lr_scheduler = config.init_obj('lr_scheduler', torch.optim.lr_scheduler, optimizer)

Similarly, the parameters of the optimizer and the learning-rate scheduler are read from the config;

    "optimizer": {
        "type": "Adam",
        "args": {
            "lr": 0.0001,
            "weight_decay": 0,
            "amsgrad": true
        }
    },
    "lr_scheduler": {
        "type": "StepLR",
        "args": {
            "step_size": 50,
            "gamma": 0.1
        }
    },
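
With the values configured above (lr 0.0001, step_size 50, gamma 0.1), a step schedule multiplies the learning rate by gamma once every step_size epochs. The closed-form sketch below shows the resulting values; the helper name `step_lr` is hypothetical and the function only mirrors the decay rule, it does not touch an optimizer.

```python
def step_lr(base_lr, epoch, step_size=50, gamma=0.1):
    """Learning rate in effect after `epoch` completed epochs under step decay."""
    return base_lr * gamma ** (epoch // step_size)


for epoch in (0, 49, 50, 100):
    print(epoch, step_lr(1e-4, epoch))
```

So the rate stays at 1e-4 for epochs 0 to 49, drops to about 1e-5 at epoch 50, and to about 1e-6 at epoch 100.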

6. Instantiate the training class

Inheritance relationship of the training class:

Trainer() inherits from the parent class BaseTrainer(), and BaseTrainer() is the base class;

  • trainer = Trainer(...): the training class is instantiated by passing in the model, the loss function, the metrics, the optimizer, and the training and validation loaders;

# instantiate the training class
trainer = Trainer(model, criterion, metrics, optimizer,
                  config=config, device=device,
                  data_loader=data_loader,
                  valid_data_loader=valid_data_loader,
                  lr_scheduler=lr_scheduler)
 				

6.1 class BaseTrainer()

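# current location: base/base_trainer.py

from logger import TensorboardWriter

class BaseTrainer:
    def __init__(self, ...):
        # Initialize the following attributes: model, loss function, evaluation metrics,
        # optimizer, and number of epochs;
        # a monitor that tracks model performance and keeps the best model (judged by the minimum val loss);
        # the visualization (TensorBoard) writer.
        ...

    def _train_epoch(self, ...):
        # Overridden by the subclass; called by train() below.
        ...

    def train(self):
        # Called after the subclass Trainer() has been instantiated;
        # this is the entry point of training.
        # It calls _train_epoch() above,
        # monitors model performance and saves the weights when the monitored metric changes for the better,
        # and calls _save_checkpoint() below to save the training state.
        ...

    def _save_checkpoint(self, ...):
        # Save the training state: model weights (state dict), current epoch, optimizer state.
        ...

    def _resume_checkpoint(self, ...):
        # Load the model from a saved training state and continue training.
        ...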
        
    

6.2 class Trainer()

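Trainer() inherits from the parent class BaseTrainer()

# current location: trainer/trainer.py

from base import BaseTrainer

class Trainer(BaseTrainer):
    def __init__(self, ...):
        # Store the training loader, the validation loader and the model,
        # and the evaluation metrics of the current task.
        # Pass the arguments on to the initializer of the parent class BaseTrainer.
        super().__init__(model, criterion, metric_ftns, optimizer, config)

    def _train_epoch(self, ...):
        # Overrides _train_epoch() of the parent class.
        # This is the main body of network training; the whole training step happens here,
        # and the results of the current epoch are stored in the log.
        for bt_idx, (data, target) in enumerate(self.data_loader):
            ...

    def _valid_epoch(self, ...):
        # Called from _train_epoch() at the end of each training epoch
        # to compute the validation accuracy of the current epoch.
        ...

    def _progress(self, ...):
        # Called from _train_epoch(); prints progress information
        # every self.log_step batches within the current epoch.
        ...

    def _createConfusionMatrix(self, ...):
        # Builds the confusion matrix and saves it as a heat map.
        # No caller has been found so far.
        ...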
    
        

6.3 Training process

Section 7 below expands on the training process.

trainer.train()

Because Trainer inherits from BaseTrainer, the train() function in trainer.train() comes from the parent class;

So trainer.train() actually runs BaseTrainer.train();

Calling process:

  1. trainer.train() --> BaseTrainer.train()

  2. BaseTrainer.train() calls --> self._train_epoch(), which is overridden and implemented in the subclass Trainer();

  3. _train_epoch() iterates over --> self.data_loader, which triggers the loading of each batch in data_loader;
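
The calling process above is an instance of the template-method pattern: the base class owns the epoch loop, the subclass supplies one epoch. The sketch below reproduces only that structure. The class names `BaseTrainerSketch` and `TrainerSketch`, the canned validation losses, and the `best_epoch` bookkeeping are assumptions made for illustration, not the project's code.

```python
class BaseTrainerSketch:
    """Owns the epoch loop and best-model tracking; subclasses supply _train_epoch()."""

    def __init__(self, epochs):
        self.epochs = epochs
        self.best_val_loss = float("inf")
        self.best_epoch = None

    def _train_epoch(self, epoch):
        raise NotImplementedError

    def train(self):
        for epoch in range(1, self.epochs + 1):
            log = self._train_epoch(epoch)  # resolved on the subclass at run time
            if log["val_loss"] < self.best_val_loss:
                self.best_val_loss = log["val_loss"]
                self.best_epoch = epoch  # a real trainer would save a checkpoint here


class TrainerSketch(BaseTrainerSketch):
    def __init__(self, val_losses):
        super().__init__(epochs=len(val_losses))
        self.val_losses = val_losses  # canned numbers standing in for real validation

    def _train_epoch(self, epoch):
        return {"epoch": epoch, "val_loss": self.val_losses[epoch - 1]}


trainer = TrainerSketch([0.9, 0.6, 0.7])
trainer.train()  # defined only on the base class
print(trainer.best_epoch, trainer.best_val_loss)  # 2 0.6
```

Calling train() on the subclass instance works because method lookup falls back to the base class, while self._train_epoch() inside it still dispatches to the subclass override.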

7. Training process

7.1 Overview of training process

The training process is analyzed according to the following steps:

  1. Fetch a batch of data from the data loader;
  2. Reset the gradients of the parameters held by the optimizer to zero;
  3. Feed the data into the model for a forward pass to obtain the predictions;
  4. Pass the predictions and labels to the loss function to compute the loss;
  5. Backpropagate the loss;
  6. Update the parameters with the optimizer;
  7. Update the custom evaluation metrics;
  8. Record the above performance information to TensorBoard and the logger;
  9. After the training part of the current epoch is finished, run validation on the validation set by calling the validation function;
  10. Print information and save weights;

Every time a batch is fetched from self.data_loader, the custom method RespDataLoader().collate_fn() is eventually called;

This function packs the retrieved audio and the corresponding labels into batched tensors and returns them.

The validation loader valid_data_loader is created from data_loader through split_validation() and shares the same dataset and collate_fn, so only data_loader is analyzed here as an example;

for idx, (data, target) in enumerate(self.data_loader):
    data, target = data.to(self.device), target.to(self.device)


Retrieving data comes first: the __iter__() magic method of DataLoader() is executed;

Then the calls continue one by one until the __getitem__() method of the Dataset() subclass is reached to retrieve the data;


# When enumerate() is applied to data_loader:
# 1. the iterator method __iter__(self) of the DataLoader class is called automatically,
# and it returns an iterator object.

# We quote '_BaseDataLoaderIter' since it isn't defined yet and the definition can't be moved up
# since '_BaseDataLoaderIter' references 'DataLoader'.
def __iter__(self) -> '_BaseDataLoaderIter':
    # When using a single worker the returned iterator should be
    # created everytime to avoid reseting its state
    # However, in the case of a multiple workers iterator
    # the iterator is only created once in the lifetime of the
    # DataLoader object so that workers can be reused
    if self.persistent_workers and self.num_workers > 0:
        if self._iterator is None:
            self._iterator = self._get_iterator()
        else:
            self._iterator._reset(self)
        return self._iterator
    else:
        return self._get_iterator()

self._get_iterator(): depending on whether multiple worker processes are used, it returns either the single-process or the multi-process data loader iterator;

    def _get_iterator(self) -> '_BaseDataLoaderIter':
        if self.num_workers == 0:
            return _SingleProcessDataLoaderIter(self)
        else:
            self.check_worker_number_rationality()
            return _MultiProcessingDataLoaderIter(self)

7.2 Training - process of obtaining data:

The training loader data_loader is an instance of RespDataLoader, which inherits from BaseDataLoader() and, through it, from DataLoader();

Each time a batch is taken from self.data_loader, the following calls occur:

  1. The magic method _BaseDataLoaderIter(object).__next__() of the private iterator class is called, which in turn calls

    --> self._next_data()

That is, the __next__() magic method calls self._next_data();

_BaseDataLoaderIter(object) itself does not implement the private method _next_data();

It is implemented in the subclass as _SingleProcessDataLoaderIter(_BaseDataLoaderIter)._next_data(), so the subclass method is the one that runs.

Therefore, the actual call chain here is:

--> _BaseDataLoaderIter(object).__next__()

--> _SingleProcessDataLoaderIter(_BaseDataLoaderIter)._next_data(), the method of the private single-process class

# location: torch.utils.data.dataloader.py

class _SingleProcessDataLoaderIter(_BaseDataLoaderIter):
    def __init__(self, loader):
        super(_SingleProcessDataLoaderIter, self).__init__(loader)
        assert self._timeout == 0
        assert self._num_workers == 0

        self._dataset_fetcher = _DatasetKind.create_fetcher(
            self._dataset_kind, self._dataset, self._auto_collation, self._collate_fn, self._drop_last)

    def _next_data(self):
        index = self._next_index()  # may raise StopIteration
        data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
        if self._pin_memory:
            data = _utils.pin_memory.pin_memory(data)
        return data

  2.1 While it runs, _SingleProcessDataLoaderIter(_BaseDataLoaderIter)._next_data() calls the following functions:

    --> self._next_index(), which is not implemented in the subclass; the method of the parent class (_BaseDataLoaderIter) is used through inheritance,

    and the parent's self._next_index() in turn executes

    --> return next(self._sampler_iter), which continues into

    --> class BatchSampler.__iter__() in torch.utils.data.sampler.py; this method yields the sample indices of one batch.

    2.2 After self._next_index() returns, the sample indices of one batch are available,

    and execution continues with self._dataset_fetcher.fetch(index),

    --> which is implemented by the method _MapDatasetFetcher(_BaseDatasetFetcher).fetch()

    # location: torch.utils.data._utils.fetch.py

    class _MapDatasetFetcher(_BaseDatasetFetcher):
        def __init__(self, dataset, auto_collation, collate_fn, drop_last):
            super(_MapDatasetFetcher, self).__init__(dataset, auto_collation, collate_fn, drop_last)

        def fetch(self, possibly_batched_index):
            if self.auto_collation:
                # Note: the data for each index is looked up through the self.dataset attribute.
                data = [self.dataset[idx] for idx in possibly_batched_index]
            else:
                data = self.dataset[possibly_batched_index]
            return self.collate_fn(data)

    Note that fetch() above uses the self.dataset attribute to look up the data for the current indices;

    data is obtained through index, with the following call chain:

    --> fetch(index) --> data = self.dataset[index]

    --> which lands in Dataset().__getitem__();

    The __getitem__() method is normally implemented in a subclass, here RespDataset(Dataset);

    At this point data has been obtained through the index. Note that data here refers to one dataset entry: the audio data and its labels;

    Whether this is feature-level or audio-level data has to be confirmed in the preprocessing part, process.py;

    Note that if the files are loaded from a custom folder given by self.input_dir, the "audio" here may already be feature-level data;

    For example, input_dir = processed_ast_wav2vec points to preprocessed data that represents features; here wav has shape (768, 128).

class RespDataset(Dataset):
    def __init__(self, ...):
        # Read the .csv file of the current task; it lists the audio files and their labels.
        # Resolve the audio folder from the folder location passed in.
        ...

    def __len__(self):
        # Length of the csv file, i.e. the total number of audio files of the current task,
        # training and validation sets together.
        ...

    def __getitem__(self, index):   # returns the audio (or features) and the corresponding label
        entry = self.csv.iloc[index]
        wav_name = entry['wav_name']
        target = (entry[f'label_{self.task}1'], entry[f'label_{self.task}2'])
        if self.input_dir is None:
            wav, _ = torchaudio.load(join(self.dir, wav_name))
        else:
            wav = torch.load(join(self.dir, wav_name), map_location='cpu')
            # # normalize
            # wav = (wav-37.3)/(2.3*2)
        return wav, target
       

2.3 Finally, after data = self.dataset[index] --> self.dataset.__getitem__(index) has returned,

execution continues with the last statement of _MapDatasetFetcher(_BaseDatasetFetcher).fetch(): return self.collate_fn(data);

7.3 How collate_fn() is passed along

2.4 What path does collate_fn() take? First, the method is defined as RespDataLoader(BaseDataLoader).collate_fn();

When __iter__() is called on the DataLoader, it calls the private method _get_iterator() of its own class, which in turn creates _SingleProcessDataLoaderIter();

From there, collate_fn() is handed on through the following classes:

_SingleProcessDataLoaderIter() --> _DatasetKind --> _MapDatasetFetcher

Finally, fetch() calls the method originally defined as RespDataLoader().collate_fn(), whose job is to pack the fetched data and labels into one batch.

The result is then returned, unwinding the call stack:

First back to --> _SingleProcessDataLoaderIter()._next_data(), at data = self._dataset_fetcher.fetch(index);

--> then to _BaseDataLoaderIter.__next__(), at data = self._next_data();

--> and finally back to the training loop, for batch_idx, (data, target) in enumerate(self.data_loader):

This completes the analysis of how training-set data is fetched during training;
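
The whole chain (iterator, batch of indices, fetcher, __getitem__(), collate_fn) can be condensed into the pure-Python model below. It is a deliberately simplified imitation for reading purposes, not PyTorch's implementation: every class and function name in it (`ToyDataset`, `batch_sampler`, `ToyFetcher`, `ToyLoaderIter`, `collate`) is hypothetical, and shuffling, workers and pinned memory are left out.

```python
class ToyDataset:
    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        return self.items[index]


def batch_sampler(indices, batch_size):
    """Yield one list of indices per batch (the role of BatchSampler.__iter__)."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]


class ToyFetcher:
    """Plays the role of _MapDatasetFetcher: indices -> samples -> collate_fn."""

    def __init__(self, dataset, collate_fn):
        self.dataset = dataset
        self.collate_fn = collate_fn

    def fetch(self, batch_indices):
        data = [self.dataset[idx] for idx in batch_indices]  # calls __getitem__
        return self.collate_fn(data)


class ToyLoaderIter:
    """Plays the role of _SingleProcessDataLoaderIter."""

    def __init__(self, dataset, indices, batch_size, collate_fn):
        self._sampler_iter = batch_sampler(indices, batch_size)
        self._fetcher = ToyFetcher(dataset, collate_fn)

    def __iter__(self):
        return self

    def __next__(self):
        index = next(self._sampler_iter)  # raises StopIteration when exhausted
        return self._fetcher.fetch(index)


def collate(batch):
    waves, labels = zip(*batch)
    return list(waves), list(labels)


dataset = ToyDataset([("w0", 0), ("w1", 1), ("w2", 0), ("w3", 2), ("w4", 1)])
batches = list(ToyLoaderIter(dataset, indices=[4, 0, 2, 1, 3], batch_size=2, collate_fn=collate))
print(batches)
# [(['w4', 'w0'], [1, 0]), (['w2', 'w1'], [0, 1]), (['w3'], [2])]
```

The collate function is stored once when the iterator is built and is only invoked inside fetch(), which matches the hand-over order described above.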

class RespDataLoader(BaseDataLoader):

    def __init__(self, data_dir, batch_size, shuffle=True, validation_split=0.0, num_workers=1, training=True, task=1, level=1, input_dir='processed'):
        self.CLASSES = resp_classes(task, level)
        self.CLASS2INT = {label: i for (i, label) in enumerate(self.CLASSES)}
        self.LEVEL = level
        # note: this dataset covers both the training and the validation samples
        dataset = Datasets.RespDataset(data_dir, task=task, input_dir=input_dir)
        super().__init__(dataset, batch_size, shuffle, validation_split, num_workers, collate_fn=self.collate_fn)

    # Builds the input samples and labels from the preprocessed data.
    def collate_fn(self, batch):
        tensors, targets = [], []

        # Gather in lists, and encode labels as indices
        for wave, label in batch:
            label = label[self.LEVEL-1]  # pick the label that belongs to the configured level
            tensors += [wave]
            targets += [torch.LongTensor([self.CLASS2INT[label]])]
        # Group the list of tensors into a batched tensor
        tensors = torch.stack(tensors)
        targets = torch.stack(targets)
        targets.squeeze_(1)
        return tensors, targets

During training, each time a batch is taken from the training loader (self.data_loader) or the validation loader (self.valid_data_loader),

the RespDataLoader().collate_fn() function is executed to return that batch.

8. DataLoader and _BaseDataLoaderIter()

A DataLoader() instance does not iterate over the dataset itself; the iteration is actually carried out by a _BaseDataLoaderIter object.

This design separates the dataset from the process of iterating over the data:

DataLoader(): manages the dataset and prepares the settings required before iteration;

_BaseDataLoaderIter: performs the actual iteration, including fetching data from the worker processes;

Because the dataset is separated from the iteration logic in this way,

one can, in principle, define a subclass of _BaseDataLoaderIter and override its iteration method to gain more control over how the data is iterated.

8.1 DataLoader

When the magic method __iter__() of DataLoader() is called, it actually returns a _BaseDataLoaderIter object;

    # We quote '_BaseDataLoaderIter' since it isn't defined yet and the definition can't be moved up
    # since '_BaseDataLoaderIter' references 'DataLoader'.
    def __iter__(self) -> '_BaseDataLoaderIter':
        # When using a single worker the returned iterator should be
        # created everytime to avoid reseting its state
        # However, in the case of a multiple workers iterator
        # the iterator is only created once in the lifetime of the
        # DataLoader object so that workers can be reused
        if self.persistent_workers and self.num_workers > 0:
            if self._iterator is None:
                self._iterator = self._get_iterator()
            else:
                self._iterator._reset(self)
            return self._iterator
        else:
            return self._get_iterator()

__iter__() then calls the private method _get_iterator() of its own class. Depending on whether multiple worker processes are enabled,

it returns a different iterator over the dataset: if num_workers == 0, the main process alone (single-process) performs the data iteration;

Both the single-process _SingleProcessDataLoaderIter(_BaseDataLoaderIter) and the multi-process iterator inherit from the same parent class, _BaseDataLoaderIter;

    def _get_iterator(self) -> '_BaseDataLoaderIter':
        if self.num_workers == 0:
            return _SingleProcessDataLoaderIter(self)
        else:
            self.check_worker_number_rationality()
            return _MultiProcessingDataLoaderIter(self)

8.2 _BaseDataLoaderIter

Both of the following classes inherit from _BaseDataLoaderIter:

_SingleProcessDataLoaderIter(_BaseDataLoaderIter)
_MultiProcessingDataLoaderIter(_BaseDataLoaderIter)

8.3 _SingleProcessDataLoaderIter()

# location:  torch.utils.data.dataloader.py

class _SingleProcessDataLoaderIter(_BaseDataLoaderIter):
    def __init__(self, loader):
        super(_SingleProcessDataLoaderIter, self).__init__(loader)
        assert self._timeout == 0
        assert self._num_workers == 0

        self._dataset_fetcher = _DatasetKind.create_fetcher(
            self._dataset_kind, self._dataset, self._auto_collation, self._collate_fn, self._drop_last)

    def _next_data(self):
        index = self._next_index()  # may raise StopIteration
        data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
        if self._pin_memory:
            data = _utils.pin_memory.pin_memory(data)
        return data

Note that self._dataset_fetcher, used in data = self._dataset_fetcher.fetch(index), is created in __init__() by the create_fetcher method of the private class _DatasetKind;

# location:  torch.utils.data.dataloader.py
class _DatasetKind(object):
    Map = 0
    Iterable = 1

    @staticmethod
    def create_fetcher(kind, dataset, auto_collation, collate_fn, drop_last):
        if kind == _DatasetKind.Map:
            return _utils.fetch._MapDatasetFetcher(dataset, auto_collation, collate_fn, drop_last)
        else:
            return _utils.fetch._IterableDatasetFetcher(dataset, auto_collation, collate_fn, drop_last)

For a map-style dataset, the create_fetcher method in turn instantiates the private class _MapDatasetFetcher:

#location: torch.utils.data._utils.fetch.py

class _MapDatasetFetcher(_BaseDatasetFetcher):
    def __init__(self, dataset, auto_collation, collate_fn, drop_last):
        super(_MapDatasetFetcher, self).__init__(dataset, auto_collation, collate_fn, drop_last)

    def fetch(self, possibly_batched_index):
        if self.auto_collation:
            data = [self.dataset[idx] for idx in possibly_batched_index]
        else:
            data = self.dataset[possibly_batched_index]
        return self.collate_fn(data)

So, starting from _SingleProcessDataLoaderIter(), the collate_fn method is handed along through the following classes:

_SingleProcessDataLoaderIter() —> _DatasetKind —> _MapDatasetFetcher
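To make the last step of that chain concrete, here is a minimal sketch of a map-style fetcher in plain Python. MapFetcher is a simplified stand-in for _MapDatasetFetcher, and collate_pairs is a hypothetical collate_fn invented for this example; the project's own collate_fn is not shown in this section.

```python
class MapFetcher:
    """Simplified stand-in for _MapDatasetFetcher (not the real PyTorch class)."""

    def __init__(self, dataset, auto_collation, collate_fn):
        self.dataset = dataset
        self.auto_collation = auto_collation
        self.collate_fn = collate_fn

    def fetch(self, possibly_batched_index):
        if self.auto_collation:
            # Batched mode: the index is a list of sample indices.
            data = [self.dataset[idx] for idx in possibly_batched_index]
        else:
            # Unbatched mode: the index addresses a single sample.
            data = self.dataset[possibly_batched_index]
        return self.collate_fn(data)


def collate_pairs(batch):
    """Hypothetical collate_fn: split (feature, label) pairs into two lists."""
    features = [feature for feature, _ in batch]
    labels = [label for _, label in batch]
    return features, labels


dataset = [([0.1, 0.2], 0), ([0.3, 0.4], 2), ([0.5, 0.6], 1)]
fetcher = MapFetcher(dataset, auto_collation=True, collate_fn=collate_pairs)
features, labels = fetcher.fetch([0, 2])
print(labels)  # [0, 1]
```

The fetcher never inspects the samples itself; whatever batching logic is needed lives entirely in the collate_fn it was given.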

9. Data preprocessing

Data preprocessing is actually the starting point of the entire project. Because the explanation is lengthy, it is analyzed here in its own section.

Task 1, event-level classification:

Training set: 6656 audio events

Test set: 2433 audio events

Task 2, record-level classification:

Training set: 1949 recordings (note that the subsequent task 2 filtering reduces this to 1772 recordings)

Test set: 734 recordings

Note that none of the preprocessing functions unifies audio of different durations to a common audio length.

Every audio file goes through the same function, and the resulting features are then reshaped so that all feature shapes are the same.

preprocess.py performs the data preprocessing. It combines the 6656 event-level audio events (the clip folder) with the 1949 record-level recordings (the wav folder).

That is, 6656 event-level audio events + 1949 record-level recordings = 8605 audio files.

The training set is all event-level audio + recording-level audio;
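The counts quoted above can be cross-checked with a few lines of arithmetic. The variable names are chosen for this example only; the numbers are the ones stated in the text, and the totals agree with the 9089 events and 2683 recordings given in the introduction.

```python
train_events, test_events = 6656, 2433
train_recordings, test_recordings = 1949, 734

total_events = train_events + test_events              # all annotated events
total_recordings = train_recordings + test_recordings  # all recording files
combined_training = train_events + train_recordings    # files preprocess.py handles

print(total_events)       # 9089
print(total_recordings)   # 2683
print(combined_training)  # 8605
```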

After passing through the preprocessing function (one of the preprocessing functions in 9.1-9.5), the results are stored in a single folder, preprocessed_file.

Afterwards, when configuring data_loader in task_config.json, the input_dir option must point to the preprocessed_file folder generated above.

if __name__ == '__main__':
    REC_DIR = "wav"
    CLIP_DIR = "clip"
    # PROC_DIR = "processed_wav2vec"
    PROC_DIR = "processed_ast"

    if not exists(PROC_DIR):
        makedirs(PROC_DIR)

    for dir in (REC_DIR, CLIP_DIR):
        print(f" \n Processing waves in {dir}/ folder")
        for wav_name in tqdm(listdir(dir)):
            wav, fr = load(join(dir, wav_name))
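            # If the input to the preprocessing function does not need to pass through the AST model,
            # comment out the next line, which converts the tensor to a NumPy array.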
            wav = wav.squeeze().cpu().detach().numpy()
            processed = preprocess(wav,fr)
            torch.save(processed, join(PROC_DIR, wav_name))

tips:

  1. Taking task 2-1 as an example, confirm that preprocess() is calling preprocess_ast_wav2vec(wav, fr) when you run preprocess.py.

    Depending on the task, preprocess() calls a different preprocessing function: processed_wav2vec(), processed_ast_wav2vec(), or another of the five preprocessing functions described below.

9.1 preprocess_stft

For task 1-1:

preprocess_stft preprocessing function,

The extracted feature has shape (1, 224, 224).

After collate_fn, the batch has shape (bt, 1, 224, 224),

which is fed into the light CNN.

9.2 preprocess_wavelet

preprocess_wavelet preprocessing function,

The extracted feature has shape (3, 224, 224).

After collate_fn, the batch has shape (bt, 3, 224, 224).

9.3 preprocess_ast

processed_ast preprocessing function,

The extracted feature has shape (256, 128). The number of frames (256) is unified to the same length through reshape; 128 is the number of filters (n_filters).

After collate_fn, output (bt, 256, 128),
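The text says the frame count is unified but does not show how. One common way to reach a fixed number of frames is to truncate long inputs and zero-pad short ones; the sketch below uses that approach purely as an assumption, with a hypothetical helper name, and is not taken from the project's code.

```python
def fix_num_frames(frames, target_frames=256, n_filters=128):
    """Truncate or zero-pad a list of frames so it has exactly target_frames rows.

    Assumption: each frame is a list of n_filters values. Padding with zeros
    is an illustrative choice, not necessarily what the project does.
    """
    fixed = [list(row) for row in frames[:target_frames]]
    while len(fixed) < target_frames:
        fixed.append([0.0] * n_filters)
    return fixed


short_clip = [[1.0] * 128 for _ in range(100)]  # fewer frames than the target
long_clip = [[1.0] * 128 for _ in range(300)]   # more frames than the target

for clip in (short_clip, long_clip):
    fixed = fix_num_frames(clip)
    print(len(fixed), len(fixed[0]))  # 256 128
```

Whatever method is used, the point is the same: every item must leave preprocessing with shape (256, 128) so that collate_fn can stack a batch.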

9.4 processed_ast_wav2vec

wav2vec2 is a speech representation model pre-trained on 960 hours of audio; in this experiment, the pre-trained weights of the AST model were used.

After the audio is input, the output of the last layer of the AST network is extracted and used as the encoding vector of that audio.

processed_ast_wav2vec preprocessing function,

The extracted feature has shape (768, 128).

After collate_fn(), the batch has shape (bt, 768, 128);

it is then fed into the AST model.

9.5 processed_wav2vec

For task 1-1:

When using the processed_wav2vec preprocessing function,

The extracted feature has shape (1, 224, 224).

This is the item that the original Dataset().__getitem__() returns.

After collate_fn, output (bt, 1, 224, 224),

Input into light cnn;
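The shapes listed in 9.1-9.5 can be gathered into one lookup table, which makes it easy to see what a batch should look like before it reaches the network. The dictionary keys follow the section headings; the table and the batched_shape helper are written for this note and are not part of the project.

```python
# Per-item feature shapes as stated in sections 9.1-9.5.
FEATURE_SHAPES = {
    "preprocess_stft": (1, 224, 224),
    "preprocess_wavelet": (3, 224, 224),
    "preprocess_ast": (256, 128),
    "processed_ast_wav2vec": (768, 128),
    "processed_wav2vec": (1, 224, 224),
}


def batched_shape(function_name, batch_size):
    """Shape of a batch after collate_fn stacks batch_size items."""
    return (batch_size,) + FEATURE_SHAPES[function_name]


for name in FEATURE_SHAPES:
    print(name, batched_shape(name, 16))
```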

Pay attention to the configuration parameters in config_task; they need to be set according to the arch parameter, for example:

    "arch": {
        "type": "ASTModel",  # specifies the network architecture
        "args": {
            "label_dim": 3,     # number of output classes
            "input_fdim": 128,  # specifies the input size of the network
            "input_tdim": 768,
            "audioset_pretrain": true
        }
    },
    "data_loader": {
        "type": "RespDataLoader",  # specifies the data loader
        "args": {
            "data_dir": "data/SPRSound/",
            "batch_size": 16,
            "shuffle": true,
            "validation_split": 0.2,
            "num_workers": 2,
            "task": 2,
            "level": 1,
            "input_dir": "processed_ast_wav2vec"
        }
    },
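Because the arch arguments and the preprocessed features have to agree, a small consistency check can catch a mismatch before training starts. The sketch below is hypothetical: check_config is not part of the project, the JSON is a trimmed copy of the fragment above without the comments (so that it parses), and the mapping of input_tdim to the first feature dimension and input_fdim to the second is an assumption based on the (768, 128) shape from 9.4.

```python
import json

CONFIG_TEXT = """
{
    "arch": {
        "type": "ASTModel",
        "args": {"label_dim": 3, "input_fdim": 128, "input_tdim": 768,
                 "audioset_pretrain": true}
    },
    "data_loader": {
        "type": "RespDataLoader",
        "args": {"task": 2, "level": 1, "input_dir": "processed_ast_wav2vec"}
    }
}
"""


def check_config(config, feature_shape, num_classes):
    """Return a list of mismatches between the arch args and the data."""
    args = config["arch"]["args"]
    problems = []
    # Assumption: features are laid out as (time frames, frequency bins).
    expected = (args["input_tdim"], args["input_fdim"])
    if expected != tuple(feature_shape):
        problems.append(f"model expects {expected}, features are {tuple(feature_shape)}")
    if args["label_dim"] != num_classes:
        problems.append(f"label_dim is {args['label_dim']}, task has {num_classes} classes")
    return problems


config = json.loads(CONFIG_TEXT)
print(check_config(config, feature_shape=(768, 128), num_classes=3))  # []
print(len(check_config(config, feature_shape=(256, 128), num_classes=3)))  # 1
```

Task 2 at level 1 has three classes (Normal, Poor Quality, Adventitious), which is why label_dim is 3 in this configuration.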


Origin blog.csdn.net/chumingqian/article/details/134125034