Lhotse: an audio library for managing audio datasets

Original article by its original author:

Generative AI by Feiteng

Lhotse is a Python library that aims to make speech and audio data preparation more flexible and accessible. Together with k2, it is part of the next-generation Kaldi speech processing toolkit.

Main goals:

1. Attract a wider community to speech processing tasks with a Python-centric design.

2. Accommodate experienced Kaldi users with an expressive command-line interface.

3. Provide standard data preparation recipes for commonly used corpora.

4. Provide PyTorch dataset classes for speech and audio tasks.

5. Enable flexible data preparation for model training through the concept of audio cuts.

6. Improve efficiency, especially in terms of I/O bandwidth and storage capacity.

By using Lhotse to abstract datasets into structured manifests, store them, and convert them into PyTorch data pipelines, you can easily build speech recognition and speech synthesis projects.
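To make the manifest idea concrete, here is a minimal, library-free sketch. The names `Recording`, `Supervision`, `Cut`, `cuts_from_manifests`, and `to_jsonl` are hypothetical stand-ins chosen for illustration; Lhotse's real classes (`RecordingSet`, `SupervisionSet`, `CutSet`) carry many more fields and methods. The sketch only shows the core relationship: recordings describe audio, supervisions label time spans, and cuts tie the two together.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Recording:
    # Where the audio lives and how long it is (hypothetical, simplified fields).
    id: str
    path: str
    sampling_rate: int
    duration: float


@dataclass
class Supervision:
    # A labeled time span inside a recording, e.g. one utterance and its text.
    id: str
    recording_id: str
    start: float
    duration: float
    text: str


@dataclass
class Cut:
    # A view onto a recording: a time window plus its supervisions.
    # No audio samples are copied; audio would be loaded lazily when needed.
    id: str
    recording_id: str
    start: float
    duration: float
    supervisions: list = field(default_factory=list)


def cuts_from_manifests(recordings, supervisions):
    """Build one cut per supervision, checking that it lies inside its recording."""
    by_id = {r.id: r for r in recordings}
    cuts = []
    for sup in supervisions:
        rec = by_id[sup.recording_id]  # KeyError if the recording is missing
        if sup.start < 0 or sup.start + sup.duration > rec.duration:
            raise ValueError(f"{sup.id} does not fit inside {rec.id}")
        cuts.append(Cut(f"{sup.id}-cut", rec.id, sup.start, sup.duration, [sup]))
    return cuts


def to_jsonl(items):
    """Serialize manifest entries as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(asdict(item)) for item in items)


# Hypothetical example data; the audio path is never opened.
recordings = [Recording("rec1", "/data/rec1.wav", 16000, 10.0)]
supervisions = [
    Supervision("utt1", "rec1", 0.5, 3.2, "hello world"),
    Supervision("utt2", "rec1", 4.0, 5.0, "good morning"),
]
cuts = cuts_from_manifests(recordings, supervisions)
print(to_jsonl(cuts))
```

Keeping manifests as plain JSON Lines lets them be stored, inspected, and streamed independently of the audio; Lhotse can likewise write its manifests as (optionally gzipped) JSON Lines files.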

Whether an audio file is long or short, it can be represented effectively as a cut.
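The idea can be sketched without Lhotse: a cut only records which recording it points to and which time window it covers, so the same representation works for a three-second clip and an hour-long recording. `CutView` and `split_into_windows` are hypothetical names for this illustration; Lhotse offers its own windowing operations on cut sets (for example `CutSet.cut_into_windows`), whose exact behavior this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CutView:
    # Hypothetical minimal cut: only a recording ID and a time window.
    recording_id: str
    start: float
    duration: float


def split_into_windows(recording_id, total_duration, window):
    """Cover a recording of any length with consecutive fixed-length cuts.

    The final cut is shorter when total_duration is not a multiple of window.
    Only offsets are stored, so describing a 10-hour file costs about as
    little as describing a 10-second one.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    cuts = []
    i = 0
    while i * window < total_duration:
        start = i * window
        cuts.append(CutView(recording_id, start, min(window, total_duration - start)))
        i += 1
    return cuts


clip = split_into_windows("clip", 3.0, 30.0)        # one cut covering the whole file
book = split_into_windows("audiobook", 3600.0, 30.0)
print(len(clip), len(book))  # 1 120
```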

Lhotse supports nearly a hundred datasets out of the box, and recipes for new datasets can be written by following these existing examples.
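Lhotse's built-in corpora are handled by recipe functions in `lhotse.recipes` (for example `prepare_librispeech`). As a rough illustration of what a recipe does, here is a hypothetical `prepare_my_corpus` for an assumed directory layout. It uses only the standard library (`wave` to read WAV headers) and returns plain dicts rather than Lhotse manifest objects, so it is a sketch of the pattern, not a drop-in Lhotse recipe.

```python
import wave
from pathlib import Path


def prepare_my_corpus(corpus_dir):
    """Hypothetical recipe: WAV files + transcripts -> recording/supervision manifests.

    Assumed layout (chosen for this sketch, not a Lhotse convention):
        corpus_dir/wavs/<utt_id>.wav
        corpus_dir/transcripts.txt, one "<utt_id> <text>" per line
    """
    corpus_dir = Path(corpus_dir)

    # Read transcripts into {utt_id: text}.
    texts = {}
    for line in (corpus_dir / "transcripts.txt").read_text(encoding="utf-8").splitlines():
        if line.strip():
            utt_id, _, text = line.strip().partition(" ")
            texts[utt_id] = text.strip()

    recordings, supervisions = [], []
    for wav_path in sorted((corpus_dir / "wavs").glob("*.wav")):
        utt_id = wav_path.stem
        # Only the WAV header is needed to compute the duration.
        with wave.open(str(wav_path), "rb") as w:
            sampling_rate = w.getframerate()
            num_samples = w.getnframes()
        duration = num_samples / sampling_rate
        recordings.append({"id": utt_id, "path": str(wav_path),
                           "sampling_rate": sampling_rate,
                           "num_samples": num_samples, "duration": duration})
        if utt_id in texts:  # audio without a transcript gets no supervision
            supervisions.append({"id": utt_id, "recording_id": utt_id,
                                 "start": 0.0, "duration": duration,
                                 "text": texts[utt_id]})
    return {"recordings": recordings, "supervisions": supervisions}
```

Called on a directory with that layout, the function returns one recording entry per WAV file and one supervision per transcribed file, which could then be combined into cuts.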

Operating on datasets is also very convenient.
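Lhotse's `CutSet` exposes chainable operations such as `filter`, `subset`, `shuffle`, and `sort_by_duration`. The hypothetical `MiniCutSet` below imitates that style in plain Python to show why such operations are convenient: each one returns a new collection, so they can be chained. The method names mirror Lhotse's for readability, but the signatures and defaults here belong to this sketch only.

```python
import random


class MiniCutSet:
    """Hypothetical, tiny imitation of a cut collection, for illustration only.

    Each cut is a dict with at least "id" and "duration" (seconds). Every
    operation returns a new collection and leaves the original untouched.
    """

    def __init__(self, cuts):
        self.cuts = list(cuts)

    def __len__(self):
        return len(self.cuts)

    def filter(self, predicate):
        return MiniCutSet(c for c in self.cuts if predicate(c))

    def subset(self, first):
        return MiniCutSet(self.cuts[:first])

    def sort_by_duration(self, ascending=True):
        return MiniCutSet(sorted(self.cuts, key=lambda c: c["duration"],
                                 reverse=not ascending))

    def shuffle(self, seed=0):
        shuffled = self.cuts[:]
        random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
        return MiniCutSet(shuffled)

    def total_hours(self):
        return sum(c["duration"] for c in self.cuts) / 3600


cuts = MiniCutSet([{"id": "a", "duration": 2.0},
                   {"id": "b", "duration": 25.0},
                   {"id": "c", "duration": 7.5}])
kept = cuts.filter(lambda c: c["duration"] <= 20.0).sort_by_duration()
print([c["id"] for c in kept.cuts])  # ['a', 'c']
```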

It integrates easily with PyTorch.
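Lhotse's PyTorch samplers (such as `SimpleCutSampler` and `DynamicBucketingSampler`) form batches by total audio duration through a `max_duration` setting rather than by a fixed number of examples. The greedy function below is a hypothetical, simplified illustration of duration-based batching in plain Python; it is not Lhotse's sampler and performs no bucketing.

```python
def duration_batches(cuts, max_duration):
    """Group cuts (dicts with a "duration" key) into batches whose total
    duration stays <= max_duration.

    Greedy and order-preserving. A single cut longer than max_duration
    still forms its own batch instead of being dropped.
    """
    batch, batch_duration = [], 0.0
    for cut in cuts:
        if batch and batch_duration + cut["duration"] > max_duration:
            yield batch
            batch, batch_duration = [], 0.0
        batch.append(cut)
        batch_duration += cut["duration"]
    if batch:
        yield batch


cuts = [{"id": str(i), "duration": d} for i, d in enumerate([4.0, 5.0, 3.0, 8.0, 2.0])]
for b in duration_batches(cuts, max_duration=10.0):
    print([c["duration"] for c in b])
```

Batching by total duration keeps the amount of audio per batch roughly constant, which helps keep GPU memory use predictable when utterance lengths vary widely. In Lhotse, the sampler already decides batch composition, so it is typically passed to a PyTorch `DataLoader` with `batch_size=None`.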

Lhotse is also highly extensible.

Beyond text and audio, Lhotse can store many kinds of custom information, such as forced alignments, durations, and pitch, which makes it easy to support a variety of speech tasks.
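As a hedged illustration, the sketch below attaches a word-level alignment and a per-frame pitch track to a single segment and then reads word durations back out, as a TTS duration model might. The classes are hypothetical and only loosely modeled on the `alignment` and `custom` fields of Lhotse's `SupervisionSegment`; the pitch values are made-up toy numbers.

```python
from dataclasses import dataclass, field


@dataclass
class AlignedWord:
    symbol: str
    start: float     # seconds
    duration: float  # seconds


@dataclass
class Segment:
    # Hypothetical supervision with room for task-specific extras.
    id: str
    start: float
    duration: float
    text: str
    alignment: dict = field(default_factory=dict)  # e.g. {"word": [AlignedWord, ...]}
    custom: dict = field(default_factory=dict)     # arbitrary extras, e.g. a pitch track


def word_durations(seg):
    """(word, duration) pairs taken from the segment's word-level alignment."""
    return [(w.symbol, w.duration) for w in seg.alignment.get("word", [])]


seg = Segment(
    id="utt1", start=0.0, duration=1.2, text="hello world",
    alignment={"word": [AlignedWord("hello", 0.1, 0.4), AlignedWord("world", 0.6, 0.5)]},
    custom={"f0_hz": [210.0, 215.5, 198.2]},  # toy per-frame pitch values
)
print(word_durations(seg))  # [('hello', 0.4), ('world', 0.5)]
```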

When storing extracted features, Lhotse's write speed gradually slows down as the output file grows. If necessary, use CutSet.split to divide the work into multiple jobs and improve efficiency.
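Lhotse's `CutSet.split(num_splits)` divides a cut set into parts that separate jobs can process. The sketch below shows the same divide-and-process pattern in plain Python; `split_into_jobs`, `process_shard`, and `run_jobs` are hypothetical names, and the balancing rule (shard sizes differ by at most one) is this sketch's own choice.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_jobs(items, num_jobs):
    """Split items into num_jobs contiguous shards whose sizes differ by at most one."""
    if num_jobs < 1:
        raise ValueError("num_jobs must be >= 1")
    base, extra = divmod(len(items), num_jobs)
    shards, start = [], 0
    for i in range(num_jobs):
        size = base + (1 if i < extra else 0)
        shards.append(items[start:start + size])
        start += size
    return shards


def process_shard(job_id, shard):
    # Placeholder for per-job work, such as computing features for this shard
    # and writing them to a job-specific output file, so that no single file
    # grows very large. Here it only reports how many items it received.
    return job_id, len(shard)


def run_jobs(items, num_jobs):
    shards = split_into_jobs(items, num_jobs)
    with ThreadPoolExecutor(max_workers=num_jobs) as pool:
        return list(pool.map(process_shard, range(num_jobs), shards))


print(run_jobs(list(range(10)), 3))  # [(0, 4), (1, 3), (2, 3)]
```

Threads keep the sketch simple; for CPU-heavy feature extraction, separate processes or separate cluster jobs would usually be a better fit.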

In addition, although Lhotse provides command-line tools, it lacks web-based tools for analyzing datasets and inspecting sample data.
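Until such a tool exists, simple summaries can be computed directly from cut durations. Lhotse can already print text summaries of a cut set (for example via `CutSet.describe()`); the hypothetical `describe_durations` below only illustrates the kind of statistics a web dashboard might display.

```python
import statistics


def describe_durations(durations):
    """Summary statistics for a list of cut durations in seconds."""
    if not durations:
        raise ValueError("no cuts to describe")
    ordered = sorted(durations)
    return {
        "num_cuts": len(ordered),
        "total_hours": sum(ordered) / 3600,
        "mean_s": statistics.mean(ordered),
        "median_s": statistics.median(ordered),
        "min_s": ordered[0],
        "max_s": ordered[-1],
    }


print(describe_durations([1.0, 2.0, 3.0, 10.0])["median_s"])  # 2.5
```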

Projects that depend on Lhotse

  • https://github.com/k2-fsa/icefall

  • https://github.com/lifeiteng/vall-e

References:

  • https://lhotse.readthedocs.io/en/latest/index.html

  • Slides for the Interspeech 2023 tutorial

  • https://github.com/k2-fsa/icefall/issues/1230


Source: blog.csdn.net/chumingqian/article/details/134561816