Tsinghua University teams up with ByteDance to open-source the auditory large language model SALMONN

The Department of Electronic Engineering at Tsinghua University and ByteDance's Volcano Voice team have jointly released SALMONN, a new open-source large language model.

According to reports, SALMONN accepts speech, audio, and music as input. It can perceive and understand these different types of audio content, and it supports tasks such as multilingual speech recognition, speech translation, and speech-based reasoning.

Compared with conventional systems built for individual speech and audio processing tasks, such as speech recognition and audio captioning, SALMONN is reported to be more general-purpose, and it can accurately follow user instructions.

SALMONN can currently handle a range of important speech and audio tasks, including English speech recognition, English-to-Chinese speech translation, emotion recognition, audio captioning, and music description. It also exhibits multilingual and cross-modal capabilities, covering non-English speech recognition, speech translation from English into languages other than Chinese, summarization and keyword extraction from spoken content, audio-based story generation, audio question answering, and joint speech-and-audio reasoning.


Source: www.oschina.net/news/254874