[Special Express] Application of Audio Generation, TTS and AIGC in Music


What has the development of AIGC brought to audio? How does AIGC empower music creation? How can fake audio be identified? In which scenarios can TTS solve specific problems? On July 29th, the New Audio Experience session at LiveVideoStackCon2023 Shanghai will answer these questions.

New audio experience

As multimedia and communication network technologies keep advancing and new audio and video application scenarios keep emerging, audio processing technology is becoming more intelligent and immersive. Listeners' expectations are rising as well: across scenarios, they want sound that is clearer and more immersive.

Topic introduction

TOPIC1 "AIGC Technology Exploration and Application Innovation"

Jiang Yuan, iFLYTEK, Chief Scientist of iFLYTEK Music

Focusing on the recent development of key technologies such as the Metaverse, AIGC, and large models, this talk shares iFLYTEK's AIGC technical layout and exploratory research in the three major fields of audio, vision, and cognition, as well as innovative application cases in related industries. We look forward to discussing with the industry how to use the power of artificial intelligence to build a better digital world.

Speech outline: 1. The current state of the AIGC field; 2. iFLYTEK's progress and application cases in audio generation; 3. iFLYTEK's progress and application cases in visual generation; 4. Progress and layout in the field of cognitive intelligence large models.

TOPIC2 "Exploration of Music Streaming Media Platform in Music AIGC"

Deng Yang, Senior Audio Algorithm Engineer, NetEase Cloud Music Audio and Video Lab

As the cost of music production equipment continues to fall, the music industry is undergoing an unprecedented transformation. To stay vital in this new era of music, we set out to build a new music creation engine that uses advanced AI technology to empower both music creation and the listening experience. Our vision is to transform music from a static medium into a real-time, interactive, and perceivable element, giving users a richer experience during the creation process. However, complex technical barriers and high R&D thresholds pose many challenges.

This talk focuses on how music streaming platforms are exploring music AIGC, and discusses in depth the architecture design of the Tianyin TY-AIGC content production engine. The first part starts with mainstream international music production solutions and their technical difficulties, and explains in detail how to combine the cloud music ecosystem with AIGC to design a highly available music production solution. The second part digs into the key technical routes and technical points of AIGC, including architecture design, algorithm optimization, audio processing, and data management. The third part shares the specific products and business results of putting TY-AIGC technology into practice, contributing technical results to the AI-driven digital upgrade of the music industry.

TOPIC3 "Audio Character Feature Generation and Identification: Development Practice"

Wen Zhengqi, General Manager of Zhongke Extreme Element

Audio character feature generation and identification is a technology for detecting imitations of a specific person's voice, and it plays a key role in security, network information, public security, and communications. Fake audio generated with deepfake technology is already very realistic, and the amount of related open source code has grown by 217% year-on-year over the past four years. The threshold for generating fake audio has dropped sharply, making it very easy for fake audio to circulate on the Internet and telecommunications networks. Fake audio has done great harm to national security, social stability, and property safety. Identifying and defending against fake audio has become a hot issue for governments, enterprises, and individuals around the world.

This talk is divided into three parts. The first part introduces research progress on audio character feature generation. The second part introduces our work on audio character feature identification. The third part covers the challenges and our responses. Through these three parts, we will systematically introduce the team's progress in the field of audio character feature generation and identification.

TOPIC4 "When "AI" Meets "Love": How Artificial Intelligence Emotional Technology Empowers Himalaya's Creator Ecosystem"

Lin Yiting, Head of Data R&D, Everest Lab, Himalaya

The audio industry has long struggled with long creative cycles and high investment costs. To address this, Himalaya adopts TTS (speech synthesis) technology to empower creators with AI and build a better creator ecosystem. This talk will introduce how Himalaya uses TTS for cross-language synthesis and emotion processing in specific scenarios, as well as the remaining difficulties and pain points.



Scan the QR code in the picture or click "Read the original text" to buy tickets now.

Origin blog.csdn.net/vn9PLgZvnPs1522s82g/article/details/131587677