[How to build live broadcast products] Technology selection for online karaoke software development

Summary

Developing online karaoke software involves many technical challenges: audio recording and processing, real-time audio transmission and synchronization, audio compression and decompression, and device compatibility. Beyond these, developers also need to handle music copyright properly so that the application is legal and compliant.

Foreword

I have written several articles on live broadcast SDK technology selection, covering real-time communication scenarios such as RTC real-time audio and video, ultra-low-latency live broadcast, and CDN live broadcast. Many readers said they were especially interested in live broadcast technology for the pan-entertainment industry and hoped for a more in-depth analysis and introduction.

Comparison of mainstream third-party live broadcast SDKs (Tencent Cloud, ZEGO (即构), Alibaba Cloud, Agora (声网), NetEase Yunxin, Wangsu)

I used ChatGPT for live broadcast technology selection, and my colleagues were outclassed

As requested, today we discuss technology selection for the online karaoke scenario in the pan-entertainment industry. This article analyzes the core technical indicators of online karaoke and the gameplay features found in this scenario, to help developers make the right technical choices for karaoke software in different scenarios.

The relationship between online karaoke software development and live broadcast technology

Online karaoke is a new type of interactive gameplay in social entertainment. Music connects people and makes communication easier, which effectively increases the time users spend on the platform. A variety of karaoke modes makes an application more fun and attracts more users. Karaoke gameplay can also be applied to many social scenarios, such as voice chat rooms, dating platforms, and live broadcast rooms.

The online karaoke function is mainly composed of the following parts:

1. Audio recording and processing technology: lets users record their own audio and applies noise reduction, echo cancellation, reverberation, and other processing to improve sound quality.

2. Real-time streaming media technology: transmits the audio recorded by the user to the server in real time for processing and storage, which is the basis of real-time chorus.

3. Audio synthesis technology: mixes the audio recorded by multiple users in real time to produce the chorus.

4. Music data processing technology: matches lyrics to audio so that lyrics are displayed in sync.

5. Cloud computing technology: uses cloud servers for audio processing and storage, improving the stability and scalability of real-time chorus.
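As an illustration of the lyric matching in item 4, here is a minimal sketch of synchronized lyric display. It assumes lyrics arrive in the common LRC text format with `[mm:ss.xx]` timestamps; the article does not say which format a given SDK uses, and the function names `parse_lrc` and `current_line` are made up for this example.

```python
import bisect
import re

# Matches LRC-style timestamps such as "[01:23.45]" (minutes:seconds.fraction).
_TIMESTAMP = re.compile(r"\[(\d+):(\d+(?:\.\d+)?)\]")


def parse_lrc(text):
    """Return a list of (start_ms, lyric_line) tuples sorted by start time."""
    entries = []
    for raw in text.splitlines():
        stamps = _TIMESTAMP.findall(raw)
        lyric = _TIMESTAMP.sub("", raw).strip()
        for minutes, seconds in stamps:
            start_ms = int(round((int(minutes) * 60 + float(seconds)) * 1000))
            entries.append((start_ms, lyric))
    entries.sort(key=lambda entry: entry[0])
    return entries


def current_line(entries, position_ms):
    """Return the lyric active at position_ms, or None before the first line."""
    starts = [start for start, _ in entries]
    index = bisect.bisect_right(starts, position_ms) - 1
    return entries[index][1] if index >= 0 else None


# Made-up sample lyrics, used only to exercise the functions above.
SAMPLE_LRC = """[00:01.00]First line
[00:04.50]Second line
[00:09.20]Third line"""

lyrics = parse_lrc(SAMPLE_LRC)
```

The player would call `current_line(lyrics, position_ms)` with the accompaniment's playback position on every UI refresh, so the lyric display follows the song rather than a separate clock.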

How the ZEGO (即构) karaoke solution improves development efficiency

Online karaoke is a complex system that depends on audio, video, network, artificial intelligence, and other technologies. The main technical difficulties in implementing it are the following:

1. Audio processing technology: online karaoke needs real-time capture, noise reduction, reverberation, pitch shifting, and other processing of the user's voice to ensure sound quality.

2. Video processing technology: besides the user's voice, online karaoke also needs to process the user's video, including real-time capture, beautification, and special effects, to enhance the experience.

3. Network transmission technology: online karaoke transmits audio and video in real time, so it requires low-latency, high-bandwidth, and highly reliable network transmission.

4. Artificial intelligence technology: to improve the karaoke experience, online karaoke needs to score users' singing and give suggestions in real time, which relies on artificial intelligence.

5. Security technology: online karaoke involves users' audio, video, and other personal information, so security technology is needed to protect user privacy and information security.
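To make the network transmission difficulty in item 3 more concrete, below is a minimal sketch of a jitter buffer, a common technique for absorbing out-of-order packet arrival at the cost of a little extra latency. It is not taken from any vendor's SDK: the class name, the `depth` parameter, and the packet shape are all assumptions for illustration, and a production buffer would also handle packet loss, duplicates, and end-of-stream draining.

```python
import heapq


class JitterBuffer:
    """Reorders audio packets by sequence number before playback.

    depth is the number of packets held back to absorb network jitter;
    a larger depth tolerates more reordering but adds latency.
    """

    def __init__(self, depth=3):
        self.depth = depth
        self._heap = []        # min-heap of (sequence, payload)
        self._next_seq = None  # lowest sequence number still acceptable

    def push(self, seq, payload):
        """Store a packet; packets older than the playback point are dropped."""
        if self._next_seq is not None and seq < self._next_seq:
            return False
        heapq.heappush(self._heap, (seq, payload))
        return True

    def pop(self):
        """Return the next packet in order, or None while still buffering."""
        if len(self._heap) < self.depth:
            return None
        seq, payload = heapq.heappop(self._heap)
        self._next_seq = seq + 1
        return seq, payload
```

The trade-off is visible in `depth`: each extra buffered packet adds one packet duration of delay, which is why low-latency chorus scenarios keep such buffers as shallow as the network allows.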

Building an online KTV feature in-house requires a team with the corresponding technical capabilities and experience, and a large investment of time and labor. The team also needs expertise in audio and video processing, network transmission, security and encryption, and more, along with in-depth research into and optimization of the user experience.

To sum up, I recommend using a third-party live broadcast SDK vendor. As far as I know, ZEGO (即构) offers an online KTV solution that supports solo, round singing, chorus, offline OMO karaoke, and other modes, helping developers quickly build an online karaoke room backed by a massive library of licensed music.

A one-stop online KTV solution reduces developers' workload. The main tasks it can take over include:

  • Live broadcast APIs and efficient audio and video processing: developers can quickly integrate live broadcast functions and implement audio recording, synthesis, mixing, and other processing without building their own servers.
  • Real-time streaming media technology: transmits the audio recorded by users to the server in real time for processing and storage, enabling real-time chorus.
  • Rich templates and components: ready-made UI components, online karaoke scene templates, and more, for quickly building the interface and scenes of an online karaoke application.
  • Cloud services: cloud servers handle audio processing and storage, improving the stability and scalability of real-time chorus in online karaoke applications.

Using third-party services saves development cost and shortens the development cycle. Mature service providers on the market have professional technical capabilities and experience and can provide stable technical support and services. They can also offer more complete solutions and gameplay, such as real-time chorus and other novel singing modes.

Music + real-time gameplay adopted by leading players in the industry

My research found that pan-entertainment social products have long faced challenges in user retention and monetization, and that leading players in the industry have started adding online karaoke gameplay to their applications. Karaoke has high user stickiness, so it helps retain users and quickly accumulates user-generated content. Online karaoke also opens up monetization channels such as paid features, premium memberships, and virtual gifts. As a result, online karaoke has become an increasingly popular feature of pan-entertainment social products.

Type | Typical product | Case introduction
Singing room | Douyin KTV | The largest singing-room product on the market. Over the past two years it has focused on activity and revenue, adding the classic online karaoke gameplay of mic queuing, kick mic, and keep mic.
Singing room | Quanmin K Ge (全民K歌) room | From Tencent Music. In recent years it has focused on retention and activity, adding the classic online karaoke gameplay of mic queuing, kick mic, and keep mic. Retention and activity are extremely high, with millions of daily active users.
Singing room | NetEase Cloud Party | The music app's social party feature, with singing, self-study rooms, and other modes.
Social product | Soul-KTV | In Soul's group chat party mode, the online KTV real-time chorus mode adds real-time interactive gameplay. After launch, retention and the mic-up rate increased significantly.
Social product | Blued | KTV chorus was added to real-time chat rooms and 1V1 rooms, improving user retention and session length in real-time scenes and greatly improving monetization.
Live streaming | Huajiao (花椒) | Music is added to the live broadcast as BGM or for karaoke live broadcasts.
Live streaming | Inke | Music is added to the live broadcast as BGM or for karaoke live broadcasts.
Game companion | TT Voice | Positioned as a gaming tool. It adds KTV real-time chorus to its game-matching voice chat rooms, which noticeably drives revenue and retention.
Game companion | Bixin (比心) | Like TT Voice, it introduces KTV capability to grow both retention and revenue.

Function description:

Mic queuing: users join the song queue and wait for their turn to sing.

Kick mic: removes a user from the queue, giving other users a chance to sing.

Keep mic (Baomai): reserves the queue position of a user who has left or temporarily cannot sing, so the user does not need to queue again.
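The three queue operations can be sketched as a small data structure. This is only one possible reading of the descriptions above: the class and method names are hypothetical, and the rule that a held slot is skipped (rather than blocking the queue) is an assumption, since the article does not specify that behavior.

```python
from collections import OrderedDict


class MicQueue:
    """Singing queue for one room: join, kick, and hold ("keep mic")."""

    def __init__(self):
        # user_id -> True if the slot is being held for an absent user
        self._slots = OrderedDict()

    def join(self, user_id):
        """Mic queuing: append the user unless already queued."""
        self._slots.setdefault(user_id, False)

    def kick(self, user_id):
        """Kick mic: remove the user from the queue."""
        self._slots.pop(user_id, None)

    def hold(self, user_id, held=True):
        """Keep mic: mark or unmark the user's slot as held in place."""
        if user_id in self._slots:
            self._slots[user_id] = held

    def next_singer(self):
        """Remove and return the first user whose slot is not held."""
        for user_id, held in self._slots.items():
            if not held:
                break
        else:
            return None  # queue is empty or every slot is held
        del self._slots[user_id]
        return user_id

    def order(self):
        """Current queue order, including held slots."""
        return list(self._slots)
```

Because a held slot stays in the ordered mapping, the user regains the original position as soon as `hold(user_id, held=False)` is called, which is the "without re-queuing" property described above.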

What to consider when selecting live broadcast technology for an online karaoke application

In general, four factors need to be considered when choosing a live broadcast technology solution. Two of them, solution scalability and core technical performance, are explained in detail below:

  1. Support multi-platform development
  2. Solution scalability
  3. Core technical performance
  4. Typical customer case

First, the solution needs to support multi-platform development so that the application can run on different devices and operating systems and reach more users. Second, the solution should be complete: audio and video capture, encoding and decoding, transmission, playback, and the other links in the chain should all be covered. Third, technical performance matters: indicators such as latency, bit rate, and image quality need to reach a level users find acceptable. Finally, mature commercial cases help evaluate the feasibility and practicality of a solution, and the experience of other companies supports a better-informed choice.
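One way to turn these four factors into a comparable number is a weighted scoring matrix. The sketch below is purely illustrative: the weights, the 0 to 10 rating scale, and the two candidates `sdk_a` and `sdk_b` are placeholders invented for this example, not measurements of any real vendor.

```python
# Hypothetical weights for the four selection factors (they sum to 1.0).
WEIGHTS = {
    "multi_platform": 0.2,
    "scalability": 0.3,
    "core_performance": 0.3,
    "customer_cases": 0.2,
}


def weighted_score(ratings, weights=WEIGHTS):
    """Combine per-factor ratings (0-10) into one weighted score."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[name] * ratings[name] for name in weights)


def rank(candidates, weights=WEIGHTS):
    """Return (name, score) pairs sorted from best to worst."""
    scored = [(name, weighted_score(r, weights)) for name, r in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder ratings for two made-up candidates, not real vendor data.
candidates = {
    "sdk_a": {"multi_platform": 8, "scalability": 9,
              "core_performance": 7, "customer_cases": 6},
    "sdk_b": {"multi_platform": 9, "scalability": 5,
              "core_performance": 8, "customer_cases": 8},
}
ranking = rank(candidates)
```

A team that cares most about chorus latency would raise the `core_performance` weight, and the same ratings could then produce a different ranking, which is the point of making the weights explicit.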

1. Whether to support multi-platform development

Choose a live broadcast technology that supports multiple devices and platforms. A unified API, code reuse, automated builds, and cross-platform debugging improve development efficiency, reach a wider user base, and raise application usage and user satisfaction.

2. How scalable is the solution?

When selecting an online KTV solution, scalability is a very important consideration. Scalability means the solution can be upgraded and extended with new functions and modules in the future, so that it keeps up with changing user needs and market competition. It mainly covers the following three aspects:

  1. Rich game modes
  2. Product function ecology
  3. Third-party expansion capabilities

1. Rich game modes:

In the actual application process, the needs of users are often diversified. Therefore, choosing a solution with rich gameplay modes can meet the needs of more users and increase the utilization rate of the application. For example, an online KTV application can provide multiple game modes including karaoke, live broadcast, and PK to attract more users.

Mode | Example product | How to play | Features
Solo | Quanmin K Ge (全民K歌) | After going on mic, the user orders a song and waits. The solo begins once the song starts playing. | A single-person singing mode, often used in voice-centric show live broadcasts.
Sing along | TT Voice | After going on mic, audience members order songs and wait. Once the song starts playing, they sing along with the lead singer. | A multiplayer singing mode that improves user participation and interaction rate. Often used in voice chat scenarios.
Challenge PK | Quanmin K Ge (全民K歌), TT Voice | PK challenges are held between rooms or hosts, and the winner is decided by a singing competition. | Increases interaction between users and improves user stickiness. The main revenue source of the karaoke scene.
1V1 singing | Bixin (比心) | The user picks a favorite song and sends a 1V1 singing invitation, and the system automatically matches a companion singer to sing together. | Paid companion mode for 1V1 scenes. High revenue, medium ARPU.
Pay-to-play | Quanmin K Ge (全民K歌) | Users pay to request songs, and the host sings for them. | Friend hall + KTV gameplay. High revenue, high ARPU.
Real-time chorus | TT Voice | After going on mic, audience members order a song, wait their turn, and sing it together with the lead singer. | A new gameplay in the karaoke scene that top applications in the industry have all adopted. User activity and revenue data perform well under it.

The real-time chorus solution has become one of the very popular features in the online karaoke scene because it allows users to share music with more people. By realizing low-latency, high-quality audio and video transmission and multi-person collaborative singing, it can meet the diversified needs of users and increase the activity and revenue of the platform.

After investigation, it was found that in recent years, mainstream audio and video manufacturers have launched real-time chorus solutions. The chorus solution generally adopted in the industry is "serial chorus". Under this scheme, the experience of the lead singer is missing, which is essentially a kind of "pseudo-real-time chorus".

ZEGO (即构) has implemented low latency, multi-party accompaniment synchronization, and precise server-side stream mixing, giving users a more realistic, higher-quality chorus experience and achieving "true" real-time chorus.

Advantages of the ZEGO real-time chorus solution

  • End-to-end delay as low as 70 ms, below the level that people can perceive, so users around the world get a truly real-time experience.
  • Multi-party accompaniment is precisely synchronized: every end starts accompaniment playback at the same time, creating a high-quality chorus experience.
  • The server mixes streams precisely: the vocals of all singers and the accompaniment are aligned by NTP time and then mixed into one stream. Listeners only need to pull one stream to hear a good chorus, and the experience holds up on weak networks.
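The idea of aligning voices by NTP time before mixing can be sketched as follows. This is not the vendor's implementation: it assumes each singer's frames carry an NTP-synchronized capture timestamp in milliseconds, that audio is 16-bit PCM in fixed 20 ms frames, and that a simple sum with clipping is an acceptable mix. The function name and data layout are hypothetical.

```python
def mix_aligned(streams, frame_ms=20):
    """Mix per-singer frames that share the same NTP-based capture time.

    streams maps singer id -> list of (capture_ms, samples), where
    capture_ms is the sender's NTP-synchronized capture timestamp and
    samples is a list of 16-bit PCM values. Frames are snapped onto a
    frame_ms grid so that slightly different timestamps still line up.
    """
    buckets = {}
    for frames in streams.values():
        for capture_ms, samples in frames:
            slot = round(capture_ms / frame_ms) * frame_ms
            buckets.setdefault(slot, []).append(samples)

    mixed = []
    for slot in sorted(buckets):
        group = buckets[slot]
        out = [0] * max(len(samples) for samples in group)
        for samples in group:
            for i, value in enumerate(samples):
                out[i] += value
        # Clip to the 16-bit range to avoid wrap-around distortion.
        mixed.append((slot, [max(-32768, min(32767, v)) for v in out]))
    return mixed
```

Because frames are grouped by capture time rather than arrival time, a singer on a slower network is still mixed against the same moment of the song, and listeners receive one already-aligned stream.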

The interactive gameplay and functional components of online karaoke are closely related, and each component needs to work together to realize a complete online karaoke platform. Interactive gameplay is the core of online karaoke, including music, lyrics, accompaniment, scoring, and interaction between users. These interactive gameplays need to rely on a series of functional components to achieve, including audio processing, video processing, network transmission, data storage, etc.

2. Product function ecology:

The scalability of the solution also includes the richness of product function ecology. In different stages of application, new functions and modules need to be added continuously to meet changing user needs. Therefore, choosing a solution with a complete product function ecology can help applications to iterate and upgrade quickly. For example, online KTV applications can add some auxiliary functions, such as automatic tuning, vocal cancellation, etc., to improve the user's singing experience.

Among the more mature third-party audio and video vendors on the market, take ZEGO (即构) as an example. Its online karaoke solution provides a massive library of licensed songs, nine scenario-based capabilities, and a self-developed audio and video engine for a high-definition sound experience. Reportedly, the KTV SDK integrates several well-known domestic music copyright providers, so a single SDK gives quick access to licensed music without switching SDKs.


Massive genuine music library to solve copyright compliance issues

No. | Function | Description
1 | Hot songs | Contains 4 major charts and dozens of categorized playlists, so hot songs can be fetched directly
2 | Music library component | Massive licensed music resources, with search, retrieval, and download of songs and lyrics
3 | Playback component | Supports starting, pausing, and resuming songs, switching between original vocals and accompaniment, and adjusting vocal and accompaniment volume
4 | Lyric component | Supports line-by-line and word-by-word lyric playback synchronized with the song, aligned in real time
5 | Interactive gameplay | Includes singing scores, song clip resources, and similar functions, and can support business scenarios such as singing leaderboards and grab-the-mic chorus rounds
6 | Rich sound effects | Users apply rich sound effects while singing to enhance the result
7 | Mic seat management | The room owner can control mic seats, and users can go on and off mic
8 | Intelligent noise reduction | Algorithms intelligently reduce environmental and other noise to improve vocal quality
9 | Real-time chorus | Ultra-low-latency real-time chorus that closely reproduces the offline chorus experience
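The article does not describe how singing scores are calculated. As a rough illustration only, one common approach compares the sung pitch with a reference pitch for each note, measured in cents (hundredths of a semitone). The tolerance values and function names below are assumptions, and a real scorer would first need pitch detection, which is out of scope here.

```python
import math


def cents_off(sung_hz, reference_hz):
    """Signed pitch deviation in cents (100 cents = one semitone)."""
    return 1200.0 * math.log2(sung_hz / reference_hz)


def note_score(sung_hz, reference_hz, tolerance=50.0, worst=200.0):
    """100 within `tolerance` cents, falling linearly to 0 at `worst` cents."""
    deviation = abs(cents_off(sung_hz, reference_hz))
    if deviation <= tolerance:
        return 100.0
    if deviation >= worst:
        return 0.0
    return 100.0 * (worst - deviation) / (worst - tolerance)


def song_score(sung, reference):
    """Average note score; unvoiced notes (pitch 0 or None) score 0."""
    if len(sung) != len(reference) or not reference:
        raise ValueError("sung and reference must be non-empty and equal length")
    total = 0.0
    for sung_hz, ref_hz in zip(sung, reference):
        total += note_score(sung_hz, ref_hz) if sung_hz else 0.0
    return total / len(reference)
```

Using cents instead of raw frequency differences keeps the tolerance musically uniform: a 50-cent error is a quarter tone whether the note is low or high.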

3. Third-party expansion capabilities:

In the ever-changing market competition, choosing a solution with strong third-party expansion capabilities can help applications better integrate with other applications and services, and improve application interoperability and user experience. For example, an online KTV application can be integrated with other music applications or social applications to provide more interactive and social functions.

Function | Description | Applicable scene
Sound effect player | Plays audio files in MP3, WAV, and other formats | Atmosphere: play short sound effects such as applause and laughter
Media player | Plays local files such as MP3 and MP4 as well as online (HTTP) media files | BGM scene: play background music and accompaniment
Audio mixing | Mixes audio from the media player, sound effect player, and other sources | Play music and other content in the room and output it after mixing
Reverberation | Simulates various sound effects, such as studio, record, ethereal, and rock | Offer different sound effects while singing to improve output quality
Voice changer | Changes voice characteristics, with presets such as Transformers and Uncle | Change the singer's voice for added fun
In-ear monitoring (ear return) | With earphones plugged in, singers hear their own voice with ultra-low latency | Essential for singing, delivering a KTV and stage-level experience
Sound level and audio spectrum | The sound level indicates the volume of speech, and the audio spectrum gives the frequency-domain components of the current audio | Display the current speaking state and frequency-domain component information
Media side information | Carries application-layer information along with the stream | In music scenes, transmits playback progress so that lyrics can be displayed
Local audio recording | Records singing audio data to local files | Scenarios that need secondary processing of audio clips, such as sharing and detection
High-precision lyrics synchronization | Accompaniment and lyrics are aligned in real time | Word-by-word lyrics that let users sing on the beat
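As a small illustration of the sound level capability in the table, the sketch below computes the RMS level of a PCM frame in dBFS and maps it to a 0 to 100 meter for a speaking indicator. It assumes 16-bit PCM input. The function names and the -60 dB floor are illustrative choices, not part of any SDK mentioned here.

```python
import math


def sound_level_dbfs(samples, full_scale=32768.0):
    """RMS level of 16-bit PCM samples in dBFS (0 dBFS = full scale)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20.0 * math.log10(math.sqrt(mean_square) / full_scale)


def to_meter(level_db, floor_db=-60.0):
    """Map a dBFS level onto a 0-100 volume meter, clamping at floor_db."""
    if level_db <= floor_db:
        return 0
    return min(100, round(100 * (level_db - floor_db) / -floor_db))
```

A UI would typically call `to_meter(sound_level_dbfs(frame))` once per audio frame for each user on mic and animate the result as a volume ring or bar.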

3. Core technical performance

Through research, I sorted out the technical indicators of the online karaoke scenario. To provide a high-quality music experience, four core technical indicators need attention: end-to-end latency, vocal and accompaniment, sound quality fidelity, and noise reduction and echo processing.

  1. End-to-end latency: Refers to the time interval from when a user starts singing to hearing their own voice. The lower the latency, the better the user's real-time experience when singing.
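  2. Vocal and accompaniment: Refers to merging the user's sung vocals with the accompaniment into a complete song. The accompaniment should be precisely synchronized with the user's vocals, and the sound quality should stay clear and faithful.
  3. Sound quality fidelity: Refers to keeping the recorded vocals and accompaniment highly faithful to the original, so that the user's singing sounds real and natural.
  4. Noise reduction and echo processing: Refers to removing noise, echo, and other unwanted sounds from the vocals to improve sound quality, so that the user's singing sounds clearer and more natural.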
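End-to-end latency can be estimated by sending a known probe signal and locating it in the received recording. The sketch below uses brute-force cross-correlation in pure Python. It is a teaching example under the assumption that both signals share one sample rate, and the probe and received data are synthetic. Real measurements would use much longer signals and an FFT-based correlation.

```python
def estimate_delay_ms(reference, received, sample_rate, max_delay_ms=500):
    """Estimate how far `received` lags `reference` via cross-correlation."""
    max_lag = min(len(received) - 1, int(sample_rate * max_delay_ms / 1000))
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate the reference with the received signal shifted by `lag`.
        score = sum(r * received[i + lag]
                    for i, r in enumerate(reference)
                    if i + lag < len(received))
        if score > best_score:
            best_lag, best_score = lag, score
    return 1000.0 * best_lag / sample_rate


# Synthetic example: a short probe delayed by 40 samples at 1 kHz.
probe = [0, 1, 0, -1, 2, 0, 0, 0]
received = [0] * 40 + probe + [0] * 20
delay_ms = estimate_delay_ms(probe, received, sample_rate=1000)
```

The same measurement, applied to a vocal track and the accompaniment it was sung against, gives the synchronization error between the two.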

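These four indicators are the core technical indicators of the online karaoke scenario, and live broadcast vendors need to build high-quality online karaoke solutions around them.

Taking ZEGO (即构), Agora (声网), and TRTC as examples: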

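Core technical indicator | ZEGO (即构) | Agora (声网) | Tencent audio and video (TRTC)
Latency | 60 ms: end-to-end latency from singing to listening as low as 60 ms | End-to-end ultra-low latency as low as 64 ms | Ultra-low-latency chorus experience below 300 ms
Vocal and accompaniment synchronization | 5 ms: vocals and accompaniment are almost perfectly synchronized, with error no greater than 5 ms, far above the industry standard | Precise multi-end synchronization of accompaniment, lyrics, and vocals | Precise synchronization of accompaniment, vocals, and lyrics
High-fidelity sound quality | Mixing within 20 ms: precise server-side stream mixing, with multi-end error no greater than 20 ms | Agora SOLO™ and NOVA™ voice engines, 48 kHz full-band sampling for high-fidelity sound, audio MOS score up to 4.7 | 48 kHz sampling rate, 128 kbps bit rate, and stereo audio, comparable to genuine CD quality
Noise reduction and echo processing | 3A + AI echo processing: intelligent echo processing keeps high sound quality without echo even on loudspeaker playback, with vocal quality close to lossless | Audio MOS score up to 4.7 | AI intelligent noise reduction
Low-latency in-ear monitoring | Ultra-low-latency in-ear monitoring, iOS 20 ms+ and Android 40 ms+. With it enabled, singers get better feedback while singing | 50 ms ultra-low-latency in-ear monitoring, so singers stay in tune | Low-latency in-ear monitoring supported, so singers stay in tune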

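Recommended audio and video vendor: ZEGO online KTV real-time chorus solution

Audio and video vendors: mature online karaoke solutions

In closing, I compared several mainstream audio and video vendors. Their official websites all claim to provide high-quality online karaoke solutions, and developers can compare them against their own needs and scenarios. If you have high requirements for the latency and sound quality of real-time chorus, consider the [ZEGO online karaoke real-time chorus solution].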

The real-time chorus solution provided by ZEGO supports multiple people singing online at the same time, with an end-to-end delay below 70 ms. It supports precise multi-party accompaniment synchronization and precise server-side stream mixing, improving the user's karaoke experience.

The following materials come from the ZEGO real-time chorus solution and are provided for reference.

The industry's first and exclusive "true" real-time chorus solution

Precise stream mixing on the server side

Top technical indicators

Epilogue

Finally, when selecting a solution, first analyze your own business scenario in depth, including but not limited to the business model, user experience, product functions, business processes, and technical framework, so that you can choose the solution that best fits it. For both individuals and teams, it also matters whether a solution can be implemented at low cost, without a steep learning curve, and by building on technology you have already mastered. Feel free to send a private message or leave a comment to discuss.


Origin blog.csdn.net/RTC_SDK_220704/article/details/131540209