Datawhale dry goods
Latest: AI applications, editor: Xinzhiyuan
[Introduction] Recently, an "unpopular singer" has gone viral across the Internet thanks to an AI stand-in that covers Chinese pop songs.
Overnight, "AI Stefanie Sun" was everywhere online.
On Bilibili (Station B), AI Stefanie Sun has covered JJ Lin's "She Said", Jay Chou's "Love in BC", Zhao Lei's "Chengdu" and more, and many netizens are hooked.
The "unpopular singer" Stefanie Sun has suddenly become one of the hottest singers of 2023, setting off a frenzy among fans.
One netizen said, "After listening to AI Stefanie Sun all night, I can't stop..."
These covers were produced with open-source projects and uploaded by Bilibili uploaders (UP masters) such as Eternity丨L and Roster_x.
(The uploader seems to have deliberately added a second of silence to "Peninsula Iron Box" to bring it to 5 minutes and 20 seconds.)
UP Master: Eternity丨L
In addition to AI Stefanie Sun, there are also AI Jay Chou, AI Wang Xinling, AI Lin Zhixuan...
Perhaps few people imagined that in 2023 the Chinese pop scene would be revived in this form.
"AI Stefanie Sun" takes the mic online
Some time ago, a TikTok user used AI to create the song "Heart on My Sleeve", which quickly went viral and drew more than 10 million viewers.
Netizens who heard the song said it astonished them: it was crazy!
The song imitates the voices of two Canadian pop musicians, Drake and The Weeknd. An AI model is first trained on the singers' voices and then used to generate the vocals.
In China, Chinese pop songs sung by AI on Bilibili have gradually become a focus of attention, and stars such as Stefanie Sun, Wang Xinling, and Jay Chou have "come back".
The most popular of all is Stefanie Sun, who has become the new darling of AI covers and earned the title "Queen of Voice".
UP master: Roster_x
Someone even made a Cantonese version of "Love Comes Too Late" sung by AI Stefanie Sun.
AI music production, however, is nothing new in the music industry. The rise of generative AI has simply lowered the barrier to making AI covers once again.
For example, at the beginning of the year Google launched the text-to-music model MusicLM, which treats music generation as a hierarchical sequence-to-sequence modeling task and generates high-fidelity music at a 24 kHz sampling rate.
For many fans, AI covers fulfil some of their fantasies.
Some fans have also trained AI models on the voices of beloved late singers, including Ah Sang, Leslie Cheung, Yao Beina, and Teresa Teng.
This may be a kind of digital immortality, a way to bring long-lost voices back to people's hearts.
Midjourney's ability to produce realistic images made people exclaim that painters were about to lose their jobs. With AI covers, are singers going to be replaced too?
After the uploader @阿张Rayzhang had an AI model trained on his own voice sing "Killer Queen", he was shaken for a moment.
He hurriedly recorded a video and titled it "Will the AI singer make the cover section collectively unemployed? I was killed by the AI version of me!".
Some netizens said that, as painters, they were among the first victims of AI, and that they feel no profession can escape.
Others said that some parts of the covers do not sound like the original singer at all.
Keep in mind that an AI cover needs rich training data for the specific artist's timbre before the generated work sounds realistic.
With current technology, a singer's delivery, technique and style cannot be fully imitated, but the timbre can be reproduced almost completely.
A true master, however, cannot be replaced.
AI covers may be popular, but the flip side of AI-generated music is a pressing copyright problem.
After the "Heart on My Sleeve" created by AI became popular on TikTok, the full version was uploaded to Apple Music, Spotify, YouTube and other platforms.
In response, Canadian singer Drake voiced his dissatisfaction on Instagram: "This is the last straw (that broke the camel's back)." The song has since been taken down for copyright infringement.
According to the Financial Times, Universal Music Group, which owns the copyrights of superstars such as Taylor Swift and Bob Dylan, is urging Spotify and Apple to stop AI tools from scraping lyrics and melodies from its artists' copyrighted songs.
But some artists are generous with their voices. Grimes, Musk's ex-girlfriend, said online, "Anyone can use my voice to generate songs with AI," on the condition that she receives a 50% share of the royalties.
The author of the original "so-vits-svc" project behind this wave of AI covers is said to have deleted it because too many people were abusing it.
SoVitsSvc: Singing voice conversion
Project address: https://github.com/svc-develop-team/so-vits-svc
The singing voice conversion model uses the SoftVC content encoder to extract speech features from the source audio and feeds these vectors directly into VITS instead of converting them to an intermediate text-based representation. As a result, both pitch and intonation are preserved.
In addition, the project developers also solved the problem of sound interruption by using NSF HiFiGAN as a vocoder.
· Feature input is changed to ContentVec.
· The sampling rate is unified at 44100 Hz.
· Thanks to parameter changes and a simplified model structure, the GPU memory required for inference is significantly reduced.
· Added option 1: automatic pitch prediction in vc mode, so there is no need to enter the pitch key manually when converting speech, and pitch is converted automatically between male and female voices. However, this mode causes a pitch shift when converting songs.
· Added option 2: reduce timbre leakage with a k-means clustering scheme, making the timbre closer to the target timbre.
· Added option 3: an NSF-HiFiGAN enhancer, which improves the sound quality of some models trained on small datasets but has a negative effect on well-trained models, so it is turned off by default.
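To make the clustering option more concrete, here is a minimal sketch of the general idea: each content-feature frame is pulled toward the nearest centroid learned from the target speaker's features, with a ratio controlling how strongly. The function names, the `ratio` parameter and the toy 2-D data are hypothetical and chosen for illustration; this is not the project's own code.

```python
import math

def nearest_centroid(frame, centroids):
    """Return the centroid closest to `frame` by Euclidean distance."""
    return min(centroids, key=lambda c: math.dist(frame, c))

def blend_with_cluster(frames, centroids, ratio):
    """Pull each feature frame toward its nearest centroid.

    ratio=0 keeps the original features; ratio=1 replaces them
    entirely with centroids taken from the target speaker.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    blended = []
    for frame in frames:
        centre = nearest_centroid(frame, centroids)
        blended.append([ratio * c + (1.0 - ratio) * x
                        for x, c in zip(frame, centre)])
    return blended

# Toy 2-D "features"; the centroids are assumed to come from
# k-means run on the target speaker's features.
centroids = [[0.0, 0.0], [10.0, 10.0]]
frames = [[1.0, 1.0], [8.0, 9.0]]
half = blend_with_cluster(frames, centroids, ratio=0.5)
```

A higher ratio should bring the output closer to the target timbre, typically at some cost in articulation, so the value is something to tune by ear.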
Pretrained model file
Put checkpoint_best_legacy_500.pt in the hubert directory.
Put G_0.pth and D_0.pth in the logs/44k directory.
Preprocessing
0. Audio Slicing
Use the audio-slicer-GUI or audio-slicer-CLI tool to slice the original audio into clips of 5-15 seconds.
Slightly longer clips are fine, but clips that are too long (such as 30 seconds) may cause "torch.cuda.OutOfMemoryError" during training or even preprocessing, that is, running out of GPU memory.
After slicing, remove the clips that are too long or too short.
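The clean-up after slicing can be scripted. The sketch below uses only the Python standard library and assumes the clips are uncompressed PCM WAV files in one folder; the function names and the 5 and 15 second bounds are illustrative defaults taken from the guideline above, not part of the project.

```python
import wave
from pathlib import Path

def clip_seconds(path):
    """Duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def split_by_duration(folder, min_s=5.0, max_s=15.0):
    """Return (keep, reject) lists of WAV paths found directly in `folder`."""
    keep, reject = [], []
    for path in sorted(Path(folder).glob("*.wav")):
        if min_s <= clip_seconds(path) <= max_s:
            keep.append(path)
        else:
            reject.append(path)
    return keep, reject
```

Returning the rejected paths instead of deleting them leaves room to listen to borderline clips before discarding anything.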
1. Resample to 44100 Hz mono
python resample.py
2. Automatically divide the data set into training set and validation set, and generate configuration files
python preprocess_flist_config.py
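For intuition, a training/validation split of this kind can be as simple as a reproducible shuffle with a few clips held out. The sketch below only illustrates that idea; it is not the logic of preprocess_flist_config.py, and the function name, the `val_count` and `seed` parameters and the example paths are all hypothetical.

```python
import random

def split_filelist(paths, val_count=2, seed=1234):
    """Shuffle `paths` reproducibly and hold out `val_count` items for validation."""
    if val_count >= len(paths):
        raise ValueError("need more clips than validation items")
    shuffled = sorted(paths)
    random.Random(seed).shuffle(shuffled)
    return shuffled[val_count:], shuffled[:val_count]

# Hypothetical clip paths for a single speaker.
clips = [f"dataset/44k/speaker0/{i:03d}.wav" for i in range(10)]
train, val = split_filelist(clips)
```

Sorting before shuffling with a fixed seed means the same clips land in the validation set on every run, which keeps validation results comparable across training sessions.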
3. Generate hubert and f0
python preprocess_hubert_f0.py
After completing the above steps, the dataset directory will contain preprocessed data and the dataset_raw folder can be deleted.
Now you can modify some parameters in the generated config.json:
keep_ckpts: Keep only the last keep_ckpts checkpoints during training. Setting it to 0 keeps all checkpoints; the default is 3.
all_in_mem: Load the entire dataset into RAM. Enable it when disk I/O is too slow on your platform and system memory is much larger than your dataset.
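The keep_ckpts behaviour can be pictured with a small sketch. It assumes checkpoint files are named like "G_30400.pth", with the training step after the underscore; the helper name is hypothetical, and the project's own clean-up code may differ in detail.

```python
import re

def checkpoints_to_delete(filenames, keep_ckpts=3):
    """Return the checkpoints that fall outside the newest `keep_ckpts`.

    Files are ordered by the step number in names like "G_30400.pth".
    keep_ckpts=0 means nothing is deleted.
    """
    if keep_ckpts == 0:
        return []

    def step(name):
        match = re.search(r"_(\d+)\.pth$", name)
        return int(match.group(1)) if match else -1

    # Ignore files without a step number, then sort oldest first.
    numbered = sorted((n for n in filenames if step(n) >= 0), key=step)
    return numbered[:-keep_ckpts]

saved = ["G_800.pth", "G_1600.pth", "G_2400.pth", "G_3200.pth", "G_4000.pth"]
stale = checkpoints_to_delete(saved, keep_ckpts=3)
```

Sorting by the numeric step rather than by file name matters here, because "G_1600.pth" sorts before "G_800.pth" alphabetically.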
Training
python train.py -c configs/config.json -m 44k
Inference
Inference is run with "inference_main.py".
For example:
python inference_main.py -m "logs/44k/G_30400.pth" -c "configs/config.json" -s "nen" -n "君の知らない物語-src.wav" -t 0
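In the command above, -m points to the trained generator checkpoint, -c to the configuration file, -s names the target speaker and -n the source audio file. Assuming, as the project README describes, that -t is a pitch transposition in semitones, its effect on frequency can be worked out with the standard equal-temperament formula. The helper below is a hypothetical illustration and not part of the project.

```python
def transpose_hz(f0_hz, semitones):
    """Shift a frequency by a number of equal-tempered semitones."""
    return f0_hz * 2.0 ** (semitones / 12.0)

# Twelve semitones is one octave, which doubles the frequency.
octave_up = transpose_hz(440.0, 12)
unchanged = transpose_hz(440.0, 0)
```

A value of 0, as in the example command, leaves the source pitch untouched; when converting between a male and a female voice, a shift of several semitones up or down is a common starting point to try.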
Although the original project team has stopped maintaining it, many netizens have forked the project and made updates of their own.
For example, the following graphical interface:
Project address: https://github.com/voicepaw/so-vits-svc-fork
AI "Resurrection"
Beyond AI covers, netizens have built similar projects before. For example, "AI-Talk" lets Musk and Jobs hold a conversation across time and space.
In the video, AI simulates not only their voices but also, to some extent, their ways of thinking, so the conversation flows very smoothly.
AI makes it possible for us to talk with the dead. Previously, a Bilibili uploader also used AI to "resurrect" his late grandmother.
To recreate her voice, existing recordings were uploaded directly. The material came mainly from past phone recordings, videos and WeChat voice messages.
The recordings were cleaned up in the audio editing software AU (Adobe Audition), mainly through noise reduction and vocal enhancement.
The clearer audio samples were then cut into short sentences of a few seconds each for easy annotation. Finally, the processed audio was packaged and fed into the speech synthesis system.
With the speech synthesis system in place, you can type in text and have it read out in that voice.
Netizens marvel at the power of technology
AI Stefanie Sun's song has reached the hearts of many netizens.
Recently, I have been obsessed with AI "cover songs", from AI Kanye singing fine wine, down to Su Xiaoding singing the truth is true. But seriously, Stefanie Sun's covers are indeed the best among the AI singers.
I have been hooked on AI Stefanie Sun on Bilibili these days, and I just listened to "A Game, A Dream".
After listening to the songs sung by AI, many netizens felt the horror of AI singers:
The power of technology is truly mind-boggling.
Deeply feel what is called the power of technology...
This is AI life, digital soaring!
Other netizens expressed nostalgia for singers who have passed away.
References:
https://github.com/svc-develop-team/so-vits-svc
https://www.bilibili.com/video/BV1io4y1w73k/?vd_source=eecf800392d116d832e90ad1c9ae70f6