Generating music based on deep learning

   I was recently watching Andrew Ng's deep learning video course. One of the homework assignments in the RNN section implements a music generator based on deepjazz. After running the experiment I found the results sounded fairly convincing, which piqued my interest, so I looked into what progress has been made recently in automatic music generation, especially applications of deep learning in this area. Here is a short summary, along with a few interesting applications.

-------------------------------- Part I: A brief introduction to deepjazz --------------------------------

1. What is deepjazz?

   The following content is taken from deepjazz's official website:

deepjazz is a deep-learning-based jazz music generator that uses Theano and Keras. I built deepjazz in 36 hours at a hackathon. It uses the Theano and Keras libraries to generate jazz music; specifically, it builds a two-layer LSTM that learns from MIDI files. It uses deep learning, the AI technology behind Google's famous AlphaGo and IBM's Watson, to produce music, something that is considered deeply human.

2. How to use deepjazz?

2.1 Train the model and generate a midi file

   Let's start from the deepjazz GitHub homepage. The instructions there are fairly simple: first git clone https://github.com/jisungk/deepjazz, then cd deepjazz, then run python generator.py, which should learn from the included data and generate a .mid file. In practice, the first run (under Python 3) reports an error: ''' from itertools import groupby, izip_longest ImportError: cannot import name 'izip_longest' '''. This is because the original script was written for Python 2; we need to replace izip_longest with zip_longest wherever it is imported.
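If you would rather not edit every import site by hand, one option (a minimal sketch of my own, not part of the original repo) is a compatibility import that works under both Python 2 and Python 3:

# izip_longest was renamed to zip_longest in Python 3.
try:
    from itertools import izip_longest as zip_longest  # Python 2
except ImportError:
    from itertools import zip_longest                  # Python 3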

   Continuing to run, another compatibility issue causes an error:

"melody_voice.insert(0, key.KeySignature(sharps=1,mode='major'))TypeError: init () got an unexpected keyword argument 'mode'
Here we just need to remove mode = 'major' , you also need to modify the _sample function in generator.py to:

def __sample(a, temperature=1.0):
    # Scale the log-probabilities by the temperature, then renormalize
    # into a valid distribution and sample an index from it.
    a = np.log(a) / temperature
    # a = np.exp(a) / np.sum(np.exp(a))
    # return np.argmax(np.random.multinomial(1, a, 1))
    dist = np.exp(a) / np.sum(np.exp(a))
    choices = range(len(a))
    return np.random.choice(choices, p=dist)
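As a quick aside (my own illustration, not code from deepjazz), you can see what the temperature parameter does by rescaling a small distribution by hand:

import numpy as np

# Temperature < 1 sharpens the distribution, > 1 flattens it.
probs = np.array([0.7, 0.2, 0.1])
for t in (0.5, 1.0, 2.0):
    scaled = np.exp(np.log(probs) / t)
    print(t, np.round(scaled / scaled.sum(), 3))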

After making the above modifications, python generator.py should run normally. When it finishes, there will be a new file deepjazz_on_metheny...128_epochs.midi in the midi folder; this is the output generated after training on original_metheny.mid. Note that both the training data and the generated output are .mid (or .midi) files, not audio files in the common .mp3 or .wav formats. So what is the difference between the two? Here is a brief explanation based on the Baidu Encyclopedia definition:

Unlike wave files, MIDI files do not sample the audio; instead they record each note of the music as a number, so the files are much smaller than wave files and can hold long pieces of music. The MIDI standard specifies how the various timbres are mixed and articulated, and an output device can re-synthesize these numbers back into music.
The main limitation of MIDI music is that it lacks the ability to reproduce real natural sounds, so it cannot be used where speech is required. Additionally, MIDI can only record a limited combination of instruments as specified by the standard, and playback quality is limited by the sound card's synthesis chip. In recent years, popular foreign sound cards generally use the wavetable method for music synthesis, which greatly improves the quality of MIDI music.
There are several related formats for MIDI files, such as RMI and CMF. A CMF file (Creative Music Format) is the music file used with the Sound Blaster card. An RMI file is a sub-format of Windows' RIFF (Resource Interchange File Format) called RMID, i.e. a RIFF container that holds MIDI data.
   To put it simply: common .mp3 and .wav files record the actual audio content, so they are generally rather large (anywhere from a few megabytes to tens of megabytes), while a MIDI file records no audio at all; it only stores numbers in a standard format that the computer can interpret and convert into the corresponding sounds for playback. As a result, .midi files are typically tiny, which is a big advantage; the drawback is that their reproduction of real sound is poor, because they can only approximate it with a limited set of predefined instruments.
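To make this concrete, here is a small sketch (my own, assuming music21 is installed and that midi/original_metheny.mid from the deepjazz repo is on hand) that prints the first few note events stored in a MIDI file:

from music21 import converter

# A MIDI file stores note events (start offset, duration, pitches),
# not audio samples.
score = converter.parse('midi/original_metheny.mid')
for n in list(score.flat.notes)[:10]:  # .flatten() on newer music21 versions
    print(n.offset, n.quarterLength, n.pitches)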

2.2 How to play midi files?

   Back to the point: after getting the generated MIDI file, we of course want to play it and have a listen. Under Windows there are many players that handle the MIDI format, but I am on Ubuntu 16.04, whose default player does not support it. After some searching I found that timidity needs to be installed: sudo apt-get install timidity. I also found a Python script, play_midi.py (based on pygame), that can play MIDI files. The code is as follows:

import pygame as pg


def play_music(music_file):
    '''
    Stream music with the mixer.music module in a blocking manner.
    This streams the sound from disk while playing.
    '''
    clock = pg.time.Clock()
    try:
        pg.mixer.music.load(music_file)
        print("Music file {} loaded!".format(music_file))
    except pg.error:
        print("File {} not found! {}".format(music_file, pg.get_error()))
        return
    pg.mixer.music.play()
    # block until playback has finished
    while pg.mixer.music.get_busy():
        clock.tick(30)


# pick a midi or MP3 music file you have in the working folder,
# or give the full pathname
music_file = input("Please input the midi file path:")
#music_file = "Drumtrack.mp3"
freq = 44100    # audio CD quality
bitsize = -16   # signed 16 bit samples
channels = 2    # 1 is mono, 2 is stereo
buffer = 2048   # number of samples (experiment to get right sound)
pg.mixer.init(freq, bitsize, channels, buffer)
# optional volume 0 to 1.0
pg.mixer.music.set_volume(0.8)
try:
    play_music(music_file)
except KeyboardInterrupt:
    # if the user hits Ctrl+C, fade out and exit
    # (works only in console mode)
    pg.mixer.music.fadeout(1000)
    pg.mixer.music.stop()
    raise SystemExit

Run python play_midi.py and enter the path of a MIDI file to play it. Try the file we just generated; it sounds pretty good!
Of course, once timidity is installed we can also play a MIDI file directly by running timidity xxx.midi. This may fail at first because some extra configuration is needed: run sudo apt-get install fluid-soundfont-gm fluid-soundfont-gs to install a SoundFont (the sound samples used to render MIDI), then open /etc/timidity/timidity.cfg and comment out the last line 'source freepats.cfg'. On Ubuntu, replace it with:
dir /usr/share/sounds/sf2/
soundfont FluidR3_GM.sf2
On CentOS, replace it with:
dir /usr/share/soundfonts/
soundfont FluidR3_GM.sf2
Then restart timidity: sudo /etc/init.d/timidity restart
After that, timidity xxx.midi should play normally!

2.3 How to convert midi files into general audio files (mp3, wav and other formats)

   Now we can play MIDI files normally, but there is still a problem: the audio formats we usually deal with are wav and mp3, because common players recognize and play them easily. Is there a way to convert a MIDI file into those formats? Of course there is; the easiest way is to use timidity (already installed above) and run the following command:
timidity --output-24bit --output-mono -A120 source.mid -Ow -o source.wav
This converts source.mid to source.wav. Here --output-24bit and --output-mono set the output sample format, -A sets the amplification (volume), -Ow means output in RIFF WAVE format, and -o gives the name of the output audio file. You can see the meaning of every parameter with timidity --help.
If everything goes well, we now have a .wav file that any music player can handle!
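If you have a whole folder of generated MIDI files, a small wrapper of my own (a sketch reusing the same timidity flags as above; the folder name is a placeholder) can batch-convert them:

import subprocess
from pathlib import Path

# Convert every .mid file in midi/ to a .wav file next to it via timidity.
for mid in Path('midi').glob('*.mid'):
    wav = mid.with_suffix('.wav')
    subprocess.run(['timidity', '--output-24bit', '--output-mono', '-A120',
                    str(mid), '-Ow', '-o', str(wav)], check=True)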
One more small point: wav files are generally larger (and higher quality), while mp3 files are more common on the Internet, so how do we convert between the two? Here are two options:
1. Use the ffmpeg audio/video toolkit. Run the command
ffmpeg -i source.wav -acodec libmp3lame source.mp3
to convert source.wav to source.mp3. Here -i specifies the input audio and -acodec sets the audio codec (it is an alias of "-codec:a"). For more information, run ffmpeg --help or man ffmpeg.
2. You can also use the Python audio library pydub for the conversion. I introduced several Python audio processing libraries in a previous blog post; if you are interested, have a look at them yourself.
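For reference, a minimal pydub sketch (assuming pydub and ffmpeg are installed, and that source.wav is the file produced above) looks like this:

from pydub import AudioSegment

# pydub delegates the actual mp3 encoding to ffmpeg under the hood.
sound = AudioSegment.from_wav('source.wav')
sound.export('source.mp3', format='mp3')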

How to convert wav, mp3 files to midi files?

   At first I thought this would be easy, but after checking around I found it is actually a hard problem. Music transcription (converting audio back into notes) is still an active research topic, and I have not found an API that solves it well; see this discussion on Stack Overflow for details. There are plugins that can do it, such as Sonic Annotator, but they require very specialized knowledge, so after some thought I gave up. In short, batch conversion from wav or mp3 to MIDI is still very difficult, especially if you need high quality; if you are interested, you can dig into it yourself. That said, if large-scale automatic conversion is not required, there are plenty of tools that can convert a single wav (mp3) file to MIDI; for example, this website can convert mp3 to MIDI online.

How to train on my own midi files?

   Earlier we trained on original_metheny.mid, the file provided by the author, and generated a new .mid file from it. Can we train on our own MIDI files instead? Here is a website where many MIDI files can be downloaded in bulk, or you can visit this website to download MIDI versions of your favorite pop songs. However, the downloaded files differ from the author's original_metheny.mid in format, tracks, and divisions, so simply swapping in our own .mid file does not train smoothly; it keeps raising errors. I skimmed the code, and the problem mainly lies in the part that uses music21 to process and convert the MIDI data. I tried for a long time, but since I am not very familiar with music21 or the MIDI format, the problem remains unsolved for now. If I have time later, I will go through the deepjazz source again to fix it.
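As a starting point for debugging (my own sketch; midi/my_download.mid is a placeholder for whatever file you downloaded), music21 can at least show how the track/part structure of the two files differs:

from music21 import converter, instrument

# Compare how many instrument parts/tracks each file parses into.
for path in ['midi/original_metheny.mid', 'midi/my_download.mid']:
    score = converter.parse(path)
    parts = instrument.partitionByInstrument(score)
    count = len(parts.parts) if parts else len(score.parts)
    print(path, 'parts:', count)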

-------------------------------- Part II: magenta --------------------------------

1.What is magenta?

   The following is taken from magenta's official GitHub introduction:

magenta is a research project exploring the use of machine learning to create art and music. It currently relies mainly on emerging deep learning and reinforcement learning techniques to generate songs, paintings, images, and so on. It also aims to explore building intelligent tools and interfaces that let artists use these models to extend (rather than replace) part of their work.
magenta was originally started by researchers at Google Brain, but many other researchers have also contributed greatly to the project. We use TensorFlow and publish our models and code on GitHub. If you want to learn more about magenta, check out our blog, where we cover a lot of technical details. You can also join the discussion group.

2.How to install magenta and use it?

   Installing magenta is very simple: pip install magenta. Note that TensorFlow must be installed beforehand.
magenta also supports GPU acceleration (you just need the GPU build of TensorFlow); install it with pip install magenta-gpu. magenta provides quite a few models, covering audio, images, and more; here we focus on the models for music generation.

2.1 drums_rnn model

   This is a model trained on drum-style music. It uses an LSTM to apply a language model to drum track generation. Unlike melodies, drum tracks are polyphonic: several drums may be struck at the same time. Nevertheless, we can still treat a drum track as a single sequence of events by:
a) mapping all the different MIDI drums onto a smaller set of drum classes, and
b) encoding each event as a single value that represents which combination of drum classes is struck at that step (see the toy sketch below).
The model comes with two configurations, one_drum and drum_kit; for details, see the description on the project page.
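To make the encoding in (a) and (b) concrete, here is a toy illustration of the idea (my own simplification, not magenta's actual code or its real pitch-to-class mapping):

# Map a few General MIDI drum pitches to a small set of classes, then
# encode each time step as one integer whose bits mark which classes
# are struck at that step.
DRUM_CLASS = {36: 0, 38: 1, 42: 2, 46: 3, 49: 4}  # kick, snare, closed hat, open hat, crash

def encode_step(pitches):
    value = 0
    for p in pitches:
        if p in DRUM_CLASS:
            value |= 1 << DRUM_CLASS[p]
    return value

print(encode_step([36]))      # kick only          -> 1
print(encode_step([36, 42]))  # kick + closed hat  -> 5
print(encode_step([]))        # silence            -> 0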
   Now let's see how to use drums_rnn. magenta already provides a pre-trained model, so we can try inference first: download the drum_kit bundle, put the downloaded drum_kit_rnn.mag file into a folder (say model/), and then write a script generate_drums_rnn.sh:

#!/bin/bash
drums_rnn_generate \
        --config='drum_kit' \
        --bundle_file=../data/drum_kit_rnn.mag \
        --output_dir=../output \
        --num_outputs=5 \
        --num_steps=256 \
        --primer_drums="[(36,)]"

Here --config selects the configuration; there are two options, 'drum_kit' and 'one_drum'.
--bundle_file specifies the path of the bundle file (the drum_kit_rnn.mag file we just downloaded).
--output_dir specifies the directory for the output MIDI files.
--num_outputs specifies how many MIDI files to output (default 10).
--num_steps specifies how many steps of drum track to generate.
--primer_drums specifies the drum hits to start from (required).
The script above starts with a single bass-drum hit. If you like, you can pass any other Python list in string form, as long as every element is a tuple of integers giving the MIDI pitches of the drums. For example, --primer_drums="[(36, 42), (), (42,)]" means a bass drum together with a hi-hat, then a silence, then another hi-hat. Instead of --primer_drums, you can also use the --primer_midi parameter to supply a drum MIDI file as the primer.
If you run the script above, you will get several MIDI files. Play them; some are quite good!
   We just used the pre-trained model to obtain generated MIDI files directly. So how do we train a model of our own? That is a bit more involved; we can follow the steps below:

step1:build your dataset

   First, we need to prepare our own MIDI dataset (see the reference URL). MIDI files can be downloaded in bulk from this URL, or picked manually from midiworld. Then we need to convert these MIDI files into NoteSequences, using the following script, convert_midi.sh:

#!/bin/bash
convert_dir_to_note_sequences \
  --input_dir=$INPUT_DIRECTORY \
  --output_file=$SEQUENCES_TFRECORD \
  --recursive

The parameters above:
--input_dir is the folder containing the input MIDI files (subfolders are allowed)
--output_file is the path of the output .tfrecord file
--recursive means the MIDI files are traversed recursively
Note that if you use the dataset mentioned earlier, it is quite large (about 1.6 GB) and contains a huge number of MIDI files, so the conversion can be very time-consuming; I let it run for about two hours and eventually terminated it early. Of course, if your machine is powerful enough, you can let it finish.
After the conversion, we get a lmd_matched_notesequences.tfrecord file. Next, on to step 2.

step2:create sequenceExamples

   Note that the model's input for training and evaluation is SequenceExamples. Each SequenceExample contains an input sequence and a label sequence representing a drum track. Run the following command to convert the NoteSequences obtained earlier into SequenceExamples. This produces two sets of SequenceExamples, one for training and one for evaluation, and --eval_ratio controls the split: for example, eval_ratio=0.10 (10%) puts 10% of the extracted drum tracks into the evaluation set and the remaining 90% into the training set.

drums_rnn_create_dataset \
--config=<one of 'one_drum' or 'drum_kit'> \
--input=/tmp/notesequences.tfrecord \
--output_dir=/tmp/drums_rnn/sequence_examples \
--eval_ratio=0.10

In the above parameters:
--config can only be 'one_drum' or 'drum_kit'
--input is the address of the tfrecord file obtained in step1
--output_dir is the folder address of the output SequenceExamples
--eval_ratio specifies the ratio of evaluation and training

step3:train and evaluate the model

   Run the following code (train.sh) to train.

#!/bin/bash
drums_rnn_train \
--config=drum_kit \
--run_dir=/tmp/drums_rnn/logdir/run1 \
--sequence_example_file=/tmp/drums_rnn/sequence_examples/training_drum_tracks.tfrecord \
--hparams="batch_size=64,rnn_layer_sizes=[64,64]}" \
--num_training_steps=20000

The meaning of each parameter is as follows:
--config: 'one_drum' or 'drum_kit'
--run_dir is the folder where the TensorFlow checkpoints of this training run are stored
--sequence_example_file is the path of the SequenceExamples tfrecord file used to train the model
--num_training_steps specifies the number of training steps; if not specified, training runs until you terminate it manually (Ctrl-C)
--hparams specifies other hyperparameters. Here we set batch_size=64 instead of the default 128; a smaller batch size reduces the risk of running out of memory (OOM), and if you have plenty of memory you can set it larger. We also use 2 RNN layers with 64 hidden units each instead of the default 3 layers of 256 units, which speeds up training (at some cost in quality); if your machine is powerful, you can try larger hidden sizes for better results. We could also set the --attn_length parameter to specify over how many steps the attention mechanism looks back; here we keep the default of 32.
   Run the following code (eval.sh) to perform evaluation.

#!/bin/bash
drums_rnn_train \
--config=drum_kit \
--run_dir=/tmp/drums_rnn/logdir/run1 \
--sequence_example_file=/tmp/drums_rnn/sequence_examples/eval_drum_tracks.tfrecord \
--hparams="batch_size=64,rnn_layer_sizes=[64,64]" \
--num_training_steps=20000 \
--eval

This is similar to train.sh; the only differences are that --sequence_example_file now points to the eval tfrecord file, and the extra --eval flag marks this as an evaluation run rather than training. Note that evaluation does not change any model parameters; it only measures the model's performance.
Of course, we can also run tensorboard --logdir=/tmp/drums_rnn/logdir to view the training and evaluation results in TensorBoard; just open http://localhost:6006 in the browser.

step4:generate drum tracks

   After completing step1~step3, we can generate our own midi file. The script to run is:

#!/bin/bash
drums_rnn_generate \
--config=drum_kit \
--run_dir=/tmp/drums_rnn/logdir/run1 \
--hparams="batch_size=64,rnn_layer_sizes=[64,64]" \
--output_dir=/tmp/drums_rnn/generated \
--num_outputs=10 \
--num_steps=128 \
--primer_drums="[(36,)]"

Most of the parameters have been explained above and will not be repeated here.

2.2 melody_rnn model

   The melody_rnn model is very similar to drums_rnn above, except that it generates melodies rather than drum tracks. I won't go into details here; for more, see melody_rnn.

2.3 Other models

   Besides drums_rnn and melody_rnn, magenta has many other interesting models, such as neural style transfer (which renders images in a specified style). If you are interested, you can explore magenta further.

-------------------------------- End of this article, thank you for reading! --------------------------------
