Machine Learning Notes - Music Generation with TensorFlow

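I. Overview

        Here we explore building a recurrent neural network (RNN) for music generation. We will train a model to learn the patterns in raw sheet music written in ABC notation, and then use that model to generate new music.

1. About ABC notation

        For an overview of ABC notation, see the link below.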

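Miscellany: What is ABC notation (bashendixie5's blog, CSDN). ABC notation is a shorthand form of musical notation for computers. In its basic form it uses the letters a-g, A-G, and z to represent notes and rests, with other elements adding further value to these notes: sharp, flat, raised or lowered octave, note length, key, and ornamentation. This form of notation began as a combination of Helmholtz pitch notation and the use of ASCII characters to imitate standard musical notation (bar lines, tempo marks, and so on). It makes sharing music online easy and gives software developers a new, simple language, much like other notations designed for convenience, such as tablature and solfège.
https://blog.csdn.net/bashendixie5/article/details/124122387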
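
        To make the description above concrete, here is a minimal sketch of what an ABC tune looks like and how its header can be separated from its body. The tune is hand-written for illustration (it is not taken from the dataset), and `split_abc_tune` is a hypothetical helper, not part of any library. It relies on the ABC convention that header fields have the form `<letter>:<value>` and that the `K:` (key) field closes the header.

```python
# A tiny hand-written ABC tune (illustrative only, not from the dataset).
abc_tune = """X:1
T:Example Tune
M:4/4
L:1/8
K:D
DEFG A2 z2 | d2 c2 B2 A2 |]"""


def split_abc_tune(tune):
    """Hypothetical helper: return (header fields, body lines) of one ABC tune."""
    headers = {}
    lines = tune.splitlines()
    for i, line in enumerate(lines):
        # Header fields look like "T:Example Tune": one letter, then a colon.
        if len(line) >= 2 and line[0].isalpha() and line[1] == ":":
            headers[line[0]] = line[2:].strip()
            if line[0] == "K":  # the key field ends the header
                return headers, lines[i + 1:]
    return headers, []


headers, body = split_abc_tune(abc_tune)
print(headers)  # reference number, title, meter, unit note length, key
print(body)     # the notes: letters are pitches, z is a rest, | is a bar line
```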

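2. About the dataset

        We use the dataset from the MIT lab, which collects thousands of Irish folk songs written in ABC notation. It ships with the `mitdeeplearning` package, which can be installed with the command below.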

pip install mitdeeplearning

        The dataset can be loaded with the code below.

import mitdeeplearning as mdl

songs = mdl.lab1.load_training_data()

        Take a look at the data:

# Print one of the songs
example_song = songs[1]
print("\nExample song: ")
print(example_song)

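        If abcmidi and timidity are installed on a Linux system, the song can be played directly. On Windows, consider using the website below for playback.

abcjs Quick Editor (simple visual editing of ABC strings): https://editor.drawthedots.com/
        In the editor, the left side holds the ABC notation (the character representation of the music), the right side shows the corresponding score, and a play button at the lower left plays the tune.

II. Data processing

        Let's consider our prediction task. We are trying to train an RNN model to learn the patterns in ABC music, and then use this model to generate (that is, predict) a new piece of music based on what it has learned.

        To get there, what we are really asking the model is: given a character, or a sequence of characters, what is the most likely next character? We will train the model to perform this task.

        To do so, we feed the model a sequence of characters and train it to predict the output, namely the next character at each time step. An RNN maintains an internal state that depends on previously seen elements, so information about all characters seen up to a given moment is taken into account when generating a prediction.

        First, find all unique characters in the joined string: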

# Join the list of song strings into a single string containing all songs
songs_joined = "\n\n".join(songs)

# Find all unique characters in the joined string
vocab = sorted(set(songs_joined))
print("There are", len(vocab), "unique characters in the dataset")

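1. Vectorizing the text

        Before we start training the RNN model, we need to create a numerical representation of our text-based dataset. To do this, we build two lookup tables: one that maps characters to numbers, and one that maps numbers back to characters. The previous step already identified the unique characters present in the text.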

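### Define a numerical representation of the text ###

# Create a mapping from characters to unique indices.
# For example, to get the index of the character "d", 
#   we can evaluate `char2idx["d"]`.  
char2idx = {u:i for i, u in enumerate(vocab)}

# Create a mapping from indices to characters. This is the inverse of char2idx
#   and lets us convert from a unique index back to the character in our vocabulary.
idx2char = np.array(vocab)

        This gives us an integer representation for each character. Note that the unique characters in the text (our vocabulary) are mapped to indices from 0 to `len(vocab) - 1`. Let's take a look at this numerical representation of our dataset: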

print('{')
for char,_ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print('  ...\n}')

        The output is:

{
  '\n':   0,
  ' ' :   1,
  '!' :   2,
  '"' :   3,
  '#' :   4,
  "'" :   5,
  '(' :   6,
  ')' :   7,
  ',' :   8,
  '-' :   9,
  '.' :  10,
  '/' :  11,
  '0' :  12,
  '1' :  13,
  '2' :  14,
  '3' :  15,
  '4' :  16,
  '5' :  17,
  '6' :  18,
  '7' :  19,
  ...
}
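
        As a quick sanity check on the two lookup tables, the sketch below builds them for a ten-character toy string and confirms that encoding followed by decoding returns the original text. It is self-contained and uses a plain list for `idx2char` instead of a NumPy array, which is an assumption made only to keep the example dependency-free.

```python
# Toy corpus standing in for the joined songs string.
text = "X:1\nT:Alex"

# Same construction as above: sorted unique characters, then two lookup tables.
vocab = sorted(set(text))
char2idx = {ch: i for i, ch in enumerate(vocab)}
idx2char = list(vocab)  # plain list here; the article uses np.array(vocab)

# Encode to indices and decode back to characters.
encoded = [char2idx[ch] for ch in text]
decoded = "".join(idx2char[i] for i in encoded)

print(encoded)
print(repr(decoded))
print("indices run from", min(encoded), "to", max(encoded),
      "for a vocabulary of size", len(vocab))
```

        With the real dataset the indices differ, because the vocabulary is larger, but the round trip works the same way.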

        Vectorize the songs string:

def vectorize_string(string):
  vectorized_output = np.array([char2idx[char] for char in string])
  return vectorized_output

vectorized_songs = vectorize_string(songs_joined)

        See how part of the text is mapped to its integer representation:

print ('{} ---- characters mapped to int ----> {}'.format(repr(songs_joined[:10]), vectorized_songs[:10]))
# check that vectorized_songs is a numpy array
assert isinstance(vectorized_songs, np.ndarray), "returned result should be a numpy array"

        The output is:

'X:1\nT:Alex' ---- characters mapped to int ----> [49 22 13  0 45 22 26 67 60 79]

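2. Creating training examples and labels

        Our next step is to actually divide the text into example sequences that we will use during training. Each input sequence we feed into the RNN contains `seq_length` characters from the text. We also need to define a target sequence for each input sequence, which is used to train the RNN to predict the next character. For each input, the corresponding target contains the same length of text, shifted one character to the right.

        To do this, we break the text into chunks of `seq_length+1`. Suppose `seq_length` is 4 and our text is "Hello". Then our input sequence is "Hell" and the target sequence is "ello".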
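
        The "Hello" example can be written out directly. This is a minimal sketch on a plain string; `split_input_target` is a hypothetical helper name used only for illustration.

```python
def split_input_target(chunk):
    """Split a chunk of seq_length + 1 characters into (input, target)."""
    return chunk[:-1], chunk[1:]


text = "Hello"
seq_length = 4

chunk = text[:seq_length + 1]
input_seq, target_seq = split_input_target(chunk)

# At every time step the target is the character that follows the input.
for step, (inp, tgt) in enumerate(zip(input_seq, target_seq)):
    print("step", step, "input:", repr(inp), "target:", repr(tgt))
```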

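        The batching function then lets us convert this stream of character indices into sequences of the desired size.

### Batch definition to create training examples ###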

def get_batch(vectorized_songs, seq_length, batch_size):
  # the length of the vectorized songs string
  n = vectorized_songs.shape[0] - 1
  # randomly choose the starting indices for the examples in the training batch
  idx = np.random.choice(n-seq_length, batch_size)

  '''TODO: construct a list of input sequences for the training batch'''
  input_batch = [vectorized_songs[i : i+seq_length] for i in idx]
  # input_batch = # TODO
  '''TODO: construct a list of output sequences for the training batch'''
  output_batch = [vectorized_songs[i+1 : i+seq_length+1] for i in idx]
  # output_batch = # TODO

  # x_batch, y_batch provide the true inputs and targets for network training
  x_batch = np.reshape(input_batch, [batch_size, seq_length])
  y_batch = np.reshape(output_batch, [batch_size, seq_length])
  return x_batch, y_batch


# Run some simple tests to make sure the batch function works properly!
test_args = (vectorized_songs, 10, 2)
if not mdl.lab1.test_batch_func_types(get_batch, test_args) or \
   not mdl.lab1.test_batch_func_shapes(get_batch, test_args) or \
   not mdl.lab1.test_batch_func_next_step(get_batch, test_args): 
   print("======\n[FAIL] could not pass tests")
else: 
   print("======\n[PASS] passed all tests!")

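        For each of these vectors, each index is processed at a single time step. So, for the input at time step 0, the model receives the index of the first character in the sequence and tries to predict the index of the next character. At the next time step it does the same thing, but in addition to the current input the RNN also considers the information from the previous step, that is, its updated state.

        We can make this concrete by looking at how it works on a short sequence drawn from our text: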

x_batch, y_batch = get_batch(vectorized_songs, seq_length=5, batch_size=1)

for i, (input_idx, target_idx) in enumerate(zip(np.squeeze(x_batch), np.squeeze(y_batch))):
    print("Step {:3d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))

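III. Training the recurrent neural network (RNN) model

        Now we are ready to define and train an RNN model on our ABC music dataset, and then use the trained model to generate a new song. We train the RNN on batches of song snippets from the dataset prepared in the previous section.

        The model is based on the LSTM architecture, where a state vector maintains information about the temporal relationships between consecutive characters. The final output of the LSTM is then fed into a fully connected Dense layer, which outputs one logit per character in the vocabulary. A softmax over these logits defines the distribution we sample from to predict the next character.

        We define the model with the Keras API, using `Sequential`.

tf.keras.layers.Embedding: the input layer, a trainable lookup table that maps the number of each character to a vector with `embedding_dim` dimensions.
tf.keras.layers.LSTM: our LSTM network, with size units=rnn_units.
tf.keras.layers.Dense: the output layer, with vocab_size outputs.

1. Defining the RNN model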

def LSTM(rnn_units): 
  return tf.keras.layers.LSTM(
    rnn_units, 
    return_sequences=True, 
    recurrent_initializer='glorot_uniform',
    recurrent_activation='sigmoid',
    stateful=True,
  )

        Define the model-building function and instantiate the model:

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    # Layer 1: Embedding layer to transform indices into dense vectors 
    #   of a fixed embedding size
    tf.keras.layers.Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size, None]),

    # Layer 2: LSTM with `rnn_units` number of units. 
    # TODO: Call the LSTM function defined above to add this layer.
    LSTM(rnn_units), 
    # LSTM('''TODO'''),

    # Layer 3: Dense (fully-connected) layer that transforms the LSTM output
    #   into the vocabulary size. 
    # TODO: Add the Dense layer.
    tf.keras.layers.Dense(vocab_size)
    # '''TODO: DENSE LAYER HERE'''
  ])

  return model


model = build_model(len(vocab), embedding_dim=256, rnn_units=1024, batch_size=32)

        Print the model summary:

model.summary()

        The structure is:

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding (Embedding)       (32, None, 256)           21248     
                                                                 
 lstm (LSTM)                 (32, None, 1024)          5246976   
                                                                 
 dense (Dense)               (32, None, 83)            85075     
                                                                 
=================================================================
Total params: 5,353,299
Trainable params: 5,353,299
Non-trainable params: 0
_________________________________________________________________
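
        The parameter counts in the summary can be reproduced by hand. The sketch below uses the standard formulas for an embedding table, an LSTM layer (four gates, each with input weights, recurrent weights, and a bias), and a dense layer. The vocabulary size of 83 is read off the Dense output shape in the summary above.

```python
vocab_size = 83      # from the Dense output shape in the summary
embedding_dim = 256
rnn_units = 1024

# Embedding: one embedding_dim-sized vector per vocabulary entry.
embedding_params = vocab_size * embedding_dim

# LSTM: 4 gates, each with input weights, recurrent weights, and a bias vector.
lstm_params = 4 * (rnn_units * (embedding_dim + rnn_units) + rnn_units)

# Dense: a weight matrix from rnn_units to vocab_size, plus one bias per output.
dense_params = rnn_units * vocab_size + vocab_size

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
```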

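2. Checking the model

        We can quickly check the dimensions of the output using a sequence length of 100. Note that the model can run on inputs of any length.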

x, y = get_batch(vectorized_songs, seq_length=100, batch_size=32)
pred = model(x)
print("Input shape:      ", x.shape, " # (batch_size, sequence_length)")
print("Prediction shape: ", pred.shape, "# (batch_size, sequence_length, vocab_size)")

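        To get actual predictions from the untrained model, we sample from the output distribution, which is defined by a softmax over the character vocabulary. This gives us actual character indices. In other words, we draw the example predictions from a categorical distribution, which yields a prediction of the next character (specifically, its index) at each time step.

        Note that we sample from this probability distribution rather than simply taking the argmax, because always taking the argmax can cause the model to get stuck in a loop.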
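
        The difference between the two strategies is easy to see on a toy example. The sketch below is pure Python and does not use the model: the three logits are made-up values, and `softmax` is written by hand. Taking the argmax returns the same index every time, while sampling from the categorical distribution can also return the less likely indices.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for a 3-character vocabulary
probs = softmax(logits)

# Greedy decoding: the same index on every draw.
greedy = [max(range(len(probs)), key=probs.__getitem__) for _ in range(10)]

# Categorical sampling: indices drawn in proportion to their probability.
rng = random.Random(0)
sampled = rng.choices(range(len(probs)), weights=probs, k=10)

print("probabilities:", [round(p, 3) for p in probs])
print("argmax draws: ", greedy)
print("sampled draws:", sampled)
```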

        Let's try this sampling for the first example in the batch.

sampled_indices = tf.random.categorical(pred[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

print("Input: \n", repr("".join(idx2char[x[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices])))

        The output is shown below. Since the model is untrained, it is not valid ABC notation.

 'AROi4AfBNGK9KgTF!|L<Omq\'^A4,GIBfTWl.T#[Y|QqHpgBJEe0NQZCaHkSjdxu1cM\'g^zS"#.T)Hobp:ChgJKR\'=[Zn!OpXC2GV'

3. Training the model

(1) Defining the loss function

### Defining the loss function ###

'''TODO: define the loss function to compute and return the loss between
    the true labels and predictions (logits). Set the argument from_logits=True.'''
def compute_loss(labels, logits):
  loss = tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
  # loss = tf.keras.losses.sparse_categorical_crossentropy('''TODO''', '''TODO''', from_logits=True) # TODO
  return loss

'''TODO: compute the loss using the true next characters from the example batch 
    and the predictions from the untrained model several cells above'''
example_batch_loss = compute_loss(y, pred)
# example_batch_loss = compute_loss('''TODO''', '''TODO''') # TODO

print("Prediction shape: ", pred.shape, " # (batch_size, sequence_length, vocab_size)") 
print("scalar_loss:      ", example_batch_loss.numpy().mean())
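
        A useful reference point for the printed loss: if a model assigned equal probability to every character, the cross-entropy would be the natural log of the vocabulary size. The sketch below reimplements sparse categorical cross-entropy from logits for a single prediction in pure Python (a hand-written stand-in, not the Keras function) and evaluates it on uniform logits for a vocabulary of 83 characters. An untrained model should land in roughly this range, though the exact value depends on the random initialization.

```python
import math


def sparse_categorical_crossentropy(label, logits):
    """Cross-entropy between an integer label and one vector of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    # -log(softmax(logits)[label]) == logsumexp(logits) - logits[label]
    return log_sum_exp - logits[label]


vocab_size = 83
uniform_logits = [0.0] * vocab_size  # every character equally likely

loss = sparse_categorical_crossentropy(label=5, logits=uniform_logits)
print("loss for a uniform prediction:", round(loss, 4))
print("ln(vocab_size):               ", round(math.log(vocab_size), 4))
```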

(2) Hyperparameter settings and optimization

### Hyperparameter setting and optimization ###

# Optimization parameters:
num_training_iterations = 2000  # Increase this to train longer
batch_size = 4  # Experiment between 1 and 64
seq_length = 100  # Experiment between 50 and 500
learning_rate = 5e-3  # Experiment between 1e-5 and 1e-1

# Model parameters: 
vocab_size = len(vocab)
embedding_dim = 256 
rnn_units = 1024  # Experiment between 1 and 2048

# Checkpoint location: 
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "my_ckpt")

(3) Defining the optimizer and starting training

model = build_model(vocab_size, embedding_dim, rnn_units, batch_size)
# model = build_model('''TODO: arguments''')

'''TODO: instantiate an optimizer with its learning rate.
  Checkout the tensorflow website for a list of supported optimizers.
  https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/
  Try using the Adam optimizer to start.'''
optimizer = tf.keras.optimizers.Adam(learning_rate)
# optimizer = # TODO

@tf.function
def train_step(x, y): 
  # Use tf.GradientTape()
  with tf.GradientTape() as tape:
  
    '''TODO: feed the current input into the model and generate predictions'''
    y_hat = model(x) # TODO
    # y_hat = model('''TODO''')
  
    '''TODO: compute the loss'''
    loss = compute_loss(y, y_hat) # TODO
    # loss = compute_loss('''TODO''', '''TODO''')

  # Now, compute the gradients 
  '''TODO: complete the function call for gradient computation. 
      Remember that we want the gradient of the loss with respect all 
      of the model parameters. 
      HINT: use `model.trainable_variables` to get a list of all model
      parameters.'''
  grads = tape.gradient(loss, model.trainable_variables) # TODO
  # grads = tape.gradient('''TODO''', '''TODO''')
  
  # Apply the gradients to the optimizer so it can update the model accordingly
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  return loss

##################
# Begin training!#
##################

history = []
plotter = mdl.util.PeriodicPlotter(sec=2, xlabel='Iterations', ylabel='Loss')
if hasattr(tqdm, '_instances'): tqdm._instances.clear() # clear if it exists

for iter in tqdm(range(num_training_iterations)):

  # Grab a batch and propagate it through the network
  x_batch, y_batch = get_batch(vectorized_songs, seq_length, batch_size)
  loss = train_step(x_batch, y_batch)

  # Update the progress bar
  history.append(loss.numpy().mean())
  plotter.plot(history)

  # Update the model with the changed weights!
  if iter % 100 == 0:     
    model.save_weights(checkpoint_prefix)
    
# Save the trained model and the weights
model.save_weights(checkpoint_prefix)

IV. Generating music with the RNN model

# Create the model with the build_model function
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1) 

# Load the model weights from the last checkpoint
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))

# Print the model
model.summary()


### Generate a song by prediction ###

def generate_text(model, start_string, generation_length=1000):
  # Evaluation step (generating ABC text using the learned RNN model)

  '''TODO: convert the start string to numbers (vectorize)'''
  input_eval = [char2idx[s] for s in start_string] # TODO
  # input_eval = ['''TODO''']
  input_eval = tf.expand_dims(input_eval, 0)

  text_generated = []

  # Here batch size == 1
  model.reset_states()
  if hasattr(tqdm, '_instances'): tqdm._instances.clear() # clear if it exists

  for i in tqdm(range(generation_length)):
      '''TODO: evaluate the inputs and generate the next character predictions'''
      predictions = model(input_eval)
      # predictions = model('''TODO''')
      
      # Remove the batch dimension
      predictions = tf.squeeze(predictions, 0)
      
      '''TODO: use a multinomial distribution to sample'''
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()
      # predicted_id = tf.random.categorical('''TODO''', num_samples=1)[-1,0].numpy()
      
      # Pass the prediction along with the previous hidden state
      #   as the next inputs to the model
      input_eval = tf.expand_dims([predicted_id], 0)
      
      '''TODO: add the predicted character to the generated text!'''
      # Hint: consider what format the prediction is in vs. the output
      text_generated.append(idx2char[predicted_id]) # TODO 
      # text_generated.append('''TODO''')
    
  return (start_string + ''.join(text_generated))


# TODO: use the model and the function defined above to generate ABC format text of length 1000!
generated_text = generate_text(model, start_string="X", generation_length=1000) # TODO
# generated_text = generate_text('''TODO''', start_string="X", generation_length=1000)


### Play back generated songs ###

generated_songs = mdl.lab1.extract_song_snippet(generated_text)

# If abcmidi and timidity are installed, the songs can be played directly; otherwise copy the contents of generated_text into the website for playback
for i, song in enumerate(generated_songs): 
  # Synthesize the waveform from a song
  waveform = mdl.lab1.play_song(song)

  # If its a valid song (correct syntax), lets play it! 
  if waveform:
    print("Generated song", i)
    ipythondisplay.display(waveform)
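
        If the playback tools are not installed, it can still be handy to cut the generated text into individual tunes before pasting them into the online editor. The sketch below is one simple way to do that and is an assumption for illustration: `split_abc_songs` is a hypothetical helper and is not how `mdl.lab1.extract_song_snippet` is implemented. It splits the text on blank lines and keeps only the blocks that begin with an `X:` reference field.

```python
import re


def split_abc_songs(text):
    """Hypothetical helper: return the blank-line-separated blocks that start with 'X:'."""
    blocks = re.split(r"\n\s*\n", text)
    return [block.strip() for block in blocks if block.strip().startswith("X:")]


# Made-up stand-in for generated_text: two tunes with some noise in between.
generated = (
    "X:1\nT:One\nK:D\nDEFG|\n"
    "\n"
    "noise without header\n"
    "\n"
    "X:2\nT:Two\nK:G\nGABc|"
)

songs = split_abc_songs(generated)
for i, song in enumerate(songs):
    print("Song", i)
    print(song)
    print()
```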

V. Complete code

import tensorflow as tf
import mitdeeplearning as mdl
import numpy as np
import os
import time
import functools
from IPython import display as ipythondisplay
from tqdm import tqdm

# Download and import the MIT 6.S191 package
# !pip install mitdeeplearning
# !apt-get install abcmidi timidity > /dev/null 2>&1

# Check that we are using a GPU, if not switch runtimes
#   using Runtime > Change Runtime Type > GPU
assert len(tf.config.list_physical_devices('GPU')) > 0


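# Download the dataset
songs = mdl.lab1.load_training_data()

# Print one of the songs to inspect it in more detail!
example_song = songs[0]
print("\nExample song: ")
print(example_song)

# Convert the ABC notation to an audio file and listen to it
mdl.lab1.play_song(example_song)

# Join the list of song strings into a single string containing all songs
songs_joined = "\n\n".join(songs)

# Find all unique characters in the joined string
vocab = sorted(set(songs_joined))
print("There are", len(vocab), "unique characters in the dataset")

### Define a numerical representation of the text ###

# Create a mapping from characters to unique indices.
# For example, to get the index of the character "d",
#   we can evaluate `char2idx["d"]`.
char2idx = {u:i for i, u in enumerate(vocab)}

# Create a mapping from indices to characters. This is the inverse of char2idx
#   and lets us convert from a unique index back to the character in our vocabulary.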
idx2char = np.array(vocab)

print('{')
for char,_ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print('  ...\n}')

### Vectorize the songs string ###

'''TODO: Write a function to convert the all songs string to a vectorized
    (i.e., numeric) representation. Use the appropriate mapping
    above to convert from vocab characters to the corresponding indices.

  NOTE: the output of the `vectorize_string` function 
  should be a np.array with `N` elements, where `N` is
  the number of characters in the input string
'''
def vectorize_string(string):
  vectorized_output = np.array([char2idx[char] for char in string])
  return vectorized_output

# def vectorize_string(string):
  # TODO

vectorized_songs = vectorize_string(songs_joined)

print ('{} ---- characters mapped to int ----> {}'.format(repr(songs_joined[:10]), vectorized_songs[:10]))
# check that vectorized_songs is a numpy array
assert isinstance(vectorized_songs, np.ndarray), "returned result should be a numpy array"

### Batch definition to create training examples ###

def get_batch(vectorized_songs, seq_length, batch_size):
  # the length of the vectorized songs string
  n = vectorized_songs.shape[0] - 1
  # randomly choose the starting indices for the examples in the training batch
  idx = np.random.choice(n-seq_length, batch_size)

  '''TODO: construct a list of input sequences for the training batch'''
  input_batch = [vectorized_songs[i : i+seq_length] for i in idx]
  # input_batch = # TODO
  '''TODO: construct a list of output sequences for the training batch'''
  output_batch = [vectorized_songs[i+1 : i+seq_length+1] for i in idx]
  # output_batch = # TODO

  # x_batch, y_batch provide the true inputs and targets for network training
  x_batch = np.reshape(input_batch, [batch_size, seq_length])
  y_batch = np.reshape(output_batch, [batch_size, seq_length])
  return x_batch, y_batch


# Run some simple tests to make sure the batch function works properly!
test_args = (vectorized_songs, 10, 2)
if not mdl.lab1.test_batch_func_types(get_batch, test_args) or \
   not mdl.lab1.test_batch_func_shapes(get_batch, test_args) or \
   not mdl.lab1.test_batch_func_next_step(get_batch, test_args):
   print("======\n[FAIL] could not pass tests")
else:
   print("======\n[PASS] passed all tests!")


x_batch, y_batch = get_batch(vectorized_songs, seq_length=5, batch_size=1)

for i, (input_idx, target_idx) in enumerate(zip(np.squeeze(x_batch), np.squeeze(y_batch))):
    print("Step {:3d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))


def LSTM(rnn_units):
  return tf.keras.layers.LSTM(
    rnn_units,
    return_sequences=True,
    recurrent_initializer='glorot_uniform',
    recurrent_activation='sigmoid',
    stateful=True,
  )


### Defining the RNN Model ###

'''TODO: Add LSTM and Dense layers to define the RNN model using the Sequential API.'''
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    # Layer 1: Embedding layer to transform indices into dense vectors
    #   of a fixed embedding size
    tf.keras.layers.Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size, None]),

    # Layer 2: LSTM with `rnn_units` number of units.
    # TODO: Call the LSTM function defined above to add this layer.
    LSTM(rnn_units),
    # LSTM('''TODO'''),

    # Layer 3: Dense (fully-connected) layer that transforms the LSTM output
    #   into the vocabulary size.
    # TODO: Add the Dense layer.
    tf.keras.layers.Dense(vocab_size)
    # '''TODO: DENSE LAYER HERE'''
  ])

  return model

# Build a simple model with default hyperparameters. You will get the
#   chance to change these later.
model = build_model(len(vocab), embedding_dim=256, rnn_units=1024, batch_size=32)

model.summary()

x, y = get_batch(vectorized_songs, seq_length=100, batch_size=32)
pred = model(x)
print("Input shape:      ", x.shape, " # (batch_size, sequence_length)")
print("Prediction shape: ", pred.shape, "# (batch_size, sequence_length, vocab_size)")

sampled_indices = tf.random.categorical(pred[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()
sampled_indices

print("Input: \n", repr("".join(idx2char[x[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices])))

### Defining the loss function ###

'''TODO: define the loss function to compute and return the loss between
    the true labels and predictions (logits). Set the argument from_logits=True.'''
def compute_loss(labels, logits):
  loss = tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
  # loss = tf.keras.losses.sparse_categorical_crossentropy('''TODO''', '''TODO''', from_logits=True) # TODO
  return loss

'''TODO: compute the loss using the true next characters from the example batch 
    and the predictions from the untrained model several cells above'''
example_batch_loss = compute_loss(y, pred)
# example_batch_loss = compute_loss('''TODO''', '''TODO''') # TODO

print("Prediction shape: ", pred.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss:      ", example_batch_loss.numpy().mean())

### Hyperparameter setting and optimization ###

# Optimization parameters:
num_training_iterations = 10000  # Increase this to train longer
batch_size = 4  # Experiment between 1 and 64
seq_length = 100  # Experiment between 50 and 500
learning_rate = 5e-3  # Experiment between 1e-5 and 1e-1

# Model parameters:
vocab_size = len(vocab)
embedding_dim = 256
rnn_units = 1024  # Experiment between 1 and 2048

# Checkpoint location:
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "my_ckpt")

### Define optimizer and training operation ###

'''TODO: instantiate a new model for training using the `build_model` function and the hyperparameters created above.'''
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size)
# model = build_model('''TODO: arguments''')

'''TODO: instantiate an optimizer with its learning rate.
  Checkout the tensorflow website for a list of supported optimizers.
  https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/
  Try using the Adam optimizer to start.'''
optimizer = tf.keras.optimizers.Adam(learning_rate)


# optimizer = # TODO

@tf.function
def train_step(x, y):
    # Use tf.GradientTape()
    with tf.GradientTape() as tape:
        '''TODO: feed the current input into the model and generate predictions'''
        y_hat = model(x)  # TODO
        # y_hat = model('''TODO''')

        '''TODO: compute the loss'''
        loss = compute_loss(y, y_hat)  # TODO
        # loss = compute_loss('''TODO''', '''TODO''')

    # Now, compute the gradients
    '''TODO: complete the function call for gradient computation. 
        Remember that we want the gradient of the loss with respect all 
        of the model parameters. 
        HINT: use `model.trainable_variables` to get a list of all model
        parameters.'''
    grads = tape.gradient(loss, model.trainable_variables)  # TODO
    # grads = tape.gradient('''TODO''', '''TODO''')

    # Apply the gradients to the optimizer so it can update the model accordingly
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss


##################
# Begin training!#
##################

history = []
plotter = mdl.util.PeriodicPlotter(sec=2, xlabel='Iterations', ylabel='Loss')
if hasattr(tqdm, '_instances'): tqdm._instances.clear()  # clear if it exists

for iter in tqdm(range(num_training_iterations)):

    # Grab a batch and propagate it through the network
    x_batch, y_batch = get_batch(vectorized_songs, seq_length, batch_size)
    loss = train_step(x_batch, y_batch)

    # Update the progress bar
    history.append(loss.numpy().mean())
    plotter.plot(history)

    # Update the model with the changed weights!
    if iter % 100 == 0:
        model.save_weights(checkpoint_prefix)

# Save the trained model and the weights
model.save_weights(checkpoint_prefix)





####################### Generate music with the RNN model
'''TODO: Rebuild the model using a batch_size=1'''
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1) # TODO
# model = build_model('''TODO''', '''TODO''', '''TODO''', batch_size=1)

# Restore the model weights for the last checkpoint after training
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))

model.summary()


### Prediction of a generated song ###

def generate_text(model, start_string, generation_length=1000):
    # Evaluation step (generating ABC text using the learned RNN model)

    '''TODO: convert the start string to numbers (vectorize)'''
    input_eval = [char2idx[s] for s in start_string]  # TODO
    # input_eval = ['''TODO''']
    input_eval = tf.expand_dims(input_eval, 0)

    # Empty string to store our results
    text_generated = []

    # Here batch size == 1
    model.reset_states()
    if hasattr(tqdm, '_instances'): tqdm._instances.clear()  # clear if it exists

    for i in tqdm(range(generation_length)):
        '''TODO: evaluate the inputs and generate the next character predictions'''
        predictions = model(input_eval)
        # predictions = model('''TODO''')

        # Remove the batch dimension
        predictions = tf.squeeze(predictions, 0)

        '''TODO: use a multinomial distribution to sample'''
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        # predicted_id = tf.random.categorical('''TODO''', num_samples=1)[-1,0].numpy()

        # Pass the prediction along with the previous hidden state
        #   as the next inputs to the model
        input_eval = tf.expand_dims([predicted_id], 0)

        '''TODO: add the predicted character to the generated text!'''
        # Hint: consider what format the prediction is in vs. the output
        text_generated.append(idx2char[predicted_id])  # TODO
        # text_generated.append('''TODO''')

    return (start_string + ''.join(text_generated))

'''TODO: Use the model and the function defined above to generate ABC format text of length 1000!
    As you may notice, ABC files start with "X" - this may be a good start string.'''
generated_text = generate_text(model, start_string="X", generation_length=1000) # TODO
# generated_text = generate_text('''TODO''', start_string="X", generation_length=1000)

### Play back generated songs ###

generated_songs = mdl.lab1.extract_song_snippet(generated_text)

for i, song in enumerate(generated_songs):
  # Synthesize the waveform from a song
  waveform = mdl.lab1.play_song(song)

  # If its a valid song (correct syntax), lets play it!
  if waveform:
    print("Generated song", i)
    ipythondisplay.display(waveform)


Reposted from blog.csdn.net/bashendixie5/article/details/124216471