Model-Agnostic Meta-Learning (MAML) source code interpretation

I have spent a long time recently reading the MAML source code, and learned TensorFlow along the way. This article summarizes the source code and tries to take the mystery out of MAML.

paper

Insert picture description here

Abstract

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on two few-shot image classification benchmarks, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies.

Summary: MAML learns a good initial value for the parameters rather than starting from an arbitrary one, which greatly reduces the training time and the number of samples needed for a new task.

algorithm

Insert picture description here
Insert picture description here
As shown in the figure, we start from the initial parameter θ. There are 3 tasks, and each task has its own best parameter θ*. From θ there are therefore 3 possible gradient descent directions, one per task. Instead of following any single one of them, we take a step in a direction shared by all 3 tasks. From the newly obtained θ, only a few steps are then needed to reach the θ* of any task.

Specifically: step 6 of the algorithm computes adapted parameters for each task. This is not the true optimum θ*, because only a few gradient steps are taken; we write it θi*, the point that θ reaches after those steps. We then compute the gradient of each task's loss at θi*, add these gradients up and average them to get a compromise direction, and move θ in that direction. This is the update of the original θ in step 8, and the result is the meta-learner.
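
To make steps 6 and 8 concrete, here is a minimal sketch on a toy problem. It is not code from the repository. It assumes each task i has a 1-D quadratic loss 0.5 * (θ - c_i)^2, so every gradient can be written by hand, and it reuses the same loss for the inner and outer step instead of separate data sets. The names maml_step, alpha, beta and the task centers are all made up for the illustration.

```python
# Toy MAML on 1-D quadratic tasks: L_i(theta) = 0.5 * (theta - c_i) ** 2.
# Illustration only: tasks, step sizes and names are assumptions.

def maml_step(theta, centers, alpha=0.1, beta=0.5):
    """Apply one meta-update to theta using a batch of tasks."""
    meta_grads = []
    for c in centers:
        # Step 6: one inner gradient step gives the task-adapted parameter.
        inner_grad = theta - c
        theta_i = theta - alpha * inner_grad
        # Gradient of L_i(theta_i) with respect to the ORIGINAL theta.
        # Chain rule: d theta_i / d theta = 1 - alpha.
        meta_grads.append((1.0 - alpha) * (theta_i - c))
    # Step 8: average over tasks and move the original theta.
    mean_grad = sum(meta_grads) / len(meta_grads)
    return theta - beta * mean_grad


centers = [-1.0, 0.0, 4.0]  # per-task optima (theta* of each task)
theta = 10.0
for _ in range(200):
    theta = maml_step(theta, centers)
print(round(theta, 4))
```

In this toy setting θ settles at the mean of the task optima, the point from which one inner step gets closest to every task at once.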

Source code

The source code I read is not the official code released with the paper, but a simplified version that "dragen1860" implemented on the basis of the official code (see the link). Its README lists these highlights:

  • adopted from cbfin’s official implementation with equivalent performance on mini-imagenet
  • clean, tiny code style and very easy-to-follow from comments almost every lines
  • faster and trivial improvements, eg. 0.335s per epoch comparing with 0.563s per epoch, saving up to 3.8 hours for total 60,000 training process

File structure

Insert picture description here

Algorithm flow of meta-learn

  1. Start from the main function entry point
  2. Set the parameters
  • nway: 5, the number of classes per task, such as cat, dog, horse...
  • kshot: 1, the number of support samples per class
  • kquery: 15, the number of query samples per class (this matches the 15*5 query shapes below)
  • meta_batchsz: 4, the meta-learning batch size, that is, the number of tasks per batch
  • K: 5, the number of gradient descent steps used to find the best θ* for each task; MAML can take K steps rather than a single one
  3. Generate the data (tensors)

Let's first review the support set and query set. Each task is an ordinary machine learning task with its own training set and test set. To avoid confusion with the meta-level split, we do not use those names and call them the support set and query set instead. With the parameters above, the support set of a task holds nway * kshot = 5 images and the query set holds nway * kquery = 75 images. Then 4 tasks are used for meta-train, called the train set, and 4 tasks are used for meta-test, called the test set.

Insert picture description here

The following are the support set and query set of the 4 tasks in the meta-train phase

# image_tensor: [4, 80, 84*84*3], i.e. [meta_batchsz, nway * (kshot + kquery), flattened pixels]
# The first nway * kshot samples of each task form the support set, the rest the query set.
support_x = tf.slice(image_tensor, [0, 0, 0], [-1, nway * kshot, -1], name='support_x')
query_x = tf.slice(image_tensor, [0, nway * kshot, 0], [-1, -1, -1], name='query_x')
support_y = tf.slice(label_tensor, [0, 0, 0], [-1, nway * kshot, -1], name='support_y')
query_y = tf.slice(label_tensor, [0, nway * kshot, 0], [-1, -1, -1], name='query_y')
# support_x : [4, 1*5, 84*84*3]
# query_x   : [4, 15*5, 84*84*3]
# support_y : [4, 1*5, 5]
# query_y   : [4, 15*5, 5]
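
The shapes in the comments can be checked without TensorFlow. The sketch below imitates the split along the sample axis with plain Python lists; the stand-in image_tensor holds (task, sample) ids instead of pixels, which is an assumption made only to keep the example small.

```python
# Imitate the tf.slice split along the sample axis with plain Python lists.
nway, kshot, kquery, meta_batchsz = 5, 1, 15, 4
samples_per_task = nway * (kshot + kquery)  # 80

# Stand-in for image_tensor: [task][sample] holds an id instead of pixels.
image_tensor = [[(t, s) for s in range(samples_per_task)]
                for t in range(meta_batchsz)]

split = nway * kshot  # the first 5 samples of every task are the support set
support_x = [task[:split] for task in image_tensor]
query_x = [task[split:] for task in image_tensor]

print(len(support_x), len(support_x[0]))  # tasks, support samples per task
print(len(query_x), len(query_x[0]))      # tasks, query samples per task
```

Slicing with a size of -1 in tf.slice means "everything that is left along this axis", which is what the open-ended list slices do here.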

Insert picture description here

The two sets for the meta-test stage are constructed in the same way:

# construct test tensors.
image_tensor, label_tensor = db.make_data_tensor(training=False)
support_x_test = tf.slice(image_tensor, [0, 0, 0], [-1, nway * kshot, -1], name='support_x_test')
query_x_test = tf.slice(image_tensor, [0, nway * kshot, 0], [-1, -1, -1], name='query_x_test')
support_y_test = tf.slice(label_tensor, [0, 0, 0], [-1, nway * kshot, -1], name='support_y_test')
query_y_test = tf.slice(label_tensor, [0, nway * kshot, 0], [-1, -1, -1], name='query_y_test')

The final result is shown below. Only one task of the train set is drawn; the actual support_x contains 4 tasks. Within a task, the 5 classes (5-way) of the support set must be the same as those of the query set:
Insert picture description here

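  4. Build the MAML model and call its build method (steps 5 to 7 below all happen inside build; we leave it in step 8)
# Arguments such as 84 are used to reshape the tensors. I am not sure why these values were chosen;
# presumably 84 and 3 are the image side length and channel count (84*84*3 above) and 5 is nway.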
model = MAML(84, 3, 5)
model.build(support_x, support_y, query_x, query_y, K, meta_batchsz, mode='train')
  5. Then we enter the build method: for each task, call the meta_task function
result = tf.map_fn(meta_task, elems=(support_xb, support_yb, query_xb, query_yb),
                   dtype=out_dtype, parallel_iterations=meta_batchsz, name='map_fn')

The meta_task function corresponds to the part of the algorithm marked in red, that is, finding the adapted parameter θi* for each task:

Insert picture description here

  6. Let's look at the details of meta_task, which is simply the usual cycle of forward pass, backpropagation, and parameter update:

Insert picture description here
We feed support_x through the network with the current weights to compute the support loss and its gradient, and one gradient descent step gives the fast weights. We then test the fast weights on the query set to get the query loss, and repeat this update of the fast weights K times.

Note that my picture shows only one gradient descent step, while the actual code takes K steps. Each time new fast weights are obtained from the support set, the loss is computed on the query set, so there is one query loss per step.
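
The loop described above can be sketched in a few lines. This is not the repository's meta_task: it assumes a hypothetical 1-D linear model y = w * x with squared error so that the gradient can be written by hand, and the helper names, data and learning rate are made up. It only shows the forward bookkeeping; in the TensorFlow version the same steps are graph operations, so the outer gradient can later flow back through them.

```python
# Inner loop of one task on a 1-D linear model y = w * x with squared error.
# Hypothetical stand-in for the network; gradients are written by hand.

def mse(w, data):
    """Mean squared error of the model on a list of (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def mse_grad(w, data):
    """Derivative of the mean squared error with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)


def meta_task(support, query, w, K=5, train_lr=0.05):
    """Return the query loss measured after each of the K fast-weight updates."""
    query_losses = []
    fast_w = w
    for _ in range(K):
        # Update the fast weight on the support set...
        fast_w = fast_w - train_lr * mse_grad(fast_w, support)
        # ...then record how well it does on the query set.
        query_losses.append(mse(fast_w, query))
    return query_losses


# One made-up task whose true relation is y = 3 * x.
support = [(1.0, 3.0), (2.0, 6.0)]
query = [(3.0, 9.0), (4.0, 12.0)]
losses = meta_task(support, query, w=0.0)
print(losses[0], losses[-1])  # query loss after the first and after the last step
```

The first and last entries of the list play the role of query_losses[0] and query_losses[-1] in the training code.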

  7. Out of the loop, perform the second (outer) gradient descent

We average the query-set losses over all the tasks and compute the gradient of this query loss:

# meta-train optim
optimizer = tf.train.AdamOptimizer(self.meta_lr, name='meta_optim')
# meta-train gradients; query_losses[-1] is the query loss after the last inner step, averaged over tasks.
gvs = optimizer.compute_gradients(self.query_losses[-1])

Then clip the gradients and use them to update the real parameter θ:

# meta-train grads clipping
gvs = [(tf.clip_by_norm(grad, 10), var) for grad, var in gvs]
# update theta
self.meta_op = optimizer.apply_gradients(gvs)
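
For readers unfamiliar with tf.clip_by_norm: it leaves a tensor alone when its L2 norm is within the limit and otherwise rescales it so that the norm equals the limit. Below is a minimal pure-Python sketch of that rule, applied to one gradient at a time as in the list comprehension above; the function is a re-implementation for illustration, not the TensorFlow source.

```python
import math


def clip_by_norm(grad, clip_norm):
    """Rescale grad so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_norm:
        return list(grad)  # already small enough, keep as is
    return [g * clip_norm / norm for g in grad]


print(clip_by_norm([3.0, 4.0], 10))    # norm 5: unchanged
print(clip_by_norm([30.0, 40.0], 10))  # norm 50: rescaled to norm 10
```

The direction of the gradient is preserved; only its length is capped, which keeps one bad batch of tasks from throwing θ far off.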

This step corresponds to the red section of the algorithm:
Insert picture description here

  8. Jump out of build.
    That is, everything above is triggered by a single call to model.build:
if training:
    model.build(support_x, support_y, query_x, query_y, K, meta_batchsz, mode='train')
    model.build(support_x_test, support_y_test, query_x_test, query_y_test, K, meta_batchsz, mode='eval')
else:
    model.build(support_x_test, support_y_test, query_x_test, query_y_test, K + 5, meta_batchsz, mode='test')

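Next we enter the train() method, which runs 600,000 iterations. Each iteration does the following.
The result array puzzled me at first. As far as I can tell, sess.run returns one value per entry of ops, in the same order, so on every 200th iteration result[0] belongs to meta_op, result[1] to summ_op, and result[2] through result[5] to query_losses[0], query_losses[-1], query_accs[0] and query_accs[-1].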

# this is the main op
		ops = [model.meta_op]

		# add summary and print op
		if iteration % 200 == 0:
			ops.extend([model.summ_op,
			            model.query_losses[0], model.query_losses[-1],
			            model.query_accs[0], model.query_accs[-1]])

		# run all ops
		result = sess.run(ops)

		# summary
		if iteration % 200 == 0:
			# summ_op
			# tb.add_summary(result[1], iteration)
			# query_losses[0]
			prelosses.append(result[2])
			# query_losses[-1]
			postlosses.append(result[3])
			# query_accs[0]
			preaccs.append(result[4])
			# query_accs[-1]
			postaccs.append(result[5])

			print(iteration, '\tloss:', np.mean(prelosses), '=>', np.mean(postlosses),
			      '\t\tacc:', np.mean(preaccs), '=>', np.mean(postaccs))
			prelosses, postlosses, preaccs, postaccs = [], [], [], []

		# evaluation
		if iteration % 2000 == 0:
			# DO NOT write as a = b = [], in that case a=b
			# DO NOT use train variable as we have train func already.
			acc1s, acc2s = [], []
			# sample 200 times to get more accurate statistics.
			for _ in range(200):
				acc1, acc2 = sess.run([model.test_query_accs[0],
				                        model.test_query_accs[-1]])
				acc1s.append(acc1)
				acc2s.append(acc2)

			acc = np.mean(acc2s)
			print('>>>>\t\tValidation accs: ', np.mean(acc1s), acc, 'best:', best_acc, '\t\t<<<<')

			if acc - best_acc > 0.05 or acc > 0.4:
				saver.save(sess, os.path.join('ckpt', 'mini.mdl'))
				best_acc = acc
				print('saved into ckpt:', acc)
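
The indices used on result (result[2] to result[5]) follow from the order in which the ops were put into the list. The sketch below imitates this with a hypothetical fake_run function and made-up values; it is not TensorFlow code and only shows how positions in ops line up with positions in result.

```python
def fake_run(ops, values):
    """Hypothetical stand-in for sess.run: one result per op, in the same order."""
    return [values[name] for name in ops]


# Made-up values, keyed by the op they would come from.
values = {
    'meta_op': None,  # a training op yields no value
    'summ_op': 'summary',
    'query_losses[0]': 1.61,
    'query_losses[-1]': 1.20,
    'query_accs[0]': 0.22,
    'query_accs[-1]': 0.45,
}

ops = ['meta_op']
iteration = 200
if iteration % 200 == 0:
    ops.extend(['summ_op', 'query_losses[0]', 'query_losses[-1]',
                'query_accs[0]', 'query_accs[-1]'])

result = fake_run(ops, values)
for index, (name, value) in enumerate(zip(ops, result)):
    print(index, name, value)
```

On iterations that are not a multiple of 200 the list holds only meta_op, so result has a single entry and the indices 2 to 5 do not exist; that is why the code reads them only inside the same if condition.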

Once saver.save(sess, os.path.join('ckpt', 'mini.mdl')) has saved the model parameters, we have the meta-learner. We then enter the test step to see whether this learner can really do what the abstract says: complete model training with a few steps of gradient descent and a small amount of data.

Algorithm flow of meta-test

Starting from the learned θ, take K+5 gradient steps on the support set to get θ*, then check on the query set whether θ* is good. This check makes sense because the classes in the support set and the query set are the same.

ops = [model.test_support_acc]
ops.extend(model.test_query_accs)
result = sess.run(ops)
test_accs.append(result)
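
Each result appended to test_accs is a list: the support accuracy first, then one query accuracy per inner step. One plausible way to summarize many such runs is to average position by position. The sketch below does that in plain Python with made-up numbers and a hypothetical helper name; the repository may aggregate differently.

```python
def mean_per_step(test_accs):
    """Average accuracy position by position over all test runs."""
    runs = len(test_accs)
    return [sum(column) / runs for column in zip(*test_accs)]


# Made-up results of two test runs:
# [support_acc, query_acc after step 1, query_acc after step 2]
test_accs = [
    [0.20, 0.30, 0.40],
    [0.40, 0.50, 0.60],
]
print(mean_per_step(test_accs))
```

If the later entries of the averaged list are higher than the earlier ones, the extra gradient steps on the support set are indeed helping on the query set.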

Code usage

The README is very clear, but I am on Windows 10, where a few things differ. The specific steps are as follows:

  1. Download the ImageNet image set from the link given by the author (several hundred thousand images, about 3 GB)
  2. Modify proc_images.py, replacing the Linux shell commands it calls with Windows-compatible Python code
import csv
import os
import shutil

# Root of the miniimagenet folder; change this to the location on your machine.
path = 'C:/Users/Administrator/Desktop/MAML-TensorFlow-master/miniimagenet/'

# Put every image in the correct directory: <datatype>/<label>/
# The csv files are opened relative to the current directory, so run this from `path`.
for datatype in ['train', 'val', 'test']:
    os.makedirs(path + datatype, exist_ok=True)  # replaces the Linux `mkdir`

    with open(datatype + '.csv', 'r') as f:
        reader = csv.reader(f, delimiter=',')
        last_label = ''
        for i, row in enumerate(reader):
            if i == 0:  # skip the header row
                continue
            image_name, label = row[0], row[1]
            if label != last_label:
                cur_dir = datatype + '/' + label + '/'
                os.makedirs(path + cur_dir, exist_ok=True)
                last_label = label
            print(path + 'images/' + image_name + ' -> ' + path + cur_dir)
            # replaces the Linux `mv images/<image_name> <cur_dir>`
            shutil.move(path + 'images/' + image_name, path + cur_dir)
  3. Set up the environment with Anaconda (Python 3.6, TensorFlow 1.15.0) and activate it with conda activate
  4. In the new environment, run python main.py
  5. Result: training is extremely slow. After several hours on my computer's CPU, the results are as follows:

Insert picture description here

It can be seen that the accuracy rate is gradually improving.

2020.8.27 todo:

  • Once the run has finished, compare these results with the experimental section of the paper.

  • Plot the results on the test set.

  • Analyze what the two functions, test and val, actually do.

Origin blog.csdn.net/Protocols7/article/details/108250998