Source Code Analysis: Retraining the Inception V3 Network via Transfer Learning for Image Classification

1 Introduction

In recent years, with the breakthroughs that deep learning, represented by convolutional neural networks (CNNs), has achieved in image recognition, more and more image recognition algorithms have emerged. Last year we made an initial, successful attempt at applying image recognition in our testing work: we recast the device adaptation problem in the wireless domain as "binary classification of normal versus abnormal images under a particular scene", then used transfer learning on Google's open-source Inception V3 network to retrain an image classification model for that scene, reaching a classification accuracy of over 95%.

Over the past year, our main work on intelligent image recognition has included:

  • Implementing the model and tuning its parameters
  • Turning the model into a service
  • Optimizing the model and its service (including introducing a database connection pool, the gunicorn server, Docker, etc.)

This article focuses on the model retraining code, analyzing the source to deepen our understanding of the training process so that we can make targeted adjustments during subsequent model training.

A brief explanation of transfer learning: image recognition models often contain millions of parameters, and training them from scratch requires a large number of labeled images as well as a great deal of computing power (often hundreds of GPU-hours). Transfer learning is a shortcut: it continues training a new model on top of a model that has already been trained on similar work.
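
To make the idea concrete, here is a minimal sketch of the transfer learning setup (not retrain.py itself; the TF-Hub URL is one published Inception V3 feature-vector module, and the 2-class output layer is a hypothetical example):

import tensorflow as tf
import tensorflow_hub as hub

# Load a pre-trained feature extractor; its millions of parameters stay fixed.
module = hub.Module('https://tfhub.dev/google/imagenet/inception_v3/feature_vector/3')
images = tf.placeholder(tf.float32, [None, 299, 299, 3])
features = module(images)              # pre-trained 2048-dim feature vectors
# The only part trained from scratch: a small classification layer on top.
logits = tf.layers.dense(features, 2)  # e.g. 2 classes: normal / abnormal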

2. retrain.py source code analysis

The transfer learning code used by our current intelligent image service is based on the open-source code on GitHub: tensorflow/hub/image_retraining/retrain.py.

The following is our reading and interpretation of the source.

2.1 The main entry point

if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument(
      '--image_dir',
      type=str,
      default='',
      help='Path to folders of labeled images.'
  )
  parser.add_argument(
      '--output_graph',
      type=str,
      default='/tmp/output_graph.pb',
      help='Where to save the trained graph.'
  )
  # ... omitted: additional argument declarations ...
  parser.add_argument(
      '--logging_verbosity',
      type=str,
      default='INFO',
      choices=['DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL'],
      help='How much logging output should be produced.')
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

As we can see, the program's entry point mainly declares and parses the input arguments. The argument values passed in at invocation time are stored in the FLAGS variable, and then tf.app.run(main=main, argv=[sys.argv[0]] + unparsed) is executed to begin the actual training.
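
As a small standalone illustration of how parse_known_args behaves (a hypothetical example, not part of retrain.py): recognized flags land in FLAGS, while unrecognized ones are returned in unparsed and forwarded on via argv:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--image_dir', type=str, default='')
FLAGS, unparsed = parser.parse_known_args(['--image_dir', '/tmp/photos', '--other'])
print(FLAGS.image_dir)  # /tmp/photos
print(unparsed)         # ['--other'], forwarded to tf.app.run's argv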

2.2 The main(_) method

def main(_):
  # Needed to make sure the logging output is visible.
  # See https://github.com/tensorflow/tensorflow/issues/3047
  
  ## Set the log level
  logging_verbosity = logging_level_verbosity(FLAGS.logging_verbosity)
  tf.logging.set_verbosity(logging_verbosity)

  ## Check whether image_dir was passed in; this flag gives the path to the image set used for training
  if not FLAGS.image_dir:
    tf.logging.error('Must set flag --image_dir.')
    return -1

  # Prepare necessary directories that can be used during training
  ## Recreate summaries_dir and make sure intermediate_output_graphs_dir exists
  prepare_file_system()

  # Look at the folder structure, and create lists of all the images.
  ## Split the input image set into training, testing, and validation sets, based on the image set path, the testing percentage, and the validation percentage
  image_lists = create_image_lists(FLAGS.image_dir, FLAGS.testing_percentage,
                                   FLAGS.validation_percentage)
                   
  ## Determine the number of classes from the number of subdirectories under image_dir. Each subdirectory is one class, and each class is split into its own training, testing, and validation sets. If the class count is 0 or 1, return an error, since classification needs at least 2 classes.
  class_count = len(image_lists.keys())
  if class_count == 0:
    tf.logging.error('No valid folders of images found at ' + FLAGS.image_dir)
    return -1
  if class_count == 1:
    tf.logging.error('Only one valid folder of images found at ' +
                     FLAGS.image_dir +
                     ' - multiple classes are needed for classification.')
    return -1

  # See if the command-line flags mean we're applying any distortions.
  ## Decide from the given flags whether the images should be distorted
  do_distort_images = should_distort_images(
      FLAGS.flip_left_right, FLAGS.random_crop, FLAGS.random_scale,
      FLAGS.random_brightness)

  # Set up the pre-trained graph.
  ## Load the module; Inception V3 is used by default, and the --tfhub_module flag can select another pre-trained model
  module_spec = hub.load_module_spec(FLAGS.tfhub_module)
  ## Create the model graph
  graph, bottleneck_tensor, resized_image_tensor, wants_quantization = (
      create_module_graph(module_spec))

  # Add the new layer that we'll be training.
  ## Call add_final_retrain_ops to obtain the train step, cross entropy, bottleneck input, ground truth input, and final tensor
  with graph.as_default():
    (train_step, cross_entropy, bottleneck_input,
     ground_truth_input, final_tensor) = add_final_retrain_ops(
         class_count, FLAGS.final_tensor_name, bottleneck_tensor,
         wants_quantization, is_training=True)

  with tf.Session(graph=graph) as sess:
    # Initialize all weights: for the module to their pretrained values,
    # and for the newly added retraining layer to random initial values.
    ## Initialize the variables
    init = tf.global_variables_initializer()
    sess.run(init)

    # Set up the image decoding sub-graph.
    ## Call the image decoding setup to obtain the input image tensor and the decoded image tensor
    jpeg_data_tensor, decoded_image_tensor = add_jpeg_decoding(module_spec)
  
    if do_distort_images:
      # We will be applying distortions, so set up the operations we'll need.
      (distorted_jpeg_data_tensor,
       distorted_image_tensor) = add_input_distortions(
           FLAGS.flip_left_right, FLAGS.random_crop, FLAGS.random_scale,
           FLAGS.random_brightness, module_spec)
    else:
      # We'll make sure we've calculated the 'bottleneck' image summaries and
      # cached them on disk.
      ## Create the bottlenecks for each image and cache them to disk
      cache_bottlenecks(sess, image_lists, FLAGS.image_dir,
                        FLAGS.bottleneck_dir, jpeg_data_tensor,
                        decoded_image_tensor, resized_image_tensor,
                        bottleneck_tensor, FLAGS.tfhub_module)

    # Create the operations we need to evaluate the accuracy of our new layer.
    ## Create the evaluation operation
    evaluation_step, _ = add_evaluation_step(final_tensor, ground_truth_input)

    # Merge all the summaries and write them out to the summaries_dir
    ## Merge all summaries and write them to the summaries_dir directory
    merged = tf.summary.merge_all()
    train_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/train',
                                         sess.graph)

    validation_writer = tf.summary.FileWriter(
        FLAGS.summaries_dir + '/validation')

    # Create a train saver that is used to restore values into an eval graph
    # when exporting models.
    train_saver = tf.train.Saver()

    # Run the training for as many cycles as requested on the command line.
    ## Start training for the requested number of iterations
    for i in range(FLAGS.how_many_training_steps):
      # Get a batch of input bottleneck values, either calculated fresh every
      # time with distortions applied, or from the cache stored on disk.
      if do_distort_images:
        (train_bottlenecks,
         train_ground_truth) = get_random_distorted_bottlenecks(
             sess, image_lists, FLAGS.train_batch_size, 'training',
             FLAGS.image_dir, distorted_jpeg_data_tensor,
             distorted_image_tensor, resized_image_tensor, bottleneck_tensor)
      else:
        ## Get the bottleneck values of the training images; by default train_batch_size=100, i.e. each iteration trains on a batch of 100 images
        (train_bottlenecks,
         train_ground_truth, _) = get_random_cached_bottlenecks(
             sess, image_lists, FLAGS.train_batch_size, 'training',
             FLAGS.bottleneck_dir, FLAGS.image_dir, jpeg_data_tensor,
             decoded_image_tensor, resized_image_tensor, bottleneck_tensor,
             FLAGS.tfhub_module)
      # Feed the bottlenecks and ground truth into the graph, and run a training
      # step. Capture training summaries for TensorBoard with the `merged` op.
      ## Run the merged op, filling the placeholders from feed_dict
      train_summary, _ = sess.run(
          [merged, train_step],
          feed_dict={bottleneck_input: train_bottlenecks,
                     ground_truth_input: train_ground_truth})
      train_writer.add_summary(train_summary, i)

      # Every so often, print out how well the graph is training.
      ## Check whether this is the last training step
      is_last_step = (i + 1 == FLAGS.how_many_training_steps)
    
      ## By default eval_step_interval=10, i.e. print the current training results every 10 steps and once training completes
      if (i % FLAGS.eval_step_interval) == 0 or is_last_step:
        ## Print the training accuracy and cross entropy
        train_accuracy, cross_entropy_value = sess.run(
            [evaluation_step, cross_entropy],
            feed_dict={bottleneck_input: train_bottlenecks,
                       ground_truth_input: train_ground_truth})
        tf.logging.info('%s: Step %d: Train accuracy = %.1f%%' %
                        (datetime.now(), i, train_accuracy * 100))
        tf.logging.info('%s: Step %d: Cross entropy = %f' %
                        (datetime.now(), i, cross_entropy_value))
        # TODO: Make this use an eval graph, to avoid quantization
        # moving averages being updated by the validation set, though in
        # practice this makes a negligable difference.
        ## Get the bottleneck values of the validation images, also 100 per batch by default
        validation_bottlenecks, validation_ground_truth, _ = (
            get_random_cached_bottlenecks(
                sess, image_lists, FLAGS.validation_batch_size, 'validation',
                FLAGS.bottleneck_dir, FLAGS.image_dir, jpeg_data_tensor,
                decoded_image_tensor, resized_image_tensor, bottleneck_tensor,
                FLAGS.tfhub_module))
        # Run a validation step and capture training summaries for TensorBoard
        # with the `merged` op.
        validation_summary, validation_accuracy = sess.run(
            [merged, evaluation_step],
            feed_dict={bottleneck_input: validation_bottlenecks,
                       ground_truth_input: validation_ground_truth})
        validation_writer.add_summary(validation_summary, i)
     
        ## Print the validation accuracy and the number of images validated
        tf.logging.info('%s: Step %d: Validation accuracy = %.1f%% (N=%d)' %
                        (datetime.now(), i, validation_accuracy * 100,
                         len(validation_bottlenecks)))

      # Store intermediate results
      ## Periodically save intermediate results to disk
      intermediate_frequency = FLAGS.intermediate_store_frequency

      if (intermediate_frequency > 0 and (i % intermediate_frequency == 0)
          and i > 0):
        # If we want to do an intermediate save, save a checkpoint of the train
        # graph, to restore into the eval graph.
        train_saver.save(sess, CHECKPOINT_NAME)
        intermediate_file_name = (FLAGS.intermediate_output_graphs_dir +
                                  'intermediate_' + str(i) + '.pb')
        tf.logging.info('Save intermediate result to : ' +
                        intermediate_file_name)
        save_graph_to_file(intermediate_file_name, module_spec,
                           class_count)

    # After training is complete, force one last save of the train checkpoint.
    train_saver.save(sess, CHECKPOINT_NAME)

    # We've completed all our training, so run a final test evaluation on
    # some new images we haven't used before.
    ## Run the final evaluation
    run_final_eval(sess, module_spec, class_count, image_lists,
                   jpeg_data_tensor, decoded_image_tensor, resized_image_tensor,
                   bottleneck_tensor)

    # Write out the trained graph and labels with the weights stored as
    # constants.
    tf.logging.info('Save final result to : ' + FLAGS.output_graph)
    if wants_quantization:
      tf.logging.info('The model is instrumented for quantization with TF-Lite')
    save_graph_to_file(FLAGS.output_graph, module_spec, class_count)
    with tf.gfile.GFile(FLAGS.output_labels, 'w') as f:
      f.write('\n'.join(image_lists.keys()) + '\n')
   
    ## Export the trained model
    if FLAGS.saved_model_dir:
      export_model(module_spec, class_count, FLAGS.saved_model_dir)

The main method has been explained in some detail by the annotations in the code above (lines beginning with "##"). Its main steps are:

  • Set the log level
  • Prepare the workspace directories
  • Load the input image set from image_dir and create image_lists, a dict whose keys are the class names and whose values are the image sets of the corresponding class (each split into training, testing, and validation sets, with a default split ratio of 0.8/0.1/0.1; see the example layout after this list)
  • Load the Inception V3 feature tensors of the network pre-trained on ImageNet
  • Set up the image decoding operations to obtain the raw image tensor and the decoded image tensor
  • For each image's jpeg_data_tensor and decoded_image_tensor, create the corresponding bottleneck (in fact a 1 × 2048-dimensional tensor) and cache it to disk
  • Get the training step and the cross entropy
  • Start iterative training
  • Every 10 iterations, print the training accuracy and cross entropy, and print the validation results; training and validation batches each contain 100 images by default
  • After training, run a final evaluation on the test set
  • Print and save the results
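
For reference, here is a hypothetical image_dir layout (one subdirectory per class, matching the "correct"/"error" example in section 2.3.1) and an example invocation; all paths and flag values are illustrative only:

image_dir/
    correct/
        photo_001.jpg
        photo_002.jpg
        ...
    error/
        photo_101.jpg
        ...

python retrain.py --image_dir=image_dir --how_many_training_steps=4000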

2.3 Other methods

Having analyzed the primary execution path, we now interpret the remaining methods. Since space is limited relative to the total length of the code, the following briefly describes the other methods in order.

2.3.1 create_image_lists

def create_image_lists(image_dir, testing_percentage, validation_percentage):
    # ... omitted ...
    result[label_name] = {
        'dir': dir_name,
        'training': training_images,
        'testing': testing_images,
        'validation': validation_images,
    }
  return result

Given the image set path image_dir and the split ratios testing_percentage and validation_percentage, this splits the image set and returns a structure similar to the following:

{
    'correct': {
        'dir': correct_image_dir,
        'training': correct_training_images,
        'testing': correct_testing_images,
        'validation': correct_validation_images
    },
    'error': {
        'dir': error_image_dir,
        'training': error_training_images,
        'testing': error_testing_images,
        'validation': error_validation_images
    }
}

The value for each of training/testing/validation is a list of image file names.
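
It is worth noting how images are assigned to the three sets: retrain.py decides the set from a hash of each file name, so the split is deterministic and an image stays in the same set across runs. A simplified sketch of that logic, adapted from retrain.py (the constant matches the original):

import hashlib
import re

MAX_NUM_IMAGES_PER_CLASS = 2 ** 27 - 1  # ~134M, as defined in retrain.py

def which_set(file_name, testing_percentage, validation_percentage):
  # Ignore anything after '_nohash_' so grouped variants land in the same set.
  hash_name = re.sub(r'_nohash_.*$', '', file_name)
  hash_hex = hashlib.sha1(hash_name.encode('utf-8')).hexdigest()
  # Map the hash to a stable percentage in [0, 100).
  percentage_hash = ((int(hash_hex, 16) % (MAX_NUM_IMAGES_PER_CLASS + 1)) *
                     (100.0 / MAX_NUM_IMAGES_PER_CLASS))
  if percentage_hash < validation_percentage:
    return 'validation'
  elif percentage_hash < (testing_percentage + validation_percentage):
    return 'testing'
  return 'training'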

2.3.2 get_image_path

Gets the full path of an image.

2.3.3 get_bottleneck_path

Gets the bottleneck file path of an image for a given category (training, testing, or validation).

2.3.4 create_module_graph

Creates the model graph from a given pre-trained Hub module.

2.3.5 run_bottleneck_on_image

def run_bottleneck_on_image(sess, image_data, image_data_tensor,
                            decoded_image_tensor, resized_input_tensor,
                            bottleneck_tensor):
  """Runs inference on an image to extract the 'bottleneck' summary layer.
  Args:
    sess: Current active TensorFlow Session.
    image_data: String of raw JPEG data.
    image_data_tensor: Input data layer in the graph.
    decoded_image_tensor: Output of initial image resizing and preprocessing.
    resized_input_tensor: The input node of the recognition graph.
    bottleneck_tensor: Layer before the final softmax.
  Returns:
    Numpy array of bottleneck values.
  """
  # First decode the JPEG image, resize it, and rescale the pixel values.
  resized_input_values = sess.run(decoded_image_tensor,
                                  {image_data_tensor: image_data})
  # Then run it through the recognition network.
  bottleneck_values = sess.run(bottleneck_tensor,
                               {resized_input_tensor: resized_input_values})
  bottleneck_values = np.squeeze(bottleneck_values)
  return bottleneck_values

Given the decoded tensor of an input image, this computes bottleneck_values and applies a squeeze operation (removing single-dimensional entries, i.e. dimensions of size 1 in the shape).
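
A quick illustration of the squeeze step, in plain numpy with placeholder values: for a single image the network returns shape (1, 2048), and squeeze drops the leading dimension of size 1:

import numpy as np

bottleneck_values = np.zeros((1, 2048))     # raw output for one image
print(bottleneck_values.shape)              # (1, 2048)
print(np.squeeze(bottleneck_values).shape)  # (2048,)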

2.3.6 ensure_dir_exists

Makes sure a directory exists: if it does not, it is created.

2.3.7 create_bottleneck_file

Calls run_bottleneck_on_image to compute the bottleneck values, then caches them to a file on disk.
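
The cache format is simple: the bottleneck vector is written as a single line of comma-separated floats, matching how retrain.py serializes and re-parses it. A minimal sketch (the values and path are hypothetical):

bottleneck_values = [0.0] * 2048       # hypothetical 2048-dim bottleneck
bottleneck_path = '/tmp/example.txt'   # hypothetical cache file path

# Write: join the float values with commas into one text line.
bottleneck_string = ','.join(str(x) for x in bottleneck_values)
with open(bottleneck_path, 'w') as bottleneck_file:
  bottleneck_file.write(bottleneck_string)

# Read back: split on commas and convert each field to float.
with open(bottleneck_path, 'r') as bottleneck_file:
  bottleneck_values = [float(x) for x in bottleneck_file.read().split(',')]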

2.3.8 get_or_create_bottleneck

Gets the cached bottleneck values for an image, creating and caching them first if they do not exist.

2.3.9 cache_bottlenecks

Caches the bottlenecks of all images in batch.

2.3.10 get_random_cached_bottlenecks

Gets a random batch of cached bottlenecks, together with the corresponding ground truths and file names.

2.3.11 add_final_retrain_ops

def add_final_retrain_ops(class_count, final_tensor_name, bottleneck_tensor,
                          quantize_layer, is_training):
                          
  batch_size, bottleneck_tensor_size = bottleneck_tensor.get_shape().as_list()
  assert batch_size is None, 'We want to work with arbitrary batch size.'
  with tf.name_scope('input'):
    bottleneck_input = tf.placeholder_with_default(
        bottleneck_tensor,
        shape=[batch_size, bottleneck_tensor_size],
        name='BottleneckInputPlaceholder')

    ground_truth_input = tf.placeholder(
        tf.int64, [batch_size], name='GroundTruthInput')

  # Organizing the following ops so they are easier to see in TensorBoard.
  layer_name = 'final_retrain_ops'
  with tf.name_scope(layer_name):
    with tf.name_scope('weights'):
      initial_value = tf.truncated_normal(
          [bottleneck_tensor_size, class_count], stddev=0.001)
      layer_weights = tf.Variable(initial_value, name='final_weights')
      variable_summaries(layer_weights)

    with tf.name_scope('biases'):
      layer_biases = tf.Variable(tf.zeros([class_count]), name='final_biases')
      variable_summaries(layer_biases)

    with tf.name_scope('Wx_plus_b'):
      logits = tf.matmul(bottleneck_input, layer_weights) + layer_biases
      tf.summary.histogram('pre_activations', logits)

  final_tensor = tf.nn.softmax(logits, name=final_tensor_name)

  # The tf.contrib.quantize functions rewrite the graph in place for
  # quantization. The imported model graph has already been rewritten, so upon
  # calling these rewrites, only the newly added final layer will be
  # transformed.
  if quantize_layer:
    if is_training:
      tf.contrib.quantize.create_training_graph()
    else:
      tf.contrib.quantize.create_eval_graph()

  tf.summary.histogram('activations', final_tensor)

  # If this is an eval graph, we don't need to add loss ops or an optimizer.
  if not is_training:
    return None, None, bottleneck_input, ground_truth_input, final_tensor

  with tf.name_scope('cross_entropy'):
    cross_entropy_mean = tf.losses.sparse_softmax_cross_entropy(
        labels=ground_truth_input, logits=logits)

  tf.summary.scalar('cross_entropy', cross_entropy_mean)

  with tf.name_scope('train'):
    optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)
    train_step = optimizer.minimize(cross_entropy_mean)

  return (train_step, cross_entropy_mean, bottleneck_input, ground_truth_input,
          final_tensor)

Adds a new fully connected layer (y = Wx + b) followed by a softmax at the end of the network, used for both training and evaluation. This is essentially the same as a logistic (softmax regression) model, trained iteratively by gradient descent to minimize the cross entropy.
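
As a worked example of the quantity being minimized, in plain numpy with made-up logits: the softmax turns the final layer's outputs into class probabilities, and the cross entropy is the negative log probability of the true class:

import numpy as np

logits = np.array([2.0, 0.5])                  # Wx + b for one sample, 2 classes
probs = np.exp(logits) / np.exp(logits).sum()  # softmax -> approx [0.818, 0.182]
label = 0                                      # true class index
loss = -np.log(probs[label])                   # cross entropy, approx 0.201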

2.3.12 add_evaluation_step

def add_evaluation_step(result_tensor, ground_truth_tensor):
  with tf.name_scope('accuracy'):
    with tf.name_scope('correct_prediction'):
      ## For each row of the tensor, find the index of the maximum value (the predicted class)
      prediction = tf.argmax(result_tensor, 1)
      ## Compare each prediction against the ground truth: True if they match, False otherwise
      correct_prediction = tf.equal(prediction, ground_truth_tensor)
    with tf.name_scope('accuracy'):
      ## Cast the True/False values to floats and take the mean
      evaluation_step = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
  tf.summary.scalar('accuracy', evaluation_step)
  return evaluation_step, prediction

See the comments in the code above; it returns the final accuracy and the list of predicted values.
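
In plain numpy terms (with hypothetical values), the evaluation computes the following:

import numpy as np

probs = np.array([[0.9, 0.1],   # predicted class probabilities per image
                  [0.2, 0.8],
                  [0.6, 0.4]])
truth = np.array([0, 1, 1])                           # ground truth labels
pred = probs.argmax(axis=1)                           # [0, 1, 0]
accuracy = (pred == truth).astype(np.float32).mean()  # 2/3 correct -> approx 0.667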

2.3.13 run_final_eval

Performs the final evaluation, using the test set to evaluate the results. If the print_misclassified_test_images flag is passed, it also prints the file names and predicted results of the misclassified images.

2.3.14 save_graph_to_file

Saves the graph to a file.

2.3.15 prepare_file_system

Prepares the workspace directories.

2.3.16 add_jpeg_decoding

Parses the input JPEG data into a tensor and adds the decoding and resizing operations.
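
A simplified sketch of what this sets up (TF 1.x; the 299x299 input size is what Inception V3 expects, though retrain.py actually reads the target size from the module spec):

import tensorflow as tf

jpeg_data = tf.placeholder(tf.string, name='DecodeJPGInput')
decoded = tf.image.decode_jpeg(jpeg_data, channels=3)
decoded = tf.image.convert_image_dtype(decoded, tf.float32)  # to [0, 1] floats
decoded = tf.expand_dims(decoded, 0)                         # add a batch dimension
resized = tf.image.resize_bilinear(decoded, [299, 299])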

2.3.17 export_model

Exports the trained model for serving.
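
For completeness, a hedged sketch of loading the final output_graph.pb for inference elsewhere; 'final_result' is the default --final_tensor_name in retrain.py, but tensor names depend on your graph, so check them against the actual node names:

import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile('/tmp/output_graph.pb', 'rb') as f:  # default --output_graph path
  graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
  tf.import_graph_def(graph_def, name='')
  # 'final_result:0' assumes the default --final_tensor_name was used.
  output_tensor = graph.get_tensor_by_name('final_result:0')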

Source: www.cnblogs.com/znicy/p/10937111.html