On MXNet's for_training and is_train

  • forward(is_train):
    is_train is not related to memory saving; it only affects the runtime behavior of some operators. Currently this mainly matters for operators such as Dropout and BatchNorm.

    • During testing, Dropout passes the input through unchanged; during training it zeroes each entry with probability p and scales the surviving entries by 1/(1 - p) (here 1/0.8 = 1.25).
      import mxnet as mx

      mx.random.seed(998)
      input_array = mx.nd.array([[3., 0.5, -0.5, 2., 7.],
                                 [2., -0.4, 7., 3., 0.2]])
      a = mx.sym.Variable('a')
      dropout = mx.sym.Dropout(a, p=0.2)
      # simple_bind needs a context as its first argument
      executor = dropout.simple_bind(mx.cpu(), a=input_array.shape)
      
      
      ## If training
      executor.forward(is_train=True, a=input_array)
      print(executor.outputs[0].asnumpy())
      [[ 3.75   0.625 -0.     2.5    8.75 ]
       [ 2.5   -0.5    8.75   3.75   0.   ]]

      ## If testing
      executor.forward(is_train=False, a=input_array)
      print(executor.outputs[0].asnumpy())
      [[ 3.     0.5   -0.5    2.     7.   ]
       [ 2.    -0.4    7.     3.     0.2  ]]
  • save model

    # writes 'unet-symbol.json' and 'unet-<num_epoch, zero-padded to 4 digits>.params'
    model.save_checkpoint('unet', num_epoch, save_optimizer_states=False)
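
    For context, a rough end-to-end sketch (sym, train_iter and the Module arguments are placeholders, not from the original post) of training a Module and then checkpointing it:

    num_epoch = 10
    model = mx.mod.Module(symbol=sym, data_names=['data'], label_names=['softmax_label'])
    model.fit(train_iter, num_epoch=num_epoch)
    model.save_checkpoint('unet', num_epoch, save_optimizer_states=False)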
  • load model

    unet = mx.mod.Module.load('unet', 0)
    unet.bind(for_training=False, data_shapes=[('data', (6, 1, 64, 64))])
    • bind(for_training): for_training determines whether the executors are set up to compute gradients, and forward() defaults to is_train=for_training when is_train is not given explicitly; see the sketch below.
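
    A minimal inference sketch continuing from the lines above (the all-zeros batch is only a placeholder): because the module was bound with for_training=False, calling forward() without an explicit is_train runs Dropout/BatchNorm in test mode.

    batch = mx.io.DataBatch(data=[mx.nd.zeros((6, 1, 64, 64))])
    unet.forward(batch)              # is_train defaults to self.for_training, i.e. False
    output = unet.get_outputs()[0]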
  • BatchNorm

    • is_train in forward() changes how BatchNorm is computed (see the batchnorm_v1 forward excerpt below and the Python sketch after it):
      • is_train = true: training phase. Global statistics are not used (use_global_stats must stay off during training, otherwise the network cannot converge). BN is applied per mini-batch: the mean and variance are computed from the current mini-batch.
      • is_train = false: testing phase. The moving-average statistics are used instead: during training the mean and variance of each batch are folded into moving_mean/moving_var, and at test time these stored values are used to normalize the data, as in the else branch of the code below.
    // MXNet batchnorm_v1 Forward(), excerpt
    if (ctx.is_train && !param_.use_global_stats) {
      // training: compute mean/variance from the current mini-batch
      Tensor<xpu, 1> mean = out_data[batchnorm_v1::kMean].get<xpu, 1, real_t>(s);
      Tensor<xpu, 1> var = out_data[batchnorm_v1::kVar].get<xpu, 1, real_t>(s);
      CHECK(req[batchnorm_v1::kMean] == kNullOp || req[batchnorm_v1::kMean] == kWriteTo);
      CHECK(req[batchnorm_v1::kVar] == kNullOp || req[batchnorm_v1::kVar] == kWriteTo);
      // The first three steps must be enforced.
      mean = scale * sumall_except_dim<1>(data);
      var = scale * sumall_except_dim<1>(F<mshadow_op::square>(
          data - broadcast<1>(mean, data.shape_)));
      Assign(out, req[batchnorm_v1::kOut], broadcast<1>(slope, out.shape_) *
             (data - broadcast<1>(mean, data.shape_)) /
             F<mshadow_op::square_root>(broadcast<1>(var + param_.eps, data.shape_)) +
             broadcast<1>(bias, out.shape_));
    } else {
      // testing (or use_global_stats): normalize with moving_mean / moving_var
      Assign(out, req[batchnorm_v1::kOut], broadcast<1>(slope /
                                           F<mshadow_op::square_root>(moving_var + param_.eps),
                                           data.shape_) * data +
             broadcast<1>(bias - (slope * moving_mean) /
                          F<mshadow_op::square_root>(moving_var + param_.eps), data.shape_));
      // Set mean and var tensors to their moving values
      Tensor<xpu, 1> mean = out_data[batchnorm_v1::kMean].get<xpu, 1, real_t>(s);
      Tensor<xpu, 1> var = out_data[batchnorm_v1::kVar].get<xpu, 1, real_t>(s);
      mean = F<mshadow_op::identity>(moving_mean);
      var  = F<mshadow_op::identity>(moving_var);
    }
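
    A minimal Python sketch (shapes and values are made up; it uses the current BatchNorm operator rather than batchnorm_v1, but the is_train switch behaves the same): with is_train=True the data is normalized with the mini-batch mean/variance, with is_train=False it is normalized with moving_mean/moving_var.

    import mxnet as mx

    x = mx.sym.Variable('x')
    bn = mx.sym.BatchNorm(x, fix_gamma=False, use_global_stats=False, name='bn')
    exe = bn.simple_bind(mx.cpu(), x=(2, 3))

    exe.arg_dict['x'][:] = mx.nd.array([[1., 2., 3.],
                                        [4., 5., 6.]])
    exe.arg_dict['bn_gamma'][:] = 1.0
    exe.arg_dict['bn_beta'][:] = 0.0
    exe.aux_dict['bn_moving_mean'][:] = 0.0
    exe.aux_dict['bn_moving_var'][:] = 1.0

    # training: each channel (column) is normalized with the statistics of this
    # mini-batch, so the output is roughly [-1, 1] per column
    exe.forward(is_train=True)
    print(exe.outputs[0].asnumpy())

    # testing: normalized with moving_mean = 0 and moving_var = 1,
    # so the output is approximately the input itself
    exe.forward(is_train=False)
    print(exe.outputs[0].asnumpy())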

Summary

  • bind(for_training) -> forward(is_train) -> BatchNorm() / Dropout()

Reposted from blog.csdn.net/sda42342342423/article/details/81331603