forward(is_train):
is_train is not related to memory saving; it only affects the runtime behavior of certain operators. Currently it mainly matters for operators such as dropout and batchnorm.
```python
import numpy as np
import mxnet as mx

mx.random.seed(998)
input_array = np.array([[3., 0.5, -0.5, 2., 7.],
                        [2., -0.4, 7., 3., 0.2]])
a = mx.symbol.Variable('a')
dropout = mx.symbol.Dropout(a, p=0.2)
executor = dropout.simple_bind(mx.cpu(), a=input_array.shape)

# If training: kept values are rescaled by 1 / (1 - p) = 1.25
executor.forward(is_train=True, a=input_array)
print(executor.outputs[0].asnumpy())
# [[ 3.75   0.625 -0.     2.5    8.75 ]
#  [ 2.5   -0.5    8.75   3.75   0.   ]]

# If testing: dropout leaves the input unchanged
executor.forward(is_train=False, a=input_array)
print(executor.outputs[0].asnumpy())
# [[ 3.   0.5 -0.5  2.   7. ]
#  [ 2.  -0.4  7.   3.   0.2]]
```
- During testing, dropout does not change the input; during training, the kept entries are rescaled by 1 / (1 - p), which is why 3.0 becomes 3.75 above.
save model
```python
# Writes 'unet-symbol.json' and 'unet-%04d.params' % num_epoch
model.save_checkpoint('unet', num_epoch, save_optimizer_states=False)
```
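To checkpoint while training rather than once at the end, the same files can be written with an epoch-end callback. A minimal sketch, assuming `model`, `train_iter`, and `num_epoch` already exist (all three are placeholders here):

```python
import mxnet as mx

# `model` and `train_iter` are assumed placeholders, not defined in this note.
model.fit(train_iter,
          num_epoch=num_epoch,
          epoch_end_callback=mx.callback.do_checkpoint('unet'))
# Writes 'unet-symbol.json' and 'unet-%04d.params' at the end of each epoch.
```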
load model
```python
unet = mx.mod.Module.load('unet', 0)
unet.bind(for_training=False, data_shapes=[('data', (6, 1, 64, 64))])
```
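After binding, a plain forward pass already runs in test mode, because forward() defaults is_train to the for_training value given at bind time (see the bullets below). A minimal usage sketch; the zero-filled batch is a placeholder, not from the original note:

```python
import numpy as np
import mxnet as mx

# Placeholder input matching the bound shape (6, 1, 64, 64).
batch = mx.io.DataBatch(data=[mx.nd.array(np.zeros((6, 1, 64, 64)))])
unet.forward(batch)  # is_train defaults to for_training, i.e. False here
prediction = unet.get_outputs()[0].asnumpy()
```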
- bind(for_training): for_training determines whether gradients with respect to the inputs are computed, and is_train in forward() defaults to for_training.
- is_train in forward() affects how batchnorm is computed:
- is_train = true: training phase. Global statistics are not used, i.e. use_global_stats must be left off during training, otherwise the network cannot converge. BN is applied per mini-batch, computing the mean and variance from the current mini-batch.
- is_train = false: testing phase. A moving-average technique is used instead: during training the mean and variance of each batch are accumulated, so that after training the overall statistics (moving_mean and moving_var) are available, as in the code below.
From MXNet's `batch_norm_v1-inl.h` (forward pass):

```cpp
if (ctx.is_train && !param_.use_global_stats) {
  // Training path: normalize with the mean/variance of the current mini-batch.
  Tensor<xpu, 1> mean = out_data[batchnorm_v1::kMean].get<xpu, 1, real_t>(s);
  Tensor<xpu, 1> var = out_data[batchnorm_v1::kVar].get<xpu, 1, real_t>(s);
  CHECK(req[batchnorm_v1::kMean] == kNullOp || req[batchnorm_v1::kMean] == kWriteTo);
  CHECK(req[batchnorm_v1::kVar] == kNullOp || req[batchnorm_v1::kVar] == kWriteTo);
  // The first three steps must be enforced.
  mean = scale * sumall_except_dim<1>(data);
  var = scale * sumall_except_dim<1>(F<mshadow_op::square>(
      data - broadcast<1>(mean, data.shape_)));
  Assign(out, req[batchnorm_v1::kOut],
         broadcast<1>(slope, out.shape_) *
         (data - broadcast<1>(mean, data.shape_)) /
         F<mshadow_op::square_root>(broadcast<1>(var + param_.eps, data.shape_)) +
         broadcast<1>(bias, out.shape_));
} else {
  // Inference path: normalize with the accumulated moving statistics.
  Assign(out, req[batchnorm_v1::kOut],
         broadcast<1>(slope /
                      F<mshadow_op::square_root>(moving_var + param_.eps),
                      data.shape_) * data +
         broadcast<1>(bias - (slope * moving_mean) /
                      F<mshadow_op::square_root>(moving_var + param_.eps),
                      data.shape_));
  // Set mean and var tensors to their moving values
  Tensor<xpu, 1> mean = out_data[batchnorm_v1::kMean].get<xpu, 1, real_t>(s);
  Tensor<xpu, 1> var = out_data[batchnorm_v1::kVar].get<xpu, 1, real_t>(s);
  mean = F<mshadow_op::identity>(moving_mean);
  var = F<mshadow_op::identity>(moving_var);
}
```
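To see the same switch from the Python side, here is a minimal sketch in the spirit of the dropout example above; the symbol name `bn`, the input values, and the pinned statistics are illustrative assumptions, not taken from the original note:

```python
import mxnet as mx

x = mx.symbol.Variable('x')
bn = mx.symbol.BatchNorm(x, fix_gamma=False, name='bn')
executor = bn.simple_bind(mx.cpu(), x=(2, 3))

# Fix the learnable parameters so the two code paths are easy to compare.
executor.arg_dict['bn_gamma'][:] = 1.0
executor.arg_dict['bn_beta'][:] = 0.0

data = mx.nd.array([[1., 2., 3.],
                    [4., 5., 6.]])

# Training path: each column is normalized with this mini-batch's own
# mean/variance, so the output is roughly [[-1, -1, -1], [1, 1, 1]].
executor.forward(is_train=True, x=data)
print(executor.outputs[0].asnumpy())

# Pin the moving statistics to known values, then run the test path:
# with moving_mean = 0 and moving_var = 1 the output is roughly the input.
executor.aux_dict['bn_moving_mean'][:] = 0.0
executor.aux_dict['bn_moving_var'][:] = 1.0
executor.forward(is_train=False, x=data)
print(executor.outputs[0].asnumpy())
```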
Summary
- bind(for_training) -> forward(is_train) -> batchnorm(): the for_training flag set at bind time becomes the default is_train in forward(), which in turn selects the batchnorm (and dropout) code path.