Caffe2 multi-GPU training: saving the model and loading the saved model

This post mainly follows the Multi-GPU_Training.ipynb example shipped with caffe2 to train a model.
1 Generating the LMDB dataset. An lmdb generated with caffe2's lmdb_create_example.py turned out, in use, to yield data that was all zeros. The command line suggested in an issue works instead:
caffe2/build/bin/make_image_db -color -db lmdb -input_folder /root/notebook_caffe2/data/86stars/0730/test/ -list_file /root/notebook_caffe2/data/86stars/0730/test.txt -num_threads 10 -output_db_name /root/notebook_caffe2/data/86stars/0730/86stars_test_lmdb -raw -scale 256 -shuffle
This produces a usable lmdb, but use_caffe_datum must be set to False in brew.image_input.
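As a rough sketch of what that input layer might look like: the mean, std and mirror values below are borrowed from the tutorial notebook and are assumptions, as are the helper names image_input_kwargs and add_image_input and the crop of 120 (chosen to match the data shape used in the save code later in this post). Only use_caffe_datum=False comes from the finding above. Caffe2 is imported inside the function so that the argument helper can be inspected on a machine without Caffe2 installed.

```python
def image_input_kwargs(batch_size, crop, is_test=False):
    """Keyword arguments for brew.image_input (illustrative values)."""
    return dict(
        batch_size=batch_size,
        use_caffe_datum=False,       # required for the lmdb built with make_image_db -raw
        mean=128.0,                  # assumed, as in the tutorial notebook
        std=128.0,                   # assumed, as in the tutorial notebook
        scale=256,                   # matches "-scale 256" in the command line
        crop=crop,
        is_test=is_test,
        mirror=0 if is_test else 1,  # random mirroring only while training
    )


def add_image_input(model, db_path, batch_size, crop, is_test=False):
    """Attach an lmdb reader and an ImageInput op to a ModelHelper."""
    from caffe2.python import brew  # imported lazily, see note above

    reader = model.CreateDB("reader", db=db_path, db_type="lmdb")
    data, label = brew.image_input(
        model, reader, ["data", "label"],
        **image_input_kwargs(batch_size, crop, is_test))
    # Gradients do not need to flow back into the input images.
    data = model.StopGradient(data, data)
    return data, label


print(image_input_kwargs(batch_size=32, crop=120))
```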
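2 Multi-GPU training: the BN layers need particular care. During training, is_test must be set to False in brew.spatial_bn; for forward prediction it must be set to True. Otherwise, after loading the trained model, the results look very similar no matter which image is fed in. (This problem troubled me for a long time. It came down to my own carelessness, although I have not yet found the root cause.)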
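The following pure-Python toy is not Caffe2 code and not a confirmed diagnosis of the problem above. It only illustrates one plausible contributor, under the assumption that a BN layer left in training mode normalizes with statistics of the current input, while test mode uses the stored running statistics. The function names and numbers are made up for illustration.

```python
import math


def bn_train_mode(values, eps=1e-5):
    """Normalize one channel with the mean/variance of the input itself."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def bn_test_mode(values, running_mean, running_var, eps=1e-5):
    """Normalize one channel with stored running statistics."""
    return [(v - running_mean) / math.sqrt(running_var + eps) for v in values]


# Two "images" (one channel each) that differ only in brightness and contrast.
dark = [0.1, 0.2, 0.3, 0.4]
bright = [10 * v + 5 for v in dark]

# Training mode: the difference between the two inputs is normalized away.
print(bn_train_mode(dark))
print(bn_train_mode(bright))

# Test mode with (made-up) running statistics: the inputs stay distinguishable.
print(bn_test_mode(dark, running_mean=5.0, running_var=4.0))
print(bn_test_mode(bright, running_mean=5.0, running_var=4.0))
```

In this toy, the two training-mode outputs are almost identical, while the test-mode outputs differ clearly, which is at least consistent with the symptom described above.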
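3 Saving the model. model.params does not include the BN layers' _rm and _riv parameters, so these two have to be added when saving. If the forward-prediction model will later take its input from a data blob rather than from a reader, an extra data blob has to be added as well. Because training used multiple GPUs, parameter names carry a prefix with the GPU id, so you have to pick one GPU's copy to save. The saving code is as follows: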

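from caffe2.proto import caffe2_pb2
from caffe2.python import core, utils, workspace

def save_model(iter, model, gpu_num):
    INIT_NET = '/root/notebook_caffe2/model/0816_86stars/'+'86stars_iter{}_gpu{}_init_net.pb'.format(iter,gpu_num)
    gpu_str = 'gpu_{}/'.format(gpu_num)
    # model.params lacks the BN running statistics (_rm, _riv),
    # so collect them from the workspace without mutating model.params.
    extra_params = []
    for blob in workspace.Blobs():
        name = str(blob)
        if name.endswith('_rm') or name.endswith('_riv'):
            extra_params.append(name)
    init_net = caffe2_pb2.NetDef()
    # Placeholder "data" blob, so a prediction net can be fed directly
    # instead of reading from a reader.
    init_net.op.extend([core.CreateOperator("ConstantFill", [],
                                            ["data"], shape=(3, 120, 120))])
    # Keep only the chosen GPU's copy of each blob and strip its "gpu_N/" prefix.
    for param in [str(p) for p in model.params] + extra_params:
        if gpu_str in param:
            blob = workspace.FetchBlob(param)
            shape = blob.shape
            tmp_param = param.replace(gpu_str, '')
            op = core.CreateOperator("GivenTensorFill", [], [tmp_param],
                                     arg=[utils.MakeArgument("shape", shape),
                                          utils.MakeArgument("values", blob)])
            init_net.op.extend([op])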

    with open(INIT_NET, 'wb') as f:
        f.write(init_net.SerializeToString())
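
The selection-and-rename step in save_model can be looked at in isolation. The sketch below uses a hypothetical helper, select_gpu_params, and made-up blob names. It matches the prefix with startswith, a slightly stricter variant of the substring test used above.

```python
def select_gpu_params(blob_names, gpu_num):
    """Map each blob of one GPU to its name without the 'gpu_N/' prefix."""
    prefix = 'gpu_{}/'.format(gpu_num)
    return {name: name[len(prefix):]
            for name in blob_names
            if name.startswith(prefix)}


# Hypothetical workspace contents after training on several GPUs.
names = ['gpu_0/conv1_w', 'gpu_0/conv1_bn_rm', 'gpu_1/conv1_w', 'gpu_10/conv1_w']

print(select_gpu_params(names, 0))
print(select_gpu_params(names, 1))
```

Since every GPU holds a copy of each parameter, saving a single GPU's copy under the unprefixed name is what lets a single-device prediction model find the blobs later.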

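4 When saving, only init_net.pb, which holds the values of all parameters, was written. predict_net.pb, which holds the model structure, was not saved, because loading it raised a UnicodeEncodeError that I have not yet resolved. So for loading, I rewrote the model definition. Pay particular attention to the BN layers in this rewritten model: is_test must be set to True. The rewritten model keeps only the path from the data layer to the softmax output.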
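
A minimal sketch of that loading path, under several assumptions: the network in build_predict_model is a hypothetical toy, not the model actually trained in this post, and its layer names (conv1, conv1_bn, fc) only work if the same names were used during training, so that the blobs restored from init_net.pb line up. The function names are also made up. Caffe2 is imported inside each function so the definitions load on a machine without Caffe2.

```python
def load_params(init_net_path):
    """Run the saved init net so its parameter blobs appear in the workspace."""
    from caffe2.proto import caffe2_pb2
    from caffe2.python import workspace

    init_net = caffe2_pb2.NetDef()
    with open(init_net_path, 'rb') as f:
        init_net.ParseFromString(f.read())
    workspace.RunNetOnce(init_net)


def build_predict_model(num_classes):
    """Rebuild the data-to-softmax path (hypothetical toy architecture)."""
    from caffe2.python import brew, model_helper

    # init_params=False: parameters come from the workspace, not from fillers.
    model = model_helper.ModelHelper(name='predict', init_params=False)
    brew.conv(model, 'data', 'conv1', dim_in=3, dim_out=16, kernel=3)
    # is_test=True makes BN use the saved _rm / _riv running statistics.
    brew.spatial_bn(model, 'conv1', 'conv1_bn', 16, is_test=True)
    brew.relu(model, 'conv1_bn', 'conv1_bn')
    # A 3x3 convolution without padding turns 120x120 into 118x118.
    brew.fc(model, 'conv1_bn', 'fc', dim_in=16 * 118 * 118, dim_out=num_classes)
    brew.softmax(model, 'fc', 'softmax')
    return model


def predict(model, image):
    """image: float32 array of shape (N, 3, 120, 120)."""
    from caffe2.python import workspace

    workspace.FeedBlob('data', image)
    workspace.RunNetOnce(model.net)
    return workspace.FetchBlob('softmax')
```

load_params has to be called before predict, so that the weights and BN statistics are already in the workspace when the rebuilt net runs.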
