Training a model with Keras and running it through TensorFlow's C++ API

              A workable approach to deploying a Keras-trained model on a C++ platform

I. Background:

        A few days ago my company asked me to migrate a deep-learning project to a C++ platform, so that it could be embedded as a submodule of the company's larger C++ project. During the research phase many people, myself included, prefer Keras: the code is concise and very easy to pick up, which makes deep-learning programming pleasant. TensorFlow, for all the ecosystem Google has built around it and the many language bindings it offers, still makes you define a graph first and then run it, a style I have never grown comfortable with; I suspect I am not alone. My earlier work had already produced a usable model trained with Keras on the Python side, but Keras provides no C++ API, so it looked like I would have to retrain the model in another framework such as Caffe2, TensorFlow, Caffe, or PyTorch and then run it through that framework's C++ API. I tried Caffe2 first: the environment setup looks simple on paper but throws all kinds of errors (configuring it is a trap in itself), and the official API documentation is so thin it may as well not exist, so you can only follow its few demos, which makes it painful to learn. Caffe2 and TensorFlow behave the same way on the Python side (define the graph, then run it), so neither was comfortable for training. I used to be reasonably fluent in Caffe, but using it would also mean retraining from scratch. What I really wanted was to take the model Keras had already trained and use it directly. After a few days of tinkering I got exactly that working, so without further ado, here is the whole process.

Plus, the related code can be found on my Baidu Cloud drive; the link is below:

Link: https://pan.baidu.com/s/1FHRTC1NZRSyaMuVBMGUihA  Password: h7x8

II. Pipeline:

        The complete workflow: train in Keras -> "my_model.h5" -> convert to a .pb graph -> "my_model.pb" -> load and run the model on the C++ side through the TensorFlow C++ API

III. Required environment setup:

My setup: GTX 1080, Ubuntu 16.04

(1) A Python environment with TensorFlow and Keras; creating it with Anaconda is recommended (a quick version check follows the commands below)

  • conda create -n keras python=3.6
  • source activate keras
  • conda install tensorflow-gpu==1.8
  • pip install keras==2.1
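
Once the environment is active, it is worth confirming that the versions match what the rest of this article assumes (TensorFlow 1.8, Keras 2.1, and a visible GPU). A minimal check, run inside the keras environment:

# This only prints versions; it changes nothing.
import tensorflow as tf
import keras

print("tensorflow:", tf.__version__)           # expected: 1.8.x
print("keras:", keras.__version__)             # expected: 2.1.x
print("GPU available:", tf.test.is_gpu_available())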

(2) CUDA 9.0 and cuDNN 7.0.5 (installing CUDA and cuDNN is out of scope here; please look it up yourself)

(3) OpenCV 3.4 (installing OpenCV is out of scope here; please look it up yourself)

(4) The TensorFlow C++ interface, compiled from source, which is covered in the next section

IV. Building and installing the TensorFlow C++ interface (this article builds TensorFlow C++ 1.8)

1. Third-party dependencies needed by the C++ version of TensorFlow

(1) Protobuf!!!!! This one matters most: its version is tightly coupled to the TensorFlow version, and nothing will work if they mismatch. I used 3.5.0.

First download protobuf-cpp-3.5.0.tar.gz from
https://github.com/google/protobuf/releases
then extract it, which gives you a protobuf-3.5.0 folder
cd protobuf-3.5.0
./configure
make -j8
make check -j8
sudo make install
sudo ldconfig
The steps above build and install Protobuf from source.
If you run into problems, check the official Protobuf build-and-install guide:
https://github.com/google/protobuf/blob/master/src/README.md
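
One rough cross-check from the Python side (a heuristic of mine, not a guarantee): the protobuf runtime that the Python tensorflow package pulls in is normally in the same 3.5.x series that a TensorFlow 1.8 C++ build expects, so printing it can catch a gross mismatch early:

# Run inside the keras environment; only a loose sanity check of the protobuf series.
import google.protobuf
print(google.protobuf.__version__)   # for TensorFlow 1.8 this is typically 3.5.x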

(2) Eigen, a C++ matrix library. You only need to download the archive and extract it to a path you will remember.

First download the Eigen archive
wget http://bitbucket.org/eigen/eigen/get/3.3.4.tar.bz2
Extract it, rename the folder to eigen3, and put it somewhere convenient
I keep mine under ~/tools/tf-C/

2. Building and installing TensorFlow

(1) Download and install the build tool Bazel

First download the Bazel installer from
https://github.com/bazelbuild/bazel/releases — I used bazel-0.10.1-installer-linux-x86_64.sh
Then run the installer
./bazel-0.10.1-installer-linux-x86_64.sh

(2) Build and install TensorFlow; my source tree lives at ~/tensorflow

# First clone the tensorflow source
git clone --recursive https://github.com/tensorflow/tensorflow

# Enter the tensorflow folder
cd tensorflow

# Switch to the 1.8 branch:
git checkout r1.8

# Run configure
sudo ./configure
This step asks for your Python path and walks you through a series of y/N choices.
My suggestions:

  • For the Python path, use the keras environment created earlier: /home/xxx/anaconda2/envs/keras/bin/python
  • Answer y to the first y/N question and N to the rest of that group
  • Answer y for CUDA support; the cuDNN version is then detected automatically
  • Keep the default 1.3 for NCCL
  • For everything after that, answer N or accept the default

For details, see the notes on the TensorFlow site:
https://www.tensorflow.org/install/install_sources

# Build with bazel; --config=monolithic is there to work around the conflict with OpenCV
sudo bazel build --config=opt --config=cuda --config=monolithic //tensorflow:libtensorflow_cc.so

....then a long wait while everything compiles, roughly 20 minutes on my machine
# When it finishes you should see something like the following, which means the build succeeded:
....
Target //tensorflow:libtensorflow_cc.so up-to-date:
  bazel-bin/tensorflow/libtensorflow_cc.so
INFO: Elapsed time: 1192.883s, Critical Path: 174.02s
INFO: 654 processes: 654 local.
INFO: Build completed successfully, 656 total actions

# Then copy the required .h headers and the generated .so shared libraries to a few fixed locations:
sudo mkdir /usr/local/include/tf
sudo cp -r bazel-genfiles/ /usr/local/include/tf/
sudo cp -r tensorflow /usr/local/include/tf/
sudo cp -r third_party /usr/local/include/tf/
sudo cp bazel-bin/tensorflow/libtensorflow_cc.so /usr/local/lib/
sudo cp bazel-bin/tensorflow/libtensorflow_framework.so /usr/local/lib

OK, that's it: the TensorFlow C++ interface is ready!

V. Demonstrating the whole pipeline

1. On the Python side, train an MNIST handwritten-digit demo with Keras. The code is below; after training it produces a model file named my_model_ep20.h5.

from tensorflow.examples.tutorials.mnist import input_data
from keras.models import *
from keras.layers import *
import numpy as np

# os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# Load the data set
mnist=input_data.read_data_sets("MNIST_data/",one_hot=True)
train_X=mnist.train.images
train_Y=mnist.train.labels
test_X=mnist.test.images
test_Y=mnist.test.labels

train_X=train_X.reshape((55000,28,28,1))
test_X=test_X.reshape((test_X.shape[0],28,28,1))

print("type of train_X:",type(train_X))
print("size of train_X:",np.shape(train_X))
print("train_X:",train_X)

print("type of train_Y:",type(train_Y))
print("size of train_Y:",np.shape(train_Y))
print("train_Y:",train_Y)

print("num of test:",test_X.shape[0])


# Define the model architecture
model=Sequential()

model.add(Conv2D(32,(3,3),activation='relu',input_shape=(28,28,1),padding="same"))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.5))


model.add(Conv2D(64, (3, 3), activation='relu',padding="same"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))

model.add(Conv2D(128, (3, 3), activation='relu',padding="same"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))

model.add(Flatten())
model.add(Dense(625,activation="relu"))
model.add(Dropout(0.5))

model.add(Dense(10,activation='softmax'))

model.compile(loss='categorical_crossentropy',optimizer='adadelta',metrics=['accuracy'])

# Train the model
epochs=20
model.fit(train_X, train_Y, batch_size=32, epochs=epochs)

# Evaluate the model's accuracy on the test set
accuracy=model.evaluate(test_X,test_Y,batch_size=20)
print('\nTest accuracy:',accuracy[1])

save_model(model,'my_model_ep{}.h5'.format(epochs))

2. Convert my_model_ep20.h5 into my_model_ep20.pb using the script h5_to_pb.py:

from keras.models import load_model
import tensorflow as tf
from keras import backend as K
from tensorflow.python.framework import graph_io

def freeze_session(session, keep_var_names=None, output_names=None, clear_devices=True):
    from tensorflow.python.framework.graph_util import convert_variables_to_constants
    graph = session.graph
    with graph.as_default():
        freeze_var_names = list(set(v.op.name for v in tf.global_variables()).difference(keep_var_names or []))
        output_names = output_names or []
        output_names += [v.op.name for v in tf.global_variables()]
        input_graph_def = graph.as_graph_def()
        if clear_devices:
            for node in input_graph_def.node:
                node.device = ""
        frozen_graph = convert_variables_to_constants(session, input_graph_def,
                                                      output_names, freeze_var_names)
        return frozen_graph


"""----------------------------------配置路径-----------------------------------"""
epochs=20
h5_model_path='./my_model_ep{}.h5'.format(epochs)
output_path='.'
pb_model_name='my_model_ep{}.pb'.format(epochs)


"""----------------------------------导入keras模型------------------------------"""
K.set_learning_phase(0)
net_model = load_model(h5_model_path)

print('input is :', net_model.input.name)
print ('output is:', net_model.output.name)

"""----------------------------------保存为.pb格式------------------------------"""
sess = K.get_session()
frozen_graph = freeze_session(sess, output_names=[net_model.output.op.name])
graph_io.write_graph(frozen_graph, output_path, pb_model_name, as_text=False)
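
The C++ program later on needs the exact input and output node names (in my case conv2d_1_input and dense_2/Softmax, as printed by the two print statements above). To double-check them against the frozen graph itself, here is a minimal sketch that simply lists every node of the generated .pb:

import tensorflow as tf

# Read the frozen graph and print all node names; the input placeholder and
# the final Softmax node are the two names needed later on the C++ side.
graph_def = tf.GraphDef()
with open('my_model_ep20.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())
for node in graph_def.node:
    print(node.op, node.name)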

3. After the conversion, test whether the two models behave identically; if they don't, the conversion failed.

To test the my_model_ep20.h5 model, I wrote load_h5_test.py:

import os
import cv2
import numpy as np
from keras.models import load_model

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

"""---------载入已经训练好的模型---------"""
new_model = load_model('my_model_ep20.h5')

"""---------用opencv载入一张待测图片-----"""
# 载入图片
src = cv2.imread('Pictures/6.png')
cv2.imshow("test picture", src)

# Convert the image to a 28*28 grayscale image
src = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
dst = cv2.resize(src, (28, 28))
dst=dst.astype(np.float32)

# Turn the grayscale image into a 1x28x28x1 array the network can accept
picture=1-dst/255
picture=np.reshape(picture,(1,28,28,1))

# Run the prediction
y = new_model.predict(picture)
print("softmax:")
for i,prob in enumerate(y[0]):
    print("class{},Prob:{}".format(i,prob))
result = np.argmax(y)
print("你写的数字是:", result)
print("对应的概率是:",np.max(y[0]))
cv2.waitKey(20170731)

The output looks like this:

To test the my_model_ep20.pb model, I wrote load_pb_test.py:

import tensorflow as tf
import numpy as np
import cv2

"""-----------------------------------------------定义识别函数-----------------------------------------"""
def recognize(jpg_path, pb_file_path):
    with tf.Graph().as_default():
        output_graph_def = tf.GraphDef()

        # Open the .pb model
        with open(pb_file_path, "rb") as f:
            output_graph_def.ParseFromString(f.read())
            tensors = tf.import_graph_def(output_graph_def, name="")
            print("tensors:",tensors)

        # Run a forward pass inside a session
        with tf.Session() as sess:
            init = tf.global_variables_initializer()
            sess.run(init)

            op = sess.graph.get_operations()

            # Print the operations present in the graph
            for i,m in enumerate(op):
                print('op{}:'.format(i),m.values())

            input_x = sess.graph.get_tensor_by_name("conv2d_1_input:0")  # the exact name is the input.name printed by the conversion script
            print("input_X:",input_x)

            out_softmax = sess.graph.get_tensor_by_name("dense_2/Softmax:0")  # the exact name is the output.name printed by the conversion script
            print("Output:",out_softmax)

            # Read the image
            img = cv2.imread(jpg_path, 0)
            img=cv2.resize(img,(28,28))
            img=img.astype(np.float32)
            img=1-img/255
            # img=np.reshape(img,(1,28,28,1))
            print("img data type:",img.dtype)

            # Print the image contents
            for row in range(28):
                for col in range(28):
                    if col!=27:
                        print(img[row][col],' ',end='')
                    else:
                        print(img[row][col])

            img_out_softmax = sess.run(out_softmax,
                                       feed_dict={input_x: np.reshape(img,(1,28,28,1))})

            print("img_out_softmax:", img_out_softmax)
            for i,prob in enumerate(img_out_softmax[0]):
                print('class {} prob:{}'.format(i,prob))
            prediction_labels = np.argmax(img_out_softmax, axis=1)
            print("Final class if:",prediction_labels)
            print("prob of label:",img_out_softmax[0,prediction_labels])


pb_path = './my_model_ep20.pb'
img = 'Pictures/6.png'
recognize(img, pb_path)

The output looks like this: it essentially matches the h5 model's result above, so the conversion is fine.
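
If you prefer a stricter check than comparing the two printouts by eye, you can feed exactly the same array to both models and compare the softmax vectors numerically. A small sketch, assuming the model files and the test image sit at the paths used above:

import numpy as np
import cv2
import tensorflow as tf
from keras.models import load_model

# Build the same 1x28x28x1 input used by both test scripts
img = cv2.imread('Pictures/6.png', 0)
img = 1 - cv2.resize(img, (28, 28)).astype(np.float32) / 255
x = np.reshape(img, (1, 28, 28, 1))

# Prediction from the h5 model
y_h5 = load_model('my_model_ep20.h5').predict(x)

# Prediction from the frozen pb model
graph_def = tf.GraphDef()
with open('my_model_ep20.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())
with tf.Graph().as_default():
    tf.import_graph_def(graph_def, name="")
    with tf.Session() as sess:
        y_pb = sess.run('dense_2/Softmax:0',
                        feed_dict={'conv2d_1_input:0': x})

# The two softmax vectors should agree up to floating-point tolerance
print("max abs diff:", np.abs(y_h5 - y_pb).max())
print("consistent:", np.allclose(y_h5, y_pb, atol=1e-5))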

4. Call the my_model_ep20.pb model from C++. I wrote a hello.cpp, shown below:

#include <fstream>
#include <utility>
#include <Eigen/Core>
#include <Eigen/Dense>
#include <iostream>

#include "tensorflow/cc/ops/const_op.h"
#include "tensorflow/cc/ops/image_ops.h"
#include "tensorflow/cc/ops/standard_ops.h"

#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/framework/tensor.h"

#include "tensorflow/core/graph/default_device.h"
#include "tensorflow/core/graph/graph_def_builder.h"

#include "tensorflow/core/lib/core/errors.h"
#include "tensorflow/core/lib/core/stringpiece.h"
#include "tensorflow/core/lib/core/threadpool.h"
#include "tensorflow/core/lib/io/path.h"
#include "tensorflow/core/lib/strings/stringprintf.h"

#include "tensorflow/core/public/session.h"
#include "tensorflow/core/util/command_line_flags.h"

#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/platform/init_main.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/types.h"

#include "opencv2/opencv.hpp"

using namespace tensorflow::ops;
using namespace tensorflow;
using namespace std;
using namespace cv;
using tensorflow::Flag;
using tensorflow::Tensor;
using tensorflow::Status;
using tensorflow::string;
using tensorflow::int32 ;

// Helper that converts an OpenCV Mat into a Tensor. On the Python side, the
// matrix that cv2.imread returns only needs an np.reshape before it can be fed
// to the network, i.e. it already behaves like a tensor. In C++ the graph's
// input has to be a tensorflow::Tensor, so when the image is read with OpenCV
// as a Mat we need a way to copy that Mat into a Tensor.
void CVMat_to_Tensor(Mat img,Tensor* output_tensor,int input_rows,int input_cols)
{
    //imshow("input image",img);
    // Resize the image to the network's input size
    resize(img,img,cv::Size(input_cols,input_rows));
    //imshow("resized image",img);

    // Normalize to [0,1]
    img.convertTo(img,CV_32FC1);
    img=img/255;

    // Get a pointer to the tensor's underlying data buffer
    float *p = output_tensor->flat<float>().data();

    // Wrap that buffer in a Mat bound to the tensor's memory: writing into this Mat writes directly into the tensor
    cv::Mat tempMat(input_rows, input_cols, CV_32FC1, p);
    img.convertTo(tempMat,CV_32FC1);

//    waitKey(0);

}

int main(int argc, char** argv )
{
    /*-------------------------------- Key configuration ------------------------------*/
    string model_path="../my_model_ep20.pb";
    string image_path="../test_images/6.png";
    int input_height =28;
    int input_width=28;
    string input_tensor_name="conv2d_1_input";
    string output_tensor_name="dense_2/Softmax";

    /*-------------------------------- Create the session ------------------------------*/
    Session* session;
    Status status = NewSession(SessionOptions(), &session); // create a new Session

    /*-------------------------------- Load the model from the .pb file --------------------------------*/
    GraphDef graphdef; //Graph Definition for current model

    Status status_load = ReadBinaryProto(Env::Default(), model_path, &graphdef); // read the graph definition from the .pb file
    if (!status_load.ok()) {
        cout << "ERROR: Loading model failed..." << model_path << std::endl;
        cout << status_load.ToString() << "\n";
        return -1;
    }
    Status status_create = session->Create(graphdef); // import the graph into the session
    if (!status_create.ok()) {
        cout << "ERROR: Creating graph in session failed..." << status_create.ToString() << std::endl;
        return -1;
    }
    cout << "<----Successfully created session and load graph.------->"<< endl;

    /*--------------------------------- Load the test image -------------------------------------*/
    cout<<endl<<"<------------loading test_image-------------->"<<endl;
    Mat img=imread(image_path,0);
    if(img.empty())
    {
        cout<<"can't open the image!!!!!!!"<<endl;
        return -1;
    }

    // Create the tensor that will be fed to the network input
    Tensor resized_tensor(DT_FLOAT, TensorShape({1,input_height,input_width,1}));

    // Copy the OpenCV Mat image into the tensor
    CVMat_to_Tensor(img,&resized_tensor,input_height,input_width);

    cout << resized_tensor.DebugString()<<endl;

    /*----------------------------------- Run the network on the test image -----------------------------------------*/
    cout<<endl<<"<-------------Running the model with test_image--------------->"<<endl;
    // Forward pass; the outputs always come back as a vector of tensors
    vector<tensorflow::Tensor> outputs;
    string output_node = output_tensor_name;
    Status status_run = session->Run({{input_tensor_name, resized_tensor}}, {output_node}, {}, &outputs);

    if (!status_run.ok()) {
        cout << "ERROR: RUN failed..."  << std::endl;
        cout << status_run.ToString() << "\n";
        return -1;
    }
    // Extract the output values
    cout << "Output tensor size:" << outputs.size() << std::endl;
    for (std::size_t i = 0; i < outputs.size(); i++) {
        cout << outputs[i].DebugString()<<endl;
    }

    Tensor t = outputs[0];                   // Fetch the first tensor
    auto tmap = t.tensor<float, 2>();        // Tensor Shape: [batch_size, target_class_num]
    int output_dim = t.shape().dim_size(1);  // Get the target_class_num from 1st dimension

    // Argmax: Get Final Prediction Label and Probability
    int output_class_id = -1;
    double output_prob = 0.0;
    for (int j = 0; j < output_dim; j++)
    {
        cout << "Class " << j << " prob:" << tmap(0, j) << "," << std::endl;
        if (tmap(0, j) >= output_prob) {
            output_class_id = j;
            output_prob = tmap(0, j);
        }
    }

    // Print the result
    cout << "Final class id: " << output_class_id << std::endl;
    cout << "Final class prob: " << output_prob << std::endl;

    return 0;
}


Then we need a CMakeLists.txt, as follows:

cmake_minimum_required(VERSION 3.10)
project(demo)
set(CMAKE_CXX_STANDARD 11)

# Directories that contain needed .so files; /usr/local/lib is searched by default,
# .so files that live anywhere else have to be added like this
link_directories(/home/czj/anaconda2/envs/tf/lib)

# Directories that contain needed .h headers; /usr/local/include is searched by default,
# headers that live anywhere else have to be added like this
include_directories(
        /home/czj/tensorflow
        /home/czj/tensorflow/bazel-genfiles
        /home/czj/tensorflow/bazel-bin/tensorflow
        /home/czj/tools/tf-C/eigen3
)

# Build an executable named hello from the source file hello.cpp
add_executable(hello hello.cpp)

# Let CMake locate OpenCV
find_package(OpenCV REQUIRED)

# Link the executable against the required shared libraries:
# libtensorflow_cc.so, libtensorflow_framework.so and the OpenCV libs
target_link_libraries(hello tensorflow_cc tensorflow_framework ${OpenCV_LIBS})

Run the following steps to build the C++ code with CMake, produce the executable, and run it:

# First create a build folder
mkdir build
cd build
# Run cmake inside build to generate the Makefile
cmake ..
# Run make to produce the executable
make
# Run the executable
./hello

The output of the C++ executable is as follows: it essentially matches the results from the Python code above, which means the whole pipeline works.

Reposted from blog.csdn.net/qq_25109263/article/details/81285952