MXNet is one of the more popular machine learning frameworks these days, and it is pleasant to work with, but day to day most programs are written against its Python API. This post walks through building MXNet from source on Linux and programming against its C++ API.
The environment used here is Ubuntu 14.04 with g++ 4.8; other Unix-like systems (Fedora, macOS) work the same way.
Goals
- Build libmxnet.a and libmxnet.so (CPU-only in this post)
- Link against the shared library and call the C++ API
Build dependencies
Minimum requirements:
- A recent compiler with C++11 support
- A BLAS library (e.g. libblas, ATLAS, OpenBLAS) and OpenCV
Optional:
- CUDA Toolkit >= 7.0 (NVIDIA GPU with compute capability >= 2.0)
- cuDNN >= 3, to accelerate GPU computation
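A quick way to check these prerequisites (the version threshold matches this post's setup; nvcc is only relevant for the optional GPU build):

```
g++ --version    # C++11 needs g++ >= 4.8
nvcc --version   # only for the GPU build
```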
Build steps
Install the dependencies:
```
sudo apt-get update
sudo apt-get install -y build-essential git libatlas-base-dev libopencv-dev
```
For the GPU build you additionally need to install CUDA and cuDNN.
Edit the configuration:
Changes in config.mk:
Required; this generates the op.h file for the C++ package:

```
USE_CPP_PACKAGE=1
```
Optional, CUDA-related:

```
USE_CUDA = 1
USE_CUDA_PATH = /usr/local/cuda  # adjust to your actual directory
USE_CUDNN = 1
```
Build MXNet:
```
git clone --recursive https://github.com/dmlc/mxnet
cd mxnet
cp make/config.mk .
make -j4
```
When the build finishes, libmxnet.a and libmxnet.so appear in the lib directory under the MXNet root.
Notes:
- Be sure to fetch the code with git --recursive: much of the tree consists of submodules that are downloaded recursively, so grabbing the zip from the GitHub page leaves the code incomplete and missing many libraries (if you already cloned without the flag, see the command after these notes).
- If the build aborts with internal compiler error: Killed (program cc1plus), the machine ran out of memory; the fix (see stackoverflow) is to lower the make -j parallelism or add swap space.
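A clone made without --recursive can be repaired in place instead of re-cloning:

```
git submodule update --init --recursive
```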
Writing the program
With the static and shared libraries built, set up a C++ project, pull in the MXNet headers and the shared library, and write a test.
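As a first sanity check that the headers and library are wired up, a minimal sketch along these lines should compile, link, and print a 2x3 array of ones (the file name smoke_test.cpp and the fill value are arbitrary choices here, not part of the original setup):

```cpp
// smoke_test.cpp: verify that the mxnet-cpp headers and libmxnet link correctly
#include <iostream>
#include "mxnet-cpp/MxNetCpp.h"

using namespace mxnet::cpp;

int main() {
  NDArray a(Shape(2, 3), Context::cpu());
  a = 1.0f;                     // fill every element with 1
  NDArray::WaitAll();           // mxnet executes asynchronously; sync before reading
  std::cout << a << std::endl;  // cpp-package provides operator<< for NDArray
  MXNotifyShutdown();
  return 0;
}
```

Build it with the same include and link flags as the CMake project below.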
Here we run a classic machine-learning example, MNIST handwritten digit recognition; download the MNIST dataset in advance (link: mnist).
Project layout:
mxnet_cpp_test project directory:
```
.
├── CMakeLists.txt
└── src
    └── main.cpp
```
CMakeLists.txt
```cmake
cmake_minimum_required(VERSION 2.8)
project(mxnet_cpp_test)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -std=c++11 -W")
include_directories(
    /home/user/codetest/mxnet/include
    /home/user/codetest/mxnet/dmlc-core/include
    /home/user/codetest/mxnet/nnvm/include
    /home/user/codetest/mxnet/cpp-package/include
)
aux_source_directory(./src DIR_SRCS)
link_directories(/home/user/codetest/mxnet/lib)
add_executable(mxnet_cpp_test ${DIR_SRCS})
target_link_libraries(mxnet_cpp_test mxnet)
```
main.cpp
```cpp
#include <chrono>
#include "mxnet-cpp/MxNetCpp.h"

using namespace std;
using namespace mxnet::cpp;

Symbol mlp(const vector<int> &layers) {
  auto x = Symbol::Variable("X");
  auto label = Symbol::Variable("label");

  vector<Symbol> weights(layers.size());
  vector<Symbol> biases(layers.size());
  vector<Symbol> outputs(layers.size());

  for (size_t i = 0; i < layers.size(); ++i) {
    weights[i] = Symbol::Variable("w" + to_string(i));
    biases[i] = Symbol::Variable("b" + to_string(i));
    Symbol fc = FullyConnected(
        i == 0 ? x : outputs[i - 1],  // data
        weights[i],
        biases[i],
        layers[i]);
    outputs[i] = i == layers.size() - 1
                     ? fc
                     : Activation(fc, ActivationActType::kRelu);
  }

  return SoftmaxOutput(outputs.back(), label);
}

int main(int argc, char** argv) {
  const int image_size = 28;
  const vector<int> layers{128, 64, 10};
  const int batch_size = 100;
  const int max_epoch = 10;
  const float learning_rate = 0.1;
  const float weight_decay = 1e-2;

  auto train_iter = MXDataIter("MNISTIter")
      .SetParam("image", "./mnist_data/train-images.idx3-ubyte")
      .SetParam("label", "./mnist_data/train-labels.idx1-ubyte")
      .SetParam("batch_size", batch_size)
      .SetParam("flat", 1)
      .CreateDataIter();
  auto val_iter = MXDataIter("MNISTIter")
      .SetParam("image", "./mnist_data/t10k-images.idx3-ubyte")
      .SetParam("label", "./mnist_data/t10k-labels.idx1-ubyte")
      .SetParam("batch_size", batch_size)
      .SetParam("flat", 1)
      .CreateDataIter();

  auto net = mlp(layers);

  Context ctx = Context::cpu();  // Use CPU for training
  //Context ctx = Context::gpu();

  std::map<string, NDArray> args;
  args["X"] = NDArray(Shape(batch_size, image_size*image_size), ctx);
  args["label"] = NDArray(Shape(batch_size), ctx);
  // Let MXNet infer shapes of other parameters such as weights
  net.InferArgsMap(ctx, &args, args);

  // Initialize all parameters with uniform distribution U(-0.01, 0.01)
  auto initializer = Uniform(0.01);
  for (auto& arg : args) {
    // arg.first is parameter name, and arg.second is the value
    initializer(arg.first, &arg.second);
  }

  // Create sgd optimizer
  Optimizer* opt = OptimizerRegistry::Find("sgd");
  opt->SetParam("rescale_grad", 1.0 / batch_size)
     ->SetParam("lr", learning_rate)
     ->SetParam("wd", weight_decay);

  // Create executor by binding parameters to the model
  auto *exec = net.SimpleBind(ctx, args);
  auto arg_names = net.ListArguments();

  // Start training
  for (int iter = 0; iter < max_epoch; ++iter) {
    int samples = 0;
    train_iter.Reset();

    auto tic = chrono::system_clock::now();
    while (train_iter.Next()) {
      samples += batch_size;
      auto data_batch = train_iter.GetDataBatch();
      // Set data and label
      data_batch.data.CopyTo(&args["X"]);
      data_batch.label.CopyTo(&args["label"]);

      // Compute gradients
      exec->Forward(true);
      exec->Backward();
      // Update parameters
      for (size_t i = 0; i < arg_names.size(); ++i) {
        if (arg_names[i] == "X" || arg_names[i] == "label") continue;
        opt->Update(i, exec->arg_arrays[i], exec->grad_arrays[i]);
      }
    }
    auto toc = chrono::system_clock::now();

    Accuracy acc;
    val_iter.Reset();
    while (val_iter.Next()) {
      auto data_batch = val_iter.GetDataBatch();
      data_batch.data.CopyTo(&args["X"]);
      data_batch.label.CopyTo(&args["label"]);
      // Forward pass is enough as no gradient is needed when evaluating
      exec->Forward(false);
      acc.Update(data_batch.label, exec->outputs[0]);
    }
    float duration = chrono::duration_cast<chrono::milliseconds>(toc - tic).count() / 1000.0;
    LG << "Epoch: " << iter << " " << samples / duration
       << " samples/sec Accuracy: " << acc.Get();
  }

  delete exec;
  MXNotifyShutdown();
  return 0;
}
```
Use cmake to generate the Makefile and build; linking pulls in libmxnet.a (or libmxnet.so), and in the cmake build directory you can see that libmxnet.so gets copied in. You also need to copy the MNIST data into the build directory so the program can load it for training.
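Concretely, the build and run steps might look like this (an out-of-source build is assumed, and /path/to/mnist_data stands for wherever you unpacked the dataset):

```
mkdir build && cd build
cmake ..
make
cp -r /path/to/mnist_data ./mnist_data   # the program reads ./mnist_data
./mxnet_cpp_test
```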
Layout of the build directory:
```
.
├── libmxnet.so
├── Makefile
├── mnist_data
│   ├── t10k-images.idx3-ubyte
│   ├── t10k-labels.idx1-ubyte
│   ├── train-images.idx3-ubyte
│   └── train-labels.idx1-ubyte
├── mxnet_cpp_test
```
Output:
```
MNISTIter: load 60000 images, shuffle=1, shape=(100,784)
MNISTIter: load 10000 images, shuffle=1, shape=(100,784)
Epoch: 0 25178.3 samples/sec Accuracy: 0.1135
Epoch: 1 24340.8 samples/sec Accuracy: 0.536
Epoch: 2 25031.3 samples/sec Accuracy: 0.8278
Epoch: 3 25466.9 samples/sec Accuracy: 0.8729
Epoch: 4 25370 samples/sec Accuracy: 0.9042
Epoch: 5 23819 samples/sec Accuracy: 0.9159
Epoch: 6 24067.4 samples/sec Accuracy: 0.9229
Epoch: 7 26513.5 samples/sec Accuracy: 0.9287
Epoch: 8 25575.4 samples/sec Accuracy: 0.9335
Epoch: 9 25619.1 samples/sec Accuracy: 0.9371
```
You can really feel how fast an MXNet training program written directly in C++ runs.