[Deployment] MNN inference

[Reference] Loading and calling an Alibaba MNN model from C++ under Ubuntu (Zhihu)

This article summarizes the whole process in the following 6 steps: (1) create an Interpreter; (2) set up the scheduling configuration ScheduleConfig; (3) set up the backend configuration BackendConfig; (4) create a session; (5) feed in the input data; (6) run the session and read the output (post-processing).
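
In code form, the skeleton of these six steps looks as follows. This is only a sketch: each step is expanded in the sections below, and the model path and tensor names are placeholders.

// Sketch of the six steps; every call is covered in detail later in this article.
std::shared_ptr<MNN::Interpreter> net(
    MNN::Interpreter::createFromFile("model.mnn"));       // (1) create Interpreter
MNN::ScheduleConfig config;                               // (2) scheduling configuration
MNN::BackendConfig backendConfig;                         // (3) backend configuration
config.backendConfig = &backendConfig;
auto session = net->createSession(config);                // (4) create session
auto input   = net->getSessionInput(session, nullptr);    // (5) fill the input tensor
// ... copy image data into input (see section 2.1) ...
net->runSession(session);                                 // (6) run the session ...
auto output  = net->getSessionOutput(session, nullptr);   //     ... and read the output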

1. Classes

1.1 Interpreter

The Interpreter class is the main interface for loading and running models: it provides the functions to load, configure, and execute a neural network.

1.1.1 Create

// There is only one constructor
    Interpreter(Content* net);
// The copy/move constructors and copy/move assignment operators are deleted.
    Interpreter(const Interpreter&)  = delete;
    Interpreter(const Interpreter&&) = delete;
    Interpreter& operator=(const Interpreter&) = delete;
    Interpreter& operator=(const Interpreter&&) = delete;

The underlying net can be obtained either from a file on disk or from an in-memory buffer:

static Interpreter* createFromFile(const char* file);
static Interpreter* createFromBuffer(const void* buffer, size_t size);

An example is as follows:

    // 1. Create the Interpreter from a file on disk: static Interpreter* createFromFile(const char* file);
    std::shared_ptr<Interpreter> net(Interpreter::createFromFile(model_name));
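
If the model is already held in memory (for example, bundled as a resource), createFromBuffer can be used instead. A minimal sketch, assuming model_name is the path used above:

    // Sketch: read the .mnn file into a buffer, then create the Interpreter from it.
    // Requires <fstream> and <vector>.
    std::ifstream in(model_name, std::ios::binary | std::ios::ate);
    std::vector<char> buffer(static_cast<size_t>(in.tellg()));
    in.seekg(0);
    in.read(buffer.data(), buffer.size());
    std::shared_ptr<Interpreter> net(
        Interpreter::createFromBuffer(buffer.data(), buffer.size()));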

1.1.2 Sessions

A session is created from the ScheduleConfig described below:

Session* createSession(const ScheduleConfig& config);

Run the session:

net->runSession(session);

Get the input and output tensors:

auto inputTensor = net->getSessionInput(session, input_tensor.c_str());  // pass nullptr for the default input
MNN::Tensor *tensor_scores = net->getSessionOutput(session, nullptr);    // or output_tensor_name0.c_str() for a named output

1.2 ScheduleConfig

Scheduling parameters used to configure the computation graph.

The members that matter most are the degree of parallelism numThread and the type of inference backend:

struct ScheduleConfig {
    /** which tensor should be kept */
    std::vector<std::string> saveTensors;
    /** During inference the primary backend is selected by type; the default is CPU.
    When the primary backend does not support an operator in the model, the fallback
    backend specified by backupType is used. */
    MNNForwardType type = MNN_FORWARD_CPU;
    /** CPU: number of threads in parallel, or GPU: mode setting */
    union {
        int numThread = 4;
        int mode;
    };

    /** subpath to run */
    struct Path {...};
    Path path;

    /** Fallback backend, used to create the execution when the chosen backend does not support an op */
    MNNForwardType backupType = MNN_FORWARD_CPU;

    /** extra backend config */
    BackendConfig* backendConfig = nullptr;
};

An example is as follows:

    MNN::ScheduleConfig config;
    // 2. Scheduling configuration
    // Configuration parameters for task scheduling
    int forward = MNN_FORWARD_CPU;
    // int forward = MNN_FORWARD_OPENCL;
    int threads    = 1;
    // numThread bounds the degree of concurrency, but the actual thread count and
    // parallel efficiency do not depend on numThread alone.
    // The primary backend is selected by type (default CPU); when it does not support
    // an operator in the model, the fallback backend given by backupType is used.
    config.numThread = threads;
    config.type      = static_cast<MNNForwardType>(forward);
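
The same structure applies when a GPU backend is preferred. A sketch, assuming MNN was built with OpenCL support, keeping the CPU as the fallback:

    // Sketch: prefer OpenCL and fall back to CPU for any unsupported operator.
    MNN::ScheduleConfig gpuConfig;
    gpuConfig.type       = MNN_FORWARD_OPENCL;
    gpuConfig.backupType = MNN_FORWARD_CPU;   // used whenever an op has no OpenCL kernel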

1.3 BackendConfig

Backend parameters used to configure the computation graph: memory, power, and precision preferences for the chosen backend. The finished BackendConfig is passed in through the backendConfig member of MNN::ScheduleConfig.

struct BackendConfig {
    enum MemoryMode { Memory_Normal = 0, Memory_High, Memory_Low };

    MemoryMode memory = Memory_Normal;

    enum PowerMode { Power_Normal = 0, Power_High, Power_Low };

    PowerMode power = Power_Normal;

    enum PrecisionMode { Precision_Normal = 0, Precision_High, Precision_Low, Precision_Low_BF16 };

    PrecisionMode precision = Precision_Normal;

    /** user defined context */
    union {
        void* sharedContext = nullptr;
        size_t flags; // Valid for CPU Backend
    };
};
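
A short sketch of wiring a BackendConfig into the ScheduleConfig (the preference values here are illustrative; the full example at the end of this article uses the same ones):

    // Sketch: low precision, default power/memory, attached to the schedule config.
    MNN::BackendConfig backendConfig;
    backendConfig.precision = MNN::BackendConfig::Precision_Low;
    backendConfig.power     = MNN::BackendConfig::Power_Normal;
    backendConfig.memory    = MNN::BackendConfig::Memory_Normal;

    MNN::ScheduleConfig config;
    config.backendConfig = &backendConfig;   // config stores only the pointer, so keep
                                             // backendConfig alive until createSession returns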

1.4 Tensor, one of MNN's basic data types

1.4.1 Data transfer between host and device

bool copyFromHostTensor(const Tensor* hostTensor);
bool copyToHostTensor(Tensor* hostTensor) const;

auto inputTensor = net->getSessionInput(session, input_tensor.c_str());  // or nullptr
inputTensor->copyFromHostTensor(nhwc_Tensor);
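
A round-trip sketch combining both directions: create an NHWC host tensor, copy it into the session input before inference, and copy the output back to the host afterwards. The shape is illustrative.

    // Sketch: host -> backend before inference, backend -> host after inference.
    std::vector<int> dims{1, 24, 24, 3};                        // illustrative NHWC shape
    auto hostInput = MNN::Tensor::create<float>(dims, NULL, MNN::Tensor::TENSORFLOW);
    // ... fill hostInput->host<float>() with image data ...
    auto inputTensor = net->getSessionInput(session, nullptr);
    inputTensor->copyFromHostTensor(hostInput);                 // host -> backend

    net->runSession(session);

    auto outputTensor = net->getSessionOutput(session, nullptr);
    MNN::Tensor hostOutput(outputTensor, outputTensor->getDimensionType());
    outputTensor->copyToHostTensor(&hostOutput);                // backend -> host
    float* result = hostOutput.host<float>();
    delete hostInput;                                           // Tensor::create allocates with new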

2. Core steps

2.1 Image processing

Image processing covers both preprocessing the image and copying it into the input tensor. Preprocessing is relatively simple (a sketch follows below); the focus here is on getting the image into the input tensor.
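
As a reference, a minimal OpenCV preprocessing sketch matching the full example at the end of this article (resize to the network input size, convert to float32, scale to [-1, 1]; the exact normalization depends on how the model was trained):

    // Sketch: resize, convert to float32, normalize pixel values to [-1, 1].
    cv::Mat raw_image = cv::imread(image_name);
    cv::Mat image;
    cv::resize(raw_image, image, cv::Size(INPUT_SIZE, INPUT_SIZE));
    image.convertTo(image, CV_32FC3);
    image = image * 2.0f / 255.0f - 1.0f;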

2.1.1 Put the image into the input tensor

(1) memcpy

    std::vector<int> dims{1, INPUT_SIZE, INPUT_SIZE, 3};
    auto nhwc_Tensor = MNN::Tensor::create<float>(dims, NULL, MNN::Tensor::TENSORFLOW); // TENSORFLOW DimensionType = NHWC layout
    auto nhwc_data   = nhwc_Tensor->host<float>();
    auto nhwc_size   = nhwc_Tensor->size();
    ::memcpy(nhwc_data, image.data, nhwc_size);
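
Note that memcpy copies nhwc_size bytes verbatim, so this only works when image is a continuous CV_32FC3 cv::Mat of exactly INPUT_SIZE x INPUT_SIZE pixels whose channel order already matches what the model expects; otherwise the byte counts (and the NHWC layout assumption) will not line up.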

(2) Using pointers

// Assume inputTensor is the input tensor and inputImage holds the input image data

// Get a pointer to the input tensor's data and query its dimensions
float* inputData = inputTensor->host<float>();
int inputWidth = inputTensor->width();
int inputHeight = inputTensor->height();
int inputChannels = inputTensor->channel();

// Walk over the image pixels and copy them into the input tensor
for (int y = 0; y < inputHeight; ++y) {
    for (int x = 0; x < inputWidth; ++x) {
        for (int c = 0; c < inputChannels; ++c) {
            // Compute the index into the input tensor (NHWC layout)
            int inputIndex = c + x * inputChannels + y * inputWidth * inputChannels;

            // Read the pixel value from the input image
            cv::Vec3b pixel = inputImage.at<cv::Vec3b>(y, x);
            float value = static_cast<float>(pixel[c]);

            // Write the pixel value into the input tensor
            inputData[inputIndex] = value;
        }
    }
}

2.2 Post-processing

Once the output has been obtained, post-processing is specific to the model at hand.

// Get the output tensor
    MNN::Tensor *tensor_scores = net->getSessionOutput(session, nullptr);   // or output_tensor_name0.c_str()

    // Copy the data back to a host tensor before reading it
    MNN::Tensor tensor_scores_host(tensor_scores, tensor_scores->getDimensionType());
    tensor_scores->copyToHostTensor(&tensor_scores_host);
    auto scores_dataPtr = tensor_scores_host.host<float>();
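
For a classifier whose output is raw logits, typical post-processing is a softmax followed by an argmax. A numerically stable sketch for NUM_CLASSES outputs (a hypothetical count; the full example below uses 2), assuming <cmath> and <cstdio> are included:

    // Sketch: numerically stable softmax over the scores, then argmax.
    const int NUM_CLASSES = 2;                            // illustrative class count
    float max_score = scores_dataPtr[0];
    for (int i = 1; i < NUM_CLASSES; ++i)
        if (scores_dataPtr[i] > max_score) max_score = scores_dataPtr[i];

    float probs[NUM_CLASSES];
    float exp_sum = 0.0f;
    for (int i = 0; i < NUM_CLASSES; ++i) {
        probs[i] = expf(scores_dataPtr[i] - max_score);   // subtract max for stability
        exp_sum += probs[i];
    }
    int best = 0;
    for (int i = 0; i < NUM_CLASSES; ++i) {
        probs[i] /= exp_sum;
        if (probs[i] > probs[best]) best = i;
    }
    printf("class %d, probability %.4f\n", best, probs[best]);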

3. Configuration

Usage in Qt

# MNN
# Qt modules used
QT += core

# Project name
TARGET = Test

# C++ standard
CONFIG += c++11

# OpenCV library
LIBS += -lopencv_core -lopencv_highgui -lopencv_imgproc

# MNN library
MNN_DIR = /home/eveing/DL/nlp/llm_deploy/MNN-master
INCLUDEPATH += $$MNN_DIR/include $$MNN_DIR/include/MNN $$MNN_DIR/tools $$MNN_DIR/tools/cpp $$MNN_DIR/source $$MNN_DIR/source/backend $$MNN_DIR/source/core
LIBS += -L$$MNN_DIR/build -lMNN
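
Note that LIBS += -L$$MNN_DIR/build assumes the MNN library has already been compiled into that build directory (the path above is the author's checkout); adjust MNN_DIR to your own environment.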

The full example below is adapted from the Zhihu article referenced above, "Loading and calling an Alibaba MNN model from C++ under Ubuntu":

#include "Backend.hpp"
#include "Interpreter.hpp"
#include "MNNDefine.h"
#include "Interpreter.hpp"
#include "Tensor.hpp"
#include <math.h>
#include <opencv2/opencv.hpp>
#include <iostream>
#include <stdio.h>
using namespace MNN;
using namespace cv;

int main(void)
{
    // Fill in the paths to your own test image and .mnn model file
    std::string image_name = "/home/project/ForwardNet_Test/MNN-master/build/model/33.bmp";
    const char* model_name = "/home/project/ForwardNet_Test/MNN-master/build/model/47.mnn";
    // Configuration parameters for task scheduling
    int forward = MNN_FORWARD_CPU;
    // int forward = MNN_FORWARD_OPENCL;
    int precision  = 2;   // BackendConfig::Precision_Low
    int power      = 0;   // BackendConfig::Power_Normal
    int memory     = 0;   // BackendConfig::Memory_Normal
    int threads    = 1;
    int INPUT_SIZE = 24;

    cv::Mat raw_image    = cv::imread(image_name.c_str());
    //imshow("image", raw_image);
    int raw_image_height = raw_image.rows;
    int raw_image_width  = raw_image.cols;
    cv::Mat image;
    cv::resize(raw_image, image, cv::Size(INPUT_SIZE, INPUT_SIZE));
    // 1. Create the Interpreter from a file on disk: static Interpreter* createFromFile(const char* file);
    std::shared_ptr<Interpreter> net(Interpreter::createFromFile(model_name));
    MNN::ScheduleConfig config;
    // 2. Scheduling configuration.
    // numThread bounds the degree of concurrency, but the actual thread count and
    // parallel efficiency do not depend on numThread alone.
    // The primary backend is selected by type (default CPU); when it does not support
    // an operator in the model, the fallback backend given by backupType is used.
    config.numThread = threads;
    config.type      = static_cast<MNNForwardType>(forward);
    MNN::BackendConfig backendConfig;
    // 3. Backend configuration:
    // memory, power and precision are the memory, power and precision preferences
    backendConfig.precision = (MNN::BackendConfig::PrecisionMode)precision;
    backendConfig.power = (MNN::BackendConfig::PowerMode) power;
    backendConfig.memory = (MNN::BackendConfig::MemoryMode) memory;
    config.backendConfig = &backendConfig;
    // 4. Create the session
    auto session = net->createSession(config);
    net->releaseModel();

    clock_t start = clock();   // start the timer (the elapsed time is not reported below)
    // Preprocessing: convert to float32 and normalize pixel values to [-1, 1]
    image.convertTo(image, CV_32FC3);
    image = image * 2.0f / 255.0f - 1.0f;
    // 5. Input data:
    // wrap the image in an NHWC host tensor (copyFromHostTensor will convert NHWC to
    // the backend's internal layout, e.g. NCHW, as needed)
    std::vector<int> dims{1, INPUT_SIZE, INPUT_SIZE, 3};
    auto nhwc_Tensor = MNN::Tensor::create<float>(dims, NULL, MNN::Tensor::TENSORFLOW);
    auto nhwc_data   = nhwc_Tensor->host<float>();
    auto nhwc_size   = nhwc_Tensor->size();
    ::memcpy(nhwc_data, image.data, nhwc_size);

    std::string input_tensor = "input_image";
    // Get the input tensor and copy the data in. With this copy-based approach the
    // user only needs to care about the layout of the tensor they created;
    // copyFromHostTensor handles any layout conversion (if needed) and any copy
    // between backends (if needed).
    auto inputTensor  = net->getSessionInput(session, input_tensor.c_str());   // or nullptr
    inputTensor->copyFromHostTensor(nhwc_Tensor);

    // 6. Run the session
    net->runSession(session);

    // 7. Get the output
    std::string output_tensor_name0 = "prob/Softmax";
    // Get the output tensor
    MNN::Tensor *tensor_scores  = net->getSessionOutput(session, nullptr);   // or output_tensor_name0.c_str()

    MNN::Tensor tensor_scores_host(tensor_scores, tensor_scores->getDimensionType());
    // Copy the data back to the host
    tensor_scores->copyToHostTensor(&tensor_scores_host);

    printf("score of every class:");
    tensor_scores_host.print();
	
    // post processing steps
    auto scores_dataPtr  = tensor_scores_host.host<float>();

    // The output node "prob/Softmax" is already softmax-normalized, so summing the
    // scores and dividing below merely renormalizes them; for raw logits, apply
    // expf() to each score before summing (see the softmax sketch in section 2.2).
    float exp_sum = 0.0f;
    for (int i = 0; i < 2; ++i)
    {
        float val = scores_dataPtr[i];
        exp_sum += val;
    }
    // get result idx
    int  idx = 0;
    float max_prob = -10.0f;
    for (int i = 0; i < 2; ++i)
    {
        float val  = scores_dataPtr[i];
        float prob = val / exp_sum;
        if (prob > max_prob)
        {
            max_prob = prob;
            idx      = i;
        }
    }
    printf("output belong to class: %d\n", idx);

    return 0;
}

Origin: blog.csdn.net/weixin_50862344/article/details/131174843