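This post breaks the whole process into the following six steps: (1) create the Interpreter, (2) scheduling configuration (ScheduleConfig), (3) backend configuration (BackendConfig), (4) create a session, (5) feed the input data, (6) run the session and fetch the output (post-processing).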

1. Classes

1.1 Interpreter

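The Interpreter class is the main interface for loading and running an inference model. It provides the functionality to load, configure, and run a neural network model.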

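1.1.1 Creation

// The only constructor
    Interpreter(Content* net);
// The copy constructor, move constructor, copy assignment operator, and move assignment operator of Interpreter are all deleted.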
    Interpreter(const Interpreter&)  = delete;
    Interpreter(const Interpreter&&) = delete;
    Interpreter& operator=(const Interpreter&) = delete;
    Interpreter& operator=(const Interpreter&&) = delete;

The net itself is created by loading the model from a file or from a buffer:

static Interpreter* createFromFile(const char* file);
static Interpreter* createFromBuffer(const void* buffer, size_t size);
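
To make the buffer variant concrete, here is a minimal sketch of reading a model file into memory so that its bytes could be handed to createFromBuffer. The helper name readFileBytes is hypothetical and not part of MNN; only the commented usage lines refer to the MNN call quoted above.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not part of MNN): read a whole file into memory.
// Returns an empty vector if the file cannot be opened.
std::vector<char> readFileBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return {};
    }
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Usage sketch with the MNN declaration quoted above:
//   std::vector<char> bytes = readFileBytes(model_name);
//   if (!bytes.empty()) {
//       std::shared_ptr<MNN::Interpreter> net(
//           MNN::Interpreter::createFromBuffer(bytes.data(), bytes.size()));
//   }
```

Loading from a buffer is mainly useful when the model does not live as a plain file on disk, for example when it is embedded in the application or decrypted at startup.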

An example:

    // 1. Create the Interpreter from a file on disk: static Interpreter* createFromFile(const char* file);
    std::shared_ptr<Interpreter> net(Interpreter::createFromFile(model_name));

1.1.2 Setting up a session

A session is created from the ScheduleConfig described below:

Session* createSession(const ScheduleConfig& config);

Run the session:

net->runSession(session);

Get the input and output tensors:

auto inputTensor  = net->getSessionInput(session, input_tensor.c_str());   // nullptr selects the first input
MNN::Tensor *tensor_scores  = net->getSessionOutput(session, nullptr);   // or pass output_tensor_name0.c_str()

1.2 ScheduleConfig

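Configures the scheduling parameters of the computation graph.

The fields that matter most are the degree of parallelism, numThread, and the inference backend type, type.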

struct ScheduleConfig {
    /** which tensor should be kept */
    std::vector<std::string> saveTensors;
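    /** The primary backend for inference is given by type and defaults to CPU.
    When the primary backend does not support an operator in the model, the fallback backend given by backupType is used. */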
    MNNForwardType type = MNN_FORWARD_CPU;
    /** CPU:number of threads in parallel , Or GPU: mode setting*/
    union {
        int numThread = 4;
        int mode;
    };

    /** subpath to run */
    struct Path {...};
    Path path;

    /** backup backend used to create the execution when the designated backend does not support an op */
    MNNForwardType backupType = MNN_FORWARD_CPU;

    /** extra backend config */
    BackendConfig* backendConfig = nullptr;
};
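
One detail of the struct that is easy to miss: numThread and mode sit in the same union, so they share storage and only the one that matches the chosen backend should be set. The sketch below uses a local mirror of that union (ScheduleConfigSketch is a made-up name for illustration, not an MNN type) and checks that both members start at the same address.

```cpp
#include <cassert>

// Local mirror of the numThread/mode union quoted above, for illustration only;
// real code should use MNN::ScheduleConfig itself.
struct ScheduleConfigSketch {
    union {
        int numThread = 4; // CPU: number of threads in parallel
        int mode;          // GPU: mode setting
    };
};

// Returns true when both members start at the same address, i.e. share storage.
bool sharesStorage(const ScheduleConfigSketch& cfg) {
    return static_cast<const void*>(&cfg.numThread) ==
           static_cast<const void*>(&cfg.mode);
}
```

In practice this means a value written for a CPU run (a thread count) would be read as a mode value if the same config were reused with a GPU backend, so it is worth resetting the field when switching type.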

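An example:

    MNN::ScheduleConfig config;
    // 2. Scheduling configuration
    // Some parameters for task scheduling
    int forward = MNN_FORWARD_CPU;
    // int forward = MNN_FORWARD_OPENCL;
    int threads    = 1;
    // numThread sets the requested degree of parallelism, but the actual thread count and parallel efficiency do not depend on numThread alone
    // The primary backend is given by type and defaults to CPU. When it does not support an operator in the model, the fallback backend given by backupType is used.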
    config.numThread = threads;
    config.type      = static_cast<MNNForwardType>(forward);

1.3 BackendConfig

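Configures the backend parameters of the computation graph. It exposes memory, power, and precision preferences that control how the backend executes the graph. To take effect, its address must be assigned to the backendConfig member of MNN::ScheduleConfig.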

struct BackendConfig {
    enum MemoryMode { Memory_Normal = 0, Memory_High, Memory_Low };

    MemoryMode memory = Memory_Normal;

    enum PowerMode { Power_Normal = 0, Power_High, Power_Low };

    PowerMode power = Power_Normal;

    enum PrecisionMode { Precision_Normal = 0, Precision_High, Precision_Low, Precision_Low_BF16 };

    PrecisionMode precision = Precision_Normal;

    /** user defined context */
    union {
        void* sharedContext = nullptr;
        size_t flags; // Valid for CPU Backend
    };
};
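
When a mode is supplied as a plain integer and cast to the enum (for example `int precision = 2`), it helps to see which enumerator that integer actually selects. The sketch below mirrors the PrecisionMode values quoted above in a local enum; PrecisionModeSketch and precisionName are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

// Local mirror of MNN::BackendConfig::PrecisionMode as quoted above (illustration only).
enum PrecisionModeSketch {
    Precision_Normal = 0,
    Precision_High,
    Precision_Low,
    Precision_Low_BF16
};

// Hypothetical helper: map a raw integer to the enumerator name it selects.
std::string precisionName(int raw) {
    switch (raw) {
        case Precision_Normal:   return "Precision_Normal";
        case Precision_High:     return "Precision_High";
        case Precision_Low:      return "Precision_Low";
        case Precision_Low_BF16: return "Precision_Low_BF16";
        default:                 return "unknown";
    }
}
```

So a raw value of 2 corresponds to Precision_Low, not to a higher precision. Writing the enumerator name directly in the assignment avoids that kind of surprise.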

1.4 One of MNN's basic data types: Tensor

1.4.1 Transferring data between host and device

bool copyFromHostTensor(const Tensor* hostTensor);
bool copyToHostTensor(Tensor* hostTensor) const;

    auto inputTensor  = net->getSessionInput(session, input_tensor.c_str());   // nullptr selects the first input
    inputTensor->copyFromHostTensor(nhwc_Tensor);

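2. Core steps

2.1 Image handling

Image handling covers preprocessing the image and placing it into the input tensor. Preprocessing is relatively simple, so this section focuses on getting the image into the input tensor.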
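
For reference, the preprocessing used in the full example at the end of this post is `image*2 / 255.0f-1.0f`, which maps 8-bit pixel values from [0, 255] to [-1, 1]. A minimal standalone sketch of that same arithmetic, without OpenCV, might look like this (normalizePixel and normalizePixels are hypothetical helper names):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Map one 8-bit pixel value from [0, 255] to [-1, 1].
float normalizePixel(std::uint8_t v) {
    return static_cast<float>(v) * 2.0f / 255.0f - 1.0f;
}

// Apply the same mapping to a whole buffer of 8-bit values.
std::vector<float> normalizePixels(const std::vector<std::uint8_t>& pixels) {
    std::vector<float> out;
    out.reserve(pixels.size());
    for (std::uint8_t v : pixels) {
        out.push_back(normalizePixel(v));
    }
    return out;
}
```

Whether [-1, 1] is the right range depends on how the model was trained; other models expect [0, 1] or per-channel mean and standard deviation.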

2.1.1 Putting the image into the input tensor

(1) memcpy

    std::vector<int> dims{1, INPUT_SIZE, INPUT_SIZE, 3};
    auto nhwc_Tensor = MNN::Tensor::create<float>(dims, NULL, MNN::Tensor::TENSORFLOW);//DimensionType
    auto nhwc_data   = nhwc_Tensor->host<float>();
    auto nhwc_size   = nhwc_Tensor->size();
    ::memcpy(nhwc_data, image.data, nhwc_size);
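
The memcpy above is only safe when the source image holds exactly as many bytes as the tensor: float data, 3 channels, INPUT_SIZE x INPUT_SIZE, stored contiguously. The following sketch shows a guarded copy under that assumption. copyExact is a hypothetical helper, and its dstBytes parameter plays the role of nhwc_Tensor->size(), which the code above treats as a size in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copy `src` into `dst` only if the byte counts agree.
// Returns false (and copies nothing) on a size mismatch or a null destination.
bool copyExact(const std::vector<float>& src, float* dst, std::size_t dstBytes) {
    const std::size_t srcBytes = src.size() * sizeof(float);
    if (dst == nullptr || srcBytes != dstBytes) {
        return false;
    }
    std::memcpy(dst, src.data(), srcBytes);
    return true;
}
```

A mismatch typically means the image was not converted to float first, or was not resized to the network input size.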

(2) Using a pointer

// Assume inputTensor is the input tensor and inputImage is the input image (a CV_8UC3 cv::Mat)

// Get the tensor's data pointer and its dimensions
float* inputData = inputTensor->host<float>();
int inputWidth = inputTensor->width();
int inputHeight = inputTensor->height();
int inputChannels = inputTensor->channel();

// Walk over the pixels of the input image and copy them into the input tensor
for (int y = 0; y < inputHeight; ++y) {
    for (int x = 0; x < inputWidth; ++x) {
        for (int c = 0; c < inputChannels; ++c) {
            // Index into the tensor; this formula assumes an NHWC layout
            int inputIndex = c + x * inputChannels + y * inputWidth * inputChannels;
            
            // Read the pixel value from the input image
            cv::Vec3b pixel = inputImage.at<cv::Vec3b>(y, x);
            float value = static_cast<float>(pixel[c]);
            
            // Write the pixel value into the input tensor
            inputData[inputIndex] = value;
        }
    }
}
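
The index formula in the loop above is the NHWC one (channels vary fastest). If the destination buffer is laid out as NCHW instead, the formula changes. The sketch below puts the two side by side and converts an HWC buffer to CHW; all three function names are hypothetical helpers, not MNN APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of element (y, x, c) in an HWC buffer: channels vary fastest.
std::size_t nhwcIndex(int y, int x, int c, int width, int channels) {
    return (static_cast<std::size_t>(y) * width + x) * channels + c;
}

// Offset of element (c, y, x) in a CHW buffer: x varies fastest.
std::size_t nchwIndex(int c, int y, int x, int height, int width) {
    return (static_cast<std::size_t>(c) * height + y) * width + x;
}

// Reorder a single image from HWC to CHW.
std::vector<float> hwcToChw(const std::vector<float>& hwc,
                            int height, int width, int channels) {
    assert(hwc.size() == static_cast<std::size_t>(height) * width * channels);
    std::vector<float> chw(hwc.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            for (int c = 0; c < channels; ++c) {
                chw[nchwIndex(c, y, x, height, width)] =
                    hwc[nhwcIndex(y, x, c, width, channels)];
            }
        }
    }
    return chw;
}
```

Writing through a raw pointer therefore requires knowing the tensor's actual layout, which is one reason to prefer the copyFromHostTensor route, where the layout conversion is handled for you.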

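2.2 Post-processing

Once the output is available, the post-processing usually has to be tailored to your own model.

// Get the output tensor
    MNN::Tensor *tensor_scores  = net->getSessionOutput(session, nullptr);   // or pass output_tensor_name0.c_str()

    // Create a host tensor with the same shape and copy the output into it
    MNN::Tensor tensor_scores_host(tensor_scores, tensor_scores->getDimensionType());
    tensor_scores->copyToHostTensor(&tensor_scores_host);
    auto scores_dataPtr  = tensor_scores_host.host<float>();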
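
For a classification model, post-processing usually comes down to turning the scores into probabilities and picking the best class. The sketch below is a generic, numerically stable softmax plus argmax over a std::vector<float>. It is an illustration under the assumption that the model outputs raw logits; if the output node already applies softmax, only the argmax step is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <vector>

// Numerically stable softmax: subtract the maximum before exponentiating.
std::vector<float> softmax(const std::vector<float>& logits) {
    std::vector<float> probs(logits.size());
    if (logits.empty()) {
        return probs;
    }
    const float maxLogit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - maxLogit);
        sum += probs[i];
    }
    for (float& p : probs) {
        p /= sum;
    }
    return probs;
}

// Index of the largest element; the input must not be empty.
std::size_t argmax(const std::vector<float>& values) {
    assert(!values.empty());
    return static_cast<std::size_t>(
        std::distance(values.begin(),
                      std::max_element(values.begin(), values.end())));
}
```

With the host pointer from above, the scores could be wrapped as `std::vector<float> scores(scores_dataPtr, scores_dataPtr + numClasses);`, where numClasses is whatever your model outputs.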

3. Configuration

Using MNN in Qt

# MNN
# Qt modules to use
QT += core

# Project name
TARGET = Test

# C++ standard version
CONFIG += c++11

# OpenCV library
LIBS += -lopencv_core -lopencv_highgui -lopencv_imgproc

# MNN library
MNN_DIR = /home/eveing/DL/nlp/llm_deploy/MNN-master
INCLUDEPATH += $$MNN_DIR/include $$MNN_DIR/include/MNN $$MNN_DIR/tools $$MNN_DIR/tools/cpp $$MNN_DIR/source $$MNN_DIR/source/backend $$MNN_DIR/source/core
LIBS += -L$$MNN_DIR/build -lMNN

The model and code here borrow from the Zhihu article "Ubuntu下阿里MNN 模型的c++读取调用".

#include "Backend.hpp"
#include "Interpreter.hpp"
#include "MNNDefine.h"
#include "Tensor.hpp"
#include <math.h>
#include <stdio.h>
#include <string.h>   // memcpy
#include <time.h>     // clock
#include <iostream>
#include <memory>     // std::shared_ptr
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>
using namespace MNN;
using namespace cv;
int main(void)
{
    // Set these to your own test image and .mnn model file paths
    std::string image_name = "/home/project/ForwardNet_Test/MNN-master/build/model/33.bmp";
    const char* model_name = "/home/project/ForwardNet_Test/MNN-master/build/model/47.mnn";
    // Some parameters for task scheduling
    int forward = MNN_FORWARD_CPU;
    // int forward = MNN_FORWARD_OPENCL;
    int precision  = 2;   // 2 = Precision_Low
    int power      = 0;   // 0 = Power_Normal
    int memory     = 0;   // 0 = Memory_Normal
    int threads    = 1;
    int INPUT_SIZE = 24;  // side length of the network input, in pixels

    cv::Mat raw_image    = cv::imread(image_name.c_str());
    //imshow("image", raw_image);
    int raw_image_height = raw_image.rows;
    int raw_image_width  = raw_image.cols;
    cv::Mat image;
    cv::resize(raw_image, image, cv::Size(INPUT_SIZE, INPUT_SIZE));
    // 1. Create the Interpreter from a file on disk: static Interpreter* createFromFile(const char* file);
    std::shared_ptr<Interpreter> net(Interpreter::createFromFile(model_name));
    MNN::ScheduleConfig config;
    // 2. Scheduling configuration
    // numThread sets the requested degree of parallelism, but the actual thread count and parallel efficiency do not depend on numThread alone
    // The primary backend is given by type and defaults to CPU. When it does not support an operator in the model, the fallback backend given by backupType is used.
    config.numThread = threads;
    config.type      = static_cast<MNNForwardType>(forward);
    MNN::BackendConfig backendConfig;
    // 3. Backend configuration
    // memory, power, and precision are the memory, power, and precision preferences
    backendConfig.precision = (MNN::BackendConfig::PrecisionMode)precision;
    backendConfig.power = (MNN::BackendConfig::PowerMode) power;
    backendConfig.memory = (MNN::BackendConfig::MemoryMode) memory;
    config.backendConfig = &backendConfig;
    // 4. Create the session
    auto session = net->createSession(config);
    // release the model buffer; no further sessions are created from it here
    net->releaseModel();

    clock_t start = clock();
    // preprocessing
    image.convertTo(image, CV_32FC3);
    image = image*2 / 255.0f-1.0f;
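    // 5. Feed the input data
    // wrap the image in an NHWC host tensor; the copy below converts the layout if the session needs it
    std::vector<int> dims{1, INPUT_SIZE, INPUT_SIZE, 3};
    auto nhwc_Tensor = MNN::Tensor::create<float>(dims, NULL, MNN::Tensor::TENSORFLOW);
    auto nhwc_data   = nhwc_Tensor->host<float>();
    auto nhwc_size   = nhwc_Tensor->size();
    ::memcpy(nhwc_data, image.data, nhwc_size);

    std::string input_tensor = "input_image";
    // Get the input tensor
    // Copy the data. With this style of copy, the user only needs to care about the layout of the tensor they created;
    // copyFromHostTensor handles the layout conversion (if needed) and the copy between backends (if needed).
    auto inputTensor  = net->getSessionInput(session,input_tensor.c_str());   // nullptr selects the first input
    inputTensor->copyFromHostTensor(nhwc_Tensor);
    // the host tensor is no longer needed after the copy
    delete nhwc_Tensor;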

    // 6. Run the session
    net->runSession(session);

    // 7. Get the output
    std::string output_tensor_name0 = "prob/Softmax";
    // Get the output tensor
    MNN::Tensor *tensor_scores  = net->getSessionOutput(session, nullptr);   // or pass output_tensor_name0.c_str()

    MNN::Tensor tensor_scores_host(tensor_scores, tensor_scores->getDimensionType());
    // Copy the data
    tensor_scores->copyToHostTensor(&tensor_scores_host);

    printf("score of every class:");
    tensor_scores_host.print();
	
    // post processing steps
    auto scores_dataPtr  = tensor_scores_host.host<float>();

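    // normalize by the sum of the scores; no exp is applied because the output
    // node (prob/Softmax) presumably already yields probabilities
    float exp_sum = 0.0f;
    for (int i = 0; i < 2; ++i)   // 2 output classes assumed for this model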
    {
        float val = scores_dataPtr[i];
        exp_sum += val;
    }
    // get result idx
    int  idx = 0;
    float max_prob = -10.0f;
    for (int i = 0; i < 2; ++i)
    {
        float val  = scores_dataPtr[i];
        float prob = val / exp_sum;
        if (prob > max_prob)
        {
            max_prob = prob;
            idx      = i;
        }
    }
    printf("output belong to class: %d\n", idx);

    return 0;
}
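
The listing records `clock_t start = clock();` but never reads it back, and clock() measures CPU time rather than wall-clock time. If the goal is to time inference, a small wrapper around std::chrono::steady_clock is one option. This is a sketch, and elapsedMs is a hypothetical helper name.

```cpp
#include <cassert>
#include <chrono>

// Run `work` once and return the elapsed wall-clock time in milliseconds.
template <typename F>
double elapsedMs(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Usage sketch inside main():
//   double ms = elapsedMs([&] { net->runSession(session); });
//   printf("inference took %.3f ms\n", ms);
```

The first run of a session often includes one-time setup cost, so averaging over several runs after a warm-up run tends to give a more representative number.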

[Deployment] MNN Inference