
MODNet is a lightweight matting model. Its ONNX model was previously deployed with Python; this chapter deploys MODNet with NCNN. The model is also statically quantized, which reduces its size to about 1/4 of the original. The matting result is as follows:

The complete code and the required weights are linked at the end of the article.

1. Building NCNN

For the detailed steps, see the official build tutorial.

1. Build protobuf

protobuf version: https://github.com/google/protobuf/archive/v3.4.0.zip

Open the x64 Native Tools Command Prompt for VS 2017 from the Start menu (newer versions also work; I succeeded with VS 2022) and build protobuf.

cd <protobuf-root-dir>
mkdir build
cd build
cmake -G"NMake Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=%cd%/install -Dprotobuf_BUILD_TESTS=OFF -Dprotobuf_MSVC_STATIC_RUNTIME=OFF ../cmake
nmake
nmake install

2. Build NCNN

Clone the NCNN repository:

git clone https://github.com/Tencent/ncnn.git

Build NCNN (Vulkan is not used here; see the official tutorial if you need it), replacing the paths in the command with your own:

cd <ncnn-root-dir>
mkdir build
cd build
cmake -G"NMake Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=%cd%/install -DProtobuf_INCLUDE_DIR=<protobuf-root-dir>/build/install/include -DProtobuf_LIBRARIES=<protobuf-root-dir>/build/install/lib/libprotobuf.lib -DProtobuf_PROTOC_EXECUTABLE=<protobuf-root-dir>/build/install/bin/protoc.exe -DNCNN_VULKAN=OFF -DOpenCV_DIR=C:/opencv/opencv/build ..
nmake
nmake install

2. Deploying with NCNN

1. Environment

Windows (CPU)

OpenCV 4.5.5

Visual Studio 2019

2. Converting the ONNX model to NCNN

The converted model can be downloaded here: Portal

To convert it yourself, take the simplified ONNX model obtained earlier and convert it to an NCNN model, which produces two files with the .param and .bin extensions. Make sure the conversion finishes without reporting errors; otherwise loading the model later will fail:

../ncnn/build/tools/onnx/onnx2ncnn simple_modnet.onnx simple_modnet.param simple_modnet.bin
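
A converted text .param file starts with the magic number 7767517 on its first line, followed by the layer and blob counts on the second. The sketch below is a quick sanity check on that header before the file is handed to ncnn. `ParamHeader` and `read_param_header` are hypothetical names, not part of ncnn, and the check does not replace inspecting the return values of `load_param` and `load_model`, which are 0 on success.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Summary of the first two lines of a text .param file.
struct ParamHeader {
	bool valid = false;
	int layer_count = 0;
	int blob_count = 0;
};

// Hypothetical helper (not an ncnn API): reads the magic number and the
// layer/blob counts, and reports whether they look plausible.
ParamHeader read_param_header(std::istream& in) {
	ParamHeader header;
	int magic = 0;
	if (!(in >> magic) || magic != 7767517) {
		return header;  // not a text .param file
	}
	if (!(in >> header.layer_count >> header.blob_count)) {
		return header;  // truncated header
	}
	header.valid = header.layer_count > 0 && header.blob_count > 0;
	return header;
}
```

To check a file on disk, open it with `std::ifstream` and pass the stream to `read_param_header`.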

3. C++ code

Code structure:

MODNet.h:

#pragma once
#include <ctime>
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>
#include "net.h"


class MODNet
{

private:
	std::string param_path;
	std::string bin_path;
	std::vector<int> input_shape;
	ncnn::Net net;

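	// (pixel - 127.5) / 127.5 maps [0, 255] to [-1, 1], matching normalize()
	// and the norm=0.00784 value used for calibration.
	const float norm_vals[3] = { 1.0f / 127.5f, 1.0f / 127.5f, 1.0f / 127.5f };
	const float mean_vals[3] = { 127.5f, 127.5f, 127.5f };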

	cv::Mat normalize(cv::Mat& image);
public:
	MODNet() = delete;
	MODNet(const std::string param_path, const std::string bin_path, std::vector<int> input_shape);
	~MODNet();

	cv::Mat predict_image(cv::Mat& image);
	void predict_image(const std::string& src_image_path, const std::string& dst_path);

	void predict_camera();
};

MODNet.cpp:

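#include "MODNet.h"
#include <iostream>


MODNet::MODNet(const std::string param_path, const std::string bin_path, std::vector<int> input_shape)
	:param_path(param_path), bin_path(bin_path), input_shape(input_shape) {
	// load_param and load_model return 0 on success.
	if (net.load_param(param_path.c_str()) != 0 || net.load_model(bin_path.c_str()) != 0) {
		std::cerr << "Error, cannot load NCNN model: " << param_path << std::endl;
	}
}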


MODNet::~MODNet() {
	net.clear();
}


cv::Mat MODNet::normalize(cv::Mat& image) {
	// Convert to float first; dividing 8-bit channels directly would round and saturate.
	cv::Mat float_image;
	image.convertTo(float_image, CV_32F);

	std::vector<cv::Mat> channels, normalized_image;
	cv::split(float_image, channels);

	// Map each channel from [0, 255] to [-1, 1].
	cv::Mat b = (channels.at(0) / 255. - 0.5) / 0.5;
	cv::Mat g = (channels.at(1) / 255. - 0.5) / 0.5;
	cv::Mat r = (channels.at(2) / 255. - 0.5) / 0.5;

	// Merge in RGB order (the input image is BGR).
	normalized_image.push_back(r);
	normalized_image.push_back(g);
	normalized_image.push_back(b);

	cv::Mat out;
	cv::merge(normalized_image, out);
	return out;
}


cv::Mat MODNet::predict_image(cv::Mat& image) {
	cv::Mat rgbImage;
	cv::cvtColor(image, rgbImage, cv::COLOR_BGR2RGB);
	ncnn::Mat in = ncnn::Mat::from_pixels_resize(rgbImage.data, ncnn::Mat::PIXEL_RGB, image.cols, image.rows, input_shape[3], input_shape[2]);
	in.substract_mean_normalize(mean_vals, norm_vals);
	ncnn::Extractor ex = net.create_extractor();
	ex.set_num_threads(4);
	ex.input("input", in);
	ncnn::Mat out;
	ex.extract("output", out);

	cv::Mat mask(out.h, out.w, CV_8UC1);
	const float* probMap = out.channel(0);

	for (int i{ 0 }; i < out.h; i++) {
		for (int j{ 0 }; j < out.w; ++j) {
			mask.at<uchar>(i, j) = probMap[i * out.w + j] > 0.5 ? 255 : 0;
		}
	}
	cv::resize(mask, mask, cv::Size(image.cols, image.rows), 0, 0);
	cv::Mat segFrame;
	cv::bitwise_and(image, image, segFrame, mask);
	return segFrame;
}


void MODNet::predict_image(const std::string& src_image_path, const std::string& dst_path) {
	cv::Mat image = cv::imread(src_image_path);
	cv::Mat segFrame = predict_image(image);
	cv::imwrite(dst_path, segFrame);
}


void MODNet::predict_camera() {
	cv::Mat frame;
	cv::VideoCapture cap;
	int deviceID{ 0 };
	int apiID{ cv::CAP_ANY };
	cap.open(deviceID, apiID);
	if (!cap.isOpened()) {
		std::cout << "Error, cannot open camera!" << std::endl;
		return;
	}
	//--- GRAB AND WRITE LOOP
	std::cout << "Start grabbing" << std::endl << "Press any key to terminate" << std::endl;
	int count{ 0 };
	clock_t start{ clock() }, end{ 0 };
	double fps{ 0 };
	for (;;)
	{
		// wait for a new frame from camera and store it into 'frame'
		cap.read(frame);
		// check if we succeeded
		if (frame.empty()) {
			std::cout << "ERROR! blank frame grabbed" << std::endl;
			break;
		}
		cv::Mat segFrame = predict_image(frame);

		// fps
		++count;
		end = clock();
		fps = count / (float(end - start) / CLOCKS_PER_SEC);
		if (count >= 50) {
			count = 0;  // restart the window so the counter cannot overflow
			start = clock();
		}
		std::cout << "FPS: " << fps << "  Seg Image Number: " << count << "   time consume:" << (float(end - start) / CLOCKS_PER_SEC) << std::endl;
		// Set the parameters used to draw the text
		std::string text{ std::to_string(fps) };
		int font_face = cv::FONT_HERSHEY_COMPLEX;
		double font_scale = 1;
		int thickness = 2;
		int baseline;
		cv::Size text_size = cv::getTextSize(text, font_face, font_scale, thickness, &baseline);

		// Draw the FPS text near the top-left corner
		cv::Point origin;
		origin.x = 20;
		origin.y = 20;
		cv::putText(segFrame, text, origin, font_face, font_scale, cv::Scalar(0, 255, 255), thickness, 8, 0);

		// show live and wait for a key with timeout long enough to show images
		cv::imshow("Live", segFrame);
		if (cv::waitKey(5) >= 0)
			break;

	}
	cap.release();
	cv::destroyWindow("Live");
	return;
}

main.cpp:

#include <iostream>
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>
#include "MODNet.h"


int main() {
	std::string param_path{ "onnx_model\\simple_modnet.param" };
	std::string bin_path{ "onnx_model\\simple_modnet.bin" };
	std::vector<int> input_shape{ 1, 3, 512, 512 };
	MODNet model(param_path, bin_path, input_shape);


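	// Run prediction on a single image and display the result
	cv::Mat image = cv::imread("C:\\Users\\langdu\\Pictures\\test.png");
	if (image.empty()) {
		std::cerr << "Error, cannot read the input image!" << std::endl;
		return -1;
	}
	cv::Mat segFrame = model.predict_image(image);
	cv::imshow("1", segFrame);
	cv::waitKey(0);

	// Camera
	//model.predict_camera();
	return 0;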
}
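
predict_image, which main calls, turns the network output into a hard 0/255 mask by thresholding at 0.5, so the soft edges a matting model produces are discarded. The sketch below restates that step on plain vectors and, as an assumption that is not taken from the article's code, shows the common alternative of blending a foreground over a background with the matte as a per-pixel weight. The function names are illustrative, and single-channel buffers of equal length are assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hard mask, as in predict_image: 255 where alpha > threshold, otherwise 0.
std::vector<std::uint8_t> binarize(const std::vector<float>& alpha, float threshold = 0.5f) {
	std::vector<std::uint8_t> mask(alpha.size());
	for (std::size_t i = 0; i < alpha.size(); ++i) {
		mask[i] = alpha[i] > threshold ? 255 : 0;
	}
	return mask;
}

// Soft alternative (illustrative): out = alpha * fg + (1 - alpha) * bg.
// fg, bg and alpha must have the same length.
std::vector<std::uint8_t> blend(const std::vector<std::uint8_t>& fg,
	const std::vector<std::uint8_t>& bg,
	const std::vector<float>& alpha) {
	std::vector<std::uint8_t> out(fg.size());
	for (std::size_t i = 0; i < fg.size(); ++i) {
		const float a = std::min(1.0f, std::max(0.0f, alpha[i]));
		// Adding 0.5 before truncation rounds to the nearest integer.
		out[i] = static_cast<std::uint8_t>(a * fg[i] + (1.0f - a) * bg[i] + 0.5f);
	}
	return out;
}
```

The hard mask is enough for a cut-out on a black background, as in the code above; the blended form keeps semi-transparent regions such as hair when a new background is composited in.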

3. NCNN quantization

Mobile devices place tight limits on model size, so an effective way to reduce storage is needed, and quantization is one. For background on quantization, see: [Deep Learning] Model Quantization - Notes/Experiments

Static quantization is used here. The model sizes before and after quantization are:

	bin (KB)	param (KB)
Before quantization	25236	22
After quantization	6442	24

As the table shows, the quantized model is only about 1/4 of its original size (at the cost of some prediction accuracy). The quantization steps below follow the official tutorial.
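
The 1/4 figure can be checked directly from the numbers in the table. The short calculation below is only a worked example; the constant and function names are made up for illustration.

```cpp
// Model sizes in KB, copied from the table above.
constexpr double kBinBeforeKB = 25236.0;
constexpr double kBinAfterKB = 6442.0;
constexpr double kParamBeforeKB = 22.0;
constexpr double kParamAfterKB = 24.0;

// Ratio of the quantized size to the original size.
constexpr double size_ratio(double after_kb, double before_kb) {
	return after_kb / before_kb;
}

// Weights only: about 0.255.
constexpr double kBinRatio = size_ratio(kBinAfterKB, kBinBeforeKB);
// Weights plus param file: about 0.256.
constexpr double kTotalRatio = size_ratio(kBinAfterKB + kParamAfterKB, kBinBeforeKB + kParamBeforeKB);

static_assert(kBinRatio > 0.25 && kBinRatio < 0.26, "bin file shrinks to roughly 1/4");
static_assert(kTotalRatio > 0.25 && kTotalRatio < 0.26, "total size shrinks to roughly 1/4");
```

A ratio slightly above 0.25 is what one would expect when most weights go from 4-byte float32 to 1-byte int8 while some data stays in floating point.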

1. Model optimization

Use ncnnoptimize (in the build\tools directory) to optimize the model.

./ncnnoptimize simple_modnet.param simple_modnet.bin quantize_modnet.param quantize_modnet.bin 0

2. Create the calibration table

The mean and norm parameters can be adjusted when ncnn creates the calibration table. Also note that the pixel setting follows the official MODNet repository, which is BGR. The images used for calibration are stored in the images folder.

find images/ -type f > imagelist.txt
./ncnn2table quantize_modnet.param quantize_modnet.bin imagelist.txt modnet.table mean=[127.5,127.5,127.5] norm=[0.00784,0.00784,0.00784] shape=[512,512,3] pixel=BGR thread=8 method=kl
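
Calibration exists to pick the scale factors that map float values onto int8. As a rough illustration of what such a scale does, the sketch below uses simple max-abs scaling onto [-127, 127]. This is an assumption made for brevity: the `method=kl` option chooses the threshold differently, and none of these function names come from ncnn.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Scale that maps [-absmax, absmax] onto [-127, 127].
// Illustrative only: a KL-based calibration would pick a smaller threshold.
float absmax_scale(const std::vector<float>& values) {
	float absmax = 0.0f;
	for (float v : values) {
		absmax = std::max(absmax, std::fabs(v));
	}
	return absmax > 0.0f ? 127.0f / absmax : 1.0f;
}

// Symmetric quantization: round, then clamp to the int8 range used here.
std::int8_t quantize(float value, float scale) {
	const float q = std::round(value * scale);
	return static_cast<std::int8_t>(std::min(127.0f, std::max(-127.0f, q)));
}

// Recover an approximation of the original float value.
float dequantize(std::int8_t q, float scale) {
	return q / scale;
}
```

The round trip `dequantize(quantize(x, s), s)` differs from `x` by at most half a quantization step for values inside the range, which is the source of the accuracy loss mentioned above.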

3. Quantization

Use the following command to obtain the int8 model.

./ncnn2int8 quantize_modnet.param quantize_modnet.bin modnet_int8.param modnet_int8.bin modnet.table

4. Usage

Using the int8 model is simple: replace the bin and param paths in the code above with the paths of the generated int8 model.

5. Prediction results after quantization

First, the prediction result without quantization:

The result after quantization (some accuracy is lost: the shoes are predicted completely, but there are more errors on the floor):

4. Related links

1. Complete NCNN deployment code + all model weights

[Matting] MODNet: real-time portrait matting model - NCNN C++ quantized deployment