The previous post, [Model Deployment] PaddleOCR model OpenVINO deployment (1), covered deploying DBNet, the PaddleOCR text detection model. This post covers the text direction classification and text recognition models, then chains detection, direction classification, and recognition together into a complete deployment pipeline.
PaddleOCR: https://github.com/PaddlePaddle/PaddleOCR
When deploying a deep learning model, most of the work is writing the preprocessing and postprocessing code. The same holds for the detection, classification, and recognition models: the inference call is nearly identical for all three, and only the preprocessing and postprocessing differ.
The deployment result is shown below (left: the original image; right: the detection result):
The recognition results are as follows:
Contents
First, the detection model deployment
Second, the direction classification model deployment
1. Download the classification model
2. Unzip the compressed package
3. View the model
4. Deployment
Third, the recognition model deployment
1. Download the model
2. Unzip the compressed package
3. View the model structure
4. Deployment
Fourth, detection + classification + recognition serial deployment
Fifth, references
First, the detection model deployment
For detection model deployment, see [Model deployment] PaddleOCR model OpenVINO deployment (1); it is not repeated here.
Second, the direction classification model deployment
1. Download the classification model
!wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar
2. Unzip the compressed package
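If the `tar` command is not at hand (for example on Windows), the archive can also be unpacked from Python. The following is a minimal sketch using only the standard library; the helper name `extract_model_archive` is made up for illustration, and the archive name is the one downloaded in step 1. Only extract archives from a source you trust.

```python
import tarfile
from pathlib import Path


def extract_model_archive(tar_path, dest_dir="."):
    """Extract a .tar archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path, "r") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names


if __name__ == "__main__":
    archive = "ch_ppocr_mobile_v2.0_cls_infer.tar"
    # Only runs when the archive from step 1 is present in the working directory.
    if Path(archive).exists():
        for name in extract_model_archive(archive):
            print(name)
```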
After extraction you get the PaddlePaddle static-graph inference model, as shown below:
3. View the model
Open inference.pdmodel in Netron to view its structure, as shown in the figure below. Focus on two points:
(a) the output of the model (relevant to the subsequent post-processing);
(b) the dimensions of the input (relevant to the subsequent pre-processing);
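To make point (b) concrete: the deployment code in this post feeds the network a float32 tensor in NCHW layout, while OpenCV loads images as uint8 arrays in HWC layout. The sketch below reproduces only that normalization and layout change on a dummy array, using the (100, 32) width and height that the code in this post passes as `target_size`. The function name `to_nchw` is an illustrative assumption, and no real image or model is involved.

```python
import numpy as np


def to_nchw(image_hwc, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Normalize an HWC uint8 image and return a 1 x C x H x W float32 tensor."""
    im = image_hwc.astype(np.float32) / 255.0
    im = (im - np.asarray(mean, dtype=np.float32)) / np.asarray(std, dtype=np.float32)
    # HWC -> CHW, then add the batch dimension in front.
    return np.expand_dims(im.transpose(2, 0, 1), 0)


dummy = np.zeros((32, 100, 3), dtype=np.uint8)  # height 32, width 100, 3 channels
tensor = to_nchw(dummy)
print(tensor.shape, tensor.dtype)  # (1, 3, 32, 100) float32
```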
4. Deployment
The openvino deployment code is as follows:
import argparse

import cv2
import numpy as np
from openvino.runtime import Core


def normalize(im, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
    # Scale pixels to [0, 1], then standardize each channel.
    im = im.astype(np.float32, copy=False) / 255.0
    im -= mean
    im /= std
    return im


def resize(im, target_size=608, interp=cv2.INTER_LINEAR):
    # target_size is either a single int (square) or a (width, height) pair.
    if isinstance(target_size, (list, tuple)):
        w = target_size[0]
        h = target_size[1]
    else:
        w = target_size
        h = target_size
    im = cv2.resize(im, (w, h), interpolation=interp)
    return im


class ClsPostProcess(object):
    """ Rotate the image by 180 degrees when the classifier is confident it is upside down """

    def __init__(self, label_list=['0', '180'], threshold=0.9):
        super(ClsPostProcess, self).__init__()
        self.label_list = label_list
        self.threshold = threshold

    def __call__(self, preds, image=None):
        pred_idxs = preds.argmax(axis=1)
        assert pred_idxs.shape[0] == 1, "batch size must be 1, but got {}.".format(pred_idxs.shape[0])
        direction = self.label_list[pred_idxs[0]]
        if direction == '180' and preds[0, 1] > self.threshold:
            image = cv2.rotate(image, cv2.ROTATE_180)
        return image


class ClsPredictor:
    def __init__(self, model_path, target_size=(100, 32), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], threshold=0.9):
        self.target_size = target_size
        self.mean = mean
        self.std = std
        self.model_path = model_path
        self.post_process = ClsPostProcess(threshold=threshold)
        # Read and compile the model once, rather than on every predict() call.
        ie = Core()
        model = ie.read_model(model=self.model_path)
        self.compiled_model = ie.compile_model(model=model, device_name="CPU")
        self.output_layer = next(iter(self.compiled_model.outputs))

    def preprocess(self, image):
        image = resize(image, target_size=self.target_size)
        image = normalize(image, mean=self.mean, std=self.std)
        return image

    def predict(self, image):
        # Accept either an image path or an already loaded BGR array.
        if isinstance(image, str):
            path = image
            image = cv2.imread(path)
            if image is None:
                raise FileNotFoundError("Cannot read image: {}".format(path))
        inputs = self.preprocess(image)
        # HWC -> CHW, then add the batch dimension.
        input_image = np.expand_dims(
            inputs.transpose(2, 0, 1), 0
        )
        preds = self.compiled_model([input_image])[self.output_layer]
        image = self.post_process(preds, image)
        return image


def parse_args():
    parser = argparse.ArgumentParser(description='Direction classification inference.')
    parser.add_argument(
        '--model_path',
        dest='model_path',
        help='The path of the pdmodel file.',
        type=str,
        default="ch_ppocr_mobile_v2.0_cls_infer/inference.pdmodel")
    parser.add_argument(
        '--image_path',
        dest='image_path',
        help='The path of image to predict.',
        type=str,
        required=True)
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    model_path = args.model_path
    image_path = args.image_path
    cls_predictor = ClsPredictor(model_path, target_size=(100, 32), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], threshold=0.7)
    image = cls_predictor.predict(image_path)
    cv2.imwrite('cls_result.png', image)
    cv2.imshow('1', image)
    cv2.waitKey(0)
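The post-processing rule in `ClsPostProcess` can be checked in isolation, without a model. The sketch below applies the same decision (rotate only when the '180' class wins and its score exceeds the threshold) to hand-written scores. The scores are invented, the function name `maybe_rotate_180` is hypothetical, and the NumPy slice `image[::-1, ::-1]` stands in for the 180-degree `cv2.rotate` call so that the example needs no OpenCV.

```python
import numpy as np


def maybe_rotate_180(preds, image, threshold=0.9):
    """Rotate image by 180 degrees if class 1 ('180') wins with score > threshold."""
    idx = int(preds.argmax(axis=1)[0])
    if idx == 1 and preds[0, 1] > threshold:
        return image[::-1, ::-1]  # reverse rows and columns: a 180-degree rotation
    return image


image = np.arange(6).reshape(2, 3)
confident = np.array([[0.02, 0.98]])  # '180' wins with a high score
unsure = np.array([[0.40, 0.60]])     # '180' wins, but below the threshold

rotated = maybe_rotate_180(confident, image)
kept = maybe_rotate_180(unsure, image)
print(rotated)
print(kept)
```

Raising the threshold makes the classifier more conservative: a crop is left untouched unless the model is very sure it is upside down.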
Third, the recognition model deployment
1. Download the model
!wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar
2. Unzip the compressed package
3. View the model structure
The recognition model takes the same input as the direction classifier, because the output of direction classification is fed directly into recognition. Its output has shape [?, 25, 6625], where ? is the batch size, 25 is the number of time steps in the predicted sequence (an upper bound on the number of recognized characters), and 6625 is the number of character classes. The recognition model comes with a dictionary, and the number of entries the decoder builds from that dictionary must match the number of classes.
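The step that turns this [?, 25, 6625] output into text is greedy CTC decoding: take the highest-scoring class at each time step, collapse consecutive repeats, and drop the blank class at index 0. The sketch below illustrates the idea on a three-symbol toy alphabet rather than the real dictionary; the function name and the example indices are assumptions made for the illustration.

```python
def ctc_greedy_decode(indices, characters, blank=0):
    """Collapse repeated indices, drop blanks, and map the rest to characters."""
    chars = []
    prev = None
    for idx in indices:
        # Keep a symbol only if it is not blank and differs from the previous step.
        if idx != blank and idx != prev:
            chars.append(characters[idx])
        prev = idx
    return "".join(chars)


characters = ["blank", "a", "b"]         # index 0 is reserved for the CTC blank
per_step_argmax = [1, 1, 0, 1, 2, 2, 0]  # one class index per time step
decoded = ctc_greedy_decode(per_step_argmax, characters)
print(decoded)  # aab
```

Note that the blank between the two runs of `1` is what lets the decoder emit "a" twice; without it the repeats would collapse into a single character.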
4. Deployment
First prepare the dictionary that matches the model (the code defaults to ppocr_keys_v1.txt). The code is as follows:
import argparse

import cv2
import numpy as np
from openvino.runtime import Core


def normalize(im, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
    # Scale pixels to [0, 1], then standardize each channel.
    im = im.astype(np.float32, copy=False) / 255.0
    im -= mean
    im /= std
    return im


def resize(im, target_size=608, interp=cv2.INTER_LINEAR):
    # target_size is either a single int (square) or a (width, height) pair.
    if isinstance(target_size, (list, tuple)):
        w = target_size[0]
        h = target_size[1]
    else:
        w = target_size
        h = target_size
    im = cv2.resize(im, (w, h), interpolation=interp)
    return im


class BaseRecLabelDecode(object):
    """ Convert between text-label and text-index """

    def __init__(self, character_dict_path=None, use_space_char=False):
        self.beg_str = "sos"
        self.end_str = "eos"
        self.character_str = []
        if character_dict_path is None:
            self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
            dict_character = list(self.character_str)
        else:
            # One dictionary entry per line of the file.
            with open(character_dict_path, "rb") as fin:
                lines = fin.readlines()
                for line in lines:
                    line = line.decode('utf-8').strip("\n").strip("\r\n")
                    self.character_str.append(line)
            if use_space_char:
                self.character_str.append(" ")
            dict_character = list(self.character_str)
        dict_character = self.add_special_char(dict_character)
        self.dict = {}
        for i, char in enumerate(dict_character):
            self.dict[char] = i
        self.character = dict_character

    def add_special_char(self, dict_character):
        return dict_character

    def decode(self, text_index, text_prob=None, is_remove_duplicate=False):
        """ convert text-index into text-label. """
        result_list = []
        ignored_tokens = self.get_ignored_tokens()
        batch_size = len(text_index)
        for batch_idx in range(batch_size):
            char_list = []
            conf_list = []
            for idx in range(len(text_index[batch_idx])):
                if text_index[batch_idx][idx] in ignored_tokens:
                    continue
                if is_remove_duplicate:
                    # only for predict
                    if idx > 0 and text_index[batch_idx][idx - 1] == text_index[
                            batch_idx][idx]:
                        continue
                char_list.append(self.character[int(text_index[batch_idx][
                    idx])])
                if text_prob is not None:
                    conf_list.append(text_prob[batch_idx][idx])
                else:
                    conf_list.append(1)
            text = ''.join(char_list)
            result_list.append((text, np.mean(conf_list)))
        return result_list

    def get_ignored_tokens(self):
        return [0]  # for ctc blank


class CTCLabelDecode(BaseRecLabelDecode):
    """ Convert between text-label and text-index """

    def __init__(self, character_dict_path=None, use_space_char=False,
                 **kwargs):
        super(CTCLabelDecode, self).__init__(character_dict_path,
                                             use_space_char)

    def __call__(self, preds, label=None, *args, **kwargs):
        if isinstance(preds, (tuple, list)):
            preds = preds[-1]
        # Greedy decoding: best class and its score at every time step.
        preds_idx = preds.argmax(axis=2)
        preds_prob = preds.max(axis=2)
        text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True)
        if label is None:
            return text
        label = self.decode(label)
        return text, label

    def add_special_char(self, dict_character):
        # Index 0 is the CTC blank.
        dict_character = ['blank'] + dict_character
        return dict_character


class RecPredictor:
    def __init__(self, model_path, character_dict_path, target_size=(100, 32), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], use_space_char=False):
        self.target_size = target_size
        self.mean = mean
        self.std = std
        self.model_path = model_path
        self.post_process = CTCLabelDecode(character_dict_path=character_dict_path, use_space_char=use_space_char)
        # Read and compile the model once, rather than on every predict() call.
        ie = Core()
        model = ie.read_model(model=self.model_path)
        self.compiled_model = ie.compile_model(model=model, device_name="CPU")
        self.output_layer = next(iter(self.compiled_model.outputs))

    def preprocess(self, image):
        image = resize(image, target_size=self.target_size)
        image = normalize(image, mean=self.mean, std=self.std)
        return image

    def predict(self, image):
        # Accept either an image path or an already loaded BGR array.
        if isinstance(image, str):
            path = image
            image = cv2.imread(path)
            if image is None:
                raise FileNotFoundError("Cannot read image: {}".format(path))
        inputs = self.preprocess(image)
        # HWC -> CHW, then add the batch dimension.
        input_image = np.expand_dims(
            inputs.transpose(2, 0, 1), 0
        )
        preds = self.compiled_model([input_image])[self.output_layer]
        text = self.post_process(preds)
        return text


def str2bool(value):
    # argparse's type=bool treats any non-empty string (even "False") as True.
    return str(value).lower() in ("true", "1", "yes")


def parse_args():
    parser = argparse.ArgumentParser(description='Text recognition inference.')
    parser.add_argument(
        '--model_path',
        dest='model_path',
        help='The path of the pdmodel file.',
        type=str,
        default="ch_PP-OCRv2_rec_infer/inference.pdmodel")
    parser.add_argument(
        '--image_path',
        dest='image_path',
        help='The path of image to predict.',
        type=str,
        required=True)
    parser.add_argument(
        '--use_space_char',
        dest='use_space_char',
        help='Whether use space char.',
        type=str2bool,
        default=False)
    parser.add_argument(
        '--character_dict_path',
        dest='character_dict_path',
        help='The path of character dict.',
        type=str,
        default="ppocr_keys_v1.txt")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    model_path = args.model_path
    image_path = args.image_path
    use_space_char = args.use_space_char
    character_dict_path = args.character_dict_path
    rec_predictor = RecPredictor(model_path, character_dict_path=character_dict_path, target_size=(100, 32), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], use_space_char=use_space_char)
    text = rec_predictor.predict(image_path)
    print(text)
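One thing worth checking before running the script is that the decoder's character list is as long as the model's last output dimension. `CTCLabelDecode` builds the list as one blank entry, then the dictionary lines, then an optional space. The sketch below reproduces that bookkeeping on a tiny temporary dictionary; `build_character_list` is a hypothetical helper, not part of PaddleOCR. If ppocr_keys_v1.txt has 6623 lines (an assumption to verify against your own copy), then the 6625 classes mentioned above correspond to running with the space character enabled.

```python
import os
import tempfile


def build_character_list(dict_path, use_space_char=False):
    """Return ['blank'] + one entry per dictionary line (+ ' ' if requested)."""
    with open(dict_path, "rb") as fin:
        chars = [line.decode("utf-8").rstrip("\r\n") for line in fin]
    if use_space_char:
        chars.append(" ")
    return ["blank"] + chars


# Write a three-character toy dictionary to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as f:
    f.write("a\nb\nc\n")
    toy_dict = f.name

n_without_space = len(build_character_list(toy_dict))
n_with_space = len(build_character_list(toy_dict, use_space_char=True))
os.remove(toy_dict)

print(n_without_space)  # 4: blank + 3 characters
print(n_with_space)     # 5: blank + 3 characters + space
```

If the list is shorter than the number of classes, an argmax that lands on the last class would index past the end of the list during decoding.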
Fourth, detection + classification + recognition serial deployment
With deployment code for each of the three models in place, chain them together as follows (the complete code is linked at the end of the article):
import argparse

import cv2
import numpy as np

# The three predictors come from the scripts of this post and the previous one,
# imported here as the modules ppocr_cls, ppocr_det and ppocr_rec.
from ppocr_cls import ClsPredictor
from ppocr_det import DetPredictor
from ppocr_rec import RecPredictor


class PaddleOCR:
    def __init__(self, det_model_path, rec_model_path, character_dict_path, cls_model_path=None, use_space_char=False, det_image_size=[960, 960], rec_image_size=[100, 32], mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
        self.det_predictor = DetPredictor(det_model_path, target_size=det_image_size, mean=mean, std=std)
        # The direction classifier is optional.
        self.cls_predictor = ClsPredictor(cls_model_path, target_size=rec_image_size, mean=mean, std=std) if cls_model_path else None
        self.rec_predictor = RecPredictor(rec_model_path, character_dict_path=character_dict_path, target_size=rec_image_size, mean=mean, std=std, use_space_char=use_space_char)

    def predict(self, image_path):
        image = cv2.imread(image_path)
        if image is None:
            raise FileNotFoundError("Cannot read image: {}".format(image_path))
        raw_image = image.copy()
        # 1. Detect text boxes and draw them on the image.
        boxes_batch = self.det_predictor.predict(image_path)
        draw_image = self.det_predictor.draw_det(image, boxes_batch[0]['points'])
        texts = []
        for box in boxes_batch[0]['points']:
            # 2. Crop each box, using point 0 as top-left and point 2 as bottom-right.
            box = box.astype(np.int32)
            left, top = box[0, 0], box[0, 1]
            right, bottom = box[2, 0], box[2, 1]
            sub_image = raw_image[top:bottom, left:right, :]
            if sub_image.size == 0:
                # Skip degenerate boxes; cv2.resize fails on an empty crop.
                continue
            # 3. Fix the orientation (optional), then recognize the text.
            if self.cls_predictor is not None:
                sub_image = self.cls_predictor.predict(sub_image)
            text = self.rec_predictor.predict(sub_image)
            texts.append(text)
        return draw_image, texts


def str2bool(value):
    # argparse's type=bool treats any non-empty string (even "False") as True.
    return str(value).lower() in ("true", "1", "yes")


def parse_args():
    parser = argparse.ArgumentParser(description='Detection + classification + recognition inference.')
    parser.add_argument(
        '--det_model_path',
        dest='det_model_path',
        help='The path of the detection pdmodel file.',
        type=str,
        default='ch_PP-OCRv2_det_infer/inference.pdmodel')
    parser.add_argument(
        '--rec_model_path',
        dest='rec_model_path',
        help='The path of the recognition pdmodel file.',
        type=str,
        default="ch_PP-OCRv2_rec_infer/inference.pdmodel")
    parser.add_argument(
        '--cls_model_path',
        dest='cls_model_path',
        help='The path of the direction classification pdmodel file.',
        type=str,
        default="ch_ppocr_mobile_v2.0_cls_infer/inference.pdmodel")
    parser.add_argument(
        '--image_path',
        dest='image_path',
        help='The path of image to predict.',
        type=str,
        required=True)
    parser.add_argument(
        '--save_path',
        dest='save_path',
        help='The image save path.',
        type=str,
        default="result.png")
    parser.add_argument(
        '--use_space_char',
        dest='use_space_char',
        help='Whether use space char.',
        type=str2bool,
        default=True)
    parser.add_argument(
        '--character_dict_path',
        dest='character_dict_path',
        help='The path of character dict.',
        type=str,
        default="ppocr_keys_v1.txt")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    predictor = PaddleOCR(det_model_path=args.det_model_path, rec_model_path=args.rec_model_path, character_dict_path=args.character_dict_path, cls_model_path=args.cls_model_path, use_space_char=args.use_space_char)
    draw_image, texts = predictor.predict(args.image_path)
    cv2.imwrite(args.save_path, draw_image)
    print(texts)
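The crop in `predict` treats point 0 of each box as the top-left corner and point 2 as the bottom-right corner, which relies on the detector returning points in that order and on the box lying inside the image. A more defensive variant, sketched below as an assumption rather than as what the original code does, takes the min and max over all four points and clamps them to the image bounds. `crop_axis_aligned` is a hypothetical helper name.

```python
import numpy as np


def crop_axis_aligned(image, box):
    """Crop the axis-aligned bounding rectangle of a 4-point box, clamped to the image."""
    h, w = image.shape[:2]
    pts = np.asarray(box).astype(np.int32)
    left = int(np.clip(pts[:, 0].min(), 0, w))
    right = int(np.clip(pts[:, 0].max(), 0, w))
    top = int(np.clip(pts[:, 1].min(), 0, h))
    bottom = int(np.clip(pts[:, 1].max(), 0, h))
    return image[top:bottom, left:right]


image = np.zeros((50, 80, 3), dtype=np.uint8)  # height 50, width 80
# Points listed out of order, with one corner slightly outside the image.
box = [[70, 40], [10, 5], [85, 5], [10, 40]]
crop = crop_axis_aligned(image, box)
print(crop.shape)  # (35, 70, 3)
```

A degenerate box can still produce an empty crop (zero width or height), so the caller may want to skip such crops before passing them to recognition.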
Fifth, references
1. Complete code for this article: "pp-ocrv2 python openvino deployment code" (CSDN download)
2. Reference: [PaddlePaddle+openvino] PP-OCRv2 deployment