OCR - LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis

I. Overview

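LayoutParser is an open-source library that streamlines the use of deep learning (DL) in document image analysis (DIA) research and applications. The core library comes with a set of simple, intuitive interfaces for applying and customizing DL models for layout detection, character recognition, and many other document processing tasks. To promote extensibility, LayoutParser also includes a community platform for sharing pre-trained models and complete document digitization pipelines. Its authors show that it is useful for both lightweight and large-scale digitization pipelines in real-world use cases.

In short, LayoutParser provides a unified toolkit for DL-based document image analysis and processing.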

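LayoutParser is built from the following components:

1. An off-the-shelf toolkit for applying DL models to layout detection, character recognition, and other DIA tasks.

2. A rich repository of pre-trained neural network models (the Model Zoo), which underpins the off-the-shelf usage.

3. Comprehensive tools for efficient document image annotation and model tuning, supporting different levels of customization.

4. A DL model hub and community platform for easily sharing, distributing, and discussing DIA models and pipelines, which promotes reusability, reproducibility, and extensibility.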

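Layout detection models: LayoutParser currently hosts 9 pre-trained models trained on 5 different datasets. A description of each training dataset is provided alongside the trained models, so users can quickly identify the model best suited to their task. When no suitable model is available, LayoutParser also supports training custom layout models and sharing them with the community.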
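As a rough illustration of how one of the hosted models might be loaded, the sketch below wraps the loading call in a small helper. The `lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config` path and the label map follow the PubLayNet entry in the LayoutParser model zoo as we understand it, so check them against the current catalog. The helper name `load_publaynet_model` and the 0.8 score threshold are assumptions made for this example, and the call only works once Detectron2 is installed.

```python
# Config path and label map for the PubLayNet Faster R-CNN model
# (taken from the model zoo listing; verify against the current catalog).
PUBLAYNET_CONFIG = "lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config"
PUBLAYNET_LABELS = {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}


def load_publaynet_model(score_threshold=0.8):
    """Hypothetical helper: build a Detectron2-based layout detection model."""
    # Imported lazily so this cell can be defined before Detectron2 is installed.
    import layoutparser as lp

    return lp.Detectron2LayoutModel(
        PUBLAYNET_CONFIG,
        # Assumed threshold: drop detections scored below score_threshold.
        extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", score_threshold],
        label_map=PUBLAYNET_LABELS,
    )


# Usage (requires layoutparser + Detectron2 and downloads the weights):
# model = load_publaynet_model()
# layout = model.detect(image)  # image: RGB numpy array
```

Keeping the label map next to the config path makes it harder to pair a model with the wrong category names when switching datasets.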

III. Basic Usage

1. Installation

!pip install -U layoutparser

LayoutParser also supports OCR. To use it, install the OCR utilities as follows:

!pip install "layoutparser[ocr]"

To use Detectron2 models for layout detection, we may also need to run the following commands:

!pip install 'git+https://github.com/facebookresearch/[email protected]#egg=detectron2'

!git clone https://github.com/Layout-Parser/layout-parser.git
%cd layout-parser/
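After running the install commands, it can help to confirm which packages are actually importable before going further. The following is a minimal sketch using only the standard library. The helper name `check_packages` is made up for this example, and the list of module names (including `pytesseract` for the OCR extra) is an assumption about what the commands above install.

```python
import importlib.util


def check_packages(names):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Top-level modules we expect after the install commands (assumed list).
    status = check_packages(["layoutparser", "detectron2", "pytesseract"])
    for name, found in status.items():
        print(f"{name}: {'found' if found else 'missing'}")
```

`find_spec` only locates the module without importing it, so the check stays fast even for heavy packages such as Detectron2.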

2. Load COCO Layout Annotations

This walkthrough uses the COCO format to load and visualize layout annotations with LayoutParser.

!pip install pycocotools

def load_coco_annotations(annotations, coco=None):
    """
    Args:
        annotations (List):
            a list of COCO annotations for the current image
        coco (`optional`, defaults to `None`):
            COCO annotation object instance. If set, this function will
            convert the loaded annotation category ids to category names
            set in COCO.categories
    """
    layout = lp.Layout()

    for ele in annotations:
        # COCO boxes are [x, y, width, height]; lp.Rectangle takes corner coordinates
        x, y, w, h = ele['bbox']

        layout.append(
            lp.TextBlock(
                block=lp.Rectangle(x, y, w + x, h + y),
                # Use the category name when a COCO object is given, else the raw id
                type=ele['category_id'] if coco is None else coco.cats[ele['category_id']]['name'],
                id=ele['id'],
            )
        )

    return layout
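The key step in the function above is the box conversion: COCO stores a box as `[x, y, width, height]`, while `lp.Rectangle` is given the two corner points. A minimal, library-free sketch of that arithmetic follows. The helper name `coco_bbox_to_corners` and the sample annotation are made up for illustration.

```python
def coco_bbox_to_corners(bbox):
    """Convert a COCO [x, y, width, height] box to (x_1, y_1, x_2, y_2)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


# A made-up annotation in COCO format, for illustration only.
sample_annotation = {"id": 1, "category_id": 2, "bbox": [10, 20, 100, 50]}

corners = coco_bbox_to_corners(sample_annotation["bbox"])
print(corners)  # (10, 20, 110, 70)
```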

3. Import Libraries

import pandas as pd
import numpy as np
import cv2
import random
import json
from tqdm import tqdm
import matplotlib.pyplot as plt

from pycocotools.coco import COCO
import layoutparser as lp

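4. Prepare the Dataset

imgdir = "/kaggle/input/papers-images/train/train"
image = cv2.imread(f"{imgdir}/PMC3777717_00006.jpg")
# OpenCV loads images as BGR; convert to RGB so matplotlib shows the right colors
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image)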

COCO_ANNO_PATH = '/kaggle/input/papers-images/train/train/samples.json'
COCO_IMG_PATH  = '/kaggle/input/papers-images/train/train'

coco = COCO(COCO_ANNO_PATH)
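For readers unfamiliar with what a COCO annotation file contains, the sketch below builds a tiny COCO-style dictionary and counts annotations per category using only the standard library. The file contents here are invented for illustration and are not the real `samples.json`. The helper name `count_by_category` is also hypothetical.

```python
import json
from collections import Counter

# A made-up, minimal COCO-style annotation structure (not the real samples.json).
tiny_coco = {
    "images": [{"id": 1, "file_name": "page_1.jpg", "width": 600, "height": 800}],
    "categories": [{"id": 1, "name": "text"}, {"id": 2, "name": "title"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 2, "bbox": [50, 40, 500, 30]},
        {"id": 11, "image_id": 1, "category_id": 1, "bbox": [50, 90, 500, 300]},
        {"id": 12, "image_id": 1, "category_id": 1, "bbox": [50, 410, 500, 250]},
    ],
}


def count_by_category(coco_dict):
    """Count annotations per category name in a COCO-style dictionary."""
    id_to_name = {c["id"]: c["name"] for c in coco_dict["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in coco_dict["annotations"])


# Round-trip through JSON to mimic reading the file from disk.
loaded = json.loads(json.dumps(tiny_coco))
print(count_by_category(loaded))  # Counter({'text': 2, 'title': 1})
```

The three top-level keys (`images`, `categories`, `annotations`) are what `pycocotools` indexes into `coco.imgs`, `coco.cats`, and `coco.anns`.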

5. Visualize the Layout

color_map = {
    'text':   'red',
    'title':  'blue',
    'list':   'green',
    'table':  'yellow',
    'figure': 'pink',
}


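# random.sample needs a sequence, so convert the dict keys to a list first
for image_id in random.sample(list(coco.imgs.keys()), 1):
    image_info = coco.imgs[image_id]
    annotations = coco.loadAnns(coco.getAnnIds([image_id]))

    image = cv2.imread(f'{COCO_IMG_PATH}/{image_info["file_name"]}')
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert to RGB for drawing
    layout = load_coco_annotations(annotations, coco)

    viz = lp.draw_box(image, layout, color_map=color_map)
    display(viz)  # show the results (inside a Jupyter notebook)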
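Because `color_map` is keyed by category name, a mismatch between its keys and the dataset's categories (for example `Text` versus `text`) can lead to unexpected colors or errors, depending on the LayoutParser version. A small pre-check, sketched below with a hypothetical helper `missing_colors` and a made-up category list, can catch this before drawing.

```python
def missing_colors(category_names, color_map):
    """Return the category names that have no entry in color_map, sorted."""
    return sorted(set(category_names) - set(color_map))


color_map = {
    "text": "red",
    "title": "blue",
    "list": "green",
    "table": "yellow",
    "figure": "pink",
}

# With a pycocotools COCO object the names would come from:
#   names = [cat["name"] for cat in coco.cats.values()]
names = ["text", "title", "list", "table", "figure", "equation"]  # made-up example
print(missing_colors(names, color_map))  # ['equation']
```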