DBNet in practice: a detailed guide to training and testing DBNet (PyTorch)

Paper link: https://arxiv.org/pdf/1911.08947.pdf

GitHub link: WenmuZhou/DBNet.pytorch (github.com)

Network structure

  • First, the input image is passed through the backbone to extract features;
  • Second, the feature pyramid upsamples the multi-scale features to the same size and concatenates them to obtain the feature map F;
  • Then, F is used to predict the probability map P and the threshold map T;
  • Finally, the approximate binary map B is computed from P and T.

During training, P, T, and B are all supervised; P and B share the same supervision signal (label). At inference time, only P or B is needed to obtain the text boxes.

Network output:

1. Probability map, of size w×h×1, giving the probability that each pixel belongs to text.

2. Threshold map, of size w×h×1, giving a per-pixel binarization threshold.

3. Approximate binary map, of size w×h×1, computed from 1 and 2 using the DB formula.
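
The DB (differentiable binarization) formula from the paper computes the approximate binary map from P and T as

\hat{B}_{i,j} = \frac{1}{1 + e^{-k\,(P_{i,j} - T_{i,j})}}

where k is the amplification factor, set to 50 in the paper.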


Download the code

Get the code from WenmuZhou/DBNet.pytorch (a PyTorch re-implementation of "Real-time Scene Text Detection with Differentiable Binarization") on GitHub and unzip it. Then install any missing packages:

pip install Polygon3 -i https://pypi.tuna.tsinghua.edu.cn/simple  
pip install addict
pip install imgaug

The packages you need to install depend on your own environment, so the exact list may differ.

Then execute the following in PyCharm's Terminal:

python tools/train.py --config_file "config/icdar2015_resnet18_FPN_DBhead_polyLR.yaml"

If a package is still missing, this command will fail with an import error; if no such error appears, everything you need is installed.


Dataset

The dataset is ICDAR 2015. Web link: Downloads - Incidental Scene Text - Robust Reading Competition (uab.es); registration is required.

Select Task 4.1: Text Localization.


Details of the data: Tasks - Incidental Scene Text - Robust Reading Competition (uab.es)

Task 4.1: Text Localization For the text localization task, we will provide word bounding boxes for each image. Ground truths are given as separate text files (one per image), where each line specifies the coordinates of a word bounding box and its transcription in comma-separated format (see Figure 1).


For text localization tasks, ground truth data is provided in the form of word bounding boxes. Unlike Challenges 1 and 2, the bounding boxes in Challenge 4 are not axis-oriented, they are specified in a clockwise fashion by the coordinates of the four corners. For each image in the training set, a separate UTF-8 text file is provided following the naming convention:

gt_[image name].txt

The text file is comma-separated, where each line corresponds to one word in the image, giving its bounding box coordinates (four corners, clockwise) and its transcription in the following format:

x1, y1, x2, y2, x3, y3, x4, y4, transcription

Note that anything after the eighth comma is part of the transcription and no escape characters are used. The "don't care" region is denoted by the transcription of "###" in the ground truth. Authors will be asked to automatically locate text in images and return bounding boxes. Results must be submitted in a separate text file for each image, with each line corresponding to a bounding box (comma-separated values) in the above format. A single compressed (zip or rar) file containing all result files should be submitted. If your method fails to produce any results for the image, you can include an empty results file or no file at all. Unlike Challenges 1 and 2, the evaluation of the results will be based on a single Intersection-over-Union criterion with a threshold of 50%, similar to the standard practice in Object Recognition and Pascal VOC Challenges [1].
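
To make the format concrete, here is a minimal Python sketch (not part of the repository) that parses one such ground-truth line; the sample line at the bottom is only a hypothetical illustration:

def parse_gt_line(line):
    # anything after the eighth comma belongs to the transcription
    parts = line.strip().split(',', 8)
    coords = list(map(int, parts[:8]))
    # four corner points (x, y), given in clockwise order
    points = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    transcription = parts[8]
    ignore = transcription == '###'  # "don't care" region
    return points, transcription, ignore

# hypothetical example line
print(parse_gt_line('377,117,463,117,465,130,378,130,Genaxis Theatre'))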

After the dataset is downloaded, you will have four archive files. Unzip them as follows:


Unzip ch4_training_images.zip to ./datasets/train/img.

Unzip ch4_training_localization_transcription_gt.zip to ./datasets/train/gt.

Unzip ch4_test_images.zip to ./datasets/test/img.

Unzip Challenge4_Test_Task1_GT.zip to ./datasets/test/gt.
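
After unzipping, the directory layout should look like this:

./datasets
├── train
│   ├── img    # ch4_training_images
│   └── gt     # ch4_training_localization_transcription_gt
└── test
    ├── img    # ch4_test_images
    └── gt     # Challenge4_Test_Task1_GT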

Next, preprocess the dataset. The author provides the processing script generate_lists.sh for Ubuntu, so if you are on Ubuntu you can simply run it:

bash generate_lists.sh


On Windows 10 you need a Python script instead. Create a new getdata.py and add the following code:

import os
def get_images(img_path):
    '''
    find image files in data path
    :return: list of files found
    '''
    files = []
    exts = ['jpg', 'png', 'jpeg', 'JPG', 'PNG']
    for parent, dirnames, filenames in os.walk(img_path):
        for filename in filenames:
            for ext in exts:
                if filename.endswith(ext):
                    files.append(os.path.join(parent, filename))
                    break
    print('Find {} images'.format(len(files)))
    return sorted(files)

def get_txts(txt_path):
    '''
    find gt files in data path
    :return: list of files found
    '''
    files = []
    exts = ['txt']
    for parent, dirnames, filenames in os.walk(txt_path):
        for filename in filenames:
            for ext in exts:
                if filename.endswith(ext):
                    files.append(os.path.join(parent, filename))
                    break
    print('Find {} txts'.format(len(files)))
    return sorted(files)

if __name__ == '__main__':
    import json

    img_train_path = './datasets/train/img'
    img_test_path = './datasets/test/img'
    train_files = get_images(img_train_path)
    test_files = get_images(img_test_path)

    txt_train_path = './datasets/train/gt'
    txt_test_path = './datasets/test/gt'
    train_txts = get_txts(txt_train_path)
    test_txts = get_txts(txt_test_path)
    n_train = len(train_files)
    n_test = len(test_files)
    assert len(train_files) == len(train_txts) and len(test_files) == len(test_txts)
    # with open('train.txt', 'w') as f:
    with open('./datasets/train.txt', 'w') as f:
        for i in range(n_train):
            line = train_files[i] + '\t' + train_txts[i] + '\n'
            f.write(line)
    with open('./datasets/test.txt', 'w') as f:
        for i in range(n_test):
            line = test_files[i] + '\t' + test_txts[i] + '\n'
            f.write(line)

The logic is straightforward: it pairs each image with its gt file and writes the train and test lists to ./datasets/train.txt and ./datasets/test.txt, one image path and one gt path per line, separated by a tab.
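
Run the script from the repository root (the relative ./datasets paths assume it is launched from there):

python getdata.py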

After completing the data preprocessing above, you can start training.

Training

At this point, most of the work has been completed, and you only need to make appropriate changes to the config file parameters to start training.

The config file used for this training run is ./config/icdar2015_resnet18_FPN_DBhead_polyLR.yaml. Modify parameters such as the learning rate, optimizer, and batch size as needed.


These parameters can be adjusted according to your actual setup; an illustrative excerpt follows. My card is an RTX 3090, and I set the batch size to 32.
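
For reference, the commonly adjusted fields sit roughly in the following places in the YAML file. This is only an illustrative excerpt, not the full config, and the exact keys and default values in your copy may differ slightly:

optimizer:
  type: Adam
  args:
    lr: 0.001              # learning rate
lr_scheduler:
  type: WarmupPolyLR       # "poly" learning-rate decay with warmup
trainer:
  epochs: 1200             # total number of training epochs
dataset:
  train:
    loader:
      batch_size: 32       # set according to your GPU memory (32 on an RTX 3090)
      num_workers: 6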

After the parameters are set, start training by executing the following in PyCharm's Terminal:

CUDA_VISIBLE_DEVICES=0 python tools/train.py --config_file "config/icdar2015_resnet18_FPN_DBhead_polyLR.yaml"


Testing

Open ./tools/predict.py and view the parameters:

def init_args():
    import argparse
    parser = argparse.ArgumentParser(description='DBNet.pytorch')
    parser.add_argument('--model_path', default=r'model_best.pth', type=str)
    parser.add_argument('--input_folder', default='./test/input', type=str, help='img path for predict')
    parser.add_argument('--output_folder', default='./test/output', type=str, help='img path for output')
    parser.add_argument('--thre', default=0.3, type=float, help='the thresh of post_processing')
    parser.add_argument('--polygon', action='store_true', help='output polygon or box')
    parser.add_argument('--show', default=True, action='store_true', help='show result')
    parser.add_argument('--save_resut', default=True, action='store_true', help='save box and score to txt file')
    args = parser.parse_args()
    return args

model_path: The path to the model.

input_folder: The path of the image to be tested.

output_folder: The path of the output result.

thre: the post-processing threshold (minimum confidence for a detection).

polygon: output polygons or boxes; True outputs polygons, False outputs boxes. False is recommended.

show: whether to display the results.

save_resut: whether to save the detected boxes and scores to a txt file.

Create a new input folder, put the test images in it, and execute the following command in PyCharm's Terminal:

python tools/predict.py --model_path output/DBNet_resnet18_FPN_DBHead/checkpoint/model_best.pth --input_folder ./input --output_folder ./output --thre 0.7

After execution completes, you can view the results in the output folder.


Summary

In this post we demonstrated how to train and test DBNet. Overall it is not too difficult; feel free to try it yourself.
Complete code: https://download.csdn.net/download/hhhhhhhhhhwwwwwwwwwwww/85065029
