Datawhale Zero-Based Introductory CV Competition, Task01: Understanding the Problem

1 Understanding the problem

  • Competition title: Street-view character recognition (computer vision)
  • Goal of the competition: To introduce participants to computer vision, help them get started with vision competitions, and improve their data modeling skills.
  • Task: Predict the character code shown in each street-view image. This is a typical character recognition problem.
    To keep the difficulty manageable, the competition data is drawn from the public SVHN data set, so the many published papers on SVHN can serve as references.

1.1 Competition data

The competition uses street-view character images as its data. The data set can be viewed and downloaded after registration. It is drawn from the SVHN street-view character collection and has been anonymized and sampled.

Note: Under the competition rules, contestants may train only on the data set provided by the competition and may not train on the original SVHN data set. After the competition, the code of the top-ranked contestants will be reviewed, and contestants who broke the rules will have their rankings cleared.
The training set contains 30,000 images and the validation set contains 10,000 images. Each image is a color image accompanied by its character labels and the positions of the characters. To keep the competition fair, test set A and test set B each contain 40,000 images.
Note that contestants must recognize all the characters in each image. To reduce the difficulty, the position boxes of all characters are provided for the training set and the validation set.

1.2 Data labels

For each image in the training data, the character labels and the positions of the character boxes are provided (for both the training set and the validation set) and can be used for model training:

Field    Description
top      Y coordinate of the upper-left corner
height   Character height
left     X coordinate of the upper-left corner
width    Character width
label    Character code (the digit shown)

The character coordinates are measured from the upper-left corner of the image: top and left locate the upper-left corner of each character box, and height and width give its size.
In the competition data (training set and validation set), a single image may contain one or more characters. The JSON annotation therefore stores one set of box values per character; an image with two characters, for example, has two sets of box information.
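To make the annotation layout concrete, here is a minimal sketch of how one image's entry might be split into per-character records. It assumes each field holds a list with one value per character, which is what the reading code in section 1.4 implies. The values and the helper name split_characters are made up for illustration.

```python
# Hypothetical annotation for one image containing two characters.
# Field names follow the label table; the numbers are invented.
annotation = {
    "top": [77, 81],
    "height": [219, 219],
    "left": [246, 323],
    "width": [81, 96],
    "label": [1, 9],
}

FIELDS = ("top", "height", "left", "width", "label")


def split_characters(d):
    """Turn the parallel per-field lists into one dict per character."""
    columns = [d[field] for field in FIELDS]
    return [dict(zip(FIELDS, values)) for values in zip(*columns)]


characters = split_characters(annotation)
for ch in characters:
    print(ch)
```

Each printed record describes one character box, so the number of records equals the number of characters in the image.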

1.3 Evaluation metric

The codes submitted by contestants are compared with the true codes of the images, and the overall recognition accuracy is used as the evaluation metric. An image counts as correct only if every one of its characters is correct. The higher the score, the better. The score is computed as follows:

$$\text{score} = \frac{\text{number of images whose code is recognized correctly}}{\text{number of images in the test set}}$$
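As a rough illustration of this metric, the sketch below computes exact-match accuracy over a list of predicted codes. The function name and the sample strings are hypothetical; the official scoring script may differ in detail.

```python
def score(predicted, actual):
    """Fraction of images whose whole predicted code equals the true code."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    # A single wrong character makes the whole image count as wrong.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


predicted = ["23", "231", "7", "1905"]
actual = ["23", "234", "7", "1905"]
print(score(predicted, actual))  # 3 of 4 images match exactly -> 0.75
```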

1.4 Reading the data

To make the data easier to work with, the following code shows how to read the labels from the JSON file and display each character crop:

import json
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load the annotations; the context manager closes the file afterwards
with open('../input/train.json') as f:
    train_json = json.load(f)

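# Parse one image's annotation into a 5 x N integer array
# (rows: top, height, left, width, label; one column per character)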
def parse_json(d):
    arr = np.array([
        d['top'], d['height'], d['left'],  d['width'], d['label']
    ])
    arr = arr.astype(int)
    return arr
    
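img = cv2.imread('../input/train/000000.png')
# OpenCV loads images as BGR; convert to RGB so matplotlib shows true colors
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)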
arr = parse_json(train_json['000000.png'])

plt.figure(figsize=(10, 10))
plt.subplot(1, arr.shape[1]+1, 1)
plt.imshow(img)
plt.xticks([]); plt.yticks([])

for idx in range(arr.shape[1]):
    plt.subplot(1, arr.shape[1]+1, idx+2)
    plt.imshow(img[arr[0, idx]:arr[0, idx]+arr[1, idx],arr[2, idx]:arr[2, idx]+arr[3, idx]])
    plt.title(arr[4, idx])
    plt.xticks([]); plt.yticks([])

1.5 Approaches to solving the problem

Analysis of the problem: At its core, the task is a classification problem: recognizing the characters in each image. However, the number of characters differs from image to image. Some images contain 2 characters, some 3, and some 4.
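One way to check how the character count varies is to tally the length of the label list for every image. The sketch below uses a small made-up dictionary in the same layout as the annotations read in section 1.4. It assumes label is always a list, and the file names and values are hypothetical.

```python
from collections import Counter

# Hypothetical annotations in the same layout as train.json; values are invented.
annotations = {
    "a.png": {"label": [2, 3]},
    "b.png": {"label": [2, 3, 1]},
    "c.png": {"label": [5]},
    "d.png": {"label": [4, 0]},
}


def length_distribution(annotations):
    """Count how many images contain 1, 2, 3, ... characters."""
    return Counter(len(d["label"]) for d in annotations.values())


dist = length_distribution(annotations)
for n_chars, n_images in sorted(dist.items()):
    print(f"{n_chars} characters: {n_images} images")
```

Running the same function on the real annotations would show which lengths dominate and what the maximum length is.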

– Simple entry-level approach: fixed-length character recognition.
The problem can be abstracted as fixed-length character recognition. Most images in the data set contain 2 to 4 characters, and the maximum is 6.
Every image can therefore be treated as a six-character recognition problem by padding: the string 23 becomes 23XXXX, and the string 231 becomes 231XXX.
After padding, the problem reduces to classifying 6 character positions, each into one of 11 classes (the ten digits plus the padding character). A position classified as the padding character is empty.
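The padding scheme can be sketched as a pair of helper functions. This is only an illustration: using class index 10 for the padding character X is an assumption, and the names encode and decode are hypothetical.

```python
MAX_LEN = 6  # maximum number of characters per image, as stated in the text
PAD = 10     # assumed class index for the padding character "X" (digits use 0-9)


def encode(digits):
    """Pad a list of digit labels to MAX_LEN positions with the PAD class."""
    if len(digits) > MAX_LEN:
        raise ValueError("more characters than MAX_LEN")
    return list(digits) + [PAD] * (MAX_LEN - len(digits))


def decode(classes):
    """Drop padded positions and join the remaining digits into a string."""
    return "".join(str(c) for c in classes if c != PAD)


print(encode([2, 3]))                     # [2, 3, 10, 10, 10, 10]
print(decode([2, 3, 1, 10, 10, 10]))      # 231
```

With this encoding a model predicts six class labels per image, one per position, and decode turns those predictions back into the submitted code.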

– Specialized character recognition approach: variable-length character recognition.
Character recognition research offers methods designed for variable-length text, of which the CRNN model is a typical example.
The images in this competition are fairly regular, so the characters in each image can be treated as a single word or sentence.
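CRNN-style models are commonly trained with a CTC loss, and their per-frame outputs have to be collapsed into a final string. The text does not say which decoding is used, so the following is only a minimal sketch of greedy CTC decoding. It assumes the per-frame class indices are already available and that index 10 is the blank class.

```python
BLANK = 10  # assumed index of the CTC blank class; digits use 0-9


def ctc_greedy_decode(frame_classes):
    """Collapse consecutive repeats, then drop blanks (greedy CTC decoding)."""
    decoded = []
    previous = None
    for c in frame_classes:
        # Keep a class only when it starts a new run and is not the blank.
        if c != previous and c != BLANK:
            decoded.append(c)
        previous = c
    return "".join(str(c) for c in decoded)


print(ctc_greedy_decode([10, 2, 2, 10, 3, 3, 3, 10, 10, 1]))  # 231
```

Because the output length depends on the frames rather than on a fixed number of positions, no padding character is needed in this approach.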

– Specialized approach: detection followed by recognition.
The positions of the characters in every image of the training set and validation set are provided. The characters can therefore be located first, using an object detection approach, and then recognized.
This approach requires contestants to build a character detection model that detects the characters in the test set. Object detection models such as SSD or YOLO can serve as references.
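Whatever detector is used, its per-character outputs still have to be turned into a single code. A simple sketch, assuming the characters read left to right and each detection carries the same fields as the annotations, is shown below. The helper names and sample boxes are hypothetical.

```python
def to_corners(top, left, height, width):
    """Convert a (top, left, height, width) box to (x_min, y_min, x_max, y_max)."""
    return (left, top, left + width, top + height)


def assemble_code(detections):
    """Order detected characters left to right and join their labels."""
    ordered = sorted(detections, key=lambda d: d["left"])
    return "".join(str(d["label"]) for d in ordered)


# Invented detections, deliberately listed out of reading order.
detections = [
    {"top": 81, "left": 323, "height": 219, "width": 96, "label": 9},
    {"top": 77, "left": 246, "height": 219, "width": 81, "label": 1},
]

print(assemble_code(detections))        # 19
print(to_corners(77, 246, 219, 81))     # (246, 77, 327, 296)
```

The corner format is the one many detection toolkits expect for training boxes, while the left-to-right sort is what converts detections into the code being scored.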

1.6 Summary

In summary, although this is a simple character recognition problem, it admits several solutions that draw on different kinds of computer vision models, which makes it well suited to introductory study.
The three approaches are listed in order of increasing difficulty, so beginners are advised to start with fixed-length character recognition. The rest of this document also uses fixed-length character recognition as the running example to lead you step by step into computer vision.

Datawhale is an open source organization focusing on data science and AI. It brings together excellent learners from many universities and well-known companies in many fields, and brings together a group of team members with open source spirit and exploratory spirit. With the vision of "for the learner, grow with learners", Datawhale encourages true self-expression, openness and tolerance, mutual trust and mutual assistance, the courage to try and make mistakes, and the courage to take responsibility. At the same time, Datawhale uses the concept of open source to explore open source content, open source learning and open source solutions, empower talent training, help talent growth, and establish a connection between people and people, people and knowledge, people and enterprises, and people and the future.

Origin blog.csdn.net/OuDiShenmiss/article/details/106245105