MNIST handwritten digit recognition: binary classification with an image-analysis method

Introduction to the Handwritten Digit Recognition Task

The MNIST dataset comes from the National Institute of Standards and Technology (NIST) and contains 70,000 images in total. The 60,000 training images are digits handwritten by 250 different people, 50% of them high school students and the other 50% Census Bureau staff. The 10,000 test images were handwritten by writers in the same proportion. MNIST is a classic introductory dataset in the field of deep learning. Some example images of handwritten digits are shown below:

1. Download the MNIST dataset

# Create the datasets directory
import os
datasets_dir = '../datasets'
if not os.path.exists(datasets_dir):
    os.makedirs(datasets_dir)

os is a Python standard-library module, not a function. It provides functions related to the operating system and is made available with import. Commonly used functions include:

1. Get the current working directory: os.getcwd()
2. Create a new folder: os.mkdir()
3. Change the current working directory: os.chdir(path), where path is an existing directory
4. List the names of all entries under a path: os.listdir(path)
5. Check whether a path is a folder: os.path.isdir()
6. Check whether a path is a file: os.path.isfile()
7. Split a file path into directory and file name: os.path.split()
8. Rename a file: os.rename()
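As a rough illustration of the functions listed above, the sketch below exercises several of them inside a temporary directory. The folder and file names (demo_dir, a.txt, b.txt) are made up for the example and are not part of the tutorial.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing outside it is modified.
with tempfile.TemporaryDirectory() as tmp:
    sub_dir = os.path.join(tmp, "demo_dir")      # hypothetical folder name
    os.mkdir(sub_dir)                            # create a new folder

    old_path = os.path.join(sub_dir, "a.txt")    # hypothetical file name
    with open(old_path, "w") as f:
        f.write("hello")

    new_path = os.path.join(sub_dir, "b.txt")
    os.rename(old_path, new_path)                # rename the file

    entries = os.listdir(sub_dir)                # names of the entries in the folder
    is_dir = os.path.isdir(sub_dir)              # True for a folder
    is_file = os.path.isfile(new_path)           # True for a regular file
    head, tail = os.path.split(new_path)         # (directory part, file name)

print(entries, is_dir, is_file, tail)
```

The temporary directory is deleted when the with block exits, so the example leaves no files behind.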

# Download the dataset. The data source is hosted in HUAWEI CLOUD OBS, so this code can only run in HUAWEI CLOUD ModelArts
import moxing as mox
if not os.path.exists(os.path.join(datasets_dir, 'MNIST_Data.zip')):
    mox.file.copy('obs://modelarts-labs-bj4-v2/course/hwc_edu/python_module_framework/datasets/mindspore_data/MNIST_Data.zip', 
                  os.path.join(datasets_dir, 'MNIST_Data.zip'))
    os.system('cd %s; unzip MNIST_Data.zip' % (datasets_dir))

moxing: MoXing (pinyin for "model") is a network model development API provided by the HUAWEI CLOUD deep learning service. Compared with native APIs such as TensorFlow and MXNet, the MoXing API simplifies model code and can automatically obtain high-performance distributed execution. In this tutorial it is only used to copy the dataset archive from OBS.

2. Read the MNIST dataset

import numpy as np
import mindspore.dataset as ds

# Read the full training and test sets
datasets_dir = '../datasets'
mnist_ds_train = ds.MnistDataset(os.path.join(datasets_dir, "MNIST_Data/train"))
mnist_ds_test = ds.MnistDataset(os.path.join(datasets_dir, "MNIST_Data/test"))

# Create a dict iterator over the training set and extract the images and labels
items_train = mnist_ds_train.create_dict_iterator(output_numpy=True)
train_data = list(items_train)
images_train = np.array([i["image"] for i in train_data])
labels_train = np.array([i["label"] for i in train_data])

# Create a dict iterator over the test set and extract the images and labels
items_test = mnist_ds_test.create_dict_iterator(output_numpy=True)
test_data = list(items_test)
images_test = np.array([i["image"] for i in test_data])
labels_test = np.array([i["label"] for i in test_data])

print("Training set size:")  # 60,000 training samples
print("Image: {}, Label: {}".format(images_train.shape, labels_train.shape))
print("Test set size:")  # 10,000 test samples
print("Image: {}, Label: {}".format(images_test.shape, labels_test.shape))

Training set size:

Image: (60000, 28, 28, 1), Label: (60000,)

Test set size:

Image: (10000, 28, 28, 1), Label: (10000,)

3. View some images and labels

from PIL import Image

batch_img = np.squeeze(images_train[0])
print("Image size: ", batch_img.shape)
print("Image label: ", labels_train[0])
Image.fromarray(batch_img)  # Convert to a PIL image for display

Image size: (28, 28)

Image label: 9

batch_img = np.squeeze(images_train[1])
print("Image size: ", batch_img.shape)
print("Image label: ", labels_train[1])
Image.fromarray(batch_img)  # Convert to a PIL image for display

Image size: (28, 28)

Image label: 4
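The image shape changes from (28, 28, 1) in the dataset to (28, 28) here because np.squeeze drops axes of length 1. A minimal sketch on a synthetic array (a hand-drawn stroke, not real MNIST data) shows the effect:

```python
import numpy as np

# Synthetic stand-in for one MNIST sample: height x width x 1 channel.
sample = np.zeros((28, 28, 1), dtype=np.uint8)
sample[6:22, 13:15, 0] = 255          # draw a crude vertical stroke

squeezed = np.squeeze(sample)         # removes the trailing length-1 axis
print(sample.shape, "->", squeezed.shape)
```

The pixel values are untouched; only the redundant channel axis is removed, which leaves a 2D array suitable for display as a grayscale image.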

Binary classification of handwritten digits with an image-analysis method

(This part recognizes handwritten digits with traditional software programming, by analyzing statistical characteristics of the image, rather than with an AI method.)

The handwritten digit recognition task is to take each 28*28 image and decide which of the digits 0-9 it shows, so it is a 10-class classification task.

A common approach in scientific research is to first make some assumptions or simplifications, solve the resulting simpler problem, and, once that is solved well, relax the assumptions and tackle more realistic and more complex problems. Following this approach, first assume that the task only needs to recognize two digits, 0 and 1. Solve this simple binary classification problem first, and then move on to the 10-class problem.

There are many ways to classify handwritten 0s and 1s. We start with a non-machine-learning method: a traditional program based on image analysis.

1. Prepare a data set of handwritten digits 0 and 1

The full MNIST dataset contains images of all digits from 0 to 9, but the simplified problem studied here is the binary classification of 0 and 1. So first select all images of handwritten digits 0 and 1 from the full dataset, keeping the training set and the test set separate.

import os
import numpy as np
import mindspore.dataset as ds

datasets_dir = '../datasets'
if not os.path.exists(datasets_dir):
    os.makedirs(datasets_dir)
    
import moxing as mox
if not os.path.exists(os.path.join(datasets_dir, 'MNIST_Data.zip')):
    mox.file.copy('obs://modelarts-labs-bj4-v2/course/hwc_edu/python_module_framework/datasets/mindspore_data/MNIST_Data.zip', 
                  os.path.join(datasets_dir, 'MNIST_Data.zip'))
    os.system('cd %s; unzip MNIST_Data.zip' % (datasets_dir))

# Read the full training and test sets
mnist_ds_train = ds.MnistDataset(os.path.join(datasets_dir, "MNIST_Data/train"))
mnist_ds_test = ds.MnistDataset(os.path.join(datasets_dir, "MNIST_Data/test"))

# Create a dict iterator over the training set and extract the images and labels
items_train = mnist_ds_train.create_dict_iterator(output_numpy=True)
train_data = list(items_train)
images_train = np.array([i["image"] for i in train_data])
labels_train = np.array([i["label"] for i in train_data])

# Create a dict iterator over the test set and extract the images and labels
items_test = mnist_ds_test.create_dict_iterator(output_numpy=True)
test_data = list(items_test)
images_test = np.array([i["image"] for i in test_data])
labels_test = np.array([i["label"] for i in test_data])

Extract the data for digits 0 and 1

# Select the images labeled 0 and 1 with boolean masks
train_zeros = images_train[labels_train == 0]
train_ones = images_train[labels_train == 1]

test_zeros = images_test[labels_test == 0]
test_ones = images_test[labels_test == 1]

print(f'Digit 0, training set size: {len(train_zeros)}, test set size: {len(test_zeros)}')
print(f'Digit 1, training set size: {len(train_ones)}, test set size: {len(test_ones)}')

Digit 0, training set size: 5923, test set size: 980

Digit 1, training set size: 6742, test set size: 1135
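The selection above relies on NumPy boolean-mask indexing: comparing the label array with a digit yields one True/False per sample, and indexing the image array with that mask keeps only the matching samples. A minimal sketch on made-up data (six tiny 2*2 "images", not MNIST) illustrates the mechanism:

```python
import numpy as np

# Synthetic stand-ins: six 2x2 "images" and their labels.
images = np.arange(6 * 2 * 2).reshape(6, 2, 2)
labels = np.array([0, 1, 2, 0, 1, 1])

mask = labels == 0          # boolean array, one entry per sample
zeros = images[mask]        # keeps only the samples where mask is True
ones = images[labels == 1]  # same idea, written in one step

print(mask)
print(zeros.shape, ones.shape)
```

The mask is applied along the first axis only, so each selected sample keeps its full image shape.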

2. Perform sample analysis

View an overview of the samples

# View 30 images of the digit 0
from PIL import Image

batch_zeros = np.squeeze(train_zeros[:30])
Image.fromarray(np.hstack(batch_zeros))

# View 30 images of the digit 1
batch_ones = np.squeeze(train_ones[:30])
Image.fromarray(np.hstack(batch_ones))
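np.hstack is what turns the batch of 30 images into one wide strip: it treats the (30, 28, 28) array as a sequence of 30 images of shape (28, 28) and joins them side by side. The sketch below checks this on synthetic tiles, where each tile is filled with its own index so the layout is easy to verify:

```python
import numpy as np

# Synthetic batch: 30 tiles of 28x28, tile i filled with the value i.
batch = np.zeros((30, 28, 28), dtype=np.uint8)
for i in range(30):
    batch[i] = i

strip = np.hstack(batch)                    # tiles placed left to right
same = np.concatenate(list(batch), axis=1)  # equivalent explicit form

print(strip.shape)
```

The result is 28 pixels high and 30 * 28 = 840 pixels wide, which is why the preview shows a single row of digits.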

View details of a single image

As mentioned above, each image in the MNIST dataset is 28*28 in size. After an image file is read with a Python module, the image can be represented as a 28*28 matrix. Let's check the specific values in this matrix.

# First install the required library (pandas styling depends on jinja2)
!pip install jinja2

# View the pixel values of an image
import pandas as pd

single_image = np.squeeze(images_train[0])
df = pd.DataFrame(single_image)
df.style.set_properties(**{'font-size':'6pt'}).background_gradient('Greys')

▲ Pixel matrix of an image of the digit 4

In the original handwritten digit images, the background is black, with a pixel value of 0. The digit strokes are white, with pixel values up to 255. For ease of display, the colors in the figure above are inverted, so the background appears white and the strokes dark.

One pattern stands out: each value in the matrix represents one pixel of the image. Pixels without a stroke are 0, and pixels with a stroke are non-zero. Common sense also suggests that, in images of the same size, the strokes of the digit 0 generally cover a larger area than the strokes of the digit 1.
This leads to an idea: can the digits 0 and 1 be distinguished by the proportion of non-zero pixels in the whole image?
First compute the average proportion of non-zero pixels for digit 0 and for digit 1. Since this proportion is generally larger for 0 than for 1, it is only necessary to find a suitable threshold on the non-zero pixel proportion (represented by the variable th). If the proportion of non-zero pixels in an image is greater than th, the image is classified as 0; otherwise it is classified as 1. The following steps implement this idea with traditional programming.

3. Define the non-zero pixel proportion function

def calc_nonzero_ratio(img):
    '''Count the non-zero pixels in the matrix with np.count_nonzero and divide by the image size to get the proportion of non-zero pixels.'''
    img = np.asarray(img)
    return np.count_nonzero(img) / img.size
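To see what the function returns, it can be tried on two hand-made 5*5 arrays: a ring that loosely resembles a 0 and a vertical bar that loosely resembles a 1. These toy arrays are illustrative assumptions, not MNIST samples; the function body is copied from the definition above.

```python
import numpy as np

def calc_nonzero_ratio(img):
    """Proportion of non-zero pixels in the image."""
    img = np.asarray(img)
    return np.count_nonzero(img) / img.size

# Toy "0": a ring of stroke pixels around an empty center.
ring = np.array([
    [0, 255, 255, 255, 0],
    [0, 255,   0, 255, 0],
    [0, 255,   0, 255, 0],
    [0, 255,   0, 255, 0],
    [0, 255, 255, 255, 0],
], dtype=np.uint8)

# Toy "1": a single vertical bar.
bar = np.zeros((5, 5), dtype=np.uint8)
bar[:, 2] = 255

ring_ratio = calc_nonzero_ratio(ring)   # 12 of 25 pixels are non-zero
bar_ratio = calc_nonzero_ratio(bar)     # 5 of 25 pixels are non-zero
print(ring_ratio, bar_ratio)
```

Even on these crude shapes the ring has more than twice the stroke coverage of the bar, which is the gap the threshold method exploits.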

Compute the mean non-zero pixel proportion for digit 0

zeros_ratio = 0
for zero in train_zeros:
    zeros_ratio += calc_nonzero_ratio(zero)
zeros_ratio = zeros_ratio / len(train_zeros)
print('Mean non-zero pixel proportion of digit 0:', zeros_ratio)

Mean non-zero pixel proportion of digit 0: 0.24486587223104644

Compute the mean non-zero pixel proportion for digit 1

ones_ratio = 0
for one in train_ones:
    ones_ratio += calc_nonzero_ratio(one)
ones_ratio = ones_ratio / len(train_ones)
print('Mean non-zero pixel proportion of digit 1:', ones_ratio)

Mean non-zero pixel proportion of digit 1: 0.10949749968216267
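The two loops above can also be written without an explicit Python loop. The sketch below is an alternative, not the tutorial's own code: it uses np.count_nonzero with an axis argument on a synthetic batch (random images with roughly 25% non-zero pixels, standing in for train_zeros) and checks that it matches the loop-based mean.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for train_zeros: 100 images of shape (28, 28, 1).
fake_images = (rng.random((100, 28, 28, 1)) < 0.25).astype(np.uint8) * 255

# Loop version, mirroring the tutorial's approach.
loop_mean = sum(np.count_nonzero(img) / img.size for img in fake_images) / len(fake_images)

# Vectorized version: count non-zero pixels per image over all image axes.
per_image = np.count_nonzero(fake_images, axis=(1, 2, 3)) / fake_images[0].size
vec_mean = per_image.mean()

print(np.isclose(loop_mean, vec_mean))
```

The vectorized form also keeps the per-image proportions in per_image, which is convenient for looking at how the two digit classes are distributed rather than only at their means.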

4. Set the pixel proportion classification threshold

First use a simple strategy to set the classification threshold: take the average of the mean non-zero pixel proportions of digit 0 and digit 1, rounded to 4 decimal places.

th = round((zeros_ratio + ones_ratio) / 2, 4)
print('Classification threshold:', th)

Classification threshold: 0.1772
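The threshold can be reproduced by hand from the two means printed earlier. The sketch below only repeats that arithmetic with the printed values typed in as constants:

```python
# Mean non-zero pixel proportions as printed for the training set.
zeros_ratio = 0.24486587223104644
ones_ratio = 0.10949749968216267

midpoint = (zeros_ratio + ones_ratio) / 2   # halfway between the two class means
th = round(midpoint, 4)

print(midpoint, th)
```

The midpoint is about 0.17718, which rounds to 0.1772. Choosing the midpoint treats both classes symmetrically; it is a simple heuristic and not necessarily the threshold that maximizes accuracy.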

5. Define the classification prediction function

The classification rule is very simple: if the proportion of non-zero pixels in an image is greater than th, the image is classified as 0; otherwise it is classified as 1.

def predict(img):
    if calc_nonzero_ratio(img) > th:
        pred_label = 0
    else:
        pred_label = 1
    return pred_label
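The same rule can be applied to a whole batch at once. The function below, predict_batch, is a hypothetical helper introduced only for this sketch (it is not part of the tutorial); it takes the threshold as an argument instead of reading the global th, and is tried on two made-up 4*4 images.

```python
import numpy as np

def predict_batch(images, th):
    """Hypothetical vectorized variant of predict: 0 where ratio > th, else 1."""
    images = np.asarray(images)
    flat = images.reshape(len(images), -1)             # one row per image
    ratios = np.count_nonzero(flat, axis=1) / flat.shape[1]
    return np.where(ratios > th, 0, 1)

# Two toy images: one fully covered, one with a single stroke pixel.
dense = np.ones((4, 4), dtype=np.uint8)                # ratio 16/16
sparse = np.zeros((4, 4), dtype=np.uint8)
sparse[0, 0] = 255                                     # ratio 1/16

preds = predict_batch(np.stack([dense, sparse]), th=0.1772)
print(preds)
```

Passing th explicitly makes it easy to try other thresholds without redefining the function.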

6. Compute the accuracy

Predict the test samples of digit 0 and compute the accuracy

zero_right_count = 0
for zero in test_zeros:
    pred_result = predict(zero)
    if pred_result == 0:
        zero_right_count += 1
print('Digit 0 test sample accuracy: %.4f' % (float(zero_right_count) / len(test_zeros)))

Digit 0 test sample accuracy: 0.9571

Predict the test samples of digit 1 and compute the accuracy

one_right_count = 0
for one in test_ones:
    pred_result = predict(one)
    if pred_result == 1:
        one_right_count += 1
print('Digit 1 test sample accuracy: %.4f' % (float(one_right_count) / len(test_ones)))

Digit 1 test sample accuracy: 0.9762

Compute the overall accuracy

print('Overall test sample accuracy: %.4f' % (float(zero_right_count + one_right_count) / (len(test_zeros) + len(test_ones))))

Overall test sample accuracy: 0.9674

As shown above, even the very simple strategy of "computing the proportion of non-zero pixels and comparing it with a threshold" can classify handwritten digits 0 and 1. The classification accuracy for digits 0 and 1 is 95.71% and 97.62% respectively, and the overall accuracy reaches 96.74%.
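The overall figure is a sample-weighted average of the two per-class accuracies, not their simple mean. The sketch below checks this from the printed numbers. The correct-prediction counts are not printed in the tutorial, so they are inferred here by rounding accuracy times test-set size, which is an assumption.

```python
# Test-set sizes and per-class accuracies as printed in the tutorial.
n_zeros, n_ones = 980, 1135
acc_zeros, acc_ones = 0.9571, 0.9762

# Inferred counts of correct predictions (not printed in the tutorial).
right_zeros = round(acc_zeros * n_zeros)
right_ones = round(acc_ones * n_ones)

overall = (right_zeros + right_ones) / (n_zeros + n_ones)
simple_mean = (acc_zeros + acc_ones) / 2

print('%.4f' % overall, '%.4f' % simple_mean)
```

Because there are more test images of 1 than of 0, the overall accuracy sits slightly closer to the accuracy for digit 1 than the unweighted mean does.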