1. Theoretical basis
Handwritten digit recognition is an important problem in computer vision: the goal is to automatically identify the digit shown in a handwritten digit image. Two common approaches are the Bayesian recognizer and the linear recognizer. This article introduces both in detail, covering the mathematical formulation and the algorithm implementation of each.
Handwritten Digit Recognition Based on a Bayesian Recognizer
Mathematical formula
The basic idea of the Bayesian recognizer is to compute the posterior probability of each digit using Bayes' theorem and to select the digit with the highest posterior probability as the recognition result. Specifically, for a handwritten digit image $x$, the posterior probability that it belongs to digit $i$ is:
$$
p(i|x)=\frac{p(x|i)p(i)}{p(x)}
$$
Here, $p(x|i)$ is the probability that digit $i$ generates the image $x$ (the class-conditional likelihood), $p(i)$ is the prior probability of digit $i$, and $p(x)$ is the probability of observing the image $x$. Since $p(x)$ is the same for all digits, it does not affect which digit has the highest posterior and can be omitted, giving:
$$
p(i|x)\propto p(x|i)p(i)
$$
Here, $\propto$ means "proportional to".
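To make the role of $p(x)$ concrete, the following minimal sketch compares the decision made from the unnormalized product $p(x|i)p(i)$ with the decision made from the normalized posterior. The likelihood and prior values are made up for illustration and do not come from the program in this article.

```matlab
% Hypothetical likelihoods p(x|i) and priors p(i) for digits 0..9 (made-up values).
likelihood = [0.02 0.10 0.01 0.30 0.05 0.04 0.03 0.20 0.15 0.10];
prior      = 0.1 * ones(1, 10);

score     = likelihood .* prior;     % unnormalized posterior p(x|i)p(i)
posterior = score / sum(score);      % divide by p(x) = sum over i of p(x|i)p(i)

[~, kScore] = max(score);
[~, kPost]  = max(posterior);
digitFromScore = kScore - 1;         % MATLAB indices start at 1, digits at 0
digitFromPost  = kPost - 1;
fprintf('argmax of score: %d, argmax of posterior: %d\n', digitFromScore, digitFromPost);
```

Both decisions agree because dividing every score by the same positive constant does not change which one is largest.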
$p(x|i)$ can be estimated from the samples of digit $i$ in the training dataset. It is usually modeled as a multivariate Gaussian distribution:
$$
p(x|i)=\frac{1}{(2\pi)^{d/2}|\Sigma_i|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i)\right)
$$
Here, $d$ is the dimension of the image (or feature) vector $x$, $\mu_i$ is the mean vector of digit $i$, and $\Sigma_i$ is the covariance matrix of digit $i$.
$p(i)$ can also be estimated from the training dataset as $p(i)=n_i/N$, where $n_i$ is the number of samples of digit $i$ and $N$ is the total number of samples.
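Because the product $p(x|i)p(i)$ can underflow when $d$ is large, implementations commonly compare logarithms instead. Taking the logarithm of the Gaussian density and the prior, and dropping the term $-\frac{d}{2}\ln(2\pi)$ that is identical for all digits, gives the discriminant below. This is a standard derivation added for illustration; the program in this article does not show which form it uses.
$$
g_i(x)=-\frac{1}{2}(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i)-\frac{1}{2}\ln|\Sigma_i|+\ln p(i)
$$
The digit with the largest $g_i(x)$ is the same digit that maximizes $p(x|i)p(i)$, since the logarithm is monotonically increasing.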
Algorithm implementation
The following is the specific implementation process of the handwritten digit recognition algorithm based on the Bayesian recognizer:
Preprocessing
First, the input handwritten digit image is preprocessed: grayscale conversion, binarization, morphological processing, and feature extraction. These operations can be implemented with common image processing algorithms.
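As one possible illustration of the feature extraction step, the sketch below divides a binary image into a 5x5 grid and uses the fraction of ink pixels in each cell as a feature, which yields 25 values. The grid layout, the synthetic image, and all variable names are assumptions; the article's own `GetFeature` function is not shown and may work differently.

```matlab
% Synthetic 100x100 binary image: 1 = ink, 0 = background (stand-in for a real drawing).
img = zeros(100, 100);
img(20:80, 48:52) = 1;               % a vertical stroke, roughly a "1"

gridSize = 5;                        % 5x5 grid -> 25 features (assumed layout)
[h, w]   = size(img);
rowEdges = round(linspace(0, h, gridSize + 1));
colEdges = round(linspace(0, w, gridSize + 1));

feature = zeros(gridSize * gridSize, 1);
for r = 1:gridSize
    for c = 1:gridSize
        block = img(rowEdges(r)+1:rowEdges(r+1), colEdges(c)+1:colEdges(c+1));
        feature((r-1)*gridSize + c) = mean(block(:));   % fraction of ink pixels in the cell
    end
end
disp(reshape(feature, gridSize, gridSize)');            % show the features as a 5x5 grid
```

Normalizing each cell by its area keeps the features in the range 0 to 1 regardless of image size.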
Train the Model
Next, the model is trained on the training dataset. Training consists of the following steps:
(1) For each digit $i$ in the training dataset, compute its mean vector $\mu_i$ and covariance matrix $\Sigma_i$.
(2) Compute the prior probability $p(i)$ of each digit $i$.
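The two training steps can be sketched as follows on a tiny made-up dataset with two classes and two features. The data, the variable names, and the small ridge term `lambda` added to each covariance matrix (to keep it invertible when samples are scarce) are assumptions for illustration.

```matlab
% Toy training set: one sample per row, labels are digits (made-up data).
X      = [1 2; 2 1; 3 3; 2 2;   6 5; 7 7; 8 6; 7 6];
labels = [0 0 0 0 1 1 1 1]';

classes = unique(labels);
K = numel(classes);
d = size(X, 2);
mu     = zeros(K, d);
Sigma  = zeros(d, d, K);
prior  = zeros(K, 1);
lambda = 1e-3;                       % assumed regularization strength

for k = 1:K
    Xk = X(labels == classes(k), :);
    mu(k, :)       = mean(Xk, 1);                   % step (1): mean vector
    Sigma(:, :, k) = cov(Xk) + lambda * eye(d);     % step (1): covariance (cov divides by n-1)
    prior(k)       = size(Xk, 1) / size(X, 1);      % step (2): p(i) = n_i / N
end
disp(mu); disp(prior');
```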
Test recognition
Each handwritten digit image in the test dataset is recognized with the trained model as follows:
(1) Preprocess the input handwritten digit image.
(2) For each digit $i$, compute $p(x|i)p(i)$ and select the digit with the largest value as the recognition result.
(3) Output the recognition result.
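The recognition steps above can be sketched as follows. To avoid numerical underflow, the sketch compares $\ln p(x|i)+\ln p(i)$ rather than the product itself, and drops the constant $-\frac{d}{2}\ln(2\pi)$ shared by all classes. The parameter values are made up and the variable names are hypothetical.

```matlab
% Hypothetical trained parameters for two classes in 2-D (made-up values).
mu      = [2 2; 7 6];
Sigma   = cat(3, [1 0; 0 1], [1 0.5; 0.5 1]);
prior   = [0.5; 0.5];
classes = [0; 1];

x = [6.5 5.5];                       % feature vector of the image to recognize
K = size(mu, 1);
g = zeros(K, 1);
for k = 1:K
    delta = x - mu(k, :);
    R = chol(Sigma(:, :, k));        % Sigma = R'*R with R upper triangular
    z = delta / R;                   % z*z' equals delta*inv(Sigma)*delta'
    % log p(x|i) + log p(i), up to a constant common to all classes
    g(k) = -0.5 * (z * z') - sum(log(diag(R))) + log(prior(k));
end
[~, best] = max(g);
predicted = classes(best);
fprintf('Recognized digit: %d\n', predicted);
```

Using the Cholesky factor avoids forming the inverse and the determinant of the covariance matrix explicitly.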
Handwritten Digit Recognition Based on a Linear Recognizer
Mathematical formula
The basic idea of the linear recognizer is to classify handwritten digit images with a linear classifier. Specifically, for a handwritten digit image $x$, the probability that it is classified as digit $i$ can be expressed with the softmax function as:
$$
p(i|x)=\frac{\exp(w_i^Tx+b_i)}{\sum_{j=0}^9\exp(w_j^Tx+b_j)}
$$
Here, $w_i$ is the weight vector of digit $i$ and $b_i$ is the bias term of digit $i$.
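The formula can be evaluated as in the minimal sketch below. The weights, biases, and input are made-up values, and subtracting the maximum score before exponentiating is a common numerical-stability step that leaves the probabilities unchanged.

```matlab
% Hypothetical parameters: 10 digits, 25-dimensional features (made-up values).
d = 25; K = 10;
W = 0.01 * reshape(1:d*K, d, K);     % column j+1 holds the weight vector w_j of digit j
b = zeros(K, 1);
x = ones(d, 1);

z = W' * x + b;                      % scores w_i'*x + b_i for i = 0..9
z = z - max(z);                      % shift for numerical stability
p = exp(z) / sum(exp(z));            % softmax probabilities p(i|x)

[~, k] = max(p);
digit = k - 1;                       % MATLAB indices start at 1, digits at 0
fprintf('Most probable digit: %d\n', digit);
```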
Algorithm implementation
The following is the specific implementation process of the handwritten digit recognition algorithm based on the linear recognizer:
Preprocessing
As with the Bayesian recognizer, the input handwritten digit image is first preprocessed: grayscale conversion, binarization, morphological processing, and feature extraction. These operations can be implemented with common image processing algorithms.
Train the Model
Next, the model is trained on the training dataset. Training consists of the following step:
(1) Learn the weight vector $w_i$ and bias term $b_i$ of each digit $i$ from the training dataset. An optimization algorithm such as gradient descent is usually used to minimize a loss function, typically the cross-entropy loss for the softmax model above. For the details, refer to machine learning textbooks and papers.
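As a rough illustration of this training step, the sketch below fits a softmax classifier with batch gradient descent on the mean cross-entropy loss. The toy data (three well-separated classes in 2-D), the learning rate `eta`, and the iteration count are assumptions chosen for illustration, not values from the article.

```matlab
% Toy data: one sample per row; y holds class indices 1..K (digit = index - 1).
X = [0 0; 0 1; 1 0;   5 5; 5 6; 6 5;   0 5; 0 6; 1 6];
y = [1 1 1 2 2 2 3 3 3]';
[N, d] = size(X);
K = 3;
Y = zeros(N, K);
Y(sub2ind([N K], (1:N)', y)) = 1;    % one-hot targets

W = zeros(d, K);                     % column i is the weight vector of class i
b = zeros(1, K);
eta = 0.05;                          % learning rate (assumed)
for it = 1:5000
    Z = X * W + repmat(b, N, 1);             % scores for every sample and class
    Z = Z - repmat(max(Z, [], 2), 1, K);     % stabilize the exponentials
    P = exp(Z);
    P = P ./ repmat(sum(P, 2), 1, K);        % softmax probabilities
    G = (P - Y) / N;                         % gradient of the mean cross-entropy w.r.t. Z
    W = W - eta * (X' * G);
    b = b - eta * sum(G, 1);
end
[~, pred] = max(X * W + repmat(b, N, 1), [], 2);
accuracy = mean(pred == y);
fprintf('Training accuracy: %.2f\n', accuracy);
```

Prediction for a new image then reduces to taking the class with the largest score, which is also the class with the largest $p(i|x)$.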
Test recognition
Each handwritten digit image in the test dataset is recognized with the trained model as follows:
(1) Preprocess the input handwritten digit image.
(2) For each digit $i$, compute $p(i|x)$ and select the digit with the largest value as the recognition result.
(3) Output the recognition result.
Both the Bayesian recognizer and the linear recognizer are common handwritten digit recognition algorithms. The Bayesian recognizer computes the posterior probability with Bayes' theorem; its modeling complexity is higher, since a mean vector and a full covariance matrix must be estimated for every digit, but it can achieve better recognition performance when the Gaussian assumption fits the data. The linear recognizer classifies images with a linear classifier; its modeling complexity is low, but its recognition performance may suffer when the classes are not close to linearly separable. The appropriate algorithm should be chosen according to the actual application scenario.
2. Core program
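% Handwriting pad: mouse-move handler that draws the stroke while the button is held.
% (The enclosing function header is not part of this excerpt.)
global flag
global pos0
global x0 y0
pos=get(handles.WritingAxes,'currentpoint');
x=pos(1,1);
y=pos(1,2);
% Draw only while the button is down and the pointer is inside the 100x100 axes.
if flag && x>=0 && x<100 && y>=0 && y<100
    line(x,y, 'marker', '.','markerSize',18, 'LineStyle','-','LineWidth',2,'Color','Black');
    % Step from the previous point toward the new point in increments of 0.1.
    if x>x0
        stepX=0.1;
    else
        stepX=-0.1;
    end
    if y>y0
        stepY=0.1;
    else
        stepY=-0.1;
    end
    if abs(x-x0)<0.01
        % Nearly vertical move: interpolate along y and hold x fixed,
        % so that X and Y always have the same length.
        Y=y0:stepY:y;
        X=x0*ones(size(Y));
    else
        % Otherwise interpolate linearly between (x0,y0) and (x,y).
        X=x0:stepX:x;
        Y=(y-y0)*(X-x0)/(x-x0)+y0;
    end
    line(X ,Y, 'marker', '.','markerSize',18, 'LineStyle','-','LineWidth',2,'Color','Black');
    % Remember the current point as the start of the next segment.
    x0=x;
    y0=y;
    pos0=pos;
else
    flag=0;
end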
%-------------------------------------------------------------------------
%-------------------------------------------------------------------------
function figure1_WindowButtonUpFcn(hObject, eventdata, handles)
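%clc
% Handwriting pad: releasing the left mouse button ends the stroke
global flag
flag=0;
%global data
data=[];
% Capture the axes contents and save them as a bitmap
% (the file name means "current handwritten digit").
Img=getframe(handles.WritingAxes);
imwrite(Img.cdata,'当前手写数字.bmp','bmp');
% Convert the saved image to grayscale, then to a binary image, and save it again.
I=imread('当前手写数字.bmp');
I=rgb2gray(I);
I=im2bw(I); % imbinarize is the recommended replacement in newer MATLAB releases
imwrite(I,'当前手写数字.bmp','bmp');
I=imread('当前手写数字.bmp');
% Extract the feature vector of the current digit.
data=GetFeature(I);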
%--------------------------------------------------------------------------
% --- Executes on selection change in popupmenuNUM.
function popupmenuNUM_Callback(hObject, eventdata, handles)
%-------------------------------------------------------------------------
% --- Executes during object creation, after setting all properties.
function popupmenuNUM_CreateFcn(hObject, eventdata, handles)
if ispc
set(hObject,'BackgroundColor','white');
else
set(hObject,'BackgroundColor',get(0,'defaultUicontrolBackgroundColor'));
end
%-------------------------------------------------------------------------
%-------------------------------------------------------------------------
function pushbuttonSave_Callback(hObject, eventdata, handles)
%global data
I=imread('当前手写数字.bmp');
data=GetFeature(I);
load template pattern;
% Popup entry 1 is the placeholder; entries 2..11 map to pattern(1,1)..pattern(1,10),
% that is, to the digits 0..9.
num=get(handles.popupmenuNUM,'value');
if num==1
    msgbox('Please select a digit class before saving','Notice');
elseif num>=2 && num<=11
    k=num-1;
    % Append the feature vector as a new column of the selected class.
    pattern(1,k).num=pattern(1,k).num+1;
    pattern(1,k).feature(:,pattern(1,k).num)=data;
end
save template pattern;
%--------------------------------------------------------------------------
%-------------------------------------------------------------------------
function pushbuttonFeature_Callback(hObject, eventdata, handles)
%global data
I=imread('当前手写数字.bmp');
data=GetFeature(I);
data=data';
fprintf('Features of the current handwritten digit:\n')
% Binarize the 25 features at a threshold of 0.1 and print them as a 5x5 grid.
for i=1:25
    if data(i)>0.1
        data(i)=1;
    else
        data(i)=0;
    end
    fprintf('%d ',data(i));
    if mod(i,5)==0
        fprintf('\n');
    end
end
%-------------------------------------------------------------------------
%--------------------------------------------------------------------------
function pushbuttonNUM_Callback(hObject, eventdata, handles)
load template;
% pattern(1,k) holds the samples of digit k-1; print the sample count of each digit.
for k=1:10
    disp(['Number of samples for digit ',num2str(k-1),': ',num2str(pattern(1,k).num)])
end
%-------------------------------------------------------------------------
3. Simulation conclusion