Getting Started with Machine Learning in TF2.0 (KNN handwriting recognition algorithm)

Types of machine learning

I won't explain at length what machine learning is; here is just a brief breakdown of its categories:

  • Supervised learning
    Labels (the right or wrong answers) are provided during the learning process.
    This type of learning is mainly used for regression and classification.

  • Unsupervised learning
    A set of samples with unknown labels is provided, and the samples are then grouped. Also known as clustering; related to pattern recognition.

  • Semi-supervised learning
    A combination of supervised learning and unsupervised learning.

  • Reinforcement learning (AlphaGo, AlphaZero)
    An intelligent system learns a mapping from the environment to actions so as to maximize a reward signal.

  • Transfer learning

The learning methods described above can, in a sense, be seen as summarized from the point of view of mathematical statistics.

  • Deep learning
    Deep learning, on the other hand, comes from the angle of biological simulation. It has a somewhat bionic flavor: it simulates human neurons.

A simple mind map follows:
[image: Machine Learning mind map]

Building a KNN from scratch

KNN (k-nearest neighbor) is the K Nearest Neighbor algorithm. It works somewhat like reasoning by analogy: given something that needs to be classified, compare it with things that have already been classified, and assign it the class of whichever it resembles most.
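
To make the idea concrete before turning to MNIST, here is a minimal sketch on made-up 2-D points. The function name `knn_predict`, the toy data, and the choice of `k=3` with a majority vote are assumptions for illustration only; the code in this post always takes the single closest sample, which corresponds to `k=1`.

```python
import numpy as np
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    dists = np.sqrt(np.sum(np.square(train_points - query), axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Most common label among those neighbors
    votes = Counter(int(train_labels[i]) for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two small clusters of 2-D points
train_points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                         [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
train_labels = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(train_points, train_labels, np.array([0.15, 0.1])))  # 0
print(knn_predict(train_points, train_labels, np.array([0.95, 1.0])))  # 1
```

With `k=1` the vote reduces to "take the label of the closest sample", which is exactly the comparison described above.
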
The specific implementation on MNIST is as follows:

from tensorflow.keras import datasets
import tensorflow as tf
import math
import numpy as np
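# TensorFlow is imported, but its neural-network framework is not actually used here; it only loads and lightly preprocesses the dataset

# The code follows four steps: prepare data -> build network -> train network -> test network
# Prepare the data
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()
# The training images are stored in a uint8 array of shape (60000, 28, 28) with values in [0, 255].
# We convert it to a float32 array of shape (60000, 28 * 28) with values in the range 0 to 1.
train_images = train_images.reshape(60000, 28*28).astype('float32') / 255
test_images = test_images.reshape(10000, 28*28).astype('float32') / 255
# One-hot encode the labels
train_labels = tf.keras.utils.to_categorical(train_labels)
test_labels = tf.keras.utils.to_categorical(test_labels)
# Build the KNN and test it (strictly speaking, recognizing handwriting with KNN does not build a network; the remaining three steps are all contained below)
# The following uses the training and test sets of the MNIST dataset to check the accuracy of this simplest KNN algorithm; my test result was about 90%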
def knn_test(test_sum, train_sum):
    print("Testing the accuracy of the KNN algorithm")
    accuracy = 0
    for i in range(test_sum):
        test_data = test_images[i]
        # Find the training image closest to this test image (nearest neighbor, k = 1)
        for j in range(train_sum):
            train_data = train_images[j]
            dist = get_dist(train_data, test_data)
            if j == 0:
                min_dist = dist
                min_index = j
            else:
                if dist < min_dist:
                    min_dist = dist
                    min_index = j
        # Labels are one-hot encoded, so argmax recovers the digit
        predict = np.argmax(train_labels[min_index])
        real_data = np.argmax(test_labels[i])
        if predict == real_data:
            accuracy += 1/test_sum
        print("Predicted:", predict, "Actual:", real_data)
    print("Accuracy:", accuracy)
# "Distance" function: Euclidean distance between two flattened images
def get_dist(train_data, test_data):
    data_pow = 0.
    for k in range(784):
        data_pow += math.pow(train_data[k]-test_data[k], 2)
    data_pow = math.sqrt(data_pow)
    return data_pow

# Run the test
test_sum = 20
train_sum = 1000
knn_test(test_sum, train_sum)

Optimization

The loop used to compute the "distance" can actually be optimized down to a single line of code:

min_index = np.argmin(np.sqrt(np.sum(np.square(test_data-train_data), axis=1)))
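
The one-liner relies on NumPy broadcasting: subtracting a `(train_sum, 784)` array from a `(784,)` vector yields one row of differences per training image. The sketch below checks that it agrees with an explicit loop; random arrays stand in for the MNIST data, and the planted neighbor at row 17 and the helper name `loop_nearest` are assumptions for illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
train_data = rng.random((50, 784))   # stand-in for train_images[0:50, :]
test_data = rng.random(784)          # stand-in for one flattened test image
train_data[17] = test_data + 0.01    # plant an obvious nearest neighbor at row 17

def loop_nearest(train_data, test_data):
    """Nearest-neighbor search with explicit Python loops."""
    min_index, min_dist = 0, float("inf")
    for j in range(train_data.shape[0]):
        total = 0.0
        for k in range(train_data.shape[1]):
            total += math.pow(train_data[j][k] - test_data[k], 2)
        dist = math.sqrt(total)
        if dist < min_dist:
            min_index, min_dist = j, dist
    return min_index

# Broadcasting: (784,) - (50, 784) -> (50, 784), one row per training image
diff = test_data - train_data
vectorized = int(np.argmin(np.sqrt(np.sum(np.square(diff), axis=1))))
# sqrt is monotonic, so it can be dropped when only the argmin is needed
no_sqrt = int(np.argmin(np.sum(np.square(diff), axis=1)))
loop_index = loop_nearest(train_data, test_data)

print(diff.shape)                       # (50, 784)
print(loop_index, vectorized, no_sqrt)  # 17 17 17
```

Both versions perform the same arithmetic; the vectorized form simply moves the loops into NumPy's compiled code.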

The complete optimized code:

from tensorflow.keras import datasets
import tensorflow as tf
import numpy as np
# TensorFlow is imported, but its neural-network framework is not actually used here; it only loads and lightly preprocesses the dataset

# The code follows four steps: prepare data -> build network -> train network -> test network
# Prepare the data
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()
# The training images are stored in a uint8 array of shape (60000, 28, 28) with values in [0, 255].
# We convert it to a float32 array of shape (60000, 28 * 28) with values in the range 0 to 1.
train_images = train_images.reshape(60000, 28*28).astype('float32') / 255
test_images = test_images.reshape(10000, 28*28).astype('float32') / 255
# One-hot encode the labels
train_labels = tf.keras.utils.to_categorical(train_labels)
test_labels = tf.keras.utils.to_categorical(test_labels)
# Build the KNN and test it (strictly speaking, recognizing handwriting with KNN does not build a network; the remaining three steps are all contained below)
# The following uses the training and test sets of the MNIST dataset to check the accuracy of this simplest KNN algorithm; my test result was about 80%
def knn_test(test_sum, train_sum):
    print("Testing the accuracy of the KNN algorithm")
    accuracy = 0
    for i in range(test_sum):
        test_data = test_images[i]
        train_data = train_images[0:train_sum, :]
        # Optimized: distances to all training images are computed at once via broadcasting
        min_index = np.argmin(np.sqrt(np.sum(np.square(test_data-train_data), axis=1)))

        # Labels are one-hot encoded, so argmax recovers the digit
        predict = np.argmax(train_labels[min_index])
        real_data = np.argmax(test_labels[i])
        if predict == real_data:
            accuracy += 1/test_sum
        print("Predicted:", predict, "Actual:", real_data)
    print("Accuracy:", accuracy)
# Run the test
test_sum = 200
train_sum = 50000
knn_test(test_sum, train_sum)
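
The optimized version still loops over the test images one at a time. As a further sketch (not part of the original code), that loop can also be removed by computing every test-to-train squared distance in one step with the identity ||a - b||^2 = ||a||^2 - 2 a·b + ||b||^2. The function name `nearest_indices` and the random stand-in data and labels are assumptions for illustration.

```python
import numpy as np

def nearest_indices(test_batch, train_data):
    """Index of the nearest training row for every row of test_batch."""
    # Squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    test_sq = np.sum(np.square(test_batch), axis=1)[:, np.newaxis]    # shape (T, 1)
    train_sq = np.sum(np.square(train_data), axis=1)[np.newaxis, :]   # shape (1, N)
    sq_dists = test_sq - 2.0 * (test_batch @ train_data.T) + train_sq  # shape (T, N)
    return np.argmin(sq_dists, axis=1)

rng = np.random.default_rng(0)
train_data = rng.random((100, 784))            # stand-in for flattened training images
train_digits = rng.integers(0, 10, size=100)   # hypothetical integer labels
picked = np.array([3, 42, 7])
test_batch = train_data[picked] + 0.01         # slightly perturbed copies of known rows
test_digits = train_digits[picked]

nearest = nearest_indices(test_batch, train_data)
accuracy = float(np.mean(train_digits[nearest] == test_digits))
print(nearest.tolist(), accuracy)  # [3, 42, 7] 1.0
```

The trade-off is memory: the intermediate matrix has one entry per (test image, training image) pair, so for large sets it may be worth processing the test images in chunks.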

Handwriting recognition demo

The above is only a small test using the MNIST training and test sets. What if we want the code to recognize digits we write ourselves? In fact, the code only needs a slight change.
But first we need a digit image of our own. You can draw a digit in a paint program, and the resulting picture should be 28 * 28 pixels, because the earlier code was written for images of that size. You could of course change the code instead, but for simplicity I change the picture directly.
Here is a digit I drew with the built-in Windows Paint program:
[image: the hand-drawn digit]
The recognition code is as follows:

from tensorflow.keras import datasets
import tensorflow as tf
import numpy as np
from PIL import Image

# Prepare the data
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()
train_images = train_images.reshape(60000, 28*28).astype('float32') / 255
train_labels = tf.keras.utils.to_categorical(train_labels)

# Code for recognizing a digit you wrote yourself
def image_reshape(image_address):   # Convert a 28*28 pixel picture into a flat array of 784 values
    # Note: the input image must be 28*28 pixels
    image = Image.open(image_address).convert('L')  # Open the image with Image.open from PIL
    image.save("test01.png")  # Save a grayscale copy
    # .convert('L') converts the picture to grayscale: a color picture of shape (28, 28, 3) becomes (28, 28)
    image_arr = np.array(image)  # Convert to a NumPy array

    image_arr = np.reshape(image_arr, 28 * 28).astype('float32') / 255
    # Then convert it to a float32 array of shape (784,) with values in the range 0 to 1.
    print(image_arr)
    return image_arr
def knn_real(image_address,train_sum):
    test_data = image_reshape(image_address)
    train_data = train_images[0:train_sum, :]
    # Index of the training image nearest to the hand-drawn digit
    min_index = np.argmin(np.sqrt(np.sum(np.square(test_data - train_data), axis=1)))
    predict = np.argmax(train_labels[min_index])
    print("Predicted:", predict)

train_sum = 40000
knn_real("test.png", train_sum)
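
If you would rather change the code than the picture, the following sketch resizes an arbitrary drawing to 28 * 28 with PIL before flattening it. The function name `prepare_digit` and the `invert` flag are assumptions for illustration. The flag is there because MNIST digits are light strokes on a dark background, while a drawing made in a paint program is often dark on white; whether inverting helps depends on your picture.

```python
import numpy as np
from PIL import Image, ImageOps

def prepare_digit(image, invert=False):
    """Turn a PIL image of any size into a flat float32 vector of 784 values in [0, 1]."""
    image = image.convert("L")         # grayscale
    image = image.resize((28, 28))     # match the MNIST image size
    if invert:
        image = ImageOps.invert(image)  # dark-on-light -> light-on-dark
    return np.array(image).astype("float32").reshape(28 * 28) / 255

# Hypothetical drawing: a thick black vertical bar on a white 280*280 canvas
canvas = Image.new("RGB", (280, 280), "white")
canvas.paste((0, 0, 0), (100, 40, 180, 240))

vec = prepare_digit(canvas, invert=True)
print(vec.shape, vec.dtype)  # (784,) float32
```

The result has the same shape and value range as the output of `image_reshape`, so for a file on disk `prepare_digit(Image.open("test.png"), invert=True)` could be used in its place inside `knn_real`.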

Origin blog.csdn.net/weixin_44307764/article/details/102353344