Forms of machine learning
I won't explain at length what machine learning is; below is a brief overview of its main categories:
- Supervised learning
  Labels (the right or wrong answers) are provided during the learning process. This type of learning is mainly used for regression and classification.
- Unsupervised learning
  A set of samples with unknown labels is provided, and the algorithm groups these samples by itself. This is commonly known as clustering and is used in pattern recognition.
- Semi-supervised learning
  A combination of supervised learning and unsupervised learning.
- Reinforcement learning (AlphaGo, AlphaZero)
  An intelligent system learns a mapping from the environment to actions so as to maximize a reward signal.
- Transfer learning
The learning methods described above can, in a sense, all be seen as approaches built from the point of view of mathematical statistics.
- Deep learning
  Deep learning, on the other hand, comes from the direction of biological simulation. It has a touch of bionics: it simulates human neurons.
A simple mind map of these categories:
[Figure: mind map of the machine learning categories]
Building a KNN from scratch
KNN stands for the k-nearest-neighbor algorithm. The algorithm works somewhat by analogy: given a new item that needs to be classified, I compare it with items that have already been classified, and whichever one it resembles most determines what it is.
The following is a concrete implementation:
from tensorflow.keras import datasets
import tensorflow as tf
import math
import numpy as np

# Although tensorflow is imported, its neural-network framework is not actually used;
# it only serves to fetch and lightly preprocess the dataset.
# The code follows a four-step outline: prepare data -> build network -> train network -> test network

# Prepare the data
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()
# The training images are stored in a uint8 array of shape (60000, 28, 28) with values in [0, 255].
# We convert it to a float32 array of shape (60000, 28 * 28) with values in the range 0~1.
train_images = train_images.reshape(60000, 28*28).astype('float32') / 255
test_images = test_images.reshape(10000, 28*28).astype('float32') / 255
# Encode the labels categorically (one-hot encoding)
train_labels = tf.keras.utils.to_categorical(train_labels)
test_labels = tf.keras.utils.to_categorical(test_labels)

# Build and test the KNN (strictly speaking, recognizing handwriting with KNN does not
# build a network; the remaining three steps are all contained below).
# The following measures the accuracy of this simplest KNN algorithm using the MNIST
# training and test sets; my test result was about 90%.
def knn_test(test_sum, train_sum):
    print("Testing the accuracy of the KNN algorithm")
    accuracy = 0
    for i in range(test_sum):
        test_data = test_images[i]
        # Scan the training samples and remember the closest one
        for j in range(train_sum):
            train_data = train_images[j]
            dist = get_dist(train_data, test_data)
            if j == 0:
                min_dist = dist
                min_index = j
            else:
                if dist < min_dist:
                    min_dist = dist
                    min_index = j
        # Decode the one-hot labels back to digits
        predict = np.argmax(train_labels[min_index])
        real_data = np.argmax(test_labels[i])
        if predict == real_data:
            accuracy += 1/test_sum
        print("Predicted:", predict, "Actual:", real_data)
    print("Accuracy:", accuracy)

# The "distance" function (Euclidean distance over the 784 pixels)
def get_dist(train_data, test_data):
    data_pow = 0.
    for k in range(784):
        data_pow += math.pow(train_data[k]-test_data[k], 2)
    data_pow = math.sqrt(data_pow)
    return data_pow

# Run the test
test_sum = 20
train_sum = 1000
knn_test(test_sum, train_sum)
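The implementation above keeps only the single closest training image, which corresponds to k = 1. As a rough sketch of the general case, the following pure-Python example takes the k closest samples and lets them vote by majority. The function names `knn_predict` and `euclidean` and the toy 2-D points are made up for illustration and are not part of the original code.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(query, samples, labels, k=3):
    """Return the majority label among the k samples closest to query."""
    # Distance from the query to every labelled sample
    distances = [euclidean(query, s) for s in samples]
    # Indices of the k smallest distances, nearest first
    nearest = sorted(range(len(samples)), key=lambda i: distances[i])[:k]
    # Majority vote among the labels of those neighbours
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two small clusters labelled "a" and "b"
samples = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict((0.15, 0.15), samples, labels, k=3))
print(knn_predict((0.95, 1.0), samples, labels, k=3))
```

With k = 1 this reduces to the nearest-neighbor rule used in the MNIST code; a larger k tends to make the prediction less sensitive to a single odd training sample.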
Optimization
The loops used to compute the "distance" can actually be optimized away; the following single line of code is enough:
min_index = np.argmin(np.sqrt(np.sum(np.square(test_data-train_data), axis=1)))
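To see what that line does, here is a small sketch on made-up data (three "training" vectors of four features instead of 784 pixels). It relies on NumPy broadcasting: subtracting a `(n, d)` array from a `(d,)` vector yields one difference row per training sample. The toy values are assumptions for illustration only.

```python
import math
import numpy as np

# Hypothetical toy data: 3 training vectors and 1 test vector, 4 features each
train_data = np.array([[0.0, 0.0, 0.0, 0.0],
                       [1.0, 1.0, 1.0, 1.0],
                       [0.5, 0.5, 0.0, 0.0]], dtype="float32")
test_data = np.array([0.9, 1.0, 1.0, 0.8], dtype="float32")

# Broadcasting: (4,) - (3, 4) gives a (3, 4) array of per-feature differences
diff = test_data - train_data
# Summing the squares along axis=1 gives one distance per training vector
dists = np.sqrt(np.sum(np.square(diff), axis=1))
min_index = int(np.argmin(dists))

# The same search written as an explicit loop, for comparison
loop_dists = []
for row in train_data:
    total = 0.0
    for t, q in zip(row, test_data):
        total += (float(t) - float(q)) ** 2
    loop_dists.append(math.sqrt(total))
loop_index = loop_dists.index(min(loop_dists))

# sqrt is monotonic, so dropping it does not change which index is smallest
min_index_no_sqrt = int(np.argmin(np.sum(np.square(diff), axis=1)))

print(dists.shape, min_index, loop_index, min_index_no_sqrt)
```

Since only the position of the minimum matters, the `np.sqrt` call could be dropped without changing the prediction.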
The complete optimized code:
from tensorflow.keras import datasets
import tensorflow as tf
import numpy as np

# Although tensorflow is imported, its neural-network framework is not actually used;
# it only serves to fetch and lightly preprocess the dataset.
# The code follows a four-step outline: prepare data -> build network -> train network -> test network

# Prepare the data
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()
# The training images are stored in a uint8 array of shape (60000, 28, 28) with values in [0, 255].
# We convert it to a float32 array of shape (60000, 28 * 28) with values in the range 0~1.
train_images = train_images.reshape(60000, 28*28).astype('float32') / 255
test_images = test_images.reshape(10000, 28*28).astype('float32') / 255
# Encode the labels categorically (one-hot encoding)
train_labels = tf.keras.utils.to_categorical(train_labels)
test_labels = tf.keras.utils.to_categorical(test_labels)

# Build and test the KNN (strictly speaking, recognizing handwriting with KNN does not
# build a network; the remaining three steps are all contained below).
# The following measures the accuracy of this simplest KNN algorithm using the MNIST
# training and test sets; my test result was about 80%.
def knn_test(test_sum, train_sum):
    print("Testing the accuracy of the KNN algorithm")
    accuracy = 0
    for i in range(test_sum):
        test_data = test_images[i]
        train_data = train_images[0:train_sum, :]
        # The optimization: distances to all training samples in one vectorized step
        min_index = np.argmin(np.sqrt(np.sum(np.square(test_data-train_data), axis=1)))
        # Decode the one-hot labels back to digits
        predict = np.argmax(train_labels[min_index])
        real_data = np.argmax(test_labels[i])
        if predict == real_data:
            accuracy += 1/test_sum
        print("Predicted:", predict, "Actual:", real_data)
    print("Accuracy:", accuracy)

# Run the test
test_sum = 200
train_sum = 50000
knn_test(test_sum, train_sum)
Handwriting recognition demo
So far we have only run a small test with the MNIST training and test sets. What if we want the code to recognize a digit we write ourselves? In fact, the code only needs a slight change.
But first we have to draw a digit ourselves. You can draw one in a paint program, and the resulting image must be 28 * 28 pixels, because the code above was written for pictures of that size. You could of course change the code instead, but for simplicity I changed the picture directly.
Here is a digit I drew with the built-in Windows Paint program:
The recognition code is as follows:
from tensorflow.keras import datasets
import tensorflow as tf
import numpy as np
from PIL import Image

# Prepare the data
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()
train_images = train_images.reshape(60000, 28*28).astype('float32') / 255
train_labels = tf.keras.utils.to_categorical(train_labels)

# Code for recognizing a digit you wrote yourself
def image_reshape(image_address):  # Convert a 28*28-pixel picture into an array of 28*28 values
    # Note: the input image must be 28*28 pixels
    image = Image.open(image_address).convert('L')  # Open the image with Image.open from PIL
    image.save("test01.png")
    # .convert('L') converts the picture to grayscale: the original is a color picture,
    # i.e. its shape is (28, 28, 3), and it becomes (28, 28)
    image_arr = np.array(image)  # Convert to a numpy array
    image_arr = np.reshape(image_arr, 28 * 28).astype('float32') / 255
    # Then convert it to a float32 array of shape (784,) with values in the range 0~1.
    print(image_arr)
    return image_arr

def knn_real(image_address, train_sum):
    test_data = image_reshape(image_address)
    train_data = train_images[0:train_sum, :]
    min_index = np.argmin(np.sqrt(np.sum(np.square(test_data - train_data), axis=1)))
    predict = np.argmax(train_labels[min_index])
    print("Predicted:", predict)

train_sum = 40000
knn_real("test.png", train_sum)
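One thing worth checking before trusting the prediction: MNIST digits are light strokes on a dark background, while a drawing made in a paint program is often dark ink on a white canvas. The sketch below shows one possible way to handle that, assuming the picture has already been loaded as a 28 * 28 grayscale array. The function name `to_mnist_style` and the median-based inversion heuristic are assumptions of this example, not part of the original code.

```python
import numpy as np

def to_mnist_style(image_arr):
    """Scale a (28, 28) grayscale array to [0, 1] and make the background dark."""
    arr = np.asarray(image_arr, dtype="float32") / 255
    if arr.shape != (28, 28):
        raise ValueError(f"expected a 28x28 image, got shape {arr.shape}")
    # Heuristic (an assumption): if most pixels are bright, the background is
    # light, so invert to get a light digit on a dark background as in MNIST.
    if np.median(arr) > 0.5:
        arr = 1.0 - arr
    # Flatten to shape (784,), matching the rows of train_images
    return arr.reshape(28 * 28)

# Hypothetical stand-in for a paint-program drawing:
# a white canvas with a dark vertical stroke
canvas = np.full((28, 28), 255, dtype=np.uint8)
canvas[4:24, 13:15] = 0

vec = to_mnist_style(canvas)
print(vec.shape, vec.min(), vec.max())
```

The returned vector has the same shape and value range as the output of `image_reshape`, so it could be used as `test_data` in `knn_real`.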