Making your own MNIST-style dataset when training a model with TensorFlow (with code)
Exploration process
(P.S. This is my first post, so please forgive the rough writing!)
MNIST is the ubiquitous training set of handwritten digit images. In practice, though, we often want to train on our own data, so we need to convert our own pictures into data in the same form as MNIST, or load them some other way. (I used Keras, a third-party library, before. The library is useful and simplifies a lot of steps, but, perhaps because of my limited ability, the accuracy and loss of the models I trained were never ideal. If anyone knows why, please teach me; I have only just started and am still very much a beginner (^ △ ^).)
There are two main ways to feed data to a neural network: importing it from local files, or loading it directly.
The downloaded MNIST set consists of four compressed archives.
Extracting one of them yields an IDX3-UBYTE file, which stores its data in binary form, so as non-specialists we cannot simply open it and read it.
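For readers curious about what such a file looks like inside, here is a minimal sketch of reading it with Python's struct module. It relies on the publicly documented IDX layout: a 4-byte magic number (two zero bytes, a data-type code, the number of dimensions), then one big-endian 32-bit size per dimension, then the raw values. The helper name parse_idx and the tiny in-memory sample are made up for illustration; a real file would be read with open(..., 'rb') after extraction.

```python
import struct


def parse_idx(raw):
    """Parse IDX-formatted bytes into (dims, flat list of unsigned-byte values)."""
    # Magic number: two zero bytes, a data-type code, and the number of dimensions
    zero, dtype_code, ndim = struct.unpack('>HBB', raw[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError('this sketch only handles unsigned-byte IDX data')
    # One big-endian 32-bit size per dimension
    dims = struct.unpack('>' + 'I' * ndim, raw[4:4 + 4 * ndim])
    # Everything after the header is the raw data, one byte per value
    data = list(raw[4 + 4 * ndim:])
    return dims, data


# Hypothetical sample: 2 images of 2 x 3 pixels, laid out like an idx3-ubyte file
sample = struct.pack('>HBBIII', 0, 0x08, 3, 2, 2, 3) + bytes(range(12))
dims, pixels = parse_idx(sample)
print(dims)        # (2, 2, 3)
print(pixels[:6])  # [0, 1, 2, 3, 4, 5]
```

For an MNIST image file the three dimensions would be the number of images, the rows and the columns.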
So how can we see how the data is stored inside it? Like this:
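# Note: tensorflow.examples.tutorials was removed in TensorFlow 2.x, so this needs TensorFlow 1.x
from tensorflow.examples.tutorials.mnist import input_data

# Reads the MNIST archives from mnist_data/; one_hot=True returns the labels as one-hot vectors
mnist = input_data.read_data_sets('mnist_data', one_hot=True)
test_x = mnist.test.images[:3000]
test_y = mnist.test.labels[:3000]
print(len(test_x), " ", len(test_x[0]))
print(test_x)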
This prints part of the stored image data (here the test images, since the code reads mnist.test.images):
3000 784
[[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]]
You can see that it is made up of 3000 rows, each of which is one picture flattened into a vector of 784 = 28 * 28 values.
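To make the 784 = 28 * 28 relationship concrete, the sketch below turns one such row back into a 28 x 28 picture with NumPy. The row here is a made-up stand-in, not real MNIST data; with the code above you would use test_x[0] instead.

```python
import numpy as np

# Stand-in for one row of test_x: 784 values between 0 and 1 (not real MNIST data)
row = np.arange(784, dtype=np.float32) / 783.0

# Undo the flattening: 784 values -> 28 rows of 28 pixels
picture = row.reshape(28, 28)
print(picture.shape)  # (28, 28)

# Flattening the picture again gives back the original row
assert np.array_equal(picture.reshape(-1), row)

# Pixel (r, c) of the picture sits at index r * 28 + c of the row
r, c = 5, 7
assert picture[r, c] == row[r * 28 + c]
```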
We look at test_y in the same way:
3000 10
[[0. 0. 0. ... 1. 0. 0.]
[0. 0. 1. ... 0. 0. 0.]
[0. 1. 0. ... 0. 0. 0.]
...
[0. 1. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[1. 0. 0. ... 0. 0. 0.]]
3000 has the same meaning as above, and 10 is the number of classes. Each label is a one-hot vector: the entry at the index of the picture's class is 1, and every other entry is 0.
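As a small illustration of this label format, here is a sketch in plain Python that builds a one-hot vector and reads the class back out of it. The function names one_hot and class_of are my own, not part of TensorFlow.

```python
def one_hot(class_index, num_classes):
    """Return a list with 1 at class_index and 0 everywhere else."""
    vec = [0] * num_classes
    vec[class_index] = 1
    return vec


def class_of(vec):
    """Recover the class index from a one-hot vector."""
    return vec.index(max(vec))


label = one_hot(7, 10)
print(label)            # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
print(class_of(label))  # 7
```

The first row of test_y printed above has its 1 three places from the end of ten entries, which is index 7 in this scheme.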
Now let's make our own MNIST-style set and save it as a csv (or other data) file.
Code (python)
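# coding:utf-8
import cv2
import os
import random
import pandas as pd


def progress_bar(i):
    """Print a ten-step text progress bar; i runs from 0 to 10."""
    a = int(i) * 10
    b = '=' * int(i)
    c = '->'
    d = '·' * (10 - int(i))
    if i == 0:
        print("****** Start ******")
    print(a, '\t', "%", "[", b + c + d, "]")
    if i == 10:
        print("****** Done ******")


def mnist_change(path, img_width, img_height):
    """Read every image under path/<class folder>/ and return (images, labels)."""
    images = []
    labels = []
    # Sort the class folders so each class always gets the same label index
    tags = sorted(os.listdir(path))
    # Count all images first so the progress bar knows the total
    n = 0
    for tag in tags:
        n += len(os.listdir(os.path.join(path, tag)))
    step = max(1, n // 10)  # avoids dividing by zero when there are fewer than 10 images
    key = 0
    # i is the class index of the current folder (the original reset i for every folder)
    for i, tag in enumerate(tags):
        for image in os.listdir(os.path.join(path, tag)):
            img_path = os.path.join(path, tag, image)
            img = cv2.imread(img_path)
            if img is None:  # skip files that OpenCV cannot read as images
                continue
            img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            img = cv2.resize(img, (img_width, img_height), interpolation=cv2.INTER_CUBIC)
            # One element of images: the picture flattened row by row
            img_data = []
            for data in img:
                img_data.extend(data)
            # One element of labels: a one-hot vector for class i
            zero_ = [0 for _ in range(len(tags))]
            zero_[i] = 1
            if key == 0:
                images.append(img_data)
                labels.append(zero_)
            else:
                # Shuffle while inserting: put the new sample at a random
                # position and move the sample that was there to the end
                rand_i = random.randint(0, len(labels) - 1)
                temp1, temp2 = images[rand_i], labels[rand_i]
                images[rand_i], labels[rand_i] = img_data, zero_
                images.append(temp1)
                labels.append(temp2)
            if key % step == 0 and key // step <= 10:
                progress_bar(key // step)
            key += 1
    return images, labels


def text_save(file, data):
    """Save a list of rows to a csv file."""
    data = pd.DataFrame(data)
    data.to_csv(file, index=False)
    print("File saved successfully")


if __name__ == '__main__':
    # Custom parameters
    path = 'data/'
    width, height = 200, 200
    images, labels = mnist_change(path, width, height)
    text_save('images.csv', images)
    text_save('labels.csv', labels)

'''
Image folder layout
data
    n0
        4546asd.jpg
        asdw4145.jpg
    n1
        asd4.jpg
'''
Result:
****** Start ******
0 % [ ->·········· ]
10 % [ =->········· ]
20 % [ ==->········ ]
30 % [ ===->······· ]
40 % [ ====->······ ]
50 % [ =====->····· ]
60 % [ ======->···· ]
70 % [ =======->··· ]
80 % [ ========->·· ]
90 % [ =========->· ]
100 % [ ==========-> ]
****** Done ******
File saved successfully
File saved successfully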
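Once the two csv files exist, they still have to be read back before training. The following is only a sketch of one way to do that with pandas and NumPy, not part of the script above. The in-memory csv text stands in for images.csv and labels.csv, the batches helper is hypothetical, and dividing by 255 is my assumption: the MNIST loader returns values between 0 and 1, while OpenCV grayscale pixels are integers from 0 to 255.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-ins for images.csv / labels.csv:
# 4 pictures of 2 x 2 pixels and 2 classes, written the same way as text_save
images_csv = io.StringIO()
labels_csv = io.StringIO()
pd.DataFrame([[0, 255, 128, 64], [255, 255, 0, 0],
              [10, 20, 30, 40], [0, 0, 0, 255]]).to_csv(images_csv, index=False)
pd.DataFrame([[1, 0], [0, 1], [1, 0], [0, 1]]).to_csv(labels_csv, index=False)
images_csv.seek(0)
labels_csv.seek(0)

# Read the data back; the first csv line holds the column names written by to_csv
images = pd.read_csv(images_csv).to_numpy(dtype=np.float32) / 255.0  # scale to 0..1
labels = pd.read_csv(labels_csv).to_numpy(dtype=np.float32)


def batches(x, y, batch_size):
    """Yield consecutive (x, y) mini-batches; the last one may be smaller."""
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]


for batch_x, batch_y in batches(images, labels, batch_size=3):
    print(batch_x.shape, batch_y.shape)
```

With real files, pd.read_csv('images.csv') and pd.read_csv('labels.csv') would replace the StringIO objects, and each batch could then be fed to the model.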
Thoughts
This is the first blog post I have written, and the level of both the content and the code may not be very high, so I apologize for that.
If you have any ideas, you are welcome to discuss them in the comments section.
I have also set up a small group, and you are welcome to join the discussion there.