Data Science Veteran Live Series: How to Train Your Own Hot Dog Recognition Model

Recap

Even if you have not watched the US TV series "Silicon Valley", you have probably heard of it. What you might not have guessed is that, in the show, using artificial intelligence to recognize hot dogs was one of Silicon Valley's most profitable technologies.
Last year, HBO released the official Not Hotdog app for iOS and Android. It is said to be built with TensorFlow, Keras and React Native, but the source code was not released.

Today we are going to build the very model that made Jian-Yang a millionaire in season 4 of the show: the hot dog recognition model. This time, let the Alibaba Cloud data science veteran walk you through training your own hot dog recognition model on the machine learning platform PAI and break the technology blockade. Then you can become a CEO, marry someone rich and gorgeous, and reach the pinnacle of life.

Step 1: Activate the PAI service

To do a good job, you must first sharpen your tools. Trying to train a good model without a good tool is wishful thinking.

Here is a quick guide to activating PAI, Alibaba Cloud's own machine learning platform. After following this link, click Buy.

Machines in the East China 2 (Shanghai) region are currently on promotion, and the price is very attractive. If you are interested, this is a good opportunity to get started.

Step 2: Enter the PAI console and create a DSW instance

After activating the service, enter the PAI console and click DSW-Notebook Modeling in the menu on the left.

On the DSW-Notebook modeling page, click to create a new instance.

Here we choose the billing method, fill in the instance name, and finally specify the resource configuration. Since this experiment does not need to upload large files, we can skip setting up temporary storage. After clicking OK, we will see that an instance is being created. Wait a few minutes for creation to finish, then click Open to enter the instance.

Once inside, we can see our DSW environment.

Step 3: Upload the code and training data

First, download the archive with the code and training dataset that I have carefully prepared for you. Once it is on your local machine, click the Upload button to upload the file.

After a successful upload, open a Terminal, go to the upload path, and enter

$ unzip ./not_hotdog.zip # extract the whole archive
$ cd not_hotdog # enter the extracted folder (assumed to be named not_hotdog)
$ unzip seefood.zip # extract the training dataset

We will then see the folder sitting obediently in the file explorer on the left.
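
Before training, it can help to confirm that the archive unpacked into the layout the training script expects. The script further down globs `./train/hot_dog` and `./train/not_hot_dog` (and the matching `./test` folders) for `.jpg` files, so the sketch below simply counts them. The helper name `count_images` is hypothetical, and the sketch assumes you run it from the folder that contains `train` and `test`.

```python
from pathlib import Path


def count_images(root="."):
    """Count .jpg files under the folders the training script reads.

    Assumes the layout <root>/{train,test}/{hot_dog,not_hot_dog}/ that the
    glob patterns in the training and test code point at.
    """
    counts = {}
    for split in ("train", "test"):
        for label in ("hot_dog", "not_hot_dog"):
            folder = Path(root) / split / label
            # rglob searches recursively, like glob(..., recursive=True)
            counts[f"{split}/{label}"] = (
                sum(1 for _ in folder.rglob("*.jpg")) if folder.is_dir() else 0
            )
    return counts


if __name__ == "__main__":
    for name, n in count_images(".").items():
        print(f"{name}: {n} images")
```

A count of 0 for any of the four folders usually means the terminal is in the wrong directory or the dataset archive has not been extracted yet.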

Step 4: Start training

Now for the hard-core part: let's go straight to the code. We can run it as is.

#!/usr/bin/env python
# coding: utf-8

# # Import dependencies

# In[1]:


import numpy as np
import pandas as pd
import os

import tensorflow as tf
rand_state = 42  # define a random seed while we are at it
tf.set_random_seed(rand_state)
np.random.seed(rand_state)

from skimage import exposure
import cv2
import glob
import time
import matplotlib.pyplot as plt
from keras.utils.vis_utils import plot_model


# # Image preprocessing functions

# In[2]:


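def rotateImage(img, angle):
    '''
    img: three-channel image
    angle: rotation angle in degrees (picked at random by the caller)

    Data augmentation helper: rotates the image about its center.
    The scale factor is fixed at 1, so the image is not rescaled here.

    return: the transformed image
    '''

    (rows, cols, ch) = img.shape   # size of the source image

    # arguments: rotation center, rotation angle, scale factor
    M = cv2.getRotationMatrix2D((cols/2,rows/2), angle, 1)

    return cv2.warpAffine(img, M, (cols,rows))  # apply the transform, keeping the original size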
    
    
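def loadBlurImg(path, imgSize):
    '''
    path: image path, string
    imgSize: target image size, a tuple of two ints
    '''
    img = cv2.imread(path)  # read the image (OpenCV returns channels in BGR order)
    angle = np.random.randint(0, 360)  # random integer angle, uniform over [0, 360)
    img = rotateImage(img, angle)   # rotate the image by the random angle
    img = cv2.blur(img,(5,5))       # mean blur with a 5x5 kernel
    img = cv2.resize(img, imgSize)  # resize to the target size
    return img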


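def loadImgClass(classPath, classLable, classSize, imgSize):
    '''
    classPath: list of image paths for this class
    classLable: class label, int
    classSize: number of samples to produce
    imgSize: target image size, a tuple of two ints

    return: classSize samples and their labels
            (more if classPath holds more than classSize paths)

    Generates samples from the given paths. Every sample is augmented
    (random rotation, blur) and resized to imgSize.
    '''
    x = []
    y = []

    for path in classPath:
        img = loadBlurImg(path, imgSize)   # load the image, augment it and resize it to imgSize
        x.append(img)
        y.append(classLable)

    # oversample: keep drawing random images (each freshly augmented) until we have classSize
    while len(x) < classSize:
        randIdx = np.random.randint(0, len(classPath))
        img = loadBlurImg(classPath[randIdx], imgSize)
        x.append(img)
        y.append(classLable)

    return x, y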

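def loadData(img_size, classSize, hotdogs, notHotdogs):
    '''
    img_size: side length of the returned images, int
    classSize: number of samples for each of the positive and negative classes, int
    hotdogs, notHotdogs: lists of image paths for the positive and negative samples

    return: the samples and their labels

    Reads the data and returns samples and labels.
    '''

    imgSize = (img_size, img_size)     # size of the input images
    xHotdog, yHotdog = loadImgClass(hotdogs, 0, classSize, imgSize)   # positive samples (label 0), classSize of them
    xNotHotdog, yNotHotdog = loadImgClass(notHotdogs, 1, classSize, imgSize)  # negative samples (label 1), classSize of them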
    print("There are", len(xHotdog), "hotdog images")
    print("There are", len(xNotHotdog), "not hotdog images")
    
    X = np.array(xHotdog + xNotHotdog)      
    y = np.array(yHotdog + yNotHotdog)
    
    return X, y

def toGray(images):

    '''
    Convert a batch of images to grayscale; the result has a single channel.
    '''
    # rgb2gray converts RGB values to grayscale values by forming a weighted sum of the R, G, and B components:
    # 0.2989 * R + 0.5870 * G + 0.1140 * B
    # source: https://www.mathworks.com/help/matlab/ref/rgb2gray.html
    # The images were read with cv2.imread, which returns BGR order,
    # so channel 2 is R and channel 0 is B.

    images = 0.2989*images[:,:,:,2] + 0.5870*images[:,:,:,1] + 0.1140*images[:,:,:,0]
    return images

def normalizeImages(images):
    '''
    images: batch of single-channel images
    return: the images scaled to the 0-1 range and histogram-equalized
    '''
    # use Histogram equalization to get a better range
    # source http://scikit-image.org/docs/dev/api/skimage.exposure.html#skimage.exposure.equalize_hist
    images = (images / 255.).astype(np.float32)  # pixel values are 0-255; scale them to 0-1

    for i in range(images.shape[0]):
        images[i] = exposure.equalize_hist(images[i])   # histogram-equalized image

    images = images.reshape(images.shape + (1,))   # add a trailing channel axis: (N, H, W) -> (N, H, W, 1)
    return images

def preprocessData(images):
    '''
    images: batch of three-channel images
    return: single-channel images with values scaled to the 0-1 range (divided by 255)
    '''
    grayImages = toGray(images)
    return normalizeImages(grayImages)


# # We need to pull some tricks on the images; after all, 500 pictures are still too few

# In[3]:


from keras.utils.np_utils import to_categorical
from sklearn.model_selection import train_test_split

size = 32
classSize = 20000


# In[7]:


# collect the image paths
hotdogs = glob.glob('./train/hot_dog/**/*.jpg', recursive=True)
notHotdogs = glob.glob('./train/not_hot_dog/**/*.jpg', recursive=True)


# In[12]:


dd = (20000,20000)
print(dd)


# In[14]:


# work the augmentation magic
scaled_X, y = loadData(size, classSize, hotdogs, notHotdogs)
scaled_X = preprocessData(scaled_X)


# In[15]:


y = to_categorical(y)    # one-hot encode the target variable


n_classes=2
print("y shape", y.shape)
X_train, X_test, y_train, y_test = train_test_split(
    scaled_X, 
    y, 
    test_size=0.2, 
    random_state=rand_state
)    # split the data, keeping 0.8 of it for training

print("train shape X", X_train.shape)
print("train shape y", y_train.shape)
print("Test shape X:", X_test.shape)
print("Test shape y: ", y_test.shape)

inputShape = (size, size, 1)


# In[8]:


def plot_history(history):
    loss_list = [s for s in history.history.keys() if 'loss' in s and 'val' not in s]
    val_loss_list = [s for s in history.history.keys() if 'loss' in s and 'val' in s]
    acc_list = [s for s in history.history.keys() if 'acc' in s and 'val' not in s]
    val_acc_list = [s for s in history.history.keys() if 'acc' in s and 'val' in s]
    
    if len(loss_list) == 0:
        print('Loss is missing in history')
        return 
    
    ## As loss always exists
    epochs = range(1,len(history.history[loss_list[0]]) + 1)
    
    ## Loss
    plt.figure(1)
    for l in loss_list:
        plt.plot(epochs, history.history[l], 'b', label='Training loss (' + str(str(format(history.history[l][-1],'.5f'))+')'))
    for l in val_loss_list:
        plt.plot(epochs, history.history[l], 'g', label='Validation loss (' + str(str(format(history.history[l][-1],'.5f'))+')'))
    
    plt.title('Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    
    ## Accuracy
    plt.figure(2)
    for l in acc_list:
        plt.plot(epochs, history.history[l], 'b', label='Training accuracy (' + str(format(history.history[l][-1],'.5f'))+')')
    for l in val_acc_list:    
        plt.plot(epochs, history.history[l], 'g', label='Validation accuracy (' + str(format(history.history[l][-1],'.5f'))+')')

    plt.title('Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.show()


# # Here comes the key part: building the model

# In[9]:


import keras
from keras.models import Sequential
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.layers import Conv2D, MaxPooling2D, Dense, Dropout, Flatten
from keras.layers.normalization import BatchNormalization


model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal',
                 input_shape=inputShape))   # convolution
model.add(MaxPooling2D((2, 2)))             # pooling
model.add(Dropout(0.25))                    # dropout
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(Dropout(0.4))
model.add(Flatten())                       # flatten to one dimension
model.add(Dense(128, activation='relu'))   # fully connected
model.add(Dropout(0.3))
model.add(Dense(2, activation='softmax'))

model.compile(loss=keras.losses.binary_crossentropy,
              optimizer=keras.optimizers.Adam(lr=1e-4),
              metrics=['accuracy'])

start = time.time()

model.summary()
# Set callback functions to early stop training and save the best model so far
callbacks = [
    EarlyStopping(
        monitor='val_loss', 
        patience=3
    ),
    ModelCheckpoint(
        filepath='model.h5', 
        monitor='val_acc', 
        save_best_only=True
    )
]

history = model.fit(
    X_train, 
    y_train,
    batch_size=32,
    epochs=100, 
    callbacks=callbacks,
    verbose=0,
    validation_data=(X_test, y_test)
)

end = time.time()
print('Execution time: ', end-start)

plot_history(history)
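
To see why the network above ends in a fairly small dense layer, it can help to trace the feature-map size through the stack. The sketch below is plain arithmetic, not Keras. It assumes the Keras defaults for these layers (`padding='valid'` and stride 1 for `Conv2D`, stride equal to the pool size for `MaxPooling2D`), and the helper names are hypothetical. Dropout layers are left out because they do not change shapes.

```python
def conv_out(size, kernel=3):
    # a 'valid' convolution with stride 1 shrinks each side by kernel - 1
    return size - kernel + 1


def pool_out(size, pool=2):
    # non-overlapping max pooling floors the division
    return size // pool


def trace_shapes(size=32):
    """Return (layer kind, side length, channels) for the conv stack above."""
    layers = [("conv", 32), ("pool", None), ("conv", 64), ("pool", None), ("conv", 128)]
    channels = 1  # grayscale input
    shapes = [("input", size, channels)]
    for kind, filters in layers:
        if kind == "conv":
            size, channels = conv_out(size), filters
        else:
            size = pool_out(size)
        shapes.append((kind, size, channels))
    return shapes


shapes = trace_shapes(32)
for kind, side, channels in shapes:
    print(f"{kind:5s} -> {side} x {side} x {channels}")

_, side, channels = shapes[-1]
flat = side * side * channels
print("flattened features:", flat)
print("parameters in Dense(128):", flat * 128 + 128)
```

For the 32 x 32 input used here, the trace ends at 4 x 4 x 128, so `Flatten` yields 2048 features. The shapes printed by `model.summary()` should agree with this.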
 

After training is complete, we can run a simple check of our model's accuracy. The following code does that.

hotdogs = glob.glob('./test/hot_dog/**/*.jpg', recursive=True) 
notHotdogs = glob.glob('./test/not_hot_dog/**/*.jpg', recursive=True)

scaled_X_test, y_test = loadData(size, 250, hotdogs, notHotdogs)
scaled_X_test = preprocessData(scaled_X_test)

#get the predictions for the test data
predicted_classes = model.predict_classes(scaled_X_test)

# setup the true classes: just 250 hotdogs followed by 250 not hotdogs
y_true = np.concatenate((np.zeros((250,)), np.ones((250,))))
from sklearn.metrics import classification_report
print(classification_report(y_true, predicted_classes, target_names=['hotdog', 'not hotdog']))

This shows us some of the model's more important metrics, such as how accurate it is.
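
For reference, per-class precision and recall in a report like this come from simple counts. Below is a minimal pure-Python sketch that uses made-up labels rather than real model output, with class 0 as hotdog and class 1 as not hotdog, as in `loadData`. The function name `precision_recall` is hypothetical.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class, computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up example: 4 hotdogs (0) followed by 4 not-hotdogs (1)
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

for cls, name in [(0, "hotdog"), (1, "not hotdog")]:
    p, r = precision_recall(y_true, y_pred, cls)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Precision answers "of everything the model called a hotdog, how much really was one", while recall answers "of all the real hotdogs, how many did the model find".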

Step 5: Test the model

Since we worked so hard on training, let's have some fun with the model. The following code predicts whether a picture is a hot dog. Before running it, create a folder named foo and put the pictures you want to test in it.

from PIL import Image
import numpy as np
from skimage import transform


from IPython.display import Image as ipy_Image
from IPython.display import display

# define a function that loads a picture and turns it into a np array
def load(filename):
   np_image = Image.open(filename)
   np_image = np.array(np_image).astype('float32')/255
   np_image = transform.resize(np_image, (32, 32, 1))
   np_image = np.expand_dims(np_image, axis=0)
   return np_image

import os
from os.path import join

image_dir = './foo'
os.listdir(image_dir)
img_paths = [join(image_dir,filename) for filename in os.listdir(image_dir)]

index_number = 0

image = load(img_paths[index_number])
score = model.predict(image)
result = model.predict_classes(image)
print(score[0][0], result)
display(ipy_Image(img_paths[index_number]))
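
A note on reading that output: `loadData` labels hot dogs as class 0 and everything else as class 1, so `score[0][0]` is the softmax probability the model assigns to "hot dog", and `result` is the index of the larger of the two probabilities. The sketch below turns one score row into a readable verdict. `describe_prediction` is a hypothetical helper and the numbers are made up.

```python
CLASS_NAMES = ["hotdog", "not hotdog"]  # index 0 / 1, matching the labels in loadData


def describe_prediction(score_row):
    """Map one softmax row [p_hotdog, p_not_hotdog] to (label, hotdog score in %)."""
    best = max(range(len(score_row)), key=lambda i: score_row[i])
    hotdog_percent = 100.0 * score_row[0]
    return CLASS_NAMES[best], hotdog_percent


label, percent = describe_prediction([0.97, 0.03])  # made-up scores
print(f"{label} (hotdog score: {percent:.1f}%)")
```

With the real model you would pass `score[0]` instead of the made-up list.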

For instance, here is a picture that a viewer uploaded during the live stream. It successfully fooled the model on air and got the highest score.
[Image: IMG_6449]

Running the code, we can see the result.

We can see that this picture fooled our model perfectly, scoring almost 100. You can also take the model and test it on things around you that look like a hot dog but are not, and see how it does. You can join DingTalk group 23304116 to compete with others for the highest score in fooling the model (a successful spoof is defined as a picture that is not a hot dog but scores above 50).
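
The competition rule above can be written down as a tiny check. This is a sketch under the assumption that the "score" is the hot dog probability expressed as a percentage, and that you know yourself whether the picture really shows a hot dog. The function names are hypothetical and the entries are made up.

```python
def is_successful_spoof(hotdog_percent, really_a_hotdog, threshold=50.0):
    """A spoof succeeds when a non-hot-dog picture scores above the threshold."""
    return (not really_a_hotdog) and hotdog_percent > threshold


def rank_spoofs(entries):
    """Keep the successful spoofs and sort them by score, highest first.

    entries: list of (name, hotdog_percent, really_a_hotdog) tuples.
    """
    winners = [e for e in entries if is_successful_spoof(e[1], e[2])]
    return sorted(winners, key=lambda e: e[1], reverse=True)


# Made-up entries for illustration
entries = [
    ("sausage roll", 91.5, False),
    ("real hot dog", 99.0, True),   # scores high, but it is a hot dog, so no spoof
    ("banana", 42.0, False),        # not a hot dog, but the score is too low
    ("baguette", 99.7, False),
]
for name, percent, _ in rank_spoofs(entries):
    print(f"{name}: {percent:.1f}")
```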



Origin yq.aliyun.com/articles/704331