Artificial Intelligence Study Notes 5: Siamese (Twin) Neural Networks

This article uses a Siamese (twin) neural network to compare the similarity of images from the MNIST handwritten digit dataset; the framework used is Keras. If you are not familiar with neural networks, you can first read the article Neural Networks (caodong0225.github.io).

MNIST is an image dataset of handwritten digits, built from data collected by the National Institute of Standards and Technology (NIST). The digits were written by about 250 different people, of whom 50% were high school students and 50% were U.S. Census Bureau staff. The dataset was collected so that handwritten digits could be recognized by algorithms.

The training set contains a total of 60,000 images and labels, while the test set contains a total of 10,000 images and labels. The first 5,000 images in the test set come from the original NIST training set, and the last 5,000 come from the original NIST test set. The first 5,000 digits are written more regularly than the last 5,000, because the first 5,000 come from U.S. Census Bureau employees, while the last 5,000 come from high school students.

Download address: caodong0225.github.io/minist.zip at master caodong0225/caodong0225.github.io

Since 1998, this dataset has been widely used in machine learning and deep learning to benchmark algorithms such as linear classifiers, K-nearest neighbors, support vector machines (SVMs), neural networks, and convolutional networks.

Figure 1 (samples from the MNIST handwritten digit dataset)

Keras is an open-source artificial neural network library written in Python. It can be used as a high-level API on top of TensorFlow, Microsoft-CNTK, and Theano to design, debug, evaluate, apply, and visualize deep learning models.

In terms of code structure, Keras is written in an object-oriented style and is fully modular and extensible. Its design and documentation take user experience and ease of use into account and try to reduce the difficulty of implementing complex algorithms. Keras supports the mainstream algorithms of modern artificial intelligence, including neural networks with feedforward and recurrent structures, and can also take part in building statistical learning models through wrappers. In terms of hardware and development environment, Keras supports multi-GPU parallel computing on multiple operating systems, and its models can be converted into components of TensorFlow, Microsoft-CNTK, and other systems according to the backend settings. For these reasons, this article uses Keras as the framework.

The Siamese neural network, also known as the twin neural network, is a coupled framework built from two artificial neural networks. It takes two samples as input and outputs their representations embedded in a high-dimensional space, so that the similarity of the two samples can be compared. In the narrow sense, a Siamese network is formed by joining two neural networks with the same structure and shared weights. In the broad sense, a "pseudo-Siamese network" can be formed by joining any two neural networks. Siamese networks usually have a deep structure and can be built from convolutional neural networks, recurrent neural networks, and so on.

Figure 2 Schematic diagram of twin neural network

Weight sharing means that when the network has two inputs, both inputs are processed with the same weights (this can be understood as using the same neural network for both). We often need to judge the similarity of two images, for example when comparing two faces. A natural approach is to extract features from each image and then compare the features, and a neural network is a natural choice for feature extraction. If two different neural networks are used to extract the features, however, the extracted features may not lie in the same domain and cannot be compared directly. Using a single neural network to extract the features of both images avoids this problem, which is why a Siamese network needs to share weights.
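The effect of weight sharing can be illustrated with a small NumPy sketch. It is not the Keras model used later in this article: `make_encoder`, the layer sizes, and the random weights are all assumptions chosen only for illustration. The same input is passed through one shared encoder and through two independently initialized encoders, and the distance between the resulting features is compared.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_encoder(in_dim, out_dim):
    """Return a tiny one-layer tanh encoder with its own random weights (hypothetical helper)."""
    w = rng.normal(size=(in_dim, out_dim))
    b = rng.normal(size=out_dim)
    return lambda x: np.tanh(x @ w + b)

x = rng.normal(size=8)          # one toy "image", already flattened

shared = make_encoder(8, 4)     # Siamese case: one encoder serves both inputs
branch_a = make_encoder(8, 4)   # pseudo-Siamese case: two independent encoders
branch_b = make_encoder(8, 4)

# Distance between the features of the *same* input under each scheme.
d_shared = np.linalg.norm(shared(x) - shared(x))
d_separate = np.linalg.norm(branch_a(x) - branch_b(x))

print("shared weights:  ", d_shared)    # 0.0, identical inputs give identical features
print("separate weights:", d_separate)  # non-zero, the two feature spaces differ
```

With shared weights, identical inputs always land on the same point, so the distance between features reflects only differences between the inputs. With separate weights, even identical inputs are mapped to different points, so the distance is not meaningful without further training.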

A Siamese network has two inputs (Input1 and Input2). The network maps each input into a new space, producing a representation of that input in the new space. The similarity of the two inputs is then evaluated through the computation of the loss.

Figure 3 Siamese neural network schematic diagram

There are many ways to combine the two representations; the most common are the square difference mapping and the absolute value mapping. After the two input images pass through the weight-sharing neural network, we obtain two one-dimensional feature vectors W1 and W2, each of length N. Let W3 be the one-dimensional vector obtained after the mapping. The formula for the square difference mapping is:

$$W_{3,i} = (W_{1,i} - W_{2,i})^2, \quad i = 1, \dots, N$$

The formula for the absolute value mapping is:

$$W_{3,i} = |W_{1,i} - W_{2,i}|, \quad i = 1, \dots, N$$
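As a small numeric illustration of the two mappings, the following NumPy sketch applies them to two made-up feature vectors. The values of `w1` and `w2` are assumptions and do not come from a trained network. It also sums the square difference vector and takes the square root, which gives the Euclidean distance d used later in this article.

```python
import numpy as np

w1 = np.array([0.2, 0.9, 0.4])   # hypothetical features of image 1
w2 = np.array([0.1, 0.5, 0.4])   # hypothetical features of image 2

w3_square = (w1 - w2) ** 2       # square difference mapping, element by element
w3_abs = np.abs(w1 - w2)         # absolute value mapping, element by element

# Summing the squared differences and taking the square root gives
# the Euclidean (L2) distance between the two feature vectors.
d = np.sqrt(w3_square.sum())

print(w3_square)  # approximately [0.01, 0.16, 0.0]
print(w3_abs)     # approximately [0.1, 0.4, 0.0]
print(d)          # approximately 0.412
```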

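At this point, the weight-sharing part of the Siamese network ends. The new vector W3 can be passed through further neural network layers, or its elements can be summed and the square root (or the average) taken to obtain a single distance d that is fed into the loss function; which to choose depends on the task. In this example we take the latter approach with the square difference mapping, so d is the Euclidean distance between W1 and W2:

$$d = \sqrt{\sum_{i=1}^{N} (W_{1,i} - W_{2,i})^2}$$

The loss function we use is called the contrastive loss. Averaged over the M pairs in a batch, the formula is:

$$L = \frac{1}{M} \sum_{k=1}^{M} \left[ y_k \, d_k^2 + (1 - y_k) \, \max(\mathrm{margin} - d_k, 0)^2 \right]$$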

Here, y is the sample label, 1 or 0, which indicates whether the two input images belong to the same class: 1 if they do, 0 otherwise. margin is a threshold. When the input images belong to different classes, the value of d can become very large; the threshold is set so that an excessively large d does not make the loss function change unevenly. The value of margin is generally 1.

Observing this formula, we can see that when the input images belong to the same class (y = 1), the loss reduces to d², the sum of squared differences between W1 and W2 (that is, N times the MSE between them). The neural network therefore adjusts W1 and W2 to be as close as possible: the smaller d is, the smaller the loss. When the input images belong to different classes (y = 0), the neural network adjusts W1 and W2 to be as far apart as possible so that d grows, and once d reaches or exceeds the threshold margin, the loss is 0.
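The behavior described above can be checked with a few numbers. The sketch below is a plain-Python version of the per-pair loss term with margin = 1; the function name `contrastive_loss_single` and the sample distances are assumptions made for illustration.

```python
def contrastive_loss_single(y, d, margin=1.0):
    """Contrastive loss for one pair: y = 1 for the same class, y = 0 otherwise."""
    return y * d ** 2 + (1 - y) * max(margin - d, 0.0) ** 2

same_close = contrastive_loss_single(1, 0.2)   # same class, close together: small loss
same_far = contrastive_loss_single(1, 0.9)     # same class, far apart: large loss
diff_close = contrastive_loss_single(0, 0.3)   # different class, too close: penalized
diff_far = contrastive_loss_single(0, 1.5)     # different class, beyond the margin: 0.0

print(same_close, same_far, diff_close, diff_far)
```

Same-class pairs are penalized in proportion to d², while different-class pairs are penalized only while they are closer than the margin.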

To measure the accuracy of the model on the training set, we also need a function that computes accuracy. In this example, we stipulate that when d > 0.5 the neural network treats the two images as belonging to different classes, and when d < 0.5 it treats them as belonging to the same class. This threshold can be adjusted according to the specific situation.

The code for accuracy is as follows:

import keras.backend as K

def accuracy(y_true, y_pred): # operates on tensors
    # A pair is predicted as "same class" (1) when its distance is below 0.5
    return K.mean(K.equal(y_true, K.cast(y_pred < 0.5, y_true.dtype)))
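To see how the threshold affects the result, here is a NumPy sketch of the same rule applied to a handful of made-up distances. `pair_accuracy` and the sample values are assumptions for illustration; it works on plain arrays rather than Keras tensors.

```python
import numpy as np

def pair_accuracy(y_true, distances, threshold=0.5):
    """Fraction of pairs whose predicted label (distance < threshold) matches y_true."""
    pred_same = (np.asarray(distances).ravel() < threshold).astype(float)
    return float(np.mean(pred_same == np.asarray(y_true, dtype=float)))

y_true = [1, 1, 0, 0]               # 1 = same digit, 0 = different digits
distances = [0.1, 0.7, 0.9, 0.3]    # hypothetical model outputs d

acc_default = pair_accuracy(y_true, distances)                # threshold 0.5
acc_looser = pair_accuracy(y_true, distances, threshold=0.8)  # threshold 0.8

print(acc_default)  # 0.5
print(acc_looser)   # 0.75
```

With a threshold of 0.5, the second pair (same digit, d = 0.7) and the fourth pair (different digits, d = 0.3) are both misclassified. Raising the threshold to 0.8 fixes the second pair but not the fourth, which is why the threshold is worth tuning on held-out data.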

The code of the loss function is as follows:

import keras.backend as K

def contrastive_loss(y_true, y_pred):
    # y_pred is the distance d; y_true is 1 for same-class pairs and 0 otherwise
    margin = 1
    square_pred = K.square(y_pred)
    margin_square = K.square(K.maximum(margin - y_pred, 0))
    return K.mean(y_true * square_pred + (1 - y_true) * margin_square)

The structure of the neural network is as follows:

Figure 4 Siamese neural network structure diagram

The complete code is as follows:

#coding:gbk
from keras.layers import Input,Dense
from keras.layers import Flatten,Lambda,Dropout
from keras.models import Model
import keras.backend as K
from keras.models import load_model
import numpy as np
from PIL import Image
import glob
import matplotlib.pyplot as plt
import random
from keras.optimizers import Adam,RMSprop
import tensorflow as tf

def create_base_network(input_shape):
    # Shared feature extractor: flatten the image, then three dense layers
    image_input = Input(shape=input_shape)
    x = Flatten()(image_input)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.1)(x)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.1)(x)
    x = Dense(128, activation='relu')(x)
    model = Model(image_input,x,name = 'base_network')
    return model

def contrastive_loss(y_true, y_pred):
    # y_pred is the distance d; y_true is 1 for same-class pairs and 0 otherwise
    margin = 1
    square_pred = K.square(y_pred)
    margin_square = K.square(K.maximum(margin - y_pred, 0))
    return K.mean(y_true * square_pred + (1 - y_true) * margin_square)

def accuracy(y_true, y_pred): # operates on tensors
    # A pair is predicted as "same class" (1) when its distance is below 0.5
    return K.mean(K.equal(y_true, K.cast(y_pred < 0.5, y_true.dtype)))

def siamese(input_shape):
    # Both inputs go through the same base_network, so the weights are shared
    base_network = create_base_network(input_shape)
    input_image_1 = Input(shape=input_shape)
    input_image_2 = Input(shape=input_shape)

    encoded_image_1 = base_network(input_image_1)
    encoded_image_2 = base_network(input_image_2)

    # Euclidean (L2) distance between the two feature vectors
    l2_distance_layer = Lambda(
        lambda tensors: K.sqrt(K.sum(K.square(tensors[0] - tensors[1]), axis=1, keepdims=True))
        ,output_shape=lambda shapes:(shapes[0][0],1))
    l2_distance = l2_distance_layer([encoded_image_1, encoded_image_2])

    model = Model([input_image_1,input_image_2],l2_distance)

    return model

def process(i):
    # Load an image file as a grayscale array of shape (wid, hei, 1) scaled to [0, 1]
    img = Image.open(i,"r")
    img = img.convert("L")
    img = img.resize((wid,hei))
    img = np.array(img).reshape((wid,hei,1))/255
    return img

#model = load_model("testnumber.h5",custom_objects={'contrastive_loss':contrastive_loss,'accuracy':accuracy})
wid=28
hei=28
model = siamese((wid,hei,1))

# Group the training images by digit; the label is the character just before ".jpg"
imgset=[[],[],[],[],[],[],[],[],[],[]]
for i in glob.glob(r"train_images\*.jpg"):
    imgset[int(i[-5])].append(process(i))
size = 60000

# Build image pairs: even steps give a same-digit pair (label 1),
# odd steps give a different-digit pair (label 0)
r1set = []
r2set = []
flag = []
for j in range(size):
    if j%2==0:
        index = random.randint(0,9)
        r1 = imgset[index][random.randint(0,len(imgset[index])-1)]
        r2 = imgset[index][random.randint(0,len(imgset[index])-1)]
        r1set.append(r1)
        r2set.append(r2)
        flag.append(1.0)
    else:
        index1 = random.randint(0,9)
        index2 = random.randint(0,9)
        while index1==index2:
            index1 = random.randint(0,9)
            index2 = random.randint(0,9)
        r1 = imgset[index1][random.randint(0,len(imgset[index1])-1)]
        r2 = imgset[index2][random.randint(0,len(imgset[index2])-1)]
        r1set.append(r1)
        r2set.append(r2)
        flag.append(0.0)
r1set = np.array(r1set)
r2set = np.array(r2set)
flag = np.array(flag)

model.compile(loss = contrastive_loss,
            optimizer = RMSprop(),
            metrics = [accuracy])
history = model.fit([r1set,r2set],flag,batch_size=128,epochs=20,verbose=2)

# Plot the training accuracy and the training loss
plt.figure()
plt.subplot(2,2,1)
plt.plot(history.history['accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train'], loc='upper left')
plt.subplot(2,2,2)
plt.plot(history.history['loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train'], loc='upper left')
plt.show()
model.save("testnumber.h5")
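The pair construction in the training script can also be factored into a small helper. The sketch below is one possible way to write it, not the author's code: `make_pairs`, the `seed` parameter, and the toy data are assumptions. It uses `random.sample` to draw two distinct classes directly, which avoids the retry loop, and works on any list of per-class item lists.

```python
import random

def make_pairs(groups, n_pairs, seed=0):
    """Build alternating same-class (1.0) and different-class (0.0) pairs.

    groups[c] is the list of items belonging to class c.
    """
    rng = random.Random(seed)
    classes = range(len(groups))
    left, right, labels = [], [], []
    for j in range(n_pairs):
        if j % 2 == 0:
            # Positive pair: two items drawn from the same class.
            c = rng.choice(classes)
            a, b = rng.choice(groups[c]), rng.choice(groups[c])
            label = 1.0
        else:
            # Negative pair: two distinct classes drawn without replacement.
            c1, c2 = rng.sample(classes, 2)
            a, b = rng.choice(groups[c1]), rng.choice(groups[c2])
            label = 0.0
        left.append(a)
        right.append(b)
        labels.append(label)
    return left, right, labels

# Toy data: each item is a (class, index) tuple standing in for an image.
toy_groups = [[(c, k) for k in range(3)] for c in range(10)]
left, right, labels = make_pairs(toy_groups, 6)
print(labels)  # [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
```

In the training script, `groups` would correspond to `imgset`, and the returned lists would be converted with `np.array` before being passed to `model.fit`.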

  

The training process diagram is as follows:

Figure 5 training process diagram

Figure 6 training process diagram

The display code for the evaluation is as follows:

import glob
import random
import numpy as np
import matplotlib.pyplot as plt
import keras.backend as K
from keras.models import load_model
from PIL import Image

def process(i):
    # Load an image file as a grayscale array of shape (wid, hei, 1) scaled to [0, 1]
    img = Image.open(i,"r")
    img = img.convert("L")
    img = img.resize((wid,hei))
    img = np.array(img).reshape((wid,hei,1))/255
    return img

def contrastive_loss(y_true, y_pred):
    # Needed again here so that load_model can restore the custom loss
    margin = 1
    square_pred = K.square(y_pred)
    margin_square = K.square(K.maximum(margin - y_pred, 0))
    return K.mean(y_true * square_pred + (1 - y_true) * margin_square)

def accuracy(y_true, y_pred): # operates on tensors
    return K.mean(K.equal(y_true, K.cast(y_pred < 0.5, y_true.dtype)))

def compute_accuracy(y_true, y_pred):
    # NumPy version of the accuracy metric, for predictions returned by model.predict
    pred = y_pred.ravel() < 0.5
    return np.mean(pred == y_true)

wid = 28
hei = 28

# Group the test images by digit; the label is the character just before ".jpg"
imgset=[[],[],[],[],[],[],[],[],[],[]]
for i in glob.glob(r"test_images\*.jpg"):
    imgset[int(i[-5])].append(process(i))
model = load_model("testnumber.h5",custom_objects={'contrastive_loss':contrastive_loss,'accuracy':accuracy})

# Show 50 random pairs together with the predicted distance
for i in range(50):
    if random.randint(0,1)==0:
        # Same-digit pair: the printed distance should be small
        index=random.randint(0,9)
        r1 = random.randint(0,len(imgset[index])-1)
        r2 = random.randint(0,len(imgset[index])-1)
        plt.figure()
        plt.subplot(2,2,1)
        plt.imshow((255*imgset[index][r1]).astype('uint8'))
        plt.subplot(2,2,2)
        plt.imshow((255*imgset[index][r2]).astype('uint8'))
        y_pred = model.predict([np.array([imgset[index][r1]]),np.array([imgset[index][r2]])])
        print(y_pred)
        plt.show()
    else:
        # Different-digit pair: the printed distance should be large
        index1 = random.randint(0,9)
        index2 = random.randint(0,9)
        while index1==index2:
            index1 = random.randint(0,9)
            index2 = random.randint(0,9)
        r1 = random.randint(0,len(imgset[index1])-1)
        r2 = random.randint(0,len(imgset[index2])-1)
        plt.figure()
        plt.subplot(2,2,1)
        plt.imshow((255*imgset[index1][r1]).astype('uint8'))
        plt.subplot(2,2,2)
        plt.imshow((255*imgset[index2][r2]).astype('uint8'))
        y_pred = model.predict([np.array([imgset[index1][r1]]),np.array([imgset[index2][r2]])])
        print(y_pred)
        plt.show()

Figure 7 Image similarity comparison

Figure 8 Image similarity comparison

Origin blog.csdn.net/qq_45198339/article/details/128747452