Intelligent sign language digital real-time translation based on Android+OpenCV+CNN+Keras - deep learning algorithm application (including Python, ipynb engineering source code) + data set (2)



Preface

This project uses a Keras deep learning model to classify and recognize sign language digits in real time. To do so, it uses algorithms from the OpenCV library to locate the hands, which enables real-time recognition of sign language in video streams and images.

First, the project uses OpenCV algorithms to capture hand positions in video streams or images. Techniques such as skin color detection, motion detection, or gesture detection can be used to locate the signing hand.
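As an illustration of the skin color detection mentioned above, the sketch below thresholds an RGB frame in the YCrCb color space and returns the bounding box of the skin-colored pixels. It is a minimal NumPy-only sketch, not the project's OpenCV code: the names `skin_mask` and `skin_bounding_box` are hypothetical, and the Cr/Cb ranges (133-173 and 77-127) are commonly cited heuristic values that would need tuning for real cameras, lighting, and skin tones. In OpenCV the same steps would typically use `cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)` followed by `cv2.inRange`.

```python
import numpy as np

# Heuristic skin ranges in YCrCb (assumed values; tune for your camera and lighting)
CR_RANGE = (133, 173)
CB_RANGE = (77, 127)


def skin_mask(rgb):
    """Return a boolean mask of skin-colored pixels for an (H, W, 3) uint8 RGB image."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # RGB -> YCrCb formulas from the OpenCV documentation (8-bit offset 128, no rounding)
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128
    cb = (b - y) * 0.564 + 128
    return ((cr >= CR_RANGE[0]) & (cr <= CR_RANGE[1]) &
            (cb >= CB_RANGE[0]) & (cb <= CB_RANGE[1]))


def skin_bounding_box(rgb):
    """Return (top, left, bottom, right) of the skin region, or None if no skin is found."""
    rows, cols = np.nonzero(skin_mask(rgb))
    if rows.size == 0:
        return None
    # bottom/right are exclusive, so frame[top:bottom, left:right] is the crop
    return int(rows.min()), int(cols.min()), int(rows.max()) + 1, int(cols.max()) + 1


# Usage: a black frame with a skin-toned patch at rows 20-39, columns 10-29
frame = np.zeros((64, 64, 3), dtype=np.uint8)
frame[20:40, 10:30] = (224, 172, 105)
print(skin_bounding_box(frame))  # (20, 10, 40, 30)
```

A fixed color threshold is the simplest of the techniques listed; it is cheap enough for real-time use but sensitive to lighting and skin-colored backgrounds, which is why motion or gesture cues are often combined with it.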

Next, a CNN deep learning model classifies the captured hand gestures. Once trained, it maps each gesture to a specific category, in this case a digit from 0 to 9.

During real-time recognition, each gesture in the video stream or image is passed to the CNN model, which runs inference and assigns the gesture to its corresponding category. This lets the system recognize sign language gestures in real time and convert them into text or other forms of output.
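The inference step can be sketched as follows. This is a minimal sketch under stated assumptions: the hand region has already been cropped from an RGB frame, and the model expects 64x64 single-channel inputs (matching `input_shape=(64, 64, 1)` in the model definition) with pixel values in [0, 1], which we assume matches the scale of `X.npy`. `preprocess_hand` and `predict_digit` are hypothetical helpers, nearest-neighbor resizing stands in for an OpenCV resize, and `predict_fn` stands in for the trained Keras model's `predict` method.

```python
import numpy as np


def preprocess_hand(rgb_crop, size=64):
    """Convert an (H, W, 3) uint8 RGB crop into a (1, size, size, 1) float batch in [0, 1]."""
    rgb = rgb_crop.astype(np.float32)
    # Luma-weighted grayscale conversion
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # Nearest-neighbor resize: pick evenly spaced source rows and columns
    h, w = gray.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = gray[rows[:, None], cols[None, :]]
    # Scale to [0, 1] and add batch and channel axes
    return (resized / 255.0).reshape(1, size, size, 1)


def predict_digit(rgb_crop, predict_fn):
    """Run a model-like callable on the crop and return the most probable class index."""
    probs = predict_fn(preprocess_hand(rgb_crop))  # shape (1, 10): softmax scores
    return int(np.argmax(probs[0]))


# Usage with a dummy stand-in for model.predict that always favors class 3
def dummy_predict(batch):
    probs = np.full((batch.shape[0], 10), 0.05)
    probs[:, 3] = 0.55
    return probs


crop = np.full((120, 90, 3), 200, dtype=np.uint8)
print(preprocess_hand(crop).shape)         # (1, 64, 64, 1)
print(predict_digit(crop, dummy_predict))  # 3
```

Whether the returned index can be read directly as a digit depends on which labels the model was trained on; with the raw `Y.npy` labels it would still need the label correction described in the data preprocessing module.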

Overall, this project combines computer vision and deep learning to provide a real-time solution for sign language recognition. It can help hearing-impaired people and other sign language users communicate with and understand others more easily.

Overall design

This part includes the overall system structure diagram and system flow chart.

Overall system structure diagram

The overall structure of the system is shown in the figure.

[Figure: overall system structure diagram]

System flow chart

The system flow is shown in the figure.

[Figure: system flow chart]

Operating environment

This part covers the Python, TensorFlow, Keras, and Android environments.

Module implementation

This project consists of six modules: data preprocessing, data enhancement, model construction, model training and saving, model evaluation, and model testing. The function and code of each module are introduced below.

1. Data preprocessing

Download the Sign Language Digits dataset from Kaggle at https://www.kaggle.com/ardamavi/sign-language-digits-dataset and load it from the local folder. The relevant code is as follows:

# Import the required packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras import layers
from keras import optimizers
from sklearn.model_selection import train_test_split
import os
# Local folder holding the downloaded dataset (adjust to your own path)
data_dir="/Users/chenjiyan/Desktop/信息系统设计项目/Sign-language-digits-dataset"
# Print the contents of the dataset folder
print(os.listdir(data_dir))
# Load the dataset
X=np.load(os.path.join(data_dir, "X.npy"))
y=np.load(os.path.join(data_dir, "Y.npy"))
print("The dataset loaded...")
# Helper functions for a first look at the data
# Code adapted from https://www.kaggle.com/serkanpeldek/cnn-practices-on-sign-language-digits
# Decode one-hot labels into class indices
def decode_OneHotEncoding(label):
    label_new=list()
    for target in label:
        label_new.append(np.argmax(target))  # index of the largest element (the 1)
    label=np.array(label_new)
    return label
# The labels in the original dataset are mismatched, so map them to the correct digits
def correct_mismatches(label):
    label_map={0:9, 1:0, 2:7, 3:6, 4:1, 5:8, 6:4, 7:3, 8:2, 9:5}  # original label -> correct digit
    label_new=list()
    for s in label:
        label_new.append(label_map[s])
    label_new=np.array(label_new)
    return label_new
# Show n sample images for each class, one row per sign (0-9)
def show_image_classes(image, label, n=10):
    label=decode_OneHotEncoding(label)
    label=correct_mismatches(label)
    fig, axarr=plt.subplots(nrows=10, ncols=n, figsize=(18, 18))
    axarr=axarr.flatten()
    plt_id=0
    for sign in range(10):
        sign_indexes=np.where(label==sign)[0]
        for i in range(n):
            # Plot the i-th sample of this sign
            image_index=sign_indexes[i]
            axarr[plt_id].imshow(image[image_index], cmap='gray')
            axarr[plt_id].set_xticks([])
            axarr[plt_id].set_yticks([])
            axarr[plt_id].set_title("Sign: {}".format(sign))
            plt_id=plt_id+1
    plt.suptitle("{} Samples for Each Class".format(n))
    plt.show()
number_of_pixels=X.shape[1]*X.shape[2]
number_of_classes=y.shape[1]
print(20*"*", "SUMMARY of the DATASET",20*"*")
print("an image size: {}x{}".format(X.shape[1], X.shape[2]))  # image size in pixels
print("number of pixels:", number_of_pixels)
print("number of classes:", number_of_classes)
y_decoded=correct_mismatches(decode_OneHotEncoding(y.copy()))  # decode the labels and fix the mismatches
sample_per_class=np.unique(y_decoded, return_counts=True)
print("Number of Samples:{}".format(X.shape[0]))
for sign, number_of_sample in zip(sample_per_class[0], sample_per_class[1]):
    print("  {} sign has {} samples.".format(sign, number_of_sample))
print(65*"*")
show_image_classes(image=X, label=y.copy())

A preview of the data set is shown in the figure.

[Figure: preview of the data set]
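The two label helpers above loop over the samples in Python. A vectorized NumPy equivalent is sketched below with hypothetical names (`LABEL_MAP`, `decode_and_correct`): it decodes all one-hot rows with a single `argmax` call and applies the same label correction by array indexing.

```python
import numpy as np

# LABEL_MAP[original_label] gives the corrected digit; same mapping as correct_mismatches()
LABEL_MAP = np.array([9, 0, 7, 6, 1, 8, 4, 3, 2, 5])


def decode_and_correct(one_hot):
    """Decode an (N, 10) one-hot array and map each label to its correct digit."""
    original = np.argmax(one_hot, axis=1)  # index of the 1 in each row
    return LABEL_MAP[original]


# Usage: rows whose 1 sits at positions 0, 1 and 9
y_demo = np.eye(10)[[0, 1, 9]]
print(decode_and_correct(y_demo))  # [9 0 5]
```

Storing the mapping as an array rather than a dict lets NumPy apply it to all 10 classes at once, which keeps the preprocessing fast even after the data set is enlarged by augmentation.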

2. Data enhancement

To make it easier to inspect the generated images and fine-tune the augmentation parameters, this project does not feed the Keras generator directly into training. Instead, it first generates an enhanced data set and then uses it to train the model.

Data enhancement takes two steps: first define an image generator, then generate augmented batches iteratively with the generator's flow() method. The relevant code is as follows:

from keras.preprocessing.image import ImageDataGenerator
X_loaded = X.reshape(X.shape+(1,))
print("shape of X_loaded:",X_loaded.shape)
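# Define the image generator
datagen = ImageDataGenerator(featurewise_center=False,             # do not shift the data set mean to 0 (feature-wise)
                             featurewise_std_normalization=False,  # do not divide inputs by the data set std (feature-wise)
                             rotation_range=20,                    # random rotations of up to 20 degrees
                             width_shift_range=0.2, height_shift_range=0.2,  # random horizontal/vertical shifts of up to 20% of the width/height
                             brightness_range=[0.1, 1.3],          # random brightness factor between 0.1 and 1.3
                             horizontal_flip=False)                # no horizontal mirroring
# Note: depending on the Keras version, the brightness shift may return pixels on a 0-255 scale
# even when X is in [0, 1]; checking batch[0].max() is worthwhile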
# Generate augmented data iteratively
X_added=X_loaded[0]
y_added=y[0]
X_added = X_added.reshape((1,)+X_added.shape)    # add a leading batch axis: (64, 64, 1) -> (1, 64, 64, 1)
print("shape of X_added:",X_added.shape)
i = 0
for batch in datagen.flow(X_loaded, y, batch_size=11, shuffle=True, seed=None):
    if i==0:
        print("shape of X in each batch:",batch[0].shape,"\n","shape of y in each batch:",batch[1].shape)
    X_added=np.vstack((X_added,batch[0]))      # append the augmented images
    y_added=np.vstack((y_added,batch[1]))      # append their labels
    i += 1
    if i%100==0:
        print("process:",i,"/",X_loaded.shape[0])    # report progress
    if i >= X_loaded.shape[0]:  # flow() loops indefinitely, so stop manually; this yields about batch_size times the original data
        break
# Finally append the original data; the total is now about (batch_size+1) times the original
X_added=np.vstack((X_added,X_loaded))
y_added=np.vstack((y_added,y))
print("shape of X_added:",X_added.shape)
print("shape of y_added:",y_added.shape)

The data enhancement process is shown in the figure.

[Figure: data enhancement process]
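The comments in the code describe the result as roughly (batch_size + 1) times the original data. The exact count can be worked out with the sketch below. It assumes that Keras' iterator splits each pass over the data into full batches followed by one smaller batch of leftover samples; `augmented_total` is a hypothetical helper that only does the arithmetic and does not call Keras, and 2062 is an assumed example size (use `X_loaded.shape[0]` for the real value).

```python
def augmented_total(n_samples, batch_size, n_batches):
    """Count the images in X_added after the augmentation loop.

    Assumes each pass over the data yields n_samples // batch_size full batches
    followed by one smaller batch of the leftover samples (if any).
    """
    epoch = [batch_size] * (n_samples // batch_size)
    if n_samples % batch_size:
        epoch.append(n_samples % batch_size)
    # The loop draws n_batches batches, cycling through passes over the data
    generated = sum(epoch[k % len(epoch)] for k in range(n_batches))
    # +1 for the seed image X_loaded[0], + n_samples for the appended originals
    return 1 + generated + n_samples


n, bs = 2062, 11  # assumed example size and the batch_size used above
print(augmented_total(n, bs, n_batches=n))  # 24685
print((bs + 1) * n)                         # 24744, the rough estimate
```

The gap comes from the short final batch in every pass over the data, so the "(batch_size + 1) times" figure is an upper bound rather than an exact count.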

A preview of the enhanced data is shown in the figure.

[Figure: preview of the enhanced data]
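To show what `width_shift_range=0.2` and `height_shift_range=0.2` mean for a 64x64 image, the sketch below translates an image by a random offset of up to 20% of its size (at most 12.8 pixels, rounded to whole pixels) and fills the uncovered border by repeating the nearest edge pixels, which mirrors the default `fill_mode='nearest'` of `ImageDataGenerator`. It is a simplified NumPy illustration with hypothetical functions (`shift_nearest`, `random_shift`), not the Keras implementation, and it ignores rotation and brightness.

```python
import numpy as np


def shift_nearest(img, dy, dx):
    """Translate a 2-D image by (dy, dx) whole pixels, repeating edge pixels into the gap."""
    h, w = img.shape
    # Output pixel (i, j) takes input pixel (i - dy, j - dx), clamped to the border
    rows = np.clip(np.arange(h) - dy, 0, h - 1)
    cols = np.clip(np.arange(w) - dx, 0, w - 1)
    return img[rows[:, None], cols[None, :]]


def random_shift(img, width_range=0.2, height_range=0.2, rng=None):
    """Shift by a random fraction of the image size, like a 0.2 shift range."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape
    dy = int(round(rng.uniform(-height_range, height_range) * h))
    dx = int(round(rng.uniform(-width_range, width_range) * w))
    return shift_nearest(img, dy, dx)


# Usage: shift a 4x4 ramp one pixel to the right; the left column is repeated
img = np.arange(16).reshape(4, 4)
print(shift_nearest(img, 0, 1))
```

Because the hand is roughly centered in each image, shifts of this size move it around the frame without pushing it out, which helps the model tolerate imprecise hand localization at inference time.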

3. Model construction

With the data prepared, the next steps are to define the model structure and to compile the model with a loss function and an optimizer.

1) Define the model structure

The convolutional neural network used here consists of four convolution blocks followed by fully connected layers. Each block contains a 3x3 convolution layer, a max pooling layer that reduces the spatial size of the feature maps, a batch normalization layer, and a dropout layer. Batch normalization stabilizes training and helps mitigate vanishing and exploding gradients, while dropout acts as a regularizer that reduces overfitting.

The relevant code is as follows:

# Model adapted from https://www.kaggle.com/serkanpeldek/cnn-practices-on-sign-language-digits
def build_conv_model_8():
    model = Sequential()
    model.add(layers.Convolution2D(32, (3, 3), activation='relu', input_shape=(64, 64, 1)))
    model.add(layers.MaxPooling2D((2, 2)))       # max pooling layer
    model.add(layers.BatchNormalization())       # batch normalization
    model.add(layers.Dropout(0.25))              # randomly drop 25% of the units during training
    model.add(layers.Convolution2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.BatchNormalization())
    model.add(layers.Dropout(0.25))
    model.add(layers.Convolution2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.BatchNormalization())
    model.add(layers.Dropout(0.25))
    model.add(layers.Convolution2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.BatchNormalization())
    model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(256, activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))
    return model
model=build_conv_model_8()
model.summary()
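Because every convolution above uses 3x3 kernels without padding and every pooling layer halves the size, the feature-map sizes can be traced by hand. The plain-Python sketch below (a hypothetical helper, `trace_conv_model`, not a Keras API) follows the 64x64x1 input through the four blocks and counts parameters the way `model.summary()` reports them. It assumes Keras' default `'valid'` padding and that each BatchNormalization layer holds 4 parameters per channel, of which the moving mean and variance are non-trainable.

```python
def trace_conv_model(size=64, channels=1, filters=(32, 64, 64, 64), dense=256, classes=10):
    """Trace feature-map sizes and parameter counts for the four-block CNN."""
    total = 0
    non_trainable = 0
    for f in filters:
        size = size - 2                        # 3x3 convolution, 'valid' padding
        total += (3 * 3 * channels + 1) * f    # kernel weights + biases
        size = size // 2                       # 2x2 max pooling
        total += 4 * f                         # BatchNormalization: gamma, beta, moving mean/variance
        non_trainable += 2 * f                 # moving mean and variance are not trained
        channels = f
        print(f"after block with {f} filters: {size}x{size}x{f}")
    flat = size * size * channels              # Flatten (Dropout adds no parameters)
    total += (flat + 1) * dense                # Dense(256)
    total += (dense + 1) * classes             # Dense(10)
    return flat, total, non_trainable


flat, total, non_trainable = trace_conv_model()
print("flattened features:", flat)             # 256
print("total params:", total)                  # 161930
print("non-trainable params:", non_trainable)  # 448
```

The trace shows why four blocks is about the limit for 64x64 inputs: the maps shrink 64 -> 31 -> 14 -> 6 -> 2, so a fifth block with a 3x3 valid convolution would no longer fit.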

2) Optimize the loss function

Once the architecture is defined, the model is compiled. This is a multi-class classification problem with one-hot labels, so categorical cross-entropy is used as the loss function. Since all classes carry similar weight, accuracy is used as the performance metric. RMSprop, a commonly used adaptive gradient descent method, is used to optimize the model parameters.
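For one-hot labels, categorical cross-entropy reduces to the negative log of the probability that the model assigns to the true class, averaged over the batch, and accuracy is the fraction of samples whose highest-scoring class matches the label. The NumPy sketch below (hypothetical function names) computes both for a toy batch; Keras also clips predicted probabilities away from 0 and 1 before taking the log, which the `eps` argument imitates.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -sum(y_true * log(y_pred)) over the batch; y_true is one-hot."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


def accuracy(y_true, y_pred):
    """Fraction of samples whose arg-max prediction matches the one-hot label."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))


# Toy batch with 3 classes: the first sample is right, the second is wrong
y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])
print(categorical_crossentropy(y_true, y_pred))  # (-log 0.7 - log 0.3) / 2 ≈ 0.780
print(accuracy(y_true, y_pred))                   # 0.5
```

The loss still rewards raising the true-class probability from 0.3 even though accuracy only changes once that class becomes the arg-max, which is why the loss rather than accuracy is what the optimizer minimizes.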

The relevant code is as follows:

optimizer=optimizers.RMSprop(learning_rate=0.0001)  # optimizer; learning_rate replaces the deprecated lr argument
model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])
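RMSprop keeps a running average of squared gradients and divides each step by its square root, so parameters with consistently large gradients take proportionally smaller steps. The sketch below implements the textbook update rule in NumPy and uses it to minimize a simple quadratic. The defaults `rho=0.9` and `eps=1e-7` follow the documented Keras defaults for `RMSprop`, but `rmsprop_step` is a hypothetical illustration rather than Keras' implementation (which also supports momentum and a centered variant). The toy run uses a larger learning rate than the project's 0.0001 so that it converges in a few hundred steps.

```python
import numpy as np


def rmsprop_step(w, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update; returns the new parameters and the new squared-gradient average."""
    avg_sq = rho * avg_sq + (1 - rho) * grad ** 2      # running mean of squared gradients
    w = w - lr * grad / (np.sqrt(avg_sq) + eps)        # per-parameter scaled step
    return w, avg_sq


# Usage: minimize f(w) = sum(w ** 2), whose gradient is 2 * w
w = np.array([1.0, -2.0])
avg_sq = np.zeros_like(w)
for _ in range(500):
    w, avg_sq = rmsprop_step(w, 2 * w, avg_sq, lr=0.01)
print(w)  # both entries should end up close to 0
```

Because each step is roughly `lr` in size regardless of the raw gradient scale, the learning rate directly controls how far parameters move per update, which is one reason a small value such as 0.0001 is a cautious choice for fine-grained training.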

Related blogs

Intelligent sign language digital real-time translation based on Android+OpenCV+CNN+Keras - deep learning algorithm application (including Python, ipynb engineering source code) + data set (1)

Intelligent sign language digital real-time translation based on Android+OpenCV+CNN+Keras - deep learning algorithm application (including Python, ipynb engineering source code) + data set (3)

Intelligent sign language digital real-time translation based on Android+OpenCV+CNN+Keras - deep learning algorithm application (including Python, ipynb engineering source code) + data set (4)

Intelligent sign language digital real-time translation based on Android+OpenCV+CNN+Keras - deep learning algorithm application (including Python, ipynb engineering source code) + data set (5)

Project source code download

For details, please see my blog resource download page




Origin: blog.csdn.net/qq_31136513/article/details/133076743