Deep learning-based face recognition and emotion classification in Python

Resource download address: https://download.csdn.net/download/sheziqiong/88358853

1 Project introduction

1.1 Background

Vision allows humans to perceive and understand the surrounding world; roughly 70% of the activity in the human cerebral cortex is devoted to processing visual information. Computer vision uses electronic means to perceive and understand images, aiming to match or even surpass human visual intelligence.

Since the discipline was founded in 1966 (MIT's Summer Vision Project), computer vision has faced, and still faces, many difficult and unexplored problems in perceptual and cognitive intelligence. Nevertheless, it has benefited from the maturing of deep learning: in 2012, the AlexNet model, built on a deep learning architecture, won the ImageNet competition by roughly 10 percentage points over the runner-up. Image classification technology centered on perceptual intelligence has gradually realized commercial value in industry, driving intelligent upgrades in finance, security, the Internet, mobile phones, healthcare, industrial manufacturing, and other fields.

Beginning in the second half of 2016, computer vision technologies such as face recognition and video structuring crossed the threshold of industrialization in real-world security scenarios, marking the prelude to a large-scale explosion of the computer vision market. The market size of China's computer vision industry is expected to grow significantly over the next three years.

As the accuracy of classification and segmentation algorithms such as face recognition and object recognition improves, innovation areas beyond the fields that dominated in 2017 (security, video advertising, pan-finance, mobile phones, and Internet entertainment), such as medical imaging, industrial manufacturing, and wholesale and retail, will gradually be unlocked and become an important support for the rapid development of the industry as a whole.

Currently, there are many computer vision technology companies on the market, as shown in the figure below:

  • Overall, SenseTime's technology has the widest coverage and Yitu's the narrowest; however, the core technologies (face recognition, text recognition, image recognition, vehicle recognition, and pedestrian monitoring) are offered by all three companies.

  • In more detail, only SenseTime has image and video editing technology and its own deep learning framework; Megvii's face technology is the most complete of the three, and it also has human body recognition technology and can recognize gestures; Yitu, for its part, has target tracking technology.

  • SenseTime, Megvii, and Yitu all focus on the finance and security fields. In finance, Megvii and SenseTime have a deeper and broader layout; in security, Yitu and Megvii work closely with public security agencies, and Megvii is also deeply involved in real estate security.

  • SenseTime works closely with mobile phone manufacturers and operators. Going forward, it plans to use smartphones to popularize its facial recognition technology, become the largest technology provider, build user reputation, and establish a brand advantage for opening up the consumer (C-side) market.

In summary, the computer vision giants all focus on the business chains of large companies; for small and medium-sized enterprises and ordinary users, face recognition and emotion classification remain underserved.

1.2 Project significance

Precisely because the giants have invested little in emotion classification for ordinary users and small and medium-sized enterprises, I decided to start the Emotion Community project.

On the one hand, the video surveillance traffic of small and medium-sized enterprises and shopping malls is huge, and manual processing is inefficient and consumes manpower, material, and financial resources. On the other hand, the large volume of videos from ordinary users can be classified and processed, stimulating their desire to share; a community built around this sharing can offer commenting and sharing features and make the experience fun.

To sum up, the significance of our project can be explained from the following perspectives:

  • For small and medium-sized enterprises, shopping malls, and public places, there is an urgent need for face recognition and emotion classification;

  • For developers, there is strong demand for a cheap, well-suited emotion classification API;

  • For ordinary users, emotion recognition and sharing videos (before and after recognition) are a lot of fun, with many possible usage scenarios.

1.3 Project goals

For the "Emotion Community" project, we hope to eventually build an online platform whose services are open in multiple forms, including but not limited to mobile applications, desktop software, a website, official accounts, and WeChat Mini Programs. Diverse and comprehensive access channels, designed for convenient and fast service, are also a highlight of this product.

The final deliverables are a B/S (browser/server) web application and a WeChat Mini Program for facial emotion recognition, both with community forum functions. The recognition and classification model is deployed on the server and can also be offered to small and medium-sized enterprises as an API.

According to the initial plan, the effects we hope to achieve (available services) mainly include:

  • For small and medium-sized enterprises, shopping malls, and public places, we provide a CNN model API and technical support.

  • For developers, we provide an API that processes videos and images and returns the results in JSON form (see the sketch after this list).

  • For ordinary users, we provide the web and WeChat Mini Program versions of the app. Users can browse the app on a phone or PC, upload videos, receive emotion recognition results, share them in the app's BBS community, and post comments and threads.
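
As a hedged illustration of the developer-facing endpoint (not the project's actual code), a minimal Django view might look like the sketch below; the `emotion_api` view and the `analyze_video` helper are hypothetical names introduced here for illustration.

```python
# Minimal sketch of a JSON API endpoint in Django; names are hypothetical.
import tempfile

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST


def analyze_video(path):
    """Placeholder for the CNN pipeline; returns per-frame emotion labels."""
    return [{"frame": 0, "emotion": "happy", "confidence": 0.91}]


@csrf_exempt  # sketch only; a real deployment would use proper auth instead
@require_POST
def emotion_api(request):
    upload = request.FILES.get("video")
    if upload is None:
        return JsonResponse({"error": "no video uploaded"}, status=400)
    # Persist the upload to a temporary file, run the model, return JSON.
    with tempfile.NamedTemporaryFile(suffix=".mp4") as tmp:
        for chunk in upload.chunks():
            tmp.write(chunk)
        tmp.flush()
        results = analyze_video(tmp.name)
    return JsonResponse({"results": results})
```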

2 Functional requirements description

2.1 System roles

  • Administrator: reviews video information, user qualifications, and forum data

  • User: registers and logs in, uploads videos, shares in the forum community, comments, posts, manages forum sections, and modifies personal information

2.2 Obtaining and maintaining user information

2.2.1 Function description

During initial registration, the system requires users to fill in relevant information, which is recorded in the system.

2.2.2 Stimulus/response sequence

  • Stimulus: User registration or first login

    Response: Provide information filling requirements

  • Stimulus: User fills in and submits the information

    Response: Record the user’s age and other information and provide a selection list of corresponding forum sections

2.3 Video analysis

2.3.1 Function description

Users can upload videos and get feedback results.

2.3.2 Stimulus/response sequence

  • Stimulus: User chooses to upload video function

    Response: Enter the upload video page

  • Stimulus: The user presses the upload button to upload the video

    Response: Alert the user and generate results

  • Stimulus: The user presses another button

    Response: Enter other pages

2.4 Community Forum

2.4.1 Function description

Users can browse the forum community, or share videos, post, and comment.

2.4.2 Stimulus/response sequence

  • Stimulus: User enters community forum

    Response: List related sections and posts in a certain order

  • Stimulus: User keyword search

    Response: Provide search results containing the keyword in the section or post

  • Stimulus: users write posts or comments

    Response: The system asks whether to submit, then updates the page and related data

2.5 Use case diagram

2.6 ER diagram

2.7 System flow chart

2.8 System interaction diagram

2.9 Concept class diagram and data flow diagram

2.10 Data flow diagram

3 Overall design

3.1 Overall goals

3.1.1 Needs met

The core needs met are: users can upload videos through the platform to obtain the system's analysis results, and can also share, comment, and post in the forum community.

3.1.2 Technical basis and operating environment

  • Web client operating system: Windows 95 and above

  • WeChat Mini Program: WeChat 6.6.7 and above

  • Database: MySQL

  • Web framework: Django

  • System implementation language: Python

  • Deep learning platform: Keras, TensorFlow

3.2 Overall structure of the system

For users, the system provides the following functions, grouped into a user management module, a user video emotion analysis module, and a user forum community module. Administrators mainly participate in managing and maintaining the system and reviewing relevant information. Following these criteria, the whole system is divided into the modules shown in the figure below.

3.3 Description of each functional module

3.3.1 User management module

After a user registers and logs in, the system records the relevant information, and the user can modify their personal information at any time. Every use of the video and community features updates the user's data in the user management module. The administrator also retains the permission to delete a user's information if the user violates laws or regulations.

3.3.2 User video emotion analysis module

  • User video upload: After the user clicks the upload button, the selected video is opened from the file system and uploaded. At the same time, the video's format, size, resolution, and duration are checked; for videos that do not meet the requirements, a prompt pops up.

  • Algorithm model analysis: The trained CNN model is called to cut the uploaded video into frame-by-frame images. For each frame, the face is first cropped out, and the model then classifies the facial expression. The per-frame results are plotted as a sequence chart, written into JavaScript code, and embedded in the HTML page (see the sketch after this list).

  • User result feedback: The HTML page with the embedded result chart is sent back to the user, who thus receives the video analysis results.
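
To make the analysis step concrete, here is a minimal sketch assuming OpenCV for frame extraction and face detection and a Keras model for classification; the Haar cascade, the model file name `emotion_cnn.h5`, and the label order are illustrative assumptions, not the project's actual artifacts.

```python
# Per-frame face detection and emotion classification; names are assumptions.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

EMOTIONS = ["angry", "disgusted", "fear", "happy", "sad", "surprised", "neutral"]

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
emotion_model = load_model("emotion_cnn.h5")  # hypothetical trained model


def analyze(video_path):
    """Return one emotion label per frame that contains a detectable face."""
    capture = cv2.VideoCapture(video_path)
    labels = []
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_detector.detectMultiScale(gray, 1.3, 5)
        for (x, y, w, h) in faces[:1]:  # classify the first detected face
            roi = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
            roi = roi.astype("float32") / 255.0
            probs = emotion_model.predict(roi[None, ..., None], verbose=0)[0]
            labels.append(EMOTIONS[int(np.argmax(probs))])
    capture.release()
    return labels
```

The per-frame labels returned here are what the sequence chart described above would be drawn from.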

3.3.3 User community module

  • Section management: Users can propose new sections, or report or delete existing ones, by filling in the system's application form; a section is published after backend review.

  • Post publishing: Users can apply to publish posts in the corresponding section by filling in the form in the system's format and submitting it for administrator review.

  • Posts and comments: Users can select "Reply" under other people's posts to follow up or comment. A forward button at the bottom of each post supports forwarding.

4 Data design

User table

Video table

Post information table

Section information table
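
Since the schemas above are given only as figures, here is a hedged sketch of how the tables might map to Django models; all field names are illustrative assumptions, and the user table reuses Django's built-in auth `User`.

```python
# Illustrative Django models for the tables above; field names are assumptions.
from django.contrib.auth.models import User
from django.db import models


class Video(models.Model):
    owner = models.ForeignKey(User, on_delete=models.CASCADE)
    file = models.FileField(upload_to="videos/")
    result_json = models.TextField(blank=True)  # per-frame emotion labels
    uploaded_at = models.DateTimeField(auto_now_add=True)


class Section(models.Model):
    name = models.CharField(max_length=64, unique=True)
    approved = models.BooleanField(default=False)  # backend review flag


class Post(models.Model):
    section = models.ForeignKey(Section, on_delete=models.CASCADE)
    author = models.ForeignKey(User, on_delete=models.CASCADE)
    title = models.CharField(max_length=128)
    body = models.TextField()
    created_at = models.DateTimeField(auto_now_add=True)
```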

5 Algorithms

The core algorithm is a CNN-based deep learning model; see the source code for the other algorithms.

The basic model of this project is designed with reference to the open-source Xception framework. We propose a general convolutional neural network building framework for designing real-time CNNs. We validate our models by creating a real-time vision system that uses the proposed CNN architecture to accomplish face detection, gender classification, and emotion classification simultaneously in one blended step. After presenting the details of the training setup, we proceed to evaluation on standard benchmarks, reporting 96% accuracy on the IMDB gender dataset and 66% accuracy on the FER-2013 emotion dataset.

In addition, we introduce a real-time guided backpropagation visualization technique. Guided backpropagation reveals the dynamics of weight changes and lets us evaluate the learned features. We believe that careful implementation of modern CNN architectures, the use of current regularization methods, and the visualization of previously hidden features are necessary to reduce the gap between slow-but-accurate models and real-time architectures. Our system has been validated by deployment on a Care-O-bot 3 robot used during the RoboCup@Home competition. All of our code, demos, and pre-trained architectures are released under an open-source license in our public repository.

We propose two models, which we evaluate by their test accuracy and number of parameters. The design goal of both models is the best accuracy-to-parameter-count ratio. Reducing the number of parameters helps us overcome two important problems:

  • First, small CNNs alleviate slow performance on hardware-constrained systems such as robotic platforms.

  • Second, the reduction of parameters provides better generalization, in the spirit of Occam's razor.

Our first model relies on the elimination of fully connected layers. The second architecture combines the deletion of fully connected layers with depthwise separable convolutions and residual modules. Both architectures are trained with the ADAM optimizer.

Following previous architectural patterns, our initial architecture uses global average pooling to completely remove fully connected layers: the last convolutional layer has as many feature maps as there are classes, and a softmax activation function is applied to each spatially averaged feature map.

Our initially proposed architecture is a standard fully convolutional neural network consisting of 9 convolutional layers with ReLU activations, batch normalization, and global average pooling. The model contains approximately 600,000 parameters. It was trained on the IMDB gender dataset, which contains 460,723 RGB images each labeled "woman" or "man", and achieved 96% accuracy on this dataset. We also validated the model on the FER-2013 dataset, which contains 35,887 grayscale images each belonging to one of the categories {"angry", "disgusted", "fear", "happy", "sad", "surprised", "neutral"}.

Our initial model achieved 66% accuracy on this dataset; we refer to it as the "sequential fully convolutional CNN".
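
As a hedged illustration of this design, the following Keras sketch builds a fully convolutional classifier in the spirit described above; the filter counts are assumptions for illustration, not the trained model's actual configuration.

```python
# Sketch of a fully convolutional classifier: 9 conv layers, batch norm,
# global average pooling, softmax. Filter counts are illustrative.
from tensorflow.keras import layers, models


def sequential_fcn(input_shape=(48, 48, 1), num_classes=7):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    for filters in (16, 32, 32, 64, 64, 128, 128, 256):
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.BatchNormalization())
    # The 9th conv layer emits one feature map per class; global average
    # pooling plus softmax then replaces any fully connected layers.
    model.add(layers.Conv2D(num_classes, 3, padding="same"))
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Activation("softmax"))
    return model
```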

Our second model is inspired by the Xception architecture, which combines residual modules with depthwise separable convolutions. The residual module modifies the desired mapping between two subsequent layers so that the learned features become the difference between the desired mapping and the original feature map. That is, instead of learning the desired features H(x) directly, the network solves the easier problem of learning the residual F(x), such that:

H(x) = F(x) + x
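
Assuming Keras (the platform named in Section 3.1.2), a minimal sketch of one such residual module with depthwise separable convolutions might look like this; the strided 1x1 shortcut and filter handling follow the common mini-Xception pattern rather than the project's exact configuration.

```python
# One Xception-style residual module: the separable-conv branch learns the
# residual F(x), which is added to a projected shortcut, giving H(x) = F(x) + x.
from tensorflow.keras import layers


def residual_module(x, filters):
    # Shortcut: 1x1 strided conv so shapes match the downsampled branch.
    shortcut = layers.Conv2D(filters, 1, strides=2, padding="same")(x)
    shortcut = layers.BatchNormalization()(shortcut)

    y = layers.SeparableConv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.SeparableConv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.MaxPooling2D(3, strides=2, padding="same")(y)

    return layers.Add()([y, shortcut])
```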

The Xception-based model we designed is as follows:

The heat map of our prediction results is shown below:

6 Human-computer interaction design

6.1 Web login interface

The user first lands on the login interface. A user who has not yet registered is redirected to the registration interface; a registered ordinary user proceeds to the main interface, and an administrator to the backend management interface.

6.2 Web registration interface

The registration interface will check the input and prompt a pop-up window.

6.3 Main interface

The main interface offers links to every other page. It also displays recent and popular posts; clicking a post's link jumps to the post's HTML page. A site-wide search box on the page supports searching related posts and keywords.

6.4 Popular posts interface

Displays the posts with the most likes; clicking one jumps to its specific page.

6.5 Posting interface

6.6 Emotion classification interface

First is the file upload button. After selecting a file, the user clicks Upload to send it to the server. The server then runs the algorithm model to detect the face and classify the emotion in each frame, synthesizes a new video file, and displays it on this page (a sketch of this step follows below). Pop-up prompts are supported; an empty submission cannot pass and triggers a prompt.
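
Here is a minimal sketch of that server-side rendering step, assuming OpenCV; `detect_and_label` is a hypothetical stand-in for the face detection and emotion classification pipeline from Section 5.

```python
# Annotate each frame with boxes and labels, then write a new video file.
import cv2


def render_annotated(video_in, video_out, detect_and_label):
    capture = cv2.VideoCapture(video_in)
    fps = capture.get(cv2.CAP_PROP_FPS) or 25.0
    width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(video_out, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (width, height))
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        # detect_and_label yields ((x, y, w, h), emotion_label) pairs.
        for (x, y, w, h), label in detect_and_label(frame):
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame, label, (x, y - 8),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
        writer.write(frame)
    capture.release()
    writer.release()
```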

Show analysis results

6.7 Q&A interface

6.8 Contact us interface

6.9 Database interface

7 Summary

I completed this project by myself. The idea for the topic came from a paper on CNNs that I read over the winter vacation, which made me want to build an emotion classification system with a forum community this semester.

In general, I followed the steps in the software engineering textbook and produced each stage and each document one by one: several slide decks and seven or eight documents. Although it was hard work, it also deepened my understanding of software engineering.

First, my understanding of software engineering: in my mind, software engineering is about using engineering management techniques to build software. Why was software engineering born? As the amount of code grows, people's ability to control it weakens, and the logic, schedule, and cost of a codebase become increasingly difficult to manage, resulting in a software crisis. Software engineering came into being to solve the software crisis.

On the coding side, what software engineering taught me lies in the details:

  • First, comply with coding conventions. The benefit is that code is easy to modify and maintain, and others can read it clearly.

  • Second, separate data from business logic. Modularizing the code makes it easier to maintain and reuse.

  • Third, know how to design interfaces reasonably. An interface should be neither exhaustive nor too general, just detailed enough to be useful. For example, the interface of a linked list differs from that of a menu: a linked list needs interfaces for insertion, deletion, and modification, while a menu's interface need not be so fine-grained.

  • Fourth, mind non-functional requirements such as security. Thread safety deserves special mention here: how to use locking mechanisms to write safe code.

  • Fifth, design thinking: apply the design patterns summarized by our predecessors, such as the factory, observer, and adapter patterns. Applying these patterns can greatly increase code extensibility, tolerate change better, and improve reuse. To broaden our horizons, the teacher also mentioned functional programming and formal methods.

So, overall, I find completing a software engineering project independently at the university level very interesting and challenging, and it should be very helpful for future employment.

Resource download address: https://download.csdn.net/download/sheziqiong/88358853
