Intelligent video surveillance system based on Intel® AI Analytics Toolkits

[oneAPI DevSummit & OpenVINODevCon Joint Hackathon]
Jump link: https://marketing.csdn.net/p/d2322260c8d99ae24795f727e70e4d3d

Table of contents

1 Project background

2 Project description

3 Requirements analysis

4 Technical feasibility analysis

5 Detailed design

5.1 Data collection

5.2 Video decoding and frame extraction

5.3 Face detection

5.4 Behavior recognition

5.5 Data analysis

5.6 Result display

6 Solution advantages and applicable scenarios

6.1 Problems solved

6.2 Applicable scenarios

7 Summary

Tools and components used in the solution


1 Project background

Demand for video surveillance spans all walks of life, but the shortcomings of traditional video surveillance are becoming increasingly apparent.

The main shortcomings are as follows. First, manual monitoring is inefficient. Traditional systems rely on people to operate them and watch the feeds; operators who watch surveillance images continuously tire easily and miss important events or behaviors. Second, real-time performance is poor. Recordings must be played back and analyzed by hand, so emergencies cannot be detected and handled as they happen, and the response lags. Third, data processing and management are difficult. The large volume of video these systems generate has to be stored and managed, usually on hard drives, which have limited capacity and a high risk of data loss, and finding the footage of a specific event is relatively hard. Finally, because traditional systems depend mainly on human judgment and operation, they produce false positives and false negatives: operators make mistakes through fatigue, visual limitations, or errors of judgment.

Intelligent video surveillance systems emerged to overcome these shortcomings in monitoring efficiency, real-time performance, data processing and management, and false or missed alarms.

By combining technologies such as artificial intelligence and image recognition, an intelligent video surveillance system offers efficient real-time monitoring and response together with intelligent data processing and management, meeting today's needs for security and management. It also analyzes footage automatically: where traditional monitoring requires manual observation and analysis, an intelligent system can identify abnormal behavior more quickly and accurately.

2 Project description

This solution uses components and libraries from the Intel® oneAPI AI Analytics Toolkit, combined with deep learning and video analysis technology, to build an intelligent video surveillance system that monitors and analyzes personnel activity in real time and provides monitoring, identification, and alarm functions. It can also support big data analysis, remote access, and remote management.

3 Requirements analysis

Functional requirements: The system needs to capture video streams in real time and preprocess them, including denoising and resolution reduction. It must also decode the video and extract key frames for face detection and behavior recognition. Finally, it needs to display the analysis results, including labeled faces and behaviors, and raise alarms in real time.

Performance requirements: The system needs to process and analyze a large amount of video data in real-time scenarios, so it needs to have efficient algorithms and hardware support, and ensure that the processing speed and response time meet the requirements.

Reliability requirements: The system needs to have stable and reliable operation capabilities, including the ability to handle abnormal situations, such as power outage recovery and network fault handling.

Security requirements: The system needs to ensure the security and privacy protection of video data and prevent unauthorized access and tampering.

User experience requirements: The system needs to have a good user interface and operating experience to ensure that users can easily use and understand the system's functions and result display.

4 Technical feasibility analysis

1. Data collection and preprocessing: Use camera equipment to collect real-time video streams, and use image processing libraries (such as OpenCV) to preprocess the video streams. These technologies are mature and have high feasibility.

2. Video decoding and frame extraction: Use the Intel® oneAPI acceleration tools to decode the video and extract key frames for subsequent face detection and behavior recognition. These capabilities are supported in the Intel® Distribution of OpenVINO™ Toolkit and are highly feasible.

3. Face detection: Use the face detection model in the Intel® Distribution of OpenVINO™ Toolkit to detect faces in real time in each key frame. The toolkit includes trained and optimized models, so feasibility is high.

4. Behavior recognition: Use the behavior recognition model in the Intel® Distribution of OpenVINO™ Toolkit to analyze human activity in the monitored area. A deep learning framework loads and runs the model, which performs behavior recognition on key frames of the video stream. These technologies are mature and have high feasibility.

5. Result display: Use an image processing library (such as OpenCV) to annotate the results of face detection and behavior recognition on the original video frame, and display the results in real time or save them as alarms.
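
The five stages above can be pictured as a chain of small functions, each consuming the output of the previous one. The following is a minimal sketch under stated assumptions, not the project's actual code: the stage names (`preprocess`, `select_key_frames`, `detect_faces`, `recognize_behavior`, `render`) and their stub return values are hypothetical, and frames are represented by plain strings so that the skeleton runs without a camera or a model.

```python
from typing import Iterable, Iterator

# Hypothetical stand-ins for the real stages; in an actual system each one
# would wrap OpenCV or OpenVINO calls.
def preprocess(frame: str) -> str:
    return frame + "|preprocessed"

def select_key_frames(frames: Iterable[str], interval: int) -> Iterator[str]:
    # Keep one frame out of every `interval` frames.
    for index, frame in enumerate(frames):
        if index % interval == 0:
            yield frame

def detect_faces(frame: str) -> list:
    return []  # stub: no faces found

def recognize_behavior(frame: str) -> str:
    return "unknown"  # stub label

def render(frame: str, faces: list, behavior: str) -> dict:
    return {"frame": frame, "faces": faces, "behavior": behavior}

def run_pipeline(frames: Iterable[str], interval: int = 5) -> list:
    """Run every stage in order and collect one result per key frame."""
    results = []
    preprocessed = (preprocess(f) for f in frames)
    for key_frame in select_key_frames(preprocessed, interval):
        faces = detect_faces(key_frame)
        behavior = recognize_behavior(key_frame)
        results.append(render(key_frame, faces, behavior))
    return results

results = run_pipeline(["frame{}".format(i) for i in range(12)], interval=5)
print(len(results))  # one result for each of frames 0, 5 and 10
```

Keeping each stage behind its own function makes it possible to replace a stub with a real implementation, or to test one stage in isolation, without touching the others.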

5 Detailed design

5.1 Data collection

Capture the live video stream with a camera device.

Use an appropriate image processing library (such as OpenCV) to preprocess the video stream, such as denoising, reducing resolution, etc.

Denoising processing: During the real-time video stream collection process, various interferences may occur, such as noise from the camera itself, light changes, etc., so denoising processing is required to improve image quality.

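Gaussian blur denoising with the OpenCV library:

import cv2

# Read the original frame
frame = cv2.imread("original_frame.jpg")

# Denoise with a 5x5 Gaussian blur (a sigma of 0 lets OpenCV derive it from the kernel size)
denoised_frame = cv2.GaussianBlur(frame, (5, 5), 0)

# Display the denoised result
cv2.imshow("Denoised Frame", denoised_frame)
cv2.waitKey(0)
cv2.destroyAllWindows()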
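
To make the smoothing step less of a black box, the sketch below computes the weights of a one-dimensional Gaussian kernel, the building block that a Gaussian blur applies along each image axis. It is an illustration only: the helper name `gaussian_kernel_1d` and the value `sigma=1.0` are assumptions chosen for the example, whereas the call above passes 0 and lets OpenCV choose sigma from the kernel size.

```python
import math

def gaussian_kernel_1d(size: int, sigma: float) -> list:
    """Return `size` Gaussian weights centered on the middle tap, normalized to sum to 1."""
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd number")
    center = size // 2
    weights = [math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2)) for i in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

# sigma=1.0 is an example value, not the one OpenCV derives for a 5x5 kernel.
kernel = gaussian_kernel_1d(5, sigma=1.0)
print([round(w, 4) for w in kernel])
```

The center tap receives the largest weight and the weights fall off symmetrically, which is why an isolated noisy pixel is averaged into its neighbors while larger structures in the frame survive.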



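Resolution reduction: Lowering the resolution of the video stream reduces the data volume and computational complexity, and speeds up the subsequent face detection and behavior recognition.

Image scaling with the OpenCV library:

import cv2

# Read the original frame
frame = cv2.imread("original_frame.jpg")

# Halve the resolution in both dimensions
scaled_frame = cv2.resize(frame, (0, 0), fx=0.5, fy=0.5)

# Display the reduced-resolution result
cv2.imshow("Scaled Frame", scaled_frame)
cv2.waitKey(0)
cv2.destroyAllWindows()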

Through the above denoising and resolution reduction processing, subsequent steps such as video decoding, face detection, and behavior recognition can be made more efficient and accurate.
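
The saving from resolution reduction is easy to quantify. The following small sketch, with hypothetical helper names and a 1920x1080 input chosen only as an example, shows that scaling both axes by 0.5 leaves a quarter of the original pixels for the later stages to process.

```python
def scaled_size(width: int, height: int, fx: float, fy: float) -> tuple:
    """Approximate output size of a resize by the scale factors fx and fy."""
    return round(width * fx), round(height * fy)

def pixel_ratio(fx: float, fy: float) -> float:
    """Fraction of the original pixel count that remains after scaling."""
    return fx * fy

# 1920x1080 is an example resolution, not one taken from the project.
new_w, new_h = scaled_size(1920, 1080, 0.5, 0.5)
print(new_w, new_h)            # 960 540
print(pixel_ratio(0.5, 0.5))   # 0.25
```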

5.2 Video decoding and frame extraction

Video decoding and frame extraction is a very important step in the intelligent video surveillance system. It involves extracting key frames from video data to provide data support for subsequent face detection and behavior recognition.

Video decoding: Video decoding is the process of decoding the compressed data in the video file into the original video frame data for subsequent processing and analysis. In this step, you can use the corresponding libraries and tools provided in Intel® oneAPI acceleration tool for video decoding. We use Media SDK for hardware-accelerated video decoding.

Frame extraction: Generally, it is not necessary to perform face detection and behavior recognition on every frame of the video, because video data usually contains a large amount of redundant information. Therefore, during the frame extraction process, we can choose to extract key frames in the video, and then perform subsequent processing and analysis on these key frames.

We open a video file and use the OpenCV library for video decoding and frame extraction. By setting the extraction interval, we can control the frequency of keyframe extraction. When the extraction interval is reached, we save the current frame as a keyframe image file for subsequent face detection and behavior recognition.

import cv2

# Open the video file
video_capture = cv2.VideoCapture('input_video.mp4')

# Frame counter
frame_count = 0

# Extraction interval, e.g. keep one frame out of every 5
extract_interval = 5

# Read the video frame by frame
while video_capture.isOpened():
    # Read one frame
    ret, frame = video_capture.read()
    if not ret:
        break

    # When the extraction interval is reached, save the current frame as a key frame
    if frame_count % extract_interval == 0:
        key_frame_name = 'keyframe_{}.jpg'.format(frame_count)
        cv2.imwrite(key_frame_name, frame)
        print('Saved key frame: {}'.format(key_frame_name))

    frame_count += 1

video_capture.release()
cv2.destroyAllWindows()

In this way, the video decoding and frame extraction steps are completed, and we obtain a series of key frame images that can be used for subsequent face detection and behavior recognition.
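
A fixed interval of 5 frames means different things at different frame rates, so it can help to derive the interval from a target spacing in seconds. The sketch below is one possible way to do that; the helper names `interval_from_seconds` and `key_frame_indices` and the 25 fps figure are assumptions made for the example.

```python
def interval_from_seconds(fps: float, seconds: float) -> int:
    """Number of frames between key frames for a target spacing in seconds."""
    return max(1, round(fps * seconds))

def key_frame_indices(total_frames: int, interval: int) -> list:
    """Indices the extraction loop would keep: 0, interval, 2 * interval, ..."""
    return list(range(0, total_frames, interval))

# Example: a 25 fps stream sampled every 0.2 seconds.
interval = interval_from_seconds(fps=25.0, seconds=0.2)
print(interval)                         # 5
print(key_frame_indices(12, interval))  # [0, 5, 10]
```

In the extraction loop, the frame rate could be read from the capture object with `video_capture.get(cv2.CAP_PROP_FPS)`, although some sources report 0 for this property, which is why the helper never returns an interval below 1.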

5.3 Face detection

Detect faces in real-time at every keyframe using the face detection model in the Intel® OpenVINO™ Toolkit.

Use the following code example for face detection:

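import cv2
from openvino.inference_engine import IECore

# Load the model
ie = IECore()
net = ie.read_network(model='face_detection.xml', weights='face_detection.bin')
exec_net = ie.load_network(network=net, device_name='CPU')

# Look up the model's actual input and output names
input_name = next(iter(net.input_info))
output_name = next(iter(net.outputs))

# Read a key frame
frame = cv2.imread("keyframe.jpg")
height, width = frame.shape[:2]

# Preprocess: resize to the model input size (300x300 assumed here) in NCHW layout
input_blob = cv2.dnn.blobFromImage(frame, size=(300, 300), ddepth=cv2.CV_8U)

# Run inference
result = exec_net.infer(inputs={input_name: input_blob})

# Parse the results. An SSD-style output of shape [1, 1, N, 7] is assumed, where each row is
# [image_id, label, confidence, x_min, y_min, x_max, y_max] with normalized coordinates
for detection in result[output_name][0][0]:
    confidence = detection[2]
    if confidence > 0.5:
        x_min, y_min = int(detection[3] * width), int(detection[4] * height)
        x_max, y_max = int(detection[5] * width), int(detection[6] * height)
        cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)

# Display the result
cv2.imshow("Face Detection", frame)
cv2.waitKey(0)
cv2.destroyAllWindows()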
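
The parsing step above is the part most likely to go wrong, so it is worth isolating. The sketch below shows one way to turn raw detections into drawable boxes. It assumes an SSD-style output in which each row is [image_id, label, confidence, x_min, y_min, x_max, y_max] with coordinates normalized to the range 0 to 1; that layout is an assumption about the model, and the helper name `to_pixel_boxes` and the sample rows are hypothetical.

```python
def to_pixel_boxes(detections, width, height, threshold=0.5):
    """Convert normalized detection rows to integer pixel boxes above a confidence threshold."""
    boxes = []
    for _image_id, _label, confidence, x_min, y_min, x_max, y_max in detections:
        if confidence <= threshold:
            continue
        # Clamp to [0, 1] first so that boxes never fall outside the frame.
        x_min, y_min, x_max, y_max = (
            min(max(v, 0.0), 1.0) for v in (x_min, y_min, x_max, y_max)
        )
        boxes.append((int(x_min * width), int(y_min * height),
                      int(x_max * width), int(y_max * height)))
    return boxes

# Hypothetical detections: one confident face and one weak candidate.
sample = [
    (0, 1, 0.9, 0.25, 0.5, 0.75, 1.0),
    (0, 1, 0.3, 0.1, 0.1, 0.2, 0.2),
]
print(to_pixel_boxes(sample, width=640, height=480))  # [(160, 240, 480, 480)]
```

Each returned tuple can be passed to `cv2.rectangle` as two corner points, which requires integer pixel coordinates.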

5.4 Behavior recognition

First load the already trained behavior recognition model (trained through TensorFlow). Then the key frames are preprocessed and input into the model for inference to obtain behavioral prediction results. Finally, the predicted behavioral results are annotated on the image, and the annotated image is displayed or saved for display to monitoring personnel or for further processing and analysis.

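The following is a simple example of behavior recognition with TensorFlow:

import cv2
import tensorflow as tf

# Load the behavior recognition model
model = tf.keras.models.load_model('behavior_model.h5')

# Read a key frame
frame = cv2.imread("keyframe.jpg")

# Preprocess the key frame (resize, normalize, add a batch dimension, etc.).
# preprocess_image is a user-supplied helper that must match the model's input format.
processed_frame = preprocess_image(frame)

# Run inference with the model
predictions = model.predict(processed_frame)

# Derive the final behavior label from the model output.
# get_predicted_behavior is a user-supplied helper.
predicted_behavior = get_predicted_behavior(predictions)

# Annotate the image with the recognized behavior
cv2.putText(frame, predicted_behavior, (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

# Display the image annotated with the behavior recognition result
cv2.imshow("Behavior Recognition", frame)
cv2.waitKey(0)
cv2.destroyAllWindows()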
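
The example above calls a helper named get_predicted_behavior without defining it. One plausible shape for that helper is sketched below, assuming the model outputs one probability per behavior class for a batch containing a single frame. The label list, the `min_confidence` cutoff, and the "uncertain" fallback are all hypothetical and would need to match how the real model was trained.

```python
# Hypothetical label set; the real one depends on how the model was trained.
BEHAVIOR_LABELS = ["walking", "running", "fighting", "falling"]

def get_predicted_behavior(predictions, labels=BEHAVIOR_LABELS, min_confidence=0.6):
    """Pick the most probable label from a batch holding one score vector."""
    scores = list(predictions[0])
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    # Fall back to a neutral label when the model is not confident enough.
    if scores[best_index] < min_confidence:
        return "uncertain"
    return labels[best_index]

print(get_predicted_behavior([[0.05, 0.10, 0.80, 0.05]]))  # fighting
print(get_predicted_behavior([[0.30, 0.30, 0.20, 0.20]]))  # uncertain
```

Returning a plain string keeps the result directly usable as the text argument of `cv2.putText`.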

5.5 Data analysis

Leverage the distributed computing power of Intel® DevCloud and Intel® oneAPI to process and analyze large-scale video data in parallel, which is critical at this stage.

Use a distributed computing framework such as Apache Spark to shard the data and process the shards in parallel on multiple processors, increasing processing speed and efficiency.

We create a Spark application with SparkSession and read the large-scale video data through Spark. Operations such as zipWithIndex and map shard the video data, mapPartitions parallelizes the processing tasks, and reduce merges the per-shard results before the final analysis results are displayed or saved. The functions process_video_data, merge_results, and show_or_save_results must be written for the specific business scenario and data characteristics to implement the actual analysis logic and result display.

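from pyspark.sql import SparkSession

# Create the SparkSession
spark = SparkSession.builder.appName("VideoDataAnalysis").getOrCreate()

# Read the large-scale video data. "video" is a placeholder format name: Spark has no
# built-in video data source, so substitute a reader available in your environment.
video_data = spark.read.format("video").load("hdfs://path_to_video_data")

# Number of shards (example value)
num_partitions = 8

# Shard the video data: key each record by index modulo num_partitions, then repartition by key
video_data_rdd = (video_data.rdd.zipWithIndex()
                  .map(lambda x: (x[1] % num_partitions, x[0]))
                  .partitionBy(num_partitions))

# Process the shards in parallel on multiple processors (process_video_data is user-supplied)
result_rdd = video_data_rdd.mapPartitions(process_video_data)

# Merge the per-shard results (merge_results is user-supplied)
final_result = result_rdd.reduce(merge_results)

# Display or save the analysis results (show_or_save_results is user-supplied)
show_or_save_results(final_result)

# Stop the SparkSession
spark.stop()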
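
To see what the user-supplied functions might look like, the shard, process, and merge pattern can be tried locally without a cluster. The sketch below is a plain-Python stand-in rather than Spark code: it uses the same sharding rule, with a hypothetical process_video_data that counts behavior labels per shard and a hypothetical merge_results that adds the counts. The label data is invented for the example.

```python
from collections import Counter
from functools import reduce

def process_video_data(records):
    """Per-shard work: count how often each behavior label occurs."""
    counts = Counter(label for _key, label in records)
    yield counts

def merge_results(left, right):
    """Combine the counts of two shards."""
    return left + right

# Hypothetical per-frame labels standing in for real analysis output.
labels = ["walking", "walking", "running", "walking", "fighting", "running"]
num_partitions = 2

# Same sharding rule as the Spark code: key = index % num_partitions.
shards = [[] for _ in range(num_partitions)]
for index, label in enumerate(labels):
    key = index % num_partitions
    shards[key].append((key, label))

# Local stand-in for mapPartitions followed by reduce.
partial_results = [result for shard in shards for result in process_video_data(shard)]
final_result = reduce(merge_results, partial_results)
print(dict(final_result))
```

Because process_video_data takes an iterator of records and yields its results, the same function shape fits what mapPartitions expects; the merge function must be associative and commutative for reduce to give the same answer regardless of shard order.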

5.6 Result display

Use an image processing library (such as OpenCV) to annotate the results of face detection and behavior recognition on the original video frames, and display the results in real time or save them as alarm records.

Real-time result display: The results of face detection and behavior recognition can be displayed through real-time video streaming. For example, video images marked with face frames and behavior categories are displayed in real time on the monitoring screen of the monitoring center. This can help monitoring personnel detect abnormal situations in time and take appropriate measures.

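Display real-time face detection results using the OpenCV library:

import cv2
from openvino.inference_engine import IECore

# Load the model
ie = IECore()
net = ie.read_network(model='face_detection.xml', weights='face_detection.bin')
exec_net = ie.load_network(network=net, device_name='CPU')

# Look up the model's actual input and output names
input_name = next(iter(net.input_info))
output_name = next(iter(net.outputs))

# Open the video stream (camera 0)
video_capture = cv2.VideoCapture(0)

while True:
    # Read the video frame by frame
    ret, frame = video_capture.read()
    if not ret:
        break
    height, width = frame.shape[:2]

    # Preprocess (300x300 model input size assumed)
    input_blob = cv2.dnn.blobFromImage(frame, size=(300, 300), ddepth=cv2.CV_8U)

    # Run inference
    result = exec_net.infer(inputs={input_name: input_blob})

    # Parse the results (SSD-style [1, 1, N, 7] output with normalized coordinates assumed)
    for detection in result[output_name][0][0]:
        confidence = detection[2]
        if confidence > 0.5:
            x_min, y_min = int(detection[3] * width), int(detection[4] * height)
            x_max, y_max = int(detection[5] * width), int(detection[6] * height)
            cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)

    # Display the result
    cv2.imshow("Real-time Face Detection", frame)

    # Press 'q' to exit the loop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

video_capture.release()
cv2.destroyAllWindows()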

We capture the video stream from the camera in real time, run face detection on every frame, and display the results live.
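
Whether a display loop like this keeps up with the camera can be checked by measuring the achieved frame rate. The following is a minimal sketch of such a measurement; the class name `FpsMeter` and the window size are assumptions, and the timestamps are simulated so that the example runs without a camera.

```python
from collections import deque

class FpsMeter:
    """Estimate frames per second over a sliding window of recent frame timestamps."""

    def __init__(self, window: int = 30):
        self.timestamps = deque(maxlen=window)

    def tick(self, now: float) -> float:
        """Record a frame shown at time `now` (in seconds) and return the current estimate."""
        self.timestamps.append(now)
        if len(self.timestamps) < 2:
            return 0.0
        elapsed = self.timestamps[-1] - self.timestamps[0]
        return (len(self.timestamps) - 1) / elapsed if elapsed > 0 else 0.0

meter = FpsMeter(window=5)
fps = 0.0
for i in range(10):
    fps = meter.tick(now=i * 0.5)  # simulated: one frame every 0.5 s
print(fps)  # 2.0
```

In the real loop, the call would be `meter.tick(time.perf_counter())` once per displayed frame; a value well below the camera's frame rate suggests that inference should run on key frames only or on a smaller input.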

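Saving alarm records: When an abnormal situation is detected, the key frames annotated with the face detection and behavior recognition results are also saved as alarm records for later review and analysis. This is done by saving the result frame as an image file.

Saving a key frame annotated with face detection results:

import cv2

# Read a key frame
frame = cv2.imread("keyframe.jpg")

# Annotate the key frame with the face detection results
# ...

# Save the annotated key frame as an alarm record image file
cv2.imwrite("alarm_record.jpg", frame)
print('Saved alarm record: alarm_record.jpg')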
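
A single fixed file name means each new alarm overwrites the previous one. One way around this, sketched below, is to build the name from the time and type of the event. The helper `alarm_record_name`, the `camera_id` parameter, and the naming scheme are assumptions made for illustration, not part of the original design.

```python
from datetime import datetime

def alarm_record_name(event: str, when: datetime, camera_id: str = "cam0") -> str:
    """Build a unique, sortable file name for an alarm record image."""
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return "alarm_{}_{}_{}.jpg".format(camera_id, stamp, event)

name = alarm_record_name("fighting", datetime(2023, 12, 1, 14, 30, 5))
print(name)  # alarm_cam0_20231201_143005_fighting.jpg
```

The result can be passed to `cv2.imwrite` in place of the fixed name, with `datetime.now()` supplying the timestamp; putting the timestamp before the event type keeps a directory listing in chronological order.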

Through the above methods, we achieve real-time display of face detection and behavior recognition results and save alarm records of abnormal situations. This can effectively improve the intelligence level and work efficiency of the monitoring system.

This completes the outline of how the solution is implemented.

6 Solution advantages and applicable scenarios

6.1 Problems solved

1. Security: It can monitor activities in the surveillance area in real time, identify abnormal behaviors (such as theft, fights, etc.), and issue alarms in a timely manner, thereby improving security and reducing the possibility of criminal incidents.

2. Accident prevention: By conducting real-time analysis of activities in the monitored area, the system can identify potential safety risks and dangerous behaviors, and take timely preventive measures to reduce the occurrence of accidents.

3. Personnel management: The system can help managers monitor and track the activities of staff and customers, and assist managers to better allocate resources and plan work processes.

4. Data analysis: The system can collect a large amount of video data and use data analysis technology to extract useful information, such as customer flow statistics, behavioral trend analysis, etc., to provide reference for business decisions.

5. Remote monitoring: Users can remotely access the monitoring screen through the network to realize remote monitoring and management of the monitoring area, improving management efficiency and convenience.

6.2 Applicable scenarios

1. Stores and supermarkets: Used for theft prevention and management supervision.

2. Public transportation hubs: Used to monitor public places such as stations and airports to ensure the safety and order of passengers.

3. Factories and warehouses: Used to monitor production lines and storage areas to improve safety and production efficiency.

4. Schools and campuses: Used for student safety and management, and for monitoring campus activities.

5. Intelligent traffic management: Use video surveillance systems to monitor traffic flow and detect violations to improve the efficiency of road traffic management.

6. Environmental monitoring: Combined with image recognition and monitoring technology, it can be used in fields such as environmental monitoring and natural disaster early warning.

7. Healthcare: Used to monitor patients and the elderly in hospitals and nursing homes to ensure their safety and health.

7 Summary

In the future, as deep learning technology and hardware accelerators continue to develop, intelligent monitoring systems will achieve higher accuracy and faster processing, bringing more possibilities to the security field. Intelligent video surveillance systems based on Intel® AI Analytics Toolkits are expected to make significant progress in the field of intelligent surveillance. However, implementing this solution also faces challenges, including data annotation for model training, large computing power requirements, and real-time constraints. Hardware, software, and data all need to be considered together to achieve stable operation and efficient processing.

Tools and components used in the solution

1. Intel® Distribution of OpenVINO™ Toolkit

2. Intel® DevCloud

3. Intel® oneAPI acceleration tool


Origin blog.csdn.net/chenchenchencl/article/details/134741263