Real-time wearable object detection

This article presents a software architecture for real-time object detection using machine learning (ML) in an augmented reality (AR) environment, covering ONNX conversion and deployment based on YOLOv8 for real-time wearable object detection.

Our approach uses the recent state-of-the-art YOLOv8 network running on a Microsoft HoloLens 2 head-mounted display (HMD). The main motivation behind this research is to apply advanced ML models to enhance perception and situational awareness through a wearable, hands-free AR platform. We describe the image processing pipeline of the YOLOv8 model and the techniques needed to achieve real-time performance on the headset's resource-constrained edge computing platform. Experimental results show that our solution achieves real-time processing without offloading the task to the cloud or any other external server, while maintaining satisfactory accuracy in terms of the standard mAP metrics and qualitative measurements.

Augmented reality (AR) technology, which falls under the category of immersive technologies, offers the ability to blend digital artifacts with the physical environment by overlaying digital content in the user's field of view (FoV) (pictured below).

Currently, popular AR applications running on mobile devices such as smartphones or tablets can be further enhanced with machine learning (ML). Thanks to this approach, we can include vision-based object detection and tracking features operating on video and image data. However, mobile AR solutions have significant limitations, such as a relatively small FoV limited by the screen canvas, and the need for manual controls. The latter narrows down the potential scenarios where AR can be successfully deployed, such as manual assembly, equipment repair tasks, or AR-based assistance for the elderly. In these scenarios, users must be able not only to move their hands freely, but also to rapidly change their unconstrained FoV or body posture, which is critical for safety and task completion.

Wearable smart head-mounted displays (HMDs) are an alternative technology that circumvents these caveats. AR headsets, such as the widely considered state-of-the-art Microsoft HoloLens 2 (HL2), offer hands-free AR experiences. Unfortunately, HL2 and other similar headsets do not provide satisfactory support for ML-based processing, which could otherwise enhance the user's ability to interact with the environment. Therefore, running real-time ML models onboard the headset's edge computing platform is crucial for developing new AR application domains.

We address real-time object detection on HL2 using the latest YOLOv8 framework. We focus on defining the necessary steps to achieve the image processing frame rates required for onboard ML models while identifying the constraints of the HL2 computing platform. Overcoming these limitations enables the use of widely available ML algorithms on headsets. We also believe that AR developers can leverage our work on YOLOv8 for HL2 to create new applications that extend the current use cases for this headset.

The overall idea is shown in the figure below. We first prepare the YOLOv8 neural network models for HL2; these can be selectively retrained (fine-tuned) to include different detection classes. The next step involves exporting each model to the Open Neural Network Exchange (ONNX) format. The Barracuda library in the Unity engine then uses this model to perform object detection on HL2 and provides a visualization of the detected objects. We decided to use the Unity platform because it is one of the most widely used software frameworks in AR and VR (virtual reality) research.
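For illustration, the first steps of this pipeline map to only a few lines of the Ultralytics Python API. The sketch below is a minimal, hedged example: it assumes the public `ultralytics` package and the pretrained YOLOv8 nano weights, and pins the ONNX opset to 9 for Barracuda (explained in the model preparation section below).

```python
# Minimal sketch of the model preparation steps, assuming the public
# `ultralytics` package and the pretrained YOLOv8 nano weights.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # serialized PyTorch weights (.pt)

# Barracuda supports ONNX opsets only up to version 9,
# so the opset is pinned at export time.
onnx_path = model.export(format="onnx", opset=9)
print("Exported:", onnx_path)
```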


Barracuda library for ML inference

The neural network part of the detection pipeline is based on the Barracuda library, an open-source library developed by Unity for running neural networks inside the game engine. It supports the most common deep learning layers and provides both GPU and CPU inference engines. Models are loaded as pre-trained networks in the ONNX format, which defines a standard set of deep learning operations and thus ensures interoperability between different ML frameworks.
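Barracuda itself is consumed from C# inside Unity, but an exported file can be sanity-checked before import. The hedged Python sketch below, assuming the `onnx` package and a placeholder file name, prints the model's opset and its distinct operators so unsupported layers can be spotted early.

```python
# Sanity-check an exported model against Barracuda's ONNX support
# (opsets up to version 9). "yolov8n.onnx" is a placeholder path.
import onnx

model = onnx.load("yolov8n.onnx")
onnx.checker.check_model(model)  # structural validity

# Print the opset(s) the model was exported with.
for imp in model.opset_import:
    print(f"domain={imp.domain or 'ai.onnx'} opset={imp.version}")

# List the distinct operators so unsupported layers stand out.
print(sorted({node.op_type for node in model.graph.node}))
```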

Model preparation

Every model used for onboard inference can be prepared using the same pipeline. We export each model from its serialized PyTorch .pt file to the ONNX format. Since the current Barracuda version supports ONNX operation sets (opsets) only up to version 9, it is critical to export the model with the correct opset flag. In addition to exporting, models can also be reduced using the ONNX simplification tool. This operation merges redundant operators via constant folding, which speeds up inference. We successfully tested exporting and deploying the publicly available original YOLOv8 object detection model. Furthermore, we can train YOLOv8 for any custom class given sufficient data, following the guidelines for model fine-tuning on custom datasets.
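The sketch below strings these steps together: optional fine-tuning, export with a pinned opset, and simplification. It assumes the `ultralytics` and `onnxsim` packages; `custom.yaml` is a hypothetical dataset configuration and the hyperparameters are illustrative, not the values used in our experiments.

```python
# Sketch of fine-tuning, export, and simplification. "custom.yaml" is
# a hypothetical dataset config; hyperparameters are illustrative.
import onnx
from onnxsim import simplify
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.train(data="custom.yaml", epochs=50, imgsz=640)  # optional fine-tuning

# Export with the opset pinned to 9 for Barracuda compatibility.
path = model.export(format="onnx", opset=9)

# Constant folding with the ONNX simplification tool merges redundant
# operators, which speeds up inference.
simplified, ok = simplify(onnx.load(path))
assert ok, "simplified model failed the consistency check"
onnx.save(simplified, "yolov8n-sim.onnx")
```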

Object detection pipeline on HL2

We propose a generic pipeline for onboard real-time YOLO object detection on HL2, as shown in the figure above.
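Before visualization, the raw network output has to be decoded into labeled boxes. As a language-neutral reference, the NumPy sketch below assumes the public YOLOv8 detection head layout, a (1, 4 + C, N) tensor holding box centres/sizes and per-class scores, and applies greedy, class-agnostic non-maximum suppression. On HL2 the equivalent logic runs in C# against Barracuda tensors.

```python
# Decode the YOLOv8 ONNX output into boxes, classes, and confidences.
# Assumed layout: (1, 4 + C, N); rows 0-3 are box centre x/y and
# width/height, the remaining C rows are per-class scores.
import numpy as np

def decode(output: np.ndarray, conf_thres: float = 0.25,
           iou_thres: float = 0.45):
    preds = output[0].T                     # (N, 4 + C)
    boxes, scores = preds[:, :4], preds[:, 4:]
    cls = scores.argmax(axis=1)
    conf = scores.max(axis=1)
    keep = conf >= conf_thres               # drop low-confidence candidates
    boxes, cls, conf = boxes[keep], cls[keep], conf[keep]

    # Convert (cx, cy, w, h) to corner coordinates (x1, y1, x2, y2).
    xyxy = np.empty_like(boxes)
    xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2
    xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2
    xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2
    xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2

    # Greedy class-agnostic non-maximum suppression, best box first.
    order = conf.argsort()[::-1]
    kept = []
    while order.size:
        i, rest = order[0], order[1:]
        kept.append(i)
        xx1 = np.maximum(xyxy[i, 0], xyxy[rest, 0])
        yy1 = np.maximum(xyxy[i, 1], xyxy[rest, 1])
        xx2 = np.minimum(xyxy[i, 2], xyxy[rest, 2])
        yy2 = np.minimum(xyxy[i, 3], xyxy[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (xyxy[i, 2] - xyxy[i, 0]) * (xyxy[i, 3] - xyxy[i, 1])
        area_r = (xyxy[rest, 2] - xyxy[rest, 0]) * (xyxy[rest, 3] - xyxy[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = rest[iou < iou_thres]       # keep boxes that overlap little
    return xyxy[kept], cls[kept], conf[kept]
```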

Experiment

The performance of different detection model configurations is shown in the figure below. mAP@50 measures accuracy at a fixed IoU threshold of 0.5, while mAP@50-95 averages performance over IoU thresholds from 0.5 to 0.95. The obtained results show that detection performance drops significantly when smaller models are used.
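For reference, figures like these can be reproduced with the Ultralytics validation API, as in the hedged sketch below; `custom.yaml` is again a placeholder dataset configuration, and the exact evaluation settings in our experiments may differ.

```python
# Reproduce mAP@50 and mAP@50-95 for several model sizes.
# "custom.yaml" is a placeholder dataset configuration.
from ultralytics import YOLO

for weights in ("yolov8n.pt", "yolov8s.pt", "yolov8m.pt"):
    metrics = YOLO(weights).val(data="custom.yaml")
    print(weights,
          f"mAP@50={metrics.box.map50:.3f}",
          f"mAP@50-95={metrics.box.map:.3f}")
```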

Besides the size of the network, another option is to reduce the size of the input image, since it directly affects the inference time. The figure above shows the results we obtained for different input image sizes, including the performance of the proposed real-time YOLOv8n with an input image size of 160×160.
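This trade-off can be measured offline before deployment. The sketch below uses `onnxruntime` on a desktop as a stand-in for the Barracuda engine on HL2; absolute timings will differ from the headset, but the trend across input sizes is what matters.

```python
# Time inference at several input resolutions.
# Desktop onnxruntime stands in for Barracuda on HL2.
import time
import numpy as np
import onnxruntime as ort
from ultralytics import YOLO

for size in (640, 320, 160):
    path = YOLO("yolov8n.pt").export(format="onnx", opset=9, imgsz=size)
    sess = ort.InferenceSession(path)
    name = sess.get_inputs()[0].name
    x = np.random.rand(1, 3, size, size).astype(np.float32)
    sess.run(None, {name: x})  # warm-up run
    t0 = time.perf_counter()
    for _ in range(20):
        sess.run(None, {name: x})
    ms = (time.perf_counter() - t0) / 20 * 1000
    print(f"{size}x{size}: {ms:.1f} ms per frame")
```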
