Intelligent travel assistance system for visually impaired people integrating MMEdu and Transformers technologies (Shanghai Puyu AI Future Summer Camp final paper)

Abstract

Facing the travel needs of the many visually impaired people in society, this intelligent travel assistance system integrates MMEdu and Transformers technologies to provide real-time road condition analysis and more convenient, safe, and comfortable travel services for the visually impaired. The system uses software technologies such as image classification, object detection, and depth estimation to analyze each frame captured by the camera in real time, detecting the presence of blind roads, the color of traffic lights, obstacles ahead, and so on, and reports the results to the user by voice. Building on existing research into travel aids for the blind, our system innovates in how the technology is applied, combining multiple AI techniques to achieve accurate, real-time analysis. The system will continue to be improved on this basis: we hope to replace the current ".bat" launcher by packaging the code as mobile phone software or as a web front end that calls the phone camera for recognition, making the device more portable, and to speed up the program with lighter-weight models and other means so that it responds more quickly.

Keywords

Intelligent assistance system for visually impaired travel; image classification; object detection; depth estimation; software technology

1. Introduction

With the continuous development of society, more and more attention is being paid to the quality of life of disabled people, especially the visually impaired. As a special group in society, visually impaired people face many difficulties in daily life, above all in travel, because of their loss of visual perception. In the past they usually relied on traditional aids such as guide dogs and white canes to navigate and perceive their surroundings; however, these tools offer only limited help and struggle to meet their growing needs. How to improve the travel experience of visually impaired people, so that they can move about in a safer and more convenient environment, has therefore become an urgent problem for today's society.

1.1 Research background

In recent years, with the rapid development of artificial intelligence, research on using machine learning to solve the travel problems of the visually impaired has attracted growing attention. Intelligent assistance systems built on object detection, image classification, depth estimation, and other technologies can help the visually impaired better perceive the surrounding environment and obstacles, improving their travel efficiency and safety. However, these technologies still have limitations, such as difficulty handling complex scenes and insufficient recognition accuracy. How to further optimize these technologies and improve their practical effect has therefore become a research hotspot in this field.

1. Current status of travel environment for visually impaired people

The travel environment for visually impaired people presents multiple challenges. First, missing or occupied blind roads are an increasingly serious problem (see Figure 1.1), leaving them unable to rely on blind roads to find the correct path. Second, traffic light signals are often impossible for them to distinguish, making it difficult to cross the road. In addition, because they cannot reliably tell whether there are obstacles ahead, they constantly risk colliding with pedestrians, vehicles, and other objects. These difficulties make travel extremely hard and inconvenient for the visually impaired.

Figure 1.1 Blind lanes are occupied/missing

2. History and current status of assistive methods

Mobility for the visually impaired has been an important social issue for decades. Traditionally they have relied on guide dogs, white canes, and other aids to navigate and sense their surroundings. These methods have clear limitations: the obstacles they can detect are few, and they provide neither sufficient real-time information nor adequate safety guarantees. In addition, relying on animals such as guide dogs can involve considerable expense.

In recent years, with continuing technological progress, solutions to the travel problems of the visually impaired have become increasingly high-tech. These solutions still have shortcomings, however, which keep them from being widely used in daily life. For example, some projects use sensors such as photoelectric and infrared obstacle detectors; although these can sense the surrounding environment accurately, they are expensive and hard to popularize [1].

3. Future development trends of assistive methods

With the rapid development of artificial intelligence and machine learning, technologies such as object detection, image classification, and depth estimation are gradually maturing. We can foresee that intelligent assistance systems based on these AI technologies may become powerful assistants for the visually impaired, helping them better integrate into society and enjoy more independence and autonomy.

1.2 Research purpose

1. Provide comprehensive and real-time travel assistance

In the past, visually impaired people could only rely on traditional tools such as canes or guide dogs to judge nearby obstacles and road conditions, but these tools have serious limitations and cannot cope with the complexity of modern urban traffic. We therefore hope to use advanced technology to let visually impaired people easily understand their surroundings in any environment. Specifically, we plan to integrate MMEdu and Transformers technologies and use voice broadcasts to describe the surrounding environment comprehensively, including the types of obstacles present and the state of traffic lights. The system uses a camera and multiple models for real-time detection and analysis, so that users obtain accurate, up-to-date descriptions of their environment. In this way, wherever they are, they can sense changes around them and make appropriate decisions.

2. Increase travel safety and autonomy

Visually impaired people often feel uneasy and anxious because they cannot accurately perceive their environment, so our intelligent assistance system is designed to help them take every step with more confidence. The system helps them perceive their surroundings more accurately and avoid collisions with obstacles such as pedestrians and vehicles, providing greater safety. At the same time, traffic light detection helps them cross roads independently and shed dependence on others, increasing their autonomy while traveling and creating more opportunity and freedom so that they can better enjoy life.

3. Provide practical and economical solutions

We understand that a project must take into account users' actual needs and affordability. We therefore build this project by combining open source models with self-trained models. On one hand, the open source models have been extensively studied and validated, with high accuracy and reliability; on the other hand, the self-trained models cover the relatively niche needs of the project. This not only reduces costs but also broadens the solution's reach, bringing convenience and well-being to more visually impaired people.

1.3 Research value

1. Call on society to pay attention to vulnerable groups

The visually impaired are among society's vulnerable groups, facing many difficulties and challenges in daily life, especially in travel. Because of their visual impairment, they often cannot meet their daily travel needs independently and must rely on the help of others or on special assistive devices. Providing more convenient, safe, and comfortable travel services for the visually impaired is therefore a matter of real importance.

2. Improve the travel experience for visually impaired people

We make the equipment lightweight and install it on everyday electronic products to improve its portability. At the same time, the device uses a camera and multiple models for real-time detection and analysis, depicting the environment and improving convenience for visually impaired people.

All in all, the value of this study lies not only in providing more convenient travel services for the visually impaired, but also in giving them the chance to walk independently, regain confidence, shed dependence, and take up their own lives and pursuits again. It also calls on the whole of society to pay attention to and support vulnerable groups. By strengthening cooperation and joint effort across all sectors of society, we can further promote social inclusion and equality and create a better, more harmonious, and more dignified living environment for every citizen.

2. Project implementation process

2.1 Usage methods and principles

1. How to use

Our functions are implemented mainly through the MMEdu and Transformers libraries: the MMClassification and MMDetection modules of MMEdu handle image classification and object detection respectively, while the pipeline API of the Transformers library handles depth estimation.

2. Method principle

(1) MMEdu

MMEdu is a deep learning development tool for computer vision, used to train AI models.

Table 2.1 Overview of MMEdu's built-in modules [2]

| Module name | Abbreviation | Function |
| --- | --- | --- |
| MMClassification | MMCls | Image classification |
| MMDetection | MMDet | Object detection in images |
| MMGeneration | MMGen | GAN, stylization |
| MMPose | MMPose | Skeleton detection |
| MMEditing | | |
| MMSegmentation | MMSeg | Pixel-level recognition |

① MMClassification

The main function of MMClassification (MMCls) is to classify images. It has built-in common image classification network models, including LeNet, MobileNet, ResNet18, and ResNet50.

Figure 2.1 Principle of image classification

A dataset for the MMClassification module must contain three image folders, training_set, val_set, and test_set, holding the training, validation, and test images respectively, plus three txt files: classes.txt records the dataset's categories, while test.txt and val.txt record the image names of the test set and validation set.

Figure 2.2 Data set file structure[3]
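To make the layout concrete, here is a small sketch that creates the folder skeleton described above. The dataset root and category names are hypothetical, and the per-category subfolders are an assumption based on the common ImageNet-style layout; the authoritative structure is the one shown in Figure 2.2.

```python
# Sketch: create the dataset skeleton expected by MMClassification.
# Root path and category names are hypothetical; per-category subfolders
# follow the common ImageNet-style layout and are an assumption here.
from pathlib import Path

root = Path('dataset/blind_road')
for split in ('training_set', 'val_set', 'test_set'):
    for category in ('blind_road', 'other_road'):
        (root / split / category).mkdir(parents=True, exist_ok=True)
for txt in ('classes.txt', 'val.txt', 'test.txt'):
    (root / txt).touch()  # classes.txt: category list; val/test.txt: image names
```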

② MMDetection

The main function of MMDetection (MMDet) is to output the names of the objects that appear in a picture or video while outlining the rectangular region each object occupies with a box. The SOTA models it supports include FasterRCNN, Yolov3, SSD_Lite, and others.

Figure 2.3 Target detection principle

A dataset for the MMDetection module must contain two folders, annotations and images, storing the annotation information and the image data respectively, each split into train and valid parts; the annotations themselves are two JSON files.

Figure 2.4 Data set file structure[4]
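A matching sketch for the detection dataset, under the same caveats: the root path is illustrative, and the train/valid split is our interpretation of the description above (the authoritative structure is the one shown in Figure 2.4).

```python
# Sketch: create the COCO-style skeleton expected by MMDetection.
# The root path is hypothetical; the two JSON files hold the annotations.
from pathlib import Path

root = Path('dataset/traffic_light')
for split in ('train', 'valid'):
    (root / 'images' / split).mkdir(parents=True, exist_ok=True)
ann_dir = root / 'annotations'
ann_dir.mkdir(parents=True, exist_ok=True)
for name in ('train.json', 'valid.json'):
    (ann_dir / name).touch()  # COCO-format annotation files
```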

(2) The Transformers pipeline

Depth estimation works by identifying the difference in the pixel coordinates of the same world point as it appears in different images; this difference is called disparity (parallax). From the disparity between images, the distance between an object and the shooting point, i.e. its depth, can be calculated.

Figure 2.5 Depth estimation principle diagram 1[5]

Figure 2.6 Depth estimation principle diagram 2[6]
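As a minimal formal statement of the stereo principle sketched in Figures 2.5 and 2.6 (assuming a calibrated binocular setup, which is only one way depth can be estimated):

```latex
Z = \frac{f \, B}{d}
```

where $Z$ is the depth of the point, $f$ the focal length, $B$ the baseline between the two viewpoints, and $d$ the disparity. The dpt-hybrid-midas model used later in this project is monocular and predicts relative depth directly from a single image, so this formula only explains the geometric principle behind the diagrams.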

2.2 Development process

1. Environment setup

We downloaded the MMEdu library following the method provided at https://xedu.readthedocs.io/zh/latest/mmedu/installation.html and set up an environment for program writing and model training.

2. Model selection and training

We first downloaded open source models [6][7] that would have been difficult for us to train ourselves but are widely used and offer high accuracy and reliability; these are used to detect obstacle categories and to estimate the distance from objects to the viewpoint (depth estimation).

We then trained our own models for the relatively niche functional requirements.

(1) Blind road detection (image classification)

First, a large number of dataset images were obtained through Internet searches and on-site photography (Figure 2.7).

Figure 2.7 Blind road detection data set source image

A pre-written program was then used to train the model and obtain the weight file with the best training performance (Figure 2.8).

Figure 2.8 Model training

When training the model, hyperparameters such as lr (learning rate) and batch_size (batch size) must be adjusted appropriately based on training feedback to achieve higher training efficiency and better results.
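As an illustration of this step, here is a minimal training sketch in the style of the MMEdu classification API documented in [2][3]. The backbone choice, paths, class count, and hyperparameter values are all assumptions for illustration, not the project's actual configuration.

```python
# Minimal MMEdu image classification training sketch (assumed values).
from MMEdu import MMClassification as mmcls

model = mmcls(backbone='ResNet18')             # one of MMEdu's built-in models
model.num_classes = 2                          # e.g. blind road / other road
model.load_dataset(path='dataset/blind_road')  # folder laid out as in Figure 2.2
model.save_fold = 'checkpoints/blind_road'     # where weight files are saved
model.train(epochs=50, lr=0.01, batch_size=32, validate=True)
```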

Finally, the trained model is put through an inference test to check its performance in actual use (Figure 2.9).

Figure 2.9 Model inference

If the resulting model does not perform well in inference, it should be retrained.
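A matching inference sketch, under the same caveats: the image path and the checkpoint file name are assumptions.

```python
# Inference sketch using MMEdu's inference API [2]; paths are assumed.
from MMEdu import MMClassification as mmcls

model = mmcls(backbone='ResNet18')
result = model.inference(
    image='test_images/road.jpg',
    checkpoint='checkpoints/blind_road/best_accuracy_top-1.pth')
model.print_result(result)  # prints the predicted category and confidence
```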

(2) Traffic light detection (object detection)

Hundreds of dataset source images were likewise obtained through online searches and on-site photography (Figure 2.10).

Figure 2.10 Traffic light effect classification data set source image

The photos were then annotated and the annotations converted to COCO format (Figure 2.11).

Figure 2.11 Data annotation and conversion

Finally, MMEdu was used to train the traffic light model and run inference. We found that the model can accurately locate traffic lights and identify which light is lit (see Table 2.2).

Table 2.2 Traffic light effect detection results (two columns, original picture and inference result; the example images are omitted here)
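For reference, a minimal detection-training sketch in the style of the MMEdu API [2][4]; the backbone name, paths, class count, and hyperparameters are assumptions, not the project's actual settings.

```python
# Object detection training sketch with MMEdu (assumed values).
from MMEdu import MMDetection as mmdet

model = mmdet(backbone='SSD_Lite')               # a lightweight detector
model.num_classes = 2                            # e.g. red light / green light
model.load_dataset(path='dataset/traffic_light') # COCO-format set (Figure 2.11)
model.save_fold = 'checkpoints/traffic_light'
model.train(epochs=100, lr=0.001, validate=True)
```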

3. Programming

We wrote the program following the approach shown in Figure 2.12; the detailed functions are introduced below.

Figure 2.12 Programming technology roadmap

3. Project innovation points and function display

3.1 Main functions

1. Real-time detection of images

At program startup, OpenCV is used to turn on the camera, so that images are acquired as real-time video and the analysis stays current.

Figure 3.1 Part of the code for opening the camera
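The code in Figure 3.1 is not reproduced here; the following is a minimal OpenCV capture loop of the kind described, with the camera index assumed to be 0.

```python
# Real-time capture sketch with OpenCV (device index 0 assumed).
import cv2

cap = cv2.VideoCapture(0)          # open the default camera
while cap.isOpened():
    ret, frame = cap.read()        # grab one frame per loop iteration
    if not ret:
        break
    # ... run classification / detection / depth estimation on `frame` ...
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```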

2. Road type classification and reminders

First, the road image classification model is imported and the BaseDeploy library is used to determine whether the scene ahead is a blind road. If it is not, the road is uncertain and unreliable, so the user is reminded to "move forward carefully and ensure safety."

Figure 3.2 Display of key codes for road type classification
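A sketch of this step follows. The BaseDeploy call pattern is based on the XEdu documentation as we understand it; the model path, the returned label format, and the convention that class 0 means a blind road are all assumptions.

```python
# Road-type reminder sketch (BaseDeploy usage and label convention assumed).
import BaseDeploy as bd
import pyttsx3

road_model = bd('checkpoints/blind_road.onnx')  # assumed converted model path
engine = pyttsx3.init()                         # text-to-speech engine

def remind_road(frame):
    pred = road_model.inference(frame)          # classify the current frame
    if pred != 0:                               # assumed: class 0 = blind road
        engine.say('Move forward carefully and ensure safety')
        engine.runAndWait()
```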

3. Obstacle information broadcast

First, the ssdlite_mobilenetv2_scratch_600e_coco_20210629_110627-974d9307.pth (coco.onnx) model used to detect item categories is imported, together with the list of item categories the model can detect.

Figure 3.3 Obstacle Monitoring-Target Detection Part Key Code Display 1

Figure 3.4 Item categories that the model can detect (text document)

The program then detects the categories of the obstacles ahead (res) and obtains each obstacle's position in the image (coo).

Figure 3.5 Obstacle Monitoring-Target Detection Part Key Code Display 2
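A sketch of this detection step; the output format of inference (and hence how res and coo are extracted) is an assumption chosen to match the variables described above.

```python
# Obstacle detection sketch (inference output format is an assumption).
import BaseDeploy as bd

det_model = bd('coco.onnx')                       # detection model named above
with open('classes.txt', encoding='utf-8') as f:  # categories from Figure 3.4
    classes = [line.strip() for line in f]

def detect_obstacles(frame):
    detections = det_model.inference(frame)       # assumed: one dict per object
    res = [d['class'] for d in detections]        # hypothetical key: class index
    coo = [d['bbox'] for d in detections]         # hypothetical key: (x1,y1,x2,y2)
    return res, coo
```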

At the same time, import the model dpt-hybrid-midas used for depth estimation to obtain the distance between each pixel in the image and the viewpoint.

Figure 3.6 Obstacle monitoring-depth estimation part of the key code display 1

Using the obstacle coordinates obtained during object detection, the minimum distance between the obstacle region of the image and the viewpoint is found; the obstacle's category and distance are then combined into the environmental message warn_sentence that is broadcast to the user.

Figure 3.7 Obstacle Monitoring-Depth Estimation Part Key Code Display 2
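A sketch of the depth step using the Transformers pipeline and the model named above. One subtlety worth noting: MiDaS-style models output relative depth in which larger values mean nearer, so the closest point inside a box corresponds to the map's maximum, and converting this to metres would need extra calibration.

```python
# Depth estimation sketch with the Transformers pipeline (model as in the text).
from transformers import pipeline
from PIL import Image
import numpy as np

depth_estimator = pipeline('depth-estimation', model='Intel/dpt-hybrid-midas')

def nearest_value(frame_rgb, box):
    # 'depth' is a per-pixel map; larger values mean nearer for MiDaS-style
    # models, so the closest point inside the box is the maximum.
    depth = np.array(depth_estimator(Image.fromarray(frame_rgb))['depth'])
    x1, y1, x2, y2 = (int(v) for v in box)
    return float(depth[y1:y2, x1:x2].max())
    # warn_sentence then combines the obstacle category with this estimate.
```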

4. Traffic light classification and reminders

If the detected categories res include class 9, which represents a traffic light, the light.onnx model is used to classify the traffic light that was seen. If a red light is recognized (label 0 was set to represent a red light during model training), "Red light, please stop" is broadcast; otherwise "Green light, please pass" is broadcast.

Figure 3.8 Display of key codes for traffic light classification 1
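A sketch of the traffic light branch; the label-0-means-red convention comes from the text above, while the BaseDeploy call pattern and return format are assumptions.

```python
# Traffic light reminder sketch (BaseDeploy return format assumed).
import BaseDeploy as bd
import pyttsx3

light_model = bd('light.onnx')   # traffic light classifier named above
engine = pyttsx3.init()
TRAFFIC_LIGHT_ID = 9             # COCO class index for 'traffic light'

def remind_light(frame, res):
    if TRAFFIC_LIGHT_ID in res:
        label = light_model.inference(frame)  # assumed: returns a class index
        if label == 0:                        # label 0 = red, per training setup
            engine.say('Red light, please stop')
        else:
            engine.say('Green light, please pass')
        engine.runAndWait()
        return label
    return None
```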

If a vehicle suddenly appears while a green light is detected, an emergency has occurred: the device first uses the winsound library to sound a buzzer, then broadcasts "Emergency, please give way."

Figure 3.9 Display of key codes for traffic light classification 2
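A sketch of the emergency branch. winsound.Beep(frequency_hz, duration_ms) is the standard Windows call, while the set of COCO vehicle class indices is an illustrative assumption.

```python
# Emergency alarm sketch (Windows-only: winsound is a Windows module).
import winsound
import pyttsx3

engine = pyttsx3.init()
VEHICLE_IDS = {2, 3, 5, 7}   # assumed: car, motorcycle, bus, truck in COCO

def emergency_check(res, green_light):
    if green_light and VEHICLE_IDS & set(res):
        winsound.Beep(1000, 500)              # 1000 Hz buzzer for 500 ms
        engine.say('Emergency, please give way')
        engine.runAndWait()
```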

3.2 Function display

Table 3.1 Equipment function display (screenshots omitted): red light + non-blind road; green light; environmental monitoring + blind road

3.3 Innovation points

1. Explore innovation and use AI technology to reduce the use of hardware

As mentioned in §1.1, most existing obstacle-avoidance assistance systems rely on a large amount of hardware. We instead use artificial intelligence to take over the work originally done by hardware, simplifying the equipment while still achieving the desired effect.

2. Application innovation: Combine known artificial intelligence technology with other libraries to achieve reminder functions

This system must cope with the extremely weak vision of its blind users. It uses the pyttsx3 library for speech synthesis, broadcasting reminders through a speaker, and the winsound library to generate buzzer alarms. Compared with installing hardware such as speech synthesis modules or buzzers, this is simpler and more portable; it also embodies our concept of "using AI technology to reduce hardware usage."

3. Combination innovation: combine multiple technologies to achieve expected results

This system combines object detection, depth estimation, image classification, and other technologies to detect obstacles, blind roads, and other objects, with good detection results.

4. Theme Argument

The visually impaired travel intelligent assistance system provides intelligent travel help for the visually impaired. It aims to call on society to pay attention to disadvantaged groups, help the visually impaired regain their confidence, and build an equal and harmonious social environment. Drawing inspiration and technical support from open source resources, the system uses artificial intelligence software to analyze real-world road conditions in real time and assist the visually impaired.

Image classification, object detection, and depth estimation are the core technologies of this intelligent system. Image classification sorts images into multiple categories after extensive model training; object detection captures target information in the image; depth estimation calculates the distance from each pixel to the camera and presents the result as a map whose color depth encodes distance. With these technologies supporting one another, the system analyzes each frame acquired by the camera in real time, detects the presence of blind roads, the color of traffic lights, obstacles ahead, and so on, and reports them to the user by voice. In practical use, image classification detects the presence of blind roads and helps the user plan a route; object detection determines traffic light states and obstacles ahead so that the user can avoid them in time; and depth estimation reports the distance from objects to the camera, improving the accuracy of obstacle detection. With this system, visually impaired users know the road conditions ahead as they travel, which assists their movement and safeguards their safety.

In actual tests, the system analyzed and reported road conditions accurately and in real time, and helped avert accidents such as traffic collisions. This practical effect not only realizes the system's basic functions, but also helps maintain the harmony and stability of social life and draws the whole of society's attention and support to disadvantaged groups, represented here by the visually impaired.

5. Project results and prospects

5.1 Project results

The visually impaired travel intelligent assistance system that integrates MMEdu and Transformers technologies has basically met expectations.

The intelligent travel assistance system for the visually impaired designed in this paper, which integrates MMEdu and Transformers technologies, is implemented through Thonny and has the following outstanding features:

1. Able to provide comprehensive and real-time travel assistance to the visually impaired;

2. Increase the safety and autonomy of visually impaired people when traveling;

3. Provide practical and economical solutions to solve the travel problems of the visually impaired;

4. Successfully use AI technology to reduce the use of sensors;

5. Successfully combine target detection, image classification, depth estimation and other technologies to achieve expected results and read environmental information;

6. The final program can be launched simply by double-clicking, making interaction friendly.

The main work flow of this project is divided into the following parts:

1. Find the problem

2. Background research

3. Review existing research and relevant literature

4. Design solutions

5. Clarify the technical route and operating logic of the system

6. Search for and collect dataset images

7. Find open source models and training models

8. Integrate, program and experiment

9. Write the research report

5.2 Future prospects

1. Install the system on a mobile phone

Although the existing .bat file meets basic needs, it runs only on Windows, so it cannot serve every user in every setting, and its convenience still needs improvement. We therefore hope to replace the ".bat" launcher by packaging the code as mobile phone software or building it into a web front end, so that users can call the phone camera for recognition.

This change would not only make the device easier to carry but also improve the user experience: with phone software, users could identify objects anytime and anywhere, no longer limited to fixed places and devices.

2. Speed up program operation

We also hope that more lightweight models and other means can speed up the program and make it more responsive, bringing users a more flexible and efficient environmental monitoring experience so that they understand their surroundings more quickly.

6. References

  1. Ruan Xiaoyang, Song Fangzhou. An intelligent travel assistance device for the blind based on Arm technology [J]. Electronic Technology and Software Engineering, 2016(18): 138.
  2. MMEdu basic functionality. OpenXLabEdu documentation.
  3. Unlocking the image classification module: MMClassification. OpenXLabEdu documentation.
  4. Demystifying the object detection module: MMDetection. OpenXLabEdu documentation.
  5. A review of monocular depth estimation methods. Zhihu.
  6. Intel/dpt-hybrid-midas. Hugging Face.
  7. SSD configs, MMDetection repository. https://github.com/open-mmlab/mmdetection/tree/main/configs/ssd
