Development of an assisted-driving APP based on an Android mobile phone

Directory Structure:

  1. Project introduction

  2. Network design

  3. Data collection

  4. APP development

  5. APP download

  6. Demo

 

1. Project introduction:

The main purpose of this project is to explore whether it is possible to implement an APP with a driving assistance function on a general-purpose Android phone.

Function and performance:

  • It can detect regular traffic participants such as pedestrians, bicycles, electric bikes, cars, trucks, and buses.

  • It can issue a reminder when there are many pedestrians or other traffic participants nearby.

  • It can issue a reminder when the distance to the vehicle ahead is too small.

  • Ability to detect lane lines (to be added in the next version)

  • A reminder can be issued when departing from the lane line (to be added in the next version)

  • Can run in real time on mobile phones (>=10fps)

 

2. Network design

Detecting vehicles, pedestrians, etc. requires a conventional object detection network (without estimating object orientation, since a phone does not have the compute for that), and detecting lane lines requires a lane line detection network. Running the two networks serially would inevitably increase latency and prevent smooth operation on a mobile device. Therefore the two networks are merged here: the backbone and neck are shared, while the object detection head and the lane line detection head remain independent of each other. As shown below:

 
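In addition to the figure, a minimal PyTorch-style sketch of the merged design may help (my own illustration, not the project's actual code; backbone, neck and the two heads stand for arbitrary sub-modules):

import torch.nn as nn

# Hypothetical sketch of the merged network: one shared backbone + neck,
# with independent object detection and lane line detection heads.
class SharedMultiTaskNet(nn.Module):
    def __init__(self, backbone, neck, det_head, lane_head):
        super().__init__()
        self.backbone = backbone    # shared feature extractor
        self.neck = neck            # shared feature fusion (FPN/PAN style)
        self.det_head = det_head    # object detection head
        self.lane_head = lane_head  # lane line detection head

    def forward(self, x):
        feats = self.neck(self.backbone(x))   # computed once per frame
        return self.det_head(feats), self.lane_head(feats)
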

Object detection network:

There are actually many lightweight object detection networks, such as mobilenetSSD, yolo-lite, yolo-fastest, nanodet, yolov5s, etc. Weighing computation cost against accuracy, yolov5s is a good choice. Roughly speaking, on top of yolov3, yolov5 adds modules that reduce computation, such as the Focus module and CSP modules, and on the other hand uses PAN, SPP, etc. to improve accuracy. This project therefore chooses yolov5s for object detection.

Although yolov5s is already a relatively lightweight general-purpose detection framework, real-time detection is still a big challenge for devices like mobile phones. Running the official yolov5s model on a phone takes about 800 ms to 1.5 s per image, which is far from real time. Therefore yolov5s needs to be simplified. The following figure shows the network structure of yolov5s (input 640*640).

The following briefly introduces the network structure. It consists mainly of backbone + neck + head, shown as the three gray modules in the figure; the details of each sub-module are shown in the beige blocks. CBH is the basic building block of yolov5, consisting of a convolution, a BN layer and a Swish activation. Focus is a fast downsampling method: a traditional CNN usually uses a large 7*7 or 5*5 convolution to quickly reduce the resolution of the input image, but a large kernel means a large amount of computation. Focus instead slices the input image into four sub-images, concatenates them along the channel dimension, and then fuses the information with a single convolution, which greatly reduces the computation in the early stage of the network. Looking at the neck part, an extra branch has clearly been added. The original FPN upsamples the small-resolution feature maps and fuses them into each level for prediction; PAN adds a path that downsamples from large resolution back to small resolution and further fuses information between levels, so that each level retains both spatial and semantic information.
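To illustrate the slicing idea, here is a minimal PyTorch sketch of a Focus-style block (my own simplified illustration; channel counts are arbitrary and the official yolov5 implementation differs in details):

import torch
import torch.nn as nn

class Focus(nn.Module):
    def __init__(self, in_ch=3, out_ch=32):
        super().__init__()
        # after slicing, the channel count is 4x the input
        self.conv = nn.Conv2d(in_ch * 4, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()   # Swish activation, as in the CBH block

    def forward(self, x):
        # take every other pixel in height and width -> 4 sub-images at half resolution
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.act(self.bn(self.conv(x)))

# e.g. a 320*320 RGB image becomes a 160*160 feature map:
# y = Focus()(torch.randn(1, 3, 320, 320))   # -> shape (1, 32, 160, 160)
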

 

Computation cost of yolov5s

Next, analyze the computation of yolov5s. The red numbers in the figure mark each module's share of the computation: the backbone accounts for about 72% and the neck for about 27%. The total is about 7.59 GFLOPs, which is still huge for a mobile phone; to run smoothly on a phone, it is best to get the computation below 1 GFLOPs. Given the current cost, one option is to reduce the resolution of the input image. The computational cost of a CNN scales with the input resolution as O(H*W), where H is the height and W the width of the input image, so reducing the resolution cuts the computation quadratically. For example, if the input resolution is halved, the cost becomes O(1/2 H * 1/2 W) = 1/4 O(H*W); that is, halving the resolution reduces the computation to 1/4 of the original (typical lightweight classification networks also rely on small inputs). Since assisted driving only needs to judge objects near the vehicle and does not need to recognize small targets far away, a 320*320 input is chosen, which brings the computation down to roughly 7.59G / 4 ≈ 1.9 GFLOPs.
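A quick sanity check of this scaling, using the 7.59G figure quoted above:

# FLOPs scale with H*W, so halving each input side reduces computation to 1/4
flops_640 = 7.59                          # GFLOPs at 640*640 (figure above)
flops_320 = flops_640 * (320 / 640) ** 2  # ~1.9 GFLOPs at 320*320
print(flops_320)
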

 

Trimming yolov5s

Continuing with the network structure: because the neck uses PAN, an extra data path is added, which roughly doubles the computation of the neck. One option is therefore to change PAN back to FPN to reduce computation. As shown below:

Another option is to reduce the number of output levels. By default yolov5s outputs feature maps at three levels, corresponding to strides 8, 16 and 32. Since the input resolution has already been reduced, and in order to reach real-time operation, the stride-8 level can also be deleted, which further reduces the computation. As shown below:

Comparing the two schemes, removing the PAN path saves more computation, but judging by the final training accuracy (mAP), the accuracy drop from removing the PAN path is also more noticeable, while removing the stride-8 level gives a better balance. The final simplified network structure is therefore:

 
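For reference, a small sketch of the detection output grids that remain after this trimming (assuming a 320*320 input and the usual yolov5 stride convention):

input_size = 320
for stride in (16, 32):              # stride-8 level has been removed
    cells = input_size // stride
    print('stride %d -> %d x %d grid' % (stride, cells, cells))
# stride 16 -> 20 x 20 grid
# stride 32 -> 10 x 10 grid
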

Lane line detection network:

There are not many lightweight lane detection networks, because most lane line detection is formulated as a segmentation task, and one characteristic of segmentation is the need to maintain high resolution, so computation and memory are both a real challenge. A relatively novel lane detection algorithm is used here: "Ultra Fast Structure-aware Deep Lane Detection". Its accuracy is average and the official network only supports up to 4 lane lines, but its distinguishing feature is speed. The motivation of the paper is to solve the large latency of traditional segmentation-based lane detection. The paper is interpreted below (paper address: https://arxiv.org/abs/2004.11757).

 

core method

Lane line detection is redefined as finding, on certain predefined rows of the image, the positions that belong to each lane line, i.e. row-based position selection and classification (row-based classification). This sentence is the core of the whole paper, as illustrated in the figure below.

In segmentation-based lane detection, the classification object is each pixel: every pixel is classified into C+1 categories (C is the preset number of lane lines and +1 is the background class; the background class is needed because CE loss is used, and may not be needed with another loss). In this paper the classification object is a row: the number of categories is w+1 (w is the number of cells the image width is divided into, i.e. the number of grid cells, and +1 means there is no lane line on this row), which indicates where on that row the lane line lies. The classification meanings of the two schemes are therefore different: the segmentation task performs H*W classifications of dimension C+1, while this paper performs C*h classifications of dimension w+1. Thanks to this formulation, it is very convenient to add prior constraints such as smoothness and rigidity between adjacent rows of a lane line, which is hard to do in the original segmentation formulation.

 

How the speed problem is solved

Since the solution is row-wise selection, assuming selections are made on h rows, only h classification problems per lane need to be handled, each of dimension w+1. This reduces the original H*W per-pixel classification problems to a small number of row classifications, and because the rows on which predictions are made can be chosen manually, h can be set as needed and is generally much smaller than the image height H. (That is how the author explains it in the blog. My own reading is that although there are fewer classification problems, each one has a larger dimension, growing from C+1 to w+1: the original H*W classifications of dimension C+1 become C*h classifications of dimension w+1, so the total output size goes from H*W*(C+1) to C*h*(w+1). The speedup therefore mainly comes from h << H and w << W, not simply from the reduced number of classification problems.)
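A rough count of the output dimensions illustrates the point (the numbers below are illustrative values in the spirit of the paper's typical settings, not necessarily this project's configuration):

H, W = 288, 800          # input image resolution (illustrative)
C, h, w = 4, 18, 100     # number of lanes, row anchors, gridding cells (illustrative)

seg_outputs = H * W * (C + 1)    # per-pixel classification outputs: 1,152,000
row_outputs = C * h * (w + 1)    # row-based classification outputs: 7,272
print(seg_outputs, row_outputs)
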

 

How the receptive field problem is handled

Segmentation-based methods struggle with complex lane lines because the local receptive field is small. Since this method is not a fully convolutional segmentation but an ordinary classification based on fully connected layers, the features it uses are global features, which directly solves the receptive-field problem: when predicting the position of a lane line on a row, the receptive field is the whole image, so good results can be achieved without complex message-passing mechanisms. (This also brings a large number of parameters, because h and w are not reduced as much as in a classification network, so the final fully connected layer ends up with a large parameter count.)
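To see why the final fully connected layer is parameter-heavy, here is a rough estimate under assumed sizes (illustrative only, not this project's exact configuration):

feat_dim = 1800                      # assumed flattened global feature dimension
C, h, w = 4, 18, 100                 # lanes, row anchors, gridding cells (illustrative)
fc_params = feat_dim * C * h * (w + 1)
print(fc_params)                     # ~13 million weights in this single FC layer
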

 

prior constraints

Smoothness constraints:
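As reconstructed from the paper (the original equation image is missing here), the smoothness loss constrains the classification distributions of adjacent row anchors of the same lane line to be close:

L_sim = \sum_{i=1}^{C} \sum_{j=1}^{h-1} || P_{i,j,:} - P_{i,j+1,:} ||_1

where P_{i,j,:} is the (w+1)-dimensional classification vector of lane i on row anchor j.
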

Rigid constraints:
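Also reconstructed from the paper: the rigidity (shape) loss first computes the expected lane position on each row from the softmax of the classification vector, then constrains its second-order difference so that the lane is locally straight:

Loc_{i,j} = \sum_{k=1}^{w} k * softmax(P_{i,j,1:w})_k

L_shp = \sum_{i=1}^{C} \sum_{j=1}^{h-2} || (Loc_{i,j} - Loc_{i,j+1}) - (Loc_{i,j+1} - Loc_{i,j+2}) ||_1
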

 

overall network structure

The overall network structure is shown in the figure below (the segmentation module is only used during training and is not needed at inference time).

 

3. Data collection

Generally speaking, deep learning relies on a large amount of training data. Fortunately, there are already many open source datasets for traffic scenes, such as KITTI, BDD100K, ApolloScape, Cityscapes, TuSimple Lane, CULane, etc. Since only Microsoft's COCO dataset is at hand, and COCO contains the usual pedestrians, non-motor vehicles (bicycle), motor vehicles (car, bus, truck), traffic lights, etc., the author will use the COCO dataset for training first.

 

COCO dataset screening

The original COCO dataset has 80 categories, many of which are meaningless for assisted driving, so the traffic-related categories need to be selected from it. The author wrote a Python script to do this screening automatically.

from pycocotools.coco import COCO
import numpy as np
import skimage.io as io
import matplotlib.pyplot as plt
import os
from PIL import Image
from PIL import ImageDraw
import csv
import shutil


def create_coco_maps(ann_handle):
    # build name->id and id->name maps for all COCO categories
    coco_name_maps = {}
    coco_id_maps = {}
    cat_ids = ann_handle.getCatIds()
    cat_infos = ann_handle.loadCats(cat_ids)
    for cat_info in cat_infos:
        cat_name = cat_info['name']
        cat_id = cat_info['id']
        if cat_name not in coco_name_maps.keys():
            coco_name_maps[cat_name] = cat_id
        if cat_id not in coco_id_maps.keys():
            coco_id_maps[cat_id] = cat_name
    return coco_name_maps, coco_id_maps


def get_need_cls_ids(need_cls_names, coco_name_maps):
    # COCO category ids of the classes we want to keep
    need_cls_ids = []
    for cls_name in coco_name_maps.keys():
        if cls_name in need_cls_names:
            need_cls_ids.append(coco_name_maps[cls_name])
    return need_cls_ids


def get_new_label_id(name, need_cls_names):
    # new label index = position of the class in need_cls_names
    for i, need_name in enumerate(need_cls_names):
        if name == need_name:
            return i
    return None


if __name__ == '__main__':
    # create coco ann handle
    need_cls_names = ['person', 'bicycle', 'car', 'motorcycle', 'bus', 'truck', 'traffic light']
    dst_img_dir = '/dataset/coco_traffic_yolov5/images/val/'
    dst_label_dir = '/dataset/coco_traffic_yolov5/labels/val/'
    min_side = 0.04464  # while 224*224, min side is 10. 0.04464=10/224
    dataDir = '/dataset/COCO/'
    dataType = 'val2017'
    annFile = '{}/annotations/instances_{}.json'.format(dataDir, dataType)
    ann_handle = COCO(annFile)

    # create coco maps for id and name
    coco_name_maps, coco_id_maps = create_coco_maps(ann_handle)

    # get need_cls_ids
    need_cls_ids = get_need_cls_ids(need_cls_names, coco_name_maps)

    # get all img ids
    img_ids = ann_handle.getImgIds()
    for i, img_id in enumerate(img_ids):
        print('process img: %d/%d' % (i, len(img_ids)))
        img_info = ann_handle.loadImgs(img_id)[0]
        img_name = img_info['file_name']
        img_height = img_info['height']
        img_width = img_info['width']

        boj_infos = []
        ann_ids = ann_handle.getAnnIds(imgIds=img_id, iscrowd=None)
        for ann_id in ann_ids:
            anns = ann_handle.loadAnns(ann_id)[0]
            obj_cls = anns['category_id']
            obj_name = coco_id_maps[obj_cls]
            obj_box = anns['bbox']
            if obj_name in need_cls_names:
                new_label = get_new_label_id(obj_name, need_cls_names)
                # COCO bbox is (x1, y1, w, h); convert to normalized yolo format (xc, yc, w, h)
                x1, y1, w, h = obj_box[0], obj_box[1], obj_box[2], obj_box[3]
                x_c_norm = (x1 + w / 2.0) / img_width
                y_c_norm = (y1 + h / 2.0) / img_height
                w_norm = w / img_width
                h_norm = h / img_height
                # drop boxes that are too small
                if w_norm > min_side and h_norm > min_side:
                    boj_infos.append('%d %.4f %.4f %.4f %.4f\n' % (new_label, x_c_norm, y_c_norm, w_norm, h_norm))

        if len(boj_infos) > 0:
            print('  this img has need cls')
            shutil.copy(dataDir + '/' + dataType + '/' + img_name, dst_img_dir + '/' + img_name)
            with open(dst_label_dir + '/' + img_name.replace('.jpg', '.txt'), 'w') as f:
                f.writelines(boj_infos)
        else:
            print('  this img has no need cls')

 

4. APP development

model conversion

Since training uses the PyTorch framework and a PyTorch model cannot run directly on a mobile phone, the PyTorch model has to be converted to a format supported on mobile. This is essentially a deep learning deployment problem, and there are many open source projects to choose from, such as MNN, ncnn, TNN, etc. ncnn is chosen here because it was open-sourced early, has a large user base, and its operator and hardware support are both decent. Unfortunately, ncnn cannot import a PyTorch model directly: the model must first be exported to ONNX and then imported into ncnn. Also note that exporting a PyTorch model to ONNX produces many glue ops that ncnn does not support, so the ONNX model needs to be trimmed with another open source tool, onnx-simplifier, before importing it into ncnn. The whole process is somewhat cumbersome, so for convenience the author wrote a one-click conversion script covering "PyTorch model -> ONNX model -> ONNX model simplification -> ncnn model", which reduces errors in the intermediate steps. The code of the main steps is shown below.

# 1. Export the PyTorch model to an ONNX model
torch.onnx.export(net, input, onnx_file, verbose=DETAIL_LOG)

# 2. Call the onnx-simplifier tool to simplify the ONNX model
cmd = 'python -m onnxsim ' + str(onnx_file) + ' ' + str(onnx_sim_file)
ret = os.system(str(cmd))

# 3. Call ncnn's onnx2ncnn tool to convert the ONNX model to an ncnn model
cmd = onnx2ncnn_path + ' ' + str(new_onnx_file) + ' ' + str(ncnn_param_file) + ' ' + str(ncnn_bin_file)
ret = os.system(str(cmd))

# 4. Encrypt the ncnn model (optional step)
cmd = ncnn2mem_path + ' ' + str(ncnn_param_file) + ' ' + str(ncnn_bin_file) + ' ' + str(ncnn_id_file) + ' ' + str(ncnn_mem_file)
ret = os.system(str(cmd))

APP structure

After all the algorithm modules are in place, developing the APP itself is straightforward. The framework used in the author's sports APP is reused here, as shown in the figure below:

 

APP main source code:

Source code examples for reference, from the counting APP development:

Activity class core source code

Camera class core source code

Alg class core source code


5. APP download

Yuque platform download:

https://www.yuque.com/lgddx/xmlb/ny150b

 

Baidu network disk download:

Link: https://pan.baidu.com/s/13YAPaI_WdaMcjWWY8Syh5w

Extraction code: hhjl


6. Demo

Video shot at an intersection with a handheld phone:

https://www.yuque.com/lgddx/xmlb/wyozh2

 

