Bird detection and recognition system based on deep learning (including UI interface, Python code)

insert image description here

Abstract: Bird recognition is a popular application of deep learning and machine vision. This article describes in detail a bird detection and recognition system based on YOLOv5. Along with the principle of the algorithm, it gives the Python implementation code and a Python Qt user interface. In the interface you can select bird pictures or videos, or turn on the camera, for detection and recognition. You can also select files, switch the marked recognition target, switch models, and register and log in as a user. The model is trained with YOLOv5, and the training dataset and training code are provided; detection is fast and recognition accuracy is high. The blog post introduces the Python code and explains how to use it, which makes it suitable for beginners. For the complete code resource files, please go to the download link at the end of the article.


Code introduction and demonstration video link: https://www.bilibili.com/video/BV1QL411C783/ (being updated; you are welcome to follow the blogger's Bilibili videos)


Foreword

        As an important indicator of the biodiversity and ecological environment of a region, the number and distribution of birds has received more and more attention from nature reserves, wetland parks, and animal protection supervision departments. Real-time monitoring of bird species, numbers and distribution has become routine work in many regions. This article uses the YOLOv5 object detection algorithm, which can provide AI support for bird monitoring and recognition, improve its efficiency, reduce the inefficiency and errors of purely manual monitoring, and provide better data for bird protection and breeding.

        Bird monitoring and identification is highly complex. Flight routes, landing points and times are all uncertain, monitoring areas are diverse (forests, wetlands, lakes, grasslands, etc.), and some birds have extremely similar habits, shapes and colors. These factors place very high demands on bird monitoring and identification technology. Traditional machine vision algorithms struggle to identify the species and location of birds accurately and quickly. In recent years, machine learning and deep learning have made great progress, and compared with traditional methods, deep learning methods perform better in both detection accuracy and speed. YOLOv5 is the fifth generation of the single-stage object detection algorithm YOLO. According to experiments, it improves significantly on its predecessors in speed and accuracy. For a related paper, please refer to TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection On Drone-captured Scenarios, which builds an improved variant on top of YOLOv5. The open source code can be found at https://github.com/ultralytics/yolov5 (official source code repository).

        Automated bird identification helps people understand the number and activity of specific birds in a geographical area more easily. At present there are few bird detection and identification applications on the Internet and not many examples to refer to. Almost no one has developed one into complete, presentable software, so selecting pictures or video files and detecting in real time is inconvenient. For this reason, the blogger uses the Caltech-UCSD Birds-200-2011 dataset to train a YOLOv5 model and provides a self-designed UI. It keeps the blogger's usual simple style and supports recognition and detection on pictures, videos and the camera, as well as switching models and saving results. I hope you like it. The initial interface is as follows:

insert image description here

        The screenshot below (click the picture to enlarge) shows the interface while detecting bird species. The system can identify multiple bird species in one frame, and camera or video detection can also be enabled:

insert image description here

        For a detailed demonstration of the functions, please refer to the blogger's Bilibili video or the animations in the next section. If you find it useful, please like, follow and bookmark! Designing the system UI takes a considerable amount of work, and polishing the interface requires care. If you have any suggestions or opinions, feel free to leave a comment below.


1. Effect demonstration

        The bird detection and recognition system is mainly used to recognize birds in images taken in the wild or in everyday scenes, and it displays the category, position, number and confidence of the bird targets in the image. Images can be read from pictures and video files, or birds can be identified in real-time frames captured by the camera, and the algorithm model can be replaced. The system interface includes user registration and login functions, which makes it convenient for users to manage and use. The recognition results are visualized and displayed in real time, and each target can be marked and its data shown individually. The display window can be zoomed, dragged and resized adaptively, and the results can be saved by clicking a button for later reference.

        Appearance matters a great deal to whether software is pleasant to use. Let's first look at the bird recognition effect through animations. The main function of the system is to recognize birds in pictures, videos and camera frames. The detection results are visualized in the interface and on the images, and functions such as model switching and single-target selection are provided. The demonstration is as follows.

(1) User registration and login interface

        A login interface is designed here, where you can register an account and password and then log in. The interface follows a currently popular UI design: an animated picture on the left, and the account, password, verification code and other inputs on the right.

insert image description here

(2) Select bird picture recognition

        The system allows you to select a picture file for identification. After clicking the picture selection button icon to select a picture, all bird identification results will be displayed. You can view the results of a single bird through the drop-down box. The interface display of this function is shown in the figure below:

insert image description here

(3) Video recognition effect display

        We often need to identify the species of birds in a video, so a video selection function is provided. Click the video button to select the video to be detected. The system then analyzes the video automatically, identifies birds frame by frame, and marks the results on the screen, as shown in the following figure:

insert image description here

(4) Camera detection effect display

        In real scenes, we often use the device camera to obtain real-time frames and need to identify the birds in them, so this article takes that function into account. After clicking the camera button, the system enters the ready state, displays the real-time picture and starts detecting the birds in it. The recognition result is shown in the figure below:

insert image description here

(5) Switch the bird detection model

        You can select a trained detection model, for example an optimized one, to use for detection. YOLOv5 pre-trained models are generally supported as well. Here you can freely switch between different models to compare their detection results.

insert image description here


2. Bird dataset and training

        The bird recognition dataset used here is the Caltech-UCSD Birds-200-2011 (CUB) dataset. It has 200 categories and 11,788 pictures in total. Besides the category label, each picture comes with an annotated bounding box, key points and some other attributes, which makes it a fine-grained bird image dataset.

insert image description here
        Each species in the CUB dataset is associated with a Wikipedia article and organized by scientific classification (order, family, genus, species). The dataset contains 200 bird subcategories, with 5,994 images in the training set and 5,794 images in the test set. Each image provides its label, the bounding box of the bird, the key part locations of the bird, and the attributes of the bird. After downloading and decompressing the CUB data, you get the following folder:

insert image description here
        Since the annotation format of the CUB dataset differs from the YOLO format, here we select some categories from the CUB dataset and convert their labels to the YOLO format. For the label conversion code, please refer to the blog post "CUB_200_2011 dataset to Yolo format". We finally get the dataset files in YOLO format and use them for model training.
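
        As a rough illustration of what such a conversion involves, here is a minimal sketch, not the code from the referenced blog post. It assumes the CUB convention of storing each box as top-left x, y, width and height in pixels, and the YOLO convention of a normalized center x, center y, width and height. The function names and the example numbers are hypothetical.

```python
def cub_box_to_yolo(x, y, w, h, img_w, img_h):
    """Convert a CUB box (top-left x, y, width, height in pixels)
    to YOLO format (center x, center y, width, height, all relative to image size)."""
    cx = (x + w / 2.0) / img_w
    cy = (y + h / 2.0) / img_h
    return cx, cy, w / img_w, h / img_h


def yolo_label_line(class_index, box, img_size):
    """Build one line of a YOLO label file: 'class cx cy w h'.
    class_index must be 0-based (CUB class ids start at 1)."""
    cx, cy, bw, bh = cub_box_to_yolo(*box, *img_size)
    return "%d %.6f %.6f %.6f %.6f" % (class_index, cx, cy, bw, bh)


if __name__ == "__main__":
    # Hypothetical example: a 200x100 box at (50, 100) in a 500x400 image, class 0
    print(yolo_label_line(0, (50.0, 100.0, 200.0, 100.0), (500, 400)))
```

        Each image then gets a .txt label file with one such line per bird. Because only a subset of the categories is used, the class indices also have to be renumbered so that they run from 0 to nc - 1.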

        Before training the model, YOLO needs to know where our data is. We therefore write a data.yaml file, store it in the project directory, and record in it the paths of the data and the label categories to be recognized by the model. The content of the file is as follows. YOLO reads the data.yaml file in the directory and then locates the dataset to read the data for training and validation.

train: F:\BlogCode\BirdDet\Bird\train.txt # training set
val: F:\BlogCode\BirdDet\Bird\test.txt    # validation set
nc: 36   # number of classes to train
names: ['Acadian_Flycatcher','American_Crow','American_Goldfinch','American_Pipit',
'American_Redstart','American_Three_toed_Woodpecker','Anna_Hummingbird','Artic_Tern','Baird_Sparrow','Baltimore_Oriole',
'Bank_Swallow','Barn_Swallow','Bay_breasted_Warbler','Belted_Kingfisher',
'Bewick_Wren','Black_Tern','Black_and_white_Warbler','Black_billed_Cuckoo','Black_capped_Vireo','Black_footed_Albatross','Black_throated_Blue_Warbler',
'Black_throated_Sparrow','Blue_Grosbeak','Blue_Jay','Blue_headed_Vireo','Blue_winged_Warbler','Boat_tailed_Grackle',
'Bobolink','Bohemian_Waxwing','Brandt_Cormorant','Brewer_Blackbird','Brewer_Sparrow','Bronzed_Cowbird',
'Brown_Creeper','Brown_Pelican','Brown_Thrasher']
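
        The train.txt and test.txt files referenced above are lists of image paths. The sketch below shows one possible way to build such lists from the CUB index files. It assumes the usual CUB layout, where images.txt holds "image_id relative_path" and train_test_split.txt holds "image_id flag" with flag 1 for training images. The function names are hypothetical, and filtering down to the 36 selected categories is left out.

```python
def split_image_lists(images_lines, split_lines, image_root):
    """Return (train_paths, test_paths) from CUB-style index lines."""
    is_train = {}
    for line in split_lines:
        image_id, flag = line.split()
        is_train[image_id] = (flag == "1")
    train_paths, test_paths = [], []
    for line in images_lines:
        image_id, rel_path = line.split(maxsplit=1)
        full_path = image_root.rstrip("/") + "/" + rel_path.strip()
        (train_paths if is_train[image_id] else test_paths).append(full_path)
    return train_paths, test_paths


def check_class_count(nc, names):
    """data.yaml is only consistent when nc equals the number of names."""
    return nc == len(names)


if __name__ == "__main__":
    # Hypothetical two-image example
    images = ["1 001.Black_footed_Albatross/a.jpg", "2 001.Black_footed_Albatross/b.jpg"]
    split = ["1 0", "2 1"]
    train, test = split_image_lists(images, split, "Bird/images")
    print(train)
    print(test)
```

        Writing each list to a text file, one path per line, gives files of the kind that data.yaml points to. The small check_class_count helper guards against a common mistake, an nc value that no longer matches the names list after categories are added or removed.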

        Training is done by calling train.py under the model folder. The batch size and the number of training epochs can be adjusted with the --batch and --epochs parameters. YOLOv5 provides weights pre-trained on the COCO dataset. We can load them with --weights yolov5s.pt for transfer learning, or pass an empty --weights '' to train from scratch, which is the usual choice when training on a large dataset (such as COCO). The following code sets the parameters used by the detection script (model weights, image size, thresholds and device) and loads the model:

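import argparse

# Helpers from the YOLOv5 repository; module paths follow older releases and may differ in your copy
from models.experimental import attempt_load
from utils.torch_utils import select_device

parser = argparse.ArgumentParser()
parser.add_argument('--weights', nargs='+', type=str, default='./weights/best.pt',
                    help='model.pt path(s)')  # model path, only .pt files are supported
parser.add_argument('--img-size', type=int, default=480, help='inference size (pixels)')  # inference image size, only 480 is supported
parser.add_argument('--conf-thres', type=float, default=0.25, help='object confidence threshold')  # confidence threshold
parser.add_argument('--iou-thres', type=float, default=0.45, help='IOU threshold for NMS')  # IoU threshold for NMS
# Select the GPU or CPU of the machine: the GPU is used if available, otherwise the CPU; pass cpu to force CPU only
parser.add_argument('--device', default='',
                    help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
parser.add_argument('--save-dir', type=str, default='inference', help='directory to save results')  # directory for saved results
parser.add_argument('--classes', nargs='+', type=int,
                    help='filter by class: --class 0, or --class 0 2 3')  # keep only the given classes
parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')  # class-agnostic NMS
opt = parser.parse_args()  # parsed options, used throughout the script
out, weight, imgsz = opt.save_dir, opt.weights, opt.img_size  # save directory, weight path, image size
device = select_device(opt.device)  # pick the compute device, GPU or CPU
half = device.type != 'cpu'  # use half-precision inference on the GPU

model = attempt_load(weight, map_location=device)  # load the model
if half:
    model.half()  # convert the weights to FP16 so they match the half-precision input (harmless if already done elsewhere)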

        We can start training by entering the following command in the terminal, or simply run train.py directly.

python train.py --batch 32 --epochs 300 --data data.yaml --weights yolov5s.pt --hyp data/hyps/hyp.scratch-med.yaml --cache

        In deep learning, we usually monitor training through the curve of the decreasing loss function. YOLOv5 training involves three losses: bounding-box loss (box_loss), confidence loss (obj_loss) and classification loss (cls_loss). After training finishes, summary plots of the training process can be found in the logs directory. The figure below shows the training curves of the blogger's bird recognition model.

insert image description here
        During training, mAP50, a commonly used object detection metric, quickly reached a high level, and mAP50:95 kept improving, which indicates that the model performs well on the training and validation data. We then read in a test folder for prediction, using best.pt, the weights that performed best on the validation set during training, and obtain the PR curve shown in the figure below.

insert image description here
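
        To make the relation between the PR curve and the mAP values more concrete, here is a minimal sketch of how the average precision (AP) of one class can be computed from ranked detections. It is a simplified illustration that uses all-point interpolation, not the exact routine used by YOLOv5, and the function name and example data are hypothetical. mAP50 is the mean of such AP values over all classes when a detection counts as correct at an IoU of at least 0.5, and mAP50:95 additionally averages over IoU thresholds from 0.5 to 0.95 in steps of 0.05.

```python
def average_precision(detections, num_ground_truth):
    """detections: list of (confidence, is_true_positive) for one class.
    Returns the area under the precision-recall curve."""
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing as recall grows (precision envelope)
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangle areas wherever recall increases
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


if __name__ == "__main__":
    # Hypothetical example: three detections, two ground-truth birds
    dets = [(0.9, True), (0.8, False), (0.7, True)]
    print(round(average_precision(dets, 2), 4))
```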


3. Bird detection and identification

        After training is completed, we have the best model. Next, we feed a frame image to the network to obtain the prediction result. The prediction method (in predict.py) is as follows:

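import torch
# Helpers from the YOLOv5 repository; names and module paths may differ between releases
from utils.general import non_max_suppression
from utils.torch_utils import time_synchronized


def predict(img):
    # Convert the preprocessed image (CHW, uint8) to a tensor on the chosen device
    img = torch.from_numpy(img).to(device)
    img = img.half() if half else img.float()  # FP16 on GPU, FP32 on CPU
    img /= 255.0  # scale pixel values from 0-255 to 0.0-1.0
    if img.ndimension() == 3:
        img = img.unsqueeze(0)  # add the batch dimension

    t1 = time_synchronized()
    pred = model(img, augment=False)[0]  # forward pass
    # Drop low-confidence boxes and suppress overlapping ones
    pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes,
                               agnostic=opt.agnostic_nms)
    t2 = time_synchronized()
    InferNms = round((t2 - t1), 2)  # inference + NMS time in seconds

    return pred, InferNms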
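
        The role of the two thresholds passed to non_max_suppression can be illustrated with a small pure-Python sketch. This is a simplified, class-agnostic version written for explanation only, not the YOLOv5 implementation; the function names and example boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def simple_nms(detections, conf_thres=0.25, iou_thres=0.45):
    """detections: list of (x1, y1, x2, y2, confidence). Returns the kept boxes."""
    # 1) Discard boxes whose confidence does not exceed the threshold
    candidates = sorted((d for d in detections if d[4] > conf_thres),
                        key=lambda d: d[4], reverse=True)
    # 2) Greedily keep the most confident box and drop boxes that overlap it too much
    kept = []
    for det in candidates:
        if all(iou(det[:4], k[:4]) <= iou_thres for k in kept):
            kept.append(det)
    return kept


if __name__ == "__main__":
    dets = [(10, 10, 110, 110, 0.9), (20, 20, 120, 120, 0.8),
            (200, 200, 300, 300, 0.6), (0, 0, 50, 50, 0.1)]
    print(simple_nms(dets))
```

        In this example the second box overlaps the first with an IoU of about 0.68 and is suppressed, and the last box is dropped for low confidence. Raising conf_thres removes more uncertain detections, while lowering iou_thres suppresses overlapping boxes more aggressively.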

        After getting the prediction result, we can draw a box around each bird in the frame image and use OpenCV drawing operations to write the bird's category and prediction score on the picture. The following script reads a bird picture and runs detection on it. The picture data is first preprocessed and sent to predict for detection, and then the positions of the boxes are computed and drawn on the picture.

import cv2
import numpy as np
# cv_imread, letterbox, scale_coords and plot_one_box are helpers from the project;
# names (class labels) and colors (one color per class) are assumed to be defined earlier in the script

if __name__ == '__main__':
    img_path = "./UI_rec/test_/Bobolink_0079_10736.jpg"
    image = cv_imread(img_path)
    img0 = image.copy()
    img = letterbox(img0, new_shape=imgsz)[0]  # resize and pad to the network input size
    img = np.stack(img, 0)
    img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, HWC to CHW (3 x height x width)
    img = np.ascontiguousarray(img)

    pred, useTime = predict(img)

    det = pred[0]
    p, s, im0 = None, '', img0
    if det is not None and len(det):  # enter only if there are detections
        det[:, :4] = scale_coords(img.shape[1:], det[:, :4], im0.shape).round()  # rescale the boxes to the size of im0
        number_i = 0  # running index of the detections
        detInfo = []
        for *xyxy, conf, cls in reversed(det):  # iterate over the detections
            c1, c2 = (int(xyxy[0]), int(xyxy[1])), (int(xyxy[2]), int(xyxy[3]))
            # add the detection info to the list
            detInfo.append([names[int(cls)], [c1[0], c1[1], c2[0], c2[1]], '%.2f' % conf])
            number_i += 1  # increment the index

            label = '%s %.2f' % (names[int(cls)], conf)

            # draw the detected target
            plot_one_box(image, xyxy, label=label, color=colors[int(cls)])
    # show the detection result
    cv2.imshow('Stream', image)
    # if cv2.waitKey(1) & 0xFF == ord('q'):
    #     break
    c = cv2.waitKey(0) & 0xff
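
        The rescaling step in the script maps boxes predicted on the resized and padded network input back to the original picture. The sketch below shows the arithmetic behind this kind of mapping for a single box. It is an illustration under the assumption that the image was scaled with its aspect ratio preserved and padded equally on both sides; the function name and numbers are hypothetical, and it is not the project's scale_coords.

```python
def scale_box_to_original(box, net_shape, orig_shape):
    """Map a box (x1, y1, x2, y2) from the letterboxed image back to the original.
    net_shape and orig_shape are (height, width)."""
    # Scale factor that was applied to the original image
    gain = min(net_shape[0] / orig_shape[0], net_shape[1] / orig_shape[1])
    # Padding added on the left/right and top/bottom
    pad_x = (net_shape[1] - orig_shape[1] * gain) / 2.0
    pad_y = (net_shape[0] - orig_shape[0] * gain) / 2.0
    x1 = (box[0] - pad_x) / gain
    y1 = (box[1] - pad_y) / gain
    x2 = (box[2] - pad_x) / gain
    y2 = (box[3] - pad_y) / gain
    # Clip to the image borders and round to whole pixels
    h, w = orig_shape
    x1, x2 = (min(max(v, 0), w) for v in (x1, x2))
    y1, y2 = (min(max(v, 0), h) for v in (y1, y2))
    return round(x1), round(y1), round(x2), round(y2)


if __name__ == "__main__":
    # Hypothetical example: an 800x600 picture letterboxed into a 480x480 input
    print(scale_box_to_original((120, 150, 360, 330), (480, 480), (600, 800)))
```

        In this example the picture is scaled by 0.6 and padded by 60 pixels at the top and bottom, so the box is first shifted by the padding and then divided by the scale factor.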

        The result of the execution is shown in the figure below. The species and confidence values of the birds are marked in the picture, and prediction is fast. Based on this model, we can build a system with an interface, in which a picture, video or camera is selected and the model is then called for detection.

insert image description here
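
        An interface typically needs the detections in a summarized form, for example the number of birds per species and the most confident result. The sketch below shows one possible way to derive this from a list shaped like detInfo in the script above, where each entry is [class name, [x1, y1, x2, y2], confidence string]. The function name and the example values are hypothetical and are not taken from the blogger's UI code.

```python
from collections import Counter


def summarize_detections(det_info):
    """det_info: list of [class_name, [x1, y1, x2, y2], confidence_string]."""
    counts = Counter(name for name, _, _ in det_info)
    # The confidence is stored as a formatted string, so convert it before comparing
    best = max(det_info, key=lambda d: float(d[2])) if det_info else None
    return {"total": len(det_info), "per_class": dict(counts), "best": best}


if __name__ == "__main__":
    det_info = [["Bobolink", [10, 20, 110, 220], "0.91"],
                ["Blue_Jay", [150, 40, 260, 200], "0.78"],
                ["Bobolink", [300, 60, 380, 210], "0.66"]]
    print(summarize_detections(det_info))
```
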
        The blogger tested the entire system in detail and finally developed a version with a smooth, clean interface, which is the one shown in the demonstration part of this blog post. The complete UI, test pictures and videos, code files, and the offline Python dependency package (easy to install and run, although you can also configure the environment yourself) have all been packaged and uploaded, and interested readers can obtain them through the download link.

insert image description here


Download link

        The complete program files involved in the blog post (including test pictures, videos, py files, UI files, etc., as shown in the figure below) have been packaged and uploaded to the blogger's Mianbaoduo (Bread Multi) platform page; see the reference blog post and video. All the files involved are packaged together and can be run directly. A screenshot of the complete files is as follows:

insert image description here

    The resources in the folder are shown below and include the offline Python dependency package. After installing Anaconda and PyCharm correctly, readers can install the dependencies from this package. A detailed demonstration can also be seen in my Bilibili video.

insert image description here

Note: This code was developed with PyCharm and Python 3.8 and runs successfully in testing. The main programs of the interface are runMain.py and LoginUI.py. To test on a picture, run testPicture.py; to test on a video, run testVideo.py. To make sure the program runs smoothly, please configure the versions of the Python dependency packages according to requirements.txt. Use Python 3.8 and no other version; see requirements.txt for details. To download the complete project files, please refer to the reference blog post or reference video:

Reference blog post: https://zhuanlan.zhihu.com/p/612187570

Reference video demonstration: https://www.bilibili.com/video/BV1QL411C783/

Offline dependency library download link: https://pan.baidu.com/s/1hW9z9ofV1FRSezTSj59JSg?pwd=oy4n (extraction code: oy4n)


Conclusion

        The blogger's ability is limited, so even though the method described in this blog post has been tested, omissions are inevitable. Please feel free to point out any mistakes so that the next revision can be more complete and rigorous. If there is a better way to implement any of this, please let me know as well.


Origin blog.csdn.net/qq_32892383/article/details/129326740