Use PaddleDetection to build your own image prediction project (2): camera detection and obtaining coordinates

I got caught up with hardware and 3D printing and put this series off for a long time; now let's continue the PaddleDetection walkthrough from last time.

The goal of this article is to call the camera for object detection, so OpenCV is needed.
A reminder: because of how the package is named, installing OpenCV for Python with pip is done via pip install opencv-python, not pip install opencv,

and when importing the package you write import cv2, not import opencv.
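A quick sanity check after installing (a minimal sketch):

# after running `pip install opencv-python`, verify the import works
import cv2
print(cv2.__version__)  # prints the installed OpenCV version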

Step 1: prepare the weight files

In the previous post we already trained the weights, as shown in the figure below (mine is named best_model; the weight file you pick may be named differently, and the output model folder from the previous post is inference_model).
[screenshot: the trained weight files in the inference_model folder]
Now we need to process these weight files once more. Here we call the export_model.py file in the tools folder; the
command is as follows:

python tools/export_model.py -c configs/yolov3_mobilenet_v1_fruit.yml -o weights=inference_model/yolov3_mobilenet_v1_fruit

A note on the arguments (remember to append the name of the weight file to be processed!):
[screenshot: the export command's arguments]
Three files will be output afterwards.
[screenshot: the three exported files]
The default output path is output\yolov3_mobilenet_v1_fruit (where yolov3_mobilenet_v1_fruit is a folder named after the model you chose).
To output to a specific location, append --output_dir followed by the path you want after the command line.
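For example (the my_exported_model folder name is just an illustration):

python tools/export_model.py -c configs/yolov3_mobilenet_v1_fruit.yml -o weights=inference_model/yolov3_mobilenet_v1_fruit --output_dir=my_exported_model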

It is worth noting that, among the three files you obtain, you should open infer_cfg.yml with Notepad and check whether
the labels in it are your own. If not, change them.

In my case the program kept using the default labels shipped with the code (such as bicycle). The specific cause is a bug in the part of the code that handles labels, which makes it fall back to the earlier preset values. As for where exactly the bug is, I have been too lazy to track it down lately; simply editing the exported infer_cfg.yml works, as I have verified myself.
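To check the labels programmatically instead of in Notepad, here is a minimal sketch (it assumes PyYAML is installed and that the file stores the labels under the label_list key):

# minimal sketch: print the labels from the exported infer_cfg.yml
# assumes PyYAML (pip install pyyaml) and the default export path
import yaml

with open("output/yolov3_mobilenet_v1_fruit/infer_cfg.yml") as f:
    cfg = yaml.safe_load(f)
print(cfg["label_list"])  # should list your own labels, not the defaults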

Step 2: modify the infer file

Before modifying the infer file, we can first feed it a video file (MP4, AVI, and the like) and see whether it outputs a video with the detections marked, to verify the weight files (usually, if your image prediction is OK, video will be fine too).

python deploy/python/infer.py --model_dir=output/yolov3_mobilenet_v1_fruit/ --video_file=../../work/test.mp4 --use_gpu=True --threshold=0.2

A note on the arguments:
--model_dir= is followed by the path to the three files you obtained in step 1.

--video_file= is followed by the path of the video file you are testing (mind the difference between relative and absolute paths; an absolute path is recommended).

--use_gpu= takes True or False and decides whether GPU computation is enabled to speed things up.

--threshold= is the detection threshold: the higher it is, the more confident a detection must be before it is kept. Around 0.2 is recommended.
(Oh, one more thing: if you are a Linux user running on Baidu AI Studio, remember to put an exclamation mark (!) in front of the command;
otherwise it may not have enough permissions and will not run.)
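For example, in an AI Studio notebook cell the same call would look like this (the absolute video path here is illustrative):

!python deploy/python/infer.py --model_dir=output/yolov3_mobilenet_v1_fruit/ --video_file=/home/aistudio/work/test.mp4 --use_gpu=True --threshold=0.2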

Once the predicted video above comes out fine, it is recommended to back up the infer file; then we can start modifying it.
Following the part below, change the default video_file parameter from string type to integer, and set its default value to 0 (if I remember correctly, the external camera is 0 and the built-in one is 1).
[screenshot: the modified video_file argument]
Then we find the relevant part of the source and modify it as follows.
[screenshot: the modified predict_video function]
Because this is our own project exploration, quite a few places in the program change, so ignore the line numbers on the left of the screenshots.
As in the screenshot above, find the video_file variable; what actually gets modified is the predict_video function (method), as sketched below.
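For reference, a sketch of what the changed argument definition might look like (modeled on the argparse style used in infer.py; the surrounding code may differ in your version):

parser.add_argument(
    "--video_file",
    type=int,    # was type=str: now a camera index instead of a file path
    default=0,   # 0 = default camera for cv2.VideoCapture
    help="Camera index for video capture.")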

In essence, it simply replaces the video-file path with the camera. Either way, just replace the original predict_video with my modified version below.

def predict_video():
    print("predict_video")
    detector = Detector(
        FLAGS.model_dir, use_gpu=FLAGS.use_gpu, run_mode=FLAGS.run_mode)
    # video_file is now an integer camera index (default 0)
    print("video source:", FLAGS.video_file)
    capture = cv2.VideoCapture(FLAGS.video_file)
    capture.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    capture.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
    # Alternative capture sizes -- swap one of these pairs in if needed:
    # capture.set(cv2.CAP_PROP_FRAME_WIDTH, 320);  capture.set(cv2.CAP_PROP_FRAME_HEIGHT, 180)
    # capture.set(cv2.CAP_PROP_FRAME_WIDTH, 1280); capture.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
    # capture.set(cv2.CAP_PROP_FRAME_WIDTH, 160);  capture.set(cv2.CAP_PROP_FRAME_HEIGHT, 90)
    # Kept for saving the output video (see the VideoWriter sketch below):
    # fps = 30
    # width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
    # height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
    # fourcc = cv2.VideoWriter_fourcc(*'mp4v')

    index = 1
    while True:
        ret, frame = capture.read()
        if not ret:
            break
        print('detect frame:%d' % index)
        index += 1
        # run detection; the coordinates we want later live in this dict
        results = detector.predict(frame, FLAGS.threshold)

        if len(results.get("boxes", [])) == 0:
            # nothing detected this frame: show the raw camera image
            im = frame
        else:
            im = visualize_box_mask(
                frame,
                results,
                detector.config.labels,
                mask_resolution=detector.config.mask_resolution)
            im = np.array(im)
        cv2.imshow("capture", im)
        if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
            break
    capture.release()
    cv2.destroyAllWindows()

The command to invoke it is:

python deploy/python/infer.py --model_dir=<path to the folder holding the three files exported earlier> --threshold=0.2

At this point you can actually see the camera image on screen while it runs. As shown in the
figure: [screenshot: live camera detection window]
A few friendly tips: the commented-out parts of the code can replace the active lines to change the video's resolution and frame rate.
If you want to record the recognized video and save it, refer to the modified code linked in the original post (the blogger wrote very detailed comments).
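As a starting point, here is a minimal sketch of saving the annotated frames, built on the commented-out VideoWriter lines inside predict_video above (the output path and frame rate are assumptions):

# minimal sketch: write annotated frames to a file with OpenCV
# place after cv2.VideoCapture(...) in predict_video
fps = 30  # assumed frame rate
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
writer = cv2.VideoWriter('result.mp4', fourcc, fps, (width, height))  # path is illustrative
# then, inside the while loop after the frame is visualized:
#     writer.write(im)
# and after the loop exits:
#     writer.release()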

Finally, a little about obtaining the coordinate data of a recognized object. Comparing the printed data with the annotated pictures, I found that the coordinates are stored in results["boxes"],
as follows:
in PaddleDetection, results["boxes"][0][1] is the confidence score, and the values after it are the two corner points of the detection box, namely the top-left and bottom-right corners; with these two points you can mark and select objects in the video stream.
The code below is how to extract the score and the corner coordinates.

sim1 = results["boxes"][0][1]           # confidence score, e.g. 0.8213138
left_up_x1 = results["boxes"][0][2]     # top-left x (origin at the top-left), e.g. 10.143881
left_up_y1 = results["boxes"][0][3]     # top-left y, e.g. 294.28348
right_down_x1 = results["boxes"][0][4]  # bottom-right x, e.g. 44.352264
right_down_y1 = results["boxes"][0][5]  # bottom-right y, e.g. 328.0142
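With those values you can draw the box yourself; a minimal sketch using plain OpenCV (it assumes frame and the variables extracted above):

# minimal sketch: mark the detected object on the current frame
pt1 = (int(left_up_x1), int(left_up_y1))        # top-left corner
pt2 = (int(right_down_x1), int(right_down_y1))  # bottom-right corner
cv2.rectangle(frame, pt1, pt2, (0, 255, 0), 2)
cv2.putText(frame, "%.2f" % sim1, pt1, cv2.FONT_HERSHEY_SIMPLEX,
            0.6, (0, 255, 0), 2)
cv2.imshow("capture", frame)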

Oh, and remember to leave a like!


Original post: blog.csdn.net/weixin_43134049/article/details/108170667