[Debugging notes] Solving bugs with the idea of BFS (breadth-first traversal)

0. Foreword

These are some thoughts the author had while debugging in the middle of the night. The debugging process was a rough one this time, so it seemed worth summarizing.

This post will not teach you much technical knowledge, but it may give you some ideas about how to approach debugging, which can help your coding work in other ways.

1. Debug problem description and logical reasoning

1.1 Problem Description

When the author deployed a deep learning system to the production environment, a problem appeared.

Figure 1: The server built on the Windows system and its 3D reconstruction result

Figure 2: The server built on the Linux system and its 3D reconstruction result

The two 3D reconstruction results are clearly very different: the former is just about recognizable, while the latter is hard to make out at all.

Why is this so?

1.2 Find the cause

The 3D reconstruction extracts contours from the output masks and rebuilds the shape from them. So if the reconstruction result is wrong, the output mask sequence is the first thing to suspect.

The author therefore compared the masks output in the two environments, Windows and Linux:

Figure 3: Mask image sequences output in the two system environments

  • The upper sequence is the mask sequence written by the Windows system, from mask_0.png to mask_19.png
  • The lower sequence is the mask sequence written by the Linux system, with the images also named from mask_0.png to mask_19.png

A careful comparison shows that the Windows sequence is basically normal. In the Linux sequence, each predicted mask does match one in the Windows sequence, but the order of the output sequence is wrong.

For example, mask_0.png in the lower sequence corresponds to mask_8.png in the upper sequence. Why is this so?

1.3 Reason revealed

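In fact, the problem is the order in which files are listed on Linux.

The code reads the files with os.listdir(), which returns entries in arbitrary order: Python does not sort them, and the order depends on the operating system and file system. On Windows the names usually come back in name order, which is why the sequence there looked basically normal.

On Linux, however, the names do not come back in name order. It can look as if they were ordered by creation (or modification) time, but os.listdir() guarantees no such thing; the order is simply whatever the file system returns. That is to say, the list returned by os.listdir() on Linux may look like this: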

images: ['mask_8.png', 'mask_0.png', 'mask_5.png', 'mask_3.png', ..., 'mask_17.png']

That is, in no particular order. So in the dataset class used for inference, the file list that __getitem__ reads from has to be sorted explicitly:

# os.listdir() guarantees no order, so sort by the integer k in "mask_<k>.png"
self.image_names = os.listdir(image_folder)
self.image_names.sort(key=lambda x: int(x.split(".")[0].split("_")[1]))

In other words, add one line that sorts the list, so that the code that later reads the images gets them in order.

Only when the images are read in order can the masks be output in the right order.
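As a rough sketch of the same idea in a standalone form, the helper below lists a folder and sorts the names by the integer in mask_<k>.png. The function name list_masks_in_order and the choice to skip files that do not match the naming scheme are assumptions made for illustration; they are not taken from the author's project.

```python
import os
import re


def list_masks_in_order(image_folder):
    """Return names like 'mask_<k>.png' sorted by the integer k.

    Hypothetical helper for illustration; the naming scheme is the one
    described in this post.
    """
    pattern = re.compile(r"^mask_(\d+)\.png$")
    indexed = []
    for name in os.listdir(image_folder):  # listing order is arbitrary
        match = pattern.match(name)
        if match:  # ignore files that do not follow the naming scheme
            indexed.append((int(match.group(1)), name))
    # Sort on the numeric index, not on the string
    return [name for _, name in sorted(indexed)]
```

Sorting on the integer matters: a plain sorted() on the names compares them as strings, so 'mask_10.png' would come before 'mask_2.png'.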

2. Debug experience and reflection

2.1 Overview

The answer looks very simple, and it really is very simple. But how the author arrived at it is worth recording, and it marks a milestone in the author's debugging career.

When we first run into a problem like this one, we are at a loss. The system was fairly large, and we did not know what was causing the output image sequence to go wrong.

Is cv2.imwrite() being called incorrectly at output? Is the logic of the intermediate processing wrong? Or does the problem arise when reading the input? And what is the root cause behind each of these possibilities?

The larger the codebase, the harder this gets, because you really do not know which link is broken. In other words, any link could be the culprit.

2.2 How to crack it

Solving a bug has to be a gradual, logical process. The answer will not appear out of some mysterious process of going around in circles. Code is itself a work of logic, so solving this kind of problem has to rely on a logical method.

Looking back at how the whole problem was solved, the author went from initial confusion to gradually clearer thinking, progressing layer by layer, constantly confirming which parts were definitely fine and which parts might be at fault, and reasoning inward one layer at a time until the problem was found.

Here is how this bug was tracked down, as an example to illustrate the point:

  1. Notice that the output differs between the two operating systems, and suspect that the operating system is involved.
  2. Notice that the image sequences output by the two systems differ. Perhaps the function that writes the final images, the last cv2.imwrite('mask_{k}.png', xxx) call, is at fault.
  3. On inspection, k is only an index, and the way it is determined is fine. So the output layer is not the problem and we need to look further back. (This step affirms one thing and rules out another; that affirmation and denial decides whether the deduction can continue.)
  4. Investigate whether the problem arises during inference.
  5. Check the inference code carefully and find nothing wrong, then move further back to see whether loading the dataset is the problem.
  6. Find that dataset loading may be at fault, specifically the os.listdir() call, which may return the files out of order.
  7. Sure enough, on Linux this function returns the files out of order. If the input images are out of order, the output cannot be in the right order either, so fix it.

At this point the problem has been found and fixed.
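One way to confirm the suspicion from steps 6 and 7 is to compare the order in which the names actually arrived with the order they should have. The sketch below is only an illustration: mask_index and order_mismatches are hypothetical names, and the sample list is a shortened version of the one shown in section 1.3.

```python
def mask_index(name):
    """Extract k from a name of the form 'mask_<k>.png' (assumed scheme)."""
    return int(name.split(".")[0].split("_")[1])


def order_mismatches(names):
    """List (position, got, expected) wherever names deviate from numeric order."""
    expected = sorted(names, key=mask_index)
    return [
        (i, got, want)
        for i, (got, want) in enumerate(zip(names, expected))
        if got != want
    ]


# Shortened sample of an out-of-order listing
listed = ["mask_8.png", "mask_0.png", "mask_5.png"]
for position, got, want in order_mismatches(listed):
    print(f"position {position}: got {got}, expected {want}")
```

An empty result means the names were already in numeric order; any entry in the result points at the loading step rather than at inference or output.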

Looking back at the whole process, the debugging seems to have unintentionally followed the idea of BFS (breadth-first traversal), one of the commonly used algorithms.

The code is checked layer by layer, and the key at each layer is what gets affirmed and what gets denied.

This process of affirming and denying is similar to what algorithms call pruning.

At this point, a few things seem to be strung together:
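The analogy can be made concrete with a small sketch. It treats the suspicions as a tree, visits them level by level, and drops every branch that has been ruled out. The tree, its labels, and the function name bfs_debug are assumptions made up to mirror the steps in this post; this is not a tool the author used.

```python
from collections import deque


def bfs_debug(hypotheses, root, ruled_out):
    """Visit suspicions breadth-first, pruning any branch that was ruled out.

    hypotheses: dict mapping a suspicion to the narrower suspicions under it
    ruled_out:  set of suspicions that inspection has shown to be fine
    Returns the surviving suspicions that cannot be split any further.
    """
    candidates = []
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if current in ruled_out:
            continue  # denied: prune this branch and everything under it
        children = hypotheses.get(current, [])
        if children:
            queue.extend(children)  # still suspect: examine its parts next
        else:
            candidates.append(current)  # suspect with nothing left to split
    return candidates


# Hypothetical tree mirroring the steps described above
hypotheses = {
    "outputs differ between OSes": ["writing masks", "inference", "loading dataset"],
    "writing masks": ["index k in cv2.imwrite"],
    "loading dataset": ["os.listdir order"],
}
ruled_out = {"writing masks", "inference"}

print(bfs_debug(hypotheses, "outputs differ between OSes", ruled_out))
# prints ['os.listdir order']
```

Ruling out "writing masks" means its sub-suspicion about the index k is never visited at all, which is the pruning the text refers to.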

  • Common ACM algorithms
  • Debugging and logic
  • Troubleshooting plans and yes-or-no decisions (pruning)
  • Project code construction and architecture

Some of the things learned in the past came together at this moment.

After this, the author's coding practice has moved up a level.

Origin blog.csdn.net/Samurai_Bushido/article/details/130278981