Inside the darknet YOLOv3 (Darknet-53) model

I don't mean this as clickbait, but deep learning still felt very mysterious when I first started studying it!

As the title suggests, I found a pattern after dumping the intermediate-layer images produced during YOLOv3's prediction process, shown in Figures 1-3 below. I ran prediction on three pictures: dog.jpg, person.jpg, and a photo of a crowd. In the last layer, the pictures show obvious differences in the same few channel images: 106_172 and 106_173, and 106_87 and 106_88 (layer number, then channel number, matching the file names written by the code below).
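A guess at why it is exactly these four channels (my own interpretation based on YOLOv3's standard COCO configuration, not anything stated in the darknet source): the tensor feeding each YOLO detection layer has 3 x (4 + 1 + 80) = 255 channels, 85 per anchor, laid out as tx, ty, tw, th, objectness, then 80 class probabilities. Under that layout, channels 87/88 and 172/173 would be the tw/th (box width and height) channels of the second and third anchors, which would plausibly look different across images whose objects differ in size. A tiny standalone C check of the arithmetic:

#include <stdio.h>

int main(void) {
    int classes = 80;
    int per_anchor = 4 + 1 + classes;                 /* 85 channels per anchor */
    int channels[] = {87, 88, 172, 173};
    const char *names[] = {"tx", "ty", "tw", "th", "objectness"};
    for (int i = 0; i < 4; ++i) {
        int c = channels[i];
        int anchor = c / per_anchor;                  /* which anchor owns this channel */
        int offset = c % per_anchor;                  /* position inside that anchor's block */
        const char *what = offset < 5 ? names[offset] : "class probability";
        printf("channel %3d -> anchor %d, offset %2d (%s)\n", c, anchor, offset, what);
    }
    return 0;
}

Running it prints, for example, channel 87 -> anchor 1, offset 2 (tw), with anchors counted from 0.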

[Figures 1-3: final-layer channel images for dog.jpg, person.jpg, and the crowd photo; original images not preserved]

The code is also very simple: add the snippet below to the image-prediction function. With int i = 100 it dumps the intermediate images from layer 100 onward; change it to int i = 0 and every intermediate image of the whole forward pass is written out, more than 50 MB and 37,629 pictures in one run, which is quite a lot. I think this small pattern is worth sharing for your research and inspiration: why do the differences between images always show up in the same four channel images of the last layer? It would be worthwhile to dig into the model's internal weights to study how they form.

for (int i = 100; i < net.n; ++i) {
    layer l = net.layers[i];
    printf(" --net------->%d  net.n= %d l.w= %d l.h= %d l.c= %d \n", i, net.n, l.w, l.h, l.c);
    /* dump the output of the previous layer, channel by channel; use that
       layer's own output dims (out_w/out_h/out_c), which for sequential
       layers equal l.w/l.h/l.c but stay correct for route/shortcut layers */
    layer lfront = net.layers[i - 1];
    if (lfront.output && lfront.out_c > 0) {
        for (int j = 0; j < lfront.out_c; ++j) {
            /* one grayscale image per channel */
            image out = make_empty_image(lfront.out_w, lfront.out_h, 1);
            out.data = calloc(lfront.out_w * lfront.out_h, sizeof(float));
            int tmpindex = lfront.out_w * lfront.out_h * j;   /* start of channel j */
            for (int k = 0; k < lfront.out_w * lfront.out_h; ++k) {
                out.data[k] = lfront.output[tmpindex + k];
            }
            char str[20];
            sprintf(str, "%d_%d", i, j);                      /* e.g. "106_172" */
            printf("image--%d  im.w= %d im.h= %d savename= %s  \n", j, lfront.out_w, lfront.out_h, str);
            save_image(out, str);
            free_image(out);                                  /* also frees out.data */
        }
    }
}
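If you are not sure where "the image prediction function" is: in the common darknet forks it is test_detector() in examples/detector.c, and the loop goes right after the forward pass so that every layer's output buffer is already filled. The placement sketch below is my assumption about such a fork, not part of the original snippet:

float *X = sized.data;      /* 'sized' is the resized/letterboxed input image in test_detector() */
network_predict(net, X);    /* forward pass: fills every layer's output buffer */

/* insert the layer-dump loop shown above right here,
   before the detection boxes are extracted and drawn */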

The first-layer feature maps are shown in Figure 4.

[Figure 4: first-layer feature maps; image not preserved]

It is quite interesting, and I share this discovery for everyone to study and use. You are welcome to discuss it with me on QQ: 329611847

Let me add some personal opinions:

The biggest advance of current CNN-based deep learning is that it aggregates discrete pixels, combining them spatially layer by layer, and thereby achieves perception of blocks and shapes, which is great progress. It should be noted, however, that the whole image-recognition pipeline is still a supervised classification method, and it is doubtful that simply continuing down this road will produce real artificial intelligence. These techniques are sufficient for the current stage, but this kind of hand-fed training does nothing to improve the machine's ability to reason, so it falls into the YOLO9000 trap: what the model knows depends entirely on what it was trained on.
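To make the "spatial aggregation" point concrete, here is a minimal, self-contained C sketch (a toy example, unrelated to darknet) of what a single convolution does: every output pixel is a weighted sum over a small neighborhood, and stacking such layers is what turns discrete pixels into edges, parts, and shapes.

#include <stdio.h>

#define H 5
#define W 5
#define K 3

/* "valid" 2-D convolution: each output pixel aggregates a K x K
   neighborhood of input pixels through a weighted sum */
void conv2d(float in[H][W], float kern[K][K], float out[H - K + 1][W - K + 1]) {
    for (int y = 0; y <= H - K; ++y) {
        for (int x = 0; x <= W - K; ++x) {
            float sum = 0;
            for (int i = 0; i < K; ++i)
                for (int j = 0; j < K; ++j)
                    sum += in[y + i][x + j] * kern[i][j];
            out[y][x] = sum;
        }
    }
}

int main(void) {
    /* toy image: a vertical edge between dark (0) and bright (1) pixels */
    float img[H][W] = {
        {0, 0, 1, 1, 1},
        {0, 0, 1, 1, 1},
        {0, 0, 1, 1, 1},
        {0, 0, 1, 1, 1},
        {0, 0, 1, 1, 1},
    };
    /* Sobel-style vertical-edge kernel */
    float kern[K][K] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    float out[H - K + 1][W - K + 1];
    conv2d(img, kern, out);
    /* the response is nonzero only where the window straddles the edge */
    for (int y = 0; y < H - K + 1; ++y) {
        for (int x = 0; x < W - K + 1; ++x) printf("%5.1f", out[y][x]);
        printf("\n");
    }
    return 0;
}

The kernel responds only where the window straddles the 0-to-1 transition, which is the simplest case of "perceiving shape" from raw pixels.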

To go further, we must not rely entirely on supervised training, but design semi-supervised learning methods. The first step is to teach the machine to recognize some features, then design a logical, dynamic learning mechanism that helps the machine build related concepts, so as to obtain the best generalization and summarization, a real cognitive effect. These dynamic learning mechanisms still require manually designed mathematical models, and I personally do not have a sufficient grasp of them at present.

But I can share some personal thoughts. One is the dynamic online random-forest classification method; I suggest you look at the relevant literature. In short, random forests can complete a classification task with fewer samples, and they can also learn incrementally: train on 10,000 samples at once and the final result is basically the same as training 100 times on 100 samples each. This mechanism allows a concept to be learned in stages, with accuracy rising the more you learn, which matches the way continuous learning and cognition work.

Another is a dynamic template-memory method, similar in spirit to LSTM networks: how to learn the key features and have a trigger mechanism for recalling them. Cognition is a process; everything has different stages, such as morning, noon, and evening, so these dynamic templates need a sequential memory mechanism.

In general, these are just some ideas. In deep learning, thinking must stay innovative; we cannot be limited to the current achievements.

 

Origin: blog.csdn.net/sun19890716/article/details/84336638