[Object Detection Series] Training YOLOv3 / YOLOv4 on Your Own Data (PyTorch Version)

 

Main content: a bit of theory, how to prepare your own dataset and run training and prediction, and how to call the trained model from VS2017 on Windows.

PS: The yolov3 and yolov4 workflows are unified; the difference is which cfg file you use and which pre-trained weights you load. During training I found that GPU utilization fluctuates for v4 while v3 stays at a steady value, and that v4 needs a much smaller batch size than v3 and trains more slowly.

1. Code address:

https://github.com/ultralytics/yolov3

2. Principle: This section mainly covers the network structure, because without it the cfg file does not map onto the code. The structure is the familiar YOLOv3 diagram: the red part can be regarded as the backbone, e.g. darknet53; the blue part is the output at the 13*13 scale, the orange part the output at the 26*26 scale, and the green part the output at the 52*52 scale. Where does 13*13 come from? The original 416*416 image downsampled 32 times gives 13. This diagram corresponds one-to-one with many parameters in the cfg file; for the mapping, see https://blog.csdn.net/gbz3300255/article/details/106255335 . The focus here is on input and output, because those are what must match the code.

There are three colored circles because detection happens at three scales. Each of the three detections has a different receptive field: the 32x-downsampled scale has the largest receptive field and suits large targets, the 16x scale suits medium-sized objects, and the 8x scale has the smallest receptive field and suits small targets. The concrete anchor box sizes are set in the cfg file. In the figure below, red marks the anchor centers.

How the loss function is computed and how the prior (anchor) boxes are built will not be explained here; only the input and output are covered.

The input and output of the above code are as follows

Input: assume a 416*416*3 image. (The program's default input size is 320*640; pay attention to this when training your own data.)

Output: [1, (13*13 + 26*26 + 52*52)*3, 85], with all scales flattened along one dimension.

What is (13*13 + 26*26 + 52*52)*3? It is the total number of detection centers, multiplied by 3 because each center has 3 prior boxes. So (13*13 + 26*26 + 52*52)*3 is the total number of candidate boxes.

What is 85? It is the dimension of the feature vector of one point on the 13*13, 26*26, or 52*52 feature map above. Where does it come from? The network detects 80 categories, so each box has 80 probability values giving the credibility of each class, plus 4 values for the box position (x, y, w, h), plus 1 objectness confidence for the box. So the feature vector of each box is (4 + 1 + 80) = 85 dimensional.
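As a quick sanity check of these numbers, here is a throwaway Python snippet (my own arithmetic, not part of the repo):

cells = 13*13 + 26*26 + 52*52   # detection centers over the three scales: 169 + 676 + 2704
num_boxes = cells * 3           # 3 prior boxes per center -> 10647
vec_len = 4 + 1 + 80            # (x, y, w, h) + objectness + 80 class scores -> 85
print(num_boxes, vec_len)       # 10647 85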

The reason for belaboring these numbers is that they are exactly the values we have to change in the cfg file when training on our own data.

3. Steps to train your own data set:

1. Prepare the dataset: first understand what a yolov3 dataset looks like. Put plainly, each image has a corresponding label file; a pile of images plus a corresponding pile of label files make up the image set and the label set. The label file names correspond one-to-one with the image names. Each line of a label file contains: class index, box x center, box y center, box width, box height. Note that the class index is simply 0 1 2 3 4 ... and the remaining values are floats normalized by the image width or height. The figure below makes this very concrete.

To get started quickly and see results, I downloaded a ready-made dataset: CCTSDB. A small program reads out the boxes and writes them into the form shown in the man007.txt example above.

The conversion code is not convenient to post, but what it does is simple: read the CCTSDB dataset, read each image and its corresponding annotation file, extract the class and the box, number the classes 0 1 2 ..., convert the box values as described above, and write one line per box, such as:

0 0.669 0.5785714285714286 0.032 0.08285714285714285
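A minimal sketch of that conversion (my own illustration, not the author's script), assuming the source annotation gives a pixel-space box as (xmin, ymin, xmax, ymax) plus a class index:

def to_yolo_line(cls_id, xmin, ymin, xmax, ymax, img_w, img_h):
    # convert a pixel-space corner box to the normalized YOLO label format
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / float(img_w)
    h = (ymax - ymin) / float(img_h)
    return "%d %s %s %s %s" % (cls_id, x_center, y_center, w, h)

# example: a 32x58-pixel box in a 1000x700 image reproduces the sample line above
print(to_yolo_line(0, 653, 376, 685, 434, 1000, 700))
# -> 0 0.669 0.5785714285714286 0.032 0.08285714285714285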

1.1 Prepare the text files: train.txt, test.txt, val.txt, and the label files.

train.txt records the image names of the training set, similar to the list below; the dataset images themselves are stored under the /data/images/ directory. (A sketch for generating these three txt files is given at the end of this subsection.)

BloodImage_00091
BloodImage_00156
BloodImage_00389
BloodImage_00030
BloodImage_00124
BloodImage_00278
BloodImage_00261

test.txt has the same format as above; its content is the file names of the images to test.

BloodImage_00258
BloodImage_00320
BloodImage_00120

val.txt has the same format as above; its content is the file names of the images in the validation set.

BloodImage_00777
BloodImage_00951

Label text files: each image in images has a corresponding label file, in the form below, named like BloodImage_00091.txt.

0 0.669 0.5785714285714286 0.032 0.08285714285714285

The label files all go under /data/labels/ of the repo above.
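Here is a minimal sketch for generating the three txt files (my own illustration, assuming the images live in data/images/ and a roughly 80/10/10 split; note that some repo versions expect full relative paths such as data/images/xxx.jpg instead of bare names):

import os
import random

names = [os.path.splitext(f)[0] for f in os.listdir('data/images')
         if f.lower().endswith(('.jpg', '.png'))]
random.shuffle(names)
n = len(names)
splits = {'data/train.txt': names[:int(0.8*n)],
          'data/val.txt':   names[int(0.8*n):int(0.9*n)],
          'data/test.txt':  names[int(0.9*n):]}
for path, subset in splits.items():
    with open(path, 'w') as f:
        f.write('\n'.join(subset) + '\n')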

1.2 Prepare the rbc.data file. Name it whatever you like, just remember to pass that same file name to the program as an argument. The content is as follows.

The first line is the number of classes; the following lines are the path of the training list, the path of the test list, and the path of the text file holding the class names.

classes=4
train=data/train.txt
valid=data/test.txt
names=data/rbc.names
backup=backup/
eval=coco

1.3 Prepare the rbc.names file. Again, name it whatever you like and remember to pass that file name to the program. The content is as follows.

There are four classes here; if you are lazy, just write a, b, c, d and later change them to your own category names.

a
b
c
d

 

1.4 Prepare the image data: put the training images into images and the test images into samples. The images in images correspond one-to-one with the label files in labels.

The final storage structure sits under the data folder and looks roughly like the sketch below.
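(The original figure is missing; this is my reconstruction of the layout from the description in this section.)

data/
  images/        # training images, e.g. BloodImage_00091.jpg
  labels/        # one label txt per image, e.g. BloodImage_00091.txt
  samples/       # test images for detect.py
  train.txt
  val.txt
  test.txt
  rbc.data
  rbc.names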


2. Modify the cfg file: decide which model you will use and modify the corresponding cfg file. For example, to train with yolov3, go to the cfg folder, find yolov3.cfg, and modify it. I only changed the number of classes and the filters value, because filters depends on the number of classes. Looking at the yolov3 network structure, there are 3 places that need these changes (one per output head). As for other settings such as the anchor sizes: if the default anchors differ significantly from the targets you want to detect, it is recommended to re-cluster and compute a new set of anchors.

classes=4

# filters = 3 * (5 + classes)
filters=27   # 3 * (5 + 4) = 27
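For orientation, this is roughly what one of the three modified places looks like in yolov3.cfg (an illustrative excerpt with other fields omitted; the same pair of edits is repeated before each of the three [yolo] heads):

[convolutional]
size=1
stride=1
pad=1
filters=27          # 3 * (5 + classes)
activation=linear

[yolo]
mask = 6,7,8
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=4
num=9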

Modify the anchors: if the objects in your training set differ from the default image content, you must recompute them. The anchor-computation code is below, borrowed from the author of reference 1.

# -*- coding: utf-8 -*-
import numpy as np
import random
import argparse
import os

# command-line arguments
parser = argparse.ArgumentParser(description='Generate YOLO-V3 anchor boxes from label files\n')
parser.add_argument('--input_annotation_txt_dir', required=True, type=str, help='directory containing the label txt files (avoid non-ASCII paths)')
parser.add_argument('--output_anchors_txt', required=True, type=str, help='output text file for the anchor boxes')
parser.add_argument('--input_num_anchors', required=True, default=6, type=int, help='number of clusters (anchor boxes) to compute')
parser.add_argument('--input_cfg_width', required=True, type=int, help='width in the cfg file')
parser.add_argument('--input_cfg_height', required=True, type=int, help='height in the cfg file')
args = parser.parse_args()

'''
centroids: cluster centers, ndarray of shape (num, 2)
annotation_array: one annotation box (w, h)
'''
def IOU(annotation_array, centroids):
    similarities = []
    # one annotation box
    w, h = annotation_array
    for centroid in centroids:
        c_w, c_h = centroid
        if c_w >= w and c_h >= h:        # case 1: centroid contains the box
            similarity = w*h/(c_w*c_h)
        elif c_w >= w and c_h <= h:      # case 2
            similarity = w*c_h/(w*h + (c_w - w)*c_h)
        elif c_w <= w and c_h >= h:      # case 3
            similarity = c_w*h/(w*h + (c_h - h)*c_w)
        else:                            # case 4: box contains the centroid
            similarity = (c_w*c_h)/(w*h)
        similarities.append(similarity)
    # convert the list to a 1-D ndarray of shape (num,)
    return np.array(similarities, np.float32)

'''
k_means: k-means clustering
annotations_array: widths and heights of all N annotation boxes, ndarray of shape (N, 2)
centroids: cluster centers, ndarray of shape (num, 2)
'''
def k_means(annotations_array, centroids, eps=0.00005, iterations=200000):
    N = annotations_array.shape[0]
    num = centroids.shape[0]
    # loss and assignments from the previous iteration
    distance_sum_pre = -1
    assignments_pre = -1*np.ones(N, dtype=np.int64)
    iteration = 0
    while True:
        iteration += 1
        distances = []
        # distance (1 - IOU) from every annotation box to every cluster center
        for i in range(N):
            distance = 1 - IOU(annotations_array[i], centroids)
            distances.append(distance)
        distances_array = np.array(distances, np.float32)  # shape (N, num)
        # index of the nearest cluster center for every annotation box
        assignments = np.argmin(distances_array, axis=1)
        # total distance, i.e. the k-means loss
        distances_sum = np.sum(distances_array)
        # compute the new cluster centers
        centroid_sums = np.zeros(centroids.shape, np.float32)
        for i in range(N):
            centroid_sums[assignments[i]] += annotations_array[i]  # sum of boxes assigned to each cluster
        for j in range(num):
            centroids[j] = centroid_sums[j]/(np.sum(assignments == j))
        # change in loss between two consecutive iterations
        diff = abs(distances_sum - distance_sum_pre)
        print("iteration: {}, distance: {}, diff: {}, avg_IOU: {}\n".format(iteration, distances_sum, diff, np.sum(1-distances_array)/(N*num)))
        # three ways to leave the loop: assignments unchanged, loss change below eps, or iteration limit reached
        if (assignments == assignments_pre).all():
            print("stopping: cluster assignments did not change\n")
            break
        if diff < eps:
            print("stopping: loss change below eps\n")
            break
        if iteration > iterations:
            print("stopping: iteration limit reached\n")
            break
        # remember the previous iteration
        distance_sum_pre = distances_sum
        assignments_pre = assignments.copy()

if __name__ == '__main__':
    # number of cluster centers, i.e. number of anchor boxes
    num_clusters = args.input_num_anchors
    # every label file (.txt) in the annotation directory
    names = os.listdir(args.input_annotation_txt_dir)
    # widths and heights of the annotated boxes
    annotations_w_h = []
    for name in names:
        txt_path = os.path.join(args.input_annotation_txt_dir, name)
        # read every line of the label file
        f = open(txt_path, 'r')
        for line in f.readlines():
            line = line.rstrip('\n')
            w, h = line.split(' ')[3:]  # w and h are still strings here
            annotations_w_h.append((float(w), float(h)))
        f.close()
    # convert annotations_w_h to an ndarray of shape (N, 2), N = number of boxes
    annotations_array = np.array(annotations_w_h, dtype=np.float32)
    N = annotations_array.shape[0]
    # random initialization of the cluster centers for k-means
    random_indices = [random.randrange(N) for i in range(num_clusters)]
    centroids = annotations_array[random_indices]
    # k-means clustering
    k_means(annotations_array, centroids, 0.00005, 200000)
    # sort the centroids by width and write them to the output file
    widths = centroids[:, 0]
    sorted_indices = np.argsort(widths)
    anchors = centroids[sorted_indices]
    # write the anchors, scaled back to the cfg input resolution
    f_anchors = open(args.output_anchors_txt, 'w')
    for anchor in anchors:
        f_anchors.write('%d,%d' % (int(anchor[0]*args.input_cfg_width), int(anchor[1]*args.input_cfg_height)))
        f_anchors.write('\n')
    f_anchors.close()

The execution statement is as follows:

python kmean.py --input_annotation_txt_dir data/labels --output_anchors_txt 123456.txt --input_num_anchors 9 --input_cfg_width 640 --input_cfg_height 320

The resulting file is as follows:

12,15
14,20
18,25
24,32
24,18
33,44
39,28
59,49
115,72

Then write these anchors into the cfg file.
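In each of the three [yolo] sections the anchors line would then look something like this (illustrative; the mask of each [yolo] layer selects which of the 9 anchors it uses, with the largest anchors on the 13*13 scale and the smallest on the 52*52 scale):

anchors = 12,15, 14,20, 18,25, 24,32, 24,18, 33,44, 39,28, 59,49, 115,72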

3. Modify the code:

There are quite a few pitfalls here. For example, train.py defines its own hyperparameter dictionary, and some of the values configured in the cfg will simply not take effect.

I wanted to change the batch size; it lives in train.py, default 16:

parser.add_argument('--batch-size', type=int, default=16)  # effective bs = batch_size * accumulate = 16 * 4 = 64

There are many others, for example the switch between single-class and multi-class detection:

parser.add_argument('--single-cls', action='store_false', help='train as single-class dataset')

The optimizer choice, SGD or Adam:

parser.add_argument('--adam', action='store_true', help='use adam optimizer')

Learning rate: if you use Adam and find that the training loss drops very slowly and seems to stay large, just uncomment this line to lower the learning rate.

#hyp['lr0'] *= 0.1  # reduce lr (i.e. SGD=5E-3, Adam=5E-4)

Key point: some settings do not take effect even after being changed in the code above.

For example, it defaults to single-class detection; I changed it to multi-class and training was still wrong. It turned out the parameter value did not take effect, and in the end I forced it to.

4. Training: adjust the arguments to your own setup.

python train.py --data data/rbc.data --cfg cfg/yolov3.cfg --epochs 2000

If training is interrupted, a weight file is saved automatically (or copy one over yourself) into the weights folder. The load_darknet_weights call below is the part used by the next training run: it reads the weights from the previous run and continues training from there.

elif len(weights) > 0:  # darknet format
    # possible weights are '*.weights', 'yolov3-tiny.conv.15', 'darknet53.conv.74' etc.
    load_darknet_weights(model, weights)
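A resume run might then look like this (my guess at the invocation; the exact flag for passing the previous weights depends on the repo version, and a darknet .weights file would go through the load_darknet_weights branch shown above):

python train.py --data data/rbc.data --cfg cfg/yolov3.cfg --weights weights/last.pt --epochs 2000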

5. The most common errors during training:

Running out of GPU memory (CUDA out of memory): the fix is to reduce the batch size.

A pre-trained model is loaded inside the code; download one and place it at the path given by the parameter settings, for example the yolov3.weights file for yolov3.

6. Prediction:

python detect.py --cfg cfg/yolov3-tiny.cfg --weights weights/best.pt

The prediction result comes from the following line, after non-maximum suppression has been applied:

pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres,multi_label=False, 
                           classes=opt.classes, agnostic=opt.agnostic_nms)

Just find this line in the code. If, for example, there are 2 targets in the image, pred is a list in which each detection is a 1*6 row: the first 4 values hold the box coordinates, the 5th the confidence, and the 6th the class index, as shown below (here both boxes belong to class 1). Note that the coordinates are at the network input scale and still need to be mapped back to the original image.

tensor([[ 74.13127, 203.66556, 103.19365, 216.29456,   0.72875,   1.00000],
        [255.31650, 123.80228, 284.67999, 136.11815,   0.61970,   1.00000]],
        device='cuda:0')
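A minimal sketch of how such a result could be unpacked (my illustration following the repo's detect.py idiom, not verbatim repo code):

for det in pred:                      # one tensor per image in the batch
    if det is not None and len(det):
        for *xyxy, conf, cls in det:  # xyxy still at network-input scale
            print([float(v) for v in xyxy], float(conf), int(cls))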

The predicted boxes are then scaled back to the original image size by the line below. The stored format is (x0, y0, x1, y1): the top-left and bottom-right corner coordinates.

 det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

 

A pitfall I hit during prediction: I used two computers, one for training and one for testing, and got this error:

RuntimeError: Error(s) in loading state_dict for Darknet

Reason: an incompatibility introduced by different PyTorch versions.

Fix: modify the model-loading part in detect.py. I did not dig into the underlying reason; the fix follows reference 2.

Change:

model.load_state_dict(torch.load(weights, map_location=device)['model'])

to (the added False is the strict argument, so mismatched keys are ignored):

model.load_state_dict(torch.load(weights, map_location=device)['model'], False)

7. Isn't it much nicer to call the model from OpenCV?

1. See reference 5 for the details; I am carrying its code over here. It contains an error: if you use it directly you will find that the confidence of the detections is quite high but the boxes are nonsense O(∩_∩)O. That has been fixed below. The training above produces best.pt, while the VS2017 project below calls a .weights file, so a conversion is needed. There is a save_weights function in models.py that does the conversion directly. With the settings below, the .pt file (here last136.pt) becomes converted.weights. The remaining paths are absolute, so point them at your own files. Append this at the end of models.py to convert the .pt file into a .weights file.

if __name__ == '__main__':
	cfg='cfg/yolov3.cfg'
	weights='last136.pt'
	model = Darknet(cfg)
	if weights.endswith('.pt'):  # if PyTorch format
		model.load_state_dict(torch.load(weights, map_location='cpu')['model'], False)
		save_weights(model, path='converted.weights', cutoff=-1)
		print("Success: converted '%s' to 'converted.weights'" % weights)

2. Note that the image-scaling method below is not consistent with yolov3's own letterbox scaling; it really should be changed, but I was lazy and did not change it. The consequence of not changing it is that targets may go undetected. I used 1280*720 images and scaled them to 512 wide; the scaled image happens to be 512*288, which is already a multiple of 32. Keep in mind that the purpose of the scaling is to bring the width and height to multiples of 32 without distorting the original image (no stretching). In general you scale along one direction and then pad the other direction up to a multiple of 32. I wrote a piece of code for this and then realized my image size did not need it; it is included below with the padding filled in.

 

// Resize for YOLO: scale while keeping the aspect ratio, then pad up to a multiple of 32.
void YoloResize(Mat in, Mat &out)
{
	int w = in.cols;
	int h = in.rows;
	int target_w = 512;
	int target_h = 512;
	float ratio0 = (float)target_w / w;
	float ratio1 = (float)target_h / h;
	float scale = min(ratio0, ratio1);  // smallest scale so the image fits inside the target

	// at least one side matches the target size after scaling
	int nw = int(w * scale);
	int nh = int(h * scale);
	// scale the image without distorting it
	cv::resize(in, out, cv::Size(nw, nh), 0, 0, cv::INTER_CUBIC);
	// pad each side up to a multiple of 32, keeping the scaled image centered
	int pad_w = (32 - nw % 32) % 32;
	int pad_h = (32 - nh % 32) % 32;
	cv::copyMakeBorder(out, out, pad_h / 2, pad_h - pad_h / 2,
		pad_w / 2, pad_w - pad_w / 2,
		cv::BORDER_CONSTANT, cv::Scalar(128, 128, 128));
}

 

The complete calling code is here

// This code is written at BigVision LLC.
//It is subject to the license terms in the LICENSE file found in this distribution and at http://opencv.org/license.html

#include <fstream>
#include <sstream>
#include <iostream>
#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>


using namespace cv;
using namespace dnn;
using namespace std;

// Initialize the parameters
float confThreshold = 0.5; // Confidence threshold
float nmsThreshold = 0.4;  // Non-maximum suppression threshold
int inpWidth = 512;  // Width of network's input image
int inpHeight = 192; // Height of network's input image
vector<string> classes;

// Remove the bounding boxes with low confidence using non-maxima suppression
void postprocess(Mat& frame, const vector<Mat>& out);

// Draw the predicted bounding box
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame);

// Get the names of the output layers
vector<String> getOutputsNames(const Net& net);

int main(int argc, char** argv)
{

//*
	string classesFile = "E:\\LL\\rbc.names";
	ifstream ifs(classesFile.c_str());
	string line;
	while (getline(ifs, line)) classes.push_back(line);

	// Give the configuration and weight files for the model
	String modelConfiguration = "E:\\LL\\yolov3_new.cfg";
	String modelWeights = "E:\\LL\\converted.weights";

	// Load the network
	Net net = readNetFromDarknet(modelConfiguration, modelWeights);
	net.setPreferableBackend(DNN_BACKEND_OPENCV);
	net.setPreferableTarget(DNN_TARGET_CPU);

	// Open a video file or an image file or a camera stream.
	string str, outputFile;
	//VideoCapture cap("E:\\SSS.mp4");
	VideoWriter video;
	Mat frame, blob;



	// Create a window
	static const string kWinName = "Deep learning object detection in OpenCV";
	namedWindow(kWinName, WINDOW_NORMAL);

	// Process frames.
	while (waitKey(1) != 27)
	{
		// get frame from the video
		//cap >> frame;

		frame = imread("E:\\LL\\1.jpg");

		// Stop the program if reached end of video
		if (frame.empty()) {
			//waitKey(3000);
			break;
		}
		// Create a 4D blob from a frame.
		cout << "inpWidth = " << inpWidth << endl;
		cout << "inpHeight = " << inpHeight << endl;
		blobFromImage(frame, blob, 1 / 255.0, cv::Size(inpWidth, inpHeight), Scalar(0, 0, 0), true, false);

		//Sets the input to the network
		net.setInput(blob);

		// Runs the forward pass to get output of the output layers
		vector<Mat> outs;
		net.forward(outs, getOutputsNames(net));

		// Remove the bounding boxes with low confidence
		postprocess(frame, outs);

		// Put efficiency information. The function getPerfProfile returns the overall time for inference(t) and the timings for each of the layers(in layersTimes)
		vector<double> layersTimes;
		double freq = getTickFrequency() / 1000;
		double t = net.getPerfProfile(layersTimes) / freq;
		string label = format("Inference time for a frame : %.2f ms", t);
		putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 255));

		// Write the frame with the detection boxes
		Mat detectedFrame;
		frame.convertTo(detectedFrame, CV_8U);

		imshow(kWinName, frame);
		waitKey(100000);
	}

	//cap.release();

	
	//*/
	return 0;
}

// Remove the bounding boxes with low confidence using non-maxima suppression
void postprocess(Mat& frame, const vector<Mat>& outs)
{
	vector<int> classIds;
	vector<float> confidences;
	vector<Rect> boxes;

	for (size_t i = 0; i < outs.size(); ++i)
	{
		// Scan through all the bounding boxes output from the network and keep only the
		// ones with high confidence scores. Assign the box's class label as the class
		// with the highest score for the box.
		float* data = (float*)outs[i].data;
		for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols)
		{
			Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
			Point classIdPoint;
			double confidence;
			// Get the value and location of the maximum score
			minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
			if (confidence > confThreshold)
			{
				// box center and size are normalized; scale width by cols and height by rows
				int centerX = (int)(data[0] * frame.cols);
				int centerY = (int)(data[1] * frame.rows);
				int width = (int)(data[2] * frame.cols);
				int height = (int)(data[3] * frame.rows);
				int left = centerX - width / 2;
				int top = centerY - height / 2;

				classIds.push_back(classIdPoint.x);
				confidences.push_back((float)confidence);
				boxes.push_back(Rect(left, top, width, height));
			}
		}
	}

	// Perform non maximum suppression to eliminate redundant overlapping boxes with
	// lower confidences
	vector<int> indices;
	NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
	for (size_t i = 0; i < indices.size(); ++i)
	{
		int idx = indices[i];
		Rect box = boxes[idx];
		drawPred(classIds[idx], confidences[idx], box.x, box.y,
			box.x + box.width, box.y + box.height, frame);
	}
}

// Draw the predicted bounding box
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame)
{
	//Draw a rectangle displaying the bounding box
	rectangle(frame, Point(left, top), Point(right, bottom), Scalar(255, 178, 50), 3);

	//Get the label for the class name and its confidence
	string label = format("%.2f", conf);
	if (!classes.empty())
	{
		CV_Assert(classId < (int)classes.size());
		label = classes[classId] + ":" + label;
	}

	//Display the label at the top of the bounding box
	int baseLine;
	Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
	top = max(top, labelSize.height);
	rectangle(frame, Point(left, top - round(1.5*labelSize.height)), Point(left + round(1.5*labelSize.width), top + baseLine), Scalar(255, 255, 255), FILLED);
	putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.75, Scalar(0, 0, 0), 1);
}

// Get the names of the output layers
vector<String> getOutputsNames(const Net& net)
{
	static vector<String> names;
	if (names.empty())
	{
		//Get the indices of the output layers, i.e. the layers with unconnected outputs
		vector<int> outLayers = net.getUnconnectedOutLayers();

		//get the names of all the layers in the network
		vector<String> layersNames = net.getLayerNames();

		// Get the names of the output layers in names
		names.resize(outLayers.size());
		for (size_t i = 0; i < outLayers.size(); ++i)
			names[i] = layersNames[outLayers[i] - 1];
	}
	return names;
}

Finally, the result image: the detected targets are speed-limit signs.

8. All for speed: an OpenVINO acceleration route.

It seems that OpenVINO does not support darknet directly, so the model has to be converted first. Continue~~

https://www.cnblogs.com/jsxyhelu/p/11340822.html  (bookmarking this)

 


References:

1. https://blog.csdn.net/zhangping1987/article/details/84942680   anchor computation

2. https://blog.csdn.net/sinat_34054843/article/details/88046041   fix for the model-loading error

3. https://codeload.github.com/zqfang/YOLOv3_CPP/zip/master   C++ implementation of yolov3

4. https://blog.csdn.net/sue_kong/article/details/104401008   error handling when installing opencv 4.0

5. https://blog.csdn.net/zmdsjtu/article/details/81913927   calling the trained network weights from opencv for prediction

6. https://blog.csdn.net/hzqgangtiexia/article/details/80509211   about the learning rate

7. https://www.cnblogs.com/lvdongjie/p/11270447.html   also about the learning rate

 

To be continued...

 

Extra links: yolov3 loss function https://www.optbbs.com/thread-5590827-1-1.html

yolov3 loss function https://www.cnblogs.com/pprp/p/12590801.html

https://www.cnblogs.com/king-lps/p/9497836.html   an explanation of focal loss

 
