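YOLO Object Detection: A Summary (yolov3, yolov2)

Because of a project requirement I had to complete an object detection task. After some time of hands-on practice, I am summarizing here what I implemented and what I ran into, both for my own later reference and to save others some detours. Along the way I consulted a number of blogs, which helped with getting started and improving.

I spent most of my time on yolov3. Its accuracy is better than yolov2's, but training and testing take roughly twice as long. Once the yolov3 workflow is understood, getting yolov2 to run is only a matter of changing some parameters and the pretrained model. Note that if you want to train on your own dataset, it is best to have a server with a GPU: after roughly 30,000 iterations the loss typically drops to the order of 0.0x. On a CPU a single batch is very slow, so training on a CPU is not recommended. Testing on a CPU is fine, though, and there is a small trick (enabling OpenMP, see 2.1.1) that makes testing roughly twice as fast.

The post is divided into the following parts: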

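1. Preparing the tools

Hardware and software environment: a local MacBook Pro and an Alibaba Cloud server (P100 GPU)

1.1 Downloading the yolo network

Official yolo website: https://pjreddie.com/darknet/yolo/

GitHub project: https://github.com/pjreddie/darknet/tree/master/data

1.2 labelImg (has some shortcomings)

GitHub project: https://github.com/tzutalin/labelImg

2. Installation

2.1 Installing the yolo package

See the official documentation: https://pjreddie.com/darknet/install/

2.1.1 CPU version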

git clone https://github.com/pjreddie/darknet.git
cd darknet
make

If your machine supports OpenMP, you can also edit the Makefile and set OPENMP to 1, which speeds up training and testing:

GPU=0
CUDNN=0
OPENCV=0
# Set OPENMP to 1 if your machine supports OpenMP
OPENMP=0
DEBUG=0

2.1.2 GPU版本

git clone https://github.com/pjreddie/darknet.git
cd darknet
vim Makefile
make

The GPU build needs more changes in the Makefile:

# Set GPU to 1
GPU=1
# Set CUDNN to 1
CUDNN=1
# Set OPENCV to 1 if you later want to use OpenCV for processing
OPENCV=0
OPENMP=0
DEBUG=0

# Set ARCH according to your own GPU architecture; different architectures
# have different compute capabilities. This post uses a Pascal GPU, whose
# compute capability is 6.0 according to NVIDIA's website.
ARCH= -gencode arch=compute_30,code=sm_30 \
      -gencode arch=compute_35,code=sm_35 \
      -gencode arch=compute_50,code=[sm_50,compute_50] \
      -gencode arch=compute_52,code=[sm_52,compute_52] \
      -gencode arch=compute_60,code=[sm_60,compute_60]
#      -gencode arch=compute_20,code=[sm_20,sm_21] \ This one is deprecated?

# This is what I use, uncomment if you know your arch and want to specify
# ARCH= -gencode arch=compute_52,code=compute_52
VPATH=./src/:./examples
SLIB=libdarknet.so
ALIB=libdarknet.a
EXEC=darknet
OBJDIR=./obj/
CC=gcc
# If nvcc is not on your PATH, it is best to use its absolute path here,
# typically /usr/local/cuda/bin/nvcc
NVCC=nvcc

For the GPU build, edit the Makefile in the places indicated above.
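The ARCH entries above all follow one pattern: the compute capability with the dot removed appears in both the arch= and code= parts. As a small illustration of that pattern only, the helper below (gencode_flag is a hypothetical name, not part of darknet) builds an entry in the same form for a given capability string. Look up the capability of your own card on NVIDIA's website first.

```python
def gencode_flag(compute_capability):
    """Build a Makefile -gencode entry from a capability string such as '6.0'."""
    major, minor = compute_capability.split(".")
    sm = major + minor  # '6.0' -> '60'
    return "-gencode arch=compute_{0},code=[sm_{0},compute_{0}]".format(sm)


# Pascal P100 (compute capability 6.0), the card used in this post
print(gencode_flag("6.0"))
```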

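2.2 Installing labelImg

There are two ways to install it:

2.2.1 Installing from the source package:

For installing labelImg from its source package, see the GitHub page: https://github.com/tzutalin/labelImg

2.2.2 Installing with pip: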

pip install labelImg
or 
brew install labelImg

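Note: in practice I found that labelImg does not handle .png images well. It does not properly support annotating .png images, and even when an annotation is produced the label file is wrong.

3. Preparing and building the dataset

Preparing the dataset simply follows the tutorials available online.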
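The conversion script used in 3.2 reads image ids from VOCdevkit/VOC2007/ImageSets/Main/train.txt and test.txt, so those two files have to exist before it runs. A minimal sketch for producing them from the annotation folder is below. The 80/20 split, the fixed seed, and the function names are my own assumptions for illustration, not something prescribed by darknet.

```python
import os
import random


def split_ids(image_ids, test_ratio=0.2, seed=0):
    """Shuffle ids reproducibly and split them into (train, test) lists."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_ratio)
    return ids[n_test:], ids[:n_test]


def write_image_sets(voc_root, test_ratio=0.2, seed=0):
    """Write ImageSets/Main/train.txt and test.txt from the xml files in Annotations."""
    ann_dir = os.path.join(voc_root, "Annotations")
    ids = [os.path.splitext(f)[0] for f in os.listdir(ann_dir) if f.endswith(".xml")]
    train, test = split_ids(ids, test_ratio, seed)
    main_dir = os.path.join(voc_root, "ImageSets", "Main")
    os.makedirs(main_dir, exist_ok=True)
    for name, subset in (("train", train), ("test", test)):
        with open(os.path.join(main_dir, name + ".txt"), "w") as f:
            f.write("\n".join(subset) + "\n")
    return train, test


# Example call, assuming the usual VOC layout:
# write_image_sets("VOCdevkit/VOC2007")
```

Each output file holds one image id per line, without extension, which is the form the conversion script splits on.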

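3.1 Annotating the dataset

Several blogs explain how to use labelImg; see for example https://blog.csdn.net/xunan003/article/details/78720189/

A few key points are worth stressing:

Open Dir: the folder containing the images to annotate

Change Save Dir: changes the folder where the label files are saved

Next Image: click this after finishing an annotation to move on to the next image

Prev Image: goes back to check earlier annotations

PascalVOC/YOLO: selectable. The former produces label files in xml format; the latter directly produces label files in txt format. The txt format suits YOLO networks, while the xml format suits the RCNN family of models, so choose according to your needs. Because I had tried RCNN-family models earlier, I labeled in xml first. There is no need to worry about this, since darknet provides a conversion script, ./scripts/voc_label.py.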
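To make the difference between the two label formats concrete: a Pascal VOC xml stores each box as pixel corners (xmin, ymin, xmax, ymax), while a YOLO txt line stores the class index followed by the box center and size, each divided by the image width or height. The sketch below shows the arithmetic in both directions. The function names and the example numbers are made up for illustration, and it omits the one-pixel offset that darknet's own script applies to the center.

```python
def voc_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax):
    """Pixel corner box -> (x_center, y_center, width, height), normalized to [0, 1]."""
    return ((xmin + xmax) / 2.0 / img_w,
            (ymin + ymax) / 2.0 / img_h,
            (xmax - xmin) / float(img_w),
            (ymax - ymin) / float(img_h))


def yolo_to_voc(img_w, img_h, x, y, w, h):
    """Normalized center box -> pixel corners (xmin, ymin, xmax, ymax)."""
    return ((x - w / 2.0) * img_w, (y - h / 2.0) * img_h,
            (x + w / 2.0) * img_w, (y + h / 2.0) * img_h)


# A 640x480 image with one box of class 0 from (100, 120) to (300, 360)
box = voc_to_yolo(640, 480, 100, 120, 300, 360)
print("0 " + " ".join("%.6f" % v for v in box))
```

The inverse function is handy for drawing a converted label back onto its image to check that nothing went wrong.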

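3.2 Converting xml labels to yolo format

Anyone coming from the TensorFlow Object Detection API will have labels in xml format. Fortunately darknet ships a script that converts xml labels into txt labels, ./scripts/voc_label.py. Note the places that need to be modified: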


  
  
import xml.etree.ElementTree as ET
import pickle
import os
from os import listdir, getcwd
from os.path import join

# sets=[('2012', 'train'), ('2012', 'val'), ('2007', 'train'), ('2007', 'val'), ('2007', 'test')]
# The first element is the year, the second is the train/test image-set file
sets = [('2007', 'train'), ('2007', 'test')]

# classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
# classes holds your own class names
classes = ["1", "2", "3"]


def convert(size, box):
    # size is (width, height); box is (xmin, xmax, ymin, ymax) in pixels
    dw = 1./(size[0])
    dh = 1./(size[1])
    x = (box[0] + box[1])/2.0 - 1
    y = (box[2] + box[3])/2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x, y, w, h)


def convert_annotation(year, image_id):
    in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml' % (year, image_id))
    out_file = open('VOCdevkit/VOC%s/labels/%s.txt' % (year, image_id), 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        # skip classes we do not train on and objects marked as difficult
        if cls not in classes or int(difficult) == 1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
    in_file.close()
    out_file.close()


wd = getcwd()

for year, image_set in sets:
    if not os.path.exists('VOCdevkit/VOC%s/labels/' % (year)):
        os.makedirs('VOCdevkit/VOC%s/labels/' % (year))
    image_ids = open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt' % (year, image_set)).read().strip().split()
    list_file = open('%s_%s.txt' % (year, image_set), 'w')
    for image_id in image_ids:
        list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg\n' % (wd, year, image_id))
        convert_annotation(year, image_id)
    list_file.close()

# The last two lines are commented out here. The generated training and test
# lists together make up the whole dataset; the training and test sets are not
# merged into a single training set.
#os.system("cat 2007_train.txt 2007_val.txt 2012_train.txt 2012_val.txt > train.txt")
#os.system("cat 2007_train.txt 2007_val.txt 2007_test.txt 2012_train.txt 2012_val.txt > train.all.txt")

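After running it, the training-set and test-set txt files appear in the ./scripts folder.

The corresponding labels folder now contains the converted labels in txt format.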
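Before training it can be worth checking the converted labels, since a malformed label file tends to show up only much later as a loss that does not go down. The sketch below validates one label line and one label file. It assumes the usual YOLO line layout (class index followed by four normalized values), and the function names are hypothetical.

```python
def check_label_line(line, num_classes):
    """Return None if a YOLO label line looks valid, else a short error message."""
    parts = line.split()
    if len(parts) != 5:
        return "expected 5 fields, got %d" % len(parts)
    try:
        cls_id = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return "non-numeric field"
    if not 0 <= cls_id < num_classes:
        return "class id %d out of range" % cls_id
    if any(c < 0.0 or c > 1.0 for c in coords):
        return "coordinate outside [0, 1]"
    return None


def check_label_file(path, num_classes):
    """Collect (line_number, message) pairs for every bad line in one label file."""
    problems = []
    with open(path) as f:
        for n, line in enumerate(f, 1):
            if line.strip():
                msg = check_label_line(line, num_classes)
                if msg:
                    problems.append((n, msg))
    return problems
```

Running check_label_file over every txt file in the labels folder, with num_classes set to the length of the classes list in the conversion script, gives a quick list of files to re-annotate.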

4. Training and testing the network

4.1 Training the network

4.1.1 What needs to be changed

Edit cfg/voc.data


  
  
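# Mind the paths; both relative and absolute paths work
# Number of classes: set n to however many classes you have
classes = n
# The training-set txt generated above
train = ./scripts/2007_train.txt
valid = ./scripts/2007_test.txt
names = data/voc.names
# Where the trained network weights are saved
backup = ./results/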

Edit data/voc.names

Enter your label names in this file, one class per line. For example, with three classes named "ni", "hao" and "ma", the file contains:

ni
hao
ma
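Since classes in voc.data has to match the number of lines in voc.names, one way to keep the two in step is to generate both from a single list. The following is only a sketch under that assumption: the function name is hypothetical and the default paths are simply the ones used in this post.

```python
import os


def write_darknet_files(names, train_list, valid_list, backup_dir,
                        data_path="cfg/voc.data", names_path="data/voc.names"):
    """Write the .names file and a matching .data file; return the .data lines."""
    for p in (data_path, names_path):
        os.makedirs(os.path.dirname(p) or ".", exist_ok=True)
    with open(names_path, "w") as f:
        f.write("\n".join(names) + "\n")
    lines = [
        "classes = %d" % len(names),  # always equals the number of names
        "train = %s" % train_list,
        "valid = %s" % valid_list,
        "names = %s" % names_path,
        "backup = %s" % backup_dir,
    ]
    with open(data_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return lines


# Example call with the three classes from this post:
# write_darknet_files(["ni", "hao", "ma"], "./scripts/2007_train.txt",
#                     "./scripts/2007_test.txt", "./results/")
```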

Edit the network parameters in cfg/yolov3-voc.cfg

[net]
# Switch the header to training mode. Set batch according to what your
# machine can handle; the default is 64.
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=1000
# Maximum number of batches (iterations)
max_batches = 50200
policy=steps
# Iterations at which the learning rate changes; here it is multiplied by 0.1 each time
steps=20000,35000
scales=.1,.1
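To see what the learning_rate, burn_in, steps and scales settings in the [net] section amount to, the sketch below computes the learning rate at a given iteration. It reflects my understanding of darknet's steps policy: during burn_in the rate ramps up as (iteration / burn_in) raised to a power, and afterwards it is multiplied by each scale once the corresponding step is reached. The exponent of 4 is an assumption about darknet's default and is not set anywhere in the cfg shown here.

```python
def learning_rate(iteration, base_lr=0.001, burn_in=1000,
                  steps=(20000, 35000), scales=(0.1, 0.1), power=4.0):
    """Approximate learning rate at a given iteration for policy=steps."""
    if iteration < burn_in:
        # warm-up: ramp from 0 up to base_lr over the first burn_in iterations
        return base_lr * (float(iteration) / burn_in) ** power
    rate = base_lr
    for step, scale in zip(steps, scales):
        if step > iteration:
            return rate
        rate *= scale
    return rate


for it in (500, 1000, 19999, 20000, 35000):
    print(it, learning_rate(it))
```

With the values from this post the rate is 0.001 from iteration 1000, drops to about 0.0001 at iteration 20000, and to about 0.00001 at iteration 35000.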

  
  
[convolutional]
size=1
stride=1
pad=1
# filters depends on the number of classes; the usual formula is (classes + 5) * 3
filters=24
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
# Change classes to your own number of classes
classes=3
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
random=0
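The filters value follows from what each anchor predicts: 4 box coordinates, 1 objectness score, and one score per class, for each of the 3 anchors listed in mask. The short sketch below just evaluates that formula (the function name is hypothetical). Note that yolov3-voc.cfg contains three [yolo] sections, so the filters and classes edits have to be repeated for each [yolo] section and for the [convolutional] section directly before it.

```python
def yolo_filters(num_classes, anchors_per_layer=3):
    """Filters in the conv layer before a [yolo] layer: (classes + 5) * anchors."""
    # 5 = 4 box coordinates + 1 objectness score per anchor
    return (num_classes + 5) * anchors_per_layer


for n in (1, 3, 20):
    print(n, yolo_filters(n))
```

For the 3 classes used in this post this gives 24, matching the cfg excerpt, and for the 20 VOC classes it gives 75.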

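4.1.2 Training from a pretrained model

Transfer learning is standard practice these days: part of the parameters of a mature network are reused directly, and the same is done here.

Download the pretrained model: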

wget https://pjreddie.com/media/files/darknet53.conv.74
  
  

Train:

./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74
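During training darknet prints one summary line per iteration in the form "iteration: loss, average loss avg, rate rate, seconds seconds, images images" (this is the printf in train_detector). To follow how the loss develops, the output can be redirected to a file and parsed afterwards. The sketch below extracts the numbers from such a line. The sample line is made up for illustration, not real training output, and the function name is hypothetical.

```python
import re

# Matches lines such as "100: 12.345678, 15.432100 avg, 0.000100 rate, ..."
LOG_LINE = re.compile(r"^(\d+): (\S+), (\S+) avg, (\S+) rate")


def parse_log_line(line):
    """Return (iteration, loss, avg_loss, rate), or None for other output lines."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2)), float(m.group(3)), float(m.group(4))


sample = "100: 12.345678, 15.432100 avg, 0.000100 rate, 3.210000 seconds, 6400 images"
print(parse_log_line(sample))
```

Lines that do not match, such as the "Loaded: ... seconds" messages, come back as None and can simply be skipped.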
  
  

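4.1.3 Resuming training after an interruption

If training is interrupted partway through, or you stopped it and want to continue, just replace the pretrained weights at the end of the command above with a weights file from the backup path set earlier in voc.data. Here yolov3.weights stands for that file: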

./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg results/yolov3.weights
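According to train_detector, the backup directory receives files named after the cfg file: yolov3-voc.backup (rewritten every 100 iterations), yolov3-voc_<iteration>.weights (every 100 iterations up to 1000, then every 10000), and finally yolov3-voc_final.weights. A minimal sketch for picking the numbered checkpoint with the highest iteration is below. The function name is hypothetical, and in many cases the .backup file will be the more recent choice.

```python
import re


def latest_numbered_weights(filenames, base):
    """Return the '<base>_<iteration>.weights' name with the highest iteration, or None."""
    pattern = re.compile(r"^" + re.escape(base) + r"_(\d+)\.weights$")
    best = None
    for name in filenames:
        m = pattern.match(name)
        if m:
            iteration = int(m.group(1))
            if best is None or iteration > best[0]:
                best = (iteration, name)
    return best[1] if best else None


# Example call: latest_numbered_weights(os.listdir("results"), "yolov3-voc")
names = ["yolov3-voc_900.weights", "yolov3-voc_10000.weights", "yolov3-voc.backup"]
print(latest_numbered_weights(names, "yolov3-voc"))
```

Comparing the iteration as an integer matters here, since a plain string sort would rank 900 above 10000.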
  
  

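4.2 Testing the network

4.2.1 Testing a single image

Testing a single image means running the detector on one named image, much like the example on the darknet website; only the relevant paths and the image name need to change. The first command below is the website's example. The second runs a self-trained model; because the detect shortcut always loads cfg/coco.data, it uses detector test with cfg/voc.data instead:

./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg


./darknet detector test cfg/voc.data cfg/yolov3-voc.cfg results/yolov3.weights /path/to/your/picture.jpg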
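When only a handful of images need testing, one option that avoids touching the source is to call the single-image command once per image and give each result its own name with the -out option, which run_detector reads. The sketch below only builds the command lines; the default paths are the ones used in this post, and the function name and output folder are assumptions. Every call reloads the network, so this is much slower than a real batch mode.

```python
import os


def build_test_commands(image_paths, data="cfg/voc.data", cfg="cfg/yolov3-voc.cfg",
                        weights="results/yolov3.weights", out_dir="data/out"):
    """Build one 'darknet detector test' command (as an argument list) per image."""
    commands = []
    for path in image_paths:
        # name the result after the input image, without directory or extension
        stem = os.path.splitext(os.path.basename(path))[0]
        out = os.path.join(out_dir, stem)
        commands.append(["./darknet", "detector", "test", data, cfg, weights,
                         path, "-out", out])
    return commands


# To actually run them (needs a built darknet in the current directory and an
# existing output folder):
# import subprocess
# for cmd in build_test_commands(["data/dog.jpg"]):
#     subprocess.run(cmd, check=True)
for cmd in build_test_commands(["data/dog.jpg"]):
    print(" ".join(cmd))
```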
  
  

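4.2.2 Batch testing

Batch testing requires modifying the corresponding source code; see the blog https://blog.csdn.net/mieleizhi0522/article/details/79989754

One problem with that version is that a saved detection image cannot keep the same name as the original image, and the name sometimes comes out as null. The GetFilename function below is modified on that basis.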


  
  
#include "darknet.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>   /* mkdir */
#include <unistd.h>     /* access */

static int coco_ids[] = {1,2,3,4,5,6,7,8,9,10,11,13,14,15,16,17,18,19,20,21,22,23,24,25,27,28,31,32,33,34,35,36,37,38,39,40,41,42,43,44,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,67,70,72,73,74,75,76,77,78,79,80,81,82,84,85,86,87,88,89,90};

/* Get the file name of `fullname` without its directory and extension.
 * The result lives in a static buffer, so it stays valid after the call
 * (returning a local array here is what produced the null names), but it
 * is overwritten by the next call and is not thread-safe. */
char *GetFilename(char *fullname)
{
    static char name[256];
    name[0] = '\0';
    if (fullname == NULL) return name;

    /* skip the directory part, if there is one */
    char *base = strrchr(fullname, '/');
    base = base ? base + 1 : fullname;

    /* copy the base name and cut it at the last dot */
    strncpy(name, base, sizeof(name) - 1);
    name[sizeof(name) - 1] = '\0';
    char *dot = strrchr(name, '.');
    if (dot != NULL && dot != name) *dot = '\0';
    return name;
}
void train_detector(char *datacfg, char *cfgfile, char *weightfile, int *gpus, int ngpus, int clear)
{
    list *options = read_data_cfg(datacfg);
    char *train_images = option_find_str(options, "train", "data/train.list");
    char *backup_directory = option_find_str(options, "backup", "/backup/");
    srand(time(0));
    char *base = basecfg(cfgfile);
    printf("%s\n", base);
    float avg_loss = -1;
    network **nets = calloc(ngpus, sizeof(network));
    srand(time(0));
    int seed = rand();
    int i;
    for(i = 0; i < ngpus; ++i){
        srand(seed);
#ifdef GPU
        cuda_set_device(gpus[i]);
#endif
        nets[i] = load_network(cfgfile, weightfile, clear);
        nets[i]->learning_rate *= ngpus;
    }
    srand(time(0));
    network *net = nets[0];
    int imgs = net->batch * net->subdivisions * ngpus;
    printf("Learning Rate: %g, Momentum: %g, Decay: %g\n", net->learning_rate, net->momentum, net->decay);
    data train, buffer;
    layer l = net->layers[net->n - 1];
    int classes = l.classes;
    float jitter = l.jitter;
    list *plist = get_paths(train_images);
    //int N = plist->size;
    char **paths = (char **)list_to_array(plist);
    load_args args = get_base_args(net);
    args.coords = l.coords;
    args.paths = paths;
    args.n = imgs;
    args.m = plist->size;
    args.classes = classes;
    args.jitter = jitter;
    args.num_boxes = l.max_boxes;
    args.d = &buffer;
    args.type = DETECTION_DATA;
    //args.type = INSTANCE_DATA;
    args.threads = 64;
    pthread_t load_thread = load_data(args);
    double time;
    int count = 0;
    //while(i*imgs < N*120){
    while(get_current_batch(net) < net->max_batches){
        if(l.random && count++%10 == 0){
            printf("Resizing\n");
            int dim = (rand() % 10 + 10) * 32;
            if (get_current_batch(net)+200 > net->max_batches) dim = 608;
            //int dim = (rand() % 4 + 16) * 32;
            printf("%d\n", dim);
            args.w = dim;
            args.h = dim;
            pthread_join(load_thread, 0);
            train = buffer;
            free_data(train);
            load_thread = load_data(args);
            #pragma omp parallel for
            for(i = 0; i < ngpus; ++i){
                resize_network(nets[i], dim, dim);
            }
            net = nets[0];
        }
        time=what_time_is_it_now();
        pthread_join(load_thread, 0);
        train = buffer;
        load_thread = load_data(args);
        /*
           int k;
           for(k = 0; k < l.max_boxes; ++k){
           box b = float_to_box(train.y.vals[10] + 1 + k*5);
           if(!b.x) break;
           printf("loaded: %f %f %f %f\n", b.x, b.y, b.w, b.h);
           }
         */
        /*
           int zz;
           for(zz = 0; zz < train.X.cols; ++zz){
           image im = float_to_image(net->w, net->h, 3, train.X.vals[zz]);
           int k;
           for(k = 0; k < l.max_boxes; ++k){
           box b = float_to_box(train.y.vals[zz] + k*5, 1);
           printf("%f %f %f %f\n", b.x, b.y, b.w, b.h);
           draw_bbox(im, b, 1, 1,0,0);
           }
           show_image(im, "truth11");
           cvWaitKey(0);
           save_image(im, "truth11");
           }
         */
        printf("Loaded: %lf seconds\n", what_time_is_it_now()-time);
        time=what_time_is_it_now();
        float loss = 0;
#ifdef GPU
        if(ngpus == 1){
            loss = train_network(net, train);
        } else {
            loss = train_networks(nets, ngpus, train, 4);
        }
#else
        loss = train_network(net, train);
#endif
        if (avg_loss < 0) avg_loss = loss;
        avg_loss = avg_loss*.9 + loss*.1;
        i = get_current_batch(net);
        printf("%ld: %f, %f avg, %f rate, %lf seconds, %d images\n", get_current_batch(net), loss, avg_loss, get_current_rate(net), what_time_is_it_now()-time, i*imgs);
        if(i%100==0){
#ifdef GPU
            if(ngpus != 1) sync_nets(nets, ngpus, 0);
#endif
            char buff[256];
            sprintf(buff, "%s/%s.backup", backup_directory, base);
            save_weights(net, buff);
        }
        if(i%10000==0 || (i < 1000 && i%100 == 0)){
#ifdef GPU
            if(ngpus != 1) sync_nets(nets, ngpus, 0);
#endif
            char buff[256];
            sprintf(buff, "%s/%s_%d.weights", backup_directory, base, i);
            save_weights(net, buff);
        }
        free_data(train);
    }
#ifdef GPU
    if(ngpus != 1) sync_nets(nets, ngpus, 0);
#endif
    char buff[256];
    sprintf(buff, "%s/%s_final.weights", backup_directory, base);
    save_weights(net, buff);
}
static int get_coco_image_id(char *filename)
{
    char *p = strrchr(filename, '/');
    char *c = strrchr(filename, '_');
    if(c) p = c;
    return atoi(p+1);
}

static void print_cocos(FILE *fp, char *image_path, detection *dets, int num_boxes, int classes, int w, int h)
{
    int i, j;
    int image_id = get_coco_image_id(image_path);
    for(i = 0; i < num_boxes; ++i){
        float xmin = dets[i].bbox.x - dets[i].bbox.w/2.;
        float xmax = dets[i].bbox.x + dets[i].bbox.w/2.;
        float ymin = dets[i].bbox.y - dets[i].bbox.h/2.;
        float ymax = dets[i].bbox.y + dets[i].bbox.h/2.;
        if (xmin < 0) xmin = 0;
        if (ymin < 0) ymin = 0;
        if (xmax > w) xmax = w;
        if (ymax > h) ymax = h;
        float bx = xmin;
        float by = ymin;
        float bw = xmax - xmin;
        float bh = ymax - ymin;
        for(j = 0; j < classes; ++j){
            if (dets[i].prob[j]) fprintf(fp, "{\"image_id\":%d, \"category_id\":%d, \"bbox\":[%f, %f, %f, %f], \"score\":%f},\n", image_id, coco_ids[j], bx, by, bw, bh, dets[i].prob[j]);
        }
    }
}
void print_detector_detections(FILE **fps, char *id, detection *dets, int total, int classes, int w, int h)
{
    int i, j;
    for(i = 0; i < total; ++i){
        float xmin = dets[i].bbox.x - dets[i].bbox.w/2. + 1;
        float xmax = dets[i].bbox.x + dets[i].bbox.w/2. + 1;
        float ymin = dets[i].bbox.y - dets[i].bbox.h/2. + 1;
        float ymax = dets[i].bbox.y + dets[i].bbox.h/2. + 1;
        if (xmin < 1) xmin = 1;
        if (ymin < 1) ymin = 1;
        if (xmax > w) xmax = w;
        if (ymax > h) ymax = h;
        for(j = 0; j < classes; ++j){
            if (dets[i].prob[j]) fprintf(fps[j], "%s %f %f %f %f %f\n", id, dets[i].prob[j],
                    xmin, ymin, xmax, ymax);
        }
    }
}

void print_imagenet_detections(FILE *fp, int id, detection *dets, int total, int classes, int w, int h)
{
    int i, j;
    for(i = 0; i < total; ++i){
        float xmin = dets[i].bbox.x - dets[i].bbox.w/2.;
        float xmax = dets[i].bbox.x + dets[i].bbox.w/2.;
        float ymin = dets[i].bbox.y - dets[i].bbox.h/2.;
        float ymax = dets[i].bbox.y + dets[i].bbox.h/2.;
        if (xmin < 0) xmin = 0;
        if (ymin < 0) ymin = 0;
        if (xmax > w) xmax = w;
        if (ymax > h) ymax = h;
        for(j = 0; j < classes; ++j){
            int class = j;
            if (dets[i].prob[class]) fprintf(fp, "%d %d %f %f %f %f %f\n", id, j+1, dets[i].prob[class],
                    xmin, ymin, xmax, ymax);
        }
    }
}
void validate_detector_flip(char *datacfg, char *cfgfile, char *weightfile, char *outfile)
{
    int j;
    list *options = read_data_cfg(datacfg);
    char *valid_images = option_find_str(options, "valid", "data/train.list");
    char *name_list = option_find_str(options, "names", "data/names.list");
    char *prefix = option_find_str(options, "results", "results");
    char **names = get_labels(name_list);
    char *mapf = option_find_str(options, "map", 0);
    int *map = 0;
    if (mapf) map = read_map(mapf);
    network *net = load_network(cfgfile, weightfile, 0);
    set_batch_network(net, 2);
    fprintf(stderr, "Learning Rate: %g, Momentum: %g, Decay: %g\n", net->learning_rate, net->momentum, net->decay);
    srand(time(0));
    list *plist = get_paths(valid_images);
    char **paths = (char **)list_to_array(plist);
    layer l = net->layers[net->n-1];
    int classes = l.classes;
    char buff[1024];
    char *type = option_find_str(options, "eval", "voc");
    FILE *fp = 0;
    FILE **fps = 0;
    int coco = 0;
    int imagenet = 0;
    if(0==strcmp(type, "coco")){
        if(!outfile) outfile = "coco_results";
        snprintf(buff, 1024, "%s/%s.json", prefix, outfile);
        fp = fopen(buff, "w");
        fprintf(fp, "[\n");
        coco = 1;
    } else if(0==strcmp(type, "imagenet")){
        if(!outfile) outfile = "imagenet-detection";
        snprintf(buff, 1024, "%s/%s.txt", prefix, outfile);
        fp = fopen(buff, "w");
        imagenet = 1;
        classes = 200;
    } else {
        if(!outfile) outfile = "comp4_det_test_";
        fps = calloc(classes, sizeof(FILE *));
        for(j = 0; j < classes; ++j){
            snprintf(buff, 1024, "%s/%s%s.txt", prefix, outfile, names[j]);
            fps[j] = fopen(buff, "w");
        }
    }
    int m = plist->size;
    int i=0;
    int t;
    float thresh = .005;
    float nms = .45;
    int nthreads = 4;
    image *val = calloc(nthreads, sizeof(image));
    image *val_resized = calloc(nthreads, sizeof(image));
    image *buf = calloc(nthreads, sizeof(image));
    image *buf_resized = calloc(nthreads, sizeof(image));
    pthread_t *thr = calloc(nthreads, sizeof(pthread_t));
    image input = make_image(net->w, net->h, net->c*2);
    load_args args = {0};
    args.w = net->w;
    args.h = net->h;
    //args.type = IMAGE_DATA;
    args.type = LETTERBOX_DATA;
    for(t = 0; t < nthreads; ++t){
        args.path = paths[i+t];
        args.im = &buf[t];
        args.resized = &buf_resized[t];
        thr[t] = load_data_in_thread(args);
    }
    double start = what_time_is_it_now();
    for(i = nthreads; i < m+nthreads; i += nthreads){
        fprintf(stderr, "%d\n", i);
        for(t = 0; t < nthreads && i+t-nthreads < m; ++t){
            pthread_join(thr[t], 0);
            val[t] = buf[t];
            val_resized[t] = buf_resized[t];
        }
        for(t = 0; t < nthreads && i+t < m; ++t){
            args.path = paths[i+t];
            args.im = &buf[t];
            args.resized = &buf_resized[t];
            thr[t] = load_data_in_thread(args);
        }
        for(t = 0; t < nthreads && i+t-nthreads < m; ++t){
            char *path = paths[i+t-nthreads];
            char *id = basecfg(path);
            copy_cpu(net->w*net->h*net->c, val_resized[t].data, 1, input.data, 1);
            flip_image(val_resized[t]);
            copy_cpu(net->w*net->h*net->c, val_resized[t].data, 1, input.data + net->w*net->h*net->c, 1);
            network_predict(net, input.data);
            int w = val[t].w;
            int h = val[t].h;
            int num = 0;
            detection *dets = get_network_boxes(net, w, h, thresh, .5, map, 0, &num);
            if (nms) do_nms_sort(dets, num, classes, nms);
            if (coco){
                print_cocos(fp, path, dets, num, classes, w, h);
            } else if (imagenet){
                print_imagenet_detections(fp, i+t-nthreads+1, dets, num, classes, w, h);
            } else {
                print_detector_detections(fps, id, dets, num, classes, w, h);
            }
            free_detections(dets, num);
            free(id);
            free_image(val[t]);
            free_image(val_resized[t]);
        }
    }
    for(j = 0; j < classes; ++j){
        if(fps) fclose(fps[j]);
    }
    if(coco){
        fseek(fp, -2, SEEK_CUR);
        fprintf(fp, "\n]\n");
        fclose(fp);
    }
    fprintf(stderr, "Total Detection Time: %f Seconds\n", what_time_is_it_now() - start);
}
void validate_detector(char *datacfg, char *cfgfile, char *weightfile, char *outfile)
{
    int j;
    list *options = read_data_cfg(datacfg);
    char *valid_images = option_find_str(options, "valid", "data/train.list");
    char *name_list = option_find_str(options, "names", "data/names.list");
    char *prefix = option_find_str(options, "results", "results");
    char **names = get_labels(name_list);
    char *mapf = option_find_str(options, "map", 0);
    int *map = 0;
    if (mapf) map = read_map(mapf);
    network *net = load_network(cfgfile, weightfile, 0);
    set_batch_network(net, 1);
    fprintf(stderr, "Learning Rate: %g, Momentum: %g, Decay: %g\n", net->learning_rate, net->momentum, net->decay);
    srand(time(0));
    list *plist = get_paths(valid_images);
    char **paths = (char **)list_to_array(plist);
    layer l = net->layers[net->n-1];
    int classes = l.classes;
    char buff[1024];
    char *type = option_find_str(options, "eval", "voc");
    FILE *fp = 0;
    FILE **fps = 0;
    int coco = 0;
    int imagenet = 0;
    if(0==strcmp(type, "coco")){
        if(!outfile) outfile = "coco_results";
        snprintf(buff, 1024, "%s/%s.json", prefix, outfile);
        fp = fopen(buff, "w");
        fprintf(fp, "[\n");
        coco = 1;
    } else if(0==strcmp(type, "imagenet")){
        if(!outfile) outfile = "imagenet-detection";
        snprintf(buff, 1024, "%s/%s.txt", prefix, outfile);
        fp = fopen(buff, "w");
        imagenet = 1;
        classes = 200;
    } else {
        if(!outfile) outfile = "comp4_det_test_";
        fps = calloc(classes, sizeof(FILE *));
        for(j = 0; j < classes; ++j){
            snprintf(buff, 1024, "%s/%s%s.txt", prefix, outfile, names[j]);
            fps[j] = fopen(buff, "w");
        }
    }
    int m = plist->size;
    int i=0;
    int t;
    float thresh = .005;
    float nms = .45;
    int nthreads = 4;
    image *val = calloc(nthreads, sizeof(image));
    image *val_resized = calloc(nthreads, sizeof(image));
    image *buf = calloc(nthreads, sizeof(image));
    image *buf_resized = calloc(nthreads, sizeof(image));
    pthread_t *thr = calloc(nthreads, sizeof(pthread_t));
    load_args args = {0};
    args.w = net->w;
    args.h = net->h;
    //args.type = IMAGE_DATA;
    args.type = LETTERBOX_DATA;
    for(t = 0; t < nthreads; ++t){
        args.path = paths[i+t];
        args.im = &buf[t];
        args.resized = &buf_resized[t];
        thr[t] = load_data_in_thread(args);
    }
    double start = what_time_is_it_now();
    for(i = nthreads; i < m+nthreads; i += nthreads){
        fprintf(stderr, "%d\n", i);
        for(t = 0; t < nthreads && i+t-nthreads < m; ++t){
            pthread_join(thr[t], 0);
            val[t] = buf[t];
            val_resized[t] = buf_resized[t];
        }
        for(t = 0; t < nthreads && i+t < m; ++t){
            args.path = paths[i+t];
            args.im = &buf[t];
            args.resized = &buf_resized[t];
            thr[t] = load_data_in_thread(args);
        }
        for(t = 0; t < nthreads && i+t-nthreads < m; ++t){
            char *path = paths[i+t-nthreads];
            char *id = basecfg(path);
            float *X = val_resized[t].data;
            network_predict(net, X);
            int w = val[t].w;
            int h = val[t].h;
            int nboxes = 0;
            detection *dets = get_network_boxes(net, w, h, thresh, .5, map, 0, &nboxes);
            if (nms) do_nms_sort(dets, nboxes, classes, nms);
            if (coco){
                print_cocos(fp, path, dets, nboxes, classes, w, h);
            } else if (imagenet){
                print_imagenet_detections(fp, i+t-nthreads+1, dets, nboxes, classes, w, h);
            } else {
                print_detector_detections(fps, id, dets, nboxes, classes, w, h);
            }
            free_detections(dets, nboxes);
            free(id);
            free_image(val[t]);
            free_image(val_resized[t]);
        }
    }
    for(j = 0; j < classes; ++j){
        if(fps) fclose(fps[j]);
    }
    if(coco){
        fseek(fp, -2, SEEK_CUR);
        fprintf(fp, "\n]\n");
        fclose(fp);
    }
    fprintf(stderr, "Total Detection Time: %f Seconds\n", what_time_is_it_now() - start);
}
void validate_detector_recall(char *datacfg, char *cfgfile, char *weightfile)
{
    network *net = load_network(cfgfile, weightfile, 0);
    set_batch_network(net, 1);
    fprintf(stderr, "Learning Rate: %g, Momentum: %g, Decay: %g\n", net->learning_rate, net->momentum, net->decay);
    srand(time(0));
    list *options = read_data_cfg(datacfg);
    char *valid_images = option_find_str(options, "valid", "data/train.list");
    list *plist = get_paths(valid_images);
    char **paths = (char **)list_to_array(plist);
    layer l = net->layers[net->n-1];
    int j, k;
    int m = plist->size;
    int i=0;
    float thresh = .001;
    float iou_thresh = .5;
    float nms = .4;
    int total = 0;
    int correct = 0;
    int proposals = 0;
    float avg_iou = 0;
    for(i = 0; i < m; ++i){
        char *path = paths[i];
        image orig = load_image_color(path, 0, 0);
        image sized = resize_image(orig, net->w, net->h);
        char *id = basecfg(path);
        network_predict(net, sized.data);
        int nboxes = 0;
        detection *dets = get_network_boxes(net, sized.w, sized.h, thresh, .5, 0, 1, &nboxes);
        if (nms) do_nms_obj(dets, nboxes, 1, nms);
        char labelpath[4096];
        find_replace(path, "images", "labels", labelpath);
        find_replace(labelpath, "JPEGImages", "labels", labelpath);
        find_replace(labelpath, ".jpg", ".txt", labelpath);
        find_replace(labelpath, ".JPEG", ".txt", labelpath);
        int num_labels = 0;
        box_label *truth = read_boxes(labelpath, &num_labels);
        for(k = 0; k < nboxes; ++k){
            if(dets[k].objectness > thresh){
                ++proposals;
            }
        }
        for (j = 0; j < num_labels; ++j) {
            ++total;
            box t = {truth[j].x, truth[j].y, truth[j].w, truth[j].h};
            float best_iou = 0;
            for(k = 0; k < l.w*l.h*l.n; ++k){
                float iou = box_iou(dets[k].bbox, t);
                if(dets[k].objectness > thresh && iou > best_iou){
                    best_iou = iou;
                }
            }
            avg_iou += best_iou;
            if(best_iou > iou_thresh){
                ++correct;
            }
        }
        fprintf(stderr, "%5d %5d %5d\tRPs/Img: %.2f\tIOU: %.2f%%\tRecall:%.2f%%\n", i, correct, total, (float)proposals/(i+1), avg_iou*100/total, 100.*correct/total);
        free(id);
        free_image(orig);
        free_image(sized);
    }
}
void test_detector(char *datacfg, char *cfgfile, char *weightfile, char *filename, float thresh, float hier_thresh, char *outfile, int fullscreen)
{
    list *options = read_data_cfg(datacfg);
    char *name_list = option_find_str(options, "names", "data/names.list");
    char **names = get_labels(name_list);
    image **alphabet = load_alphabet();
    network *net = load_network(cfgfile, weightfile, 0);
    set_batch_network(net, 1);
    srand(2222222);
    double time;
    char buff[256];
    char *input = buff;
    float nms=.45;
    int i=0;
    while(1){
        if(filename){
            /* single image given on the command line */
            strncpy(input, filename, 256);
            image im = load_image_color(input, 0, 0);
            image sized = letterbox_image(im, net->w, net->h);
            //image sized = resize_image(im, net->w, net->h);
            //image sized2 = resize_max(im, net->w);
            //image sized = crop_image(sized2, -((net->w - sized2.w)/2), -((net->h - sized2.h)/2), net->w, net->h);
            //resize_network(net, sized.w, sized.h);
            layer l = net->layers[net->n-1];
            float *X = sized.data;
            time=what_time_is_it_now();
            network_predict(net, X);
            printf("%s: Predicted in %f seconds.\n", input, what_time_is_it_now()-time);
            int nboxes = 0;
            detection *dets = get_network_boxes(net, im.w, im.h, thresh, hier_thresh, 0, 1, &nboxes);
            //printf("%d\n", nboxes);
            //if (nms) do_nms_obj(boxes, probs, l.w*l.h*l.n, l.classes, nms);
            if (nms) do_nms_sort(dets, nboxes, l.classes, nms);
            draw_detections(im, dets, nboxes, thresh, names, alphabet, l.classes);
            free_detections(dets, nboxes);
            if(outfile)
            {
                save_image(im, outfile);
            }
            else{
                //save_image(im, "predictions");
                /* save the result under the original image name */
                char image[2048];
                sprintf(image, "./data/predict/%s",GetFilename(filename));
                save_image(im,image);
                printf("predict %s successfully!\n",GetFilename(filename));
#ifdef OPENCV
                cvNamedWindow("predictions", CV_WINDOW_NORMAL);
                if(fullscreen){
                    cvSetWindowProperty("predictions", CV_WND_PROP_FULLSCREEN, CV_WINDOW_FULLSCREEN);
                }
                show_image(im, "predictions");
                cvWaitKey(0);
                cvDestroyAllWindows();
#endif
            }
            free_image(im);
            free_image(sized);
            if (filename) break;
        }
        else {
            /* batch mode: read the path of a text file that lists the images */
            printf("Enter Image Path: ");
            fflush(stdout);
            input = fgets(input, 256, stdin);
            if(!input) return;
            strtok(input, "\n");
            list *plist = get_paths(input);
            char **paths = (char **)list_to_array(plist);
            printf("Start Testing!\n");
            int m = plist->size;
            if(access("./data/out", 0)==-1) // originally "/home/FENGsl/darknet/data"; change to your own path
            {
                if (mkdir("./data/out", 0777)) // originally "/home/FENGsl/darknet/data"; change to your own path
                {
                    printf("creat file bag failed!!!");
                }
            }
            for(i = 0; i < m; ++i){
                char *path = paths[i];
                image im = load_image_color(path, 0, 0);
                image sized = letterbox_image(im, net->w, net->h);
                //image sized = resize_image(im, net->w, net->h);
                //image sized2 = resize_max(im, net->w);
                //image sized = crop_image(sized2, -((net->w - sized2.w)/2), -((net->h - sized2.h)/2), net->w, net->h);
                //resize_network(net, sized.w, sized.h);
                layer l = net->layers[net->n-1];
                float *X = sized.data;
                time=what_time_is_it_now();
                network_predict(net, X);
                printf("Try Very Hard:");
                printf("%s: Predicted in %f seconds.\n", path, what_time_is_it_now()-time);
                int nboxes = 0;
                detection *dets = get_network_boxes(net, im.w, im.h, thresh, hier_thresh, 0, 1, &nboxes);
                //printf("%d\n", nboxes);
                //if (nms) do_nms_obj(boxes, probs, l.w*l.h*l.n, l.classes, nms);
                if (nms) do_nms_sort(dets, nboxes, l.classes, nms);
                draw_detections(im, dets, nboxes, thresh, names, alphabet, l.classes);
                free_detections(dets, nboxes);
                if(outfile){
                    save_image(im, outfile);
                }
                else{
                    char b[2048];
                    sprintf(b, "./data/out/%s",GetFilename(path)); // originally "/home/FENGsl/darknet/data"; change to your own path
                    save_image(im, b);
                    printf("save %s successfully!\n",GetFilename(path));
#ifdef OPENCV
                    cvNamedWindow("predictions", CV_WINDOW_NORMAL);
                    if(fullscreen){
                        cvSetWindowProperty("predictions", CV_WND_PROP_FULLSCREEN, CV_WINDOW_FULLSCREEN);
                    }
                    show_image(im, "predictions");
                    cvWaitKey(0);
                    cvDestroyAllWindows();
#endif
                }
                free_image(im);
                free_image(sized);
                if (filename) break;
            }
        }
    }
}
void run_detector(int argc, char **argv)
{
    char *prefix = find_char_arg(argc, argv, "-prefix", 0);
    float thresh = find_float_arg(argc, argv, "-thresh", .5);
    float hier_thresh = find_float_arg(argc, argv, "-hier", .5);
    int cam_index = find_int_arg(argc, argv, "-c", 0);
    int frame_skip = find_int_arg(argc, argv, "-s", 0);
    int avg = find_int_arg(argc, argv, "-avg", 3);
    if(argc < 4){
        fprintf(stderr, "usage: %s %s [train/test/valid] [cfg] [weights (optional)]\n", argv[0], argv[1]);
        return;
    }
    char *gpu_list = find_char_arg(argc, argv, "-gpus", 0);
    char *outfile = find_char_arg(argc, argv, "-out", 0);
    int *gpus = 0;
    int gpu = 0;
    int ngpus = 0;
    if(gpu_list){
        printf("%s\n", gpu_list);
        int len = strlen(gpu_list);
        ngpus = 1;
        int i;
        for(i = 0; i < len; ++i){
            if (gpu_list[i] == ',') ++ngpus;
        }
        gpus = calloc(ngpus, sizeof(int));
        for(i = 0; i < ngpus; ++i){
            gpus[i] = atoi(gpu_list);
            gpu_list = strchr(gpu_list, ',')+1;
        }
    } else {
        gpu = gpu_index;
        gpus = &gpu;
        ngpus = 1;
    }
    int clear = find_arg(argc, argv, "-clear");
    int fullscreen = find_arg(argc, argv, "-fullscreen");
    int width = find_int_arg(argc, argv, "-w", 0);
    int height = find_int_arg(argc, argv, "-h", 0);
    int fps = find_int_arg(argc, argv, "-fps", 0);
    //int class = find_int_arg(argc, argv, "-class", 0);
    char *datacfg = argv[3];
    char *cfg = argv[4];
    char *weights = (argc > 5) ? argv[5] : 0;
    char *filename = (argc > 6) ? argv[6] : 0;
    if(0==strcmp(argv[2], "test")) test_detector(datacfg, cfg, weights, filename, thresh, hier_thresh, outfile, fullscreen);
    else if(0==strcmp(argv[2], "train")) train_detector(datacfg, cfg, weights, gpus, ngpus, clear);
    else if(0==strcmp(argv[2], "valid")) validate_detector(datacfg, cfg, weights, outfile);
    else if(0==strcmp(argv[2], "valid2")) validate_detector_flip(datacfg, cfg, weights, outfile);
    else if(0==strcmp(argv[2], "recall")) validate_detector_recall(datacfg, cfg, weights);
    else if(0==strcmp(argv[2], "demo")) {
        list *options = read_data_cfg(datacfg);
        int classes = option_find_int(options, "classes", 20);
        char *name_list = option_find_str(options, "names", "data/names.list");
        char **names = get_labels(name_list);
        demo(cfg, weights, thresh, cam_index, filename, names, classes, frame_skip, prefix, avg, hier_thresh, width, height, fps, fullscreen);
    }
    //else if(0==strcmp(argv[2], "extract")) extract_detector(datacfg, cfg, weights, cam_index, filename, class, thresh, frame_skip);
    //else if(0==strcmp(argv[2], "censor")) censor_detector(datacfg, cfg, weights, cam_index, filename, class, thresh, frame_skip);
}

  
  
./darknet detector test cfg/voc.data cfg/yolov3-voc.cfg backup/yolov3-voc_final.weights
layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  32  0.299 BFLOPs
    1 conv     64  3 x 3 / 2   416 x 416 x  32   ->   208 x 208 x  64  1.595 BFLOPs
    .......
  104 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256  1.595 BFLOPs
  105 conv    255  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 255  0.353 BFLOPs
  106 detection
Loading weights from yolov3.weights...Done!
Enter Image Path:

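At this prompt the program asks for an image path. With the batch-testing change above, enter the path of a txt file that lists the image paths, one per line. I entered the 2007_test.txt generated earlier.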
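If you need to build such a list file for your own test images, a minimal Python sketch could look like the following. The helper name `write_image_list` and the set of extensions are illustrative assumptions; neither is part of darknet.

```python
import os

# Assumed set of image extensions; adjust to your dataset.
IMAGE_EXTS = (".jpg", ".jpeg", ".png")


def write_image_list(image_dir, list_path):
    """Hypothetical helper: write the absolute path of every image in
    image_dir to list_path, one path per line, and return the paths."""
    paths = []
    for name in sorted(os.listdir(image_dir)):
        if name.lower().endswith(IMAGE_EXTS):
            paths.append(os.path.abspath(os.path.join(image_dir, name)))
    with open(list_path, "w") as f:
        for p in paths:
            f.write(p + "\n")
    return paths
```

The resulting file can then be entered at the "Enter Image Path:" prompt in the same way as 2007_test.txt.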

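5. Python interface

darknet ships with a Python interface, so detection results can be obtained directly from Python. The interface lives in the `./darknet/python` folder and calls the libdarknet.so file produced at compile time. The compiled file differs between platforms, so recompile whenever you change machines or switch between CPU and GPU.

The folder contains two files: darknet.py and proverbot.py. The current version targets Python 2.7; with minor code changes it can be made to work with Python 3.x. I have uploaded my own version of the API to GitHub for convenience.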

python darknet.py
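One change that is typically needed for Python 3 is string handling: a ctypes parameter declared as `c_char_p` accepts bytes, not str, so any path passed into libdarknet.so has to be encoded first. The helper below is a minimal sketch; `to_c_path` is a hypothetical name, and the commented-out call assumes the `load_net` wrapper defined in darknet.py.

```python
import ctypes


def to_c_path(path):
    """Return path as bytes so it can be passed as a ctypes c_char_p."""
    if isinstance(path, bytes):
        return path
    return path.encode("utf-8")


# Under Python 3, c_char_p rejects str but accepts bytes.
cfg = ctypes.c_char_p(to_c_path("cfg/yolov3-voc.cfg"))
print(cfg.value)

# Assumed usage with the wrappers in darknet.py (not executed here):
# net = load_net(to_c_path("cfg/yolov3-voc.cfg"),
#                to_c_path("backup/yolov3-voc_final.weights"), 0)
```

Python 2 style `print` statements in the script also need to be rewritten as `print(...)` calls.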
  
  

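Each result is produced in the form: res.append((meta.names[i], dets[j].prob[i], (b.x, b.y, b.w, b.h)))

In order, these are the name of the detected object, its probability, and the bounding box (its position in the original image). Here x and y are the center of the box, and w and h are the box's full width and height, so the box extends w/2 to the left and right of the center and h/2 above and below it, as shown in the figure below:

My API changes this so that the output is the coordinate range of the box, (x1, x2, y1, y2). It also uses only three classes: tops, bottoms, and full-body outfits. If you want to use it, you only need to change the path of your trained model and the cfg configuration file.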
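The conversion from the center format to a coordinate range is simple arithmetic. The sketch below is written for illustration and is not taken from the repository linked here; the function names are made up, and it assumes w and h are the full width and height of the box.

```python
def center_to_corners(x, y, w, h):
    """Convert a center-format box (x, y, w, h) to (x1, x2, y1, y2)."""
    x1 = x - w / 2.0
    x2 = x + w / 2.0
    y1 = y - h / 2.0
    y2 = y + h / 2.0
    return x1, x2, y1, y2


def clip_to_image(box, img_w, img_h):
    """Clamp an (x1, x2, y1, y2) box to the image boundaries."""
    x1, x2, y1, y2 = box
    return (max(0.0, x1), min(float(img_w), x2),
            max(0.0, y1), min(float(img_h), y2))


print(center_to_corners(100, 50, 40, 20))  # (80.0, 120.0, 40.0, 60.0)
```

Clipping is optional, but predicted boxes can extend slightly past the image border, which matters if the coordinates are later used to crop the image.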

https://github.com/UncleLLD/img-detect-yolov3

6. Computing mAP and recall

1. Generate the detection result file

./darknet detector valid cfg/car.data cfg/car.cfg backup/car_final.weights -out car.txt -gpu 0 -thresh .5
  
  

2. Compute mAP from car.txt using voc_eval from Faster R-CNN

# File: /home/sam/src/caffeup2date_pyfasterrcnn/lib/datasets/compute_mAP.py
from voc_eval import voc_eval
print(voc_eval('/home/sam/src/darknet/results/{}.txt', '/home/sam/datasets/car2/VOC2007/Annotations/{}.xml', '/home/sam/datasets/car2/VOC2007/ImageSets/Main/test.txt', 'car', '.'))

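voc_eval returns (rec, prec, ap); the third value is the AP for that class, which with a single class is also the mAP.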
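For intuition about how that third value is obtained, here is a pure-Python sketch of the area-under-curve AP computation (the variant that does not use the VOC07 11-point metric): the precision curve is first made monotonically non-increasing, then integrated over recall. It is an illustration written for this post rather than the code in voc_eval.py, and `average_precision` is a made-up name.

```python
def average_precision(recall, precision):
    """Area under the precision envelope.

    recall must be non-decreasing; recall[i] and precision[i] describe
    the curve after the i-th highest-scoring detection.
    """
    # Add sentinel points at both ends of the curve.
    mrec = [0.0] + list(recall) + [1.0]
    mpre = [0.0] + list(precision) + [0.0]

    # Walk right to left so each precision is at least the one after it.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Sum rectangle areas wherever recall changes.
    ap = 0.0
    for i in range(1, len(mrec)):
        if mrec[i] != mrec[i - 1]:
            ap += (mrec[i] - mrec[i - 1]) * mpre[i]
    return ap


print(average_precision([0.5, 1.0], [1.0, 0.5]))  # 0.75
```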

To compute mAP only over detections with a confidence of 0.3 or higher, change the following code in voc_eval.py:


  
  
sorted_ind = np.argsort(-confidence)
sorted_ind1 = np.where(confidence[sorted_ind] >= .3)[0]  # np.argsort(-confidence<=-.3)
sorted_ind = sorted_ind[sorted_ind1]
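An alternative that leaves voc_eval.py untouched is to drop low-confidence detections from the result file before evaluation. The sketch below assumes each line has the form `image_id confidence xmin ymin xmax ymax`, which is the layout voc_eval parses; `filter_detections` is a hypothetical helper name.

```python
def filter_detections(lines, min_conf=0.3):
    """Keep only detection lines whose confidence is >= min_conf."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        if float(fields[1]) >= min_conf:
            kept.append(line.rstrip("\n"))
    return kept


sample = [
    "000001 0.92 10 20 110 220",
    "000001 0.12 15 25 90 200",
    "000002 0.30 5 5 50 50",
]
print(filter_detections(sample))
```

In practice you would read the lines from the result file, write the filtered lines to a new file, and point the first argument of voc_eval at that new file.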

3. Compute recall

./darknet detector recall cfg/car.data cfg/car.cfg backup/car_final.weights -out car.txt -gpu 0 -thresh .5
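Recall here is the fraction of ground-truth boxes that are matched by some detection with sufficient overlap. As a rough illustration of that idea (not a port of darknet's validate_detector_recall), the sketch below uses corner-format boxes (x1, y1, x2, y2) and an assumed IoU threshold of 0.5; all names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(gt_boxes, det_boxes, iou_thresh=0.5):
    """Fraction of ground-truth boxes whose best IoU exceeds iou_thresh."""
    if not gt_boxes:
        return 0.0
    matched = 0
    for gt in gt_boxes:
        best = max((iou(gt, d) for d in det_boxes), default=0.0)
        if best > iou_thresh:
            matched += 1
    return matched / float(len(gt_boxes))
```

For example, with two ground-truth boxes and a single detection that overlaps only the first one well, `recall` returns 0.5.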
  
  

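7. References

YOLO V3

References:

* Training your own VOC dataset with YOLOv3 (setup and training)

* Batch-testing images with YOLOv3 and saving them to a custom folder (batch testing)

  Note: on the naming of the images saved to the folder, the GetFilename(char *p) function limits the filename length; modify it if you need longer names.

* Setting up YOLOv3 on Ubuntu and training your own VOC-format dataset (setup and training)

* YOLOv3: training on your own data (mainly training; also useful for some testing issues)

* Meaning of the parameters in YOLO's terminal output

  English: https://timebutt.github.io/static/understanding-yolov2-training-output/

  Chinese: https://blog.csdn.net/dcrmg/article/details/78565440

* Official YOLO documentation: https://pjreddie.com/darknet/yolo/

YOLO V2

References:

* Notes on training your own dataset with YOLOv2, VOC format (visualization)

* Training your own dataset with YOLOv2

* Training your own data with YOLO v2

To be continued...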


Reposted from blog.csdn.net/qq_31511955/article/details/87871981