Object Detection: Installing, Training, and Testing SSD

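Copyright notice: this is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/lilai619/article/details/53791420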

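Note: this post was written when SSD had just been released. The latest version may differ, so if you run into problems, please consult the official repository.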

Environment: Ubuntu 14.04 + CUDA 7.5 + OpenCV 2.4.9

Installation

1. Download SSD

sudo apt-get install git

git clone https://github.com/weiliu89/caffe.git

cd caffe

git checkout ssd

Make sure the required packages are installed

sudo apt-get install python-pip

sudo apt-get install python-numpy

sudo apt-get install python-scipy

pip install cython -i http://pypi.douban.com/simple

pip install easydict
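
A quick way to confirm that the packages above are importable is a small check like the one below. This is only a sketch: the module list is an assumption based on the install commands above (the cython package is imported as Cython), and missing_modules is a hypothetical helper name.

```python
import importlib


def missing_modules(names):
    """Return the entries of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names for the packages installed above.
    required = ["numpy", "scipy", "Cython", "easydict"]
    print("missing: %s" % missing_modules(required))
```

An empty list means everything was found; any name that is printed still needs to be installed.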


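You can also choose where git clone places the repository: change to the target directory first, then run the commands above.

2. Edit Makefile.config

Copy Makefile.config.example in the repository root to Makefile.config (cp Makefile.config.example Makefile.config).

Adjust the following parameters to match your machine: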

CUDA_ARCH:

BLAS:
MATLAB_DIR: (optional)

PYTHON_INCLUDE:

3. Build
From the root of the source tree, run:

make -j8

make py

make test -j8

make runtest -j8 (optional)

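4. Troubleshooting build errors

Check the current gcc version:

gcc -v

Note: with CUDA 8.0, gcc must be upgraded to version 5, otherwise the build fails (the original post showed the error in a screenshot). See the issue: https://github.com/weiliu89/caffe/issues/237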

The problem was that to make caffe with CUDA 8 it is necessary a 5.3 or 5.4 GCC version.
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-5 g++-5
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60 --slave /usr/bin/g++ g++ /usr/bin/g++-5
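
The version requirement above can also be checked from a script. The sketch below parses the text printed by gcc -dumpversion (for example "5.4.0"); the helper names and the default threshold of 5 are assumptions taken from the note above, not part of the SSD build system.

```python
import subprocess


def gcc_major(dumpversion_text):
    """Parse the output of `gcc -dumpversion` (e.g. "5.4.0", "4.8", "9")."""
    return int(dumpversion_text.strip().split(".")[0])


def gcc_ok_for_cuda8(dumpversion_text, required_major=5):
    """True if the gcc major version is at least `required_major`."""
    return gcc_major(dumpversion_text) >= required_major


def installed_gcc_version():
    """Ask the gcc found on PATH for its version string."""
    return subprocess.check_output(["gcc", "-dumpversion"]).decode("ascii")
```

Calling gcc_ok_for_cuda8(installed_gcc_version()) after running update-alternatives shows whether the new compiler is the one actually being picked up.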

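Error 1

On a machine with multiple GPUs, make runtest fails.

Fix: export CUDA_VISIBLE_DEVICES=0; make runtest -j8

If you see the error: Check failed: error == cudaSuccess (10 vs. 0) invalid device ordinal

Fix: first make sure a specific GPU is selected (one that exists on the machine), or try

unset CUDA_VISIBLE_DEVICES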

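Error 2

Compile errors when building your own code against caffe

include and lib

Use the include and lib directories built on your own machine (caffe/build/lib, caffe/include).

Missing caffe.pb.h:

/home/xxx/caffe/include/caffe/blob.hpp:9:34: fatal error: caffe/proto/caffe.pb.h: No such file or directory

 #include "caffe/proto/caffe.pb.h"

Fix: use protoc to generate caffe.pb.h and caffe.pb.cc from caffe/src/caffe/proto/caffe.proto. The header has to be found as caffe/proto/caffe.pb.h, so if it ends up directly under include/caffe/, move it into include/caffe/proto/.

li@li:~/caffe/src/caffe/proto$ protoc --cpp_out=/home/xxx/caffe/include/caffe/ caffe.proto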

Error 3

stdc++

linker error:

/usr/bin/ld: caffe_cnn_handler.o: undefined reference to symbol '_ZNSs4_Rep10_M_destroyERKSaIcE@@GLIBCXX_3.4'

//usr/lib/x86_64-linux-gnu/libstdc++.so.6: error adding symbols: DSO missing from command line

Fix: libstdc++.so.6 is not on the link line. Add the following to the Makefile:

LIBS += -L/usr/lib/x86_64-linux-gnu -lstdc++

 

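Testing

1. Download a model provided by the official repository and extract it into caffe/models/

For example: models_VGGNet_VOC0712_SSD_300x300.tar.gz

The archive extracts to a models folder; copy the VGGNet folder inside it to caffe/models/

2. Run the tests

From the root of the source tree, run:

python examples/ssd/score_ssd_pascal.py (the reported value should be around 0.718) (for older versions)

python examples/ssd/ssd_pascal_webcam.py

python examples/ssd/ssd_pascal_video.py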

3. Troubleshooting

Error 1

Message: No module named caffe

Add the following to the affected script (score_ssd_pascal.py, ssd_pascal_webcam.py, ssd_pascal_video.py, and so on):

import sys

sys.path.insert(0, '/home/xxx/caffe/python')

 

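Training

1. Build your own dataset (similar to Faster R-CNN). See my other post: Installing, Training, and Debugging Faster R-CNN

Create directories

(1) Under data/VOCdevkit/VOC2007, create Annotations, ImageSets/Main, and JPEGImages

Annotations: XML files converted from the txt label files

JPEGImages: image files

ImageSets/Main: lists of file names (without extensions)

Training set:              train.txt

Training + validation set: trainval.txt

Test set:                  test.txt

Validation set:            val.txt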
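
The four list files can be produced from the image names with a short script. The sketch below is one possible way to do it; the 80/20 ratios, the fixed seed, and the split_ids helper are assumptions for illustration, not something prescribed by SSD or PASCAL VOC.

```python
import random


def split_ids(ids, trainval_ratio=0.8, train_ratio=0.8, seed=0):
    """Split image ids (file names without extension) into the four VOC lists."""
    shuffled = sorted(ids)
    random.Random(seed).shuffle(shuffled)
    n_trainval = int(len(shuffled) * trainval_ratio)
    n_train = int(n_trainval * train_ratio)
    trainval = shuffled[:n_trainval]
    return {
        "train": sorted(trainval[:n_train]),
        "val": sorted(trainval[n_train:]),
        "trainval": sorted(trainval),
        "test": sorted(shuffled[n_trainval:]),
    }


def write_lists(splits, main_dir):
    """Write train.txt, val.txt, trainval.txt and test.txt into ImageSets/Main."""
    for name, members in splits.items():
        with open("%s/%s.txt" % (main_dir, name), "w") as f:
            f.write("\n".join(members) + "\n")
```

The ids would typically be collected from the file names in JPEGImages, and main_dir would be data/VOCdevkit/VOC2007/ImageSets/Main.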

Copy

Copy create_list.sh, create_data.sh, and labelmap_voc.prototxt from data/VOC0712 to data/VOCdevkit/VOC2007/

Adjust the paths

**create_list.sh**: 3 changes

1. root_dir=$HOME/data/VOCdevkit/

change to root_dir=$HOME/caffe/data/VOCdevkit/

2. for name in VOC2007 VOC2012

change to for name in VOC2007

3. $bash_dir/../../build/tools/get_image_size

change to $HOME/caffe/build/tools/get_image_size

**create_data.sh**: 5 changes

1. root_dir=$cur_dir/../..

change to root_dir=$HOME/caffe

2. data_root_dir="$HOME/data/VOCdevkit"

change to data_root_dir="$HOME/caffe/data/VOCdevkit"

3. dataset_name="VOC0712"

change to dataset_name="VOC2007"

4. mapfile="$root_dir/data/$dataset_name/labelmap_voc.prototxt"

change to mapfile="$root_dir/data/VOCdevkit/$dataset_name/labelmap_voc.prototxt"

5. python $root_dir/scripts/create_annoset.py --anno-type=$anno_type --label-map-file=$mapfile --min-dim=$min_dim --max-dim=$max_dim --resize-width=$width --resize-height=$height --check-label $extra_cmd $data_root_dir $root_dir/data/$dataset_name/$subset.txt $data_root_dir/$dataset_name/$db/$dataset_name"_"$subset"_"$db examples/$dataset_name

change to

python $root_dir/scripts/create_annoset.py --anno-type=$anno_type --label-map-file=$mapfile --min-dim=$min_dim --max-dim=$max_dim --resize-width=$width --resize-height=$height --check-label $extra_cmd $data_root_dir $root_dir/data/VOCdevkit/$dataset_name/$subset.txt $data_root_dir/$dataset_name/$db/$dataset_name"_"$subset"_"$db examples/$dataset_name

**labelmap_voc.prototxt**

Note that the label names must be lowercase. Delete the items you do not need, keeping the background item (label 0) and the name and label of each class in your own data.

item {

  name: "none_of_the_above"

  label: 0

  display_name: "background"

}

item {

  name: "face"

  label: 1

  display_name: "face"

}

item {

  name: "pedestrian"

  label: 2

  display_name: "pedestrian"

}
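
For more than a handful of classes it can be easier to generate this file. The sketch below builds the same item layout shown above from a list of class names; make_labelmap is a hypothetical helper, and lowercasing the names follows the note above.

```python
def make_labelmap(class_names):
    """Build labelmap text: label 0 is the background, then one item per class."""
    items = [("none_of_the_above", 0, "background")]
    for index, name in enumerate(class_names, start=1):
        lowered = name.lower()
        items.append((lowered, index, lowered))
    template = 'item {\n  name: "%s"\n  label: %d\n  display_name: "%s"\n}\n'
    return "".join(template % item for item in items)


if __name__ == "__main__":
    print(make_labelmap(["face", "pedestrian"]))
```

The number of items produced here, background included, is the value that num_classes in the training script has to match.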

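2. Convert to LMDB
In caffe/examples, create a VOC2007 folder; it will hold the symbolic links to the LMDB files.
Then run the modified shell scripts from the repository root:

./data/VOCdevkit/VOC2007/create_list.sh
./data/VOCdevkit/VOC2007/create_data.sh

If you see: No module named caffe / caffe.proto,
run in the terminal: export PYTHONPATH=$PYTHONPATH:/home/**/caffe/python (replace ** with your directory name on the server)

If that still does not work, open ./scripts/create_annoset.py

After import sys, add the following code: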

import os.path as osp

def add_path(path):

    if path not in sys.path:

        sys.path.insert(0, path)

caffe_path = osp.join('/home/****/caffe/python')

add_path(caffe_path)

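3. If you are using LMDB files that someone else has already built, you only need to create the links
In ./scripts, create a file named create_link.py and paste in the following code: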

import argparse

import os

import shutil

import subprocess

import sys

from caffe.proto import caffe_pb2

from google.protobuf import text_format

example_dir = '/home/li/caffe/examples/VOC2007'

out_dir = '/home/***/caffe/data/VOCdevkit/VOC2007/lmdb'

lmdb_name = ['VOC2007_test_lmdb', 'VOC2007_trainval_lmdb']

# make sure example_dir exists

if not os.path.exists(example_dir):

    os.makedirs(example_dir)

for lmdb_sub in lmdb_name:

    link_dir = os.path.join(example_dir, lmdb_sub)

    # remove link_dir if it already exists

    if os.path.exists(link_dir):

        os.unlink(link_dir)

    os.symlink(os.path.join(out_dir,lmdb_sub),link_dir)

4. Download the pretrained model
Download the pretrained model VGG_ILSVRC_16_layers_fc_reduced.caffemodel and put it in ./models/VGGNet/

5. Edit ./examples/ssd/ssd_pascal.py
Every line that needs changing is marked with ###### at the end.
 

from __future__ import print_function

import sys  ######

sys.path.insert(0, '/XXX/caffe/python')  ###### add the SSD caffe/python directory so that caffe can be found

import caffe

from caffe.model_libs import *

from google.protobuf import text_format

import math

import os

import shutil

import stat

import subprocess

# Add extra layers on top of a "base" network (e.g. VGGNet or Inception).

def AddExtraLayers(net, use_batchnorm=True, lr_mult=1):

    use_relu = True

    # Add additional convolutional layers.

    # 19 x 19

    from_layer = net.keys()[-1]

    # TODO(weiliu89): Construct the name using the last layer to avoid duplication.

    # 10 x 10

    out_layer = "conv6_1"

    ConvBNLayer(net, from_layer, out_layer, use_batchnorm,use_relu, 256, 1, 0, 1,

        lr_mult=lr_mult)

    from_layer = out_layer

    out_layer = "conv6_2"

    ConvBNLayer(net, from_layer, out_layer,use_batchnorm, use_relu, 512, 3, 1, 2,

        lr_mult=lr_mult)

    # 5 x 5

    from_layer = out_layer

    out_layer = "conv7_1"

    ConvBNLayer(net, from_layer, out_layer,use_batchnorm, use_relu, 128, 1, 0, 1,

      lr_mult=lr_mult)

    from_layer = out_layer

    out_layer = "conv7_2"

    ConvBNLayer(net, from_layer, out_layer,use_batchnorm, use_relu, 256, 3, 1, 2,

      lr_mult=lr_mult)

    # 3 x 3

    from_layer = out_layer

    out_layer = "conv8_1"

    ConvBNLayer(net, from_layer, out_layer,use_batchnorm, use_relu, 128, 1, 0, 1,

      lr_mult=lr_mult)

    from_layer = out_layer

    out_layer = "conv8_2"

    ConvBNLayer(net, from_layer, out_layer,use_batchnorm, use_relu, 256, 3, 0, 1,

      lr_mult=lr_mult)

    # 1 x 1

    from_layer = out_layer

    out_layer = "conv9_1"

    ConvBNLayer(net, from_layer, out_layer,use_batchnorm, use_relu, 128, 1, 0, 1,

      lr_mult=lr_mult)

    from_layer = out_layer

    out_layer = "conv9_2"

    ConvBNLayer(net, from_layer, out_layer,use_batchnorm, use_relu, 256, 3, 0, 1,

      lr_mult=lr_mult)

    return net

### Modify the following parameters accordingly ###

# The directory which contains the caffe code.

# We assume you are running the script at the CAFFE_ROOT.

caffe_root = os.getcwd()

# Set true if you want to start training right after generating all files.

run_soon = True

# Set true if you want to load from most recently saved snapshot.

# Otherwise, we will load from the pretrain_model defined below.

resume_training = True

# If true, Remove old model files.

remove_old_models = False

# The database file for training data. Created by data/VOC0712/create_data.sh

train_data = "examples/VOC2007/VOC2007_trainval_lmdb"  ######

# The database file for testing data. Created by data/VOC0712/create_data.sh

test_data = "examples/VOC2007/VOC2007_test_lmdb"  ######

# Specify the batch sampler.

resize_width = 300  ######

resize_height = 300  ######

resize = "{}x{}".format(resize_width, resize_height)

batch_sampler = [

        {

                'sampler': {

                        },

                'max_trials': 1,

                'max_sample': 1,

        },

        {

                'sampler': {

                        'min_scale': 0.3,

                        'max_scale': 1.0,

                        'min_aspect_ratio':0.5,

                        'max_aspect_ratio':2.0,

                        },

                'sample_constraint': {

                        'min_jaccard_overlap':0.1,

                        },

                'max_trials': 50,

                'max_sample': 1,

        },

        {

                'sampler': {

                        'min_scale': 0.3,

                        'max_scale': 1.0,

                        'min_aspect_ratio':0.5,

                        'max_aspect_ratio':2.0,

                        },

                'sample_constraint': {

                        'min_jaccard_overlap':0.3,

                        },

                'max_trials': 50,

                'max_sample': 1,

        },

        {

                'sampler': {

                        'min_scale': 0.3,

                        'max_scale': 1.0,

                        'min_aspect_ratio':0.5,

                        'max_aspect_ratio':2.0,

                        },

                'sample_constraint': {

                        'min_jaccard_overlap':0.5,

                        },

                'max_trials': 50,

                'max_sample': 1,

        },

        {

                'sampler': {

                        'min_scale': 0.3,

                        'max_scale': 1.0,

                        'min_aspect_ratio':0.5,

                        'max_aspect_ratio':2.0,

                        },

                'sample_constraint': {

                        'min_jaccard_overlap':0.7,

                        },

                'max_trials': 50,

                'max_sample': 1,

        },

        {

                'sampler': {

                        'min_scale': 0.3,

                        'max_scale': 1.0,

                        'min_aspect_ratio':0.5,

                        'max_aspect_ratio':2.0,

                        },

                'sample_constraint': {

                        'min_jaccard_overlap': 0.9,

                        },

                'max_trials': 50,

                'max_sample': 1,

        },

        {

                'sampler': {

                        'min_scale': 0.3,

                        'max_scale': 1.0,

                        'min_aspect_ratio':0.5,

                        'max_aspect_ratio':2.0,

                        },

                'sample_constraint': {

                        'max_jaccard_overlap':1.0,

                        },

                'max_trials': 50,

                'max_sample': 1,

        },

        ]

train_transform_param = {

        'mirror': True,

        'mean_value': [104, 117, 123],

        'resize_param': {

                'prob': 1,

                'resize_mode': P.Resize.WARP,

                'height': resize_height,

                'width': resize_width,

                'interp_mode': [

                        P.Resize.LINEAR,

                        P.Resize.AREA,

                        P.Resize.NEAREST,

                        P.Resize.CUBIC,

                        P.Resize.LANCZOS4,

                        ],

                },

        'distort_param': {

                'brightness_prob': 0.5,

                'brightness_delta': 32,

                'contrast_prob': 0.5,

                'contrast_lower': 0.5,

                'contrast_upper': 1.5,

                'hue_prob': 0.5,

                'hue_delta': 18,

                'saturation_prob': 0.5,

                'saturation_lower': 0.5,

                'saturation_upper': 1.5,

                'random_order_prob': 0.0,

                },

        'expand_param': {

                'prob': 0.5,

                'max_expand_ratio': 4.0,

                },

        'emit_constraint': {

            'emit_type':caffe_pb2.EmitConstraint.CENTER,

            }

        }

test_transform_param = {

        'mean_value': [104, 117, 123],

        'resize_param': {

                'prob': 1,

                'resize_mode': P.Resize.WARP,

                'height': resize_height,

                'width': resize_width,

                'interp_mode':[P.Resize.LINEAR],

                },

        }

# If true, use batch norm for all newly added layers.

# Currently only the non batch norm version has been tested.

use_batchnorm = False

lr_mult = 1

# Use different initial learning rate.

if use_batchnorm:

    base_lr = 0.0004

else:

    # A learning rate for batch_size = 1, num_gpus = 1.

    base_lr = 0.000004  ######

# Modify the job name if you want.

job_name = "SSD_{}".format(resize)

# The name of the model. Modify it if you want.

model_name = "VGG_VOC2007_{}".format(job_name)  ######

# Directory which stores the model .prototxt file.

save_dir = "models/VGGNet/VOC2007/{}".format(job_name)  ######

# Directory which stores the snapshot of models.

snapshot_dir = "models/VGGNet/VOC2007/{}".format(job_name)  ######

# Directory which stores the job script and log file.

job_dir = "jobs/VGGNet/VOC2007/{}".format(job_name)  ######

# Directory which stores the detection results.

output_result_dir = "{}/data/VOCdevkit/results/VOC2007/{}/Main".format(os.environ['HOME'], job_name)  ######

# model definition files.

train_net_file = "{}/train.prototxt".format(save_dir)

test_net_file = "{}/test.prototxt".format(save_dir)

deploy_net_file = "{}/deploy.prototxt".format(save_dir)

solver_file = "{}/solver.prototxt".format(save_dir)

# snapshot prefix.

snapshot_prefix = "{}/{}".format(snapshot_dir, model_name)

# job script path.

job_file = "{}/{}.sh".format(job_dir, model_name)

# Stores the test image names and sizes. Created by data/VOC0712/create_list.sh

name_size_file = "data/VOCdevkit/VOC2007/test_name_size.txt"  ######

# The pretrained model. We use the Fully convolutional reduced (atrous) VGGNet.

pretrain_model = "models/VGGNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel"  ######

# Stores LabelMapItem.

label_map_file = "data/VOCdevkit/VOC2007/labelmap_voc.prototxt"  ######

# MultiBoxLoss parameters.

# num_classes counts the background class and must match labelmap_voc.prototxt.

num_classes = 2  ######

share_location = True

background_label_id = 0

train_on_diff_gt = True

normalization_mode = P.Loss.VALID

code_type = P.PriorBox.CENTER_SIZE

ignore_cross_boundary_bbox = False

mining_type = P.MultiBoxLoss.MAX_NEGATIVE

neg_pos_ratio = 3.

loc_weight = (neg_pos_ratio + 1.) / 4.

multibox_loss_param = {

    'loc_loss_type': P.MultiBoxLoss.SMOOTH_L1,

    'conf_loss_type': P.MultiBoxLoss.SOFTMAX,

    'loc_weight': loc_weight,

    'num_classes': num_classes,

    'share_location': share_location,

    'match_type':P.MultiBoxLoss.PER_PREDICTION,

    'overlap_threshold': 0.5,

    'use_prior_for_matching': True,

    'background_label_id': background_label_id,

    'use_difficult_gt': train_on_diff_gt,

    'mining_type': mining_type,

    'neg_pos_ratio': neg_pos_ratio,

    'neg_overlap': 0.5,

    'code_type': code_type,

    'ignore_cross_boundary_bbox':ignore_cross_boundary_bbox,

    }

loss_param = {

    'normalization': normalization_mode,

    }

# parameters for generating priors.

# minimum dimension of input image

min_dim = 300

# conv4_3 ==> 38 x 38

# fc7 ==> 19 x 19

# conv6_2 ==> 10 x 10

# conv7_2 ==> 5 x 5

# conv8_2 ==> 3 x 3

# conv9_2 ==> 1 x 1

mbox_source_layers = ['conv4_3', 'fc7', 'conv6_2', 'conv7_2', 'conv8_2', 'conv9_2']

# in percent %

min_ratio = 20

max_ratio = 90

step = int(math.floor((max_ratio - min_ratio) / (len(mbox_source_layers) - 2)))

min_sizes = []

max_sizes = []

for ratio in xrange(min_ratio, max_ratio + 1, step):

  min_sizes.append(min_dim * ratio / 100.)

  max_sizes.append(min_dim * (ratio + step) / 100.)

min_sizes = [min_dim * 10 / 100.] + min_sizes

max_sizes = [min_dim * 20 / 100.] + max_sizes

steps = [8, 16, 32, 64, 100, 300]

aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]

# L2 normalize conv4_3.

normalizations = [20, -1, -1, -1, -1, -1]

# variance used to encode/decode prior bboxes.

if code_type == P.PriorBox.CENTER_SIZE:

  prior_variance = [0.1, 0.1, 0.2, 0.2]

else:

  prior_variance = [0.1]

flip = True

clip = False

# Solver parameters.

# Defining which GPUs to use.

gpus = "0"  ######

gpulist = gpus.split(",")

num_gpus = len(gpulist)

# Divide the mini-batch to different GPUs.

batch_size = 32  ######

accum_batch_size = 32  ######

iter_size = accum_batch_size / batch_size

solver_mode = P.Solver.CPU

device_id = 0

batch_size_per_device = batch_size

if num_gpus > 0:

  batch_size_per_device = int(math.ceil(float(batch_size) / num_gpus))

  iter_size = int(math.ceil(float(accum_batch_size) / (batch_size_per_device * num_gpus)))

  solver_mode = P.Solver.GPU

  device_id = int(gpulist[0])

if normalization_mode == P.Loss.NONE:

  base_lr /= batch_size_per_device

elif normalization_mode == P.Loss.VALID:

  base_lr *= 25. / loc_weight

elif normalization_mode == P.Loss.FULL:

  # Roughly there are 2000 prior bboxes per image.

  # TODO(weiliu89): Estimate the exact # of priors.

  base_lr *= 2000.

# Evaluate on whole test set.

# num_test_image is the number of images listed in your own test set.

num_test_image = 15439  ######

test_batch_size = 8  ######

test_iter = num_test_image / test_batch_size

solver_param = {

    # Train parameters

    'base_lr': base_lr,

    'weight_decay': 0.0005,

    'lr_policy': "multistep",

    'stepvalue': [80000, 100000, 120000],

    'gamma': 0.1,

    'momentum': 0.9,

    'iter_size': iter_size,

    'max_iter': 120000,

    'snapshot': 80000,

    'display': 10,

    'average_loss': 10,

    'type': "SGD",

    'solver_mode': solver_mode,

    'device_id': device_id,

    'debug_info': False,

    'snapshot_after_train': True,

    # Test parameters

    'test_iter': [test_iter],

    'test_interval': 10000,

    'eval_type': "detection",

    'ap_version': "11point",

    'test_initialization': False,

    }

# parameters for generating detection output.

det_out_param = {

    'num_classes': num_classes,

    'share_location': share_location,

    'background_label_id': background_label_id,

    'nms_param': {'nms_threshold': 0.45,'top_k': 400},

    'save_output_param': {

        'output_directory': output_result_dir,

        'output_name_prefix':"comp4_det_test_",

        'output_format': "VOC",

        'label_map_file': label_map_file,

        'name_size_file': name_size_file,

        'num_test_image': num_test_image,

        },

    'keep_top_k': 200,

    'confidence_threshold': 0.01,

    'code_type': code_type,

    }

# parameters for evaluating detection results.

det_eval_param = {

    'num_classes': num_classes,

    'background_label_id': background_label_id,

    'overlap_threshold': 0.5,

    'evaluate_difficult_gt': False,

    'name_size_file': name_size_file,

    }

### Hopefully you don't need to change the following ###

# Check file.

check_if_exist(train_data)

check_if_exist(test_data)

check_if_exist(label_map_file)

check_if_exist(pretrain_model)

make_if_not_exist(save_dir)

make_if_not_exist(job_dir)

make_if_not_exist(snapshot_dir)

# Create train net.

net = caffe.NetSpec()

net.data, net.label = CreateAnnotatedDataLayer(train_data, batch_size=batch_size_per_device,

        train=True, output_label=True, label_map_file=label_map_file,

        transform_param=train_transform_param, batch_sampler=batch_sampler)

VGGNetBody(net, from_layer='data', fully_conv=True, reduced=True, dilated=True,

    dropout=False)

AddExtraLayers(net, use_batchnorm, lr_mult=lr_mult)

mbox_layers = CreateMultiBoxHead(net, data_layer='data', from_layers=mbox_source_layers,

        use_batchnorm=use_batchnorm, min_sizes=min_sizes, max_sizes=max_sizes,

        aspect_ratios=aspect_ratios, steps=steps, normalizations=normalizations,

        num_classes=num_classes, share_location=share_location, flip=flip, clip=clip,

        prior_variance=prior_variance, kernel_size=3, pad=1, lr_mult=lr_mult)

# Create the MultiBoxLossLayer.

name = "mbox_loss"

mbox_layers.append(net.label)

net[name] = L.MultiBoxLoss(*mbox_layers, multibox_loss_param=multibox_loss_param,

        loss_param=loss_param, include=dict(phase=caffe_pb2.Phase.Value('TRAIN')),

        propagate_down=[True, True, False, False])

with open(train_net_file, 'w') as f:

    print('name: "{}_train"'.format(model_name), file=f)

    print(net.to_proto(), file=f)

shutil.copy(train_net_file, job_dir)

# Create test net.

net = caffe.NetSpec()

net.data, net.label = CreateAnnotatedDataLayer(test_data, batch_size=test_batch_size,

        train=False, output_label=True, label_map_file=label_map_file,

        transform_param=test_transform_param)

VGGNetBody(net, from_layer='data', fully_conv=True, reduced=True, dilated=True,

    dropout=False)

AddExtraLayers(net, use_batchnorm, lr_mult=lr_mult)

mbox_layers = CreateMultiBoxHead(net, data_layer='data', from_layers=mbox_source_layers,

        use_batchnorm=use_batchnorm, min_sizes=min_sizes, max_sizes=max_sizes,

        aspect_ratios=aspect_ratios, steps=steps, normalizations=normalizations,

        num_classes=num_classes, share_location=share_location, flip=flip, clip=clip,

        prior_variance=prior_variance, kernel_size=3, pad=1, lr_mult=lr_mult)

conf_name = "mbox_conf"

if multibox_loss_param["conf_loss_type"] == P.MultiBoxLoss.SOFTMAX:

  reshape_name = "{}_reshape".format(conf_name)

  net[reshape_name] = L.Reshape(net[conf_name], shape=dict(dim=[0, -1, num_classes]))

  softmax_name = "{}_softmax".format(conf_name)

  net[softmax_name] = L.Softmax(net[reshape_name], axis=2)

  flatten_name = "{}_flatten".format(conf_name)

  net[flatten_name] = L.Flatten(net[softmax_name], axis=1)

  mbox_layers[1] = net[flatten_name]

elif multibox_loss_param["conf_loss_type"] == P.MultiBoxLoss.LOGISTIC:

  sigmoid_name = "{}_sigmoid".format(conf_name)

  net[sigmoid_name] = L.Sigmoid(net[conf_name])

  mbox_layers[1] = net[sigmoid_name]

net.detection_out = L.DetectionOutput(*mbox_layers,

    detection_output_param=det_out_param,

    include=dict(phase=caffe_pb2.Phase.Value('TEST')))

net.detection_eval = L.DetectionEvaluate(net.detection_out, net.label,

    detection_evaluate_param=det_eval_param,

    include=dict(phase=caffe_pb2.Phase.Value('TEST')))

with open(test_net_file, 'w') as f:

    print('name: "{}_test"'.format(model_name), file=f)

    print(net.to_proto(), file=f)

shutil.copy(test_net_file, job_dir)

# Create deploy net.

# Remove the first and last layer from test net.

deploy_net = net

with open(deploy_net_file, 'w') as f:

    net_param = deploy_net.to_proto()

    # Remove the first (AnnotatedData) and last (DetectionEvaluate) layer from test net.

    del net_param.layer[0]

    del net_param.layer[-1]

    net_param.name = '{}_deploy'.format(model_name)

    net_param.input.extend(['data'])

    net_param.input_shape.extend([

        caffe_pb2.BlobShape(dim=[1, 3, resize_height, resize_width])])

    print(net_param, file=f)

shutil.copy(deploy_net_file, job_dir)

# Create solver.

solver = caffe_pb2.SolverParameter(

        train_net=train_net_file,

        test_net=[test_net_file],

        snapshot_prefix=snapshot_prefix,

        **solver_param)

with open(solver_file, 'w') as f:

    print(solver, file=f)

shutil.copy(solver_file, job_dir)

max_iter = 0

# Find most recent snapshot.

for file in os.listdir(snapshot_dir):

  if file.endswith(".solverstate"):

    basename = os.path.splitext(file)[0]

    iter = int(basename.split("{}_iter_".format(model_name))[1])

    if iter > max_iter:

      max_iter = iter

train_src_param = '--weights="{}" \\\n'.format(pretrain_model)

if resume_training:

  if max_iter > 0:

    train_src_param = '--snapshot="{}_iter_{}.solverstate" \\\n'.format(snapshot_prefix, max_iter)

if remove_old_models:

  # Remove any snapshots smaller than max_iter.

  for file in os.listdir(snapshot_dir):

    if file.endswith(".solverstate"):

      basename = os.path.splitext(file)[0]

      iter = int(basename.split("{}_iter_".format(model_name))[1])

      if max_iter > iter:

        os.remove("{}/{}".format(snapshot_dir, file))

    if file.endswith(".caffemodel"):

      basename = os.path.splitext(file)[0]

      iter = int(basename.split("{}_iter_".format(model_name))[1])

      if max_iter > iter:

        os.remove("{}/{}".format(snapshot_dir, file))

# Create job file.

with open(job_file, 'w') as f:

  f.write('cd {}\n'.format(caffe_root))

  f.write('./build/tools/caffe train \\\n')

  f.write('--solver="{}" \\\n'.format(solver_file))

  f.write(train_src_param)

  if solver_param['solver_mode'] == P.Solver.GPU:

    f.write('--gpu {} 2>&1 | tee {}/{}.log\n'.format(gpus, job_dir, model_name))

  else:

    f.write('2>&1 | tee {}/{}.log\n'.format(job_dir, model_name))

# Copy the python script to job_dir.

py_file = os.path.abspath(__file__)

shutil.copy(py_file, job_dir)

# Run the job.

os.chmod(job_file, stat.S_IRWXU)

if run_soon:

  subprocess.call(job_file, shell=True)

train.prototxt, test.prototxt, deploy.prototxt, and solver.prototxt are all generated automatically by running this script.
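
The prior-box sizes that the script computes from min_ratio and max_ratio are easy to inspect in isolation. The sketch below repeats the same arithmetic as the script in a standalone function (prior_box_sizes is a hypothetical name; range replaces xrange so that it also runs on Python 3).

```python
import math


def prior_box_sizes(min_dim=300, min_ratio=20, max_ratio=90, num_layers=6):
    """Reproduce the min_sizes / max_sizes computation from ssd_pascal.py."""
    step = int(math.floor((max_ratio - min_ratio) / (num_layers - 2)))
    min_sizes = []
    max_sizes = []
    for ratio in range(min_ratio, max_ratio + 1, step):
        min_sizes.append(min_dim * ratio / 100.)
        max_sizes.append(min_dim * (ratio + step) / 100.)
    # conv4_3 gets its own, smaller scale (10% to 20% of min_dim).
    min_sizes = [min_dim * 10 / 100.] + min_sizes
    max_sizes = [min_dim * 20 / 100.] + max_sizes
    return min_sizes, max_sizes


if __name__ == "__main__":
    mins, maxs = prior_box_sizes()
    print(mins)  # [30.0, 60.0, 111.0, 162.0, 213.0, 264.0]
    print(maxs)  # [60.0, 111.0, 162.0, 213.0, 264.0, 315.0]
```

Each pair (min_size, max_size) belongs to one entry of mbox_source_layers, in order from conv4_3 to conv9_2.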

gpus = '0,1,2,3': if you have only one GPU, delete 1,2,3; if you have two, delete 2,3.

If you have no GPU, comment out the following lines and the script will train on the CPU (this also resolves the cudaSuccess (10 vs. 0) error):

#if num_gpus > 0:

#  batch_size_per_device = int(math.ceil(float(batch_size) / num_gpus))

#  iter_size = int(math.ceil(float(accum_batch_size) / (batch_size_per_device * num_gpus)))

#  solver_mode = P.Solver.GPU

#  device_id = int(gpulist[0])

6. Edit ./examples/ssd/ssd_pascal_webcam.py

Make the corresponding changes there as well.

7. Train

From the repository root, run

python ./examples/ssd/ssd_pascal.py 2>&1 | tee ssd_train_log.txt
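
Because resume_training is True in the script, running the command again continues from the newest .solverstate file in snapshot_dir instead of starting from the pretrained weights. The sketch below shows that lookup on a plain list of file names; latest_snapshot_iter is a hypothetical helper that mirrors the loop in the script, using a regular expression instead of split.

```python
import re


def latest_snapshot_iter(filenames, model_name):
    """Return the highest iteration among <model_name>_iter_<N>.solverstate files, or 0."""
    pattern = re.compile(r"^%s_iter_(\d+)\.solverstate$" % re.escape(model_name))
    max_iter = 0
    for filename in filenames:
        match = pattern.match(filename)
        if match:
            max_iter = max(max_iter, int(match.group(1)))
    return max_iter
```

A result of 0 means no snapshot was found, in which case the script falls back to --weights with the pretrained model.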

If you see cudaSuccess (2 vs. 0), the GPU has run out of memory. Reduce batch_size in caffe/examples/ssd/ssd_pascal.py from the default 32 to 16, 8, or 4.
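
Lowering batch_size interacts with accum_batch_size: the script derives iter_size from the two, so gradients are accumulated over several smaller batches. The sketch below repeats that arithmetic in a standalone function (split_batch is a hypothetical name). Leaving accum_batch_size at 32 while lowering batch_size is one option and an assumption here; the script above sets both values to 32.

```python
import math


def split_batch(batch_size, accum_batch_size, num_gpus):
    """Return (batch_size_per_device, iter_size) as computed in ssd_pascal.py."""
    batch_size_per_device = int(math.ceil(float(batch_size) / num_gpus))
    iter_size = int(math.ceil(
        float(accum_batch_size) / (batch_size_per_device * num_gpus)))
    return batch_size_per_device, iter_size


if __name__ == "__main__":
    for batch_size in (32, 16, 8, 4):
        print(batch_size, split_batch(batch_size, 32, 1))
```

With one GPU and accum_batch_size = 32, a batch_size of 8 gives 8 images per forward pass and iter_size = 4, so each weight update still sees 32 images.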

8. Test a single image and display the box coordinates

# coding: utf-8
# Note: this file is expected to be in {caffe_root}/examples
# ### 1. Setup
from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
import pylab


plt.rcParams['figure.figsize'] = (10, 10)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'


caffe_root = '../'
import os
os.chdir(caffe_root)
import sys
sys.path.insert(0, '/home/lilai/LL/caffe/python')
import caffe
from google.protobuf import text_format
from caffe.proto import caffe_pb2


caffe.set_device(0)
caffe.set_mode_gpu()
labelmap_file = '/home/lilai/LL/caffe/data/VOC0712/labelmap_voc.prototxt'
file = open(labelmap_file, 'r')
labelmap = caffe_pb2.LabelMap()
text_format.Merge(str(file.read()), labelmap)


def get_labelname(labelmap, labels):
    num_labels = len(labelmap.item)
    labelnames = []
    if type(labels) is not list:
        labels = [labels]
    for label in labels:
        found = False
        for i in xrange(0, num_labels):
            if label == labelmap.item[i].label:
                found = True
                labelnames.append(labelmap.item[i].display_name)
                break
        assert found == True
    return labelnames


model_def = '/home/lilai/LL/caffe/models/VGGNet/VOC0712/SSD_300x300/deploy.prototxt'
model_weights = '/home/lilai/LL/caffe/models/VGGNet/VOC0712/SSD_300x300/VGG_VOC0712_SSD_300x300_iter_120000.caffemodel'

net = caffe.Net(model_def, model_weights, caffe.TEST)

# input preprocessing: 'data' is the name of the input blob == net.inputs[0]
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_mean('data', np.array([104, 117, 123]))  # mean pixel
transformer.set_raw_scale('data', 255)  # the reference model operates on images in [0,255] range instead of [0,1]
transformer.set_channel_swap('data', (2, 1, 0))  # the reference model has channels in BGR order instead of RGB

# ### 2. SSD detection

# Load an image.

image_resize = 300
net.blobs['data'].reshape(1, 3, image_resize, image_resize)

image = caffe.io.load_image('/home/lilai/LL/caffe/examples/images/fish-bike.jpg')
plt.imshow(image)

# Run the net and examine the top_k results

transformed_image = transformer.preprocess('data', image)
net.blobs['data'].data[...] = transformed_image

# Forward pass.
detections = net.forward()['detection_out']

# Parse the outputs.
det_label = detections[0, 0, :, 1]
det_conf = detections[0, 0, :, 2]
det_xmin = detections[0, 0, :, 3]
det_ymin = detections[0, 0, :, 4]
det_xmax = detections[0, 0, :, 5]
det_ymax = detections[0, 0, :, 6]

# Get detections with confidence higher than 0.6.
top_indices = [i for i, conf in enumerate(det_conf) if conf >= 0.6]

top_conf = det_conf[top_indices]
top_label_indices = det_label[top_indices].tolist()
top_labels = get_labelname(labelmap, top_label_indices)
top_xmin = det_xmin[top_indices]
top_ymin = det_ymin[top_indices]
top_xmax = det_xmax[top_indices]
top_ymax = det_ymax[top_indices]

# Plot the boxes

colors = plt.cm.hsv(np.linspace(0, 1, 21)).tolist()

currentAxis = plt.gca()

for i in xrange(top_conf.shape[0]):
    # bbox value
    xmin = int(round(top_xmin[i] * image.shape[1]))
    ymin = int(round(top_ymin[i] * image.shape[0]))
    xmax = int(round(top_xmax[i] * image.shape[1]))
    ymax = int(round(top_ymax[i] * image.shape[0]))
    # score
    score = top_conf[i]
    # label
    label = int(top_label_indices[i])
    label_name = top_labels[i]
    # display info: label score xmin ymin xmax ymax
    display_txt = '%s: %.2f %d %d %d %d' % (label_name, score,xmin, ymin, xmax, ymax)
    # display_bbox_value = '%d %d %d %d' % (xmin, ymin, xmax, ymax)
    coords = (xmin, ymin), xmax - xmin + 1, ymax - ymin + 1
    color = colors[label]
    currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor=color, linewidth=2))
    currentAxis.text(xmin, ymin, display_txt, bbox={'facecolor': color, 'alpha': 0.5})
    # currentAxis.text((xmin+xmax)/2, (ymin+ymax)/2, display_bbox_value, bbox={'facecolor': color, 'alpha': 0.5})
plt.imshow(image)
pylab.show()
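
The parsing step in the script above can be tried without caffe or a GPU. Each row of detection_out has the layout [image_id, label, confidence, xmin, ymin, xmax, ymax], with coordinates normalized to [0, 1]. The sketch below applies the same confidence filter and pixel conversion to plain Python lists; detections_to_pixels and the sample rows are made up for illustration.

```python
def detections_to_pixels(rows, image_width, image_height, min_conf=0.6):
    """Keep rows with confidence >= min_conf and scale boxes to pixel coordinates."""
    boxes = []
    for image_id, label, conf, xmin, ymin, xmax, ymax in rows:
        if conf < min_conf:
            continue
        boxes.append({
            "label": int(label),
            "score": conf,
            "xmin": int(round(xmin * image_width)),
            "ymin": int(round(ymin * image_height)),
            "xmax": int(round(xmax * image_width)),
            "ymax": int(round(ymax * image_height)),
        })
    return boxes


if __name__ == "__main__":
    # Made-up rows: one confident detection and one below the threshold.
    sample = [
        [0, 1, 0.9, 0.25, 0.5, 0.75, 1.0],
        [0, 2, 0.3, 0.0, 0.0, 0.5, 0.5],
    ]
    print(detections_to_pixels(sample, image_width=400, image_height=200))
```

In the real script, image_width and image_height correspond to image.shape[1] and image.shape[0].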

9. About aspect_ratios

What exactly does aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]] mean in SSD?

[2, 3] means using default box of aspect ratio of 2 and 3. And since we set flip=True at here, it will also use default box of aspect ratio of 1/2 and 1/3. 

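For example: [2] means ar = {1, 2, 1/2}; [2, 3] means ar = {1, 2, 3, 1/2, 1/3}. For ar = 1, one additional default box is added.

aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]] has 6 elements in total, one per feature map.

The first element determines how many boxes each location of the first feature map gets: [2] means ar = {1, 2, 1/2}, plus one extra box for ar = 1 (as explained in the paper).

The second element works the same way: [2, 3] means ar = {1, 2, 3, 1/2, 1/3}, plus one extra box for ar = 1 (as explained in the paper).

If anything is still unclear, read prior_box_layer.cpp. It contains the actual logic and is easy to follow.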
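
The counting rule described above can be checked with a few lines of arithmetic. The sketch below counts the default boxes per location and in total, using the feature map sizes listed in the comments of ssd_pascal.py (38, 19, 10, 5, 3, 1); boxes_per_location is a hypothetical helper, not a function from the SSD code base.

```python
def boxes_per_location(aspect_ratios, flip=True, has_max_size=True):
    """Number of default boxes generated at each feature map location."""
    count = 1  # ar = 1 with min_size
    if has_max_size:
        count += 1  # the extra ar = 1 box, sized between min_size and max_size
    for _ in aspect_ratios:
        count += 2 if flip else 1  # ar and, with flip, 1/ar
    return count


feature_map_sizes = [38, 19, 10, 5, 3, 1]
aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]

per_location = [boxes_per_location(ars) for ars in aspect_ratios]
total = sum(size * size * n for size, n in zip(feature_map_sizes, per_location))

print(per_location)  # [4, 6, 6, 6, 4, 4]
print(total)         # 8732
```

So [2] yields 4 boxes per location and [2, 3] yields 6, which adds up to 8732 default boxes for a 300x300 input.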
