SSD Network Explained: The Detection Out Layer

This is an original article. When reposting, please cite https://blog.csdn.net/dan_teng/article/details/81561783

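Without further ado, here is the structure of this post: it first explains what the detection out layer does, then gives the layer's SSD source code with detailed comments, and finally lists the Caffe definition of each of the layer's parameters together with its default value.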
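The detection out layer is the last layer of the SSD network. It combines three results (the prior boxes, the offsets relative to the prior boxes, and the scores) and outputs the detection boxes that satisfy the conditions, together with each target's label and score.
On the input side, mbox_priorbox is the concatenation of the outputs of all the network's priorbox layers (the priorbox layer is explained in a separate post), which amounts to putting all prior boxes together; mbox_loc holds the offsets relative to the prior boxes; mbox_conf_flatten holds the score of every class on every box.
The output has shape [1, 1, x, 7], where x is the number of boxes finally kept. The last dimension stores: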
[image_id, label, confidence, xmin, ymin, xmax, ymax]
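
The row layout above can be illustrated with a small reader. This is only a sketch: the `Detection` struct and `ParseDetections` function are hypothetical names introduced here, not part of Caffe, and the input is assumed to be the top blob already copied into a flat `std::vector<float>`. Rows whose label is -1 are treated as the placeholder rows that the layer writes when a batch contains no detections at all.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One row of the [1, 1, x, 7] output (hypothetical helper type).
struct Detection {
  int image_id;
  int label;
  float confidence;
  float xmin, ymin, xmax, ymax;
};

// Splits a flat buffer of x * 7 floats into Detection records.
// Rows with label -1 are placeholders for "no detections" and are skipped.
std::vector<Detection> ParseDetections(const std::vector<float>& data) {
  assert(data.size() % 7 == 0);
  std::vector<Detection> out;
  for (std::size_t i = 0; i + 6 < data.size(); i += 7) {
    if (data[i + 1] < 0) continue;  // placeholder row
    Detection d;
    d.image_id = static_cast<int>(data[i]);
    d.label = static_cast<int>(data[i + 1]);
    d.confidence = data[i + 2];
    d.xmin = data[i + 3];
    d.ymin = data[i + 4];
    d.xmax = data[i + 5];
    d.ymax = data[i + 6];
    out.push_back(d);
  }
  return out;
}
```

The coordinates stay normalized, exactly as stored in the blob; scaling them to pixel units is left to the caller.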

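Computation outline:
1) Parse the location, confidence, and priorbox data from the bottom blobs and store them in vectors.
2) Decode each prior box. Decoding simply means combining the inputs: as noted above, the output must give a detection box for every target, but the inputs are prior boxes and offsets, so this step computes the final detection boxes from them. Decoding depends on how the prior boxes were encoded, and there are three cases.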

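Let b denote the detection box (stored as b_xmin, b_ymin, b_xmax, b_ymax), p the prior box (stored as p_xmin, p_ymin, p_xmax, p_ymax), and t the offsets (stored as t_x, t_y, t_width, t_height).
The width and height of b and p are the differences between the maximum and minimum x and y values, and the center coordinates are the sum of the maximum and minimum divided by 2.
The encoding formulas for each type are then: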

CodeType_CORNER:

    t = b - p (the same for each of the four coordinates)

CodeType_CENTER_SIZE:

    t_x = (b_center_x - p_center_x) / p_width (t_y is analogous, using p_height)
    t_height = log(b_height / p_height) (t_width is analogous)

CodeType_CORNER_SIZE:

    t_xmin = (b_xmin - p_xmin) / p_width (t_xmax is analogous)
    t_ymin = (b_ymin - p_ymin) / p_height (t_ymax is analogous)
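
The three encodings can be written out as a short function. This is a minimal sketch under stated assumptions, not the Caffe implementation: `Box`, `CodeType`, and `Encode` are hypothetical names, the four encoded values are returned in the four slots of a `Box` (slot order xmin, ymin, xmax, ymax standing for the first to fourth offset), and the variance is ignored.

```cpp
#include <cassert>
#include <cmath>

struct Box { float xmin, ymin, xmax, ymax; };

enum CodeType { CORNER, CENTER_SIZE, CORNER_SIZE };

// Encodes box b relative to prior p. The result reuses Box as a 4-slot
// container for the offsets; no variance is applied in this sketch.
Box Encode(CodeType type, const Box& p, const Box& b) {
  const float pw = p.xmax - p.xmin;
  const float ph = p.ymax - p.ymin;
  assert(pw > 0 && ph > 0);
  Box t = {0.0f, 0.0f, 0.0f, 0.0f};
  switch (type) {
    case CORNER:
      // Plain per-coordinate difference.
      t = {b.xmin - p.xmin, b.ymin - p.ymin, b.xmax - p.xmax, b.ymax - p.ymax};
      break;
    case CENTER_SIZE: {
      const float pcx = (p.xmin + p.xmax) / 2;
      const float pcy = (p.ymin + p.ymax) / 2;
      const float bw = b.xmax - b.xmin;
      const float bh = b.ymax - b.ymin;
      const float bcx = (b.xmin + b.xmax) / 2;
      const float bcy = (b.ymin + b.ymax) / 2;
      // Center shift scaled by prior size, log ratio of sizes.
      t = {(bcx - pcx) / pw, (bcy - pcy) / ph,
           std::log(bw / pw), std::log(bh / ph)};
      break;
    }
    case CORNER_SIZE:
      // Corner differences scaled by the prior's width and height.
      t = {(b.xmin - p.xmin) / pw, (b.ymin - p.ymin) / ph,
           (b.xmax - p.xmax) / pw, (b.ymax - p.ymax) / ph};
      break;
  }
  return t;
}
```

For a prior of width 0.4 and a target box shifted by 0.1 in x, CORNER gives an offset of 0.1 while the two size-normalized types give 0.25, which shows why the normalized encodings are less sensitive to the scale of the prior.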

Decoding just solves these formulas for the components of b. If the variance has to be applied, multiply t by the variance.
Taking the center_size encoding as an example:

    b_center_x = t_x * p_width + p_center_x
    b_center_y = t_y * p_height + p_center_y
    b_width = exp(t_width) * p_width
    b_height = exp(t_height) * p_height

If the variance has to be applied (that is, when variance_encoded_in_target is false):

    b_center_x = t_x * prior_variance[0] * p_width + p_center_x
    b_center_y = t_y * prior_variance[1] * p_height + p_center_y
    b_width = exp(prior_variance[2] * t_width) * p_width
    b_height = exp(prior_variance[3] * t_height) * p_height

From these, compute b_xmin, b_ymin, b_xmax, and b_ymax (the center minus or plus half the width or height). See the code for details.
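
The center_size decoding, including the final conversion back to corner coordinates, might look like the following. It is a sketch only: `Box` and `DecodeCenterSize` are hypothetical names, and the offsets are assumed to arrive as a plain array ordered (t_x, t_y, t_width, t_height).

```cpp
#include <cassert>
#include <cmath>

struct Box { float xmin, ymin, xmax, ymax; };

// Decodes center_size offsets t against prior p.
// var holds the four prior variances; they are applied only when the
// variance was NOT already encoded in the target.
Box DecodeCenterSize(const Box& p, const float t[4], const float var[4],
                     bool variance_encoded_in_target) {
  const float pw = p.xmax - p.xmin;
  const float ph = p.ymax - p.ymin;
  assert(pw > 0 && ph > 0);
  const float pcx = (p.xmin + p.xmax) / 2;
  const float pcy = (p.ymin + p.ymax) / 2;
  const float v0 = variance_encoded_in_target ? 1.0f : var[0];
  const float v1 = variance_encoded_in_target ? 1.0f : var[1];
  const float v2 = variance_encoded_in_target ? 1.0f : var[2];
  const float v3 = variance_encoded_in_target ? 1.0f : var[3];
  const float cx = v0 * t[0] * pw + pcx;
  const float cy = v1 * t[1] * ph + pcy;
  const float w = std::exp(v2 * t[2]) * pw;
  const float h = std::exp(v3 * t[3]) * ph;
  // Convert center and size back to corner coordinates.
  Box b = {cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2};
  return b;
}
```

With a prior of (0.2, 0.2, 0.6, 0.6), offsets (2.5, 2.5, 0, 0), and variances (0.1, 0.1, 0.2, 0.2), the center moves by 0.1 * 2.5 * 0.4 = 0.1 and the size is unchanged, so the decoded box is (0.3, 0.3, 0.7, 0.7).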

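3) Non-Maximum Suppression
A detection algorithm usually produces many boxes. As the figure below shows, several detection boxes often enclose the same target, and NMS keeps only the single best box for each target. Suppression is an iterate-traverse-eliminate process.
[Figure: several overlapping detection boxes around one target]
(Image source: https://blog.csdn.net/shuzfan/article/details/52711706)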

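Define two sets: the candidate set, which holds the boxes before processing, and the keep set, which holds the boxes after processing.

First, sort all boxes in the candidate set by score, select the highest-scoring box, and move it from the candidate set to the keep set.

Then iterate:
* Remove the highest-scoring box from the current candidate set and compute its IoU with every box in the keep set.
* If any IoU exceeds the threshold, the two boxes overlap heavily and presumably enclose the same object, so the box is not added to the keep set.
* If its IoU with every box in the keep set is at or below the threshold, the box encloses a new target and is added to the keep set.

Repeat until the candidate set is empty; the keep set then holds the detection boxes of all detected targets.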
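
The procedure just described can be sketched in a few lines. This is a simplified illustration, not Caffe's ApplyNMSFast: it leaves out the confidence threshold, top_k, and eta parameters that the real function accepts, and `Box`, `Area`, `Overlap`, and `GreedyNms` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Box { float xmin, ymin, xmax, ymax; };

// Area of a box; degenerate (negative-extent) boxes count as empty.
float Area(const Box& b) {
  return std::max(0.0f, b.xmax - b.xmin) * std::max(0.0f, b.ymax - b.ymin);
}

// Intersection over union of two boxes.
float Overlap(const Box& a, const Box& b) {
  const Box inter = {std::max(a.xmin, b.xmin), std::max(a.ymin, b.ymin),
                     std::min(a.xmax, b.xmax), std::min(a.ymax, b.ymax)};
  const float i = Area(inter);
  const float u = Area(a) + Area(b) - i;
  return u > 0.0f ? i / u : 0.0f;
}

// Returns the indices of the kept boxes, highest score first.
std::vector<int> GreedyNms(const std::vector<Box>& boxes,
                           const std::vector<float>& scores,
                           float iou_threshold) {
  assert(boxes.size() == scores.size());
  // Candidate set: all indices, sorted by descending score.
  std::vector<int> order(boxes.size());
  std::iota(order.begin(), order.end(), 0);
  std::stable_sort(order.begin(), order.end(),
                   [&scores](int a, int b) { return scores[a] > scores[b]; });
  std::vector<int> keep;
  for (int idx : order) {
    bool suppressed = false;
    for (int kept : keep) {
      if (Overlap(boxes[idx], boxes[kept]) > iou_threshold) {
        suppressed = true;  // duplicates an already kept box
        break;
      }
    }
    if (!suppressed) keep.push_back(idx);
  }
  return keep;
}
```

Because the candidates are visited in descending score order, the first box visited is always kept, which matches the initial step of the description.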

jaccard overlap
A brief note on the jaccard overlap used in the SSD network.
The jaccard overlap is simply the intersection over union (IoU): the area where two detection boxes overlap (their intersection) divided by the area they cover together (the sum of their areas minus the overlap). As a formula:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

J = 0 means the two boxes do not overlap at all; J = 1 means they coincide exactly.
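
For axis-aligned boxes the formula reduces to a handful of min/max operations. The function below is a minimal sketch with hypothetical names (`Box`, `JaccardOverlap`), assuming each box satisfies xmin <= xmax and ymin <= ymax.

```cpp
#include <algorithm>
#include <cassert>

struct Box { float xmin, ymin, xmax, ymax; };

// Jaccard overlap (IoU) of two axis-aligned boxes.
float JaccardOverlap(const Box& a, const Box& b) {
  assert(a.xmin <= a.xmax && a.ymin <= a.ymax);
  assert(b.xmin <= b.xmax && b.ymin <= b.ymax);
  // Width and height of the intersection rectangle.
  const float iw = std::min(a.xmax, b.xmax) - std::max(a.xmin, b.xmin);
  const float ih = std::min(a.ymax, b.ymax) - std::max(a.ymin, b.ymin);
  if (iw <= 0.0f || ih <= 0.0f) return 0.0f;  // no overlap
  const float inter = iw * ih;
  const float uni = (a.xmax - a.xmin) * (a.ymax - a.ymin) +
                    (b.xmax - b.xmin) * (b.ymax - b.ymin) - inter;
  return inter / uni;
}
```

For example, the boxes (0, 0, 2, 2) and (1, 1, 3, 3) intersect in a unit square and cover 4 + 4 - 1 = 7 units together, giving J = 1/7.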

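4) Write the results to the output in the required shape. If keep_top_k is set and more boxes than that survive NMS for an image, only the keep_top_k highest-scoring boxes are kept.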
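
The per-image keep_top_k step in the layer's Forward_cpu sorts (score, (label, index)) pairs across all classes and truncates the list. A simplified sketch of that idea follows; `Candidate` and `KeepTopK` are hypothetical names, and the flat struct stands in for the nested pairs used in the real code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One box that survived NMS: its score, class label, and prior index.
struct Candidate {
  float score;
  int label;
  int index;
};

// Keeps the keep_top_k highest-scoring candidates across all classes.
// keep_top_k == -1 keeps everything; the list is sorted only when it
// actually has to be truncated.
std::vector<Candidate> KeepTopK(std::vector<Candidate> cands, int keep_top_k) {
  assert(keep_top_k >= -1);
  if (keep_top_k > -1 &&
      cands.size() > static_cast<std::size_t>(keep_top_k)) {
    std::stable_sort(cands.begin(), cands.end(),
                     [](const Candidate& a, const Candidate& b) {
                       return a.score > b.score;
                     });
    cands.resize(static_cast<std::size_t>(keep_top_k));
  }
  return cands;
}
```

Note that the limit applies to the image as a whole, so a class with many confident boxes can crowd out lower-scoring boxes from other classes.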

Code walkthrough:

Note: the code below is from detection_output_layer.cpp. It calls several helper functions that are defined in
ssd/src/caffe/util/bbox_util.cpp.

#include <algorithm>
#include <fstream>  // NOLINT(readability/streams)
#include <map>
#include <string>
#include <utility>
#include <vector>

#include "boost/filesystem.hpp"
#include "boost/foreach.hpp"

#include "caffe/layers/detection_output_layer.hpp"

namespace caffe {

template <typename Dtype>
void DetectionOutputLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  const DetectionOutputParameter& detection_output_param =
      this->layer_param_.detection_output_param();
  CHECK(detection_output_param.has_num_classes()) << "Must specify num_classes";
  num_classes_ = detection_output_param.num_classes();  // number of classes
  share_location_ = detection_output_param.share_location();
  num_loc_classes_ = share_location_ ? 1 : num_classes_;
  background_label_id_ = detection_output_param.background_label_id();
  code_type_ = detection_output_param.code_type();  // bbox encoding type
  variance_encoded_in_target_ =
      detection_output_param.variance_encoded_in_target();
  keep_top_k_ = detection_output_param.keep_top_k();  // max boxes kept per image
  confidence_threshold_ = detection_output_param.has_confidence_threshold() ?
      detection_output_param.confidence_threshold() : -FLT_MAX;  // score threshold
  // Parameters used in nms.
  nms_threshold_ = detection_output_param.nms_param().nms_threshold();
  CHECK_GE(nms_threshold_, 0.) << "nms_threshold must be non negative.";
  eta_ = detection_output_param.nms_param().eta();
  CHECK_GT(eta_, 0.);
  CHECK_LE(eta_, 1.);
  top_k_ = -1;
  if (detection_output_param.nms_param().has_top_k()) {
    top_k_ = detection_output_param.nms_param().top_k();
  }
  const SaveOutputParameter& save_output_param =
      detection_output_param.save_output_param();
  output_directory_ = save_output_param.output_directory();
  if (!output_directory_.empty()) {
    if (boost::filesystem::is_directory(output_directory_)) {
      boost::filesystem::remove_all(output_directory_);
    }
    if (!boost::filesystem::create_directories(output_directory_)) {
        LOG(WARNING) << "Failed to create directory: " << output_directory_;
    }
  }
  output_name_prefix_ = save_output_param.output_name_prefix();
  need_save_ = output_directory_ == "" ? false : true;
  output_format_ = save_output_param.output_format();
  if (save_output_param.has_label_map_file()) {
    string label_map_file = save_output_param.label_map_file();
    if (label_map_file.empty()) {
      // Ignore saving if there is no label_map_file provided.
      LOG(WARNING) << "Provide label_map_file if output results to files.";
      need_save_ = false;
    } else {
      LabelMap label_map;
      CHECK(ReadProtoFromTextFile(label_map_file, &label_map))
          << "Failed to read label map file: " << label_map_file;
      CHECK(MapLabelToName(label_map, true, &label_to_name_))
          << "Failed to convert label to name.";
      CHECK(MapLabelToDisplayName(label_map, true, &label_to_display_name_))
          << "Failed to convert label to display name.";
    }
  } else {
    need_save_ = false;
  }
  if (save_output_param.has_name_size_file()) {
    string name_size_file = save_output_param.name_size_file();
    if (name_size_file.empty()) {
      // Ignore saving if there is no name_size_file provided.
      LOG(WARNING) << "Provide name_size_file if output results to files.";
      need_save_ = false;
    } else {
      std::ifstream infile(name_size_file.c_str());
      CHECK(infile.good())
          << "Failed to open name size file: " << name_size_file;
      // The file is in the following format:
      //    name height width
      //    ...
      string name;
      int height, width;
      while (infile >> name >> height >> width) {
        names_.push_back(name);
        sizes_.push_back(std::make_pair(height, width));
      }
      infile.close();
      if (save_output_param.has_num_test_image()) {
        num_test_image_ = save_output_param.num_test_image();
      } else {
        num_test_image_ = names_.size();
      }
      CHECK_LE(num_test_image_, names_.size());
    }
  } else {
    need_save_ = false;
  }
  has_resize_ = save_output_param.has_resize_param();
  if (has_resize_) {
    resize_param_ = save_output_param.resize_param();
  }
  name_count_ = 0;
  visualize_ = detection_output_param.visualize();
  if (visualize_) {
    visualize_threshold_ = 0.6;
    if (detection_output_param.has_visualize_threshold()) {
      visualize_threshold_ = detection_output_param.visualize_threshold();
    }
    data_transformer_.reset(
        new DataTransformer<Dtype>(this->layer_param_.transform_param(),
                                   this->phase_));
    data_transformer_->InitRand();
    save_file_ = detection_output_param.save_file();
  }
  bbox_preds_.ReshapeLike(*(bottom[0]));
  if (!share_location_) {
    bbox_permute_.ReshapeLike(*(bottom[0]));
  }
  conf_permute_.ReshapeLike(*(bottom[1]));
}
// The output has shape [1, 1, x, 7].
// The 7 values in the last dimension are:
// [image_id, label, confidence, xmin, ymin, xmax, ymax]
template <typename Dtype>
void DetectionOutputLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  if (need_save_) {
    CHECK_LE(name_count_, names_.size());
    if (name_count_ % num_test_image_ == 0) {
      // Clean all outputs.
      if (output_format_ == "VOC") {
        boost::filesystem::path output_directory(output_directory_);
        for (map<int, string>::iterator it = label_to_name_.begin();
             it != label_to_name_.end(); ++it) {
          if (it->first == background_label_id_) {
            continue;
          }
          std::ofstream outfile;
          boost::filesystem::path file(
              output_name_prefix_ + it->second + ".txt");
          boost::filesystem::path out_file = output_directory / file;
          outfile.open(out_file.string().c_str(), std::ofstream::out);
        }
      }
    }
  }
  CHECK_EQ(bottom[0]->num(), bottom[1]->num());
  if (bbox_preds_.num() != bottom[0]->num() ||
      bbox_preds_.count(1) != bottom[0]->count(1)) {
    bbox_preds_.ReshapeLike(*(bottom[0]));
  }
  if (!share_location_ && (bbox_permute_.num() != bottom[0]->num() ||
      bbox_permute_.count(1) != bottom[0]->count(1))) {
    bbox_permute_.ReshapeLike(*(bottom[0]));
  }
  if (conf_permute_.num() != bottom[1]->num() ||
      conf_permute_.count(1) != bottom[1]->count(1)) {
    conf_permute_.ReshapeLike(*(bottom[1]));
  }
  num_priors_ = bottom[2]->height() / 4;
  CHECK_EQ(num_priors_ * num_loc_classes_ * 4, bottom[0]->channels())
      << "Number of priors must match number of location predictions.";
  CHECK_EQ(num_priors_ * num_classes_, bottom[1]->channels())
      << "Number of priors must match number of confidence predictions.";
  // num() and channels() are 1.
  vector<int> top_shape(2, 1);
  // Since the number of bboxes to be kept is unknown before nms, we manually
  // set it to (fake) 1.
  top_shape.push_back(1);
  // Each row is a 7 dimension vector, which stores
  // [image_id, label, confidence, xmin, ymin, xmax, ymax]
  top_shape.push_back(7);
  top[0]->Reshape(top_shape);
}

template <typename Dtype>
void DetectionOutputLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  const Dtype* loc_data = bottom[0]->cpu_data();
  const Dtype* conf_data = bottom[1]->cpu_data();
  const Dtype* prior_data = bottom[2]->cpu_data();
  const int num = bottom[0]->num();

  // Retrieve all location predictions.
  vector<LabelBBox> all_loc_preds;
  // Parse the offset (location) data.
  GetLocPredictions(loc_data, num, num_priors_, num_loc_classes_,
                    share_location_, &all_loc_preds);

  // Retrieve all confidences.
  vector<map<int, vector<float> > > all_conf_scores;
  // Parse the score (confidence) data.
  GetConfidenceScores(conf_data, num, num_priors_, num_classes_,
                      &all_conf_scores);

  // Retrieve all prior bboxes. It is same within a batch since we assume all
  // images in a batch are of same dimension.
  vector<NormalizedBBox> prior_bboxes;
  vector<vector<float> > prior_variances;
  // Parse the prior box data.
  GetPriorBBoxes(prior_data, num_priors_, &prior_bboxes, &prior_variances);

  // Decode all loc predictions to bboxes.
  vector<LabelBBox> all_decode_bboxes;
  const bool clip_bbox = false;
  // Decode.
  DecodeBBoxesAll(all_loc_preds, prior_bboxes, prior_variances, num,
                  share_location_, num_loc_classes_, background_label_id_,
                  code_type_, variance_encoded_in_target_, clip_bbox,
                  &all_decode_bboxes);

  int num_kept = 0;
  vector<map<int, vector<int> > > all_indices;
  for (int i = 0; i < num; ++i) {
    const LabelBBox& decode_bboxes = all_decode_bboxes[i];
    const map<int, vector<float> >& conf_scores = all_conf_scores[i];
    map<int, vector<int> > indices;
    int num_det = 0;
    for (int c = 0; c < num_classes_; ++c) {
      if (c == background_label_id_) {
        // Ignore background class.
        continue;
      }
      if (conf_scores.find(c) == conf_scores.end()) {
        // Something bad happened if there are no predictions for current label.
        LOG(FATAL) << "Could not find confidence predictions for label " << c;
      }
      const vector<float>& scores = conf_scores.find(c)->second;
      int label = share_location_ ? -1 : c;
      if (decode_bboxes.find(label) == decode_bboxes.end()) {
        // Something bad happened if there are no predictions for current label.
        LOG(FATAL) << "Could not find location predictions for label " << label;
        continue;
      }
      const vector<NormalizedBBox>& bboxes = decode_bboxes.find(label)->second;
      // Non-maximum suppression.
      ApplyNMSFast(bboxes, scores, confidence_threshold_, nms_threshold_, eta_,
          top_k_, &(indices[c]));
      num_det += indices[c].size();
    }
    // If more detections survive NMS than keep_top_k_ allows, keep only the
    // keep_top_k_ highest-scoring ones.
    if (keep_top_k_ > -1 && num_det > keep_top_k_) {
      vector<pair<float, pair<int, int> > > score_index_pairs;
      for (map<int, vector<int> >::iterator it = indices.begin();
           it != indices.end(); ++it) {
        int label = it->first;
        const vector<int>& label_indices = it->second;
        if (conf_scores.find(label) == conf_scores.end()) {
          // Something bad happened for current label.
          LOG(FATAL) << "Could not find location predictions for " << label;
          continue;
        }
        const vector<float>& scores = conf_scores.find(label)->second;
        for (int j = 0; j < label_indices.size(); ++j) {
          int idx = label_indices[j];
          CHECK_LT(idx, scores.size());
          score_index_pairs.push_back(std::make_pair(
                  scores[idx], std::make_pair(label, idx)));
        }
      }
      // Keep top k results per image.
      std::sort(score_index_pairs.begin(), score_index_pairs.end(),
                SortScorePairDescend<pair<int, int> >);
      score_index_pairs.resize(keep_top_k_);
      // Store the new indices.
      map<int, vector<int> > new_indices;
      for (int j = 0; j < score_index_pairs.size(); ++j) {
        int label = score_index_pairs[j].second.first;
        int idx = score_index_pairs[j].second.second;
        new_indices[label].push_back(idx);
      }
      all_indices.push_back(new_indices);
      num_kept += keep_top_k_;
    } else {
      all_indices.push_back(indices);
      num_kept += num_det;
    }
  }

  vector<int> top_shape(2, 1);
  top_shape.push_back(num_kept);
  top_shape.push_back(7);
  Dtype* top_data;
  // No detections were found.
  if (num_kept == 0) {
    LOG(INFO) << "Couldn't find any detections";
    top_shape[2] = num;
    top[0]->Reshape(top_shape);
    top_data = top[0]->mutable_cpu_data();
    caffe_set<Dtype>(top[0]->count(), -1, top_data);
    // Generate fake results per image.
    for (int i = 0; i < num; ++i) {
      top_data[0] = i;
      top_data += 7;
    }
  } else {  // detections were found
    top[0]->Reshape(top_shape);
    top_data = top[0]->mutable_cpu_data();
  }
  // Write the kept detections to the output (and save them if requested).
  int count = 0;
  boost::filesystem::path output_directory(output_directory_);
  for (int i = 0; i < num; ++i) {
    const map<int, vector<float> >& conf_scores = all_conf_scores[i];
    const LabelBBox& decode_bboxes = all_decode_bboxes[i];
    for (map<int, vector<int> >::iterator it = all_indices[i].begin();
         it != all_indices[i].end(); ++it) {
      int label = it->first;
      if (conf_scores.find(label) == conf_scores.end()) {
        // Something bad happened if there are no predictions for current label.
        LOG(FATAL) << "Could not find confidence predictions for " << label;
        continue;
      }
      const vector<float>& scores = conf_scores.find(label)->second;
      int loc_label = share_location_ ? -1 : label;
      if (decode_bboxes.find(loc_label) == decode_bboxes.end()) {
        // Something bad happened if there are no predictions for current label.
        LOG(FATAL) << "Could not find location predictions for " << loc_label;
        continue;
      }
      const vector<NormalizedBBox>& bboxes =
          decode_bboxes.find(loc_label)->second;
      vector<int>& indices = it->second;
      if (need_save_) {
        CHECK(label_to_name_.find(label) != label_to_name_.end())
          << "Cannot find label: " << label << " in the label map.";
        CHECK_LT(name_count_, names_.size());
      }
      // Copy the data into the output blob.
      for (int j = 0; j < indices.size(); ++j) {
        int idx = indices[j];
        top_data[count * 7] = i;
        top_data[count * 7 + 1] = label;
        top_data[count * 7 + 2] = scores[idx];
        const NormalizedBBox& bbox = bboxes[idx];
        top_data[count * 7 + 3] = bbox.xmin();
        top_data[count * 7 + 4] = bbox.ymin();
        top_data[count * 7 + 5] = bbox.xmax();
        top_data[count * 7 + 6] = bbox.ymax();
        if (need_save_) {
          NormalizedBBox out_bbox;
          OutputBBox(bbox, sizes_[name_count_], has_resize_, resize_param_,
                     &out_bbox);
          float score = top_data[count * 7 + 2];
          float xmin = out_bbox.xmin();
          float ymin = out_bbox.ymin();
          float xmax = out_bbox.xmax();
          float ymax = out_bbox.ymax();
          ptree pt_xmin, pt_ymin, pt_width, pt_height;
          pt_xmin.put<float>("", round(xmin * 100) / 100.);
          pt_ymin.put<float>("", round(ymin * 100) / 100.);
          pt_width.put<float>("", round((xmax - xmin) * 100) / 100.);
          pt_height.put<float>("", round((ymax - ymin) * 100) / 100.);

          ptree cur_bbox;
          cur_bbox.push_back(std::make_pair("", pt_xmin));
          cur_bbox.push_back(std::make_pair("", pt_ymin));
          cur_bbox.push_back(std::make_pair("", pt_width));
          cur_bbox.push_back(std::make_pair("", pt_height));

          ptree cur_det;
          cur_det.put("image_id", names_[name_count_]);
          if (output_format_ == "ILSVRC") {
            cur_det.put<int>("category_id", label);
          } else {
            cur_det.put("category_id", label_to_name_[label].c_str());
          }
          cur_det.add_child("bbox", cur_bbox);
          cur_det.put<float>("score", score);

          detections_.push_back(std::make_pair("", cur_det));
        }
        ++count;
      }
    }
    if (need_save_) {
      ++name_count_;
      if (name_count_ % num_test_image_ == 0) {
        if (output_format_ == "VOC") {
          map<string, std::ofstream*> outfiles;
          for (int c = 0; c < num_classes_; ++c) {
            if (c == background_label_id_) {
              continue;
            }
            string label_name = label_to_name_[c];
            boost::filesystem::path file(
                output_name_prefix_ + label_name + ".txt");
            boost::filesystem::path out_file = output_directory / file;
            outfiles[label_name] = new std::ofstream(out_file.string().c_str(),
                std::ofstream::out);
          }
          BOOST_FOREACH(ptree::value_type &det, detections_.get_child("")) {
            ptree pt = det.second;
            string label_name = pt.get<string>("category_id");
            if (outfiles.find(label_name) == outfiles.end()) {
              std::cout << "Cannot find " << label_name << std::endl;
              continue;
            }
            string image_name = pt.get<string>("image_id");
            float score = pt.get<float>("score");
            vector<int> bbox;
            BOOST_FOREACH(ptree::value_type &elem, pt.get_child("bbox")) {
              bbox.push_back(static_cast<int>(elem.second.get_value<float>()));
            }
            *(outfiles[label_name]) << image_name;
            *(outfiles[label_name]) << " " << score;
            *(outfiles[label_name]) << " " << bbox[0] << " " << bbox[1];
            *(outfiles[label_name]) << " " << bbox[0] + bbox[2];
            *(outfiles[label_name]) << " " << bbox[1] + bbox[3];
            *(outfiles[label_name]) << std::endl;
          }
          for (int c = 0; c < num_classes_; ++c) {
            if (c == background_label_id_) {
              continue;
            }
            string label_name = label_to_name_[c];
            outfiles[label_name]->flush();
            outfiles[label_name]->close();
            delete outfiles[label_name];
          }
        } else if (output_format_ == "COCO") {
          boost::filesystem::path output_directory(output_directory_);
          boost::filesystem::path file(output_name_prefix_ + ".json");
          boost::filesystem::path out_file = output_directory / file;
          std::ofstream outfile;
          outfile.open(out_file.string().c_str(), std::ofstream::out);

          boost::regex exp("\"(null|true|false|-?[0-9]+(\\.[0-9]+)?)\"");
          ptree output;
          output.add_child("detections", detections_);
          std::stringstream ss;
          write_json(ss, output);
          std::string rv = boost::regex_replace(ss.str(), exp, "$1");
          outfile << rv.substr(rv.find("["), rv.rfind("]") - rv.find("["))
              << std::endl << "]" << std::endl;
        } else if (output_format_ == "ILSVRC") {
          boost::filesystem::path output_directory(output_directory_);
          boost::filesystem::path file(output_name_prefix_ + ".txt");
          boost::filesystem::path out_file = output_directory / file;
          std::ofstream outfile;
          outfile.open(out_file.string().c_str(), std::ofstream::out);

          BOOST_FOREACH(ptree::value_type &det, detections_.get_child("")) {
            ptree pt = det.second;
            int label = pt.get<int>("category_id");
            string image_name = pt.get<string>("image_id");
            float score = pt.get<float>("score");
            vector<int> bbox;
            BOOST_FOREACH(ptree::value_type &elem, pt.get_child("bbox")) {
              bbox.push_back(static_cast<int>(elem.second.get_value<float>()));
            }
            outfile << image_name << " " << label << " " << score;
            outfile << " " << bbox[0] << " " << bbox[1];
            outfile << " " << bbox[0] + bbox[2];
            outfile << " " << bbox[1] + bbox[3];
            outfile << std::endl;
          }
        }
        name_count_ = 0;
        detections_.clear();
      }
    }
  }
  if (visualize_) {
#ifdef USE_OPENCV
    vector<cv::Mat> cv_imgs;
    this->data_transformer_->TransformInv(bottom[3], &cv_imgs);
    vector<cv::Scalar> colors = GetColors(label_to_display_name_.size());
    VisualizeBBox(cv_imgs, top[0], visualize_threshold_, colors,
        label_to_display_name_, save_file_);
#endif  // USE_OPENCV
  }
}

#ifdef CPU_ONLY
STUB_GPU_FORWARD(DetectionOutputLayer, Forward);
#endif

INSTANTIATE_CLASS(DetectionOutputLayer);
REGISTER_LAYER_CLASS(DetectionOutput);

}  // namespace caffe

Caffe definition

message DetectionOutputParameter {
  // Number of classes to be predicted.
  optional uint32 num_classes = 1;
  // Whether bounding box locations are shared among different classes.
  optional bool share_location = 2 [default = true];
  // Background label id. Set it to -1 if there is no background class.
  optional int32 background_label_id = 3 [default = 0];
  // Parameters used for non-maximum suppression.
  optional NonMaximumSuppressionParameter nms_param = 4;
  // Parameters used for saving detection results.
  optional SaveOutputParameter save_output_param = 5;
  // Encoding/decoding method for bboxes.
  optional PriorBoxParameter.CodeType code_type = 6 [default = CORNER];
  // Whether the variance is encoded in the target.
  optional bool variance_encoded_in_target = 8 [default = false];
  // Number of boxes kept per image after NMS.
  // -1 means keep all boxes.
  optional int32 keep_top_k = 7 [default = -1];
  // Score threshold.
  optional float confidence_threshold = 9;
  // If true, visualize the detection results.
  optional bool visualize = 10 [default = false];
  // The threshold used to visualize the detection results.
  optional float visualize_threshold = 11;
  // If provided, save outputs to video file.
  optional string save_file = 12;
}
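
For orientation, a layer entry that sets these parameters in a deploy prototxt might look like the fragment below. The blob names come from the inputs described at the start of this post; the numeric values (21 classes, NMS threshold 0.45, and so on) are illustrative assumptions in the style of a typical PASCAL VOC setup, not values taken from this article, and should be adapted to the model at hand.

```
layer {
  name: "detection_out"
  type: "DetectionOutput"
  bottom: "mbox_loc"
  bottom: "mbox_conf_flatten"
  bottom: "mbox_priorbox"
  top: "detection_out"
  include {
    phase: TEST
  }
  detection_output_param {
    num_classes: 21
    share_location: true
    background_label_id: 0
    nms_param {
      nms_threshold: 0.45
      top_k: 400
    }
    code_type: CENTER_SIZE
    keep_top_k: 200
    confidence_threshold: 0.01
  }
}
```

The bottom order matters: the layer reads locations from bottom[0], confidences from bottom[1], and prior boxes from bottom[2], as the Forward_cpu code above shows. Since code_type defaults to CORNER, it has to be set explicitly when the network was trained with another encoding.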