LMDB analysis and some thoughts on the caffe-SSD model [positive samples (R red, Y yellow, G green, B black + background) + negative samples]

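Reference blogs:

SSD object detection: dissecting the lmdb data structure
https://blog.csdn.net/Touch_Dream/article/details/80598901
Caffe-based SSD object detection: generating the training set and building the lmdb files
https://blog.csdn.net/edogawachia/article/details/81669834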

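LMDB is a B+tree-based database management library whose API is modeled loosely on Berkeley DB, but heavily simplified. The entire database is exposed through a memory map, and every fetch returns data directly from the mapped memory, so no malloc or memcpy happens when reading. The library therefore stays very simple, because it needs no page cache layer of its own, and it is fast and memory-efficient. Its semantics are fully ACID (atomicity, consistency, isolation, durability). When the memory map is read-only, the database cannot be corrupted by stray pointer writes from application code.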
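To make the "no malloc or memcpy" point concrete, here is a minimal sketch that uses Python's standard mmap module rather than LMDB itself. It maps a small temporary file and reads a slice through a memoryview, which refers to the mapped pages instead of copying them. The file contents and the helper name read_mapped_slice are made up for this illustration.

```python
import mmap
import os
import tempfile


def read_mapped_slice(path, start, stop):
    """Return bytes [start:stop) of a file, read through a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            with memoryview(mapped) as view:        # window onto the mapped pages, no copy
                with view[start:stop] as window:    # slicing a memoryview does not copy either
                    # The only copy happens here, so the result outlives the map.
                    return bytes(window)


# Hypothetical demo data: a tiny "database" file with two key/value pairs.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"key1=value1;key2=value2")
    demo_path = tmp.name

demo_value = read_mapped_slice(demo_path, 5, 11)
os.remove(demo_path)
print(demo_value)
```

LMDB goes further than this sketch, since it hands out pointers into the map for whole records, but the principle of reading straight from mapped memory is the same.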

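The library is also thread-aware and supports concurrent read/write access from multiple processes and threads. Data pages use a copy-on-write strategy, so no active data page is ever overwritten. This also protects the data: no special recovery procedure is needed after a system crash. Writes are fully serialized; only one write transaction is active at a time, which guarantees that writers can never deadlock. The database structure is multi-versioned, so readers run without locks. Writers do not block readers, and readers do not block writers.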
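The copy-on-write and multi-version behaviour can be sketched with a toy in-memory store. This is not how LMDB is implemented (LMDB works on B+tree pages, not Python dicts); the class name CowStore and its methods are invented here purely to show why readers need no lock and why a single writer lock is enough.

```python
import threading
from types import MappingProxyType


class CowStore:
    """Toy copy-on-write store: readers pin a snapshot, writers are serialized."""

    def __init__(self):
        self._root = MappingProxyType({})      # current committed version, read-only
        self._write_lock = threading.Lock()    # only one writer at a time

    def snapshot(self):
        # Readers take no lock: they simply keep a reference to one version.
        return self._root

    def write(self, updates):
        with self._write_lock:
            new_version = dict(self._root)     # copy instead of overwriting live data
            new_version.update(updates)
            # Publishing is a single reference assignment, so readers see
            # either the old version or the new one, never a mix.
            self._root = MappingProxyType(new_version)


store = CowStore()
store.write({"image_0001": "v1"})
reader_view = store.snapshot()                 # a reader starts here
store.write({"image_0001": "v2"})              # a later write does not disturb it

print(reader_view["image_0001"], store.snapshot()["image_0001"])
```

The reader that took its snapshot before the second write keeps seeing the old value, which is the same isolation a read transaction gets in LMDB.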

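At first I thought an LMDB database had to sit next to the images. After learning more, I found that it stands on its own: the labels and the image data are stored inside it, so whether the original images and label files are deleted afterwards no longer matters. Below, a small program parses it.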

I am using PyCharm here. I had already added Caffe's Python path to ~/.bashrc with

sudo gedit ~/.bashrc
source ~/.bashrc

and the script itself also sets the path:

caffe_root = '/home/boyun/ynh/code/caffe-ssd'
os.chdir(caffe_root)
sys.path.insert(0, os.path.join(caffe_root, 'python'))
import caffe
from caffe.proto import caffe_pb2

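But the program still complained that there was no module named caffe.

It turned out that PyCharm has to be started through its .sh script:

cd /home/boyun/pycharm-2018.3.2/bin  # change this to the directory where you extracted PyCharm
sh ./pycharm.sh

Earlier I had opened PyCharm once and locked it to the launcher. It started fine that way every time and ran ordinary Python programs, until I hit this problem today.

My desktop shortcut is probably set up incorrectly. A likely cause is that a desktop launcher does not start from a shell that has sourced ~/.bashrc, so PyCharm never sees the environment variables set there.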
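When an import works in a terminal but not in the IDE, it helps to print what the running interpreter actually sees. The following is a small diagnostic sketch, assuming Python 3 (under Python 2, imp.find_module plays a similar role). The helper name diagnose_import is hypothetical, and the set of environment variables it reports is my guess at what is relevant here.

```python
import importlib.util
import os
import sys


def diagnose_import(module_name, extra_dir=None):
    """Report whether module_name is importable and which environment is in effect."""
    if extra_dir and extra_dir not in sys.path:
        sys.path.insert(0, extra_dir)          # same trick as sys.path.insert in the script
    spec = importlib.util.find_spec(module_name)
    return {
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        # Both variables are usually exported in ~/.bashrc; a desktop launcher may not see them.
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH"),
    }


report = diagnose_import("json")               # replace "json" with "caffe" on your machine
for name, value in report.items():
    print(name, "=", value)
```

Running it once from PyCharm started via the launcher and once from PyCharm started via pycharm.sh should show whether the two environments differ.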

Now the code, which analyzes the VOC dataset:

# -*- coding: utf-8 -*-
from __future__ import print_function          # print() then works on Python 2 and 3

import os
import sys

import cv2
import lmdb
import numpy as np

caffe_root = '/home/boyun/ynh/code/caffe-ssd'
os.chdir(caffe_root)
sys.path.insert(0, os.path.join(caffe_root, 'python'))
import caffe
from caffe.proto import caffe_pb2

#lmdb_env = lmdb.open('/home/boyun/PycharmProjects/TrafficLight/Demo1/trafficLightAnalyze/traffic_Light_detect/train/lmdb/train_test_lmdb')
lmdb_env = lmdb.open('/home/boyun/data/VOCdevkit/VOC0712/lmdb/VOC0712_test_lmdb', readonly=True)

lmdb_txn = lmdb_env.begin()                                 # read transaction
lmdb_cursor = lmdb_txn.cursor()                             # cursor that iterates over (key, value) pairs
annotated_datum = caffe_pb2.AnnotatedDatum()                # AnnotatedDatum message

for key, value in lmdb_cursor:
    print(key.decode('utf-8'))

    annotated_datum.ParseFromString(value)
    datum = annotated_datum.datum                           # Datum message (image data and shape)
    grps = annotated_datum.annotation_group                 # AnnotationGroup messages, one per label
    anno_type = annotated_datum.type                        # renamed so it does not shadow the builtin type()

    for grp in grps:
        # Only the first Annotation of each group is printed here;
        # iterate over grp.annotation to see every box of that label.
        bbox = grp.annotation[0].bbox                       # normalized coordinates in [0, 1]
        xmin = bbox.xmin * datum.width
        ymin = bbox.ymin * datum.height
        xmax = bbox.xmax * datum.width
        ymax = bbox.ymax * datum.height

        print("label:", grp.group_label)                    # class label of the object
        print("bbox:", xmin, ymin, xmax, ymax)              # bbox of the object, in pixels

    label = datum.label                                     # Datum label and its three dimensions
    channels = datum.channels
    height = datum.height
    width = datum.width

    #print("label:", label)
    print("channels:", channels)
    print("height:", height)
    print("width:", width)

    image_x = np.frombuffer(datum.data, dtype=np.uint8)     # raw encoded bytes -> 1-D uint8 array
    image = cv2.imdecode(image_x, -1)                       # decode the image, -1 = load unchanged

    cv2.imshow("image", image)                              # show the image
    if cv2.waitKey(1000) & 0xFF == ord('q'):                # press q to stop early
        break
    print("\n" * 5)                                         # blank lines between records

cv2.destroyAllWindows()
lmdb_env.close()

00000001_VOC2007/JPEGImages/000002.jpg
label: 19
bbox: 138.999997824 200.00000298 206.999998987 300.999999046
channels: 3
height: 500
width: 335






00000002_VOC2007/JPEGImages/000003.jpg
label: 18
bbox: 123.000003397 154.999997467 215.000003576 194.999992847
label: 9
bbox: 238.999992609 156.000003219 307.000011206 205.000005662
channels: 3
height: 375
width: 500






00000003_VOC2007/JPEGImages/000004.jpg
label: 7
bbox: 13.0000002682 311.00000608 83.9999988675 362.000010967
channels: 3
height: 406
width: 500






00000004_VOC2007/JPEGImages/000006.jpg
label: 16
bbox: 187.000006437 135.000005364 282.000005245 242.000006139
label: 11
bbox: 153.999999166 209.000006318 368.999987841 375.0
label: 9
bbox: 254.999995232 206.999994814 365.999996662 375.0
channels: 3
height: 375
width: 500






00000005_VOC2007/JPEGImages/000008.jpg
label: 9
bbox: 192.000001669 15.9999998286 363.999992609 248.999990523
channels: 3
height: 375
width: 500






00000006_VOC2007/JPEGImages/000010.jpg
label: 13
bbox: 86.9999978542 97.0000004768 258.000010371 427.000007629
label: 15
bbox: 133.000003874 72.000002861 244.999998808 283.999986649
channels: 3
height: 480
width: 354






00000007_VOC2007/JPEGImages/000011.jpg
label: 8
bbox: 126.000002027 50.9999985695 330.000013113 308.000009537
channels: 3
height: 324
width: 500






00000008_VOC2007/JPEGImages/000013.jpg
label: 10
bbox: 298.999994993 160.000003874 446.000009775 251.999996603
channels: 3
height: 375
width: 500

That completes the LMDB parsing.
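The printed boxes such as 138.999997824 are not exact integers because the LMDB stores each coordinate normalized to [0, 1] as a single-precision float, and multiplying back by the width or height leaves a small rounding error. A minimal sketch of recovering the integer pixel box follows; the helper name to_pixel_box is hypothetical, and the numbers are taken from the first record printed above.

```python
def to_pixel_box(norm_box, width, height):
    """Scale a normalized (xmin, ymin, xmax, ymax) box to integer pixel coordinates."""
    xmin, ymin, xmax, ymax = norm_box
    return (int(round(xmin * width)), int(round(ymin * height)),
            int(round(xmax * width)), int(round(ymax * height)))


# First record above: 000002.jpg, width 335, height 500.
width, height = 335, 500
# Rebuild the normalized box from the printed pixel values (stand-in for the stored floats).
norm_box = (138.999997824 / width, 200.00000298 / height,
            206.999998987 / width, 300.999999046 / height)

pixel_box = to_pixel_box(norm_box, width, height)
print(pixel_box)
```

Rounding, rather than truncating with int(), matters here: truncation would turn 138.999997824 into 138.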

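Reference blog:

Mastering object detection (SSD), part 2

https://blog.csdn.net/cedi9117/article/details/86509004

A note on terminology:

In a positive-sample image, every b-box that does not correspond to an annotated box is a negative sample.

The negative samples discussed below are whole images in which every b-box is a negative sample.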
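The two kinds of negatives can be illustrated with a simplified matching rule: a candidate box counts as positive when its IoU with some annotated box reaches a threshold, and as background otherwise. This is only a sketch. The 0.5 threshold, the background id 0, the class id 3 and the candidate boxes are assumptions for the example, and the real SSD matching step has additional rules (for instance, every annotated box also gets its best-overlapping candidate).

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def label_candidates(candidates, gt_boxes, threshold=0.5, background=0):
    """Give each candidate the label of its best-overlapping annotation, or background."""
    labels = []
    for cand in candidates:
        best_iou, best_label = 0.0, background
        for gt_box, gt_label in gt_boxes:
            overlap = iou(cand, gt_box)
            if overlap > best_iou:
                best_iou, best_label = overlap, gt_label
        labels.append(best_label if best_iou >= threshold else background)
    return labels


candidates = [(170, 75, 250, 110), (300, 300, 400, 400)]   # hypothetical candidate boxes
annotated = [((177, 79, 243, 105), 3)]                     # one annotated box, class id 3

positive_image = label_candidates(candidates, annotated)   # one match, one negative
negative_image = label_candidates(candidates, [])          # no annotations at all
print(positive_image, negative_image)
```

In the annotated image only the unmatched candidate is negative; in the image without annotations every candidate is.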

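First, I want to start from SSD's subtitle, "Single Shot MultiBox Detector", and take a bird's-eye view to understand it at a high level:

"Single Shot" is read here as single-object detection. (In the SSD paper the term refers to producing all detections in a single forward pass of the network; the per-region reading below is used as an intuition.)
"Box" is like a camera viewfinder: a "Single Shot" only covers what is inside the frame, and everything outside it is masked out.
"MultiBox" means covering the whole image with viewfinders of many different sizes and shapes.

Putting these together gives SSD's working principle: split the image into N regions, run single-object detection on each region, and collect all the per-region results.

SSD uses convolution to partition the image. As its network architecture shows, SSD's top layers (Extra Feature Layers) consist of several convolutional layers. Suppose one of these layers produces an output of shape [64, 25, 4, 4]: the 4x4 feature map has 16 grid cells in total, and each cell maps to one region of the image.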

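As shown in the figure (bg in the figure stands for Background):

The grid drawn over the image shows the regions that the grid cells map to, i.e. the MultiBox. The single-object prediction for each box is stored along axis 1 of the convolution output (the channels dimension): 25 = bounding box + class probabilities = 4 + 21 (20 classes + "Background").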
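The 25 = 4 + 21 split can be sketched for a single grid cell as follows. The layout is an assumption made for the example: I take the first 4 channels as the box and the remaining 21 as raw class scores that are turned into probabilities with a softmax. The function name split_cell_prediction and the demo numbers are hypothetical, and the actual caffe-SSD heads are organized differently (separate location and confidence layers, several default boxes per cell).

```python
import math

NUM_BOX_COORDS = 4
NUM_CLASSES = 21          # 20 object classes + "Background"


def split_cell_prediction(cell_vector):
    """Split one cell's 25 channel values into (box, class probabilities, best class)."""
    if len(cell_vector) != NUM_BOX_COORDS + NUM_CLASSES:
        raise ValueError("expected %d values" % (NUM_BOX_COORDS + NUM_CLASSES))
    box = cell_vector[:NUM_BOX_COORDS]
    scores = cell_vector[NUM_BOX_COORDS:]
    peak = max(scores)                                   # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best_class = max(range(NUM_CLASSES), key=probs.__getitem__)
    return box, probs, best_class


cell = [0.1, 0.2, 0.3, 0.4] + [0.0] * NUM_CLASSES        # made-up channel values
cell[NUM_BOX_COORDS + 7] = 5.0                           # make class index 7 stand out

box, probs, best_class = split_cell_prediction(cell)
print(box, best_class, round(probs[best_class], 3))
```

For an output of shape [64, 25, 4, 4], this split would be applied to each of the 16 cells of each of the 64 items.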

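SSD's Extra Feature Layers keep halving the number of grid cells per side with pooling layers (or stride=2) until it reaches 1 (4x4 -> 2x2 -> 1x1). Correspondingly, the side length of each cell doubles every time the grid is halved.

In this way, MultiBoxes (grid cells) of different shapes and sizes are created to anchor objects of different shapes and sizes.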
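The halving can be written out directly. The sketch below lists the image region covered by each cell for the 4x4, 2x2 and 1x1 grids, assuming a square 600x600 image and cells that tile the image exactly; both function names are hypothetical. Real receptive fields overlap and are not this tidy, so treat this as the idealized picture described above.

```python
def grid_regions(image_size, cells_per_side):
    """Return the (xmin, ymin, xmax, ymax) region of every cell, row by row."""
    cell = image_size / cells_per_side
    return [(col * cell, row * cell, (col + 1) * cell, (row + 1) * cell)
            for row in range(cells_per_side)
            for col in range(cells_per_side)]


def multiscale_grids(image_size, start=4):
    """Halve the grid (4 -> 2 -> 1) and collect the cell regions at each scale."""
    grids = {}
    cells_per_side = start
    while cells_per_side >= 1:
        grids[cells_per_side] = grid_regions(image_size, cells_per_side)
        cells_per_side //= 2
    return grids


grids = multiscale_grids(600)
for cells_per_side, regions in grids.items():
    side = regions[0][2] - regions[0][0]
    print("%dx%d grid: %d cells, each %.0f px wide" % (
        cells_per_side, cells_per_side, len(regions), side))
```

Each halving of the grid doubles the cell side (150, 300, 600 pixels here), so later layers cover larger objects.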

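Those are some of the concepts behind SSD.

Take a single image: regions with an annotated b-box carry a class label, and everything that is not annotated is Background.

What about an image with no annotations at all? The whole image is then a negative sample, so how should its XML be written?

Taking traffic lights as the example:

Here is the annotation: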

<?xml version="1.0" encoding="utf-8"?>

<annotation verified="no">
  <folder>待标注</folder>
  <filename>********</filename>
  <path>********</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>600</width>
    <height>600</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>G</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <Difficult>0</Difficult>
    <bndbox>
      <xmin>177</xmin>
      <ymin>79</ymin>
      <xmax>243</xmax>
      <ymax>105</ymax>
    </bndbox>
  </object>
  <object>
    <name>G</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <Difficult>0</Difficult>
    <bndbox>
      <xmin>347</xmin>
      <ymin>80</ymin>
      <xmax>423</xmax>
      <ymax>105</ymax>
    </bndbox>
  </object>
</annotation>

<?xml version="1.0" encoding="utf-8"?>

<annotation verified="no">
  <folder>待标注</folder>
  <filename>*************</filename>
  <path>*************.jpg</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>600</width>
    <height>600</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>G</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <Difficult>0</Difficult>
    <bndbox>
      <xmin>177</xmin>
      <ymin>79</ymin>
      <xmax>243</xmax>
      <ymax>105</ymax>
    </bndbox>
  </object>
  <object>
    <name>G</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <Difficult>0</Difficult>
    <bndbox>
      <xmin>347</xmin>
      <ymin>80</ymin>
      <xmax>423</xmax>
      <ymax>105</ymax>
    </bndbox>
  </object>
</annotation>

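Parsing the resulting LMDB file gives the following. The records that print no label or bbox line are the images without any annotation: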

00000006_train/JPEGImages/************.jpg
channels: 3
height: 600
width: 600






00000007_train/JPEGImages/************.jpg
label: 3
bbox: 186.000001431 97.9999959469 258.000004292 118.000003695
channels: 3
height: 600
width: 600






00000008_train/JPEGImages/************.jpg
channels: 3
height: 600
width: 600






00000009_train/JPEGImages/************.jpg
label: 4
bbox: 248.999994993 112.000000477 282.999998331 125.999996066
channels: 3
height: 600
width: 600
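The records above that carry no label or bbox suggest that a negative image can be described by an annotation file with no object entries. As a sketch under that assumption, the snippet below builds a VOC-style annotation with the standard library's xml.etree.ElementTree, once without objects and once with one. The function name build_annotation and the file names are hypothetical, the tag names follow the lowercase VOC convention, and whether your conversion tool accepts an object-free file should be checked against the tool itself.

```python
import xml.etree.ElementTree as ET


def build_annotation(filename, width, height, depth=3, objects=()):
    """Build a VOC-style annotation string; objects is a list of (name, box) pairs."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    ET.SubElement(root, "segmented").text = "0"
    for name, (xmin, ymin, xmax, ymax) in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin), ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Negative sample: image size only, no <object> element.
negative_xml = build_annotation("negative_example.jpg", 600, 600)
# Positive sample: one green light, box taken from the annotation shown earlier.
positive_xml = build_annotation("positive_example.jpg", 600, 600,
                                objects=[("G", (177, 79, 243, 105))])

print(negative_xml)
print(len(ET.fromstring(positive_xml).findall("object")))
```

Counting the object elements of each file before building the LMDB is a quick way to confirm which images will end up as pure negatives.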
