Training Caffe on Your Own Data and Using the Model for Classification

I recently had an assignment that needed Caffe for some image classification. Being used to TensorFlow, I was pretty lost. For the image set I used one from South China University of Technology.

I was too lazy to do the initial install, so I had a classmate with experience set it up for me. I had meant to do it all myself, but I caved, and honestly it was great.

Since I had previously installed tensorflow-gpu with CUDA 9.0, and Caffe currently seems to support at most CUDA 8.0 (the build fails under 9.0), I saved myself the trouble and installed the CPU version.

I then wanted to warm up with a simple classification task. The first blog post I found turned out to be much like the others, with roughly the same sequence of steps, but I still ran into problems of my own, mainly to do with paths.

So pay close attention to where each path in the workflow's scripts is used, and how it gets joined with other paths.

First, perhaps because of Caffe version differences, most tutorials I saw online have the executables under "/build/tools/", while mine are under "caffe\scripts\build\tools\Release". With that noted, let's follow the workflow.

The files my whole training run produced:

1. The train.txt and val.txt files plus label.txt. I put all my images together under data; at first I generated the txt files with Python (see the sketch after label.txt below). The images are split into train and val folders, whose names have to be added to the paths in some later files; I'll explain why further down.

ftw93.jpg 0
ftw94.jpg 0
ftw95.jpg 0
ftw96.jpg 0
ftw97.jpg 0
ftw98.jpg 0
ftw99.jpg 0
...
mtw1.jpg 1
mtw10.jpg 1
mtw100.jpg 1
mtw101.jpg 1
mtw102.jpg 1
mtw103.jpg 1
mtw104.jpg 1
mtw105.jpg 1
mtw106.jpg 1
mtw107.jpg 1

label.txt holds all the classes:

0 Western female
1 Asian female
2 Western male
3 Asian male

All my files are configured this way. The reason I don't use absolute paths here is path-related; more on that in a moment.
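Since I mentioned doing the txt generation in Python at first, here is a minimal sketch of that approach. The data folder layout and the ftw/fty/mtw/mty class prefixes are the ones from my setup; everything else is an assumption to make the sketch self-contained:

import os

# Assumed layout: images live in data/train and data/test,
# and the txt files go into data/ where create_imagenet.sh expects them.
DATA = 'D:/caffe/examples/my_image/data'
PREFIXES = ['ftw', 'fty', 'mtw', 'mty']   # one prefix per class label 0..3

for split, out_name in (('train', 'train.txt'), ('test', 'val.txt')):
    with open(os.path.join(DATA, out_name), 'w') as out:
        for fname in sorted(os.listdir(os.path.join(DATA, split))):
            for label, prefix in enumerate(PREFIXES):
                if fname.startswith(prefix) and fname.endswith('.jpg'):
                    # bare filename plus label, matching the listing above
                    out.write('%s %d\n' % (fname, label))
                    break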

(Updated 2018-12-06) The script below auto-generates the train and test txt files for Caffe training (training and test images in two separate folders):

#!/usr/bin/env bash
DATA=D:/caffe/examples/my_image
FILETYPE=jpg   # image format of the samples to process
echo "Create train.txt..."
rm -rf $DATA/data/train.txt
array=("ftw" "fty" "mtw" "mty")    # the class prefixes to loop over
for i in 0 1 2 3
do
echo ${array[i]}
find $DATA/data/train -name "${array[i]}*.$FILETYPE" | cut -d '/' -f7 | sed "s/$/ $i/" >> $DATA/data/train.txt   # append "filename label" lines into data/, where create_imagenet.sh looks
done
echo "Create val.txt..."
rm -rf $DATA/data/val.txt
for i in 0 1 2 3   # -f6-7 would take directory levels 6-7; set it to match the paths above (with the layout here, field 7, the filename, is enough)
do
find $DATA/data/test -name "${array[i]}*.$FILETYPE" | cut -d '/' -f7 | sed "s/$/ $i/" >> $DATA/data/val.txt
done
echo "All done"
read -r -p "Press Enter to exit..."   # stands in for cmd's pause, which doesn't exist under Git Bash
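A note on the cut -d '/' -f7 above: with DATA=D:/caffe/examples/my_image, a path found by find looks like D:/caffe/examples/my_image/data/train/ftw93.jpg, and splitting on '/' makes the bare filename field 7, which is exactly what the txt files need, since TRAIN_DATA_ROOT supplies the rest later. If your directories nest differently, count the slashes and adjust -f to match.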

2. Generate the lmdb files. I also used the create_imagenet.sh file, found at caffe\examples\imagenet. I started out using relative paths inside it and some part always failed, so I switched to absolute paths throughout. This is where TRAIN_DATA_ROOT and VAL_DATA_ROOT come in: they point at the training and test image directories, and each gets joined with the paths inside train.txt and val.txt to form the full image paths. The reason I don't put full paths in train.txt is that I launch the .sh files from Git Bash: if TRAIN_DATA_ROOT is set to /, it resolves to the directory where Git's exe lives, so training kept failing. I won't re-add most of the comments; they are easy enough to follow, and the post linked earlier has them as well.

#!/usr/bin/env sh
# Create the imagenet lmdb inputs
# N.B. set the path to the imagenet train + val data dirs
set -e

EXAMPLE=D:/caffe/examples/my_image
DATA=D:/caffe/examples/my_image/data/
TOOLS=D:/caffe/scripts/build/tools/Release

TRAIN_DATA_ROOT=D:/caffe/examples/my_image/data/train/
VAL_DATA_ROOT=D:/caffe/examples/my_image/data/test/

# Set RESIZE=true to resize the images to 256x256. Leave as false if images have
# already been resized using another tool.
RESIZE=true
if $RESIZE; then
  RESIZE_HEIGHT=32   # the stock file sets 256, but I used 32 to get results faster
  RESIZE_WIDTH=32
else
  RESIZE_HEIGHT=0
  RESIZE_WIDTH=0
fi

if [ ! -d "$TRAIN_DATA_ROOT" ]; then
  echo "Error: TRAIN_DATA_ROOT is not a path to a directory: $TRAIN_DATA_ROOT"
  echo "Set the TRAIN_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet training data is stored."
  exit 1
fi

if [ ! -d "$VAL_DATA_ROOT" ]; then
  echo "Error: VAL_DATA_ROOT is not a path to a directory: $VAL_DATA_ROOT"
  echo "Set the VAL_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet validation data is stored."
  exit 1
fi

echo "Creating train lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/train.txt \
    $EXAMPLE/ilsvrc12_train_lmdb   # output path of the generated lmdb


echo "Creating val lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $VAL_DATA_ROOT \
    $DATA/val.txt \
    $EXAMPLE/ilsvrc12_val_lmdb    # output path of the generated lmdb

echo "Done."
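After it runs, each of the two output folders (ilsvrc12_train_lmdb and ilsvrc12_val_lmdb) should contain a data.mdb and a lock.mdb. If data.mdb is only a few KB, the image paths almost certainly failed to resolve and nothing was written, which is the path-joining problem described above.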

3. Generate the mean_file, again with absolute paths everywhere; another blog post describes what it is for. The link above only computes the mean file for train, and every tutorial I saw likewise produced only the train mean_file and then used it throughout the network. That seemed odd to me, so I added one for test as well; the resulting file came out exactly the same size as the train one, and made no difference in the network. Very puzzling.

(Note: a day later it suddenly hit me. In practice you only ever have training data plus data to be predicted, so of course there is only a train mean file. As for why the sizes match: it is a mean file, so its size does not change with the number of images.)

#!/usr/bin/env sh
EXAMPLE=D:/caffe/examples/my_image
DATA=D:/caffe/examples/my_image/data/
TOOLS=D:/caffe/scripts/build/tools/Release

$TOOLS/compute_image_mean $EXAMPLE/ilsvrc12_train_lmdb $EXAMPLE/imagenet_train_mean.binaryproto

$TOOLS/compute_image_mean $EXAMPLE/ilsvrc12_val_lmdb $EXAMPLE/imagenet_val_mean.binaryproto

echo "Done."
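You can check that claim about the mean files directly. A minimal pycaffe sketch, assuming pycaffe is built and importable (this is the standard BlobProto parsing idiom; the path is mine from above):

import caffe

# Parse the binaryproto mean file into a numpy array.
blob = caffe.proto.caffe_pb2.BlobProto()
with open('D:/caffe/examples/my_image/imagenet_train_mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())
mean = caffe.io.blobproto_to_array(blob)[0]   # shape (channels, height, width)
print(mean.shape)   # (3, 32, 32) here: depends only on channels and resize dims, never on image count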

4. cifar10_quick_solver.prototxt and cifar10_quick_train_test.prototxt, both copied over from caffe\examples\cifar10.

cifar10_quick_train_test.prototxt; you can compare it with the stock file to see what differs (everything I commented is a difference):

name: "CIFAR10_quick"
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "D:/caffe/examples/my_image/imagenet_train_mean.binaryproto"  # mean file path
  }
  data_param {
    source: "D:/caffe/examples/my_image/ilsvrc12_train_lmdb"   # lmdb path
    batch_size: 20   # with relatively few images, don't set this too large
    backend: LMDB   # two backends exist; we generated lmdb, so choose LMDB
  }
}
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mean_file: "D:/caffe/examples/my_image/imagenet_train_mean.binaryproto"  # note: also the train mean file
  }
  data_param {
    source: "D:/caffe/examples/my_image/ilsvrc12_val_lmdb"  # lmdb directory
    batch_size: 20  # batch_size at test time
    backend: LMDB  # as above
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 64
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 4   # number of output classes
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

5. The cifar10_quick_solver.prototxt file:

# reduce the learning rate after 8 epochs (4000 iters) by a factor of 10
# (stock comment; with lr_policy "fixed" below, the rate never actually drops)

# The train/test net protocol buffer definition
net: "D:/caffe/examples/my_image/cifar10_quick_train_test.prototxt"   # path to the net definition
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 20  # batches per test pass; test_iter times the test batch_size should cover the validation set
# Carry out testing every 500 training iterations.
test_interval: 10  # here: test every 10 training iterations
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.001
momentum: 0.9
weight_decay: 0.004
# The learning rate policy
lr_policy: "fixed"
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 4000
# snapshot intermediate results
snapshot: 1000
snapshot_prefix: "D:/caffe/examples/my_image/cifar10_quick"
# solver mode: CPU or GPU
solver_mode: CPU
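A quick sanity check on those numbers: with the test-phase batch_size of 20 from the net definition, test_iter: 20 means each test pass evaluates 20 × 20 = 400 images, so pick test_iter so that this product covers your validation set. Likewise, max_iter: 4000 at a training batch_size of 20 comes to 80,000 image presentations, so each training image is seen many times over.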

6. Start training. The train.sh file:

D:/caffe/scripts/build/tools/Release/caffe train --solver=D:/caffe/examples/my_image/cifar10_quick_solver.prototxt

Then just run the train.sh file.
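If everything is wired up correctly, the console should print the loss every 100 iterations (display: 100) and a test accuracy/loss pair every 10 (test_interval), and snapshot: 1000 should leave cifar10_quick_iter_1000 through cifar10_quick_iter_4000 model and solverstate files at the snapshot_prefix path.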

7. (Updated 2018-12-06) Classifying your own images. Note that the deploy.prototxt involved here has to match the network you trained with.

The deploy.prototxt I created (set up to mirror the network above) and test.sh are as follows:

name: "CIFAR10_quick"
layer {
  name: "cifar"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 10 dim: 3 dim: 32 dim: 32 } }  # the two 32s (227 in many tutorials) must match the size you trained at, or you'll hit a shape-mismatch error
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}

layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  inner_product_param {
    num_output: 64
  }
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  inner_product_param {
    num_output: 4  # the number of labels configured
  }
}

layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}
And test.sh:

for i in 1 2 3 4
do
D:/caffe/scripts/build/examples/cpp_classification/Release/classification.exe D:/caffe/examples/my_image/deploy.prototxt D:/caffe/examples/my_image/cifar10_quick_iter_4000.caffemodel.h5 D:/caffe/examples/my_image/imagenet_train_mean.binaryproto D:/caffe/examples/my_image/label.txt C:/Users/xxx/Desktop/$i.jpg
# five paths in all: the deploy prototxt, the trained model, the mean file, the label file, and the image to classify
done
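If you built pycaffe, the same five inputs can also be driven from Python instead of classification.exe. A minimal sketch under my paths above; the preprocessing (mean subtraction, BGR order, 0-255 scale) is my assumption about matching what convert_imageset baked into the lmdb, not something dictated by the C++ tool:

import caffe

caffe.set_mode_cpu()
net = caffe.Net('D:/caffe/examples/my_image/deploy.prototxt',
                'D:/caffe/examples/my_image/cifar10_quick_iter_4000.caffemodel.h5',
                caffe.TEST)

# Load the training mean and set up preprocessing to match training.
blob = caffe.proto.caffe_pb2.BlobProto()
with open('D:/caffe/examples/my_image/imagenet_train_mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())
mean = caffe.io.blobproto_to_array(blob)[0]

transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))     # HWC -> CHW
transformer.set_raw_scale('data', 255)           # load_image gives [0,1]; lmdb data was [0,255]
transformer.set_channel_swap('data', (2, 1, 0))  # RGB -> BGR, as stored in the lmdb
transformer.set_mean('data', mean)

image = caffe.io.load_image('C:/Users/xxx/Desktop/1.jpg')
net.blobs['data'].reshape(1, 3, 32, 32)          # classify a single image
net.blobs['data'].data[...] = transformer.preprocess('data', image)
prob = net.forward()['prob'][0]

labels = [line.strip() for line in open('D:/caffe/examples/my_image/label.txt', encoding='utf-8')]
print(labels[int(prob.argmax())], float(prob.max()))

If you trained at a different input size, change the 32s here and in deploy.prototxt together.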

8. Since this post started out with only two classes and very terse steps, with everything else patched in gradually afterwards, some of the transitions may read roughly. I have since retrained a full run and kept the complete scripts, the network definitions, and the resulting model, available for download.

9. Creating my own project in VS2015 to call the model for classification: I'm still exploring this step.

Reposted from blog.csdn.net/weixin_39220714/article/details/84566265