Creating HDF5 training files for Torch

Torch is a machine learning framework implemented in C/CUDA at the low level, with LuaJIT as its scripting interface.

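HDF5 is a technology for managing very large and complex datasets. It supports multiple platforms and has interfaces for many languages (C, C++, Python, and others).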

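The Torch tutorial only shows how to work with images and random tensors and gives no examples for other formats. This post walks through how to create an HDF5 dataset and how to use the HDF5 file format in Torch.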

I. Installing hdf5

1. Installing for Torch

git clone https://github.com/anibali/torch-hdf5.git
cd torch-hdf5
git checkout hdf5-1.10 
luarocks make hdf5-0-0.rockspec

2. Installing for Python

sudo pip install h5py

II. Creating an HDF5 file with Python

1. Creating the HDF5 object

import h5py
f = h5py.File('train.h5', 'w')  # create an HDF5 file named 'train.h5' in 'w' (write) mode
f.create_dataset('data', (100, 3, 32, 32), dtype='f8')  # 100 samples, each with 3 channels of 32x32 float64 values
f.create_dataset('label', (100, 1), dtype='i')  # dataset that holds the labels, of size 100x1

2. Writing the data
Writing data is straightforward: just assign to each entry of the dataset. As an example, we assign data generated randomly with numpy.

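import numpy as np
for i in range(100):
    temp = np.random.random((3, 32, 32))  # one random sample with 3 channels of 32x32 values
    f['data'][i] = temp   # write the sample into 'data'
    f['label'][i] = i % 4  # write the label (cycles through 0..3)
f.close()  # flush everything to disk and close the file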

The steps above produce train.h5, a file that can be used for training.
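
As a variation on the loop above, the same file can be written in a single step by passing whole numpy arrays to create_dataset. The following is a minimal sketch, not the original author's code: the helper name write_train_file and the temporary output path are hypothetical, and the shapes and dtypes simply mirror the ones used above.

```python
import os
import tempfile

import h5py
import numpy as np


def write_train_file(path, num_samples=100):
    """Write random 'data' and cyclic 'label' datasets to an HDF5 file in one shot."""
    # float64 samples, matching dtype 'f8' in the step-by-step version
    data = np.random.random((num_samples, 3, 32, 32))
    # labels cycle through 0..3 and are stored as a column of shape (N, 1)
    labels = (np.arange(num_samples) % 4).reshape(-1, 1)
    # the 'with' block closes the file even if an error occurs
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=data, dtype="f8")
        f.create_dataset("label", data=labels, dtype="i")


# Hypothetical output location, used only for this sketch.
example_path = os.path.join(tempfile.mkdtemp(), "train.h5")
write_train_file(example_path)

# Read the file back to check what was stored.
with h5py.File(example_path, "r") as f:
    data_shape = f["data"].shape
    label_shape = f["label"].shape
    first_labels = f["label"][:8, 0].tolist()

print(data_shape, label_shape, first_labels)
```

Writing the arrays in one call avoids 100 separate small writes, which may matter for larger datasets. The per-sample loop is still useful when the full array does not fit in memory.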

III. Using the HDF5 file in Torch

Reading HDF5 files in Torch requires torch-hdf5.

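require 'hdf5'
myFile = hdf5.open('path_to_hdf5_file', 'r') -- open the HDF5 file for reading
-- read both datasets fully into memory; labels are converted to a ByteTensor
trainset = {data = myFile:read('data'):all(), label = myFile:read('label'):all():byte()}
myFile:close() -- close the file once the tensors are loaded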

trainset is the same kind of object you get after loading 'cifar10-train.t7' in the official tutorial.

trainset -- typing trainset and running it prints the following information
---------------- output ----------------
{
data : DoubleTensor - size: 100×3×32×32
label : ByteTensor - size: 100×1
}
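
Before loading a file in Torch, it can help to check on the Python side that the datasets have the expected names, shapes, and dtypes. The sketch below is an assumption-laden illustration: the helper describe_h5 and the temporary file it inspects are hypothetical, and the file is rebuilt here only to keep the example self-contained.

```python
import os
import tempfile

import h5py
import numpy as np


def describe_h5(path):
    """Return {dataset_name: (shape, dtype string)} for the top-level datasets."""
    summary = {}
    with h5py.File(path, "r") as f:
        for name, item in f.items():
            # skip groups; only datasets carry a shape and dtype
            if isinstance(item, h5py.Dataset):
                summary[name] = (item.shape, str(item.dtype))
    return summary


# Build a small file with the same layout as train.h5 (hypothetical path).
check_path = os.path.join(tempfile.mkdtemp(), "train.h5")
with h5py.File(check_path, "w") as f:
    f.create_dataset("data", data=np.random.random((100, 3, 32, 32)), dtype="f8")
    f.create_dataset("label", data=(np.arange(100) % 4).reshape(-1, 1), dtype="i")

summary = describe_h5(check_path)
for name, (shape, dtype) in sorted(summary.items()):
    print(name, shape, dtype)
```

The 'data' dataset is stored as float64, which is consistent with the DoubleTensor shown in the Torch output above. The labels are stored as integers, and the Lua code converts them with :byte().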

References: http://withwsf.github.io/2015/12/23/torch-hdf5/
https://blog.csdn.net/u013548568/article/details/79732856
