Background: before modifying and debugging the full pipeline, it is convenient to first get the code running on a small dataset.
Goal: run the code on a small dataset.
Contents
1. Training-set generation code
dataset_tool_tf.py converts the raw dataset into a tfrecords file for training the network.
1.1 Command line and input arguments
The corresponding command line:
# This should run through roughly 50K images and output a file called `datasets/imagenet_val_raw.tfrecords`.
python dataset_tool_tf.py
--input-dir "<path_to_imagenet>/ILSVRC2012_img_val"
--out=datasets/imagenet_val_raw.tfrecords
or
python dataset_tool_tf.py
--input-dir datasets/BSDS300-images/BSDS300/images/train
--out=datasets/bsd300.tfrecords
The command line takes two arguments: --input-dir, the location of the input dataset, and --out, the path of the output tfrecords file.
def main():
parser = argparse.ArgumentParser(
description='Convert a set of image files into a TensorFlow tfrecords training set.',
epilog=examples,
formatter_class=argparse.RawDescriptionHelpFormatter
)
parser.add_argument("--input-dir", help="Directory containing ImageNet images")
parser.add_argument("--out", help="Filename of the output tfrecords file")
args = parser.parse_args()
if args.input_dir is None:
print ('Must specify input file directory with --input-dir')
sys.exit(1)
if args.out is None:
print ('Must specify output filename with --out')
sys.exit(1)
1.2 Input images
print ('Loading image list from %s' % args.input_dir)
images = sorted(glob.glob(os.path.join(args.input_dir, '*.JPEG')))
images += sorted(glob.glob(os.path.join(args.input_dir, '*.jpg')))
images += sorted(glob.glob(os.path.join(args.input_dir, '*.png')))
np.random.RandomState(0x1234f00d).shuffle(images)
All JPEG/jpg/png images are collected into images, then shuffled with a fixed seed.
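Because the shuffle uses a fixed seed (0x1234f00d), the image order is identical on every run; a minimal sketch of that behavior:

```python
import numpy as np

# Two independent RandomState objects created with the same seed
# produce the same permutation, so the tfrecords file is always
# built in a deterministic order.
def shuffled(images, seed=0x1234f00d):
    images = list(images)
    np.random.RandomState(seed).shuffle(images)
    return images

names = ['a.png', 'b.png', 'c.png', 'd.png']
print(shuffled(names) == shuffled(names))  # True
```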
1.3 Conversion and writing
#------Convert the data into tfrecords--------------------------
outdir = os.path.dirname(args.out)
os.makedirs(outdir, exist_ok=True)
writer = tf.python_io.TFRecordWriter(args.out)
for (idx, imgname) in enumerate(images):
print (idx, imgname)
image = load_image(imgname)
feature = {
'shape': shape_feature(image.shape),
'data': bytes_feature(tf.compat.as_bytes(image.tostring()))
}
example = tf.train.Example(features=tf.train.Features(feature=feature))
writer.write(example.SerializeToString())
1.4 Output statistics
print ('Dataset statistics:')
print (' Formats:')
for key in format_stats:
print (' %s: %d images' % (key, format_stats[key]))
print (' width,height buckets:')
for key in size_stats:
print (' %s: %d images' % (key, size_stats[key]))
writer.close()
This shows that the converter only cares that images exist in the given folder; how many there are does not matter, so we can shrink the dataset.
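Since the converter simply globs whatever images sit in --input-dir, a small subset folder can be prepared with a few lines. A sketch (the helper name and directory layout are illustrative, not part of the repo):

```python
import glob
import os
import shutil

def make_subset(src_dir, dst_dir, n=20):
    """Copy the first n images (sorted by name) from src_dir into dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    picked = sorted(glob.glob(os.path.join(src_dir, '*.jpg')))[:n]
    for path in picked:
        shutil.copy(path, dst_dir)
    return len(picked)

# Example: make_subset('datasets/BSDS300/images/train',
#                      'datasets/part_BSDS300/images/train', n=20)
```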
2. An error caused by the Python version (already fixed; may be skipped)
Copy out the BSD300 folder, then delete photos from the train folder so that 20 remain, and keep 10 in the test folder.
Command line for generating tfrecords from the subset:
python dataset_tool_tf.py --input-dir datasets/part_BSDS300/images/train --out=datasets/part_bsd300.tfrecords
Command line for all images:
python dataset_tool_tf.py --input-dir datasets/BSDS300/images/train --out=datasets/bsd300.tfrecords
2.1 Error caused by the Python version
jcx@smart-dsp:~/Desktop/xxr2019/NVlabs_noise2noise$ python dataset_tool_tf.py --input-dir datasets/part_BSDS300/images/train --out=datasets/part_bsd300.tfrecords
Loading image list from datasets/part_BSDS300/images/train
Traceback (most recent call last):
File "dataset_tool_tf.py", line 94, in <module>
main()
File "dataset_tool_tf.py", line 70, in main
os.makedirs(outdir, exist_ok=True)
TypeError: makedirs() got an unexpected keyword argument 'exist_ok'
This error occurs because the author requires Python 3.6, while the interpreter producing the error is Python 2.7.6:
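The exist_ok keyword was only added to os.makedirs in Python 3.2, which is exactly why the call fails under Python 2.7. If switching interpreters were not an option, a version-agnostic fallback would look like this (a sketch, not part of the original script):

```python
import errno
import os

def makedirs_compat(path):
    """Equivalent of os.makedirs(path, exist_ok=True) that also runs on Python 2."""
    try:
        os.makedirs(path)
    except OSError as e:
        # Re-raise anything other than "directory already exists".
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```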
jcx@smart-dsp:~/Desktop/xxr2019/NVlabs_noise2noise$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
Anaconda needs to be installed on the server and an environment configured as required.
When run with python3 instead, the error shows that TensorFlow is not installed in that environment:
jcx@smart-dsp:~/Desktop/xxr2019/NVlabs_noise2noise$ python3 dataset_tool_tf.py --input-dir datasets/part_BSDS300/images/train --out=datasets/part_bsd300.tfrecords
Traceback (most recent call last):
File "dataset_tool_tf.py", line 12, in <module>
import tensorflow as tf
ImportError: No module named 'tensorflow'
jcx@smart-dsp:~/Desktop/xxr2019/NVlabs_noise2noise$ python3
Python 3.5.2 (default, May 23 2017, 10:15:40)
[GCC 5.4.1 20160904] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
2.2 Solution
Matching the CUDA driver with the CUDA runtime installed by Anaconda in a virtual environment: https://blog.csdn.net/weixin_36474809/article/details/87820314
Configuring Python 3.6, tensorflow-gpu, and pip with Anaconda: https://blog.csdn.net/weixin_36474809/article/details/87714182
Installing Anaconda3-2018.12-x86_64 on Ubuntu 14.04: https://blog.csdn.net/weixin_36474809/article/details/87804903
After the fix, tensorboard still reports an error when launched without arguments (it is only complaining that no --logdir was given), but this does not affect our experiments:
(n2n) jcx@smart-dsp:~$ tensorboard
Traceback (most recent call last):
File "/home/jcx/.conda/envs/n2n/bin/tensorboard", line 11, in <module>
sys.exit(run_main())
File "/home/jcx/.conda/envs/n2n/lib/python3.6/site-packages/tensorboard/main.py", line 36, in run_main
tf.app.run(main)
File "/home/jcx/.conda/envs/n2n/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "/home/jcx/.conda/envs/n2n/lib/python3.6/site-packages/tensorboard/main.py", line 45, in main
default.get_assets_zip_provider())
File "/home/jcx/.conda/envs/n2n/lib/python3.6/site-packages/tensorboard/program.py", line 166, in main
tb = create_tb_app(plugins, assets_zip_provider)
File "/home/jcx/.conda/envs/n2n/lib/python3.6/site-packages/tensorboard/program.py", line 190, in create_tb_app
raise ValueError('A logdir must be specified when db is not specified. '
ValueError: A logdir must be specified when db is not specified. Run `tensorboard --help` for details and examples.
3. Generating the training set
dataset_tool_tf.py turns the images into a training set for the network training that follows.
3.1 Changes to the training and test sets
Copy out the BSD300 folder, then delete photos from the train folder so that 10 remain, and keep 5 in the test folder.
However, the file lists iids_test.txt and iids_train.txt inside datasets/part_BSDS300/ have not yet been updated to match.
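If those iids_*.txt lists ever need to match the trimmed folders, they can be regenerated from the files actually present. A hypothetical helper (the tfrecords converter itself never reads these lists, so this is optional housekeeping):

```python
import glob
import os

def rebuild_iids(image_dir, out_txt):
    """Write one image id (file name without extension) per line,
    for every .jpg currently present in image_dir."""
    ids = sorted(os.path.splitext(os.path.basename(p))[0]
                 for p in glob.glob(os.path.join(image_dir, '*.jpg')))
    with open(out_txt, 'w') as f:
        f.write('\n'.join(ids) + '\n')
    return ids
```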
3.2 Command line
Command line for generating tfrecords from the subset:
python dataset_tool_tf.py --input-dir datasets/part_BSDS300/images/train --out=datasets/part_bsd300.tfrecords
Command line for all images (not used for now):
python dataset_tool_tf.py --input-dir datasets/BSDS300/images/train --out=datasets/bsd300.tfrecords
Output after generating the dataset from the subset only:
...
17 datasets/part_BSDS300/images/train/43083.jpg
18 datasets/part_BSDS300/images/train/60079.jpg
19 datasets/part_BSDS300/images/train/16052.jpg
Dataset statistics:
Formats:
RGB: 20 images
width,height buckets:
>= 256x256: 20 images
3.3 Downloading the validation set
Command line:
python download_kodak.py --output-dir=datasets/kodak
Code:
import os
import sys
import argparse
from urllib.request import urlretrieve
examples='''examples:
python %(prog)s --output-dir=./tmp
'''
def main():
parser = argparse.ArgumentParser(
description='Download the Kodak dataset .PNG image files.',
epilog=examples,
formatter_class=argparse.RawDescriptionHelpFormatter
)
parser.add_argument("--output-dir", help="Directory where to save the Kodak dataset .PNGs")
args = parser.parse_args()
if args.output_dir is None:
print ('Must specify output directory where to store tfrecords with --output-dir')
sys.exit(1)
os.makedirs(args.output_dir, exist_ok=True)
for i in range(1, 25):
imgname = 'kodim%02d.png' % i
url = "http://r0k.us/graphics/kodak/kodak/" + imgname
print ('Downloading', url)
urlretrieve(url, os.path.join(args.output_dir, imgname))
print ('Kodak validation set successfully downloaded.')
if __name__ == "__main__":
main()
Output; only 24 images are downloaded:
Downloading http://r0k.us/graphics/kodak/kodak/kodim22.png
Downloading http://r0k.us/graphics/kodak/kodak/kodim23.png
Downloading http://r0k.us/graphics/kodak/kodak/kodim24.png
Kodak validation set successfully downloaded.
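Exactly 24 files appear because the download loop runs range(1, 25), i.e. kodim01.png through kodim24.png. The URL list can be reproduced independently:

```python
# The Kodak validation set has 24 images; range(1, 25) yields 1..24.
urls = ["http://r0k.us/graphics/kodak/kodak/kodim%02d.png" % i
        for i in range(1, 25)]
print(len(urls))   # 24
print(urls[0])     # ends with kodim01.png
print(urls[-1])    # ends with kodim24.png
```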
4. Training
4.1 Network training
Note: the CUDA driver version, CUDA runtime version, TF version, and so on must be configured exactly as required.
Matching the CUDA driver with the CUDA runtime installed by Anaconda in a virtual environment: https://blog.csdn.net/weixin_36474809/article/details/87820314
Training is run through config.py; the help output for the train subcommand is:
(n2n) jcx@smart-dsp:~/Desktop/xxr2019/NVlabs_noise2noise$ python config.py train --help
usage: config.py train [-h] [--noise2noise [NOISE2NOISE]] [--noise NOISE]
[--long-train LONG_TRAIN]
[--train-tfrecords TRAIN_TFRECORDS]
optional arguments:
-h, --help show this help message and exit
--noise2noise [NOISE2NOISE]
Noise2noise (--noise2noise=true) or noise2clean
(--noise2noise=false). Default is noise2noise=true.
--noise NOISE Type of noise corruption (one of: gaussian, poisson)
--long-train LONG_TRAIN
Train for a very long time (500k iterations or
500k*minibatch image)
--train-tfrecords TRAIN_TFRECORDS
Filename of the training set tfrecords file
The corresponding command lines
For ImageNet, the command line is:
python config.py --desc='-test' train --train-tfrecords=datasets/imagenet_val_raw.tfrecords --long-train=true --noise=gaussian
We will check the source later to see what --desc='-test' means; --train-tfrecords specifies the training set file. For our subset:
python config.py --desc='-test' train --train-tfrecords=datasets/part_bsd300.tfrecords --long-train=false --noise=gaussian
python config.py train --train-tfrecords=datasets/part_bsd300.tfrecords --noise=gaussian
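The train subcommand's interface shown in the help text can be mimicked with a small argparse sketch. This is a hypothetical reconstruction for illustration, not the actual config.py; only the flag names come from the --help output above:

```python
import argparse

parser = argparse.ArgumentParser(prog='config.py')
sub = parser.add_subparsers(dest='command')
train = sub.add_parser('train')
# nargs='?' lets --noise2noise appear with or without a value,
# matching the [NOISE2NOISE] bracket shown in the help text.
train.add_argument('--noise2noise', nargs='?', default='true')
train.add_argument('--noise')
train.add_argument('--long-train', dest='long_train')
train.add_argument('--train-tfrecords', dest='train_tfrecords')

args = parser.parse_args(
    ['train', '--train-tfrecords=datasets/part_bsd300.tfrecords',
     '--noise=gaussian'])
print(args.noise, args.train_tfrecords)
```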
The corresponding network structure:
Setting up dataset source from datasets/part_bsd300.tfrecords
autoencoder Params OutputShape WeightShape
--- --- --- ---
x - (?, 3, 256, 256) -
enc_conv0 1344 (?, 48, 256, 256) (3, 3, 3, 48)
enc_conv1 20784 (?, 48, 256, 256) (3, 3, 48, 48)
MaxPool - (?, 48, 128, 128) -
enc_conv2 20784 (?, 48, 128, 128) (3, 3, 48, 48)
MaxPool_1 - (?, 48, 64, 64) -
enc_conv3 20784 (?, 48, 64, 64) (3, 3, 48, 48)
MaxPool_2 - (?, 48, 32, 32) -
enc_conv4 20784 (?, 48, 32, 32) (3, 3, 48, 48)
MaxPool_3 - (?, 48, 16, 16) -
enc_conv5 20784 (?, 48, 16, 16) (3, 3, 48, 48)
MaxPool_4 - (?, 48, 8, 8) -
enc_conv6 20784 (?, 48, 8, 8) (3, 3, 48, 48)
Upscale2D - (?, 48, 16, 16) -
dec_conv5 83040 (?, 96, 16, 16) (3, 3, 96, 96)
dec_conv5b 83040 (?, 96, 16, 16) (3, 3, 96, 96)
Upscale2D_1 - (?, 96, 32, 32) -
dec_conv4 124512 (?, 96, 32, 32) (3, 3, 144, 96)
dec_conv4b 83040 (?, 96, 32, 32) (3, 3, 96, 96)
Upscale2D_2 - (?, 96, 64, 64) -
dec_conv3 124512 (?, 96, 64, 64) (3, 3, 144, 96)
dec_conv3b 83040 (?, 96, 64, 64) (3, 3, 96, 96)
Upscale2D_3 - (?, 96, 128, 128) -
dec_conv2 124512 (?, 96, 128, 128) (3, 3, 144, 96)
dec_conv2b 83040 (?, 96, 128, 128) (3, 3, 96, 96)
Upscale2D_4 - (?, 96, 256, 256) -
dec_conv1a 57088 (?, 64, 256, 256) (3, 3, 99, 64)
dec_conv1b 18464 (?, 32, 256, 256) (3, 3, 64, 32)
dec_conv1 867 (?, 3, 256, 256) (3, 3, 32, 3)
--- --- --- ---
Total 991203
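The Params column follows the usual conv-layer formula k·k·c_in·c_out + c_out (weights plus one bias per output channel); checking a few rows from the table above:

```python
def conv_params(k, c_in, c_out):
    """Parameter count of a k x k convolution with bias: k*k*c_in*c_out + c_out."""
    return k * k * c_in * c_out + c_out

print(conv_params(3, 3, 48))    # enc_conv0: 1344
print(conv_params(3, 48, 48))   # enc_conv1: 20784
print(conv_params(3, 144, 96))  # dec_conv4: 124512
```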
Training is very time-consuming.
Building TensorFlow graph...
Training...
Average PSNR: 6.56
iter 0 time 4s sec/eval 0.0 sec/iter 0.00 maintenance 4.4
Average PSNR: 28.24
iter 1000 time 2m 30s sec/eval 117.5 sec/iter 0.12 maintenance 28.4
Average PSNR: 29.85
iter 2000 time 4m 33s sec/eval 114.2 sec/iter 0.11 maintenance 8.2
Average PSNR: 29.96
iter 3000 time 6m 39s sec/eval 115.8 sec/iter 0.12 maintenance 10.3
...
iter 297000 time 9h 24m 11s sec/eval 99.3 sec/iter 0.10 maintenance 10.0
Average PSNR: 28.58
iter 298000 time 9h 26m 02s sec/eval 100.6 sec/iter 0.10 maintenance 10.3
Average PSNR: 28.58
iter 299000 time 9h 27m 52s sec/eval 99.1 sec/iter 0.10 maintenance 10.8
Elapsed time: 9h 29m 43s
dnnlib: Finished train.train() in 9h 29m 44s.
Training can take a long time: with ImageNet as the training set, an NVIDIA Titan V GPU took 7.5 hours, while our partial BSD300 dataset actually took longer, 9h 29m.
Each training run creates a new folder 000xx-autoencoder-n2n under results/.
When training finishes, a network_final.pickle is written under results/000xx-autoencoder-n2n.
4.2 Validation
Suppose the trained network is in the following directory:
results/00001-autoencoder-1gpu-L-n2n
Here's how to run a set of images through this network:
python config.py validate --dataset-dir=datasets/kodak --network-snapshot=results/00001-autoencoder-1gpu-L-n2n/network_final.
In our case the previous step produced results/00009-autoencoder-n2n/network_final.pickle:
python config.py validate --dataset-dir=datasets/kodak/ --network-snapshot=results/00009-autoencoder-n2n/network_final.pickle
Average PSNR: 28.58
dnnlib: Finished validation.validate() in 25s.
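The Average PSNR reported during training and validation is the standard peak signal-to-noise ratio; for 8-bit images it can be computed with a minimal numpy sketch (this is the textbook formula, not the repo's own implementation):

```python
import numpy as np

def psnr(clean, noisy, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10*log10(peak^2 / MSE)."""
    diff = clean.astype(np.float64) - noisy.astype(np.float64)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

a = np.zeros((8, 8), dtype=np.uint8)
b = np.full((8, 8), 16, dtype=np.uint8)  # every pixel off by 16 -> MSE = 256
print(round(psnr(a, b), 2))              # 24.05
```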