Implementing DenseNet with tflearn

Lately I have been the only one using the company's 8-GPU server, so to avoid wasting the resources I have been tinkering with models on my own: I put together a DenseNet for fun and ran it on the famous ImageNet. While implementing it I ran into a big shortcoming of TensorFlow, which I hope someone fixes soon.

The idea and structure of DenseNet were already covered in my post on the evolution of CNNs, so I won't repeat them here. Straight to the implementation details:

Architecture:

  • Composite function H(x): BN-ReLU-Conv(3x3). This pre-activation order differs from most other models (which use Conv-BN-ReLU), which puzzled me a bit. (A minimal tflearn sketch of these building blocks follows this list.)
  • Transition (pooling) layers: BN-Conv(1x1)-AvgPool(2x2)
  • Growth rate: the l-th layer has k_0 + k × (l − 1) input feature-maps, where k_0 is the number of channels in the input layer.
  • Bottleneck layers: BN-ReLU-Conv(1x1)-BN-ReLU-Conv(3x3)
  • Compression: reduce the number of feature-maps at transition layers.
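
Here is a minimal sketch of how these building blocks could be written with tflearn layers. The helper names (bottleneck_layer, dense_block, transition_layer), the bias=False setting, and the default compression of 0.5 are my own illustration, not necessarily what my repo does:

import tensorflow as tf
import tflearn
from tflearn.layers.conv import conv_2d, avg_pool_2d
from tflearn.layers.normalization import batch_normalization

def bottleneck_layer(x, growth_rate):
    # BN-ReLU-Conv(1x1) followed by BN-ReLU-Conv(3x3)
    y = batch_normalization(x)
    y = tflearn.activation(y, 'relu')
    y = conv_2d(y, 4 * growth_rate, 1, activation='linear', bias=False)
    y = batch_normalization(y)
    y = tflearn.activation(y, 'relu')
    y = conv_2d(y, growth_rate, 3, activation='linear', bias=False)
    return y

def dense_block(x, num_layers, growth_rate):
    for _ in range(num_layers):
        y = bottleneck_layer(x, growth_rate)
        # concatenate along the channel axis; this is the copy-heavy step discussed below
        x = tf.concat([x, y], axis=3)
    return x

def transition_layer(x, compression=0.5):
    # BN-Conv(1x1)-AvgPool(2x2), compressing the number of feature-maps
    in_channels = x.get_shape().as_list()[-1]
    x = batch_normalization(x)
    x = conv_2d(x, int(in_channels * compression), 1, activation='linear', bias=False)
    x = avg_pool_2d(x, 2, strides=2)
    return x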

Implementation Details:

  • Three dense blocks, each with an equal number of layers.
  • Before entering the first dense block, a convolution with 16 (or, for DenseNet-BC, twice the growth rate) output channels is performed on the input images.
  • For convolutional layers with kernel size 3x3, each side of the inputs is zero-padded by one pixel to keep the feature-map size fixed.

  • For preprocessing, we normalize the data using the channel means and standard deviations.

  • We adopt a standard data augmentation scheme (mirroring/shifting).

  • All the networks are trained using stochastic gradient descent (SGD). On CIFAR and SVHN we train with batch size 64 for 300 and 40 epochs, respectively. The initial learning rate is set to 0.1, and is divided by 10 at 50% and 75% of the total number of training epochs. (These are exactly the training hyperparameters used for ResNet; see the sketch after this list.)

  • We use a weight decay of 10^-4 and a Nesterov momentum of 0.9 without dampening. (Also following ResNet.)

  • we add a dropout layer after each convolutional layer (except the first one) and set the dropout rate to 0.2.
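
As a rough sketch of this optimization setup in tflearn (my own approximation: tflearn's Momentum wrapper exposes plain momentum rather than Nesterov, and the 50%/75% schedule is replaced by a staircase decay whose decay_step must be matched to your own steps per epoch):

import tflearn
from tflearn.layers.core import input_data, fully_connected
from tflearn.layers.conv import conv_2d, global_avg_pool
from tflearn.layers.estimator import regression

# First conv with 2 * growth-rate channels (DenseNet-BC); weight decay is applied
# per layer via regularizer='L2' and weight_decay (tflearn's default is 0.001).
net = input_data(shape=[None, 32, 32, 3])
net = conv_2d(net, 24, 3, bias=False, regularizer='L2', weight_decay=1e-4)
# ... dense blocks and transition layers as sketched earlier ...
net = global_avg_pool(net)
net = fully_connected(net, 10, activation='softmax',
                      regularizer='L2', weight_decay=1e-4)

# SGD with momentum 0.9, initial learning rate 0.1, divided by 10 in a staircase
# schedule; decay_step here is a placeholder to be tuned to ~50% of total steps.
mom = tflearn.Momentum(learning_rate=0.1, momentum=0.9,
                       lr_decay=0.1, decay_step=32000, staircase=True)
net = regression(net, optimizer=mom, loss='categorical_crossentropy')

model = tflearn.DNN(net)
model.fit(X, Y, n_epoch=300, batch_size=64, show_metric=True)  # X, Y: training data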

Memory-Efficient Implementation

Now for the big pitfall I ran into: TensorFlow's concatenation has to make a complete copy of its inputs rather than simply indexing into them, so a huge amount of memory is wasted. I hope this gets improved.
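
To get a feel for the scale of the waste, here is a back-of-the-envelope count (my own estimate, not from the paper): since every layer re-materializes the concatenation of all earlier outputs, the feature-maps held by the concatenated inputs of a single block grow quadratically with its depth.

# Total feature-maps materialized by the naive per-layer concatenations in one
# dense block (k0 = channels entering the block, growth_rate = k).
def naive_concat_feature_maps(num_layers, k0, growth_rate):
    return sum(k0 + growth_rate * l for l in range(num_layers))

print(naive_concat_feature_maps(num_layers=12, k0=24, growth_rate=12))  # 1080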

The memory-efficient implementation strategies proposed by the authors:

  • Shared storage for concatenation
  • Shared storage for batch normalization
  • Shared storage for gradients
  • Putting these pieces together

My implementation: https://github.com/RDShi/DenseNet

While I'm at it, let me plug tflearn. It is extremely convenient; not having to write verbose raw TensorFlow anymore nearly moved me to tears.
Below is a brief introduction to TFLearn:

Advantages of TFLearn:

  • TFLearn is a high-level API built entirely on top of TensorFlow (it essentially packages up the repetitive code you would otherwise write again and again in raw TensorFlow). It is very intuitive to use and, more importantly, fully compatible with TensorFlow (Extending Tensorflow).
  • TFLearn's core features are its Layers and Training Functions APIs.

Layers:

  • core: input_data, fully_connected, dropout, custom_layer, reshape, flatten, activation, single_unit, highway, one_hot_encoding, time_distributed
  • conv: conv_2d, conv_2d_transpose, max_pool_2d, avg_pool_2d, upsample_2d, conv_1d, max_pool_1d, avg_pool_1d, residual_block, residual_bottleneck, conv_3d, max_pool_3d, avg_pool_3d, highway_conv_1d, highway_conv_2d, global_avg_pool, global_max_pool
  • recurrent: simple_rnn, lstm, gru, bidirectionnal_rnn, dynamic_rnn
  • embedding: embedding
  • normalization: batch_normalization, local_response_normalization, l2_normalize
  • merge: merge, merge_outputs
  • estimator: regression
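
A tiny example (my own, not from the TFLearn docs) wiring a few of these layers together:

from tflearn.layers.core import input_data, fully_connected, dropout
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression

net = input_data(shape=[None, 32, 32, 3])
net = conv_2d(net, 32, 3, activation='relu')
net = max_pool_2d(net, 2)
net = dropout(net, 0.8)   # note: tflearn's dropout takes the keep probability
net = fully_connected(net, 10, activation='softmax')
net = regression(net, optimizer='sgd', loss='categorical_crossentropy')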

Built-in Operations: these are passed as layer arguments, so you don't have to hand-write them in raw TensorFlow.

  • activations: linear, tanh, sigmoid, softmax, softplus, softsign, relu, relu6, leaky_relu, prelu, elu
  • objectives: softmax_categorical_crossentropy, categorical_crossentropy, binary_crossentropy, mean_square, hinge_loss, roc_auc_score, weak_cross_entropy_2d
  • optimizers: SGD, RMSProp, Adam, Momentum, AdaGrad, Ftrl, AdaDelta
  • metrics: Accuracy, Top_k, R2
  • initializations: zeros, uniform, uniform_scaling, normal, truncated_normal, xavier, variance_scaling
  • losses: l1, l2
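
Most of these ops can be referenced by their string name or passed in as configured objects, for example:

import tflearn

net = tflearn.input_data(shape=[None, 784])
# referenced by string name
net = tflearn.fully_connected(net, 64, activation='prelu', weights_init='xavier')
# or passed as configured objects
sgd = tflearn.SGD(learning_rate=0.1, lr_decay=0.96, decay_step=1000)
top3 = tflearn.metrics.Top_k(3)
net = tflearn.regression(net, optimizer=sgd, metric=top3,
                         loss='categorical_crossentropy')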

Training Functions:

Training:

network = ... (some layers) ...
network = regression(network, optimizer='sgd', loss='categorical_crossentropy')

model = DNN(network)
model.fit(X, Y)

Evaluating & Predicting:

network = ...

model = DNN(network)
model.load('model.tflearn')
# evaluate on a held-out set, then predict
model.evaluate(testX, testY)
model.predict(X)

Visualization:

TFLearn provides visualization at several levels of detail (verbose levels):

  • 0: Loss & Metric (Best speed).
  • 1: Loss, Metric & Gradients.
  • 2: Loss, Metric, Gradients & Weights.
  • 3: Loss, Metric, Gradients, Weights, Activations & Sparsity (Best Visualization).

Training progress can then be monitored through TensorBoard.
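
For example (the tensorboard_dir shown is tflearn's default, run_id is an arbitrary name):

model = tflearn.DNN(network, tensorboard_verbose=3,
                    tensorboard_dir='/tmp/tflearn_logs/')
model.fit(X, Y, show_metric=True, run_id='densenet')
# then launch: tensorboard --logdir=/tmp/tflearn_logs/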

Weights Persistence (saving the model)

# Save a model
model.save('my_model.tflearn')
# Load a model
model.load('my_model.tflearn')

model.load restores everything, not just the weights (for example, the optimizer state); if you only want the weights, pass weights_only=True. This is useful when you change the optimizer or similar settings between runs.
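
For example, restoring only the weights:

model.load('my_model.tflearn', weights_only=True)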

Fine-tuning

# Weights will be restored by default.
fc_layer = tflearn.fully_connected(input_layer, 32)
# Weights will not be restored, if specified so.
fc_layer = tflearn.fully_connected(input_layer, 32, restore=False)
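
A minimal fine-tuning sketch (the checkpoint name, layer sizes, and number of classes are placeholders): layers kept from the pretrained model are restored by default, while the new output layer is marked restore=False so it is re-initialized for the new task.

import tflearn

input_layer = tflearn.input_data(shape=[None, 32, 32, 3])
body = tflearn.fully_connected(input_layer, 32)               # restored from the checkpoint
new_out = tflearn.fully_connected(body, 10, activation='softmax',
                                  restore=False)              # trained from scratch
net = tflearn.regression(new_out, optimizer='sgd', loss='categorical_crossentropy')

model = tflearn.DNN(net)
model.load('pretrained.tflearn')   # only restores the layers not marked restore=False
model.fit(X_new, Y_new, n_epoch=10)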

Data Preprocessing and Data Augmentation

TFLearn's data stream is designed with computing pipelines to speed up training (data preprocessing runs on the CPU while the model trains on the GPU).

# Real-time image preprocessing
img_prep = tflearn.ImagePreprocessing()
# Zero Center (With mean computed over the whole dataset)
img_prep.add_featurewise_zero_center()
# STD Normalization (With std computed over the whole dataset)
img_prep.add_featurewise_stdnorm()

# Real-time data augmentation
img_aug = tflearn.ImageAugmentation()
# Random flip an image
img_aug.add_random_flip_leftright()
# Random crop an image
img_aug.add_random_crop([32, 32], padding=4)

# Add these methods into an 'input_data' layer
network = input_data(shape=[None, 32, 32, 3],
                     data_preprocessing=img_prep,
                     data_augmentation=img_aug)

Graph Initialization

This configures the GPU memory fraction, random seed, number of CPU cores, and so on:

tflearn.init_graph(set_seed=8888, num_cores=16, gpu_memory_fraction=0.5)

Tip:
To let GPU memory grow on demand instead of reserving a fixed fraction:

import tensorflow as tf

# Add the config to the graph collection before building the model,
# so the session TFLearn creates picks it up.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.add_to_collection(tf.GraphKeys.GRAPH_CONFIG, config)

Reposted from blog.csdn.net/SrdLaplace/article/details/81256297