[Notes] Summary of the three TensorFlow model-saving formats: checkpoint (*.ckpt)

1. checkpoint (*.ckpt)

1.1 File structure:

---checkpoint

---model.ckpt-240000.data-00000-of-00001

---model.ckpt-240000.index

---model.ckpt-240000.meta
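As the listing above shows, a TensorFlow model consists of two parts: 1) the graph that describes the network structure; 2) the trained variable values.

Accordingly, a TensorFlow model is stored mainly in two kinds of files:

1) Meta graph:

It stores the complete TensorFlow graph structure. This file has the extension *.meta.

2) Checkpoint file:

This is a binary file containing the weights, biases, and all other variables. It has the extension *.ckpt. PS: since version 0.11 the checkpoint is no longer a single .ckpt file; the variable values are written to a .data file that is accompanied by an .index file, as in the following example: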

1.mymodel.data-00000-of-00001

2.mymodel.index

3.checkpoint

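The .data file holds the values of the training variables; the .index file maps each variable's key to its stored value; the checkpoint file lists all saved models and records which one is the most recent. So with a TensorFlow version later than 0.10, the files look like this: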

---checkpoint

---model.ckpt-240000.data-00000-of-00001

---model.ckpt-240000.index

---model.ckpt-240000.meta
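
To make the naming scheme concrete, here is a minimal sketch in plain Python (no TensorFlow required) that groups the files of a checkpoint directory by their shared prefix. The helper name group_checkpoint_files and the shape of its return value are assumptions made for illustration; they are not part of any TensorFlow API.

```python
import re
from collections import defaultdict

# Suffixes of the files listed above. The helper name and the returned
# layout are illustrative assumptions, not a TensorFlow API.
_SUFFIX = re.compile(r"^(?P<prefix>.+)\.(?P<kind>meta|index|data-\d{5}-of-\d{5})$")


def group_checkpoint_files(filenames):
    """Map each checkpoint prefix to the sorted list of file kinds found for it."""
    groups = defaultdict(list)
    for name in filenames:
        match = _SUFFIX.match(name)
        if match is None:
            # For example the bookkeeping file that is literally named "checkpoint".
            continue
        kind = match.group("kind").split("-")[0]  # "data-00000-of-00001" -> "data"
        groups[match.group("prefix")].append(kind)
    return {prefix: sorted(kinds) for prefix, kinds in groups.items()}


files = [
    "checkpoint",
    "model.ckpt-240000.data-00000-of-00001",
    "model.ckpt-240000.index",
    "model.ckpt-240000.meta",
]
print(group_checkpoint_files(files))
```

The prefix (here model.ckpt-240000) is the string that is later handed to restore(), not the name of any single file.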

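1.2 Usage:
1.2.1 Exporting ckpt files:

After training, we want to save all variables and the network structure for later use. In TensorFlow, all of this information is saved with:

tf.train.Saver()

Note that TensorFlow variables only hold their values inside a session, so the model must be saved from within a session by calling the save method.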

import tensorflow as tf
w1 = tf.Variable(tf.random_normal(shape=[2]), name='w1')
w2 = tf.Variable(tf.random_normal(shape=[5]), name='w2')
saver = tf.train.Saver()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver.save(sess, 'my_test_model')

To save the model after 1000 iterations, pass the current step count as global_step:

saver.save(sess, 'my_test_model',global_step=1000)

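Suppose the model is saved every 1000 iterations during training. Only the variable values change between these saves; the network structure stays the same, so there is no need to write the .meta file again each time. We can therefore skip it on the later saves: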

saver.save(sess, 'my-model', global_step=step,write_meta_graph=False)
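
The effect on disk can be sketched without TensorFlow. The snippet below only mimics the file naming; expected_files is a hypothetical helper, and it assumes one data shard per save, a .meta file written with the first save only, and no checkpoint rotation.

```python
def expected_files(prefix, steps):
    """List the files left on disk when .meta is written with the first save only.

    Hypothetical helper: assumes one data shard per save and ignores
    checkpoint rotation (max_to_keep).
    """
    files = ["checkpoint"]
    for i, step in enumerate(steps):
        stem = "{}-{}".format(prefix, step)
        files.append(stem + ".data-00000-of-00001")
        files.append(stem + ".index")
        if i == 0:
            # Corresponds to leaving write_meta_graph at its default (True)
            # for the first save and passing False afterwards.
            files.append(stem + ".meta")
    return files


for name in expected_files("my-model", [1000, 2000, 3000]):
    print(name)
```

Because the graph does not change between saves, the single .meta file can be paired with any of the later .data/.index pairs when restoring.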

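To keep only the 4 most recent models and, in addition, retain one checkpoint for every 2 hours of training, use max_to_keep and keep_checkpoint_every_n_hours:

# Keeps at most the 4 latest checkpoints, plus one checkpoint per 2 hours of training.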
saver = tf.train.Saver(max_to_keep=4, keep_checkpoint_every_n_hours=2)
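
How the two options interact is easier to see in a small simulation. This is a simplified model of the rotation policy written in plain Python, not the Saver's actual code: simulate_rotation and its arguments are hypothetical, and save times are given as hours since the start of training.

```python
def simulate_rotation(save_times_h, max_to_keep=4, keep_every_n_hours=2.0):
    """Return (recent, kept_forever) save times after all saves are done.

    Simplified model: each save time is in hours since training started.
    """
    recent = []        # the most recent checkpoints, oldest first
    kept_forever = []  # checkpoints exempted from deletion
    next_keep_time = keep_every_n_hours
    for t in save_times_h:
        recent.append(t)
        if len(recent) > max_to_keep:
            oldest = recent.pop(0)
            if oldest >= next_keep_time:
                # Old enough to serve as the checkpoint for this N-hour window.
                kept_forever.append(oldest)
                next_keep_time += keep_every_n_hours
            # Otherwise the files of the oldest checkpoint would be deleted.
    return recent, kept_forever


# One save every 30 minutes for 5 hours.
times = [0.5 * i for i in range(1, 11)]
recent, kept = simulate_rotation(times)
print("recent:", recent)
print("kept for good:", kept)
```

In this model the saving frequency is decided entirely by the training loop; the two options only control which of the saved checkpoints survive.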

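PS: If no arguments are passed to tf.train.Saver(), all variables are saved by default. To save only some of them, specify the variables or the collection to save: pass a list, or a dictionary of the variables to save, when creating the tf.train.Saver.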

import tensorflow as tf
w1 = tf.Variable(tf.random_normal(shape=[2]), name='w1')
w2 = tf.Variable(tf.random_normal(shape=[5]), name='w2')
saver = tf.train.Saver([w1,w2])
sess = tf.Session()
sess.run(tf.global_variables_initializer())

saver.save(sess, 'my_test_model',global_step=1000)

1.2.2 Importing ckpt files:

1) Import the original graph structure from the .meta file:

saver = tf.train.import_meta_graph('my_test_model-1000.meta')

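Once the graph structure is loaded, the variable values still have to be loaded.

2) Load the variables

Use the restore() method to restore the model's variable values.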

with tf.Session() as sess:
    new_saver = tf.train.import_meta_graph('my_test_model-1000.meta')
    new_saver.restore(sess, tf.train.latest_checkpoint('./'))

After this, the w1 and w2 tensors have been restored:

with tf.Session() as sess:
    saver = tf.train.import_meta_graph('my_test_model-1000.meta')
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    print(sess.run('w1:0'))
# The model has been restored; the statement above prints the saved value of w1.
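
tf.train.latest_checkpoint('./') can find the newest model because the checkpoint file is a small text file that records the most recent prefix. The sketch below is a rough stand-in written in plain Python. It assumes the file uses lines of the form model_checkpoint_path: "...", and read_latest_checkpoint is a hypothetical helper, not a TensorFlow function.

```python
import os
import re
import tempfile

_LINE = re.compile(r'^(\w+):\s*"(.*)"\s*$')


def read_latest_checkpoint(checkpoint_dir):
    """Return the prefix recorded under model_checkpoint_path, or None.

    Rough stand-in for tf.train.latest_checkpoint: it assumes the plain-text
    layout written in the demo below and does not check that the files exist.
    """
    state_file = os.path.join(checkpoint_dir, "checkpoint")
    if not os.path.exists(state_file):
        return None
    with open(state_file) as f:
        for line in f:
            match = _LINE.match(line.strip())
            if match and match.group(1) == "model_checkpoint_path":
                path = match.group(2)
                if not os.path.isabs(path):
                    path = os.path.join(checkpoint_dir, path)
                return path
    return None


# Demo on a hand-written state file in a temporary directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "checkpoint"), "w") as f:
    f.write('model_checkpoint_path: "my_test_model-1000"\n')
    f.write('all_model_checkpoint_paths: "my_test_model-1000"\n')

print(read_latest_checkpoint(demo_dir))
```

The returned value is a prefix such as my_test_model-1000, which is exactly what restore() expects as its second argument.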

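1.2.3 Restoring a trained model from ckpt files

Any pre-trained model can be restored and used for inference, fine-tuning, or further training. In TensorFlow, a graph that contains placeholders needs data fed into them. When a model is saved, however, the data fed to the placeholders is not saved (the placeholder definitions themselves are).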

import tensorflow as tf
 
#Prepare to feed input, i.e. feed_dict and placeholders
w1 = tf.placeholder("float", name="w1")
w2 = tf.placeholder("float", name="w2")
b1= tf.Variable(2.0,name="bias")
feed_dict={w1:4,w2:8}
 
#Define a test operation that we will restore
w3 = tf.add(w1,w2)
w4 = tf.multiply(w3, b1, name="op_to_restore")
sess = tf.Session()
sess.run(tf.global_variables_initializer())
 
#Create a saver object which will save all the variables
saver = tf.train.Saver()
 
#Run the operation by feeding input
print(sess.run(w4, feed_dict))
#Prints 24.0, which is the value of (w1+w2)*b1
 
#Now, save the graph
saver.save(sess, 'my_test_model',global_step=1000)

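So when restoring the model, we must not only restore the graph structure and the variable values but also prepare a new feed_dict to feed data into the placeholders. The saved placeholders and operators can be retrieved with graph.get_tensor_by_name(). In the snippet below, w1 is a placeholder and op_to_restore is an operator.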

#How to access saved variable/Tensor/placeholders 
w1 = graph.get_tensor_by_name("w1:0")
 
## How to access saved operation
op_to_restore = graph.get_tensor_by_name("op_to_restore:0")
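
The strings passed to get_tensor_by_name follow the pattern <op_name>:<output_index>, so "op_to_restore:0" means the first output of the op named op_to_restore. A minimal sketch of that convention in plain Python follows. split_tensor_name is a hypothetical helper for illustration; TensorFlow does this parsing internally.

```python
def split_tensor_name(tensor_name):
    """Split '<op_name>:<output_index>' into (op_name, output_index).

    Hypothetical helper for illustration only.
    """
    op_name, sep, index = tensor_name.rpartition(":")
    if not sep or not index.isdigit():
        raise ValueError(
            "expected '<op_name>:<output_index>', got %r" % tensor_name)
    return op_name, int(index)


print(split_tensor_name("w1:0"))
print(split_tensor_name("op_to_restore:0"))
```

This is why the name given at save time (name="op_to_restore") has to be extended with ":0" when the tensor is looked up again.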

Complete example:
import tensorflow as tf
import os

model_saving_path = "./checkpoint"
model_name = 'saving_restoring'


def save():
    w1 = tf.placeholder(dtype=tf.float32, name='w1')
    w2 = tf.placeholder(dtype=tf.float32, name='w2')
    b1 = tf.Variable(2.0, name='bias')
    feed_dict = {w1: 4, w2: 8}

    w3 = tf.add(w1, w2)
    w4 = tf.multiply(w3, b1, name='op_to_restore')
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver = tf.train.Saver()
        print(sess.run(w4, feed_dict))
        saver.save(sess, os.path.join(model_saving_path, model_name), global_step=1000)


def restore0():
    with tf.Session() as sess:
        saver = tf.train.import_meta_graph(
            os.path.join(model_saving_path, model_name+'-1000.meta'))
        saver.restore(sess, tf.train.latest_checkpoint(model_saving_path))

        graph = tf.get_default_graph()
        w1 = graph.get_tensor_by_name('w1:0')
        w2 = graph.get_tensor_by_name('w2:0')
        feed_dict = {w1: 13.0, w2: 17.0}

        op_to_restore = graph.get_tensor_by_name('op_to_restore:0')
        print(sess.run(op_to_restore, feed_dict))


def restore():
    """Placeholders cannot be restored this way. It fails with:
    InvalidArgumentError (see above for traceback):
     You must feed a value for placeholder tensor 'w1_1' with dtype float
    The w1 and w2 created below are brand-new nodes. Because their names are
    already taken, import_meta_graph adds the saved placeholders as w1_1 and
    w2_1, and op_to_restore depends on those, which never receive a value.
    A checkpoint stores the placeholder definitions but not the data fed to
    them, so fetch the saved placeholders with graph.get_tensor_by_name and
    then feed data into those."""

    w1 = tf.placeholder(dtype=tf.float32, name='w1')
    w2 = tf.placeholder(dtype=tf.float32, name='w2')
    with tf.Session() as sess:
        saver = tf.train.import_meta_graph(
            os.path.join(model_saving_path, model_name+'-1000.meta'))
        saver.restore(sess, tf.train.latest_checkpoint(model_saving_path))

        graph = tf.get_default_graph()
        # w1 = graph.get_tensor_by_name('w1:0')
        # w2 = graph.get_tensor_by_name('w2:0')
        feed_dict = {w1: 13.0, w2: 17.0}

        op_to_restore = graph.get_tensor_by_name('op_to_restore:0')
        print(sess.run(op_to_restore, feed_dict))

save()
restore0()
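
The 'w1_1' in the error message of restore() comes from the graph keeping node names unique. The toy model below illustrates the idea in plain Python. The NameScope class is an assumption-level simplification, not TensorFlow's actual implementation.

```python
class NameScope:
    """Toy model of how a graph keeps node names unique.

    Simplified for illustration; not TensorFlow's real bookkeeping.
    """

    def __init__(self):
        self._used = {}

    def unique_name(self, name):
        count = self._used.get(name, 0)
        self._used[name] = count + 1
        if count == 0:
            return name
        return "{}_{}".format(name, count)


graph_names = NameScope()
# restore() first creates fresh placeholders named w1 and w2 ...
fresh = [graph_names.unique_name(n) for n in ("w1", "w2")]
# ... and import_meta_graph then adds the saved nodes under the same names.
imported = [graph_names.unique_name(n) for n in ("w1", "w2", "op_to_restore")]
print(fresh)
print(imported)
```

In this picture op_to_restore is wired to the imported placeholders, while the feed_dict in restore() only supplies values for the fresh ones.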

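1.2.4 Restoring a trained model from ckpt files and modifying its structure:

To add more layers to the original network graph and then train it, modify the example above as follows: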

def restore2():
    with tf.Session() as sess:
        saver = tf.train.import_meta_graph(
            os.path.join(model_saving_path, model_name+'-1000.meta'))
        saver.restore(sess, tf.train.latest_checkpoint(model_saving_path))

        graph = tf.get_default_graph()
        w1 = graph.get_tensor_by_name('w1:0')
        w2 = graph.get_tensor_by_name('w2:0')
        feed_dict = {w1: 13.0, w2: 17.0}

        op_to_restore = graph.get_tensor_by_name('op_to_restore:0')
        # Add more to the current graph
        add_on_op = tf.multiply(op_to_restore, 2)
        print(sess.run(add_on_op, feed_dict))
        # This will print 120.0, i.e. (13 + 17) * 2 * 2.

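To restore only part of the network's parameters or operators and build a new model on top of them, use the graph.get_tensor_by_name() method. In the example below, we load a pre-trained VGG network from its .meta file and make some modifications: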

......
......
saver = tf.train.import_meta_graph('vgg.meta')
# Access the graph
graph = tf.get_default_graph()
## Prepare the feed_dict for feeding data for fine-tuning 

#Access the appropriate output for fine-tuning
fc7= graph.get_tensor_by_name('fc7:0')

# Use this if you only want to train the new last layer: tf.stop_gradient
# passes values through unchanged but blocks gradients from flowing back.
fc7 = tf.stop_gradient(fc7)
fc7_shape = fc7.get_shape().as_list()

num_outputs = 2
# tf.matmul needs fc7 to be 2-D ([batch, features]), so use its last dimension.
weights = tf.Variable(tf.truncated_normal([fc7_shape[-1], num_outputs], stddev=0.05))
biases = tf.Variable(tf.constant(0.05, shape=[num_outputs]))
output = tf.matmul(fc7, weights) + biases
pred = tf.nn.softmax(output)

# Now, you run this with fine-tuning data in sess.run()