TensorFlow model persistence: save and load

Save the model file


TensorFlow generates four files to keep the model locally:

meta file: saves the graph structure of the network, including variables, operations, collections, and other information

ckpt file: a binary file that saves the values of all variables in the network, such as weights and biases. It is split into two files: a .data-00000-of-00001 file and a .index file

checkpoint file: a text file that records the list of the most recent model files kept (5 by default)


TensorFlow saves models through the tf.train.Saver class. Usage:

1. Create a model saver object outside the Session

saver = tf.train.Saver()

2. Inside the Session, pass the current Session as a parameter to save the model to the local disk

saver.save(sess,"./model/Model_test")
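
A minimal sketch putting the two steps together (the variable v and the save path are illustrative):

# -*- coding: utf-8 -*-
import tensorflow as tf

v = tf.Variable(tf.zeros([2]), name='v')  # any variable worth persisting
saver = tf.train.Saver()                  # step 1: created outside the Session

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # step 2: pass the current Session to save the model to disk
    save_path = saver.save(sess, "./model/Model_test")
    print("Model saved to: {}".format(save_path))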

Constructor definition of the Saver class:

def __init__(self,
               var_list=None,
               reshape=False,
               sharded=False,
               max_to_keep=5,
               keep_checkpoint_every_n_hours=10000.0,
               name=None,
               restore_sequentially=False,
               saver_def=None,
               builder=None,
               defer_build=False,
               allow_empty=False,
               write_version=saver_pb2.SaverDef.V2,
               pad_step_number=False,
               save_relative_paths=False,
               filename=None):
Several commonly used parameters:
  • var_list: a sequence or dictionary of the variables to save; the default is None, which saves all variables (see the sketch after this list);
  • reshape: optional; if True, variables may be restored into a different shape; if False, a restored variable must keep the same shape and data type as when it was saved; the default is False;
  • max_to_keep: the maximum number of recent model files to keep; the default is 5;
  • keep_checkpoint_every_n_hours: additionally keep one checkpoint for every n hours of training; the default of 10000 hours effectively disables this;
  • name: optional; a prefix added to operation names; the default is None;
  • restore_sequentially: whether to restore variables sequentially on each device, which can lower memory usage when restoring very large models; the default is False;
  • saver_def: optional; used when the Saver object needs to be rebuilt; the default is None;
  • allow_empty: whether to allow building a Saver even when the graph contains no variables; the default is False;
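
For example, a Saver can be limited to a subset of variables; a sketch, assuming w1 and bias1 are variables defined elsewhere:

# Save only w1 and bias1, and keep at most the 3 most recent checkpoints
saver = tf.train.Saver(var_list={'w1': w1, 'bias1': bias1}, max_to_keep=3)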

saver.save function definition:

def save(self,
           sess,
           save_path,
           global_step=None,
           latest_filename=None,
           meta_graph_suffix="meta",
           write_meta_graph=True,
           write_state=True,
           strip_default_attrs=False):
Common parameters:
  • sess: the current Session;
  • save_path: the model save path (a filename prefix);
  • global_step: the training step; if given, it is appended as a suffix to the model file name. The default is None (no suffix). It is best to set this parameter, otherwise later saves overwrite earlier ones because the files share the same name (see the sketch after this list);
  • latest_filename: the name of the checkpoint text file; the default is 'checkpoint';
  • meta_graph_suffix: the suffix of the saved graph structure file; the default is 'meta';
  • write_meta_graph: whether to save the graph structure; the default is True. Because the graph structure does not change during training, after saving it once you can set write_meta_graph to False so the graph is not rewritten on every save;
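
For example, passing global_step makes every save produce a distinctly named file (a sketch; the path and step values are illustrative):

# Produces files named Model_test-1000.* instead of Model_test.*
saver.save(sess, "./model/Model_test", global_step=1000)
# A later save with a different step keeps both checkpoints on disk
saver.save(sess, "./model/Model_test", global_step=2000)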

A simple example: in the following program, X is a set of 128 samples, each a two-element array, and Y holds the corresponding targets y = x1 + 10*x2. The network computes Y = relu(relu(X*w1 + b1)*w2 + b2) and iteratively searches for the optimal w and b; after training, the model is saved to the local model_saved directory.

# -*- coding: utf-8 -*-
import tensorflow as tf
from numpy.random import RandomState

# Define the size of the training data batch
batch_size = 8

# Use None in the shape to indicate that the value of that dimension is not fixed
x = tf.placeholder(tf.float32, shape=(None, 2), name='x-input')
y_ = tf.placeholder(tf.float32, shape=(None, 1), name='y-input')

# Define the parameters of the neural network
w1 = tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
w2 = tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))
bias1 = tf.Variable(tf.random_normal([3], stddev=1, seed=1))
bias2 = tf.Variable(tf.random_normal([1], stddev=1, seed=1))

# Define the forward propagation of the network, i.e. the ops
a = tf.nn.relu(tf.matmul(x, w1) + bias1)
y = tf.nn.relu(tf.matmul(a, w2) + bias2)

# Define the loss function and the backpropagation algorithm
loss = tf.reduce_sum(tf.pow((y - y_), 2))
train_step = tf.train.AdamOptimizer(0.001).minimize(loss)  # Adam optimizer

# Produce the data: generate a simulated dataset with random numbers
rdm = RandomState(seed=1)  # seed = 1 makes the generated random numbers reproducible
dataset_size = 128
X = rdm.rand(dataset_size, 2)
Y = [[x1 + 10 * x2] for (x1, x2) in X]

# Create an object for saving the model
saver = tf.train.Saver()

# Create a session to run the TensorFlow program
with tf.Session() as sess:

    # Initialize the variables
    sess.run(tf.global_variables_initializer())

    # Set the number of training rounds
    STEPS = 10000
    for i in range(STEPS + 1):
        # Select batch_size samples of data for each training round
        start = (i * batch_size) % dataset_size
        end = min(start + batch_size, dataset_size)

        # Train the network on the selected samples and update the parameters
        sess.run(train_step, feed_dict={x: X[start: end], y_: Y[start: end]})
        if i % 500 == 0:
            # Periodically compute and print the loss over all the data
            total_loss = sess.run([loss], feed_dict={x: X, y_: Y})
            print("steps: {}, total loss: {}".format(i, total_loss))

    # After training ends, save the neural network model
    saver.save(sess, "./model_saved/model_test")

    print(sess.run((w1, bias1)))
    print('^^^^^^^^^^^^^^^^^^^^^^^^^')
    print(sess.run((w2, bias2)))
    
# output:
# steps: 0, total loss: [2599.938]
# steps: 500, total loss: [873.66064]
# steps: 1000, total loss: [667.79114]
# steps: 1500, total loss: [483.07538]
# steps: 2000, total loss: [300.2436]
# steps: 2500, total loss: [159.57596]
# steps: 3000, total loss: [74.0152]
# steps: 3500, total loss: [30.022282]
# steps: 4000, total loss: [10.848581]
# steps: 4500, total loss: [3.8684735]
# steps: 5000, total loss: [1.6775348]
# steps: 5500, total loss: [0.87090385]
# steps: 6000, total loss: [0.47393078]
# steps: 6500, total loss: [0.2628175]
# steps: 7000, total loss: [0.13229856]
# steps: 7500, total loss: [0.058554076]
# steps: 8000, total loss: [0.022747971]
# steps: 8500, total loss: [0.007896027]
# steps: 9000, total loss: [0.002599821]
# steps: 9500, total loss: [0.0007222026]
# steps: 10000, total loss: [0.00021833208]
# (array([[-0.8113182 ,  0.741788  , -0.06654923],
#        [-2.4427042 ,  1.7258024 ,  3.505848  ]], dtype=float32), array([-0.8113182 ,  0.9206883 , -0.00473781], dtype=float32))
# ^^^^^^^^^^^^^^^^^^^^^^^^^
# (array([[-0.8113182],
#        [ 1.5360606],
#        [ 2.0962803]], dtype=float32), array([-1.4044524], dtype=float32))

After 10000 iterations the training completes, and the four model files are created in the local model_saved directory:

checkpoint
model_test.data-00000-of-00001
model_test.index
model_test.meta

Load the model file


The graph structure and the variable data of a model are saved in separate files. When loading the model, you can first load the graph structure, then load the parameters of the graph (inside a Session):

saver=tf.train.import_meta_graph('./model_saved/model_test.meta')
saver.restore(sess, tf.train.latest_checkpoint('./model_saved'))
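
Because no Python variables for the graph exist in this case, tensors must afterwards be looked up by name. A sketch, assuming the placeholder was created with name='x-input' as in the example above:

graph = tf.get_default_graph()
x = graph.get_tensor_by_name('x-input:0')  # ':0' selects the op's first output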

Or load everything in one step (this requires the same graph to already be built in code):

saver = tf.train.Saver()
saver.restore(sess, './model_saved/model_test')
or:
saver.restore(sess, tf.train.latest_checkpoint('./model_saved'))

'model_test' is the name of the saved model file (the prefix, without a suffix).


A safer way to load is to first check whether the model file exists (this approach is recommended):

ckpt = tf.train.get_checkpoint_state('./model_saved')
if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)
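
A common extension of this pattern (a sketch, assuming the graph and saver are already built) falls back to fresh initialization when no checkpoint is found:

with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state('./model_saved')
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)  # resume from the checkpoint
    else:
        sess.run(tf.global_variables_initializer())      # start from scratch

The full loading example below rebuilds the same graph in code and then restores the trained values: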

# -*- coding: utf-8 -*-
import tensorflow as tf
from numpy.random import RandomState

# Define the size of the training data batch
batch_size = 8

# Use None in the shape to indicate that the value of that dimension is not fixed
x = tf.placeholder(tf.float32, shape=(None, 2), name='x-input')
y_ = tf.placeholder(tf.float32, shape=(None, 1), name='y-input')

# Define the parameters of the neural network
w1 = tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
w2 = tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))
bias1 = tf.Variable(tf.random_normal([3], stddev=1, seed=1))
bias2 = tf.Variable(tf.random_normal([1], stddev=1, seed=1))

# Define the forward propagation of the network, i.e. the ops
a = tf.nn.relu(tf.matmul(x, w1) + bias1)
y = tf.nn.relu(tf.matmul(a, w2) + bias2)

# Produce the data: generate a simulated dataset with random numbers
rdm = RandomState(seed=1)  # seed = 1 makes the generated random numbers reproducible
dataset_size = 128
X = rdm.rand(dataset_size, 2)
Y = [[x1 + 10 * x2] for (x1, x2) in X]

# Create a session to run the TensorFlow program
with tf.Session() as sess:
    # The graph built above matches the saved one, so a plain Saver can
    # restore all trained variables from the latest checkpoint in one step
    saver = tf.train.Saver()
    saver.restore(sess, tf.train.latest_checkpoint('./model_saved'))

    # No initialization is needed here: restore() already assigns the
    # trained values, and running tf.global_variables_initializer() after
    # restoring would overwrite them with random initial values

    print(sess.run(y, feed_dict={x: X[0: 10], y_: Y[0: 10]}))
    
# output: ten predictions, one per sample, closely matching the targets
# y = x1 + 10 * x2, since the restored model was trained down to a total
# loss of about 0.0002

A list of commonly used tf.train.Saver functions:

Methods and attributes of the tf.train.Saver class (Saving and Restoring Variables):
  • tf.train.Saver.__init__(var_list=None, reshape=False, sharded=False, max_to_keep=5, keep_checkpoint_every_n_hours=10000.0, name=None, restore_sequentially=False, saver_def=None, builder=None): creates a Saver; var_list defines the variables to save and restore
  • tf.train.Saver.save(sess, save_path, global_step=None, latest_filename=None, meta_graph_suffix='meta', write_meta_graph=True): saves variables
  • tf.train.Saver.restore(sess, save_path): restores variables
  • tf.train.Saver.last_checkpoints: lists the filenames of recent checkpoints that have not yet been deleted
  • tf.train.Saver.set_last_checkpoints(last_checkpoints): sets the list of checkpoint filenames
  • tf.train.Saver.set_last_checkpoints_with_time(last_checkpoints_with_time): sets the list of checkpoint filenames with timestamps
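
As a usage note, last_checkpoints reflects what max_to_keep retains; a sketch, assuming repeated saves with increasing global_step:

saver = tf.train.Saver(max_to_keep=5)
# ... after several saver.save(sess, path, global_step=...) calls:
print(saver.last_checkpoints)  # paths of the checkpoints still kept on disk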
