Error when testing images in batch: OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key conv1_1/bias not found

      I recently hit a problem while testing an AlexNet model: after training, I wanted to run the model on a batch of images, but after it classified the first image it failed on the second (the training and test code is at https://github.com/stephen-v/tensorflow_alexnet_classify):

OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key conv1_1/bias not found in checkpoint
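
      When this error appears, it helps to first look at which keys the saved checkpoint actually contains and compare them with the variable names the test graph is asking for. A minimal sketch, assuming the TensorFlow 1.x API used throughout this post and the checkpoint path from the test code below:

import tensorflow as tf

# List every variable name (key) and shape stored in the checkpoint, so the
# missing key (here conv1_1/bias) can be compared against what was saved.
for name, shape in tf.train.list_variables("./checkpoints/model_epoch80.ckpt"):
    print(name, shape)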

      Following suggestions found online, I added tf.reset_default_graph() (which clears the default graph stack and resets the global default graph), but the error persisted. The real culprit turned out to be the with tf.name_scope('xxx') as scope blocks in the AlexNet definition. After removing the with ... as ... structure, keeping tf.reset_default_graph(), and retraining, batch testing ran without problems.
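
      The renaming behaviour behind the error can be reproduced in isolation. A minimal sketch (TensorFlow 1.x): building the same tf.name_scope twice in one default graph makes TensorFlow uniquify the second scope, so the same layer's variables come out as conv1_1/..., which is exactly the key the restore op then fails to find in the checkpoint.

import tensorflow as tf

def build_bias():
    # tf.Variable names are prefixed by the enclosing name_scope
    with tf.name_scope('conv1'):
        return tf.Variable(tf.zeros([96]), name='bias')

b1 = build_bias()
b2 = build_bias()          # same scope name requested again in the same graph
print(b1.name)             # conv1/bias:0
print(b2.name)             # conv1_1/bias:0  <- no such key in the checkpoint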

      An excerpt of the original AlexNet model code:

import tensorflow as tf

def alexnet(x, keep_prob, num_classes):
    # conv1
    with tf.name_scope('conv1') as scope:
        kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 96], dtype=tf.float32,
                                             stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(x, kernel, [1, 4, 4, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[96], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name=scope)

    # lrn1
    with tf.name_scope('lrn1') as scope:
        lrn1 = tf.nn.local_response_normalization(conv1,
                                                  alpha=1e-4,
                                                  beta=0.75,
                                                  depth_radius=2,
                                                  bias=2.0)

    # pool1
    with tf.name_scope('pool1') as scope:
        pool1 = tf.nn.max_pool(lrn1,
                               ksize=[1, 3, 3, 1],
                               strides=[1, 2, 2, 1],
                               padding='VALID')

    # remaining layers omitted ...

      The modified code (with every with tf.name_scope('xxx') as scope removed):

import tensorflow as tf

def alexnet(x, keep_prob, num_classes):
    # conv1
    kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 96], dtype=tf.float32,
                                             stddev=1e-1), name='weights')
    conv = tf.nn.conv2d(x, kernel, [1, 4, 4, 1], padding='SAME')
    biases = tf.Variable(tf.constant(0.0, shape=[96], dtype=tf.float32),
                         trainable=True, name='biases')
    bias = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(bias, name='conv1')

    # lrn1
    # with tf.name_scope('lrn1') as scope:
    lrn1 = tf.nn.local_response_normalization(conv1,
                                              alpha=1e-4,
                                              beta=0.75,
                                              depth_radius=2,
                                              bias=2.0)

    # pool1
    # with tf.name_scope('pool1') as scope:
    pool1 = tf.nn.max_pool(lrn1,
                           ksize=[1, 3, 3, 1],
                           strides=[1, 2, 2, 1],
                           padding='VALID')

    # remaining layers omitted ...
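
      If the per-layer grouping is still wanted (for example for a tidier TensorBoard graph), a common TensorFlow 1.x alternative, not used in the original post, is tf.variable_scope with tf.get_variable, which gives variables fixed, non-uniquified names; building the same scope a second time without reuse then fails loudly instead of silently renaming the variables. A sketch of the conv1 layer in that style:

import tensorflow as tf

def conv1_layer(x):
    # Variables get the fixed names conv1/weights and conv1/biases,
    # so the checkpoint keys stay stable across graph rebuilds.
    with tf.variable_scope('conv1'):
        kernel = tf.get_variable('weights', shape=[11, 11, 3, 96],
                                 initializer=tf.truncated_normal_initializer(stddev=1e-1))
        biases = tf.get_variable('biases', shape=[96],
                                 initializer=tf.zeros_initializer())
        conv = tf.nn.conv2d(x, kernel, [1, 4, 4, 1], padding='SAME')
        return tf.nn.relu(tf.nn.bias_add(conv, biases))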

      The batch test code:

import tensorflow as tf
from alexnet import alexnet
import matplotlib.pyplot as plt
from os import walk, path
VGG_MEAN = tf.constant([123.68, 116.779, 103.939], dtype=tf.float32)

class_name = ['dog', 'cat']

def test_image(path_image, num_class):
    img_string = tf.read_file(path_image)
    img_decoded = tf.image.decode_png(img_string, channels=3)
    img_resized = tf.image.resize_images(img_decoded, [224, 224])
    # img_centered = tf.subtract(img_resized, VGG_MEAN)
    img_resized = tf.reshape(img_resized, shape=[1, 224, 224, 3])
    # img_bgr = img_centered[:, :, ::-1]
    fc8 = alexnet(img_resized, 1, num_class)
    score = tf.nn.softmax(fc8)
    max = tf.argmax(score, 1)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver.restore(sess, "./checkpoints/model_epoch80.ckpt")
        print(sess.run(fc8))
        prob = sess.run(max)[0]
        output = class_name[prob]
        plt.imshow(img_decoded.eval())
        plt.title("Class: " + output)
        plt.show()
        return output

def get_path_prex(rootdir):
    data_path = []
    prefixs = []
    for root, dirs, files in walk(rootdir, topdown=True):
        for name in files:
            pre, ending = path.splitext(name)
            if ending != ".jpg" and ending != ".png":
                continue
            else:
                data_path.append(path.join(root, name))
                prefixs.append(pre)

    return data_path, prefixs

img_path, prefix = get_path_prex('./Datasets/dog_cat/test/')
cnt_fire = 0
for i in range(len(img_path)):
    tf.reset_default_graph()  # required: rebuild a fresh graph before each test
    output = test_image(img_path[i], num_class=2)

      The call to tf.reset_default_graph() here is mandatory.
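
      As a further note, rebuilding the graph and re-reading the checkpoint for every image is slow. An alternative sketch (same alexnet() and checkpoint as above, with img_path and class_name coming from the test script) builds the graph once with a filename placeholder, restores the weights once, and only feeds a different path on each iteration:

import tensorflow as tf
from alexnet import alexnet

path_ph = tf.placeholder(tf.string)                      # image path fed per run
img = tf.image.decode_png(tf.read_file(path_ph), channels=3)
img = tf.reshape(tf.image.resize_images(img, [224, 224]), [1, 224, 224, 3])
fc8 = alexnet(img, 1, 2)
pred = tf.argmax(tf.nn.softmax(fc8), 1)

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, "./checkpoints/model_epoch80.ckpt")
    for p in img_path:                                   # paths from get_path_prex()
        print(p, class_name[sess.run(pred, feed_dict={path_ph: p})[0]])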

Summary
      (1) Because the training code contained with tf.name_scope('xxx') as scope, adding tf.reset_default_graph() alone still produced the error above (reference: https://blog.csdn.net/LeeGe666/article/details/85806790).
      (2) After removing the with tf.name_scope('xxx') as scope blocks, retraining the model, and adding tf.reset_default_graph() to the test code, the error was resolved.
