Problems encountered while using TensorFlow and Keras

Windows

TensorFlow: 1.0.0 (GPU build)
OS: Windows 7
CUDA: 8.0
cuDNN: 5.1
Python: 3.5
GPU: GeForce GT 720

1. deviceQuery.exe cannot be found after installing CUDA

According to NVIDIA's official guide (http://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#driver-model), there should be a deviceQuery program under C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\bin\win64\Release for checking that CUDA is installed correctly. However, that file was not in my folder.

At first I suspected it was because I had installed Visual Studio 2017, which the official site does not list as supported, so I switched to Visual Studio 2015. Then I found someone on Google with the same problem (https://devtalk.nvidia.com/default/topic/831862/devicequery-exe-cannot-be-located-after-install/):

Ok, I think I resolved it. BTW, I am an NVIDIA employee and I’m very motivated to get this running, but it has taken me a lot of time to resolve this. Even though to most CUDA developers these are trivial, I was very close to just giving up many times. I suggest a few more updates to the getting started directions:

Install Microsoft Visual Studios 2015 or before. The default on the Microsoft website as well as the pointer that is given to us during CUDA installation points us to 2017. VS 2017 simply won’t work and will cause headaches, and VS 2015 takes some determination and digging to download and install.
deviceQuery.exe doesn’t exist, it needs to be compiled. Users need to start up Visual Studio 2015, open the corresponding Microsoft Visual Studio Solution File (e.g., “Samples_vs2015.sln”) and then “Build Solution”.
The file will not be in the “Release” directory as mentioned in the CUDA directions, it is in the “Debug” directory.

In other words: open Samples_vs2015.sln with VS 2015 and build the solution; the deviceQuery program then appears in the Debug folder.
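
For reference, a rough sketch of running the freshly built sample from a command prompt (the exact path depends on where the CUDA samples were installed); a passing check ends with "Result = PASS":

cd "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\bin\win64\Debug"
deviceQuery.exe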

2. log_device_placement=True prints no log

I had been using the Python shell to run the verification program from the TensorFlow website:

import tensorflow as tf
# Creates a graph.
# with tf.device('/gpu:0'):
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

No log appeared in the console; after switching to PyCharm as the IDE, the output showed up normally. (Most likely because the device placement log is written to stderr by the TensorFlow C++ runtime, which some interactive shells do not display.)

3. TensorFlow cannot use the GPU

After problems 1 and 2 were solved, the log looked like this:

Device mapping: no known devices.
MatMul: (MatMul): /job:localhost/replica:0/task:0/cpu:0
b: (Const): /job:localhost/replica:0/task:0/cpu:0
a: (Const): /job:localhost/replica:0/task:0/cpu:0
[[ 22.  28.]
 [ 49.  64.]]

The GPU still could not be detected. Following the advice in https://stackoverflow.com/questions/42326748/tensorflow-on-gpu-no-known-devices-despite-cudas-devicequery-returning-a-pas, the likely cause was that the CPU-only version of TensorFlow had been installed earlier. I don't remember whether I ever installed the CPU version; I probably mixed something up at the time. In any case, uninstalling it and installing the GPU version again solved the problem.
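
As a quick check (my own sketch, assuming TensorFlow was installed with pip; the device_lib API is the TF 1.x way to list devices):

$ pip list        # see whether tensorflow (CPU) and/or tensorflow-gpu show up

# In Python, list the devices TensorFlow can actually see:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())   # a working GPU setup lists a GPU device here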

4. module 'tensorflow' has no attribute 'constant'

This appeared after the uninstall in problem 3. At that point both the CPU and GPU versions had been installed with pip; after finding that the CPU version was being used, I uninstalled it, and running the program again produced this error. I don't know the exact cause (presumably both packages install files into the same tensorflow directory in site-packages, so removing one breaks the other); the fix was to uninstall the GPU version as well and then reinstall it:

$ pip uninstall tensorflow-gpu
$ pip install tensorflow-gpu
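
After reinstalling, a minimal sanity check (my own sketch, not from the original post) that tf.constant works again and the placement log appears:

import tensorflow as tf

print(tf.__version__)
a = tf.constant([1.0, 2.0], name='a')
# log_device_placement=True prints which device each op runs on
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(a))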

Ubuntu

OS: Ubuntu 16.04
Keras: 2.1.1
TensorFlow: 1.4.0-rc1
CUDA: 8.0
cuDNN: 5.1
Python: 2.7
GPU: GeForce GTX 1080 Ti

1. ValueError: Tensor Tensor("Sigmoid_2:0", shape=(?, 17), dtype=float32) is not an element of this graph.

While deploying a Keras model behind Flask, the algorithm was run in a worker thread:

thread.start_new_thread(recognize, (args,))

and the error above was raised; without the extra thread there was no problem. The cause is that the model is loaded in one thread but run in another, so the default graph at predict time is not the graph the model was built in. The fix is described in https://github.com/fchollet/keras/issues/2397:

import threading

import tensorflow as tf
from keras.models import load_model


class Serve(Base):

    def start_socket_server(self, port, model_path):

        # Load model files
        self.model = load_model(model_path)
        self.model._make_predict_function()
        self.graph = tf.get_default_graph()  # save the graph here

        ...
        some code
        ...

        while True:
            t = threading.Thread(target=self.client_thread, args=(conn, ))
            t.setDaemon(True)
            t.start()

    # method run in the worker thread
    def client_thread(self, conn):

        ...
        some code
        ...

        while True:
            ...
            some code
            ...
            # predict
            with self.graph.as_default():  # use the saved graph
                labels = self.model.predict(feature_segments_padded)
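
For clarity, a minimal standalone sketch of the same pattern (the model path and input shape are hypothetical placeholders, not from the original code):

import threading

import numpy as np
import tensorflow as tf
from keras.models import load_model

# Load the model once in the main thread and remember its graph.
model = load_model('model.h5')        # hypothetical path
model._make_predict_function()        # build the predict function eagerly
graph = tf.get_default_graph()

def worker(x):
    # Re-enter the saved graph inside the worker thread before predicting.
    with graph.as_default():
        print(model.predict(x))

t = threading.Thread(target=worker, args=(np.zeros((1, 10)),))  # hypothetical input shape
t.start()
t.join()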
