Using Anaconda and installing tensorflow-gpu

Basic Anaconda usage

Create a new environment

conda create -n rcnn python=3.6

Delete the environment

conda remove -n rcnn --all

Rename environment

conda has no rename command. Renaming is done by cloning, in two steps:

  • To clone a new name of the environment
  • Delete the old name of the environment

For example, to rename the environment rcnn to tf:

Step 1

conda create -n tf --clone rcnn
Source:      /anaconda3/envs/rcnn
Destination: /anaconda3/envs/tf
Packages: 37
Files: 8463

Step 2

conda remove -n rcnn --all

Result

conda info -e
# conda environments:
#
crawl                    /anaconda3/envs/crawl
flask                    /anaconda3/envs/flask
tf                       /anaconda3/envs/tf
root                  *  /anaconda3
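The two-step rename above can also be scripted. This is a minimal sketch using only the Python standard library; the helper names are hypothetical. It runs exactly the two conda commands shown above and then checks the result by parsing the `conda info -e` listing. It assumes conda can be found on PATH (on Windows, shutil.which resolves conda.exe or conda.bat), and conda may still ask for confirmation in the terminal.

```python
import shutil
import subprocess

def rename_commands(old, new):
    """The two steps from above: clone `old` as `new`, then delete `old`."""
    return [
        ["conda", "create", "-n", new, "--clone", old],
        ["conda", "remove", "-n", old, "--all"],
    ]

def parse_env_names(listing):
    """Environment names from `conda info -e` output (first column; '*' marks the active one)."""
    names = []
    for line in listing.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line.split()[0])
    return names

def rename_env(old, new):
    """Run both steps, then confirm that `new` exists and `old` is gone."""
    # Assumption: conda is on PATH; resolve it to a full path so .bat/.exe both work.
    conda = shutil.which("conda") or "conda"
    for cmd in rename_commands(old, new):
        subprocess.run([conda] + cmd[1:], check=True)
    listing = subprocess.run([conda, "info", "-e"], stdout=subprocess.PIPE,
                             universal_newlines=True, check=True).stdout
    names = parse_env_names(listing)
    return new in names and old not in names

# Usage (not run here): rename_env("rcnn", "tf")
```

Checking the listing afterwards mirrors the manual "Result" step: the function returns True only if tf is listed and rcnn is not.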

Installing tensorflow-gpu

First, the graphics card must support CUDA.

To my surprise, mainstream cards such as the GTX 1050 Ti and GTX 1070 Ti were not listed as supported.

(Fortunately, I bought a GTX 1050.)

(And don't even mention Tesla cards to me.)

Check NVIDIA's CUDA support list to see whether your card is on it.

Second, the version numbers have to match: each TensorFlow version is built against a specific CUDA version.

That alone is not enough, though. To run properly you also need cuDNN, and the cuDNN version must in turn match the CUDA version.
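A quick way to see whether the versions line up on Windows, sketched under an assumption: TensorFlow 1.x Windows builds load the CUDA runtime and cuDNN as DLLs named after their versions (for example cudart64_90.dll for CUDA 9.0, cudart64_100.dll for CUDA 10.0, cudnn64_7.dll for cuDNN 7.x) and look for them on PATH. The helper names below are hypothetical; the script uses only the standard library and does not import TensorFlow.

```python
import os

def expected_dlls(cuda_version, cudnn_major):
    """DLL names a TF 1.x Windows build is assumed to load.

    CUDA "10.0" -> cudart64_100.dll, CUDA "9.0" -> cudart64_90.dll,
    cuDNN 7.x -> cudnn64_7.dll.
    """
    major, minor = cuda_version.split(".")[:2]
    return ["cudart64_{}{}.dll".format(major, minor),
            "cudnn64_{}.dll".format(cudnn_major)]

def find_on_path(filename, search_path=None):
    """Return the full path of the first match on PATH, or None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        if directory:
            candidate = os.path.join(directory, filename)
            if os.path.isfile(candidate):
                return candidate
    return None

def missing_dlls(cuda_version, cudnn_major, search_path=None):
    """List the expected DLLs that cannot be found on PATH."""
    return [name for name in expected_dlls(cuda_version, cudnn_major)
            if find_on_path(name, search_path) is None]
```

For example, missing_dlls("10.0", 7) lists which of the two DLLs a CUDA 10.0 / cuDNN 7 build would fail to find; an empty list means both are on PATH.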

CUDA offline installer download link

cuDNN download link

Downloading cuDNN, however, requires an NVIDIA developer account, so I clicked "Join" and registered one myself.

At first I registered with a QQ Mail address; in principle there was nothing wrong with that.

But it died at the email verification step.

It told me to verify my email address. Verify it how? Where was the email?????

After searching around on Baidu, I learned that QQ Mail addresses apparently can't be used.

And then, maddeningly, more than three hours later the email did arrive. Yes, at that QQ mailbox, it finally arrived...

But by then I had already registered with my 163 mailbox.....

So the account I ended up using was registered with the 163 mailbox.

And I could finally download cuDNN.

Once the download finished, I was completely lost.

This is what the archive contains:

img

Using this was completely beyond me. What was I supposed to do with it?

So I searched Baidu again. It turns out the files simply go into the CUDA installation directory....
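Copying the cuDNN files by hand can be sketched as follows. Assumption: the unpacked cuDNN 7 archive has a top-level cuda folder whose bin, include and lib\x64 subfolders mirror the folders of the same name in the CUDA installation directory, so every file is copied to the same relative path there. copy_tree_into is a hypothetical helper, and the paths in the usage comment are examples only.

```python
import os
import shutil

def copy_tree_into(src_root, dst_root):
    """Copy every file under src_root to the same relative path under dst_root.

    Files with the same name are overwritten; other files are left alone.
    Returns the list of destination paths written.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            dst = os.path.join(target_dir, name)
            shutil.copy2(os.path.join(dirpath, name), dst)
            copied.append(dst)
    return copied

# Example paths (hypothetical; adjust to where you unpacked cuDNN and to your CUDA version).
# Writing under Program Files usually needs an administrator prompt.
# copy_tree_into(r"D:\Downloads\cudnn\cuda",
#                r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0")
```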

Once it is installed, Baidu says you can verify it with bandwidthTest.exe and deviceQuery.exe, found in the extras\demo_suite folder of the CUDA installation directory.
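The same check can be automated. This sketch assumes the default CUDA 10.0 install path on Windows (adjust DEMO_SUITE for your version) and that both sample programs print a final "Result = PASS" line on success; the function names are hypothetical.

```python
import os
import subprocess

# Assumed default location for CUDA 10.0 on Windows; change it to match your install.
DEMO_SUITE = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\extras\demo_suite"

def passed(output):
    """True if the program output contains a 'Result = PASS' line."""
    return any(line.strip() == "Result = PASS" for line in output.splitlines())

def run_check(cmd):
    """Run one check program; return (passed, combined stdout/stderr text)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          universal_newlines=True)
    return passed(proc.stdout), proc.stdout

def check_demo_suite(demo_dir=DEMO_SUITE):
    """Run deviceQuery and bandwidthTest; a missing program counts as a failure."""
    results = {}
    for exe in ("deviceQuery.exe", "bandwidthTest.exe"):
        try:
            ok, _ = run_check([os.path.join(demo_dir, exe)])
        except OSError:
            ok = False  # program not found or not runnable
        results[exe] = ok
    return results
```

Calling check_demo_suite() returns a dictionary such as {"deviceQuery.exe": True, "bandwidthTest.exe": True} when both checks pass.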

The checks found no problems.

img

(The image is animated; click it to pause it.)

With the environment fully set up, on to the long-awaited installation step:

pip install tensorflow-gpu

Of course, you first need to uninstall any previously installed tensorflow (pip uninstall tensorflow).
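Before installing, you can check whether the CPU-only tensorflow package is still present alongside tensorflow-gpu. As far as I know, both packages install the same importable tensorflow module, so keeping both in one environment lets one overwrite the other. Below is a stdlib-only sketch (hypothetical helper names) that reads pip freeze for the current interpreter, which is the active conda environment:

```python
import subprocess
import sys

TF_PACKAGES = ("tensorflow", "tensorflow-gpu")

def installed_tf_packages(freeze_output):
    """Return {package: version} for the TensorFlow CPU/GPU packages in `pip freeze` output."""
    found = {}
    for line in freeze_output.splitlines():
        if "==" not in line:
            continue  # skips blank lines and non-pinned entries
        name, version = line.split("==", 1)
        key = name.strip().lower().replace("_", "-")
        if key in TF_PACKAGES:
            found[key] = version.strip()
    return found

def pip_freeze():
    """Run pip for the interpreter that is running this script."""
    return subprocess.run([sys.executable, "-m", "pip", "freeze"],
                          stdout=subprocess.PIPE, universal_newlines=True,
                          check=True).stdout
```

If "tensorflow" appears in installed_tf_packages(pip_freeze()), run pip uninstall tensorflow before installing the GPU build.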

At a "blazing" 20 KB/s, it took who knows how long.

Anyway, it finally installed.

It looked something like this:

img

Looks pretty good so far.

But then I ran it

.......

What followed was an unbearable wall of red errors.....

(The screenshot is too gory and has been withheld.)

So I turned once more to the almighty Baidu,

and finally found this post:

Win10 +VS2017+ python3.66 + CUDA10 + cuDNNv7.3.1 + tensorflow-gpu 1.12.0

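So it doesn't support CUDA 10.0! If only someone had said so earlier; all that effort for nothing. (The official tensorflow-gpu 1.12.0 wheel was built against CUDA 9.0.)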

So I used the installation package that the post's author built and shared:

tensorflow_gpu-1.12.0-cp36-cp36m-win_amd64.whl

CUDA 10.0 + cuDNN 7.3.1

Then I went back and reinstalled cuDNN 7.3.1,

changed into the directory containing the wheel, and ran

pip install tensorflow_gpu-1.12.0-cp36-cp36m-win_amd64.whl
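The cp36-cp36m-win_amd64 part of the wheel name is why the environment was created with python=3.6: under the wheel naming convention (PEP 427) it means CPython 3.6 on 64-bit Windows, and pip refuses the file on any other interpreter. Here is a simplified checker with hypothetical helper names; pip's real logic (the tag handling in the packaging library) also checks the ABI tag and many more tag variants.

```python
import sys
import sysconfig

def parse_wheel_name(filename):
    """Split a wheel filename into its PEP 427 fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 6:  # optional build tag present
        name, version, _build, py_tag, abi_tag, plat_tag = parts
    elif len(parts) == 5:
        name, version, py_tag, abi_tag, plat_tag = parts
    else:
        raise ValueError("unexpected wheel filename: " + filename)
    return {"name": name, "version": version, "python": py_tag,
            "abi": abi_tag, "platform": plat_tag}

def current_python_tag(version_info=sys.version_info):
    """CPython tag for an interpreter version, e.g. (3, 6) -> 'cp36'."""
    return "cp{}{}".format(version_info[0], version_info[1])

def current_platform_tag():
    """Platform tag for this machine, e.g. 'win-amd64' -> 'win_amd64'."""
    return sysconfig.get_platform().replace("-", "_").replace(".", "_")

def wheel_fits(filename, py_tag=None, plat_tag=None):
    """Simplified check of the Python and platform tags (ABI tag ignored)."""
    info = parse_wheel_name(filename)
    py_tag = py_tag or current_python_tag()
    plat_tag = plat_tag or current_platform_tag()
    return (py_tag in info["python"].split(".")
            and plat_tag in info["platform"].split("."))
```

Run inside the activated environment, wheel_fits("tensorflow_gpu-1.12.0-cp36-cp36m-win_amd64.whl") should be True only for 64-bit CPython 3.6 on Windows.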

Nerve-racking as it was, the installation finally completed.

img

That's the screenshot above.

(I was incredibly excited when I saw it.)


At this point the installation is complete


Now that it was installed, I had to test it myself; without a test, this post would feel rather half-baked.

With Baidu's help, I found code someone had written for a convolutional neural network with five convolutional layers:

TensorFlow AlexNet: comparing CPU and GPU efficiency

For convenience, here is my modified version of that code:


  1 from datetime import datetime
  2 import math
  3 import time
  4 import tensorflow as tf
  5 import os
  6 #os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
  7 #os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
  8 batch_size = 32
  9 num_batches = 100
 10 # Print each layer's name and output tensor shape, to show the network structure
 11
 12 def print_activations(t):
 13     print(t.op.name, ' ', t.get_shape().as_list())
 14
 15 # with tf.name_scope('conv1') as scope  # names the variables inside the scope conv1/xxx, so components are easy to tell apart
 16
 17 def inference(images):
 18     parameters = []
 19     # First convolutional layer
 20     with tf.name_scope('conv1') as scope:
 21         # Convolution kernel, initialized from a truncated normal distribution
 22         kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 64],
 23                                                  dtype=tf.float32, stddev=1e-1), name='weights')
 24         conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='SAME')
 25         # Trainable biases
 26         biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32), trainable=True, name='biases')
 27         bias = tf.nn.bias_add(conv, biases)
 28         conv1 = tf.nn.relu(bias, name=scope)
 29         print_activations(conv1)
 30         parameters += [kernel, biases]
 31         # Then LRN and max pooling; apart from AlexNet, LRN has mostly been dropped (reportedly little benefit, and it slows things down)
 32         lrn1 = tf.nn.lrn(conv1, 4, bias=1.0, alpha=0.001 / 9, beta=0.75, name='lrn1')
 33         pool1 = tf.nn.max_pool(lrn1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool1')
 34         print_activations(pool1)
 35     # Second convolutional layer; only some parameters differ
 36     with tf.name_scope('conv2') as scope:
 37         kernel = tf.Variable(tf.truncated_normal([5, 5, 64, 192], dtype=tf.float32, stddev=1e-1), name='weights')
 38         conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
 39         biases = tf.Variable(tf.constant(0.0, shape=[192], dtype=tf.float32), trainable=True, name='biases')
 40         bias = tf.nn.bias_add(conv, biases)
 41         conv2 = tf.nn.relu(bias, name=scope)
 42         parameters += [kernel, biases]
 43         print_activations(conv2)
 44         # Same LRN + max pooling treatment as the first layer
 45         lrn2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9, beta=0.75, name='lrn2')
 46         pool2 = tf.nn.max_pool(lrn2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool2')
 47         print_activations(pool2)
 48     # Third convolutional layer
 49     with tf.name_scope('conv3') as scope:
 50         kernel = tf.Variable(tf.truncated_normal([3, 3, 192, 384], dtype=tf.float32, stddev=1e-1), name='weights')
 51         conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
 52         biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32), trainable=True, name='biases')
 53         bias = tf.nn.bias_add(conv, biases)
 54         conv3 = tf.nn.relu(bias, name=scope)
 55         parameters += [kernel, biases]
 56         print_activations(conv3)
 57     # Fourth convolutional layer
 58     with tf.name_scope('conv4') as scope:
 59         kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256], dtype=tf.float32, stddev=1e-1), name='weights')
 60         conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
 61         biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
 62         bias = tf.nn.bias_add(conv, biases)
 63         conv4 = tf.nn.relu(bias, name=scope)
 64         parameters += [kernel, biases]
 65         print_activations(conv4)
 66     # Fifth convolutional layer
 67     with tf.name_scope('conv5') as scope:
 68         kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256], dtype=tf.float32, stddev=1e-1), name='weights')
 69         conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
 70         biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
 71         bias = tf.nn.bias_add(conv, biases)
 72         conv5 = tf.nn.relu(bias, name=scope)
 73         parameters += [kernel, biases]
 74         print_activations(conv5)
 75         # Followed by a max pooling layer
 76         pool5 = tf.nn.max_pool(conv5, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool5')
 77         print_activations(pool5)
 78         return pool5, parameters
 79 # (The fully connected layers are omitted in this benchmark)
 80 # Time each iteration: session is the tf Session, target is the op to run, info_string is the test name
 81 # The first iterations include GPU memory loading, cache warm-up, etc., so only iterations after the 10th are counted
 82 def time_tensorflow_run(session, target, info_string):
 83     num_steps_burn_in = 10
 84     total_duration = 0.0
 85     total_duration_squared = 0.0
 86     # Run num_batches + num_steps_burn_in iterations
 87     # Time each step with time.time(); start reporting once the warm-up is over
 88     for i in range(num_batches + num_steps_burn_in):
 89         start_time = time.time()
 90         _ = session.run(target)
 91         duration = time.time() - start_time
 92         if i >= num_steps_burn_in:
 93             if not i % 10:
 94                 print('%s:step %d, duration = %.3f' % (datetime.now(), i - num_steps_burn_in, duration))
 95             total_duration += duration
 96             total_duration_squared += duration * duration
 97     # After the loop: mean time per iteration and its standard deviation sd
 98     mn = total_duration / num_batches
 99     vr = total_duration_squared / num_batches - mn * mn
100     sd = math.sqrt(max(vr, 0.0))  # guard against tiny negative rounding errors
101     print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' % (datetime.now(), info_string, num_batches, mn, sd))
102 def run_benchmark():
103     # First create a default Graph
104     with tf.Graph().as_default():
105         # No real ImageNet training; random images are used just to measure compute time
106         image_size = 224
107         images = tf.Variable(tf.random_normal([batch_size, image_size, image_size, 3], dtype=tf.float32, stddev=1e-1))
108         pool5, parameters = inference(images)
109         init = tf.global_variables_initializer()
110         sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=False))
111         sess.run(init)
112         # Feed pool5 in directly (there are no fully connected layers)
113         # This is only for timing, not a real training computation
114         time_tensorflow_run(sess, pool5, "Forward")
115         # A dummy loss, used only to get gradients for the backward pass
116         objective = tf.nn.l2_loss(pool5)
117         grad = tf.gradients(objective, parameters)
118         time_tensorflow_run(sess, grad, "Forward-backward")
119 run_benchmark()
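time_tensorflow_run computes the spread without storing every duration: it keeps a running sum and a running sum of squares and uses variance = (mean of the squares) - (square of the mean), which is the population standard deviation over the num_batches timed steps. Below is a standalone sketch of that formula; the function name is hypothetical, and the result can be checked against the standard library's statistics.pstdev.

```python
import math

def mean_and_sd(durations):
    """Mean and population standard deviation from a sum and a sum of squares,
    the same way time_tensorflow_run does it."""
    n = len(durations)
    if n == 0:
        raise ValueError("need at least one duration")
    total = sum(durations)
    total_sq = sum(d * d for d in durations)
    mn = total / n
    vr = total_sq / n - mn * mn
    return mn, math.sqrt(max(vr, 0.0))  # clamp tiny negative rounding errors
```

The one-pass form is convenient inside a timing loop. The clamp matters when all durations are nearly equal, because the subtraction can then come out slightly below zero.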


With tensorflow-gpu installed, TensorFlow runs on the GPU by default.

Results on the GPU:

img

img

GPU usage:

img

CPU usage:

img

As you can see, memory usage goes up.

Uncommenting lines 6-7 of the code above, which set CUDA_VISIBLE_DEVICES to -1 and so hide the GPU, makes it run on the CPU.
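Instead of editing lines 6-7 before each run, you can launch the CPU and GPU runs as two separate processes with different environments. This is a sketch under two assumptions: the benchmark is saved as a script (the filename alexnet_benchmark.py below is hypothetical), and CUDA_VISIBLE_DEVICES must be set before TensorFlow initializes CUDA, which a fresh process guarantees.

```python
import os
import subprocess
import sys

def benchmark_env(cpu_only, base=None):
    """Return a copy of the environment. With cpu_only=True, CUDA sees no GPUs
    (the same effect as lines 6-7); otherwise any GPU restriction is removed."""
    env = dict(os.environ if base is None else base)
    if cpu_only:
        env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
        env["CUDA_VISIBLE_DEVICES"] = "-1"
    else:
        env.pop("CUDA_VISIBLE_DEVICES", None)
    return env

def run_benchmark_script(script, cpu_only):
    """Run the script in a fresh interpreter so the variables apply before CUDA starts."""
    subprocess.run([sys.executable, script], env=benchmark_env(cpu_only), check=True)

# Usage (hypothetical filename):
# run_benchmark_script("alexnet_benchmark.py", cpu_only=False)  # GPU run
# run_benchmark_script("alexnet_benchmark.py", cpu_only=True)   # CPU run
```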

Results on the CPU:

img

img

CPU utilization:

img

My 2.8 GHz CPU even ran at 3.4 GHz,

so my CPU really was doing its best.


Test Results:

Forward pass: the GPU ran 8.42 times faster than the CPU.

Forward-backward pass: the GPU ran 12.50 times faster than the CPU.
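The speed-up figures are ratios of seconds per batch: the CPU's mean time divided by the GPU's mean time for the same test. Here is a small sketch with hypothetical helper names. It reads the summary lines printed by time_tensorflow_run and computes that ratio.

```python
import re

# Summary format printed by time_tensorflow_run:
# "<timestamp>: <name> across <N> steps, <mean> +/- <sd> sec / batch"
SUMMARY = re.compile(r"(\S+) across (\d+) steps, ([\d.]+) \+/- ([\d.]+) sec / batch")

def parse_summary(line):
    """Return (test name, mean seconds per batch, standard deviation) from a summary line."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError("not a summary line: " + line)
    return m.group(1), float(m.group(3)), float(m.group(4))

def speedup(cpu_sec_per_batch, gpu_sec_per_batch):
    """How many times faster the GPU is: CPU time per batch / GPU time per batch."""
    if gpu_sec_per_batch <= 0:
        raise ValueError("GPU time per batch must be positive")
    return cpu_sec_per_batch / gpu_sec_per_batch
```

Parse the "Forward" line from the CPU run and from the GPU run, then pass the two means to speedup. Do the same for "Forward-backward".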

In GPU mode, GPU utilization was only about 65% and CPU utilization only about 45%.

In CPU mode, CPU utilization reached 100%, yet it was still far slower.

Clearly, the GPU simply blows the CPU away.


Caveats:

1. This test only runs a convolutional neural network; it does not mean the GPU has the advantage in every case;

2. The CPU is a bottleneck here, so the CPU results may be unimpressive; a higher-end CPU may perform significantly better;


Source: www.cnblogs.com/icodeworld/p/11058904.html