TensorFlow 2.0 syntax - tensors & basic functions (Part 1)

 

Reposted from https://segmentfault.com/a/1190000020413887

Foreword

The TF2.0 content here was learned earlier and written up in a private Youdao note; I am rewriting it on SegmentFault.
TF2.0-GPU installation tutorial: https://segmentfault.com/a/11...
I had dealt with TF1 before; its manual session mechanism gave me a headache just reading it. None of that is needed in TF2.0.
TF2.0 is easier to understand (it is gradually becoming NumPy-like and Pythonic).
TF2.0 uses the Keras interface (for building network layers), which is more convenient.
Model layers defined through TF2.0's Keras interface implement __call__, so most layer instances can be called directly like functions (see the sketch below).
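
As a quick illustration of that last point, here is a minimal sketch (my own example, not from the original post) of calling a Keras layer instance directly, since layers implement __call__:

import tensorflow as tf
from tensorflow import keras

dense = keras.layers.Dense(4)   # instantiate a layer object
x = tf.ones([2, 3])             # a dummy batch: 2 samples, 3 features
y = dense(x)                    # call the instance like a function
print(y.shape)                  # >>> (2, 4)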

Rows, columns, and axes

Take a list as an example (an abstract analogy: stacking slices of bread):

[                   # outermost layer, no special meaning, no need to memorize
    [1, 2, 3],      # bread slice 1  (first sample)
    [4, 5, 6],      # bread slice 2  (second sample)
]
  1. Each inner list represents one sample, e.g. [1,2,3] is the whole first sample.
  2. The innermost elements are attribute values, e.g. 1, 2, 3 taken individually are all attribute values.
  3. Example: taking the element 5 on its own, read it as "the second sample, attribute value 5" (row and column indices still start from 0).

Using the same data as an example:

t = tf.constant(
    [
        [1., 2., 3.],
        [4., 5., 6.],
    ]
)

print(tf.reduce_sum(t, axis=0))   # sum, squashing vertically: aggregate over samples
>> tf.Tensor([5. 7. 9.], shape=(3,), dtype=float32)

print(tf.reduce_sum(t, axis=1))   # sum, squashing horizontally: aggregate over attributes
>> tf.Tensor([ 6. 15.], shape=(2,), dtype=float32)

Note: the axis concept in NumPy works the same way. At first I tried to memorize it abstractly in terms of x and y axes, and basically couldn't; there was too much conceptual confusion.
And if you can't remember it, then every time you use the various transformation and aggregation APIs you spend a lot of time re-deriving it in your head. A waste of time.

So you have to understand it through practice, to the point where, at a glance at the data, you know the meaning of each dimension and the meaning of an operation along an axis.

My own way of remembering it (axis=0, axis=1):

  1. Axis 0 generally represents the samples (squash vertically, top to bottom).
  2. Axis 1 generally represents the attributes (squash horizontally, left to right).

Aggregation functions that often need an axis parameter:

tf.reduce_sum()  # sum
tf.reduce_mean() # mean
tf.reduce_max()  # maximum
tf.reduce_min()  # minimum
tf.square()      # square
tf.concat()      # concatenation

Note: if the axis argument is not passed, the operation is applied over all dimensions (see the sketch below).
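
A minimal sketch (my own example) of one of these functions with and without the axis argument:

import tensorflow as tf

t = tf.constant(
    [
        [1., 2., 3.],
        [4., 5., 6.],
    ]
)
print(tf.reduce_mean(t, axis=0))  # squash vertically   >> tf.Tensor([2.5 3.5 4.5], shape=(3,), dtype=float32)
print(tf.reduce_mean(t, axis=1))  # squash horizontally >> tf.Tensor([2. 5.], shape=(2,), dtype=float32)
print(tf.reduce_mean(t))          # no axis: reduce over all dimensions >> tf.Tensor(3.5, shape=(), dtype=float32)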

Common imports

# Almost always needed
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Optional imports
import os, sys, pickle
import scipy
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler      # standardization
from sklearn.model_selection import train_test_split  # train/test split

Tensors & operations

Constants (ordinary tensors)

definition:

c = tf.constant( [[1., 2., 3.], [4., 5., 6.]] )  # adding a dot after a number makes it float32

print(c)
>> tf.Tensor([[1. 2. 3.] [4. 5. 6.]], shape=(2, 3), dtype=float32)

Basic operations (addition, subtraction, matrix multiplication, matrix transpose)

Matrix multiplication first (covered in university linear algebra, so the mechanics are skipped):

Syntax:       a @ b
Requirement:  the number of columns of a === the number of rows of b   (must be equal)
eg:           (5 rows x 2 cols) @ (2 rows x 10 cols) = (5 rows x 10 cols)

Special case (dimension 0 must be equal):
t1 = tf.ones([2, 20, 30])
t2 = tf.ones([2, 30, 50])
print( (t1 @ t2).shape )
>> (2, 20, 50)   # dimension 0 is unchanged; the last 2 dimensions follow normal matrix multiplication

Matrix transpose:

tf.transpose(t)
# Not only plain transposition; it can also swap arbitrary dimensions
t2 = tf.transpose(t, [1, 0])   # rows become columns and vice versa, essentially the basic transpose (axes in reverse order)

# Or, taking shape (2, 100, 200, 3) as an example
t = tf.ones([2, 100, 200, 3])
print(tf.transpose(t, [1, 3, 0, 2]).shape)   # permute the axes
>> (100, 3, 2, 200)
# original axis 1 -> now at position 0
# original axis 3 -> now at position 1
# original axis 0 -> now at position 2
# original axis 2 -> now at position 3

Arithmetic operations have a "broadcasting mechanism".
[image in the original post: illustration of the broadcasting mechanism]

My attempt at a plain-language explanation:
    1. My shape is different from yours, but when I am operated on together with you, I will do my best to expand into your shape.
    2. If gaps remain after expanding, I copy myself to fill them (if they cannot be filled, the operation is impossible).
    3. The small shape conforms to the big shape (I'm the thin one, so I do the moving; you stay put...).
    eg:
        t = tf.constant(
            [
                [1, 2, 3],
                [4, 5, 6],
            ]
        )
        t + [1,2,1]
    Step-by-step analysis:
        [1,2,1] is obviously the small shape, so it automatically tries to morph into the big shape ->
        Step 1 of the transformation (the outermost frame now matches, but a gap remains inside):
            [
                [1,2,1],
            ]
        Step 2 of the transformation (copy itself to fill the gap):
            [
                [1,2,1],
                [1,2,1],   # this is the copy of itself
            ]
        Step 3, the operation (element-wise addition):
            [              [              [
                [1,2,3],       [1,2,1],       [2,4,4],
                [4,5,6],   +   [1,2,1],   =   [5,7,7],
            ]              ]              ]

Abstract demonstration of the broadcasting mechanism:

Suppose t1 has shape [5,200,1,50]
Suppose t2 has shape [5,1]

Note: everything in the demonstration below is a Tensor SHAPE, shape, shape!

[5,200,1,50]   # clearly, at the start these two lines have neither matching ranks nor aligned shapes
[5,1]
------------------------
[5,200,1,50]
[5,50]         # this line aligns by filling in 50
------------------------
[5,200,5,50]   # this line aligns by filling in 5
[5,50]
------------------------
[5,200,5,50]
[1,1,5,50]     # this line is expanded by 2 dimensions, filled with 1 by default
------------------------
[5,200,5,50]
[1,200,5,50]   # this line aligns by filling in 200
------------------------
[5,200,5,50]
[5,200,5,50]   # this line aligns by filling in 5

Notes:
1. For each dimension, the two sizes must be equal or one of them must be 1, otherwise they cannot be aligned (ERROR, as below):
    [5,200,1,50]
    [5,20]      # likewise aligned from the right, but 50 and 20 are both not 1, so they cannot be aligned -> ERROR
2. If dimensions are missing:
    still right-align everything,
    then, starting from the right, fill in the size of each existing dimension,
    then add the missing dimensions with a default size of 1,
    then fill in the sizes of the added dimensions (since they default to 1, they can always be filled).
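
A minimal sketch (my own check, using the common imports above) that the walk-through matches what TensorFlow actually does:

t1 = tf.ones([5, 200, 1, 50])
t2 = tf.ones([5, 1])
print((t1 + t2).shape)   # >>> (5, 200, 5, 50)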

The above is the automatic broadcasting that happens during operations;
you can also broadcast manually:

t1 = tf.ones([2, 20, 1])                        # original shape [2,20,1]
print(tf.broadcast_to(t1, [5,2,20,30]).shape)   # target shape [5,2,20,30]

[5,2,20,30]
[2,20, 1]
-----------
[5,2,20,30]
[2,20,30]
-----------
[5,2,20,30]
[1,2,20,30]
-----------
[5,2,20,30]
[5,2,20,30]

Note: since this is manual broadcasting, only the original shape can "add dimensions or fill in sizes"
to reach the target shape; the target shape itself cannot change at all.

Expanding dimensions (tf.expand_dims) + copying (tf.tile) as a replacement for => broadcasting (tf.broadcast_to)
Same as the example above: I want to turn shape [2,20,1] into [5,2,20,30]

t1 = tf.ones([2, 20, 1])
a = tf.expand_dims(t1, axis=0)       # insert an axis at index 0, result [1,2,20,1]
print(tf.tile(a, [5,1,1,30]).shape)  # result [5, 2, 20, 30]

Process:
[5,2,20,30]
[2,20,1]
-----------
[5,2,20,30]   # tf.expand_dims(t1, axis=0)
[1,2,20,1]    # insert a new axis at index 0 (add a dimension)
-----------
[5,2,20,30]   # tf.tile(a, [5,1,1,30])  (shapes aligned; each tile argument says how many times to repeat the corresponding axis)
[5,2,20,30]   # 1*5  2*1  20*1  1*30

Differences between tile and broadcasting (see the sketch after this list):

  1. tile makes a physical copy and takes up additional memory.
  2. broadcasting is a virtual copy: the copy is done implicitly in order to compute, without taking additional memory.
  3. tile can repeat arbitrarily (an integer multiple per axis: n * m, with m an integer).
  4. broadcasting can only expand a dimension whose original size is 1 (1 * n, with n an integer).
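
A minimal sketch (my own example, using the common imports above) contrasting the two:

t = tf.constant([[1, 2, 3]])            # shape (1, 3)

tiled = tf.tile(t, [2, 2])              # physical copy: each axis repeated an integer number of times
broad = tf.broadcast_to(t, [2, 3])      # virtual copy: only the size-1 axis can expand

print(tiled.shape)   # >>> (2, 6)
print(broad.shape)   # >>> (2, 3)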

Squeezing dimensions (tf.squeeze):
Every dimension whose size is 1 is removed (like a * 1 = a in math).

print(tf.squeeze(tf.ones([2,1,3,1])).shape)
>>> (2, 3)

You can also specify which dimension to squeeze (by default, with no axis given, all size-1 dimensions are squeezed):

print(tf.squeeze(tf.ones([2,1,3,1]), axis=-1).shape)
>>> (2, 1, 3)

Indexing & slicing

Key point: whether indexing or slicing, rows and columns are separated by a comma, and both row and column indices start from 0.
Indexing: take a single value

print(t[1, 2])  # before the comma: the row index; after the comma: the column index
>> tf.Tensor(6.0, shape=(), dtype=float32)

Slicing: take a substructure (two ways)
Way 1 (colon slicing):

print(t[:, 1:])  # before the comma: rows; a bare : means all rows. After the comma: columns; 1: means from the second column to the end
>> tf.Tensor([[2. 3.] [5. 6.]], shape=(2, 2), dtype=float32)

Way 2 (ellipsis slicing): (people unfamiliar with NumPy may not have heard of Python's Ellipsis; it is the class behind the ... literal)
Run this line of code yourself:

print(... is Ellipsis)
>>> True

Back to the topic: ellipsis slicing is for multi-dimensional tensors; for two dimensions, using : directly is fine.

(Take three dimensions as an example; "rows and columns" no longer fits here)
# shape is (2, 2, 2)
t = tf.constant(
    [   # dimension 1
        [   # dimension 2
            [1, 2],   # dimension 3
            [3, 4],
        ],
        [
            [5, 6],
            [7, 8],
        ],
    ]
)

Pseudo-code: t[dim-1 slice, dim-2 slice, dim-3 slice]
Code:        t[:, :, 0:1]   # dim 1 untouched, dim 2 untouched, dim 3 takes one entry

Result: shape is (2, 2, 1)
[             # dimension 1
    [         # dimension 2
        [1],  # dimension 3
        [3],
    ],
    [
        [5],
        [7],
    ],
]

Read it a few more times if it is not clear.
Notice that even though I am not slicing dimensions 1 and 2, I am still forced to write two : as placeholders.
So if there are 100 dimensions and I only want to slice the last one, leaving the first 99 untouched, do I have to write 99 : placeholders??
No, the following code solves it:

print(t[..., 0:1])       # this is what ... does (note: only meaningful in NumPy and TensorFlow)

Converting a tensor to a NumPy type

t.numpy()    # convert a Tensor to a NumPy array
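
A minimal round-trip sketch (my own example); tf.convert_to_tensor goes the other way:

import numpy as np
import tensorflow as tf

t = tf.constant([[1., 2.], [3., 4.]])
arr = t.numpy()                  # Tensor -> numpy.ndarray
t2 = tf.convert_to_tensor(arr)   # numpy.ndarray -> Tensor
print(type(arr), type(t2))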

Variables

definition:

v = tf.Variable(   # note: the V is uppercase
    [
        [1, 2, 3],
        [4, 5, 6],
    ]
)

Variable assignment (assignment in place):

Note: once a variable is defined, its shape is fixed. Assignment can only use values of the same shape.

v.assign(
    [
        [1,1,1],
        [1,1,1],
    ]
)
print(v)
>> <tf.Variable 'Variable:0' shape=(2, 3) dtype=int32, numpy=array([[1, 1, 1],[1, 1, 1]])>

Reading a variable's value (equivalent to converting it to a Tensor):

Special note: a variable itself is of type Variable, but the value you read out is a Tensor (including slice and index reads).
print( v.value() )
>> tf.Tensor([[1 2 3] [4 5 6]], shape=(2, 3), dtype=int32)

Variable index & slice assignment:

Constants: immutable. So they only support reading values, not assignment.
Variables: both reading and assignment are allowed.
v.assign(xx)  is similar to Python's  v = xx

v[0, 1].assign(100)            # index assignment, still done through .assign
v[0, :].assign([10, 20, 30])   # slice assignment; note that the value passed must be a container type

Special note: as said before, the structure/shape of a variable is immutable; what you assign is data.
But whenever you assign, make sure you do not change the variable's original shape.
Take slice assignment as an example: however many elements you slice out, that is how many you must assign,
and the assigned value must have the same structure.
An analogy: if you dig a small cube out of a big cube, you must fill the hole with a small cube of exactly the same shape.

There are also two extension APIs (see the sketch below):
v.assign_add()   # similar to Python's +=
v.assign_sub()   # similar to Python's -=
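
A minimal sketch (my own example, using the common imports above) of those two extension APIs:

v = tf.Variable([1, 2, 3])
v.assign_add([10, 10, 10])   # like Python's +=, element-wise, same shape required
print(v.numpy())             # >>> [11 12 13]
v.assign_sub([1, 1, 1])      # like Python's -=
print(v.numpy())             # >>> [10 11 12]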

Variable index & slice reads

Same as slice/index reads on constants (omitted).

Variable to NumPy

print(v.numpy())

Ragged tensors (RaggedTensor)

definition:

rag_tensor = tf.ragged.constant(
    [
        [1, 2],
        [2, 3, 4, 5],
    ]
)   # the rows of each dimension are allowed to have uneven lengths

Concatenation: if you need to concatenate ragged tensors, tf.concat(axis=...) is available (see the sketch below):

Axis 0: vertical concatenation (samples stacked on top of each other); always allowed. The result is still a RaggedTensor.
Axis 1: horizontal concatenation (attributes appended side by side); the number of samples must be equal, otherwise the rows don't line up and an error is raised.
Summary: samples can be stacked vertically at will; attributes can only be appended horizontally when the sample counts are equal.
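
A minimal sketch (my own example, using the common imports above) of both directions:

r1 = tf.ragged.constant([[1, 2], [3]])
r2 = tf.ragged.constant([[4], [5, 6, 7]])

print(tf.concat([r1, r2], axis=0))  # stack samples vertically: [[1, 2], [3], [4], [5, 6, 7]]
print(tf.concat([r1, r2], axis=1))  # append attributes row by row (same sample count needed): [[1, 2, 4], [3, 5, 6, 7]]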

RaggedTensor to ordinary Tensor:

Note: an ordinary Tensor requires aligned lengths, so rows that don't line up are padded with 0 at the end.
tensor = rag_tensor.to_tensor()

Sparse tensors (SparseTensor)

Characteristics (think of it as recording indices):

  1. Only the coordinates of non-zero positions are recorded; in the indices parameter, each sub-list is one coordinate.
  2. Although only coordinates are recorded, when later converted to an ordinary Tensor, only those coordinate positions hold values; every other position is 0.
  3. The range to be filled depends on the dense_shape setting.

definition:

s = tf.SparseTensor(
    indices=[[0, 1], [1, 0], [2, 3]],  # note: these indices must be given in order (left to right, top to bottom)
    values=[1, 2, 3],                  # the three coordinates above are given the values 1, 2, 3 respectively
    dense_shape=[3, 4]                 # the overall extent of the Tensor
)
print(s)
>> SparseTensor(indices=tf.Tensor([[0 1], [1 0],[2 3]], shape=(3, 2), dtype=int64)...

Converting to an ordinary Tensor (once converted, you can see where the values are actually stored):

tensor = tf.sparse.to_dense(s) 
print(tensor)
>> tf.Tensor([ [0 1 0 0],[2 0 0 0],[0 0 0 3] ], shape=(3, 4), dtype=int32)

Using to_dense() above may raise an error:

error: is out of range

The cause of this error is how tf.SparseTensor(indices=) was created; as said above, the indices must be written in order (left to right, top to bottom).
Of course, you can also use the sorting API: sort first, then convert:
eg:
    _ = tf.sparse.reorder(s)        # sort the indices first
    tensor = tf.sparse.to_dense(_)  # then convert

tf.function

This API is used as a decorator; it converts valid Python syntax into TF graph structure.

import tensorflow as tf
import numpy as np

@tf.function
def f():
    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])
    return a + b

print( f() )
>>> tf.Tensor([5 7 9], shape=(3,), dtype=int32)
You should notice something: inside the function f() we wrote no TF syntax at all; @tf.function is only a one-line decoration,
and yet the return value of the call is actually a tensor.
That is what the @tf.function decorator does!

Of course you can also write TF code inside the function; that is no problem.
But note: defining variables inside the function is not allowed; variables must be defined outside the function definition.

a = tf.Variable([1,2,3])  # if you need a tensor variable, it should go outside

@tf.function
def f():
    # a = tf.Variable([1,2,3])   # defining a variable in here is not allowed!
    pass

Merging (tf.concat)

My way of understanding it: like combining like terms in basic math.

# The principle of combining like terms: one term may differ, all other terms must be identical.
# Precondition: at most one dimension may differ in size (note: at most one).
t1 = tf.ones([2,5,6])
t2 = tf.ones([6,5,6])
print( tf.concat([t1,t2], axis=0).shape )  # axis=0 was passed, so all other axes stay fixed; only axis 0 is merged
>> (8, 5, 6)

Stacking, which adds a dimension (tf.stack)

My way of understanding it: like carrying in elementary arithmetic (the carry creates a new, higher place, i.e. a new dimension).

# Precondition: the shapes must be exactly equal in every dimension.
tf1 = tf.ones([2,3,4])
tf2 = tf.ones([2,3,4])
tf3 = tf.ones([2,3,4])
print(tf.stack([tf1,tf2,tf3], axis=0).shape)
# Imagine 3 groups of [2,3,4]; the count of 3 becomes a new dimension, inserted at the index given by axis.
>> (3, 2, 3, 4)
# For comparison: if this were tf.concat(), the result would be (6, 3, 4)

Unstacking, which removes a dimension (tf.unstack)

The exact inverse of tf.stack: whatever size the specified axis has, the data is split into that many pieces, and the dimension is removed.

a = tf.ones([3, 2, 3, 4])
for x in tf.unstack(a, axis=0):
    print(x.shape)

Result (split into 3 pieces of [2,3,4]):
>>> (2, 3, 4)
>>> (2, 3, 4)
>>> (2, 3, 4)

Splitting without removing a dimension (tf.split)

Syntax:

The difference from tf.unstack: tf.unstack splits into equal pieces and removes the dimension, while tf.split never removes the dimension no matter how it splits, and lets you specify the size of each piece.

a = tf.ones([2,4,35,8])
for x in tf.split(a, axis=3, num_or_size_splits=[2,2,4]):
    print(x.shape)

Result:
>> (2, 4, 35, 2)   # last dimension 2
>> (2, 4, 35, 2)   # last dimension 2
>> (2, 4, 35, 4)   # last dimension 4

Usage scenario:

Suppose we want to split a dataset into train / test / valid in a 6:2:2 ratio.
Method 1 (scikit-learn, splitting twice in a row):

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
x_train, x_valid, y_train, y_valid = train_test_split(x_train, y_train, test_size=0.2)
# The source shows that test_size defaults to 0.25 if not passed.
# Idea: since scikit-learn can only cut into 2 pieces at a time, we cut twice:
#   first cut:  from the full training set, produce (remaining training set, test set)
#   second cut: from the remaining set, produce (remaining training set 2, validation set)

Method 2: (tf.split)

x = tf.ones([1000, 5000])
y = tf.ones([1000, 1])

x_train, x_test, x_valid = tf.split(
    x,
    num_or_size_splits=[600, 200, 200],  # cut into 3 pieces
    axis=0
)
y_train, y_test, y_valid = tf.split(
    y,
    num_or_size_splits=[600, 200, 200],  # likewise cut into 3 pieces
    axis=0
)

print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
print(x_valid.shape, y_valid.shape)

Result:
>>> (600, 5000) (600, 1)
>>> (200, 5000) (200, 1)
>>> (200, 5000) (200, 1)

Advanced indexing (tf.gather)

In NumPy this kind of indexing is called fancy indexing (if I remember correctly).

data = tf.constant([6,7,8])     # treat as the real data
index = tf.constant([2, 1, 0])  # treat as indices
print(tf.gather(data, index))
>> tf.Tensor([8 7 6], shape=(3,), dtype=int32)

Sort (tf.sort)

data = tf.constant([6, 7, 8])
print(tf.sort(data, direction='DESCENDING'))  # or 'ASCENDING'; the default is ASCENDING

tf.argsort()  # same as above, except it returns the indices of the sorted data
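
A minimal sketch (my own example) showing that tf.argsort returns indices, which can then be fed to tf.gather:

data = tf.constant([6, 8, 7])
idx = tf.argsort(data, direction='DESCENDING')
print(idx)                     # >>> tf.Tensor([1 2 0], shape=(3,), dtype=int32)
print(tf.gather(data, idx))    # >>> tf.Tensor([8 7 6], shape=(3,), dtype=int32)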

Top-K(tf.math.top_k)

Find the largest n values (better performance than sorting first and then slicing).

a = tf.math.top_k([6,7,8], 2)   # find the largest two; an object is returned
print(a.indices)    # the indices of the largest two
print(a.values)     # the values of the largest two
>> tf.Tensor([2 1], shape=(2,), dtype=int32)
>> tf.Tensor([8 7], shape=(2,), dtype=int32)

tf.GradientTape (custom differentiation)

Partial derivative

v1, v2 = tf.Variable(1.), tf.Variable(2.)  # variables ARE automatically watched
c1, c2 = tf.constant(1.), tf.constant(2.)  # constants are NOT automatically watched

y = lambda x1, x2: x1**2 + x2**2

with tf.GradientTape(persistent=True) as tape:
    """By default the tape is deleted after a single use;
    persistent=True keeps it around, but it must be released manually later."""
    # Since constants are not automatically watched, we call watch() manually
    tape.watch(c1)   # with variables, these two watch steps are unnecessary
    tape.watch(c2)
    f = y(c1, c2)    # call the function and get the result

c1_, c2_ = tape.gradient(f, [c1, c2])  # 2nd argument: pass n independent variables, get n partial derivatives back
# c1_ is the partial derivative w.r.t. c1
# c2_ is the partial derivative w.r.t. c2

del tape  # release the tape manually

Second-order partial derivatives (nested gradients)

v1, v2 = tf.Variable(1.), tf.Variable(2.)  # we use variables

y = lambda x1, x2: x1**2 + x2**2

with tf.GradientTape(persistent=True) as tape2:
    with tf.GradientTape(persistent=True) as tape1:
        f = y(v1, v2)
    once_grads = tape1.gradient(f, [v1, v2])   # first-order partial derivatives

# This list comprehension takes the first-order partials and differentiates again (note: using tape2)
twice_grads = [tape2.gradient(once_grad, [v1, v2]) for once_grad in once_grads]  # second-order partials
print(twice_grads)

del tape1  # release
del tape2  # release

Explanation

Derivative (one independent variable):            tape1.gradient(f, v1)        # pass gradient() a single variable
Partial derivatives (multiple independent vars):  tape1.gradient(f, [v1, v2])  # pass gradient() a list containing all the variables
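
A minimal sketch (my own example) of the single-variable case from the first line above:

x = tf.Variable(3.)
with tf.GradientTape() as tape:
    y = x ** 2
print(tape.gradient(y, x))     # dy/dx = 2x = 6.0  ->  tf.Tensor(6.0, shape=(), dtype=float32)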

SGD (stochastic gradient descent)

Way 1: hand-rolled (without using an optimizer)

v1, v2 = tf.Variable(1.), tf.Variable(2.)   # we use variables
y = lambda x1, x2: x1 ** 2 + x2 ** 2        # quadratic function of two variables

learning_rate = 0.1                         # learning rate

for _ in range(30):                         # number of iterations
    with tf.GradientTape() as tape:         # differentiation scope
        f = y(v1, v2)
    d1, d2 = tape.gradient(f, [v1, v2])     # differentiate: d1 is the partial w.r.t. v1, d2 w.r.t. v2
    v1.assign_sub(learning_rate * d1)
    v2.assign_sub(learning_rate * d2)

print(v1)
print(v2)

Summary of the flow:
1. The partials are computed from the variables v1, v2.  (d1, d2 = tape.gradient(f, [v1, v2]))
2. The decay of v1, v2 is tied to the partials.          (decay = learning_rate * partial)
3. Steps 1 and 2 are wrapped in a big loop (with a fixed iteration count): 1-2-1-2-1-2... repeatedly.

Way 2: gradient descent using a TensorFlow optimizer (Optimizer)

v1, v2 = tf.Variable(1.), tf.Variable(2.)   # we use variables
y = lambda x1, x2: x1 ** 2 + x2 ** 2        # quadratic function of two variables; usually this plays the role of a loss

learning_rate = 0.1                                             # learning rate
optimizer = keras.optimizers.SGD(learning_rate=learning_rate)   # instantiate the optimizer

for _ in range(30):                          # number of iterations
    with tf.GradientTape() as tape:
        f = y(v1, v2)
    d1, d2 = tape.gradient(f, [v1, v2])      # d1 is the partial w.r.t. v1, d2 w.r.t. v2
    optimizer.apply_gradients(               # note the difference: before, we decayed the variables by hand;
        [                                    # now optimizer SGD does that work for us,
            (d1, v1),                        # we just hand it the partials and variables in this format
            (d2, v2),
        ]
    )
    # Usually this format is produced with zip(), eg:
    # model = keras.models.Sequential([......])
    # .......
    # grads = tape.gradient(f, model.trainable_variables)
    # optimizer.apply_gradients(
    #     zip(grads, model.trainable_variables)
    # )

print(v1)
print(v2)

Summary of the flow:
1. The partials are computed from the variables v1, v2.  (d1, d2 = tape.gradient(f, [v1, v2]))   # unchanged
2. The partials and variables are passed to optimizer.apply_gradients(); the SGD optimizer does the decay for us.
3. Steps 1 and 2 are again wrapped in a big loop (with a fixed iteration count): 1-2-1-2-1-2... repeatedly.

Note: if you use Adam or another optimizer, the update formula may be more complex, and hand-rolling it gets tedious.
In that case it is better to use the ready-made optimizers (Adam, etc.). The general steps:
1. Instantiate an optimizer object.
2. Call instance.apply_gradients([(partial, variable), ...]).


Origin www.cnblogs.com/whw1314/p/12121874.html