Every post, a motto: You can do more than you think.
0. Preface
There are generally three ways to build a deep learning model:
- Sequential API
- Functional API
- Subclassing API
Note: the Functional API is the recommended default.
This post compares and analyzes the "two approaches" to building models with the Subclassing API (tf.keras.Model).
Note: to keep this article self-contained, only part of the problem is discussed here; the remaining questions are covered in the next post.
1. Main text
- Define your own model by subclassing the Python class tf.keras.Model.
- In the subclass, override the __init__() method (the constructor, for initialization) and the call(input) method (the model's forward computation); custom methods can be added as needed.
- __init__ defines and initializes the layers to be used (e.g. convolution and pooling layers); call defines the forward pass of the network (the backward pass is generated automatically).
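As a minimal sketch of this __init__/call contract (the class and layer names here are illustrative, not from the original post):

```python
import tensorflow as tf

class TwoLayerNet(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # layers (and their weights) are created once, in __init__
        self.dense1 = tf.keras.layers.Dense(32, activation='relu')
        self.dense2 = tf.keras.layers.Dense(10)

    def call(self, inputs):
        # forward pass only; gradients for the backward pass are derived automatically
        x = self.dense1(inputs)
        return self.dense2(x)

net = TwoLayerNet()
print(net(tf.zeros((4, 16))).shape)  # (4, 10)
```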
1.1 Templates
The two approaches below give similar results; the main difference is:
- one instantiates existing (built-in) layers in __init__
- the other subclasses Layer first, then instantiates the custom layer in __init__
1.1.1 Method 1: using built-in layers
class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()  # under Python 2, use super(MyModel, self).__init__()
        # add initialization code here (including the layers used in call), e.g.
        # self.layer1 = tf.keras.layers.BuiltInLayer(...)
        # self.layer2 = MyCustomLayer(...)

    def call(self, input):
        # add the forward computation here (process the input and return the output), e.g.
        # x = self.layer1(input)
        # output = self.layer2(x)
        return output

    # custom methods can be added as well
1.1.2 Method 2: custom layers
class DoubleConv(layers.Layer):
    """Custom layer"""
    def __init__(self):
        super().__init__()

    def call(self, input):
        pass

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()  # under Python 2, use super(MyModel, self).__init__()
        # add initialization code here (including the layers used in call), e.g.
        self.doub_block = DoubleConv()
        # self.layer1 = tf.keras.layers.BuiltInLayer(...)
        # self.layer2 = MyCustomLayer(...)

    def call(self, input):
        # add the forward computation here (process the input and return the output), e.g.
        x = self.doub_block(input)
        # x = self.layer1(input)
        # output = self.layer2(x)
        # return output

    # custom methods can be added as well
1.2 Worked examples
1.2.1 Using built-in layers
1.2.1.1 Baseline code
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU

class Models(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.conv = Conv2D(16, (3, 3), padding='same')
        self.bn = BatchNormalization()
        self.ac = ReLU()
        self.conv2 = Conv2D(32, (3, 3), padding='same')
        self.bn2 = BatchNormalization()
        self.ac2 = ReLU()

    def call(self, x, **kwargs):
        x = self.conv(x)
        x = self.bn(x)
        x = self.ac(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.ac2(x)
        return x

m = Models()
m.build(input_shape=(2, 8, 8, 3))
m.summary()
Model structure: (the summary printout is omitted here)
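As a sanity check on the summary, the parameter count of this model can be computed by hand: conv (3·3·3·16 + 16 = 448), bn (4·16 = 64), conv2 (3·3·16·32 + 32 = 4640), bn2 (4·32 = 128), 5280 in total. A minimal sketch, re-declaring the Models class from above:

```python
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU

class Models(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.conv = Conv2D(16, (3, 3), padding='same')
        self.bn = BatchNormalization()
        self.ac = ReLU()
        self.conv2 = Conv2D(32, (3, 3), padding='same')
        self.bn2 = BatchNormalization()
        self.ac2 = ReLU()

    def call(self, x, **kwargs):
        return self.ac2(self.bn2(self.conv2(self.ac(self.bn(self.conv(x))))))

m = Models()
m.build(input_shape=(2, 8, 8, 3))
# 448 (conv) + 64 (bn) + 4640 (conv2) + 128 (bn2)
print(m.count_params())  # 5280
```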
1.2.1.2 Modified code
1. Sharing one BatchNormalization
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU

class Models(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.conv = Conv2D(16, (3, 3), padding='same')
        self.bn = BatchNormalization()
        self.ac = ReLU()
        self.conv2 = Conv2D(32, (3, 3), padding='same')
        self.bn2 = BatchNormalization()
        self.ac2 = ReLU()

    def call(self, x, **kwargs):
        x = self.conv(x)
        x = self.bn(x)
        x = self.ac(x)
        x = self.conv2(x)
        # ==========================
        # the same BatchNormalization is reused here
        # ==========================
        x = self.bn(x)
        x = self.ac2(x)
        return x

m = Models()
m.build(input_shape=(2, 8, 8, 3))
m.summary()
- The two batch-normalization steps above share a single BatchNormalization layer, which produces the error below.
- Sharing a single convolution or activation layer fails as well (readers can verify this themselves).
Sharing one batch normalization raises:
ValueError: Input 0 of layer batch_normalization is incompatible with the layer: expected axis 3 of input shape to have value 16 but received input with shape [2, 8, 8, 32]
Sharing one convolution raises:
ValueError: Input 0 of layer conv2d is incompatible with the layer: expected axis -1 of input shape to have value 3 but received input with shape [2, 8, 8, 16]
Sharing one activation raises:
ValueError: You tried to call `count_params` on re_lu_1, but the layer isn't built. You can build it manually via: `re_lu_1.build(batch_input_shape)`.
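The cause behind the convolution and batch-normalization errors can be seen in isolation: a Keras layer creates its weights for a specific input shape on its first call, and from then on rejects inputs whose channel axis differs. (The activation error appears to be different in kind: the second ReLU is simply never called, so it is never built, and summary fails on the unbuilt layer.) A minimal sketch:

```python
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
bn(tf.zeros((2, 8, 8, 16)))      # first call builds gamma/beta for 16 channels
print(bn.gamma.shape)            # (16,)
try:
    bn(tf.zeros((2, 8, 8, 32)))  # second call with 32 channels
except ValueError as err:
    print('ValueError:', err)
```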
1.2.2 Using a custom layer
1.2.2.1 Baseline code
Note: the code is fairly long, so it is split into pieces here to make it easier to read.
Imports
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
from tensorflow.keras import layers
The custom layer
class DoubleConv(layers.Layer):
    def __init__(self, mid_kernel_numbers, out_kernel_number):
        """
        Initialize a block containing two convolutions.
        :param mid_kernel_numbers: number of channels of the intermediate feature map
        :param out_kernel_number: number of channels of the output feature map
        """
        super().__init__()
        self.conv1 = layers.Conv2D(mid_kernel_numbers, (3, 3), padding='same')
        self.conv2 = layers.Conv2D(out_kernel_number, (3, 3), padding='same')
        self.bn = layers.BatchNormalization()
        self.bn2 = layers.BatchNormalization()
        self.ac = layers.ReLU()
        self.ac2 = layers.ReLU()

    def call(self, input, **kwargs):
        """Forward pass"""
        x = self.conv1(input)
        x = self.bn(x)
        x = self.ac(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.ac2(x)
        return x
The model class
class Model(tf.keras.Model):
    def __init__(self):
        """
        The class that builds the model.
        """
        super().__init__()
        # initialize the convolution block
        self.block = DoubleConv(16, 32)

    def call(self, x, **kwargs):
        x = self.block(x)
        return x
Print the model structure
m = Model()
m.build(input_shape=(2, 8, 8, 3))
m.summary()
1.2.2.2 Modified code
Note: because the code is long, only the changed part is shown here; the rest is the same as in 1.2.2.1.
1. Sharing one convolution
class DoubleConv(layers.Layer):
    def __init__(self, mid_kernel_numbers, out_kernel_number):
        """
        Initialize a block containing two convolutions.
        :param mid_kernel_numbers: number of channels of the intermediate feature map
        :param out_kernel_number: number of channels of the output feature map
        """
        super().__init__()
        self.conv1 = layers.Conv2D(mid_kernel_numbers, (3, 3), padding='same')
        self.conv2 = layers.Conv2D(out_kernel_number, (3, 3), padding='same')
        self.bn = layers.BatchNormalization()
        self.bn2 = layers.BatchNormalization()
        self.ac = layers.ReLU()
        self.ac2 = layers.ReLU()

    def call(self, input, **kwargs):
        """Forward pass"""
        x = self.conv1(input)
        x = self.bn(x)
        x = self.ac(x)
        # =======================
        # the convolution is reused here
        # =======================
        x = self.conv(x)
        x = self.bn2(x)
        x = self.ac2(x)
        return x
This raises:
AttributeError: 'DoubleConv' object has no attribute 'conv'
(Strictly speaking, this AttributeError only says that the name self.conv was never defined; reusing self.conv1 instead produces a shape-mismatch ValueError analogous to the built-in-layer case.)
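For completeness: reusing an already-built convolution on its own, outside the class, shows the same failure mode as the built-in-layer case, because the kernel was built for 3 input channels. A minimal sketch:

```python
import tensorflow as tf

conv = tf.keras.layers.Conv2D(16, (3, 3), padding='same')
conv(tf.zeros((2, 8, 8, 3)))       # builds a kernel for 3 input channels
print(conv.kernel.shape)           # (3, 3, 3, 16)
try:
    conv(tf.zeros((2, 8, 8, 16)))  # reuse on a 16-channel input
except ValueError as err:
    print('ValueError:', err)
```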
2. Sharing one BatchNormalization
class DoubleConv(layers.Layer):
    def __init__(self, mid_kernel_numbers, out_kernel_number):
        """
        Initialize a block containing two convolutions.
        :param mid_kernel_numbers: number of channels of the intermediate feature map
        :param out_kernel_number: number of channels of the output feature map
        """
        super().__init__()
        self.conv1 = layers.Conv2D(mid_kernel_numbers, (3, 3), padding='same')
        self.conv2 = layers.Conv2D(out_kernel_number, (3, 3), padding='same')
        self.bn = layers.BatchNormalization()
        self.bn2 = layers.BatchNormalization()
        self.ac = layers.ReLU()
        self.ac2 = layers.ReLU()

    def call(self, input, **kwargs):
        """Forward pass"""
        x = self.conv1(input)
        x = self.bn(x)
        x = self.ac(x)
        x = self.conv2(x)
        # =======================
        # the same BatchNormalization is reused here
        # =======================
        x = self.bn(x)
        x = self.ac2(x)
        return x
This raises:
ValueError: Input 0 of layer batch_normalization is incompatible with the layer: expected axis 3 of input shape to have value 16 but received input with shape [2, 8, 8, 32]
3. Sharing one activation
class DoubleConv(layers.Layer):
    def __init__(self, mid_kernel_numbers, out_kernel_number):
        """
        Initialize a block containing two convolutions.
        :param mid_kernel_numbers: number of channels of the intermediate feature map
        :param out_kernel_number: number of channels of the output feature map
        """
        super().__init__()
        self.conv1 = layers.Conv2D(mid_kernel_numbers, (3, 3), padding='same')
        self.conv2 = layers.Conv2D(out_kernel_number, (3, 3), padding='same')
        self.bn = layers.BatchNormalization()
        self.bn2 = layers.BatchNormalization()
        self.ac = layers.ReLU()
        self.ac2 = layers.ReLU()

    def call(self, input, **kwargs):
        """Forward pass"""
        x = self.conv1(input)
        x = self.bn(x)
        x = self.ac(x)
        x = self.conv2(x)
        x = self.bn2(x)
        # =======================
        # the same ReLU is reused here
        # =======================
        x = self.ac(x)
        return x

This version runs without error.
4. Instantiating the custom layer multiple times (appendix)
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
from tensorflow.keras import layers

class DoubleConv(layers.Layer):
    def __init__(self, mid_kernel_numbers, out_kernel_number):
        """
        Initialize a block containing two convolutions.
        :param mid_kernel_numbers: number of channels of the intermediate feature map
        :param out_kernel_number: number of channels of the output feature map
        """
        super().__init__()
        self.conv1 = layers.Conv2D(mid_kernel_numbers, (3, 3), padding='same')
        self.conv2 = layers.Conv2D(out_kernel_number, (3, 3), padding='same')
        self.bn = layers.BatchNormalization()
        self.bn2 = layers.BatchNormalization()
        self.ac = layers.ReLU()
        self.ac2 = layers.ReLU()

    def call(self, input, **kwargs):
        """Forward pass"""
        x = self.conv1(input)
        x = self.bn(x)
        x = self.ac(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.ac2(x)
        return x

class Model(tf.keras.Model):
    def __init__(self):
        """
        The class that builds the model.
        """
        super().__init__()
        # initialize the convolution blocks
        self.block = DoubleConv(16, 32)
        self.block2 = DoubleConv(32, 64)

    def call(self, x, **kwargs):
        x = self.block(x)
        x = self.block2(x)
        return x

m = Model()
m.build(input_shape=(2, 8, 8, 3))
m.summary()
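The stacked blocks can be verified end-to-end with a forward pass; with two DoubleConv blocks of output widths 32 and 64, an (2, 8, 8, 3) input should come out as (2, 8, 8, 64). A compact sketch of the same architecture:

```python
import tensorflow as tf
from tensorflow.keras import layers

class DoubleConv(layers.Layer):
    def __init__(self, mid, out):
        super().__init__()
        self.conv1 = layers.Conv2D(mid, (3, 3), padding='same')
        self.conv2 = layers.Conv2D(out, (3, 3), padding='same')
        self.bn = layers.BatchNormalization()
        self.bn2 = layers.BatchNormalization()
        self.ac = layers.ReLU()
        self.ac2 = layers.ReLU()

    def call(self, x, **kwargs):
        x = self.ac(self.bn(self.conv1(x)))
        return self.ac2(self.bn2(self.conv2(x)))

class Model(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.block = DoubleConv(16, 32)
        self.block2 = DoubleConv(32, 64)

    def call(self, x, **kwargs):
        return self.block2(self.block(x))

m = Model()
y = m(tf.zeros((2, 8, 8, 3)))
print(y.shape)  # (2, 8, 8, 64)
```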
1.3 Summary
1.3.1 General conclusions
- The two approaches are at bottom one and the same: both subclass tf.keras.Model, i.e. the Subclassing API.
- Across the three model-building approaches (Sequential API / Functional API / Subclassing API), the learning curve and the flexibility both increase in that order.
- The Functional API is recommended: it is usually sufficient and reasonably flexible.
- For the Subclassing API:
  - the class's __init__ method initializes the layers to be used, e.g. convolution, pooling, batch normalization, activation
  - the class's call method defines the forward pass, i.e. the model graph; the backward pass is handled automatically.
- Printed model structure:
  - with built-in layers, printing the model structure shows the details of every layer (e.g. feature-map sizes)
  - a custom layer is treated as a single unit, so printing the model structure does not show its internals (see 1.2.2)
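Since the Functional API is the recommended default, the same two-convolution network from 1.2.1.1 can be sketched functionally for comparison (an illustrative rewrite, not from the original post):

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(8, 8, 3))
x = layers.Conv2D(16, (3, 3), padding='same')(inputs)
x = layers.BatchNormalization()(x)
x = layers.ReLU()(x)
x = layers.Conv2D(32, (3, 3), padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.ReLU()(x)
model = tf.keras.Model(inputs, x)
model.summary()  # lists every layer with its output shape
```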
1.3.2 Conclusions about the errors
- Using built-in layers: no layer inside the model may be used twice!
- Using a custom layer: convolutions and batch normalizations cannot be reused inside the block, but activations can.
As for this difference, the author has two guesses:
- Convolutions and batch normalizations both carry parameters, so they cannot be reused; otherwise the parameters from the earlier use could not be preserved. Activations carry no parameters, so they can be reused.
- Every layer has its own name, so layers cannot be reused. (TF 1.x apparently required the names of the layers in use to be specified explicitly, while TF 2.x has no such requirement; the author is not familiar with TF 1.x, so this guess is shaky.)
In short: neither guess fully explains why the two cases differ; this point remains open!
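The first guess can at least be checked directly: Conv2D and BatchNormalization create weight variables on their first call, while ReLU creates none. A minimal sketch:

```python
import tensorflow as tf

x = tf.zeros((1, 8, 8, 3))
conv = tf.keras.layers.Conv2D(16, (3, 3), padding='same')
bn = tf.keras.layers.BatchNormalization()
relu = tf.keras.layers.ReLU()
conv(x); bn(x); relu(x)  # first call builds each layer
# kernel + bias / gamma + beta + moving_mean + moving_variance / nothing
print(len(conv.weights), len(bn.weights), len(relu.weights))  # 2 4 0
```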