【tf.keras.Model】Notes on Model Building (some issues unresolved)

every blog every motto: You can do more than you think.

0. Preface

There are generally three ways to build a deep-learning network model:

  • Sequential API
  • Functional API
  • Subclassing API

Note: the Functional API is the recommended choice.


This post compares and analyzes two ways of building a model with the Subclassing API (tf.keras.Model).
Note: to keep this post self-contained yet readable, only part of the problem is discussed here; the remaining questions are covered in the next post.

1. Main Content

  1. Define your own model by subclassing the Python class tf.keras.Model.
  2. In the subclass, override two methods: __init__() (the constructor, for initialization) and call(input) (invoked when the model is called); custom methods can be added as needed.
  3. __init__ defines/initializes the layers to be used (e.g. convolution and pooling layers); call implements the network's forward pass (the backward pass is generated automatically).

1.1 Templates

The two approaches below give similar results; the main difference is:

  • one instantiates built-in layers inside __init__
  • the other first subclasses Layer, then instantiates the custom layer inside __init__

1.1.1 Approach 1: using built-in layers

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()     # under Python 2, use super(MyModel, self).__init__()
        # Initialization code goes here (the layers used in call), e.g.
        # self.layer1 = tf.keras.layers.BuiltInLayer(...)
        # self.layer2 = MyCustomLayer(...)

    def call(self, input):
        # Forward-pass code goes here (process the input and return the output), e.g.
        # x = self.layer1(input)
        # output = self.layer2(x)
        return output

    # custom methods can be added here as well

1.1.2 Approach 2: defining a custom layer

class DoubleConv(layers.Layer):
    """A custom layer"""
    def __init__(self):
        super().__init__()

    def call(self, input):
        pass


class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()  # under Python 2, use super(MyModel, self).__init__()
        # Initialization code goes here (the layers used in call), e.g.
        self.doub_block = DoubleConv()
        # self.layer1 = tf.keras.layers.BuiltInLayer(...)
        # self.layer2 = MyCustomLayer(...)

    def call(self, input):
        # Forward-pass code goes here (process the input and return the output), e.g.
        x = self.doub_block(input)
        # output = self.layer2(x)
        return x

    # custom methods can be added here as well

1.2 Worked Examples

1.2.1 Using built-in layers

1.2.1.1 Baseline code

import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU


class Models(tf.keras.Model):
    def __init__(self):
        super().__init__()

        self.conv = Conv2D(16, (3, 3), padding='same')
        self.bn = BatchNormalization()
        self.ac = ReLU()

        self.conv2 = Conv2D(32, (3, 3), padding='same')
        self.bn2 = BatchNormalization()
        self.ac2 = ReLU()

    def call(self, x, **kwargs):
        x = self.conv(x)
        x = self.bn(x)
        x = self.ac(x)

        x = self.conv2(x)
        x = self.bn2(x)
        x = self.ac2(x)

        return x


m = Models()
m.build(input_shape=(2, 8, 8, 3))
m.summary()

Model structure:
(figure: model summary output)
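As an aside, `m.build(input_shape=...)` is not the only way to make `summary()` usable: a subclassed model is also built by calling it once on a dummy batch. A minimal sketch of this alternative (the `TinyModel` class below is illustrative, not from the original post):

```python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
from tensorflow.keras.layers import Conv2D


class TinyModel(tf.keras.Model):
    """Minimal subclassed model, analogous to the class above."""
    def __init__(self):
        super().__init__()
        self.conv = Conv2D(32, (3, 3), padding='same')

    def call(self, x, **kwargs):
        return self.conv(x)


m = TinyModel()
y = m(tf.zeros((2, 8, 8, 3)))  # one dummy call builds every layer
m.summary()                    # works now, no explicit build() needed
```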

1.2.1.2 Modified code

1. Sharing one BatchNormalization
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU


class Models(tf.keras.Model):
    def __init__(self):
        super().__init__()

        self.conv = Conv2D(16, (3, 3), padding='same')
        self.bn = BatchNormalization()
        self.ac = ReLU()

        self.conv2 = Conv2D(32, (3, 3), padding='same')
        self.bn2 = BatchNormalization()
        self.ac2 = ReLU()

    def call(self, x, **kwargs):
        x = self.conv(x)
        x = self.bn(x)
        x = self.ac(x)

        x = self.conv2(x)
        # ==========================
        # the same BatchNormalization is reused here
        # ==========================
        x = self.bn(x)
        x = self.ac2(x)

        return x


m = Models()
m.build(input_shape=(2, 8, 8, 3))
m.summary()
  • The two batch-normalization call sites above share a single BatchNormalization layer, which raises the error below.
  • Sharing a single convolution or a single activation also raises an error (readers can verify this themselves).

Sharing one BatchNormalization raises:

ValueError: Input 0 of layer batch_normalization is incompatible with the layer: expected axis 3 of input shape to have value 16 but received input with shape [2, 8, 8, 32]

Sharing one convolution raises:

ValueError: Input 0 of layer conv2d is incompatible with the layer: expected axis -1 of input shape to have value 3 but received input with shape [2, 8, 8, 16]

Sharing one activation raises:

ValueError: You tried to call `count_params` on re_lu_1, but the layer isn't built. You can build it manually via: `re_lu_1.build(batch_input_shape)`.
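All three failures trace back to one mechanism: a Keras layer creates its weights lazily on its first call, sized to that input's channel count, so a second call site must see the same shape. (The ReLU case differs slightly: the never-called `ac2` simply stays unbuilt, which is what `summary()` then complains about.) The lazy-build behaviour can be sketched in plain Python; `ToyBatchNorm` below is a hypothetical stand-in, not a real Keras class:

```python
# Illustrative sketch (plain Python, no TensorFlow): a stand-in for a Keras
# layer that builds its state on first call, fixed to that input's channel
# count. ToyBatchNorm is a hypothetical name, not part of any real API.
class ToyBatchNorm:
    def __init__(self):
        self.built = False
        self.channels = None  # fixed on the first call, like a built Keras layer

    def __call__(self, shape):
        # shape is (batch, height, width, channels)
        if not self.built:
            self.channels = shape[-1]  # "weights" are created for this width
            self.built = True
        elif shape[-1] != self.channels:
            raise ValueError(
                f"expected axis 3 of input shape to have value "
                f"{self.channels} but received input with shape {list(shape)}")
        return shape


bn = ToyBatchNorm()
bn((2, 8, 8, 16))      # first call: builds for 16 channels
try:
    bn((2, 8, 8, 32))  # reuse with 32 channels fails, mirroring the error above
except ValueError as err:
    print(err)
```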

1.2.2 Using a custom layer

1.2.2.1 Baseline code

Note: the code is fairly long, so it is split into pieces here to keep it approachable and easy to read.
Imports

import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf
from tensorflow.keras import layers

The custom layer

class DoubleConv(layers.Layer):

    def __init__(self, mid_kernel_numbers, out_kernel_number):
        """
        初始化含有两个卷积的卷积块

        :param mid_kernel_numbers: 中间特征图的通道数
        :param out_kernel_number: 输出特征图的通道数
        """
        super().__init__()
        self.conv1 = layers.Conv2D(mid_kernel_numbers, (3, 3), padding='same')
        self.conv2 = layers.Conv2D(out_kernel_number, (3, 3), padding='same')
        self.bn = layers.BatchNormalization()
        self.bn2 = layers.BatchNormalization()
        self.ac = layers.ReLU()
        self.ac2 = layers.ReLU()

    def call(self, input, **kwargs):
        """正向传播"""
        x = self.conv1(input)
        x = self.bn(x)
        x = self.ac(x)

        x = self.conv2(x)
        x = self.bn2(x)
        x = self.ac2(x)
        return x

The model class

class Model(tf.keras.Model):

    def __init__(self):
        """
        构建模型的类
        """
        super().__init__()
        # initialize the convolution block
        self.block = DoubleConv(16, 32)

    def call(self, x, **kwargs):
        x = self.block(x)
        return x

Print the model summary

m = Model()
m.build(input_shape=(2, 8, 8, 3))
m.summary()

(figure: model summary output)

1.2.2.2 Modified code

Note: since the code is long, only the modified parts are shown here; the rest is the same as above (1.2.2.1).

1. Sharing one convolution
class DoubleConv(layers.Layer):

    def __init__(self, mid_kernel_numbers, out_kernel_number):
        """
        初始化含有两个卷积的卷积块

        :param mid_kernel_numbers: 中间特征图的通道数
        :param out_kernel_number: 输出特征图的通道数
        """
        super().__init__()
        self.conv1 = layers.Conv2D(mid_kernel_numbers, (3, 3), padding='same')
        self.conv2 = layers.Conv2D(out_kernel_number, (3, 3), padding='same')
        self.bn = layers.BatchNormalization()
        self.bn2 = layers.BatchNormalization()
        self.ac = layers.ReLU()
        self.ac2 = layers.ReLU()

    def call(self, input, **kwargs):
        """正向传播"""
        x = self.conv1(input)
        x = self.bn(x)
        x = self.ac(x)
        
        # =======================
        #   attempt to reuse the convolution here
        #   (self.conv was never defined in __init__, hence the error below)
        # =======================
        x = self.conv(x)
        x = self.bn2(x)
        x = self.ac2(x)
        return x

The error:

AttributeError: 'DoubleConv' object has no attribute 'conv'
2. Sharing one BatchNormalization
class DoubleConv(layers.Layer):

    def __init__(self, mid_kernel_numbers, out_kernel_number):
        """
        初始化含有两个卷积的卷积块

        :param mid_kernel_numbers: 中间特征图的通道数
        :param out_kernel_number: 输出特征图的通道数
        """
        super().__init__()
        self.conv1 = layers.Conv2D(mid_kernel_numbers, (3, 3), padding='same')
        self.conv2 = layers.Conv2D(out_kernel_number, (3, 3), padding='same')
        self.bn = layers.BatchNormalization()
        self.bn2 = layers.BatchNormalization()
        self.ac = layers.ReLU()
        self.ac2 = layers.ReLU()

    def call(self, input, **kwargs):
        """正向传播"""
        x = self.conv1(input)
        x = self.bn(x)
        x = self.ac(x)

        x = self.conv2(x)
        # =======================
        #   the first BatchNormalization is reused here
        # =======================
        x = self.bn(x)
        x = self.ac2(x)
        return x

The error:

ValueError: Input 0 of layer batch_normalization is incompatible with the layer: expected axis 3 of input shape to have value 16 but received input with shape [2, 8, 8, 32]
3. Sharing one activation
class DoubleConv(layers.Layer):

    def __init__(self, mid_kernel_numbers, out_kernel_number):
        """
        初始化含有两个卷积的卷积块

        :param mid_kernel_numbers: 中间特征图的通道数
        :param out_kernel_number: 输出特征图的通道数
        """
        super().__init__()
        self.conv1 = layers.Conv2D(mid_kernel_numbers, (3, 3), padding='same')
        self.conv2 = layers.Conv2D(out_kernel_number, (3, 3), padding='same')
        self.bn = layers.BatchNormalization()
        self.bn2 = layers.BatchNormalization()
        self.ac = layers.ReLU()
        self.ac2 = layers.ReLU()

    def call(self, input, **kwargs):
        """正向传播"""
        x = self.conv1(input)
        x = self.bn(x)
        x = self.ac(x)

        x = self.conv2(x)
        x = self.bn2(x)
        # =======================
        #   the first ReLU activation is reused here
        # =======================
        x = self.ac(x)
        return x

(figure: model summary output; sharing the activation raises no error)

4. Calling the custom layer multiple times (appendix)
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf
from tensorflow.keras import layers


class DoubleConv(layers.Layer):

    def __init__(self, mid_kernel_numbers, out_kernel_number):
        """
        初始化含有两个卷积的卷积块

        :param mid_kernel_numbers: 中间特征图的通道数
        :param out_kernel_number: 输出特征图的通道数
        """
        super().__init__()
        self.conv1 = layers.Conv2D(mid_kernel_numbers, (3, 3), padding='same')
        self.conv2 = layers.Conv2D(out_kernel_number, (3, 3), padding='same')
        self.bn = layers.BatchNormalization()
        self.bn2 = layers.BatchNormalization()
        self.ac = layers.ReLU()
        self.ac2 = layers.ReLU()

    def call(self, input, **kwargs):
        """正向传播"""
        x = self.conv1(input)
        x = self.bn(x)
        x = self.ac(x)

        x = self.conv2(x)
        x = self.bn2(x)
        x = self.ac2(x)
        return x


class Model(tf.keras.Model):

    def __init__(self):
        """
        构建模型的类
        """
        super().__init__()
        # initialize the convolution blocks
        self.block = DoubleConv(16, 32)
        self.block2 = DoubleConv(32, 64)

    def call(self, x, **kwargs):
        x = self.block(x)
        x = self.block2(x)
        return x


m = Model()
m.build(input_shape=(2, 8, 8, 3))
m.summary()

(figure: model summary output)

1.3 Summary

1.3.1 General conclusions

  1. The two approaches are at bottom one and the same: both subclass tf.keras.Model, i.e. what we call the Subclassing API.
  2. Across the three ways of building a model (Sequential API / Functional API / Subclassing API), the learning curve and the flexibility both increase in that order.
  3. The Functional API is recommended: it is usually sufficient and reasonably flexible.
  4. For the Subclassing API:
    • the __init__ method initializes the layers to be used, e.g. convolutions, pooling, batch normalization, activations
    • the call method defines the forward pass, i.e. the model graph; the backward pass is handled automatically
  5. Model structure:
    • with built-in layers, the printed model summary shows information for every layer (e.g. feature-map sizes)
    • a custom layer is treated as a single unit, so the printed summary does not show its internals (see 1.2.2)
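Since the Functional API is the recommended option, here is the network from 1.2.1.1 rewritten in that style, as a sketch for comparison (not from the original post). A functional model is built as soon as it is defined, and every layer appears in the summary:

```python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, Input

# Functional-API version of the two-conv-block network from 1.2.1.1
inputs = Input(shape=(8, 8, 3))
x = Conv2D(16, (3, 3), padding='same')(inputs)
x = BatchNormalization()(x)
x = ReLU()(x)
x = Conv2D(32, (3, 3), padding='same')(x)
x = BatchNormalization()(x)
outputs = ReLU()(x)

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.summary()  # already built; every layer and its output shape is listed
```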

1.3.2 Conclusions about the errors

  1. Using built-in layers: no layer inside the model can be reused!
  2. Using custom layers: convolutions and batch normalization cannot be reused inside the model, but activations can.

The author has two guesses about this difference:

  • Convolutions and batch normalization carry parameters, so they cannot be reused; otherwise the parameters from the earlier call site could not be kept. Activations carry no parameters, so they can be reused.
  • Every layer has a name of its own, so a layer cannot be reused. (TF 1.x apparently required specifying the name of each layer before use; TF 2.x has no such requirement. The author is not familiar with TF 1.x, so this guess is shaky.)

Summary: neither guess fully explains the difference between the two cases; this point remains open!
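One experiment bears on the first guess: Keras does allow calling the same layer twice, as long as both call sites see the same input shape; the two sites then share a single set of weights (this is exactly how Siamese-style weight sharing is done). That is consistent with a shape-based explanation, though it does not by itself settle why the cases above behaved differently. A sketch:

```python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
from tensorflow.keras.layers import Conv2D

# Reusing one layer is legal when every call sees the same input shape:
# both call sites then share ONE set of weights.
conv = Conv2D(16, (3, 3), padding='same')
x = tf.zeros((2, 8, 8, 16))
y = conv(x)   # builds the layer for 16 input channels
z = conv(y)   # reuse succeeds: the input again has 16 channels
print(len(conv.weights))  # one kernel + one bias, shared by both call sites
```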


Reposted from blog.csdn.net/weixin_39190382/article/details/109295077