908 ultra-large-scale Chinese herbal medicine image recognition system based on ResNet model

The practices related to image recognition of Chinese herbal medicines have been covered in the previous article. If you are interested, you can read it by yourself:

"Python develops and builds an image recognition system for 23 common Chinese herbal medicines based on the lightweight GhostNet model"

"Develop and build an image recognition system for 40 common Chinese herbal medicines based on the lightweight MnasNet model"

In the previous article, we mentioned that we are independently developing and constructing a large-scale Chinese herbal medicine data set. This article is based on this background. At present, a basic data set containing 908 kinds of Chinese herbal medicine data has been constructed. The overall picture is as follows:

First, let’s take a look at the overall effect:

Examples of categories are as follows:

车前草
金银花
蒲公英
鸭跖草
垂盆草
酸浆
苍耳
马兰头
荠菜
小蓟
水芹菜
天胡荽
酢浆草
婆婆指甲菜
漆姑草
通泉草
波斯婆婆纳
泽漆
狗尾巴草
旋复花
黄花菜
小飞蓬
金线草
鸭舌草
兰花参
柴胡
麦冬
蛇莓
玉竹
桑白皮
曼陀罗
鬼针草
苦菜
葵菜
荨麻
龙葵
蒺藜
何首乌
野薄荷
棕榈
夏枯草
绞股蓝
紫云英
七星草
芍药
贝母
当归
丹皮
柴胡
车前草
紫苏
益母草
枇杷叶
荷叶
大青叶
艾叶
野菊花
金银花
月季花
旋覆花
莲子
菟丝子
银杏
茴香
天麻
葛根
桔梗
黄柏
杜仲
厚朴
全蝎
地龙
土鳖虫
蟋蟀
贝壳
珍珠
磁石
麻黄
桂枝
生姜
香薷
紫苏叶
藁本
辛夷
防风
白芷
荆芥
羌活
苍耳子
薄荷
牛蒡子
蔓荆子
蝉蜕
桑叶
葛根
柴胡
升麻
淡豆豉
知母
栀子
夏枯草
芦根
天花粉
淡竹叶
黄芩
黄连
黄柏
龙胆
苦参
犀角
生地黄
玄参
牡丹皮
赤芍
金银花
连翘
鱼腥草
熟地黄
党参
桂枝
山药
枸杞子
车前草
紫苏
大青叶
荷叶
青皮薄荷
柴胡
香附
当归
黄芪
西洋参
茯苓
苍术
艾叶
老姜
当归
香附
益母草
玫瑰花
桑枝
薄荷
木瓜
鸡血藤
女贞子
莲子
薏米
百合
人参
太子参
鹿茸
龟板
鳖甲
杏仁
桔梗
陈皮
丹参
川芎
旱莲草
车前子
大黄
夏枯草
连翘
金银花
桂枝
柴胡
香附
薄荷
青皮
香橼
佛手
熟地
当归
川芎
白芍
阿胶
丹参
三七
桃仁
红花
元胡
生地
石斛
沙参
麦冬
巴戟天
锁阳
火炭母
地胆草
崩大碗
绞股蓝
布荆
八角枫
八角茴香
八角金盘
八角莲
八角莲叶

There are currently 908 data categories in total, which are summarized by our different members. New categories will be added in the future to continue to expand and accumulate.

Considering such a large category of image recognition, this article chooses the classic ResNet model. The residual network (ResNet) is a deep learning architecture used to solve the gradient disappearance and gradient explosion problems in deep neural networks. It introduces the concept of residual block, which enables the network to learn identity mapping more easily, thus improving the training effect of the network.

The construction principle of ResNet is as follows:

  1. Basic module: The basic module of ResNet is the residual block. Each residual block consists of two convolutional layers, each followed by a batch normalization layer and an activation function (usually ReLU). The outputs of these two convolutional layers are added via skip connections and then passed through the activation function. This skip connection allows information to flow directly through the residual block, thus avoiding information being lost or decayed in the network.

  2. Stacked Residual Blocks: ResNet builds a deeper network by stacking multiple residual blocks. These residual blocks can have different number of layers and filters to suit different tasks and network depth requirements.

  3. Pooling layer and fully connected layer: After stacking the residual blocks, a pooling layer can be added to reduce the size of the feature map, and the final features are classified or regressed through a fully connected layer.

Advantages of ResNet:

  1. Solving the problem of gradient disappearance and gradient explosion: Due to the skip connections in the residual block, ResNet can train deep networks more easily, avoiding the disappearance or explosion of gradients during backpropagation.

  2. Improve the training effect of the network: Residual blocks allow the network to learn identity mapping, that is, to pass the input directly to the output. This allows the network to learn the residual part more easily, thereby improving the training effect of the network.

  3. Very deep networks can be built: Due to the existence of residual connections, ResNet can stack more residual blocks and build very deep networks. This helps extract more complex features, thereby improving the expressive power of the model.

Disadvantages of ResNet:

  1. More parameters: Due to the depth of ResNet, there are a large number of parameters in the network, which will increase the complexity of the model and the training time.

  2. Training difficulties: Although ResNet can solve the gradient disappearance and gradient explosion problems, other training difficulties, such as gradient degradation problems and overfitting, may still occur when training a deeper ResNet.

ResNet solves the problems of gradient disappearance and gradient explosion in deep neural networks by introducing residual blocks and skip connections, and improves the training effect of the network.

The corresponding code implementation is given here:

# coding=utf-8
from keras.models import Model
from keras.layers import (
    Input,
    Dense,
    BatchNormalization,
    Conv2D,
    MaxPooling2D,
    AveragePooling2D,
    ZeroPadding2D,
)
from keras.layers import add, Flatten
from keras.optimizers import SGD
import numpy as np

seed = 7
np.random.seed(seed)


def Conv2d_BN(x, nb_filter, kernel_size, strides=(1, 1), padding="same", name=None):
    if name is not None:
        bn_name = name + "_bn"
        conv_name = name + "_conv"
    else:
        bn_name = None
        conv_name = None

    x = Conv2D(
        nb_filter,
        kernel_size,
        padding=padding,
        strides=strides,
        activation="relu",
        name=conv_name,
    )(x)
    x = BatchNormalization(axis=3, name=bn_name)(x)
    return x


def Conv_Block(inpt, nb_filter, kernel_size, strides=(1, 1), with_conv_shortcut=False):
    x = Conv2d_BN(
        inpt,
        nb_filter=nb_filter[0],
        kernel_size=(1, 1),
        strides=strides,
        padding="same",
    )
    x = Conv2d_BN(x, nb_filter=nb_filter[1], kernel_size=(3, 3), padding="same")
    x = Conv2d_BN(x, nb_filter=nb_filter[2], kernel_size=(1, 1), padding="same")
    if with_conv_shortcut:
        shortcut = Conv2d_BN(
            inpt, nb_filter=nb_filter[2], strides=strides, kernel_size=kernel_size
        )
        x = add([x, shortcut])
        return x
    else:
        x = add([x, inpt])
        return x


def ResNet():
    inpt = Input(shape=(224, 224, 3))
    x = ZeroPadding2D((3, 3))(inpt)
    x = Conv2d_BN(x, nb_filter=64, kernel_size=(7, 7), strides=(2, 2), padding="valid")
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding="same")(x)
    x = Conv_Block(
        x,
        nb_filter=[64, 64, 256],
        kernel_size=(3, 3),
        strides=(1, 1),
        with_conv_shortcut=True,
    )
    x = Conv_Block(x, nb_filter=[64, 64, 256], kernel_size=(3, 3))
    x = Conv_Block(x, nb_filter=[64, 64, 256], kernel_size=(3, 3))
    x = Conv_Block(
        x,
        nb_filter=[128, 128, 512],
        kernel_size=(3, 3),
        strides=(2, 2),
        with_conv_shortcut=True,
    )
    x = Conv_Block(x, nb_filter=[128, 128, 512], kernel_size=(3, 3))
    x = Conv_Block(x, nb_filter=[128, 128, 512], kernel_size=(3, 3))
    x = Conv_Block(x, nb_filter=[128, 128, 512], kernel_size=(3, 3))
    x = Conv_Block(
        x,
        nb_filter=[256, 256, 1024],
        kernel_size=(3, 3),
        strides=(2, 2),
        with_conv_shortcut=True,
    )
    x = Conv_Block(x, nb_filter=[256, 256, 1024], kernel_size=(3, 3))
    x = Conv_Block(x, nb_filter=[256, 256, 1024], kernel_size=(3, 3))
    x = Conv_Block(x, nb_filter=[256, 256, 1024], kernel_size=(3, 3))
    x = Conv_Block(x, nb_filter=[256, 256, 1024], kernel_size=(3, 3))
    x = Conv_Block(x, nb_filter=[256, 256, 1024], kernel_size=(3, 3))
    x = Conv_Block(
        x,
        nb_filter=[512, 512, 2048],
        kernel_size=(3, 3),
        strides=(2, 2),
        with_conv_shortcut=True,
    )
    x = Conv_Block(x, nb_filter=[512, 512, 2048], kernel_size=(3, 3))
    x = Conv_Block(x, nb_filter=[512, 512, 2048], kernel_size=(3, 3))
    x = AveragePooling2D(pool_size=(7, 7))(x)
    x = Flatten()(x)
    x = Dense(908, activation="softmax")(x)
    model = Model(inputs=inpt, outputs=x)
    sgd = SGD(decay=0.0001, momentum=0.9)
    model.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=["accuracy"])
    model.summary()

The above is implemented based on the Keras framework. Of course, it can also be implemented based on the PyTorch framework, as shown below:

import torch
from torch import Tensor
import torch.nn as nn
import numpy as np
from torchvision._internally_replaced_utils import load_state_dict_from_url
from typing import Type, Any, Callable, Union, List, Optional


def conv3x3(
    in_planes: int, out_planes: int, stride: int = 1, groups: int = 1, dilation: int = 1
) -> nn.Conv2d:
    return nn.Conv2d(
        in_planes,
        out_planes,
        kernel_size=3,
        stride=stride,
        padding=dilation,
        groups=groups,
        bias=False,
        dilation=dilation,
    )


def conv1x1(in_planes: int, out_planes: int, stride: int = 1) -> nn.Conv2d:
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class BasicBlock(nn.Module):
    expansion: int = 1

    def __init__(
        self,
        inplanes: int,
        planes: int,
        stride: int = 1,
        downsample: Optional[nn.Module] = None,
        groups: int = 1,
        base_width: int = 64,
        dilation: int = 1,
        norm_layer: Optional[Callable[..., nn.Module]] = None,
    ) -> None:
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError("BasicBlock only supports groups=1 and base_width=64")
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x: Tensor) -> Tensor:
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    expansion: int = 4

    def __init__(
        self,
        inplanes: int,
        planes: int,
        stride: int = 1,
        downsample: Optional[nn.Module] = None,
        groups: int = 1,
        base_width: int = 64,
        dilation: int = 1,
        norm_layer: Optional[Callable[..., nn.Module]] = None,
    ) -> None:
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        width = int(planes * (base_width / 64.0)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, width)
        self.bn1 = norm_layer(width)
        self.conv2 = conv3x3(width, width, stride, groups, dilation)
        self.bn2 = norm_layer(width)
        self.conv3 = conv1x1(width, planes * self.expansion)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x: Tensor) -> Tensor:
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = self.relu(out)
        return out


class ResNet(nn.Module):
    def __init__(
        self,
        block: Type[Union[BasicBlock, Bottleneck]],
        layers: List[int],
        num_classes: int = 1000,
        zero_init_residual: bool = False,
        groups: int = 1,
        width_per_group: int = 64,
        replace_stride_with_dilation: Optional[List[bool]] = None,
        norm_layer: Optional[Callable[..., nn.Module]] = None,
    ) -> None:
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.inplanes = 64
        self.dilation = 1
        if replace_stride_with_dilation is None:
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError(
                "replace_stride_with_dilation should be None "
                "or a 3-element tuple, got {}".format(replace_stride_with_dilation)
            )
        self.groups = groups
        self.base_width = width_per_group
        self.conv1 = nn.Conv2d(
            3, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False
        )
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(
            block, 128, layers[1], stride=2, dilate=replace_stride_with_dilation[0]
        )
        self.layer3 = self._make_layer(
            block, 256, layers[2], stride=2, dilate=replace_stride_with_dilation[1]
        )
        self.layer4 = self._make_layer(
            block, 512, layers[3], stride=2, dilate=replace_stride_with_dilation[2]
        )
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)  # type: ignore[arg-type]
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)  # type: ignore[arg-type]

    def _make_layer(
        self,
        block: Type[Union[BasicBlock, Bottleneck]],
        planes: int,
        blocks: int,
        stride: int = 1,
        dilate: bool = False,
    ) -> nn.Sequential:
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                norm_layer(planes * block.expansion),
            )
        layers = []
        layers.append(
            block(
                self.inplanes,
                planes,
                stride,
                downsample,
                self.groups,
                self.base_width,
                previous_dilation,
                norm_layer,
            )
        )
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(
                block(
                    self.inplanes,
                    planes,
                    groups=self.groups,
                    base_width=self.base_width,
                    dilation=self.dilation,
                    norm_layer=norm_layer,
                )
            )

        return nn.Sequential(*layers)

    def _forward_impl(self, x: Tensor, need_fea=False) -> Tensor:
        if need_fea:
            features, features_fc = self.forward_features(x, need_fea)
            x = self.fc(features_fc)
            return features, features_fc, x
        else:
            x = self.forward_features(x)
            x = self.fc(x)
            return x

    def forward(self, x: Tensor, need_fea=False) -> Tensor:
        return self._forward_impl(x, need_fea)

    def forward_features(self, x, need_fea=False):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        if need_fea:
            x1 = self.layer1(x)
            x2 = self.layer2(x1)
            x3 = self.layer3(x2)
            x4 = self.layer4(x3)
            x = self.avgpool(x4)
            x = torch.flatten(x, 1)
            return [x1, x2, x3, x4], x
        else:
            x = self.layer1(x)
            x = self.layer2(x)
            x = self.layer3(x)
            x = self.layer4(x)
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            return x

    def cam_layer(self):
        return self.layer4


def _resnet(
    block: Type[Union[BasicBlock, Bottleneck]],
    layers: List[int],
    pretrained: bool,
    progress: bool,
    **kwargs: Any
) -> ResNet:
    model = ResNet(block, layers, **kwargs)
    if pretrained:
        state_dict = load_state_dict_from_url(
            "https://download.pytorch.org/models/resnet50-0676ba61.pth",
            progress=progress,
        )
        model_dict = model.state_dict()
        weight_dict = {}
        for k, v in state_dict.items():
            if k in model_dict:
                if np.shape(model_dict[k]) == np.shape(v):
                    weight_dict[k] = v
        pretrained_dict = weight_dict
        model_dict.update(pretrained_dict)
        model.load_state_dict(model_dict)
    return model


def resnet50(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
    return _resnet(Bottleneck, [3, 4, 6, 3], pretrained, progress, **kwargs)

You can directly integrate it into your own project and use it according to your own preferences.

The overall training loss curve is as follows:

The accuracy curve is as follows:

At present, it has only been trained from scratch for more than 60 epochs, and the effect is not very satisfactory. The subsequent plan is to conduct fine-tuning training based on the pre-trained model weights to improve the current accuracy.

Guess you like

Origin blog.csdn.net/Together_CZ/article/details/134876378