PaddlePaddle teaches you to identify a variety of herbs!

[PaddlePaddle Developers Say] Han Aiqing, Associate Professor, School of Management, Beijing University of Chinese Medicine

Chinese herbal medicines are the main vehicle of traditional Chinese medicine (TCM) in clinical treatment and a core carrier of TCM culture, embodying a wealth of scientific, cultural, and industrial resources. China is the world's largest market for Chinese medicines. According to reported data, China's market for Chinese medicinal materials and decoction pieces was expected to reach 680 billion yuan in 2019 and 2 trillion yuan within the following five years, which shows the enormous market value of Chinese herbal medicines. Because the authenticity and quality of herbal medicines bear directly on medication safety and clinical efficacy, objective and scientific identification of herbs helps standardize the herbal medicine market, strengthen quality control, and promote the development of the TCM industry.

Many scholars have studied the identification of Chinese herbal medicines using objective indicators. Methods include near-infrared spectroscopy, fingerprinting, electronic noses, chemical pattern recognition, and combinations of several methods. What these methods have in common is that features must be designed or extracted by hand on the basis of expert experience or prior knowledge, so the up-front feature-extraction workload is very large. The resulting models are often accurate on the training set but generalize poorly to the test set. These methods also require special equipment, which makes them inconvenient, raises the barrier to use, and hinders adoption. The industry therefore urgently needs a herbal medicine recognition system that extracts features automatically, runs on general-purpose terminal devices, is highly accurate, and generalizes well.

In recent years, artificial intelligence, with deep learning as its representative technology, has developed rapidly. Unlike earlier waves, this round of AI has drawn attention not only from academia but also from industry. A keyword search for "deep learning" among projects funded by the National Natural Science Foundation of China in recent years shows that the number of funded projects related to deep learning has risen rapidly year by year, as shown in the figure below. Because this round of AI translates readily into practice and can quickly empower industry applications, it is also in high demand in industry, commerce, finance, and other sectors, and is spreading quickly to every field.

[Figure: projects related to "deep learning" funded by the National Natural Science Foundation of China, by year]

Against this backdrop, well-known companies around the world have released deep learning frameworks: in the United States, Google released TensorFlow and Facebook released PyTorch; in China, Baidu released PaddlePaddle. PaddlePaddle is an open-source deep learning platform that brings together a core deep learning framework, a library of basic models, end-to-end development kits, tool components, and a service platform. We chose PaddlePaddle as our deep learning framework. Using PaddlePaddle models and the Tesla V100 GPU computing power of Baidu's AI Studio development platform, we developed a deep-learning-based recognition model for Chinese herbal medicines and completed the development and deployment of a WeChat mini program based on it. The rest of this article walks through the whole process.

The study covers processed herbal medicines; herbs in their growing plant state are not considered. A crawler written in Python downloaded pictures of 163 kinds of Chinese herbal medicines in bulk from Baidu Images. After manual review to remove duplicate pictures, pictures of proprietary medicines, pictures of plants in their growing state, and other abnormal images, 33,834 pictures were retained, about 200 per herb.

Using the hold-out method, 80% of the samples were assigned to the training set and 20% to the test set. To enlarge the training set and improve the model's generalization ability, data augmentation was applied to the training set: existing images were scaled, randomly rotated, randomly cropped, and adjusted in contrast, hue, and saturation. After augmentation, the total number of training samples reached 213,140, a substantial increase.
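The hold-out split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `holdout_split`, the class-by-class (stratified) splitting, and the fixed random seed are assumptions made for the example.

```python
import random
from collections import defaultdict


def holdout_split(samples, train_ratio=0.8, seed=0):
    """Split (path, label) pairs into train and test lists, class by class."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        cut = int(round(len(paths) * train_ratio))
        train += [(p, label) for p in paths[:cut]]
        test += [(p, label) for p in paths[cut:]]
    return train, test


# Dummy data: 3 classes with 10 file names each (hypothetical names).
samples = [("img_%d_%d.jpg" % (c, i), c) for c in range(3) for i in range(10)]
train, test = holdout_split(samples)
print(len(train), len(test))
```

Splitting within each class keeps the 80/20 ratio for every herb, which matters when some classes have fewer pictures than others.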

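As a rough illustration of the augmentation operations listed above, the sketch below applies a random crop and a contrast, hue, and saturation jitter to a tiny image stored as nested lists of RGB tuples. It uses only the Python standard library so that it runs anywhere; a real pipeline would use an image library. The function names and parameter ranges are assumptions, not the values used in the study, and scaling and rotation are omitted for brevity.

```python
import colorsys
import random


def jitter_pixel(rgb, contrast=1.0, hue_shift=0.0, sat_scale=1.0):
    """Adjust contrast, hue and saturation of one RGB pixel (channels 0-255)."""
    # Contrast: scale each channel's distance from mid-grey, then clamp.
    r, g, b = (min(255.0, max(0.0, 128.0 + contrast * (c - 128.0))) for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    h = (h + hue_shift) % 1.0              # hue is cyclic
    s = min(1.0, max(0.0, s * sat_scale))  # saturation stays in [0, 1]
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(int(round(c * 255.0)) for c in (r, g, b))


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(image, crop_h, crop_w, rng):
    """Return one randomly cropped and colour-jittered copy of `image`."""
    contrast = rng.uniform(0.8, 1.2)      # assumed range
    hue_shift = rng.uniform(-0.05, 0.05)  # assumed range
    sat_scale = rng.uniform(0.8, 1.2)     # assumed range
    cropped = random_crop(image, crop_h, crop_w, rng)
    return [[jitter_pixel(p, contrast, hue_shift, sat_scale) for p in row]
            for row in cropped]


rng = random.Random(0)
image = [[(200, 120, 40)] * 8 for _ in range(8)]  # 8x8 dummy image
copies = [augment(image, 6, 6, rng) for _ in range(3)]
print(len(copies), len(copies[0]), len(copies[0][0]))
```

Generating several randomized copies of each training image is how a training set grows well beyond the number of original pictures.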

Configuring the network involves three parts: the network model, the loss function, and the optimization function.

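This study uses the ResNeXt50 convolutional neural network as its model. Mainstream deep learning classification models include LeNet, AlexNet, VGG, GoogLeNet, Inception, ResNet, and ResNeXt. ResNet won ILSVRC 2015, and ResNeXt, an upgraded version of ResNet, was the runner-up at ILSVRC 2016. ResNeXt combines VGG's idea of stacking with Inception's split-transform-merge idea, carrying the split-transform-merge strategy forward in a simple and extensible way. Every building block in the network has the same structure, so there is no need to tune the hyperparameters of each building block in each stage; repeatedly stacking one identically structured building block yields the whole network. The model is therefore highly extensible and can be seen as improving accuracy while leaving model complexity essentially unchanged or even lower. The figure below compares the basic blocks of ResNet (left) and ResNeXt (right).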

[Figure: basic blocks of ResNet and ResNeXt]
Left: a block of ResNet.
Right: a block of ResNeXt with cardinality = 32, with roughly the same complexity.
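The phrase "roughly the same complexity" in the caption can be checked with a little arithmetic. The sketch below counts convolution weights (ignoring biases and batch normalization) for the two blocks, assuming the channel widths used in the ResNeXt paper's illustration: 256 to 64 to 64 to 256 for ResNet, and 256 to 128 to 128 to 256 with 32 groups for ResNeXt. The helper name `conv_weights` is hypothetical.

```python
def conv_weights(in_ch, out_ch, kernel, groups=1):
    """Number of weights in a kernel x kernel convolution split into `groups`."""
    # Each output channel only sees in_ch / groups input channels.
    return (in_ch // groups) * out_ch * kernel * kernel


# ResNet bottleneck: 1x1 (256->64), 3x3 (64->64), 1x1 (64->256).
resnet_block = (conv_weights(256, 64, 1)
                + conv_weights(64, 64, 3)
                + conv_weights(64, 256, 1))

# ResNeXt bottleneck: 1x1 (256->128), grouped 3x3 (128->128, 32 groups),
# 1x1 (128->256).
resnext_block = (conv_weights(256, 128, 1)
                 + conv_weights(128, 128, 3, groups=32)
                 + conv_weights(128, 256, 1))

print(resnet_block, resnext_block)  # 69632 70144
```

Under these assumptions both blocks hold about 70,000 weights: the grouped 3x3 convolution is cheap enough to pay for the wider 1x1 convolutions around it.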

The code for the building block, the basic unit of ResNeXt, is as follows:

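# Method of the network class. conv_bn_layer, squeeze_excitation and shortcut
# are helper methods of the same class, and `fluid` is paddle.fluid.
# `name` must be passed as a string because it is used to build layer names.
def bottleneck_block(self, input, num_filters, stride, cardinality,
                     reduction_ratio, name=None):
    # 1x1 convolution producing num_filters channels.
    conv0 = self.conv_bn_layer(
        input=input,
        num_filters=num_filters,
        filter_size=1,
        act='relu',
        name='conv' + name + '_x1')
    # 3x3 grouped convolution: `cardinality` parallel groups.
    conv1 = self.conv_bn_layer(
        input=conv0,
        num_filters=num_filters,
        filter_size=3,
        stride=stride,
        groups=cardinality,
        act='relu',
        name='conv' + name + '_x2')
    # 1x1 convolution expanding to num_filters * 2 channels, no activation.
    conv2 = self.conv_bn_layer(
        input=conv1,
        num_filters=num_filters * 2,
        filter_size=1,
        act=None,
        name='conv' + name + '_x3')
    # Squeeze-and-excitation step applied to the output of conv2.
    scale = self.squeeze_excitation(
        input=conv2,
        num_channels=num_filters * 2,
        reduction_ratio=reduction_ratio,
        name='fc' + name)
    # Shortcut branch with num_filters * 2 channels so it can be added to `scale`.
    short = self.shortcut(input, num_filters * 2, stride, name=name)
    # Residual addition followed by ReLU.
    return fluid.layers.elementwise_add(x=short, y=scale, act='relu')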

The overall network structure of ResNeXt-50 is shown below; the main change is that the ResNet units are replaced with ResNeXt units.

[Figure: overall network structure of ResNeXt-50]
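As a rough guide to how the building block is stacked, the sketch below lists the stage layout of the standard ResNeXt-50 (32x4d): four stages of 3, 4, 6, and 3 blocks. The widths and strides are those of the standard architecture and are assumed here; the article does not list the exact values the authors used.

```python
DEPTH = [3, 4, 6, 3]                 # blocks per stage (standard ResNeXt-50)
NUM_FILTERS = [128, 256, 512, 1024]  # bottleneck width per stage (assumed)
CARDINALITY = 32                     # groups in each 3x3 convolution


def block_configs():
    """List the arguments each stacked building block would receive."""
    configs = []
    for stage, (n_blocks, width) in enumerate(zip(DEPTH, NUM_FILTERS)):
        for i in range(n_blocks):
            # The first block of stages 2 to 4 halves the spatial resolution.
            stride = 2 if i == 0 and stage != 0 else 1
            configs.append({
                "stage": stage + 1,
                "num_filters": width,
                "out_channels": width * 2,
                "stride": stride,
                "cardinality": CARDINALITY,
            })
    return configs


configs = block_configs()
# 16 blocks x 3 convolutions + the first convolution + the final
# fully connected layer = 50 layers.
n_layers = len(configs) * 3 + 2
print(len(configs), n_layers)  # 16 50
```

Every block shares one structure; only the width and stride change from stage to stage.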
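At the output layer, softmax serves as the classification output function and cross-entropy (cross_entropy) as the loss function. The optimizer is based on the root mean square propagation algorithm (RMSprop), improved with a stepwise learning rate schedule suited to training on relatively large datasets. Backpropagation continually updates the model's parameters so that the loss gradually decreases and the model is progressively optimized.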
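The pieces named above can be sketched in plain Python. This is an illustration of the general techniques, not the authors' configuration: the step boundaries, learning rate values, and RMSprop constants below are hypothetical, and the RMSprop update shown is the basic form without momentum.

```python
import bisect
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Loss for one sample: negative log-probability of the true class."""
    return -math.log(probs[label])


def piecewise_lr(step, boundaries, values):
    """Stepwise schedule: values[i] applies until step reaches boundaries[i]."""
    return values[bisect.bisect_right(boundaries, step)]


def rmsprop_step(param, grad, mean_square, lr, rho=0.9, eps=1e-6):
    """One basic RMSprop update for a single scalar parameter."""
    mean_square = rho * mean_square + (1.0 - rho) * grad * grad
    param = param - lr * grad / math.sqrt(mean_square + eps)
    return param, mean_square


probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy(probs, 0)

boundaries = [1000, 2000]   # hypothetical step boundaries
values = [0.1, 0.01, 0.001]  # hypothetical learning rates
lr = piecewise_lr(1500, boundaries, values)
param, ms = rmsprop_step(1.0, 0.5, 0.0, lr)
print(round(loss, 4), lr, param)
```

A stepwise schedule keeps the learning rate large early on and cuts it at fixed points so that later updates become finer.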

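Because individual and institutional AI researchers commonly lack computing power, the AI Studio platform provides two runtime environments free of charge: a basic version (CPU: 2 cores; RAM: 8 GB; disk: 100 GB) and an advanced version (GPU: Tesla V100; video memory: 16 GB; CPU: 8 cores; RAM: 32 GB; disk: 100 GB). Because this project involves a large amount of data, the advanced GPU environment was used for model training.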

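Training consists of three steps: first, configure the GPU training environment; second, train on the training set; third, save the trained model.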

Step 1: define the GPU as the place of computation, create an executor, and initialize the parameters of the program.

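Step 2: set the number of training passes and train on the training set. Iterate over the batch_reader iterator, feeding one batch of data at a time. To support later analysis and visualization of the process, each batch in each pass is given an index, step_id. Every 500 batches, Pass_Num, trainbatch_Num, Train_loss, Train_acc1, and time are recorded, and a print statement outputs the intermediate training results. As training proceeds, the loss gradually falls, the accuracy gradually rises, and the model is progressively optimized.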
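The logging logic of this step can be sketched independently of the framework. In the sketch below, `fake_train_step` stands in for running one batch through the executor and `fake_reader` is a dummy data source; both are hypothetical. Only the record names and the 500-batch interval come from the text, and counting the interval across passes rather than within each pass is an assumption.

```python
import time


def run_training(batch_reader, train_step, num_passes, log_every=500):
    """Train for num_passes passes and record metrics every log_every batches."""
    records = []
    step_id = 0  # running index over all batches fed so far
    start = time.time()
    for pass_id in range(num_passes):
        for batch_id, batch in enumerate(batch_reader()):
            loss, acc1 = train_step(batch)
            step_id += 1
            if step_id % log_every == 0:
                record = {
                    "Pass_Num": pass_id,
                    "trainbatch_Num": batch_id,
                    "Train_loss": loss,
                    "Train_acc1": acc1,
                    "time": time.time() - start,
                }
                records.append(record)
                print(record)
    return records


def fake_reader():
    """Dummy reader yielding 750 'batches' (just integers) per pass."""
    return iter(range(750))


def fake_train_step(batch):
    """Dummy step returning a made-up (loss, accuracy) pair."""
    return 1.0 / (1 + batch), batch / 750.0


records = run_training(fake_reader, fake_train_step, num_passes=2)
```

Keeping the records in a list, rather than only printing them, is what makes the later analysis and plotting possible.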

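Step 3: save the model. Because of the large amount of data, training takes several dozen hours. To guard against unexpected interruption, an intermediate model is saved every 500 batches during training; if training is interrupted, the next run loads the intermediate model and continues rather than starting over. When training finishes, the final model is saved in preparation for prediction.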
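The save-and-resume pattern described in this step can be sketched without any deep learning framework. Here the "model" is a single number stored in a JSON file, and the file name, function names, and `stop_at` argument (used to simulate an interruption) are all hypothetical; a real run would use the framework's own functions for saving and loading parameters.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # a crash never leaves a half-written checkpoint


def load_checkpoint(path):
    """Load the last checkpoint, or start fresh if none exists."""
    if not os.path.exists(path):
        return {"step": 0, "weights": [0.0]}
    with open(path) as f:
        return json.load(f)


def train(path, total_steps, save_every=500, stop_at=None):
    """Resume from the last checkpoint and train up to total_steps."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if stop_at is not None and state["step"] >= stop_at:
            return state  # simulated unexpected interruption
        state["weights"][0] += 0.001  # stand-in for one parameter update
        state["step"] += 1
        if state["step"] % save_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final model
    return state


ckpt = os.path.join(tempfile.mkdtemp(), "model.json")
train(ckpt, total_steps=2000, stop_at=1200)  # interrupted at step 1200
resumed_from = load_checkpoint(ckpt)["step"]  # last saved step
final = train(ckpt, total_steps=2000)         # continues from there
print(resumed_from, final["step"])
```

At most 500 batches of work are lost on an interruption, which is the trade-off behind the save interval.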

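The prediction program is a separate code module that can run independently. Prediction consists of four steps:

Step 1: configure the prediction environment;
Step 2: preprocess the image to be predicted. Convert non-RGB images to RGB mode; crop and scale the image to a size of [3, 224, 224];
Step 3: load the prediction model and feed the image to the model for prediction;
Step 4: output the prediction result and determine the class it belongs to. In this study, 6,766 images were predicted with a prediction accuracy of 94%; some of the prediction results are shown in the figure below.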
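The geometry of steps 2 and 4 can be sketched with the standard library alone. The sketch assumes one common recipe (scale the shorter side to 224, then take a centered 224 x 224 crop); the article does not say exactly how the authors cropped and scaled, and all function names are hypothetical. Decoding the image file and converting it to RGB would be done with an image library and is not shown.

```python
def resize_short_side(width, height, target=224):
    """New (width, height) after scaling so the shorter side equals target."""
    scale = target / min(width, height)
    return max(target, round(width * scale)), max(target, round(height * scale))


def center_crop_box(width, height, size=224):
    """(left, top, right, bottom) of a centered size x size crop."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def hwc_to_chw(image):
    """Convert rows of RGB pixels (H x W x 3) into 3 channel planes (3 x H x W)."""
    return [[[pixel[c] for pixel in row] for row in image] for c in range(3)]


def top1(probs):
    """Index of the class with the highest predicted probability."""
    return max(range(len(probs)), key=probs.__getitem__)


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


new_size = resize_short_side(640, 480)
box = center_crop_box(*new_size)
chw = hwc_to_chw([[(1, 2, 3)] * 224 for _ in range(224)])  # dummy RGB image
shape = [len(chw), len(chw[0]), len(chw[0][0])]
print(new_size, box, shape)
print(top1([0.1, 0.7, 0.2]), accuracy([1, 0, 2, 2], [1, 0, 1, 2]))
```

The channel-first layout produced by `hwc_to_chw` is what the size [3, 224, 224] in step 2 refers to: 3 color channels, each 224 x 224.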

[Figure: sample prediction results]
To make the model easier to apply and promote, we designed and developed a WeChat mini program. Users simply scan a Chinese herbal medicine with their phone camera to identify its name quickly. Along with the identification, the mini program also displays the herb's properties and flavor, meridian tropism, and indications, as shown in the upper figure below. The mini program currently covers the properties, flavors, and effects of 257 common Chinese herbal medicines, as shown in the lower figure below.

[Figure]
The mini program is currently in internal testing; its launch date is yet to be determined.

[Figure]
PaddlePaddle reproduces classic and cutting-edge deep learning algorithms, iterates quickly, keeps pace with the academic frontier, and provides high-quality, industrial-grade pre-trained models that support transfer learning. Using PaddlePaddle for image recognition of Chinese herbal medicines achieved good recognition results, and the WeChat mini program built on it makes the model convenient to use. Together with the AI Studio platform's powerful computing resources, rich data, vast collection of open-source algorithms, and shared high-quality projects, it opens a door for AI learners and industrial users to get into artificial intelligence quickly.

From identifying Chinese herbal medicines and judging their authenticity to diagnosing disease from the tongue, face, and pulse and identifying TCM body constitution, there are many areas where the application of deep learning still awaits in-depth exploration. We hope that PaddlePaddle will help more and more industries become AI-enabled and inject new vitality into the development of the TCM industry.

To connect with more deep learning developers, join the official PaddlePaddle QQ group: 796771754.

To learn more about PaddlePaddle, see the following resources.

Official website address:
https://www.paddlepaddle.org.cn/
GitHub Address:
https://github.com/paddlepaddle/paddle

PaddlePaddle developer
