Assisting the digital transformation of the breeding industry: developing and building a sheep face recognition system based on deep learning models

The digitalization of the breeding industry is still progressing relatively slowly in China. Some large operations did related work during an earlier exploratory period, but for various reasons it was never widely adopted. Digitalization should not be understood as something out of reach, reserved for high-end players, nor should it become a gap that middle- and lower-income farmers find difficult to cross. Future digitalization must have a low barrier to entry and actually deliver benefits. In a previous article we already did a lot of exploratory work on dairy-cow breeding scenarios; if you are interested, you can read it for yourself.

"Facilitating the digital transformation of the breeding industry, developing and building a cow face recognition system based on deep learning models"

Here we briefly review some classic face recognition models. Face recognition is an important task in computer vision, and a number of models are commonly used for it. Some of the most common are listed below:

1. VGGFace: VGGFace is a model based on the VGGNet architecture and trained on a large-scale face database. It can be used for face recognition and verification and performs well. Its construction principle is as follows:

Architecture design: The main structure of the VGGFace model follows the design ideas of the VGGNet architecture. It consists of a series of convolutional layers and fully connected layers; the convolutional layers extract feature representations of the image, while the fully connected layers perform classification or verification.
Convolutional layer groups: The VGGFace model uses multiple 3x3 convolution kernels for its convolutions and introduces non-linearity through activation functions such as ReLU. To increase the depth and non-linear capacity of the network, it stacks several convolutional layers of the same size on top of each other (a small sketch of this stacking follows this section).
Pooling layers: After the convolutional layers, the VGGFace model uses max-pooling layers to reduce the size of the feature maps and downsample them spatially. Max pooling helps retain important features while reducing redundant information.
Fully connected layers: After the convolution and pooling operations, the VGGFace model uses fully connected layers to map the extracted features to the corresponding classes or verification results. Typically, the final fully connected layer is followed by a softmax activation for classification.

Advantages:
Simple to build, easy to understand and implement.
The VGGFace model has strong feature-representation and pattern-recognition capability, making it well suited to face recognition and verification tasks.
Disadvantages:
The model has a large number of parameters, which leads to high computational complexity and requires more computing resources.
The model can be difficult to train and deploy; because of its large structure, it needs more storage and memory.
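
Below is a minimal PyTorch sketch of the VGG-style stacking described above; the channel counts, input size and 512-dimensional output embedding are illustrative assumptions, not the exact VGGFace configuration.

import torch
import torch.nn as nn

def vgg_block(in_ch, out_ch, num_convs):
    # Stack several 3x3 convolutions with ReLU, then downsample with 2x2 max pooling.
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1))
        layers.append(nn.ReLU(inplace=True))
        in_ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# Illustrative backbone: stacked conv blocks followed by a fully connected embedding layer.
backbone = nn.Sequential(
    vgg_block(3, 64, 2),
    vgg_block(64, 128, 2),
    vgg_block(128, 256, 3),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(256, 512),
)

x = torch.randn(1, 3, 112, 112)   # a dummy face crop
print(backbone(x).shape)          # torch.Size([1, 512])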

2. FaceNet: FaceNet is a face recognition model based on a convolutional neural network and a triplet loss function. It embeds face images into a high-dimensional feature space and performs face recognition and verification via Euclidean distance. Its construction principle is as follows:

Network architecture: The FaceNet model uses a deep convolutional neural network architecture. Through multiple layers of convolution and pooling, it progressively extracts and learns feature representations of the input image, and a final fully connected layer maps the features into a high-dimensional feature space.
Triplet loss function: FaceNet uses a triplet loss function to optimize the feature representation. For each training sample, three samples are selected from the training set: an anchor, a positive and a negative. The anchor and the positive belong to the same person, while the negative belongs to a different person. The goal of the triplet loss is to make the feature distance between images of the same person as small as possible and the distance between different people as large as possible (a minimal sketch follows this section).
Feature embedding space: The feature vectors learned by FaceNet are mapped into a high-dimensional embedding space. Euclidean or cosine distance can then be used to measure how far apart two people are, for recognition and verification.
Advantages:
The features learned through the triplet loss have good representational power and are well suited to face recognition and verification tasks.
The triplet loss directly optimizes the distance metric over the feature vectors, bringing features of the same person closer together and pushing features of different people further apart.
Disadvantages:
The model is relatively complex and training can be time-consuming.
It requires large-scale training data and considerable computing resources, and its performance depends heavily on the quality and quantity of the training data.
Note that other techniques can be applied to improve FaceNet, such as a weighted triplet loss, online hard-negative sampling and sample mining. These refinements can further improve the model's performance and robustness.
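
Below is a minimal PyTorch sketch of the triplet loss described above; the 128-dimensional embeddings, batch size and 0.2 margin are illustrative assumptions (PyTorch also provides a built-in nn.TripletMarginLoss).

import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # L2-normalize the embeddings so that squared Euclidean distances are comparable.
    anchor, positive, negative = (F.normalize(t, dim=1) for t in (anchor, positive, negative))
    d_ap = (anchor - positive).pow(2).sum(dim=1)   # anchor-positive distance (same identity)
    d_an = (anchor - negative).pow(2).sum(dim=1)   # anchor-negative distance (different identity)
    # Pull same-identity pairs together and push different identities at least `margin` apart.
    return F.relu(d_ap - d_an + margin).mean()

a, p, n = (torch.randn(8, 128) for _ in range(3))  # dummy embeddings for 8 triplets
print(triplet_loss(a, p, n))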

3. DeepFace: DeepFace is a face recognition model proposed by Facebook, based on convolutional neural networks and multi-layer perceptrons. It can perform face recognition, verification and attribute prediction with good performance. Its construction principle is as follows:

Network architecture: DeepFace is a deep convolutional neural network model consisting of multiple convolutional layers and fully connected layers. The convolutional layers extract feature representations of the face image, and the fully connected layers perform recognition.
Face alignment: DeepFace first performs face alignment. By detecting facial keypoints, the input image is transformed so that the face is aligned within the image, reducing interference from variation factors (an alignment sketch follows this section).
Feature extraction: DeepFace uses multiple convolutional layers to extract features from the input image. Through stacked convolution and pooling operations, it effectively captures feature information at different scales and levels of abstraction.
Fully connected layers and classification: After feature extraction, DeepFace maps the features to face classes or a feature-vector representation through fully connected layers. For recognition, the model can be trained to distinguish different faces.
Advantages:
DeepFace has strong feature-representation and pattern-recognition capability, making it well suited to face recognition tasks.
The face-alignment step reduces the impact of pose and scale differences between faces, improving robustness and accuracy.
Disadvantages:
Training and tuning DeepFace may require large amounts of computing resources and time.
Because of the large model structure, deployment and inference may become more complex.
Note that DeepFace achieved very good performance on the LFW (Labeled Faces in the Wild) dataset when it was proposed. It nevertheless has some limitations, such as sensitivity to occlusion and lighting changes. Better performance can be obtained by combining it with other preprocessing methods and techniques, such as data augmentation and feature fusion.
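
The alignment step mentioned above can be sketched with OpenCV roughly as follows; the five reference landmark coordinates and the 112x112 output size are common conventions assumed here for illustration, and the keypoint detector that produces the input landmarks is assumed to exist elsewhere in the pipeline.

import cv2
import numpy as np

# Reference positions of 5 landmarks (eye centers, nose tip, mouth corners) inside a
# 112x112 aligned crop -- a commonly used template, assumed here purely for illustration.
REF_5PTS = np.float32([
    [38.3, 51.7], [73.5, 51.5], [56.0, 71.7], [41.5, 92.4], [70.7, 92.2],
])

def align_face(img, landmarks_5pts, size=(112, 112)):
    # Estimate a similarity transform (rotation + scale + translation) that maps the
    # detected landmarks onto the reference template, then warp the image with it.
    M, _ = cv2.estimateAffinePartial2D(np.float32(landmarks_5pts), REF_5PTS)
    return cv2.warpAffine(img, M, size)

# Usage (landmarks would come from a keypoint detector, assumed available):
# aligned = align_face(cv2.imread("face.jpg"), detected_landmarks)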

4. ArcFace: ArcFace is a face recognition model based on an additive angular margin. It achieves better recognition performance by reducing the angular distance between feature vectors of the same person and enlarging the distance between different people. Its construction principle is as follows:

Building the feature-extraction network: ArcFace typically uses a convolutional neural network (CNN) as the feature extractor. Multiple convolution and pooling layers extract a feature representation of the face image, with non-linear activation functions introducing non-linearity.
Feature embedding: ArcFace maps the extracted face features into a high-dimensional feature space. This space is designed so that the angular distance between features of the same person is as small as possible while the distance between different people is as large as possible; introducing an angular margin shifts the features within the embedding space.
Adding the ArcMargin loss: To make the feature space more discriminative, ArcFace introduces the ArcMargin loss. This loss computes the angle between a feature vector and the corresponding class (label) weight vector, minimizes the feature distance within the same identity and enlarges the separation between different identities. The boundary between feature vectors of the same identity can be controlled by adjusting the loss hyperparameters (a hedged sketch of this loss head follows this section).
Advantages:
ArcFace offers high accuracy and robustness in face recognition tasks; the angular margin improves feature separability.
The boundary between feature vectors can be controlled flexibly by tuning the ArcMargin loss hyperparameters, allowing the model to adapt to various levels of complexity and inter-class difference.
Disadvantages:
ArcFace requires a large, balanced training dataset, placing demands on data quality and quantity.
The model is relatively complex and needs more computing resources and time for training and inference.
Note that ArcFace's performance is also affected by its hyperparameter settings, such as the size of the angular margin and the weight of the ArcMargin loss. Appropriate hyperparameters are critical to obtaining the best performance.
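
Since arcFace is the model actually used later in this article, here is a minimal PyTorch sketch of the ArcMargin head described above. The scale s=64 and margin m=0.5 are commonly used defaults taken as assumptions, and the backbone that produces the 512-dimensional embeddings is not shown.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcMarginHead(nn.Module):
    def __init__(self, emb_dim, num_classes, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, emb_dim))
        self.s, self.m = s, m

    def forward(self, emb, labels):
        # Cosine of the angle between each embedding and each class weight vector.
        cos = F.linear(F.normalize(emb), F.normalize(self.weight)).clamp(-1 + 1e-7, 1 - 1e-7)
        theta = torch.acos(cos)
        # Add the angular margin m only to the angle of the ground-truth class.
        target = F.one_hot(labels, cos.size(1)).bool()
        cos_m = torch.where(target, torch.cos(theta + self.m), cos)
        return self.s * cos_m   # scaled logits, fed into cross-entropy

head = ArcMarginHead(emb_dim=512, num_classes=100)   # 100 sheep IDs, illustrative
labels = torch.tensor([0, 3, 7, 42])
logits = head(torch.randn(4, 512), labels)
print(F.cross_entropy(logits, labels))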

5. OpenFace: OpenFace is an open-source face recognition model for face recognition and verification. It uses a deep neural network to extract features from face images and compute distances between them, and performs well. Its construction principle is as follows:

Face detection: First, OpenFace uses a face-detection algorithm (such as a Haar-cascade classifier or a deep-learning-based detector) to locate and extract the face region from the input image.
Face alignment: To reduce the influence of pose and scale, OpenFace applies a geometric transformation to the detected face region. Alignment is usually based on facial keypoints (such as the eyes, nose and mouth), so that different faces are aligned at the same feature positions.
Feature extraction: After alignment, OpenFace uses a convolutional neural network (CNN) to extract a feature representation of the face image. Through multiple convolution and pooling layers, high-level features are extracted into a fixed-size feature vector.
Feature classification and recognition: OpenFace maps the feature vector to the corresponding face class or verification result through fully connected layers. A softmax activation can be used for multi-class classification, or a distance threshold for face verification (see the verification sketch after this section).
Advantages:
OpenFace is open source, with publicly available implementation code and pre-trained models, making it easy to use and customize.
Face alignment effectively handles pose and scale variation, improving accuracy and robustness.
The model is reasonably general and applies to different recognition and verification tasks.
Disadvantages:
OpenFace's performance may be limited on large-scale and complex datasets.
It may lag behind commercially optimized face recognition models in accuracy and speed.
Its accuracy is affected by the quality and stability of the face detection and alignment algorithms.
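
The threshold-based verification mentioned above can be sketched as follows; the 0.5 cosine-similarity threshold is purely an illustrative assumption and would need to be calibrated on a validation set.

import numpy as np

def is_same_identity(vec_a, vec_b, threshold=0.5):
    # Verify two faces by the cosine similarity between their embedding vectors.
    a = np.asarray(vec_a, dtype=np.float32)
    b = np.asarray(vec_b, dtype=np.float32)
    cos_sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return cos_sim >= threshold, cos_sim

# same, score = is_same_identity(embedding_1, embedding_2)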

6. SphereFace: SphereFace is a face recognition model based on spherical (hypersphere) geometry. It improves recognition performance by introducing spherical constraints on the feature space. Its construction principle is as follows:

Feature extraction: SphereFace uses a convolutional neural network (CNN) as its feature extractor. Multiple convolution and pooling layers extract a feature representation of the face image.
Feature mapping: SphereFace maps the feature vectors onto a sphere. This is achieved by normalizing and projecting the feature vectors: normalization places them on the unit sphere, and projection constrains the features on the sphere to a bounded range.
Angular (cosine) classifier: After the feature mapping, SphereFace uses an angular classifier for recognition. The classifier computes the cosine of the angle between a feature vector and the class weight vector and uses it as the basis for classification. Concretely, the model reformulates classification in terms of the angle between features and class labels, and learns the feature representation by optimizing an angular margin (a simplified sketch follows this section).
Advantages:
Mapping the features onto a sphere exploits spherical geometry and gives the feature vectors better separability.
By optimizing the angular margin, the model enforces separation between feature vectors of different classes, improving the accuracy and stability of face recognition.
Disadvantages:
SphereFace may be somewhat less robust to complex conditions such as pose variation and occlusion, and places higher demands on input image quality and preprocessing.
Mapping features onto the sphere adds complexity, which can increase the computational cost of training and inference.
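
Below is a heavily simplified PyTorch sketch of the angular-margin idea described above: embeddings and class weights are normalized onto the unit sphere and the target-class angle θ is replaced by cos(mθ). The piecewise ψ(θ) function and annealing schedule that the original paper uses to keep this well behaved are omitted, so treat it as an illustration of the concept rather than a faithful A-Softmax implementation.

import torch
import torch.nn.functional as F

def angular_margin_logits(emb, weight, labels, m=4):
    # Cosine between unit-normalized embeddings and class weights (points on the sphere).
    cos = F.linear(F.normalize(emb), F.normalize(weight)).clamp(-1 + 1e-7, 1 - 1e-7)
    theta = torch.acos(cos)
    target = F.one_hot(labels, cos.size(1)).bool()
    # Multiplicative angular margin on the ground-truth class only (psi(theta) omitted).
    return torch.where(target, torch.cos(m * theta), cos)

weight = torch.randn(100, 512)   # 100 identities, 512-d features -- illustrative sizes
logits = angular_margin_logits(torch.randn(4, 512), weight, torch.tensor([1, 2, 3, 4]))
print(logits.shape)              # torch.Size([4, 100])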

The overall technical route is the same as in the previous article; the core purpose here is to transfer face recognition technology to sheep face recognition. The overall technical process diagram is as follows:

 The overall project is mainly divided into three parts:

1. Data collection

This part mainly involves working with the cooperating breeding farms and installing equipment on site to collect data. Our data is uploaded to Alibaba Cloud OSS for storage; this choice varies from project to project and can be handled according to your actual needs, and you can also store the data locally.

2. Model development

This part is the core of the entire project. I have summarized and compared a number of classic face recognition network models above, and here I ultimately decided to implement the arcFace model. There was not much modification at the model level; the main focus was development and adaptation around the dataset. If the native model can perform well, it broadens the options for follow-up work.

3. Application construction

In any real project, developing, training, evaluating and testing the model is only one part of the work. For the model to really be useful, you also need to develop and build the application, that is, implement the business-logic part. The main requirement here is being able to respond to externally submitted images and return correct results.

First, let's look at the dataset:

Each ID directory stores the sheep face images of a single sheep.
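
For reference, the directory layout looks roughly like this (the IDs and file names below are made up for illustration, but the structure matches what the vectorization code later in the article walks over):

datasets/
    0001/            # all face crops of the sheep with ID 0001
        0001_000.jpg
        0001_001.jpg
        ...
    0002/
        ...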

This article still uses the arcFace model; you can refer to the earlier description, so I won't go into detail here.

The overall training process loss data is as follows:
 

28.49554878234863 28.36122703552246
26.448251190185548 30.44441795349121
22.917339782714844 28.14263343811035
19.10800880432129 28.265657424926758
16.56169502258301 28.006635665893555
13.722219200134278 25.89620590209961
11.93085578918457 20.021202087402344
10.174094924926758 20.273380279541016
9.257895278930665 15.120355129241943
8.078882255554198 15.526759147644043
7.367256584167481 12.761467933654785
6.78815788269043 9.665631294250488
6.172802677154541 7.019716501235962
5.745314445495605 7.525074243545532
5.494876651763916 7.099688529968262
5.396396732330322 6.39738130569458
5.2423255729675295 6.338007211685181
5.130193157196045 6.198169708251953
5.07472505569458 5.850244760513306
5.0294198799133305 6.410634994506836
4.994203395843506 5.985965967178345
4.966535930633545 5.553969860076904
4.940446796417237 5.938147068023682
4.917659816741943 5.585787057876587
4.895496597290039 5.407972097396851
4.874711971282959 5.763866901397705
4.853940620422363 5.4627299308776855
4.83395866394043 5.320657253265381
4.814438667297363 5.692613124847412
4.795453205108642 5.381541967391968
4.776773281097412 5.2457475662231445
4.758707256317138 5.6171555519104
4.7409313201904295 5.311766147613525
4.723709411621094 5.181070327758789
4.7071023941040036 5.552199125289917
4.690928859710693 5.245966911315918
4.6751644325256345 5.121516942977905
4.659910850524902 5.489439964294434
4.645095729827881 5.181874752044678
4.630678901672363 5.067218780517578
4.616962566375732 5.433080673217773
4.6036177825927735 5.127049922943115
4.590720138549805 5.019947052001953
4.5783258819580075 5.387711048126221
4.566368465423584 5.081918716430664
4.554848136901856 4.979071617126465
4.543887023925781 5.349139213562012
4.533280830383301 5.040514707565308
4.52316743850708 4.942654371261597
4.513592491149902 5.311756610870361
4.504299716949463 5.004668951034546
4.495562362670898 4.913930892944336
4.487207355499268 5.284327745437622
4.479306697845459 4.977153778076172
4.4717269706726075 4.890130519866943
4.46464916229248 5.258784770965576
4.457883949279785 4.953750371932983
4.451617317199707 4.869576930999756
4.445729465484619 5.241544008255005
4.440151042938233 4.934422016143799
4.4350037765502925 4.852663040161133
4.430132427215576 5.224523544311523
4.4256170463562015 4.917298793792725
4.421499710083008 4.839732885360718
4.417640247344971 5.212682247161865
4.414067325592041 4.906010389328003
4.410848846435547 4.829944372177124
4.4079525184631345 5.203676700592041
4.405214042663574 4.896440029144287
4.402802867889404 4.823580265045166
4.400668125152588 5.197094917297363
4.39873815536499 4.890859603881836
4.397003974914551 4.8192033767700195
4.3954691696167 5.194169282913208
4.394099445343017 4.887556791305542
4.393009090423584 4.817701101303101
4.392023506164551 5.191941261291504
4.391167526245117 4.887598752975464
4.390453968048096 4.817023515701294
4.389844932556152 5.190695762634277
4.389398975372314 4.886103630065918
4.3889604187011715 4.816188097000122
4.388584194183349 5.190183639526367
4.388340282440185 4.884408235549927
4.388122501373291 4.815371990203857
4.387853298187256 5.188681125640869
4.387586517333984 4.884839057922363
4.387446060180664 4.815969944000244
4.387167377471924 5.191036701202393
4.386994647979736 4.88573145866394
4.386748600006103 4.814674615859985
4.38659257888794 5.1875269412994385
4.386337928771972 4.884407997131348
4.386108531951904 4.816586256027222
4.38583589553833 5.187043905258179
4.385676670074463 4.884454965591431
4.385481796264648 4.815013408660889
4.385194244384766 5.187936305999756
4.384989185333252 4.883215427398682
4.384740600585937 4.81392765045166

The visualization looks like this:

It can be seen that the training process is relatively stable.

The next step is to use faiss to build a vectorized feature database. A brief summary of faiss follows:

faiss is a library open-sourced by Facebook AI Research for efficient similarity search and clustering. It is particularly good at handling large-scale vector data and provides a range of high-performance algorithms and data structures. Below is a more detailed introduction to faiss, together with its advantages and disadvantages:
Algorithm support: faiss provides several efficient similarity-search algorithms, including inverted-file (IVF) based indexes, k-means based clustering and locality-sensitive hashing (LSH). These algorithms are broadly applicable and can meet different search and clustering needs.
High performance: faiss is designed with performance in mind, with a high degree of parallelism and efficient use of hardware. It supports GPU acceleration and can quickly process large-scale vector data for high-speed similarity search and clustering.
Easy to use: faiss provides a simple, easy-to-use API for indexing, querying and clustering vectors. It has friendly Python and C++ interfaces and works well alongside mainstream machine learning frameworks such as PyTorch and TensorFlow (a minimal usage sketch follows this section).
Scalability: faiss supports online learning and incremental indexing. Users can dynamically add and remove vectors without rebuilding the index structure, which provides good scalability and flexibility.
Memory optimization: faiss offers a variety of memory-optimization techniques for large-scale vector data. For example, data can be split across multiple index shards to reduce memory usage, and index parameters can be finely tuned to balance performance and memory footprint.
Advantages:
faiss is fast and efficient, and can quickly handle similarity search and clustering over large-scale vector data.
It provides multiple high-performance similarity-search algorithms and data structures to meet different search needs.
It has good scalability, supporting online learning and incremental indexing.
It offers a friendly API with multi-language support, making it relatively simple to use and integrate.
Disadvantages:
faiss is mainly aimed at similarity search and clustering and is not suited to other, more complex data-analysis tasks.
Deploying and using faiss may require a certain amount of technical background and understanding.
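
As a minimal usage sketch, assume 512-dimensional embeddings and a brute-force inner-product index over L2-normalized vectors (i.e. cosine similarity); this is just one reasonable choice among the many index types faiss offers.

import faiss
import numpy as np

d = 512                                        # embedding dimension (assumed)
db_vecs = np.random.rand(1000, d).astype("float32")
faiss.normalize_L2(db_vecs)                    # normalize so inner product == cosine similarity

index = faiss.IndexFlatIP(d)                   # exact (brute-force) inner-product index
index.add(db_vecs)

query = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)           # top-5 most similar database vectors
print(ids[0], scores[0])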

In some of my previous blog posts I have introduced faiss, the vector-retrieval tool, in more detail. If you are interested, you can read them yourself; I won't repeat the details here:

"Performance Test and Evaluation Analysis of Faiss Large-Scale Vector Retrieval in Face Recognition Scenarios"

"Developing and building a face recognition system based on arcFace+faiss"

"Developing and building a face recognition system based on facenet+faiss"

"Large-scale vector retrieval library Faiss learning summary record"

"First experience of developing and implementing a document query system based on text2vec and faiss"

Those posts are fairly detailed, so if you're interested you can go back and read them. The implementation of the retrieval part of this application is completely consistent with the earlier logic, so it will not be elaborated on again.

The core code implementation of feature database creation is as follows:

import os
import json

# sinleImg2Vec is assumed to be defined elsewhere in the project: it takes an
# image path and returns the embedding vector for that single image.

def batch2Vec(picDir="datasets/", saveDir="DB/", save_path="DB.json"):
    """
    Batch-vectorize the dataset: walk every ID directory, embed every image,
    and dump the resulting feature database to a JSON file.
    """
    if not os.path.exists(saveDir):
        os.makedirs(saveDir)
    feature = []   # flat list of [image_path, feature_vector] pairs
    cows = {}      # per-ID mapping: {id: [[image_name, feature_vector], ...]}
    count = 0
    for one_cow in os.listdir(picDir):
        oneDir = picDir + one_cow + "/"
        print("one_cow: ", one_cow, ", one_num: ", len(os.listdir(oneDir)), ", count: ", count)
        for one_pic in os.listdir(oneDir):
            one_path = oneDir + one_pic
            one_vec = sinleImg2Vec(pic_path=one_path)
            if one_cow in cows:
                cows[one_cow].append([one_pic, one_vec])
            else:
                cows[one_cow] = [[one_pic, one_vec]]
            feature.append([one_path, one_vec])
        count += 1
    print("feature_length: ", len(feature))
    # Persist the flat feature list; the vectors must be JSON-serializable
    # (plain Python lists rather than numpy arrays).
    with open(saveDir + save_path, "w") as f:
        f.write(json.dumps(feature))
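
As a rough sketch of how the saved DB.json could then be fed into faiss for top-k retrieval (assuming the stored vectors are plain Python lists and that sinleImg2Vec is available to embed the query image; the index choice and names here are illustrative, not the project's exact retrieval code):

import json
import faiss
import numpy as np

def build_index(db_path="DB/DB.json"):
    with open(db_path) as f:
        feature = json.load(f)                 # [[image_path, feature_vector], ...]
    paths = [item[0] for item in feature]
    vecs = np.asarray([item[1] for item in feature], dtype="float32")
    faiss.normalize_L2(vecs)                   # cosine similarity via inner product
    index = faiss.IndexFlatIP(vecs.shape[1])
    index.add(vecs)
    return index, paths

def topk(query_img, index, paths, k=5):
    q = np.asarray([sinleImg2Vec(pic_path=query_img)], dtype="float32")
    faiss.normalize_L2(q)
    scores, ids = index.search(q, k)
    return [(paths[i], float(s)) for i, s in zip(ids[0], scores[0])]

# index, paths = build_index()
# print(topk("query_sheep.jpg", index, paths, k=5))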

At this point the feature database has been constructed, and retrieval can be performed against it. The relevant implementation was covered in the previous article, so I won't go into details here; let's look at an example directly:

【Image input】

【Result output】

Top5:
 

Top15:

【Image input】

【Result output】

Top5:

Top15:

To integrate the entire computation pipeline, a dedicated visual system interface has been developed to help operators use the whole project conveniently. An example of the effect is shown below:

Origin blog.csdn.net/Together_CZ/article/details/133269204