AI-generated face image recognition based on deep learning

The rapid development of AIGC (AI-generated content) technology has given creators efficient productivity tools, but it has also created new problems. These tools can generate realistic fake images, face-swap videos, and similar content, which opens the door to abuse. Criminals may use AIGC to fabricate news, infringe copyrights, bypass liveness-based identity verification, spread rumors and defamation, and commit extortion. Such behavior harms society and undermines the authenticity and credibility of information.

Therefore, we need to recognize the potential risks of AIGC technology and take appropriate measures to address them. These include formulating and enforcing laws and regulations, establishing effective regulatory mechanisms, improving the safety and traceability of the technology, and raising public technological literacy and vigilance through education so that people can better identify false information. Only with reasonable supervision and effective management can AIGC technology benefit creators and society and advance science, technology, and art.

Using deep learning to identify AI-generated face images has become an active research area in recent years, attracting growing attention from industry and research institutions. This article uses the public iFakeFaceDB dataset and the ResNet-50 model to build a deep-learning system for recognizing AI-generated face images.

data set

iFakeFaceDB is a dataset for studying face image synthesis and fake-face detection. It contains real face images as well as synthetic face images, and it is intended to help researchers develop and evaluate both face synthesis techniques and detection algorithms. Unlike earlier databases, iFakeFaceDB removes the fingerprints left by the GAN architecture using a method called GANprintR (GAN Fingerprint Removal) while keeping the images highly realistic, so that detectors cannot rely on those fingerprints alone. As a result of the GANprintR step, iFakeFaceDB is more challenging for advanced fake detectors than other databases.

deep learning model

ResNet-50 is a deep convolutional neural network proposed by Kaiming He et al. of Microsoft Research in 2015. It belongs to the ResNet (Residual Network) family and is widely used in computer vision tasks such as image classification, object detection, and image segmentation.

The main feature of ResNet-50 is the residual (skip) connection, which addresses vanishing gradients and the degradation of accuracy in very deep networks. A skip connection adds a block's input directly to its output, so information can bypass some layers and the network can learn deeper feature representations more easily.
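A toy scalar calculation, not the actual network, can illustrate why the skip connection helps gradients. The sketch assumes every layer's transformation F has the same small local derivative (0.01 here, an arbitrary value chosen only for illustration). Without skip connections the gradient is a product of these small values, while with y = x + F(x) each factor becomes 1 + F'(x).

```python
# Toy illustration: gradient magnitude through a stack of scalar "layers".
# local_grad is an assumed, arbitrary derivative of each layer's transform F.

def plain_chain_gradient(local_grad: float, depth: int) -> float:
    """Gradient through y = F(x) stacked `depth` times: product of F'."""
    grad = 1.0
    for _ in range(depth):
        grad *= local_grad
    return grad


def residual_chain_gradient(local_grad: float, depth: int) -> float:
    """Gradient through y = x + F(x) stacked `depth` times: product of (1 + F')."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + local_grad
    return grad


depth = 50
print("plain   :", plain_chain_gradient(0.01, depth))
print("residual:", residual_chain_gradient(0.01, depth))
```

In the plain stack the gradient shrinks toward zero as depth grows, whereas the residual stack keeps it at a usable magnitude. Real networks have layer-dependent, multi-dimensional derivatives, so this only conveys the intuition.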

ResNet-50 has 50 weighted layers. It starts with a single 7x7 convolutional layer, followed by 16 bottleneck residual blocks. Each bottleneck block consists of three convolutional layers (1x1, 3x3, 1x1) and a skip connection. The network ends with a fully connected layer, which is replaced to suit the specific task.
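The "50" in the name can be checked with simple arithmetic. In the standard ResNet-50 layout from the original ResNet paper, the residual blocks are bottleneck blocks of three convolutions (1x1, 3x3, 1x1), arranged in four stages of 3, 4, 6, and 3 blocks. The sketch below counts the weighted layers; by convention the 1x1 projection convolutions on the shortcut paths are not counted.

```python
# Count the weighted layers in the standard ResNet-50 layout.
STEM_CONVS = 1                    # initial 7x7 convolution
BLOCKS_PER_STAGE = [3, 4, 6, 3]   # bottleneck blocks in the four stages
CONVS_PER_BLOCK = 3               # 1x1 -> 3x3 -> 1x1
FC_LAYERS = 1                     # final fully connected classifier


def count_layers() -> int:
    """Stem convolution + convolutions inside all blocks + classifier."""
    block_convs = sum(BLOCKS_PER_STAGE) * CONVS_PER_BLOCK
    return STEM_CONVS + block_convs + FC_LAYERS


print(count_layers())  # 1 + 16 * 3 + 1 = 50
```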

For training, ResNet-50 is usually initialized with weights pre-trained on a large-scale image dataset such as ImageNet. This speeds up convergence and tends to improve generalization.

For these reasons, this article uses ResNet-50 as the model. It can easily be replaced with other classification models such as ShuffleNet, MobileNet, or EfficientNet.

training code

Import the required libraries:

import copy
import os

import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split

Set the dataset path:

# os.listdir does not expand "~", so resolve the home directory explicitly
dataset_path = os.path.expanduser("~/data/iFakeFaceDB")

Define a custom dataset class:

class iFakeFaceDataset(Dataset):
    """Face images stored as one sub-folder per class under root_dir."""

    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        self.image_paths, self.labels = self.load_dataset()

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # Open lazily so the whole dataset is never held in memory at once;
        # convert to RGB because the model expects three channels.
        image = Image.open(self.image_paths[idx]).convert("RGB")
        label = self.labels[idx]

        if self.transform:
            image = self.transform(image)

        return image, label

    def load_dataset(self):
        image_paths = []
        labels = []
        # Sort folder names so the label index of each class is reproducible.
        class_folders = sorted(
            name for name in os.listdir(self.root_dir)
            if os.path.isdir(os.path.join(self.root_dir, name))
        )
        for idx, folder_name in enumerate(class_folders):
            folder_path = os.path.join(self.root_dir, folder_name)
            for image_name in sorted(os.listdir(folder_path)):
                image_paths.append(os.path.join(folder_path, image_name))
                labels.append(idx)
        return image_paths, labels
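Because the labels come from the position of each class folder in the directory listing, it helps to know which folder ends up as label 0 and which as label 1. The helpers below are a small sketch for inspecting that mapping and the class balance. The function names `class_to_index` and `count_images` are made up for this illustration, and sorting the folder names is an assumption made here to keep the mapping stable between runs.

```python
import os


def class_to_index(root_dir: str) -> dict:
    """Map each class sub-folder to a label index, in sorted order."""
    folders = sorted(
        name for name in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, name))
    )
    return {name: idx for idx, name in enumerate(folders)}


def count_images(root_dir: str) -> dict:
    """Count the files in each class sub-folder to check class balance."""
    return {
        name: len(os.listdir(os.path.join(root_dir, name)))
        for name in class_to_index(root_dir)
    }
```

Calling `class_to_index(os.path.expanduser("~/data/iFakeFaceDB"))` would show the mapping for the dataset path used in this article. The class index printed at inference time only has meaning relative to this mapping.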

Data preprocessing:

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
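To make the normalization step concrete, the following sketch reproduces its arithmetic for a single RGB pixel in plain Python. `ToTensor` scales 8-bit values to the range 0 to 1, and `Normalize` then subtracts the per-channel mean and divides by the per-channel standard deviation. The helper name `normalize_pixel` is made up for this illustration.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Scale 0-255 values to 0-1, then apply per-channel normalization."""
    scaled = [value / 255.0 for value in rgb]
    return [(s - m) / sd for s, m, sd in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


print(normalize_pixel([255, 255, 255]))  # white pixel: roughly 2.25, 2.43, 2.64
print(normalize_pixel([0, 0, 0]))        # black pixel: roughly -2.12, -2.04, -1.80
```

The mean and standard deviation are the ImageNet statistics, which is why they should match the ones the pre-trained weights were trained with.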

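Load the dataset and divide it into training and test sets. The split is done on sample indices, stratified by label, so that images are still loaded lazily:

dataset = iFakeFaceDataset(dataset_path, transform=transform)
indices = list(range(len(dataset)))
train_idx, test_idx = train_test_split(indices, test_size=0.2, random_state=42, stratify=dataset.labels)
train_dataset = torch.utils.data.Subset(dataset, train_idx)
test_dataset = torch.utils.data.Subset(dataset, test_idx)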
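One refinement worth considering for an 80/20 split is stratification, which keeps the ratio of real to fake images the same in both subsets. The sketch below shows the idea with the standard library only. It is an illustration, not scikit-learn's implementation, and the function name `stratified_split` and the example labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


labels = [0] * 80 + [1] * 20          # hypothetical, imbalanced labels
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 80 20
```

With these example labels, the test set receives 16 samples of class 0 and 4 of class 1, the same 4:1 ratio as the full set.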

Create data loader:

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

Build ResNet-50 model:

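# Load ImageNet weights (torchvision >= 0.13; older releases use pretrained=True)
model = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.DEFAULT)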
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 2)

Define loss function and optimizer:

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
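For intuition about what the loss measures, here is a plain-Python sketch of cross-entropy for a single sample with two output logits. It applies softmax to the logits and takes the negative log of the probability assigned to the true class. `nn.CrossEntropyLoss` performs the same computation on tensors and, by default, averages it over the batch. The helper names are made up for this illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the true class index."""
    return -math.log(softmax(logits)[target])


print(cross_entropy([0.0, 0.0], 1))   # undecided model: ln(2), about 0.693
print(cross_entropy([-2.0, 3.0], 1))  # confident and correct: small loss
print(cross_entropy([-2.0, 3.0], 0))  # confident and wrong: large loss
```

The loss is small when the model assigns high probability to the correct class and grows quickly when it is confidently wrong.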

Train the model:

num_epochs = 100
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

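best_accuracy = 0.0
# Start from the current weights so best_model_wts is always defined
best_model_wts = copy.deepcopy(model.state_dict())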
for epoch in range(num_epochs):
    print(f"Epoch {epoch + 1}/{num_epochs}")
    print("-" * 10)

    model.train()
    running_loss = 0.0

    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()

        outputs = model(images)
        loss = criterion(outputs, labels)

        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)

    epoch_loss = running_loss / len(train_dataset)
    print(f"Train Loss: {epoch_loss:.4f}")

    model.eval()
    correct = 0
    total = 0

    with torch.no_grad():
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.to(device)

            outputs = model(images)
            # Index of the larger logit is the predicted class
            predicted = outputs.argmax(dim=1)

            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print(f"Test Accuracy: {accuracy:.2f}%")
    print()
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        best_model_wts = copy.deepcopy(model.state_dict())

Save the model:

model.load_state_dict(best_model_wts)
torch.save(model.state_dict(), "resnet50_model.pth")
print(f"Best Accuracy: {best_accuracy:.2f}%")

The training results are as follows:

Epoch 100/100
----------
Best val Acc: 99.50%

Inference code

import os
import time

import torch
import torch.nn as nn
import torchvision.transforms as transforms
from PIL import Image
from torchvision.models import resnet50

# Build the ResNet-50 architecture and load the fine-tuned weights
model = resnet50()
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 2)
model.load_state_dict(torch.load('resnet50_model.pth', map_location='cpu'))
model.eval()

# Use the GPU for inference if one is available
# (the model is moved before timing starts so the transfer is not counted)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Define the image preprocessing transforms
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load the image and preprocess it
image_path = os.path.expanduser('~/data/iFakeFaceDB/TPDNE/0000011.jpg')  # replace with the path to an actual image

image = Image.open(image_path).convert('RGB')

since = time.time()
input_tensor = preprocess(image)
input_batch = input_tensor.unsqueeze(0).to(device)

# Run inference
with torch.no_grad():
    output = model(input_batch)

# Get the index and probability of the predicted class
_, predicted_idx = torch.max(output, 1)
predicted_prob = torch.nn.functional.softmax(output, dim=1)[0] * 100

time_elapsed = time.time() - since
print("FPS:", 1 / time_elapsed)

# Print the prediction
print("Prediction:", predicted_idx.item())
print(f"Probability: {predicted_prob[predicted_idx.item()].item():.2f}%")

The prediction results are as follows:

Prediction: 1
Probability: 99.92%
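A single timed run gives a noisy FPS figure, because the first call often includes one-off setup costs. A more stable estimate averages over several runs after a short warm-up, as in the sketch below. The helper `measure_fps` and the stand-in workload `dummy_inference` are hypothetical names used only for this example.

```python
import time


def measure_fps(fn, warmup=3, runs=20):
    """Average calls per second of `fn`, ignoring the first `warmup` calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return runs / elapsed


def dummy_inference():
    # Stand-in workload; replace with a call that runs the model on one image.
    return sum(i * i for i in range(10_000))


print(f"FPS: {measure_fps(dummy_inference):.1f}")
```

When timing on a GPU, it is generally advisable to call `torch.cuda.synchronize()` before reading the clock, since CUDA kernels run asynchronously.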

Deploy to AIxBoard

[Two images omitted]

postscript

A deep-learning system for recognizing AI-generated face images can be applied in many areas, such as face verification on social media platforms and the identification and prevention of false information. Note, however, that the system still has a nonzero misjudgment rate and other limitations. In practical applications it should be combined with other measures, such as manual review and auxiliary detection techniques, to improve accuracy and reliability.
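One simple way to combine the classifier with manual review is to accept only confident predictions automatically and send the rest to a human. The sketch below assumes the model's softmax probability for the "fake" class is available as a number between 0 and 1. The function name `route_prediction` and the thresholds 0.2 and 0.8 are hypothetical and would need tuning on validation data.

```python
def route_prediction(prob_fake: float, low: float = 0.2, high: float = 0.8) -> str:
    """Route one prediction based on the model's probability for the 'fake' class."""
    if not 0.0 <= prob_fake <= 1.0:
        raise ValueError("prob_fake must be a probability in [0, 1]")
    if prob_fake >= high:
        return "fake"
    if prob_fake <= low:
        return "real"
    # Not confident either way: defer to a human reviewer.
    return "manual_review"


for p in (0.05, 0.55, 0.9992):
    print(p, route_prediction(p))
```

Widening the gap between the two thresholds sends more images to reviewers but reduces the number of automatic mistakes.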

Origin blog.csdn.net/shanglianlm/article/details/132575474