HUAWEI CLOUD ModelArts product experience

Free trial: one-click deployment of a supermarket commodity recognition model

1. Preparation

1.1 Account registration and real-name authentication

Before using ModelArts, make sure you have registered a HUAWEI ID, activated HUAWEI CLOUD, completed real-name authentication, and checked the account status: the account must not be in arrears or frozen.

1.2 Configure delegated access authorization

ModelArts interacts with other services such as OBS, SWR, and IEF. When using ModelArts for the first time, you need to configure delegated authorization so that it can access these dependent services.

1. Log in to the ModelArts management console with your HUAWEI CLOUD account, click "Global Configuration" in the left navigation pane to open the "Global Configuration" page, and click "Add Authorization".


2. On the "Access Authorization" page, select the "User Name" that needs to be authorized, select the new delegation and its corresponding permission "Ordinary User", and check "I have read and agree to the "ModelArts Service Statement"", and then Click Create.


2. Subscribe to the model

The supermarket commodity recognition model is shared in AI Gallery. You can go to AI Gallery and subscribe to it free of charge.

2.1 Find the model in AI Gallery.

Method 1: Click the link to the supermarket commodity recognition model to open the model details page.

Method 2: In the left menu bar of the ModelArts management console, click **"AI Gallery"** to enter AI Gallery. Choose "Asset Mart > Model", search for "Supermarket Commodity Recognition", and click the model name to open the details page.

2.2 Complete the model subscription.

On the model details page, click "Subscribe", read and agree to the "Data Security and Privacy Risk Assumption Terms" and the "HUAWEI CLOUD AI Gallery Service Agreement", and click "Continue Subscribing".

Once the subscription is complete, the "Subscribe" button on the page changes to "Subscribed".


2.3 Open the subscription list in the ModelArts console.

On the model details page, click "Go to console". On the pop-up "Select Cloud Service Region" page, select the cloud service region where your ModelArts service is located and click "OK" to jump to the "AI Application Management > AI Application > My Subscription" page of the ModelArts console.


In the "My Subscriptions" list, click in front of the model name img. When the status of the version list of the subscribed model is displayed as "Ready", it means that the model can be used.


3. Deploy the subscribed model as an online service

After the model is subscribed, it can be deployed as an online service.

1. On the "AI Application Management > AI Application > My Subscription" page, click in front of the model name img, and click "Deployment > Online Service" in the expanded version list to jump to the deployment page.


2. On the deployment page, fill in the key parameters as follows.

  • "Name": Customize the name of an online service, or you can use the default value. Here we take "Supermarket Commodity Identification Service" as an example.
  • Resource Pool: Select Public Resource Pool.
  • "AI Application Source" and "Select AI Application and Version": The subscription model will be automatically selected.
  • "Computing node specification": Select the "Limited Time Free" resource in the drop-down box, check it and read the free specification.
  • Other parameters can use default values.


3. After configuring the parameters, click "Next"; after confirming the specifications, click "Submit" to start deploying the online service.


4. Go to the "Deployment > Online Service" page and wait for the service status to change to "Running", indicating that the service deployment is successful. Estimated duration is about 4 minutes


4. Make predictions

1. After the online service is deployed, click the service name to enter the service details page.

2. On the "Prediction" tab, click "Upload" to upload a test image, and click "Prediction" to view the prediction results. A sample image is provided here for prediction.


3. View the prediction results in the result area on the right.

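If you prefer to script the prediction instead of using the console, the service can also be called over its REST API. The following is a minimal sketch with the requests library; the URL, token, and form field name are placeholders — copy the real API address from the service's "Usage Guides" tab, obtain an IAM token for your account, and check the model's input definition for the expected field name.

import requests

# Placeholders: take the real API address from the service's "Usage Guides" tab
# and obtain an IAM token for your HUAWEI CLOUD account.
url = "<API address of the online service>"
headers = {"X-Auth-Token": "<your IAM token>"}

# "images" is an assumed field name; the model's input definition determines it.
with open("test.jpg", "rb") as f:
    resp = requests.post(url, headers=headers, files={"images": f})
print(resp.json())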

5. Clean up resources

After the trial, it is recommended to stop or delete the service to avoid occupying resources unnecessarily.

  • Stop an online service: in the online service list, click "More > Stop" in the operation column of the corresponding service.
  • Delete an online service: in the online service list, click "More > Delete" in the operation column of the corresponding service.

Run a notebook with one click: counting steel bars on a construction site

1. Preparation

Same as above (omitted).

2. Run Notebook with one click

1. Click the case link "Computer vision-based steel bar number detection" to enter the case details page.

2. Click "Run in ModelArts" on the right side of the details page.

3. The system automatically opens the ModelArts JupyterLab page. If you have not logged in to HUAWEI CLOUD, log in as prompted.

4. After logging in, a prompt in the upper right corner of the page indicates that it is connecting to ModelArts; wait for the connection to complete.

5. In the resource management window on the right, it is recommended to switch to a limited-time free GPU specification for training, which improves training efficiency (a quick way to verify the GPU is active is shown after this list).


6. After the resources are switched, read through the case content and run it.

7. Click "No Kernel" on the right side of the ipynb file, and select the AI ​​framework on the pop-up "Select Kernel" page.

8. Click the run button in the navigation bar repeatedly to execute each step one by one, or run all steps of the case with one click.

While the notebook runs, a hollow circle in the upper right corner means the code has not started or has finished; a solid circle means the code is running.

9. The training in step 9 runs 25 iterations of about 60 seconds each, roughly 25 minutes in total. Please wait patiently.

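To confirm that the notebook is actually using the GPU specification selected in step 5, you can run a quick check in a cell; a minimal sketch, assuming the PyTorch kernel:

import torch

# Prints the active accelerator; a limited-time-free GPU spec should report a device name.
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("Running on CPU; training will be noticeably slower.")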

Garbage classification (image classification with the new version of automatic learning)

1. Preparation

1. Register a HUAWEI ID, activate HUAWEI CLOUD, and complete real-name authentication.

2. Create an OBS bucket

Log in to the OBS management console and click "Create Bucket" in the upper right corner of the bucket list page to create an OBS bucket, for example one named "c-flowers".


When selecting an OBS path in ModelArts, why can't the created OBS bucket be found?

The reason is that the OBS bucket and the ModelArts service you are using are not in the same region.

Note that the bucket's region must be consistent with that of ModelArts.

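If you want to verify a bucket's region programmatically, the OBS Python SDK (esdk-obs-python) can query it. A minimal sketch, assuming AK/SK credentials and the Beijing-4 endpoint — both are placeholders to adjust for your account:

from obs import ObsClient

# Placeholders: your access key, secret key, and the endpoint of the bucket's region.
client = ObsClient(access_key_id='<AK>', secret_access_key='<SK>',
                   server='https://obs.cn-north-4.myhuaweicloud.com')
resp = client.getBucketLocation('c-flowers')  # bucket name from this example
if resp.status < 300:
    print('Bucket region:', resp.body.location)  # must match the ModelArts region
client.close()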

2. Prepare the training dataset

1. Go to AI Gallery, find the "8 types of common household waste" image dataset on the "Datasets" page under "Asset Mart > Data", and click "Download" on the right.


2. Select the corresponding cloud service region, for example North China-Beijing 4. Make sure the region you select matches the region of your management console.


3. Enter the "Download Details" page and fill in the following parameters.

  • Download method: ModelArts dataset.
  • Target region: North China - Beijing IV.
  • Data type: Image.
  • Dataset output location: select an empty directory under your OBS path.
  • Dataset input location: select your OBS path.
  • Name: customize it.


4. After filling in the parameters, click "OK"; the page automatically jumps to the "My Downloads" tab of the AI Gallery personal center. Wait about 5 minutes for the download to complete, then open the "Target Location" to view the dataset's storage location in Object Storage Service (OBS).


3. Configure delegated access authorization

The use of ModelArts involves service interactions such as OBS, SWR, and IEF. When using ModelArts for the first time, users need to configure delegated authorization to allow access to these dependent services.

This was configured earlier, so here you only need to add the existing agency.

4. Create a new-version automatic learning image classification project

1. Go to the ModelArts management console and select "Automatic Learning" in the left navigation pane.

2. On the new-version automatic learning page, click "Image Classification" to create a project, and fill in the parameters:

  • Billing mode: on-demand billing.
  • Name: customize your project name.
  • Description: customize the project description.
  • Dataset: click the drop-down box and select the dataset downloaded from AI Gallery (the drop-down box lists the datasets created under your account in chronological order; select the one you just downloaded from AI Gallery).
  • Output path: select a path under your OBS folder.
  • Training specification: click the drop-down box and select a training specification.


3. After filling in the parameters, click "Create Project"; the page jumps to the running overview page of the new-version automatic learning.


5. Run the workflow

After the project is created, the page automatically jumps to the running overview of the new-version automatic learning, and the workflow automatically starts running from the data labeling node.

1. When the data labeling node turns orange, it is in the "Awaiting operation" state.

2. Click "Continue to run" and the workflow will automatically run from the data annotation node to the service deployment node in sequence. During this period, no user action is required.


3. When the workflow reaches the "Service Deployment" node, its status changes to "Waiting for input" and you need to fill in the input parameters:

  • AI Application: defaults to your automatic learning project name.
  • AI application version: the system default; no need to choose again.
  • Resource pool: "Public resource pool" is selected by default; you can also select a dedicated resource pool as needed.
  • Computing node specifications: select a specification according to your actual needs; different specifications have different costs. This case uses [Free for a limited time] CPU: 1 core 4 GB.
  • Split (100%): defaults to 100.
  • Number of nodes: defaults to 1.
  • Auto stop: to avoid wasting resources, it is recommended to turn this switch on and choose an auto-stop time as needed.


4. After filling in the parameters, click "Continue to run" on the right side of the running status, and click "OK" in the confirmation dialog; the workflow then continues to completion.

6. Predictive Analysis

When the workflow completes, it automatically deploys the corresponding online service; you only need to make predictions on the service details page.

1. Click "Instance Details" on the service deployment node or in the management console, select "Deployment > Online Services" to enter the online service details page.


2. On the service details page, click the "Prediction" tab.


3. Upload an image to be predicted and click "Predict"; the prediction results appear in the display area on the right.


7. Clean up resources

After the prediction is complete, it is recommended to stop the service to avoid unnecessary billing.

1. Stop the running service

Return to the online service list and click "More > Stop" in the operation column of the corresponding service to stop it.


2. Clear data in OBS

  1. In the service list of the console's left navigation pane, select Object Storage Service (OBS) to enter the OBS details page.
  2. Select "Bucket List" in the left navigation pane, find the OBS bucket you created in the list, and enter the bucket details.
  3. On the bucket details page, select "Objects" in the left navigation pane, select the unneeded objects in the "Name" column, and click "Delete" at the top (or choose "More > Delete" in the operation column) to delete them. A scripted alternative is sketched below.
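If many objects are left over, deleting them one by one in the console is tedious; the cleanup can also be scripted with the OBS Python SDK (esdk-obs-python). A minimal sketch, where AK/SK and the endpoint are placeholders:

from obs import ObsClient

# Placeholders: your access key, secret key, and the endpoint of the bucket's region.
client = ObsClient(access_key_id='<AK>', secret_access_key='<SK>',
                   server='https://obs.cn-north-4.myhuaweicloud.com')
bucket = 'c-flowers'  # the bucket created for this example
resp = client.listObjects(bucket)  # returns up to 1,000 objects per call
if resp.status < 300:
    for obj in resp.body.contents:
        client.deleteObject(bucket, obj.key)  # delete each leftover object
client.close()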

Build a model using a custom algorithm (handwritten digit recognition)

1. Preconditions

Before using ModelArts, make sure you have registered a HUAWEI ID, activated HUAWEI CLOUD, and checked the account status: the account must not be in arrears or frozen.

2. Prepare training data

The data used in this case is the MNIST dataset. You can download it from the MNIST official website to your local machine; the following four files must be downloaded. (Access from mainland China may require a proxy.)

Link: https://pan.baidu.com/s/1jhfLx3PRU1c9zqZX74tJmQ?pwd=o261
Extraction code: o261


"train-images-idx3-ubyte.gz": the compressed file of the training set. The training set contains a total of 60,000 samples.
"train-labels-idx1-ubyte.gz": The compressed package file of the training set labels. The training set labels contain a total of 60,000 sample category labels.
"t10k-images-idx3-ubyte.gz": the compressed package file of the verification set. The validation set contains a total of 10,000 samples.
"t10k-labels-idx1-ubyte.gz": The compressed package file of the validation set labels. The validation set label contains a total of 10,000 sample category labels.

3. Prepare training files and inference files

Three files are required: train.py, customize_service.py, and config.json.

Save the files with UTF-8 encoding.
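If a file was saved with a local encoding such as GBK, a small sketch like this re-saves it as UTF-8 ('gbk' is an assumed source encoding; change it to whatever your editor used):

# Re-save the scripts as UTF-8 before uploading.
for name in ("train.py", "customize_service.py", "config.json"):
    with open(name, "r", encoding="gbk") as f:  # assumed source encoding
        text = f.read()
    with open(name, "w", encoding="utf-8") as f:
        f.write(text)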

train.py

# based on https://github.com/pytorch/examples/blob/main/mnist/main.py

from __future__ import print_function

import os
import gzip
import lzma
import codecs
import argparse
from typing import IO, Union

import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms    
from torch.optim.lr_scheduler import StepLR

import shutil


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output


# Model training: set the model to training mode, load the training data, compute the loss, and run gradient descent
def train(args, model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
            if args.dry_run:
                break


# Model validation: set the model to eval mode, load the validation data, and compute the loss and accuracy
def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


# pytorch mnist
# https://github.com/pytorch/vision/blob/v0.9.0/torchvision/datasets/mnist.py
def get_int(b: bytes) -> int:
    return int(codecs.encode(b, 'hex'), 16)


def open_maybe_compressed_file(path: Union[str, IO]) -> Union[IO, gzip.GzipFile]:
    """Return a file object that possibly decompresses 'path' on the fly.
       Decompression occurs when argument `path` is a string and ends with '.gz' or '.xz'.
    """
    if not isinstance(path, str):
        return path
    if path.endswith('.gz'):
        return gzip.open(path, 'rb')
    if path.endswith('.xz'):
        return lzma.open(path, 'rb')
    return open(path, 'rb')


SN3_PASCALVINCENT_TYPEMAP = {
    8: (torch.uint8, np.uint8, np.uint8),
    9: (torch.int8, np.int8, np.int8),
    11: (torch.int16, np.dtype('>i2'), 'i2'),
    12: (torch.int32, np.dtype('>i4'), 'i4'),
    13: (torch.float32, np.dtype('>f4'), 'f4'),
    14: (torch.float64, np.dtype('>f8'), 'f8')
}


def read_sn3_pascalvincent_tensor(path: Union[str, IO], strict: bool = True) -> torch.Tensor:
    """Read a SN3 file in "Pascal Vincent" format (Lush file 'libidx/idx-io.lsh').
       Argument may be a filename, compressed filename, or file object.
    """
    # read
    with open_maybe_compressed_file(path) as f:
        data = f.read()
    # parse
    magic = get_int(data[0:4])
    nd = magic % 256
    ty = magic // 256
    assert 1 <= nd <= 3
    assert 8 <= ty <= 14
    m = SN3_PASCALVINCENT_TYPEMAP[ty]
    s = [get_int(data[4 * (i + 1): 4 * (i + 2)]) for i in range(nd)]
    parsed = np.frombuffer(data, dtype=m[1], offset=(4 * (nd + 1)))
    assert parsed.shape[0] == np.prod(s) or not strict
    return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)


def read_label_file(path: str) -> torch.Tensor:
    with open(path, 'rb') as f:
        x = read_sn3_pascalvincent_tensor(f, strict=False)
    assert(x.dtype == torch.uint8)
    assert(x.ndimension() == 1)
    return x.long()


def read_image_file(path: str) -> torch.Tensor:
    with open(path, 'rb') as f:
        x = read_sn3_pascalvincent_tensor(f, strict=False)
    assert(x.dtype == torch.uint8)
    assert(x.ndimension() == 3)
    return x


def extract_archive(from_path, to_path):
    to_path = os.path.join(to_path, os.path.splitext(os.path.basename(from_path))[0])
    with open(to_path, "wb") as out_f, gzip.GzipFile(from_path) as zip_f:
        out_f.write(zip_f.read())
# --- pytorch mnist
# --- end


# Raw MNIST data processing
def convert_raw_mnist_dataset_to_pytorch_mnist_dataset(data_url):
    """
    raw

    {data_url}/
        train-images-idx3-ubyte.gz
        train-labels-idx1-ubyte.gz
        t10k-images-idx3-ubyte.gz
        t10k-labels-idx1-ubyte.gz

    processed

    {data_url}/
        train-images-idx3-ubyte.gz
        train-labels-idx1-ubyte.gz
        t10k-images-idx3-ubyte.gz
        t10k-labels-idx1-ubyte.gz
        MNIST/raw
            train-images-idx3-ubyte
            train-labels-idx1-ubyte
            t10k-images-idx3-ubyte
            t10k-labels-idx1-ubyte
        MNIST/processed
            training.pt
            test.pt
    """
    resources = [
        "train-images-idx3-ubyte.gz",
        "train-labels-idx1-ubyte.gz",
        "t10k-images-idx3-ubyte.gz",
        "t10k-labels-idx1-ubyte.gz"
    ]

    pytorch_mnist_dataset = os.path.join(data_url, 'MNIST')

    raw_folder = os.path.join(pytorch_mnist_dataset, 'raw')
    processed_folder = os.path.join(pytorch_mnist_dataset, 'processed')

    os.makedirs(raw_folder, exist_ok=True)
    os.makedirs(processed_folder, exist_ok=True)

    print('Processing...')

    for f in resources:
        extract_archive(os.path.join(data_url, f), raw_folder)

    training_set = (
        read_image_file(os.path.join(raw_folder, 'train-images-idx3-ubyte')),
        read_label_file(os.path.join(raw_folder, 'train-labels-idx1-ubyte'))
    )
    test_set = (
        read_image_file(os.path.join(raw_folder, 't10k-images-idx3-ubyte')),
        read_label_file(os.path.join(raw_folder, 't10k-labels-idx1-ubyte'))
    )
    with open(os.path.join(processed_folder, 'training.pt'), 'wb') as f:
        torch.save(training_set, f)
    with open(os.path.join(processed_folder, 'test.pt'), 'wb') as f:
        torch.save(test_set, f)

    print('Done!')


def main():
    # Define the runtime arguments the training job can receive
    parser = argparse.ArgumentParser(description='PyTorch MNIST Example')

    parser.add_argument('--data_url', type=str, default=False,
                        help='mnist dataset path')
    parser.add_argument('--train_url', type=str, default=False,
                        help='mnist model path')

    parser.add_argument('--batch-size', type=int, default=64, metavar='N',
                        help='input batch size for training (default: 64)')
    parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                        help='input batch size for testing (default: 1000)')
    parser.add_argument('--epochs', type=int, default=14, metavar='N',
                        help='number of epochs to train (default: 14)')
    parser.add_argument('--lr', type=float, default=1.0, metavar='LR',
                        help='learning rate (default: 1.0)')
    parser.add_argument('--gamma', type=float, default=0.7, metavar='M',
                        help='Learning rate step gamma (default: 0.7)')
    parser.add_argument('--no-cuda', action='store_true', default=False,
                        help='disables CUDA training')
    parser.add_argument('--dry-run', action='store_true', default=False,
                        help='quickly check a single pass')
    parser.add_argument('--seed', type=int, default=1, metavar='S',
                        help='random seed (default: 1)')
    parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                        help='how many batches to wait before logging training status')
    parser.add_argument('--save-model', action='store_true', default=True,
                        help='For Saving the current Model')
    args = parser.parse_args()

    use_cuda = not args.no_cuda and torch.cuda.is_available()

    torch.manual_seed(args.seed)

    # Choose whether to run the algorithm on GPU or CPU
    device = torch.device("cuda" if use_cuda else "cpu")

    train_kwargs = {'batch_size': args.batch_size}
    test_kwargs = {'batch_size': args.test_batch_size}
    if use_cuda:
        cuda_kwargs = {'num_workers': 1,
                       'pin_memory': True,
                       'shuffle': True}
        train_kwargs.update(cuda_kwargs)
        test_kwargs.update(cuda_kwargs)

    # Define the data preprocessing method
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
        ])

    # Convert the raw MNIST dataset into the PyTorch MNIST layout
    convert_raw_mnist_dataset_to_pytorch_mnist_dataset(args.data_url)

    # Create the training and validation datasets
    dataset1 = datasets.MNIST(args.data_url, train=True, download=False,
                       transform=transform)
    dataset2 = datasets.MNIST(args.data_url, train=False, download=False,
                       transform=transform)

    # Build the training and validation data loaders
    train_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
    test_loader = torch.utils.data.DataLoader(dataset2, **test_kwargs)

    # Initialize the network and move it to the compute device
    model = Net().to(device)
    # Define the optimizer and learning-rate schedule used for gradient descent
    optimizer = optim.Adadelta(model.parameters(), lr=args.lr)
    scheduler = StepLR(optimizer, step_size=1, gamma=args.gamma)

    # Train the network, validating once per epoch
    for epoch in range(1, args.epochs + 1):
        train(args, model, device, train_loader, optimizer, epoch)
        test(model, device, test_loader)
        scheduler.step()

    # Save the model following the ModelArts inference model package convention
    if args.save_model:

        # Create a model directory under the path given by the train_url argument
        model_path = os.path.join(args.train_url, 'model')
        os.makedirs(model_path, exist_ok = True)

        # Save the model into the model directory per the ModelArts packaging convention
        torch.save(model.state_dict(), os.path.join(model_path, 'mnist_cnn.pt'))

        # Copy the inference code and config file into the model directory
        the_path_of_current_file = os.path.dirname(__file__)
        shutil.copyfile(os.path.join(the_path_of_current_file, 'infer/customize_service.py'), os.path.join(model_path, 'customize_service.py'))
        shutil.copyfile(os.path.join(the_path_of_current_file, 'infer/config.json'), os.path.join(model_path, 'config.json'))

if __name__ == '__main__':
    main()
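Before uploading, you can give train.py a quick local smoke test; a sketch assuming PyTorch and torchvision are installed, the four .gz files sit in ./data, and an infer/ directory with the two inference files sits next to the script (train.py copies them when saving the model):

import subprocess
import sys

# Run one abbreviated epoch locally; --dry-run stops after the first logged batch.
subprocess.run([sys.executable, "train.py",
                "--data_url", "./data",
                "--train_url", "./output",
                "--epochs", "1",
                "--dry-run"], check=True)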

customize_service.py

import os
import log
import json

import torch.nn.functional as F
import torch.nn as nn
import torch
import torchvision.transforms as transforms

import numpy as np
from PIL import Image

from model_service.pytorch_model_service import PTServingBaseService

logger = log.getLogger(__name__)

# Define inference-time preprocessing
infer_transformation = transforms.Compose([
    transforms.Resize(28),
    transforms.CenterCrop(28),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])


class PTVisionService(PTServingBaseService):

    def __init__(self, model_name, model_path):
        # Call the parent constructor
        super(PTVisionService, self).__init__(model_name, model_path)

        # Load the model via the custom helper below
        self.model = Mnist(model_path)

        # Load the labels
        self.label = [0,1,2,3,4,5,6,7,8,9]

    def _preprocess(self, data):
        preprocessed_data = {}
        for k, v in data.items():
            input_batch = []
            for file_name, file_content in v.items():
                with Image.open(file_content) as image1:
                    # Convert to grayscale
                    image1 = image1.convert("L")
                    if torch.cuda.is_available():
                        input_batch.append(infer_transformation(image1).cuda())
                    else:
                        input_batch.append(infer_transformation(image1))
            # Variable/volatile is deprecated; a plain stacked tensor suffices for inference
            input_batch_var = torch.stack(input_batch, dim=0)
            print(input_batch_var.shape)
            preprocessed_data[k] = input_batch_var

        return preprocessed_data

    def _postprocess(self, data):
        results = []
        for k, v in data.items():
            result = torch.argmax(v[0])
            result = {k: self.label[result]}
            results.append(result)
        return results

    def _inference(self, data):

        result = {}
        for k, v in data.items():
            result[k] = self.model(v)

        return result


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output


def Mnist(model_path, **kwargs):
    # Build the network
    model = Net()

    # Load the trained weights
    if torch.cuda.is_available():
        device = torch.device('cuda')
        model.load_state_dict(torch.load(model_path, map_location="cuda:0"))
    else:
        device = torch.device('cpu')
        model.load_state_dict(torch.load(model_path, map_location=device))

    # Map the model to CPU or GPU
    model.to(device)

    # Switch to inference mode
    model.eval()

    return model

config.json

{
    "model_algorithm": "image_classification",
    "model_type": "PyTorch",
    "runtime": "pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64"
}

4. Create an OBS bucket and upload files

Upload the training data, the training code file, the inference code file, and the inference configuration file to an OBS bucket. When a training job runs on ModelArts, it reads data and code files from the OBS bucket.

1. Log in to the OBS management console, and create an OBS bucket and folders following the example below.

{OBS bucket}                  # OBS bucket; custom name, e.g. test-modelarts-xx
    - {OBS folder}            # OBS folder; custom name, here "pytorch"
        - mnist-data          # OBS folder for the training dataset; custom name, here "mnist-data"
        - mnist-code          # OBS folder for the training script train.py; custom name, here "mnist-code"
            - infer           # OBS folder for the inference script customize_service.py and the config file config.json
        - mnist-output        # OBS folder for the training output model; custom name, here "mnist-output"


2. Upload the dataset to the "mnist-data" folder.


3. Upload the training script "train.py" to the "mnist-code" folder.

4. Upload the inference script "customize_service.py" and the inference configuration file "config.json" to the "infer" folder. A scripted upload alternative is sketched below.
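The uploads can also be scripted with the OBS Python SDK (esdk-obs-python); a minimal sketch where AK/SK, the endpoint, and the bucket name are placeholders matching the example layout above:

from obs import ObsClient

# Placeholders: your access key, secret key, and the endpoint of the bucket's region.
client = ObsClient(access_key_id='<AK>', secret_access_key='<SK>',
                   server='https://obs.cn-north-4.myhuaweicloud.com')
bucket = 'test-modelarts-xx'  # your bucket name

# Upload the code and inference files into the layout described above.
client.putFile(bucket, 'pytorch/mnist-code/train.py', 'train.py')
client.putFile(bucket, 'pytorch/mnist-code/infer/customize_service.py', 'infer/customize_service.py')
client.putFile(bucket, 'pytorch/mnist-code/infer/config.json', 'infer/config.json')

# Upload the four dataset archives into mnist-data.
for name in ('train-images-idx3-ubyte.gz', 'train-labels-idx1-ubyte.gz',
             't10k-images-idx3-ubyte.gz', 't10k-labels-idx1-ubyte.gz'):
    client.putFile(bucket, 'pytorch/mnist-data/' + name, name)
client.close()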

5. Create a training job

1. Log in to the ModelArts management console and select the same region as the OBS bucket.

2. In "Global Configuration", check whether the current account has completed the configuration of access authorization. If not, please refer to Using Delegated Authorization **. ** For users who have previously authorized with an access key, it is recommended to clear the authorization and then use delegation to authorize.

3. In "Training Management" -> "Training Job" in the left navigation bar, click "Create Training Job". Fill in the relevant information for creating a training job.


  • Creation method: select "Custom Algorithm".
  • Startup method: select "Preset Framework", and in the drop-down boxes choose PyTorch and pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64.
  • Code directory: select the created code directory path "/test-modelarts-xx/pytorch/mnist-code/".
  • Startup file: select the training script "train.py" uploaded to the code directory.
  • Input: click "Add" and set the **"parameter name"** of the training input to "data_url" (passed to train.py as its --data_url argument). Set the data storage location to "/test-modelarts-xx/pytorch/mnist-data/".
  • Output: click "Add" and set the **"parameter name"** of the training output to "train_url" (passed to train.py as --train_url). Set the data storage location to "/test-modelarts-xx/pytorch/mnist-output/".
  • Resource type: select a single-GPU specification, such as "GPU: 1*NVIDIA-V100(16GB) | CPU: 8 cores 64GB 780GB".


4. Click "Submit" to confirm the parameter information of the training job, and click "OK" after confirmation

5. The page automatically returns to the "Training Job" list. When the job status changes to "Completed", model training is finished.


6. Click the training job name to enter the job details page and check the logs for obvious Error messages. If any appear, the training failed; locate and fix the cause based on the log hints.

7. Click the training output path at the bottom left of the details page to jump to the OBS directory, and check whether a model folder exists and contains the trained model. If the model folder or the trained model was not generated, the training input data may be incomplete; check that the uploaded training data is complete and retrain.


6. Inference deployment

After model training is complete, you can create an AI application and deploy it as an online service.

1. In the ModelArts management console, click "AI Application Management > AI Application" in the left navigation pane to enter the "My AI Application" page, and click "Create".

2. On the "Create AI Application" page, fill in the relevant parameters, and then click "Create Now".


On the AI application list page, when the AI application status changes to "Normal", the AI application has been created successfully.


3. Click the small triangle to the left of the AI application name to expand all versions under it. In the row of the target version, click "Deploy > Online Service" in the operation column to deploy the AI application as an online service.


4. On the "Deployment" page, fill in the parameters with reference to the figure below, and then follow the interface prompts to complete the online service creation. This case applies to the CPU specification, and the node specification needs to select the CPU


5. Return to the online service list page and wait for the deployment to complete. When the service status is "Running", the service has been deployed successfully.


7. Make predictions

1. On the "Online Services" page, click the online service name to enter the service details page.

2. Click the "Prediction" tab, select "multipart/form-data" as the request type, fill in "image" as the request parameter, click the "Upload" button to upload a sample image, and then click "Prediction".

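A minimal scripted version with requests, where the API address and IAM token are placeholders taken from the service details page and your account, and "image" matches the request parameter configured above:

import requests

url = "<API address of the online service>"       # placeholder
headers = {"X-Auth-Token": "<your IAM token>"}    # placeholder

# multipart/form-data upload; the field name "image" matches the request parameter.
with open("digit.png", "rb") as f:
    resp = requests.post(url, headers=headers, files={"image": f})
print(resp.text)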

8. Clean up resources

If you no longer need the model and online service, it is recommended to clear the related resources to avoid unnecessary costs.

On the "Online Services" page, "Stop" or "Delete" the online service you just created.

On the "AI Application Management" page, "Delete" the newly created AI application.

On the "Training Jobs" page, "Delete" the finished training job.

Enter OBS and delete the OBS bucket and folders used in this example, together with the files in them.
