Configuring a PyTorch (GPU) analysis environment

PyTorch is currently one of the hottest deep learning frameworks, the other being TensorFlow. Until now, though, I had only been using the CPU version. A few months ago I bought a 3070 Ti laptop (yes, I bought a 30-series just as the 40-series cards came out, which is a bit embarrassing), and I also have an M1-chip MacBook Pro, which now supports PyTorch GPU acceleration as well. So I thought: install PyTorch on both machines and dabble in a little deep learning.

Apple silicon

The first is the M1 chip, which is very simple. Start by installing Mambaforge, a conda distribution that ships with the mamba package manager and has the conda-forge channel preconfigured; pick the arm64 version.

# Download
wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-MacOSX-arm64.sh
# Install
bash Mambaforge-MacOSX-arm64.sh

Then we create an environment with mamba. Since we want the development build of PyTorch, the channel is set to pytorch-nightly:

mamba create -n pytorch \
   jupyterlab jupyterhub pytorch torchvision torchaudio \
   -c pytorch-nightly

Finally, activate the environment with conda activate pytorch, and then test whether the GPU is correctly recognized:

import torch
torch.has_mps
# True
# configure the device
device = torch.device("mps")
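
Beyond the has_mps flag, it is worth running a tiny computation to confirm the backend really works. A minimal sketch, assuming a build recent enough to expose torch.backends.mps:

import torch

# More explicit checks than torch.has_mps on recent builds
print(torch.backends.mps.is_available())  # the MPS backend can be used right now
print(torch.backends.mps.is_built())      # this build was compiled with MPS support

# Run a small matrix multiply on the GPU as a smoke test
device = torch.device("mps")
x = torch.rand(3, 3, device=device)
print((x @ x).device)  # mps:0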

References: https://developer.apple.com/metal/pytorch/

Windows NVIDIA

First of all, make sure your computer has an NVIDIA graphics card with the corresponding CUDA driver installed.

CUDA graphics card architecture requirements: https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html

Recent machines generally come with the CUDA driver already installed. Open System Information from the NVIDIA Control Panel and check the installed CUDA driver version under Components; mine, for example, is 11.7.89.

You can also check it from the command line.
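
If the driver is installed, nvidia-smi prints the driver version and, in its header line, the highest CUDA version that driver supports:

nvidia-smi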

Next, let's install PyTorch. The conda route is again recommended, so we first download Miniconda, for example from the Tsinghua mirror.

Select the Windows installation package

After the installation completes, open a command line via the Anaconda Prompt and install PyTorch following the recommendation on the PyTorch website.

One difference, though: to avoid environment conflicts it is best to create a separate environment, so the command is as follows

conda create -n pytorch pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

Then start the environment with conda activate pytorch, and test in the Python interpreter

import torch
torch.has_cuda
# True
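
You can also query a bit more detail; these are all standard torch calls:

import torch

print(torch.cuda.is_available())      # True if a usable CUDA device is present
print(torch.cuda.device_count())      # number of visible GPUs
print(torch.cuda.get_device_name(0))  # the GPU model name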

A few common questions (at least the ones I thought of while writing the article):

Q: What is the difference between installing with conda and with pip?

A: conda is the installation method recommended on the PyTorch website, because conda also pulls in the CUDA runtime libraries and related toolkits that PyTorch needs (the libraries, that is, not the GPU driver itself). The trade-off is that a conda installation takes up somewhat more disk space.
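
For reference, the pip equivalent generated by the selector on the PyTorch website for CUDA 11.7 looks roughly like this (the exact index URL depends on the CUDA version you choose):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117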

Q: Can the latest PyTorch be installed on very old hardware?

A: I think it is like installing a game: you may be able to install it, but if the machine does not meet the minimum requirements, you still cannot run it.

Q: Do I have to install the CUDA driver and the CUDA Toolkit on my computer?

A: Honestly, I am not entirely sure how to answer; the following is my current understanding. If you use conda, it will resolve most of these dependencies for you; if you use pip, you need to configure more yourself. The CUDA driver is mandatory in any case: it is what connects the GPU hardware to the operating system, and without it the system will not recognize the CUDA cores at all, as if no NVIDIA card were installed. The CUDA Toolkit, by contrast, is a collection of development tools for programming the CUDA cores, and installing it also installs a CUDA driver. Unless you want to do low-level development, or you need to compile PyTorch from source, you do not need to install the CUDA Toolkit.

Q: What if the CUDA driver on my computer is older? For example, my driver supports CUDA 11.7 but I installed PyTorch built for cuda=11.8; what happens when the versions differ?

A: When we install PyTorch with cuda=11.7, we are essentially installing a PyTorch binary compiled against CUDA Toolkit 11.7. As long as the version gap is not particularly large, and the newer version is not a breaking upgrade, it should in theory still run.
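
To see which CUDA version your installed PyTorch was actually compiled against (as opposed to the driver version that nvidia-smi reports), you can run:

import torch

# The CUDA version this build was compiled against, not the installed driver version
print(torch.version.cuda)  # e.g. '11.7'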

Handwritten digit performance test

Below I use GPT-3.5 to generate a handwritten digit recognition (MNIST) example, which we can use to compare the speed of the different platforms.

import torch
import torchvision
import torchvision.transforms as transforms

# Convert the data format and load the datasets
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5,), (0.5,))])

trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                      download=True, transform=transform)
# Note: on Windows, num_workers > 0 requires the training code to run under
# an `if __name__ == '__main__':` guard when executed as a script
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.MNIST(root='./data', train=False,
                                     download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64,
                                         shuffle=False, num_workers=2)

# Define the network model
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Conv2d(1, 6, 5)
        self.pool = torch.nn.MaxPool2d(2, 2)
        self.conv2 = torch.nn.Conv2d(6, 16, 5)
        self.fc1 = torch.nn.Linear(16 * 4 * 4, 120)
        self.fc2 = torch.nn.Linear(120, 84)
        self.fc3 = torch.nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(torch.nn.functional.relu(self.conv1(x)))
        x = self.pool(torch.nn.functional.relu(self.conv2(x)))
        x = x.view(-1, 16 * 4 * 4)
        x = torch.nn.functional.relu(self.fc1(x))
        x = torch.nn.functional.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

# The code here is ad hoc: use whichever device line matches your platform
# CPU
device = torch.device("cpu")
# CUDA
device = torch.device("cuda:0")
# MPS
device = torch.device("mps")
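
# Alternatively (a sketch, assuming a build that exposes torch.backends.mps),
# pick the best available backend automatically:
# device = (torch.device("cuda:0") if torch.cuda.is_available()
#           else torch.device("mps") if torch.backends.mps.is_available()
#           else torch.device("cpu"))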

net.to(device)

# Define the loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# Train the network

import time

start_time = time.time()  # record the start time

for epoch in range(10):  # train for 10 epochs
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data[0].to(device), data[1].to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 100 == 99:
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 100))
            running_loss = 0.0

end_time = time.time()  # record the end time
training_time = end_time - start_time  # compute the training time

print('Training took %.2f seconds.' % training_time)

print('Finished Training')

# Test the network
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
      100 * correct / total))
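
One caveat about timing this way: CUDA (and MPS) kernels launch asynchronously, so time.time() may stop the clock before the GPU has actually finished its queued work. A minimal sketch of a safer measurement; note that torch.mps.synchronize() only exists in newer PyTorch builds:

import time
import torch

def timed(fn, device):
    # Run fn() and return wall-clock seconds, waiting for the GPU to finish first
    start = time.time()
    fn()
    if device.type == "cuda":
        torch.cuda.synchronize()  # block until all queued CUDA kernels complete
    elif device.type == "mps":
        # guarded because torch.mps.synchronize() is absent in older builds
        if hasattr(torch, "mps") and hasattr(torch.mps, "synchronize"):
            torch.mps.synchronize()
    return time.time() - start

# e.g. elapsed = timed(lambda: net(inputs), device)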

The final results are

Windows platform

  • 3070 Ti (GPU): training took 45.02 seconds.
  • i9-12900H (CPU): training took 75.65 seconds.

Mac platform

  • M1 Max (GPU): training took 50.79 seconds.
  • M1 Max (CPU): training took 109.61 seconds.

Overall, GPU acceleration is noticeable both on the Mac and on Windows. As for comparing the GPUs themselves, the M1 Max chip comes in roughly 10% slower than the 3070 Ti.

However, these tests all use a small dataset. After I have studied a while longer, I will try a large dataset and see how it goes.

Original post: https://blog.csdn.net/u012110870/article/details/129966685