PyTorch-03 CUDA Explained: Why Does Deep Learning Use GPUs?

Why deep learning and neural networks use GPUs

The purpose of this article is to help beginners understand what CUDA is, how it works with PyTorch, and, more importantly, why we use GPUs in neural network programming in the first place.


Graphics Processing Unit (GPU)

To understand CUDA, we need some working knowledge of the graphics processing unit (GPU). A GPU is a processor that is good at handling specialized computations.

This is in contrast to the central processing unit (CPU), which is a processor that is good at handling general computations. The CPU is the processor that powers most of the typical computations on our electronic devices.

A GPU can be much faster at computing than a CPU, but this is not always the case. The speed of a GPU relative to a CPU depends on the type of computation being performed. The type of computation best suited to a GPU is one that can be done in parallel.

Parallel Computing

Parallel computing is a type of computation in which a particular computation is broken down into independent, smaller computations that can be carried out simultaneously. The resulting computations are then recombined, or synchronized, to form the result of the original larger computation.
The number of smaller tasks a larger task can be broken into depends on the number of cores on the particular hardware. A core is the unit that actually performs a computation within a given processor. CPUs typically have four, eight, or sixteen cores, while GPUs can have thousands.

There are other important technical specifications, but this description is meant to convey the general idea.

With this working knowledge, we can conclude that parallel computing is done with GPUs, and that the tasks best suited to being solved with a GPU are tasks that can be done in parallel. If a computation can be done in parallel, we can accelerate it using parallel programming approaches and GPUs.
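As a tiny illustration (a sketch added here, not part of the original article), element-wise addition of two tensors is such a computation: each output element depends on exactly one pair of input elements, so all of them could in principle be computed at the same time.

> import torch
> a = torch.tensor([1., 2., 3., 4.])
> b = torch.tensor([10., 20., 30., 40.])
> a + b   # each a[i] + b[i] is independent of every other element
tensor([11., 22., 33., 44.])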

Neural networks: parallel computing

Let us now turn our attention to neural networks and see why GPUs are used so heavily in deep learning. We have just seen that GPUs are well suited for parallel computing, and this fact about GPUs is why deep learning uses them.

In parallel computing, an embarrassingly parallel task is one that requires little or no effort to split the overall task into a set of smaller tasks that can be computed in parallel. Tasks that are embarrassingly parallel are ones for which it is easy to see that the smaller tasks are independent of one another.


This is why neural networks are embarrassingly parallel. Many of the computations we perform on neural networks can easily be broken down into smaller computations that do not depend on one another. Convolution is one such example.

Convolution example

Let us look at an example, the convolution operation:

[Animation: a 3 x 3 convolution filter sliding across an input channel to produce an output channel]

This animation shows the convolution process without numbers. We have an input channel in blue on the bottom, a shaded convolution filter sliding across the input channel, and a green output channel:

  1. Blue (bottom) - input channel
  2. Shaded (on top of the blue) - 3 x 3 convolution filter
  3. Green (top) - output channel

For each position on the blue input channel, the 3 x 3 filter performs a computation that maps the shaded part of the blue input channel to the corresponding shaded part of the green output channel.

In the animation, these computations happen sequentially, one after the other. However, each computation is independent of the others, meaning that none of them depends on the results of any other computation.

As a result, all of these independent computations can happen in parallel on a GPU, and the entire output channel can be produced at once. This shows us that the convolution operation can be accelerated using a parallel programming approach and a GPU.
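As a rough sketch (illustrative code, not from the original article), the same idea can be expressed with PyTorch's conv2d: the call is identical on the CPU and, when an Nvidia GPU is available, on the GPU, where the independent per-position computations can run in parallel.

> import torch
> import torch.nn.functional as F
> image = torch.rand(1, 1, 5, 5)    # one 5 x 5 input channel
> kernel = torch.rand(1, 1, 3, 3)   # one 3 x 3 convolution filter
> F.conv2d(image, kernel).shape     # each output position is computed independently
torch.Size([1, 1, 3, 3])
> if torch.cuda.is_available():
...     out = F.conv2d(image.cuda(), kernel.cuda())   # same operation, run on the GPU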

Nvidia hardware (GPU) and software (CUDA)

This is where CUDA comes into the picture. Nvidia is a technology company that designs GPUs, and it created CUDA as a software platform that pairs with its GPU hardware, making it easier for developers to build software that uses the parallel processing power of Nvidia GPUs to accelerate computations.

An Nvidia GPU is the hardware that enables parallel computing, while CUDA is the software layer that provides an API for developers.

As a result, you might have guessed that an Nvidia GPU is required to use CUDA, and that CUDA can be downloaded and installed from Nvidia's website for free.

Developers use CUDA by downloading the CUDA toolkit. The toolkit comes with specialized libraries such as cuDNN, the CUDA deep neural network library.


PyTorch comes with CUDA

One of the benefits of using PyTorch, or any other neural network API, is that parallelism comes baked into the API. This means that, as neural network programmers, we can focus more on building neural networks and less on performance issues.

With PyTorch, CUDA support is there from the start; no additional downloads are required. All we need is a supported Nvidia GPU, and we can take advantage of CUDA through PyTorch without using the CUDA API directly.
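For example (a quick check added for illustration; the version numbers shown below are placeholders and depend entirely on your installation), we can ask PyTorch which CUDA and cuDNN versions it was built with and whether a GPU is currently visible:

> import torch
> torch.version.cuda                # CUDA version PyTorch was built against
'10.2'
> torch.backends.cudnn.version()    # bundled cuDNN version
7605
> torch.cuda.is_available()         # True if a supported Nvidia GPU is present
True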

Now, if we wanted to work on the PyTorch core development team or write PyTorch extensions, it would definitely be useful to know how to use CUDA directly.

After all, PyTorch is written in all of the following:

  1. Python
  2. C++
  3. CUDA

Using CUDA in PyTorch

Using CUDA in PyTorch is very easy. If we want a particular computation to be performed on the GPU, we can instruct PyTorch to do so by calling cuda() on the data structure (a tensor).

Suppose we have the following code:

> t = torch.tensor([1,2,3])
> t
tensor([1, 2, 3])

By default, tensor objects created this way live on the CPU. As a result, any operations we perform with this tensor will be carried out on the CPU.

Now, to move the tensor onto the GPU, we just write:

> t = t.cuda()
> t
tensor([1, 2, 3], device='cuda:0')
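A common, slightly more flexible pattern (shown here as an illustrative sketch rather than part of the original walkthrough) is to choose the device once and move tensors with to(), falling back to the CPU when no GPU is available:

> device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
> t = torch.tensor([1, 2, 3]).to(device)   # lands on the GPU if one is available
> t = t.cpu()                              # moves the tensor back to the CPU

Writing code this way lets the same script run unchanged on machines with or without a GPU.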

This ability to selectively carry out computations on either the CPU or the GPU makes PyTorch very versatile.

GPU may be slower than CPU

We said that we can selectively run our computations on the GPU or the CPU, but why not just run every computation on the GPU?

Isn’t the GPU faster than the CPU?

The answer is that a GPU is only faster for particular (specialized) tasks. One issue we can run into is bottlenecks that reduce performance. For example, moving data from the CPU to the GPU is costly, so if the computation task is simple, the overall program may end up being slower.

Moving relatively small computational tasks to the GPU won't speed us up much, and may indeed slow us down. Remember, the GPU works well for tasks that can be broken into many smaller tasks; if a compute task is already small, we won't gain much by moving it to the GPU.
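As a rough, hedged illustration (a sketch that assumes a CUDA-capable machine; actual timings vary widely with hardware, and the first GPU call also pays one-time startup costs), a tiny matrix multiplication often sees no benefit from the GPU, while a large one usually does:

import time
import torch

def timed_matmul(n, device):
    # create two n x n matrices on the given device and time their product
    a = torch.rand(n, n, device=device)
    b = torch.rand(n, n, device=device)
    if device.type == 'cuda':
        torch.cuda.synchronize()   # make sure setup work has finished before timing
    start = time.time()
    a @ b
    if device.type == 'cuda':
        torch.cuda.synchronize()   # wait until the GPU kernel actually finishes
    return time.time() - start

cpu = torch.device('cpu')
if torch.cuda.is_available():
    gpu = torch.device('cuda')
    print(timed_matmul(64, cpu), timed_matmul(64, gpu))       # tiny task: GPU often no faster
    print(timed_matmul(4096, cpu), timed_matmul(4096, gpu))   # large task: GPU usually wins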

For this reason, it is often perfectly acceptable to simply use the CPU when starting out, and to lean on the GPU more heavily as we tackle larger and more complicated problems.

GPGPU computing

In the beginning, the main tasks accelerated with GPUs were computer graphics, hence the name "graphics processing unit". In recent years, however, many more varieties of parallel tasks have emerged. One such task, as we have seen, is deep learning.

Deep learning, along with many other scientific computing tasks that use parallel programming techniques, has given rise to a new programming model called GPGPU, or general-purpose GPU computing.

It is now increasingly common to perform a wide variety of tasks on a GPU, and GPGPU computing is often simply referred to as GPU computing or accelerated computing.

Nvidia has been a pioneer in this space. Nvidia refers to general-purpose GPU computing simply as GPU computing. Nvidia's CEO, Jensen Huang, envisioned GPU computing very early on, which is why CUDA was created nearly ten years ago.

Even though CUDA has been around for a long time, it is only now beginning to really take off, and Nvidia's work on CUDA up to this point is why Nvidia has been leading the way in GPU computing for deep learning.

When we hear Jensen talk about the GPU computing stack, he is referring to the GPU as the hardware on the bottom, CUDA as the software architecture on top of the GPU, and finally libraries like cuDNN on top of CUDA.

This GPU computing stack is what supports general-purpose computing capabilities on a chip that is otherwise very specialized. We often see stacks like this in computer science, because technology is built in layers, just like a neural network.

Sitting on top of CUDA and cuDNN is PyTorch, the framework we will be working with, which ultimately supports the applications on top.

Going deeper into GPU computing and CUDA than this is beyond what we need. We will be working near the top of the stack with PyTorch.


Tensors Are Up Next

We are now ready to jump into the second part of this neural network programming series, which is all about tensors.
I hope you found this article useful. We should now have a good understanding of why GPUs are used for neural network programming. For the first part of this series, we will be using the CPU. We are now ready to start working with torch.Tensor objects and building our first neural network.
