How to speed up NumPy 700x? Use CuPy

As an extension library for the Python language, NumPy supports operations on large, multi-dimensional arrays and matrices, and it has been a huge help to the Python community. With NumPy, data scientists, machine learning practitioners, and statisticians can process large amounts of matrix data simply and efficiently. So can NumPy itself be made even faster? This article describes how to use the CuPy library to accelerate NumPy processing.

Selected from towardsdatascience. Author: George Seif. Compiled by Heart of the Machine, with contributions from Turre and Zhang Qian.

On its own, NumPy already offers a big speedup over plain Python. When you find that Python code runs slowly, especially when it contains a lot of for-loops, it usually pays to move the data processing into NumPy and let vectorization do the work at maximum speed.
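As a minimal, hypothetical illustration of that point (not from the original article; the data and sizes are arbitrary), the sketch below times a pure-Python for-loop against the equivalent vectorized NumPy call:

import time
import numpy as np

data = list(range(1_000_000))

# Pure-Python loop: interpreted, processes one element at a time
s = time.time()
total_loop = 0
for v in data:
    total_loop += v * v
print('loop:  ', time.time() - s)

# Vectorized NumPy: the same sum of squares runs in compiled code
arr = np.asarray(data, dtype=np.int64)
s = time.time()
total_np = int(np.sum(arr * arr))
print('numpy: ', time.time() - s)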

There is one catch, though: the acceleration described above is only achieved on the CPU. Since consumer CPUs usually have only 8 cores or fewer, the amount of parallelism, and therefore the achievable speedup, is limited.

This gave birth to a new acceleration tool: the CuPy library.

What is CuPy?


CuPy is a library that implements NumPy arrays on NVIDIA GPUs by leveraging the CUDA GPU library. Because it is built on the NumPy array interface and the GPU has many more CUDA cores, it can deliver much better parallel speedups.

CuPy's interface is a mirror of NumPy's, and in most cases it can be used as a drop-in replacement. As long as the NumPy code is replaced with compatible CuPy code, you get GPU acceleration.

CuPy supports most NumPy array operations, including indexing, broadcasting, math, and various matrix transformations.
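As a rough illustration of this mirrored interface (a minimal sketch, not taken from the original article; the array values are arbitrary), the same indexing, broadcasting, and matrix operations used with np work unchanged with cp:

import cupy as cp

a = cp.arange(12, dtype=cp.float32).reshape(3, 4)
col = cp.ones((3, 1), dtype=cp.float32)

b = a + col      # broadcasting: a (3, 1) column across a (3, 4) array
c = b[:, 1:3]    # basic slicing / indexing
d = a @ a.T      # matrix multiplication: (3, 4) x (4, 3) -> (3, 3)
print(d.shape, float(c.sum()))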

If you run into a special case that is not supported, you can also write custom Python code that leverages CUDA and the GPU for acceleration. All it takes is a short snippet of code in C++ format, which CuPy automatically converts to run on the GPU, much like using Cython.
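For example, CuPy's ElementwiseKernel takes the per-element operation as a short C-style snippet and compiles it into a GPU kernel. A minimal sketch (the squared_diff kernel below is an illustrative example, not code from the original article):

import cupy as cp

# The element-wise operation is written as a C-style expression; CuPy compiles
# it into a CUDA kernel the first time it is called.
squared_diff = cp.ElementwiseKernel(
    'float32 x, float32 y',    # input arguments
    'float32 z',               # output argument
    'z = (x - y) * (x - y)',   # per-element operation
    'squared_diff')            # kernel name

a = cp.arange(10, dtype=cp.float32)
b = cp.full(10, 3, dtype=cp.float32)
print(squared_diff(a, b))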

Before getting started, you can install the CuPy library via pip:

pip install cupy

Running on the GPU with CuPy

For a fair benchmark, the PC configuration is as follows:

  • i7-8700K CPU
  • GTX 1080 Ti GPU
  • 32 GB of DDR4 3000 MHz RAM
  • CUDA 9.0

Once CuPy is installed, you can import it just as you would import NumPy:

import numpy as np
import cupy as cp
import time

In the code below, switching between NumPy and CuPy is as simple as replacing NumPy's np with CuPy's cp. The following code creates a 3D array of one billion 1's with both NumPy and CuPy. To measure the speed of creating the array, you can use Python's native time library:

### Numpy and CPU
s = time.time()
x_cpu = np.ones((1000, 1000, 1000))
e = time.time()
print(e - s)

### CuPy and GPU
s = time.time()
x_gpu = cp.ones((1000, 1000, 1000))
e = time.time()
print(e - s)

It's that simple!

Incredibly, even for something as simple as creating an array, CuPy is much faster. NumPy took 1.68 seconds to create an array of one billion 1's, while CuPy took only 0.16 seconds, a 10.5x speedup.

But CuPy can do more than that.

For example, let's do some math on the array: multiply the entire array by 5 and compare the speed of NumPy and CuPy again.

### Numpy and CPU
s = time.time()
x_cpu *= 5
e = time.time()
print(e - s)

### CuPy and GPU
s = time.time()
x_gpu *= 5
e = time.time()
print(e - s)

Sure enough, CuPy beats NumPy again. NumPy took 0.507 seconds, while CuPy took only 0.000710 seconds, a full 714.1x speedup.
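One caveat worth keeping in mind (not discussed in the original article): CuPy launches GPU kernels asynchronously, so the timer can stop before the GPU has actually finished the work. For a stricter measurement, the device can be synchronized before reading the clock. A minimal sketch, reusing x_gpu from the code above and assuming CuPy's standard cp.cuda.Stream.null.synchronize() call:

s = time.time()
x_gpu *= 5
cp.cuda.Stream.null.synchronize()   # wait for the GPU to finish before stopping the timer
e = time.time()
print(e - s)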

Now let's try working with the arrays some more and perform the following three operations:

  1. Multiply the array by 5
  2. Multiply the array by itself
  3. Add the array to itself

### Numpy and CPU
s = time.time()
x_cpu *= 5
x_cpu *= x_cpu
x_cpu += x_cpu
e = time.time()
print(e - s)

### CuPy and GPU
s = time.time()
x_gpu *= 5
x_gpu *= x_gpu
x_gpu += x_gpu
e = time.time()
print(e - s)

The results show that NumPy took 1.49 seconds to perform the whole computation on the CPU, while CuPy took only 0.0922 seconds on the GPU, a 16.16x speedup.

Computing speed improves dramatically once the array size reaches 10 million data points

With CuPy, NumPy array and matrix operations can be accelerated many times over on the GPU. It is worth noting, however, that the achievable speedup depends heavily on the size of the array being processed. The following table shows the speedup for different array sizes (numbers of data points):

[Table: CuPy speedup over NumPy for different array sizes (numbers of data points)]

Once the data reaches 10 million points, the speedup jumps sharply; beyond 100 million points, the speedup is very significant. Below 10 million data points, NumPy actually runs faster. In addition, the more GPU memory you have, the more data you can process, so make sure your GPU has enough memory to hold the data CuPy needs to work with.
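Related to the memory point above, data can be moved explicitly between host (NumPy) and device (CuPy) arrays. A minimal sketch using cp.asarray and cp.asnumpy (the array shape here is chosen arbitrarily for illustration):

import numpy as np
import cupy as cp

x_cpu = np.ones((1000, 1000, 100), dtype=np.float32)

x_gpu = cp.asarray(x_cpu)            # copy the host (NumPy) array to GPU memory
print(x_gpu.nbytes / 1e9, 'GB on the GPU')

x_back = cp.asnumpy(x_gpu)           # copy back from the GPU to a NumPy array
print(type(x_back))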
