NVIDIA-GPU driver installation

The following uses a GeForce RTX 2080 Ti GPU as an example and installs the driver on Ubuntu 18.04.

1. Download the driver

Driver download link: Official driver | NVIDIA

According to the GPU model, make a selection in the following drop-down list, and then click the search button:

2. Install the driver

1) Uninstall the old version driver

sudo apt --purge remove "nvidia*"
sudo apt autoremove
sudo apt --purge remove "*cublas*" "cuda*"
sudo apt --purge remove "*nvidia*"

2) Disable nouveau

(1) Open the blacklist.conf file

sudo vi /etc/modprobe.d/blacklist.conf

(2) Add the following content in the last line and save

blacklist nouveau
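If you would rather not edit the file by hand, the same line can be appended from the shell. The following is only a minimal sketch and not part of the original steps: the function name add_blacklist_line is hypothetical, and it takes the target file as an argument so that it can be tried on a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article): append
# "blacklist nouveau" to the given file unless that exact line is already there.
add_blacklist_line() {
    conf_file="$1"
    # -x matches the whole line, -F treats the pattern literally, -q is silent.
    if ! grep -qxF 'blacklist nouveau' "$conf_file" 2>/dev/null; then
        printf '%s\n' 'blacklist nouveau' >> "$conf_file"
    fi
}

# Try it on a scratch file first; calling it twice adds the line only once.
scratch=$(mktemp)
add_blacklist_line "$scratch"
add_blacklist_line "$scratch"
cat "$scratch"
```

To apply it to /etc/modprobe.d/blacklist.conf, the function has to run with root privileges, for example from a root shell.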

(3) Regenerate the initramfs so the blacklist is applied at boot

sudo update-initramfs -u

(4) Restart the operating system

sudo reboot

(5) Check whether nouveau has been disabled

lsmod | grep nouveau

If nouveau has been disabled, the command prints nothing.
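The check can also be wrapped in a small script that prints an explicit message instead of relying on empty output. This is a sketch under assumptions: the function name nouveau_status and the sample input lines are made up for illustration, and the function reads lsmod-style text on stdin so it can be tried on any machine.

```shell
#!/bin/sh
# Hypothetical helper: reads lsmod-style output on stdin and reports
# whether a module named exactly "nouveau" appears in the first column.
nouveau_status() {
    if awk '$1 == "nouveau" { found = 1 } END { exit !found }'; then
        echo "nouveau is still loaded"
        return 1
    fi
    echo "nouveau is not loaded"
}

# On the real machine: lsmod | nouveau_status
# Sample lsmod-style input (made up for illustration):
printf 'Module Size Used by\nnouveau 1949696 1\n' | nouveau_status || true
printf 'Module Size Used by\nnvidia 35000000 0\n' | nouveau_status
```

Matching on the first column avoids false positives from other modules that merely list nouveau in their "Used by" column.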

3) Install the driver

Upload the downloaded driver to any user directory on the Linux machine. For example, upload NVIDIA-Linux-x86_64-470.63.01.run to the ~/software directory and change into that directory:

(1) Grant execute permission

chmod +x NVIDIA-Linux-x86_64-470.63.01.run

(2) Run the installer

sudo ./NVIDIA-Linux-x86_64-470.63.01.run --no-opengl-files

Accept the default option at each prompt by pressing Enter.

4) Using nvidia-smi

nvidia-smi, referred to as NVSMI, monitors NVIDIA GPU usage and can change GPU state.

(1) nvidia-smi

Enter nvidia-smi directly in the shell terminal to display the current GPU status, as shown in the following figure:

Table parameter introduction:

· GPU: The index of the GPU in this machine (with multiple graphics cards, numbering starts from 0). In the figure, the GPU index is 0.

· Fan: Fan speed (0%-100%); N/A means the GPU has no fan.

· Name: GPU model. In the figure it is a GeForce RTX 2080 Ti.

· Temp: GPU temperature. If the temperature gets too high, the GPU lowers its clock frequency.

· Perf: Performance state of the GPU, from P0 (maximum performance) to P12 (minimum performance). In the figure it is P0.

· Persistence-M: Persistence mode state. Persistence mode uses more power, but new GPU applications start faster. In the figure it is Off.

· Pwr:Usage/Cap: Power consumption. Usage is the current draw and Cap is the power limit.

· Bus-Id: PCI bus address of the GPU, in the form domain:bus:device.function.

· Disp.A: Display Active, indicating whether a display is initialized on the GPU.

· Memory-Usage: Video memory usage.

· Volatile GPU-Util: GPU utilization.

· Uncorr. ECC: ECC (error checking and correction) information. The column reports uncorrected ECC errors and shows N/A on GPUs without ECC support; the ECC mode itself is 0/disabled or 1/enabled.

· Compute M.: Compute mode, 0/DEFAULT, 1/EXCLUSIVE_PROCESS, 2/PROHIBITED.

· Processes: For each process using a GPU, shows the GPU index, the process ID, and the video memory it occupies.
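Several of these fields can also be requested in a script-friendly form with --query-gpu, which avoids reading the table by eye. A sketch follows. The function name describe_gpu and the sample line are made up for illustration, and the function reads CSV on stdin so it can be tried without a GPU.

```shell
#!/bin/sh
# Hypothetical helper: turn one CSV line per GPU into a readable summary.
# Expected columns: index, name, temperature.gpu, pstate, utilization.gpu,
# memory.used, memory.total (as printed with --format=csv,noheader,nounits).
describe_gpu() {
    awk -F', ' '{
        printf "GPU %s (%s): %s C, %s, util %s%%, mem %s/%s MiB\n",
               $1, $2, $3, $4, $5, $6, $7
    }'
}

# On a machine with the driver installed:
#   nvidia-smi --query-gpu=index,name,temperature.gpu,pstate,utilization.gpu,memory.used,memory.total \
#              --format=csv,noheader,nounits | describe_gpu
# Sample line (values made up for illustration):
echo '0, GeForce RTX 2080 Ti, 45, P0, 12, 1024, 11019' | describe_gpu
```

With the sample line, the function prints "GPU 0 (GeForce RTX 2080 Ti): 45 C, P0, util 12%, mem 1024/11019 MiB".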

(2) nvidia-smi -l seconds

Add the -l option to control the refresh time of the GPU status display. For example, to refresh the GPU status every 1s, the command is:

nvidia-smi -l 1

(3) Save GPU monitoring results

To save the GPU state to the report.csv file, the command is as follows:

nvidia-smi -l 1 \
           --format=csv \
           --filename=report.csv \
           --query-gpu=timestamp,name,index,utilization.gpu,memory.total,memory.used,power.draw

Parameter explanation:

· -l: how often to record, in seconds; the command above uses 1

· --format: the format of the result file, here csv

· --filename: the name of the result file

· --query-gpu: which fields to record in the csv file

· timestamp: time of the sample

· name: GPU model name

· index: GPU index

· utilization.gpu: GPU utilization

· memory.total: total video memory

· memory.used: how much video memory is used

· power.draw: GPU power consumption, corresponding to Pwr:Usage
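Once report.csv exists, it can be summarized with standard tools. The following sketch averages the utilization.gpu column. It assumes the field order of the command above (utilization.gpu is the 4th column) and the ", " separator that nvidia-smi writes; the function name avg_gpu_util and the sample rows are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the average of the utilization.gpu column
# (4th field) of a report written by the nvidia-smi command above.
avg_gpu_util() {
    awk -F', ' 'NR > 1 { sum += $4 + 0; n++ }   # "35 %" + 0 evaluates to 35
                END { if (n) printf "%.1f\n", sum / n }' "$1"
}

# Sample report (values made up for illustration):
report=$(mktemp)
cat > "$report" <<'EOF'
timestamp, name, index, utilization.gpu [%], memory.total [MiB], memory.used [MiB], power.draw [W]
2021/09/01 10:00:00.000, GeForce RTX 2080 Ti, 0, 30 %, 11019 MiB, 1024 MiB, 60.10 W
2021/09/01 10:00:01.000, GeForce RTX 2080 Ti, 0, 50 %, 11019 MiB, 1030 MiB, 75.40 W
EOF
avg_gpu_util "$report"
```

For the two sample rows the script prints 40.0. The header row is skipped with NR > 1.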

The above are several commonly used commands for GPU status monitoring. For other options, see the built-in help:

nvidia-smi -h

3. Uninstall the driver

Execute the following uninstall command:

sudo apt --purge remove "nvidia*"
sudo apt autoremove
sudo apt --purge remove "*cublas*" "cuda*"
sudo apt --purge remove "*nvidia*"

Run nvidia-smi. If it still prints NVIDIA driver information, the uninstall did not succeed. In that case, change to the directory where NVIDIA-Linux-x86_64-470.63.01.run is located and execute:

sudo ./NVIDIA-Linux-x86_64-470.63.01.run --uninstall

Then just follow the prompts.
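The final check can be scripted as well. This sketch assumes that the absence of nvidia-smi from the PATH is a good enough sign that the driver is gone. The function name check_removed is hypothetical, and its optional argument exists only so the function can be tried with other command names.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is still on the PATH.
# Defaults to nvidia-smi; the optional argument is only for experimenting.
check_removed() {
    tool="${1:-nvidia-smi}"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool is still installed"
        return 1
    fi
    echo "$tool not found"
}

# If nvidia-smi is still present, fall back to the .run uninstaller.
check_removed || echo "run the .run installer with --uninstall"
```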

Origin blog.csdn.net/weicao1990/article/details/127632282