This guide uses a GeForce RTX 2080 Ti GPU as an example and installs the driver on Ubuntu 18.04.
1. Download the driver
Driver download link: Official driver | NVIDIA
On the download page, select the entries that match your GPU model in the drop-down lists, and then click the search button.
2. Install the driver
1) Uninstall the old version driver
sudo apt --purge remove "nvidia*"
sudo apt autoremove
sudo apt --purge remove "*cublas*" "cuda*"
sudo apt --purge remove "*nvidia*"
2) Disable nouveau
(1) Open the blacklist.conf file
sudo vi /etc/modprobe.d/blacklist.conf
(2) Add the following content in the last line and save
blacklist nouveau
(3) Regenerate the initramfs so that the blacklist takes effect at boot
sudo update-initramfs -u
(4) Restart the operating system
sudo reboot
(5) Check that nouveau is disabled
lsmod | grep nouveau
If nouveau is disabled, the command prints nothing.
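The edit and the check above can also be scripted. The following is a minimal sketch, not part of the original procedure: ensure_line and nouveau_loaded are hypothetical helper names, and the usage shown in the comments assumes a root shell because the file under /etc/modprobe.d is owned by root.

```shell
# Append a line to a file only if that exact line is not already present.
# ensure_line is a hypothetical helper name, not part of any NVIDIA tooling.
ensure_line() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Succeed (exit 0) if lsmod-style output on stdin lists the nouveau module.
# lsmod prints the module name in the first column.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Example usage (in a root shell):
#   ensure_line /etc/modprobe.d/blacklist.conf "blacklist nouveau"
#   update-initramfs -u && reboot
# After the reboot:
#   lsmod | nouveau_loaded && echo "nouveau is still loaded" || echo "nouveau is disabled"
```

Running ensure_line twice leaves a single blacklist line in the file, so the script can be repeated safely.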
3) Install the driver
Upload the downloaded driver to any user directory on the Linux machine. For example, upload NVIDIA-Linux-x86_64-470.63.01.run to the ~/software directory and change into that directory:
(1) Give executable permission
chmod +x NVIDIA-Linux-x86_64-470.63.01.run
(2) Run the installer
sudo ./NVIDIA-Linux-x86_64-470.63.01.run --no-opengl-files
Accept the default option at each prompt by pressing Enter.
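After the installer finishes, it can be useful to confirm that the running driver matches the file that was installed. The sketch below is an illustration only: installer_version is a hypothetical helper that derives the version from the installer file name, and the comparison in the comments assumes nvidia-smi is already working.

```shell
# Extract the driver version from an installer file name such as
# NVIDIA-Linux-x86_64-470.63.01.run (installer_version is a hypothetical helper).
installer_version() {
    name=${1##*/}                  # drop any leading directory
    name=${name%.run}              # drop the .run suffix
    printf '%s\n' "${name##*-}"    # keep the text after the last hyphen
}

# Example usage after installation:
#   expected=$(installer_version ~/software/NVIDIA-Linux-x86_64-470.63.01.run)
#   actual=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   [ "$expected" = "$actual" ] && echo "driver $actual is active"
```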
4) Use of nvidia-smi
nvidia-smi (NVIDIA System Management Interface, NVSMI for short) monitors NVIDIA GPU usage and can change GPU state.
(1) nvidia-smi
Enter nvidia-smi directly in the shell terminal to display the current GPU status, as shown in the following figure:
The fields in the table are:
· GPU: Index of the GPU in this machine, starting from 0 when there are multiple cards. In the figure, the index is 0.
· Fan: Fan speed (0%-100%). N/A means the GPU has no fan.
· Name: GPU model. In the figure: GeForce 2080TI.
· Temp: GPU temperature. If it rises too high, the GPU lowers its clock frequency.
· Perf: Performance state of the GPU, from P0 (maximum performance) to P12 (minimum performance). In the figure: P0.
· Persistence-M: Persistence mode state. Enabling it consumes more power but shortens the start-up time of new GPU applications. In the figure: Off.
· Pwr:Usage/Cap: Power consumption. Usage is the current draw and Cap is the power limit.
· Bus-Id: PCI bus address of the GPU, in the form domain:bus:device.function.
· Disp.A: Display Active, indicating whether a display is initialized on the GPU.
· Memory-Usage: Video memory usage.
· Volatile GPU-Util: GPU utilization.
· Uncorr. ECC: Count of uncorrected ECC (error checking and correction) errors since the driver was loaded. N/A means the GPU does not support ECC.
· Compute M: Compute mode: 0/DEFAULT, 1/EXCLUSIVE_PROCESS, 2/PROHIBITED.
· Processes: For each process, the GPU it occupies, its process ID, and its video memory usage.
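Some of these fields can also be read in a script-friendly form through the --query-gpu option of nvidia-smi. The following is a minimal sketch that flags hot GPUs: hot_gpus and the default limit of 80 degrees are illustrative assumptions, not values recommended by NVIDIA.

```shell
# Print a warning for every GPU whose temperature exceeds a limit.
# Reads lines of "index, name, temperature" from stdin, the layout produced by
#   nvidia-smi --query-gpu=index,name,temperature.gpu --format=csv,noheader,nounits
# hot_gpus and the default limit of 80 are illustrative assumptions.
hot_gpus() {
    limit=${1:-80}
    awk -F', ' -v limit="$limit" \
        '$3 + 0 > limit + 0 { printf "GPU %s (%s): %s C\n", $1, $2, $3 }'
}

# Example usage:
#   nvidia-smi --query-gpu=index,name,temperature.gpu \
#       --format=csv,noheader,nounits | hot_gpus 75
```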
(2) nvidia-smi -l seconds
The -l option sets the refresh interval, in seconds, of the GPU status display. For example, to refresh the GPU status every 1s, the command is:
nvidia-smi -l 1
(3) Save GPU monitoring results
To save the GPU state to the report.csv file, the command is as follows:
nvidia-smi -l 1 \
--format=csv \
--filename=report.csv \
--query-gpu=timestamp,\
name,index,utilization.gpu,\
memory.total,memory.used,power.draw
Parameter explanation:
· -l: Recording interval in seconds; 1 in the command above
· --format: Format of the result file, here csv
· --filename: Name of the result file
· --query-gpu: Which fields to record in the csv file
· timestamp: Time of the sample
· name: GPU model
· index: GPU index
· utilization.gpu: GPU utilization
· memory.total: Total video memory
· memory.used: Video memory in use
· power.draw: GPU power draw, corresponding to Pwr:Usage
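Once report.csv has collected some samples, it can be summarized with standard tools. The sketch below averages the utilization.gpu column. It assumes the field order of the --query-gpu list above, so that utilization.gpu is the fourth column, and that values are written like "35 %"; avg_gpu_util is a hypothetical helper name.

```shell
# Print the mean of the utilization.gpu column of a report file.
# Column 4 is utilization.gpu because of the field order passed to --query-gpu.
# avg_gpu_util is a hypothetical helper name.
avg_gpu_util() {
    awk -F', ' '
        { v = $4; sub(/ *%$/, "", v) }      # strip the trailing percent unit
        v ~ /^[0-9]+$/ { sum += v; n++ }    # skip the header and non-numeric rows
        END { if (n) printf "%.1f\n", sum / n }
    ' "$1"
}

# Example usage:
#   avg_gpu_util report.csv
```

Rows whose fourth column is not a plain number, such as the header line, are ignored rather than counted as zero.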
The commands above are the most commonly used ones for GPU status monitoring. For other options, see the built-in help:
nvidia-smi -h
3. Uninstall the driver
Execute the following uninstall command:
sudo apt --purge remove "nvidia*"
sudo apt autoremove
sudo apt --purge remove "*cublas*" "cuda*"
sudo apt --purge remove "*nvidia*"
Then run nvidia-smi. If it still prints NVIDIA driver information, the uninstallation did not succeed. In that case, change to the directory that contains NVIDIA-Linux-x86_64-470.63.01.run and execute:
sudo ./NVIDIA-Linux-x86_64-470.63.01.run --uninstall
Then just follow the prompts.
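To double-check that the apt commands removed every package, the installed package list can be filtered for leftovers. This is a minimal sketch under the assumption that the packages were installed through apt/dpkg; leftover_packages is a hypothetical helper name, and it does not detect files placed by the .run installer.

```shell
# List installed packages whose names mention nvidia, cuda, or cublas.
# Reads dpkg -l style output from stdin; "ii" in the first column marks an
# installed package. leftover_packages is a hypothetical helper name.
leftover_packages() {
    awk '$1 == "ii" && $2 ~ /nvidia|cuda|cublas/ { print $2 }'
}

# Example usage:
#   dpkg -l | leftover_packages
```

An empty result means no matching package is still installed.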