NVIDIA-related commands on a Linux server

1. GPU driver persistence mode (memory-resident mode)

1) Commands:

  1. Make sure you have root or sudo privileges to execute the commands below.
  2. Open a terminal.
  3. Run the following command to enable persistence mode in the GPU driver:

nvidia-smi -pm 1

This enables persistence mode, which keeps the GPU driver resident in memory, on every GPU in the system.
  4. Verify that the setting took effect. Run the following command:

nvidia-smi

This displays the status of each GPU. In the default table, the "Persistence-M" column should show "On"; nvidia-smi -q reports the same setting as "Enabled" for "Persistence Mode".
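
For scripts, the same check can be made without reading the full status table. The following is a minimal sketch, assuming your nvidia-smi supports the persistence_mode query field (nvidia-smi --help-query-gpu lists the supported fields). The helper name all_persistent is made up for this example and is not part of nvidia-smi.

```shell
#!/usr/bin/env bash
# all_persistent (hypothetical helper): reads lines such as "0, Enabled"
# on stdin and succeeds only if at least one GPU is listed and every
# listed GPU reports "Enabled".
all_persistent() {
    awk -F', *' '
        { seen = 1; if ($2 != "Enabled") bad = 1 }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader | all_persistent; then
        echo "persistence mode is enabled on all GPUs"
    else
        echo "persistence mode is not enabled on every GPU"
    fi
fi
```

The parsing is kept in its own function so it can be tried on sample text, for example: printf '0, Enabled\n' | all_persistent && echo ok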

       Note that nvidia-smi ships with the official NVIDIA driver, so the commands above are not available with a third-party driver such as nouveau, and the supported options can differ between driver versions. Before running them, check nvidia-smi --help or the documentation for your driver version.

       In addition, persistence mode keeps the driver resident, so it continues to occupy system resources and can increase energy consumption. If you no longer need it, disable it with the following command:

nvidia-smi -pm 0

This disables persistence mode in the GPU driver.

2) Advantages and disadvantages:

Advantages:
        Faster response to tasks: With persistence mode enabled, the driver stays loaded and the GPU stays initialized even when no application is using it, so a new computing task can start without waiting for the driver to load. This shortens startup time and improves responsiveness.
        Less repeated overhead: Because the driver state is not torn down and rebuilt between jobs, workloads that launch compute tasks frequently avoid paying the initialization cost each time.

Disadvantages:
        Higher energy consumption: Keeping the driver loaded prevents the GPU from being fully deinitialized, so the card can draw more power even when it is idle or under light load.
        System stability risk: The driver stays resident and holds system resources for as long as the mode is on. If the driver hangs or crashes in that state, a system reboot may be required to recover.

To sum up, persistence mode improves task response time, but at the cost of higher idle energy consumption and some stability risk. Whether to enable it should be decided according to the specific usage scenario and requirements.
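
One way to put a number on the energy trade-off is to read the power draw while the GPUs are idle, once with the mode on and once with it off. The following is a rough sketch, assuming the power.draw query field is supported on your hardware (some boards report [N/A] instead of a number). total_watts is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# total_watts (hypothetical helper): sums one numeric power reading per
# line from stdin and prints the total with two decimals. Non-numeric
# lines such as "[N/A]" are skipped.
total_watts() {
    awk '$1 ~ /^[0-9]+(\.[0-9]+)?$/ { sum += $1 } END { printf "%.2f\n", sum }'
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    watts=$(nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits | total_watts)
    echo "current total power draw (W): $watts"
fi
```

A single reading is noisy, so it is worth repeating the measurement a few times in each mode before drawing a conclusion.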

2. Restart the GPU driver without rebooting the server

On CentOS, you can try to restart the GPU driver on its own, without restarting the whole server. This can be done with the following steps:

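  1. Make sure you have root or sudo privileges to execute the commands below.
  2. Stop every application or service that is using the GPU, so that no running process holds it. The process table at the bottom of the nvidia-smi output shows what is still running on the GPU.
  3. Unload the NVIDIA kernel modules. This only removes them from the running kernel; the driver stays installed. (nvidia-uninstall is not suitable here, because it removes the driver from the system and leaves nothing to reload.) Use the following command:

sudo modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nvidia

This unloads the NVIDIA kernel modules that are currently loaded. The command fails if a module is still in use.
  4. Load the NVIDIA kernel module again. Use the following command:

sudo modprobe nvidia

This loads the NVIDIA kernel module again, effectively restarting the GPU driver.
  5. Check that the driver loaded successfully. Use the following command:

lsmod | grep nvidia

If nvidia-related modules are displayed in the output, the driver is loaded successfully.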
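
Restarting the driver by unloading its kernel modules and loading them again can also be scripted. The following is a sketch, assuming the usual module names of the official driver (nvidia_uvm, nvidia_drm, nvidia_modeset, nvidia); which of them are actually loaded depends on the system. The function names loaded_nvidia_modules and reload_nvidia are hypothetical.

```shell
#!/usr/bin/env bash
# loaded_nvidia_modules (hypothetical helper): reads `lsmod` output on
# stdin and prints the NVIDIA modules that are loaded, dependents first,
# so that the list can be unloaded in the order printed.
loaded_nvidia_modules() {
    local lsmod_out mod
    lsmod_out=$(cat)
    for mod in nvidia_uvm nvidia_drm nvidia_modeset nvidia; do
        if printf '%s\n' "$lsmod_out" |
            awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }'; then
            echo "$mod"
        fi
    done
}

# reload_nvidia (hypothetical helper): unloads the NVIDIA modules that
# are loaded, then loads the main module again. Run it as root and only
# after every process using the GPU has been stopped.
reload_nvidia() {
    local mod
    for mod in $(lsmod | loaded_nvidia_modules); do
        modprobe -r "$mod" || return 1
    done
    modprobe nvidia
}
```

The script only defines the functions. Call reload_nvidia explicitly when you want to restart the driver; afterwards, lsmod | loaded_nvidia_modules should list at least nvidia.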

Note that this method avoids a full server restart, but it does not work in every situation. In some cases it may still be necessary to restart the server to make sure the driver is loaded and configured correctly. After a new driver installation or a system update, rebooting the server is usually the safer and more reliable option.


Origin blog.csdn.net/anonymous_me/article/details/130720154