Problems Encountered on the Xidian High-Performance Computing Platform

0. If the HPC platform can reach the Internet, just paste in the commands from the Installation section provided by the project author. If it cannot, download the files on your own computer first and then transfer them to the platform.
Note also that the Installation commands are a Linux script, so on Windows you need some way to run them. Running wget or git downloads from Linux is also very slow unless the terminal is set up to get past the firewall.
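
For the offline case, one possible workflow is to pack the downloaded files on your own computer and copy the archive to the platform. The sketch below assumes a hypothetical login host hpc.example.edu and a local folder project_files; both are placeholders, not values from the platform manual. It only prints the scp command so you can check it before running it.

```shell
#!/usr/bin/env bash
# Sketch only: the host name and folder names below are hypothetical placeholders.
SRC_DIR="project_files"                 # assumed local folder with the downloaded code and weights
ARCHIVE="${SRC_DIR}.tar.gz"
REMOTE_USER="19200300131"               # user name as it appears in this post
REMOTE_HOST="hpc.example.edu"           # hypothetical login node address
REMOTE_DIR="/home/${REMOTE_USER}"

# Pack the folder into one archive (skipped if the folder is not there).
if [ -d "$SRC_DIR" ]; then
    tar -czf "$ARCHIVE" "$SRC_DIR"
fi

# Build and print the copy command instead of running it directly.
SCP_CMD="scp ${ARCHIVE} ${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_DIR}/"
echo "$SCP_CMD"
```

On the platform, tar -xzf project_files.tar.gz unpacks the archive again.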

 

1. Initialize the conda environment by sourcing the conda file given in the manual. You can open and edit .bashrc in the terminal editor with the command vim .bashrc.

Run conda init bash. This writes the conda initialization code to the end of the file /home/19200300131/.bashrc, and afterwards the prompt shows an environment name (such as base) before the user name.

In addition, conda on the platform already uses the Tsinghua mirror by default. pip does not, so if you install packages with pip you need to change its source separately; the manual describes how.
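
The initialization and mirror setup described in this item might look like the following. The pip index URL is the public Tsinghua (TUNA) PyPI mirror and is an assumption here; the platform manual may prescribe a different address. Each command is guarded so the snippet does nothing on a machine without conda or pip.

```shell
# Assumption: the Tsinghua (TUNA) PyPI mirror; check the platform manual for the official address.
PIP_INDEX="https://pypi.tuna.tsinghua.edu.cn/simple"

# Append the conda initialization block to ~/.bashrc (takes effect in new shells).
if command -v conda >/dev/null 2>&1; then
    conda init bash
fi

# Make pip use the mirror for all later installs by this user.
if command -v pip >/dev/null 2>&1; then
    pip config set global.index-url "$PIP_INDEX"
fi
```

After this, open a new terminal or run source ~/.bashrc so that the prompt picks up the environment name.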

 

2. Create a new environment called old_pic and activate it: conda activate old_pic
The dependencies include dlib, which requires cmake and glabelc6.2 6.9 to build. The platform's default versions are too old, so newer ones have to be loaded in every new session.
Installing the requirements without them fails: both gcc and cmake are needed to compile dlib.

Execute the following:
module load gcc/12.1.0
module load cmake/3.25.1

Finally, install the packages listed in requirement.txt:
pip install -r requirement.txt
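
Collected into one place, the steps of this item could be wrapped in a small helper. This is a sketch: the function name setup_old_pic_env and the Python version are assumptions (the post does not say which Python the environment uses), and conda activate only works in a shell where conda has been initialized.

```shell
ENV_NAME="old_pic"
PY_VERSION="3.8"   # assumption: the post does not say which Python version was used

# Hypothetical helper: create the environment, load the build tools, install the requirements.
setup_old_pic_env() {
    conda create -y -n "$ENV_NAME" "python=$PY_VERSION" || return 1
    conda activate "$ENV_NAME" || return 1

    # Newer compiler and cmake are needed to build dlib; versions as given in this post.
    module load gcc/12.1.0
    module load cmake/3.25.1

    # Confirm the loaded tools are the ones on PATH before compiling.
    gcc --version | head -n 1
    cmake --version | head -n 1

    pip install -r requirement.txt
}
```

Call setup_old_pic_env from the directory that contains requirement.txt.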

 
3. When submitting jobs

The job script has to reload the conda init part itself before doing anything else; otherwise the later conda activate will not work. A submitted job does not run in your personal login shell but in a shared environment, which does not source .bashrc automatically.

Execute the following: 

The first two lines activate conda. The third line is for dlib: dlib is a C++ library, and the newer gcc is loaded to provide C++11 support.

Also: do not write ~ (the tilde, which stands for the user's home directory) in any path in the script. At run time ~ resolves to the home directory of whichever user is running the job, and the job runs in the shared environment, so it does not point to your own home directory. Write out the full path instead.
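
The commands meant to appear in this item are missing from the post, so the following is only a reconstruction from the description: two commands to bring up conda, one to load gcc, and absolute paths throughout. The project directory and the run.py entry point are hypothetical, and any scheduler directives your platform expects are not shown.

```shell
#!/bin/bash
# Reconstructed sketch of a job script, not the author's original file.
USER_HOME="/home/19200300131"            # full path written out instead of ~
CODE_DIR="${USER_HOME}/old_pic_project"  # hypothetical project directory

# First two commands: reload the conda initialization, then activate the environment.
if [ -f "${USER_HOME}/.bashrc" ]; then
    source "${USER_HOME}/.bashrc"
fi
if command -v conda >/dev/null 2>&1; then
    conda activate old_pic
fi

# Third command: load a newer gcc so dlib gets C++11 support.
if command -v module >/dev/null 2>&1; then
    module load gcc/12.1.0
fi

# Run from the code directory, using absolute paths only.
if [ -d "$CODE_DIR" ]; then
    cd "$CODE_DIR" && python run.py      # run.py is a hypothetical entry point
fi
```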

  

4. If you keep the Xshell window open and do not need a long-running job, you can debug the code interactively in the terminal. Execute:

jsub -q gpu sleep 5000

jjobs (shows which node, gpu**, the submitted job was dispatched to)

ssh gpu**

This switches your session to node gpu**. The nvidia-smi command shows the details of each GPU; pick one with low utilization.
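
Choosing a lightly used GPU can also be scripted. The sketch below is an assumption about how one might do it, not part of the original workflow: a hypothetical helper pick_idle_gpu reads "index, utilization" lines, as printed by the CSV query mode of nvidia-smi, and returns the index with the lowest utilization. Exporting CUDA_VISIBLE_DEVICES is a common way to restrict a program to that GPU.

```shell
# Hypothetical helper: read "index, utilization" CSV lines on stdin and
# print the index of the GPU with the lowest utilization.
pick_idle_gpu() {
    sort -t, -k2,2n | head -n 1 | cut -d, -f1 | tr -d ' '
}

# Only query the driver on a node that actually has nvidia-smi.
if command -v nvidia-smi >/dev/null 2>&1; then
    GPU_ID=$(nvidia-smi --query-gpu=index,utilization.gpu \
                        --format=csv,noheader,nounits | pick_idle_gpu)
    # Leave the variable untouched if the query returned nothing.
    if [ -n "$GPU_ID" ]; then
        export CUDA_VISIBLE_DEVICES="$GPU_ID"
        echo "Using GPU $GPU_ID"
    fi
fi
```

For example, printf '0, 95\n1, 3\n2, 40\n' | pick_idle_gpu prints 1.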

Also: every time you ssh in, the conda environment changes back to base, and the working directory changes back from the code directory to your home directory (~).

So you need to reactivate the environment and cd back to the code directory before working.
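
After each ssh login, the two resets can be undone with a couple of commands. A minimal sketch, assuming the environment from item 2 and a hypothetical code directory:

```shell
ENV_NAME="old_pic"
CODE_DIR="/home/19200300131/old_pic_project"   # hypothetical code directory

# Switch from base back to the project environment.
if command -v conda >/dev/null 2>&1; then
    conda activate "$ENV_NAME"
fi

# Return from the home directory to the code directory.
if [ -d "$CODE_DIR" ]; then
    cd "$CODE_DIR"
fi
pwd
```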

Summary: To see results in real time, ssh into the node and run the code there. To train overnight, submit the script as a job.

To read the error log: cat error.<job number>

Origin blog.csdn.net/qq_45790998/article/details/128944732