Building PyTorch with CUDA 11.8 on Galaxy Kylin V10 (ARM architecture)

Overview

The company recently wanted to try running the ChatGLM model locally. The only machine with two graphics cards installed is a server, so the experiment had to run there. The CUDA driver and related components had already been set up by a former colleague, and the cards were recognized correctly, so I skipped that step. I followed the steps in the project's README on GitHub and everything went smoothly, until running the script produced the following error:

RuntimeError: Not compiled with CUDA support

The server is ARM (the CPU is a Phytium ST2500), and the officially published aarch64 builds of PyTorch are not compiled with CUDA support. At first I suspected my installation was wrong. After searching Baidu, Google, and Bing, I found that the alternative installation methods all depend on conda, so I tried Miniconda3 and Anaconda3 — but their aarch64 installers turned out to have compatibility problems with the server. I had no choice but to compile PyTorch myself.
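A quick way to confirm that the wheel itself lacks CUDA support (rather than this being a driver problem) is to inspect `torch.version.cuda`. A minimal sketch — the helper name is my own, and the import is guarded in case torch is not installed at all:

```python
def cuda_build_version():
    """Return the CUDA version the installed torch was compiled against,
    or None if torch is missing or is a CPU-only build (as the official
    aarch64 wheels are)."""
    try:
        import torch
    except ImportError:
        return None
    # torch.version.cuda is None for CPU-only builds
    return torch.version.cuda

if __name__ == "__main__":
    print(cuda_build_version())
```

If this prints `None` even though `nvidia-smi` sees the cards, the problem is the wheel, not the driver.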

Miniconda3 problem

Miniconda3 will now be installed into this location:
/root/miniconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/root/miniconda3] >>> 
PREFIX=/root/miniconda3
Unpacking payload ...
Miniconda3-latest-Linux-aarch64.sh: line 358: 10241 Illegal instruction     (core dumped) "$CONDA_EXEC" constructor --prefix "$PREFIX" --extract-conda-pkgs

Anaconda3 problem

Anaconda3 will now be installed into this location:
/root/anaconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/root/anaconda3] >>> /data1/anaconda3
PREFIX=/data1/anaconda3
Unpacking payload ...
Anaconda3-2023.03-1-Linux-aarch64.sh: line 353: 60027 Illegal instruction     (core dumped) "$CONDA_EXEC" constructor --prefix "$PREFIX" --extract-conda-pkgs

Compilation steps

1. CUDA 11.8 only supports GCC 10, so GCC 10 must be installed first. The Kylin repositories only provide GCC 7.3, so I compiled GCC 10 myself following the blog post in the references.
2. Download the PyTorch source code

git clone --recursive https://github.com/pytorch/pytorch.git    # --recursive pulls in the required third-party submodules

3. Compile and install from source

# ChatGLM ran successfully under Python 3.10, so I build with 3.10; adjust for your own environment
cd pytorch
python3.10 setup.py build
python3.10 setup.py install
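Since the build takes hours, it can save time to first verify that the gcc on PATH is the version the CUDA toolchain expects. A minimal sketch — the `check_toolchain` helper and the major-version-10 default are my own assumptions, matching step 1 above:

```python
import re
import subprocess

def gcc_major(version_output: str) -> int:
    """Extract the major version from `gcc --version` output,
    e.g. 'gcc (GCC) 10.3.0 ...' -> 10."""
    m = re.search(r"\b(\d+)\.\d+\.\d+\b", version_output)
    if not m:
        raise ValueError("could not parse gcc version")
    return int(m.group(1))

def check_toolchain(required_major: int = 10) -> bool:
    """Return True if the gcc on PATH has the expected major version."""
    out = subprocess.run(["gcc", "--version"],
                         capture_output=True, text=True).stdout
    return gcc_major(out) == required_major
```

Running `check_toolchain()` before `setup.py build` catches the wrong-compiler case up front instead of partway through the build.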

Q&A

Q: With GCC 7.3.0 installed, compiling PyTorch reports an error. I forget the exact message — probably a C++ syntax/standard error.
A: Use GCC 10 instead.

Q: After compiling GCC 10 for the first time, building PyTorch failed with the following error:

**/libgfortran.a ... which may bind externally can not be used when making a shared object; recompile with -fPIC

A: I recompiled GCC 10 with CFLAGS="-fPIC" added when running its configure script; after that, PyTorch compiled successfully.
Note: Setting CFLAGS directly like this is probably not the right approach and may cause other problems (for example, overwriting the original CFLAGS and losing optimization flags). The details are beyond the scope of this article.

Q: An aarch64 program fails with an illegal-instruction error.
A: More than one program has hit this; I ran into it earlier when deploying ClickHouse (hereinafter CH). The root cause is most likely that different aarch64 platforms support different instruction-set extensions, so aarch64 binaries are not 100% portable.
When I compiled CH earlier, the local desktop CPU was a Phytium FT1500+ and the server a Phytium FT2000+. The CH binary built on the FT1500+ machine reported illegal instructions when moved to the FT2000+ server — though not every program was affected.
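One way to see why a binary built on one aarch64 machine dies with an illegal instruction on another is to diff the `Features` line of `/proc/cpuinfo` on the two hosts: any extension present only on the build machine is a candidate. A minimal sketch (function names are my own; pass each host's `/proc/cpuinfo` contents as a string):

```python
def cpu_features(cpuinfo_text: str) -> set:
    """Parse the 'Features' (aarch64) or 'flags' (x86) lines of
    /proc/cpuinfo into a set of flag names."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith(("features", "flags")):
            flags.update(line.split(":", 1)[1].split())
    return flags

def missing_features(build_host: str, target_host: str) -> set:
    """Flags present on the build machine but absent on the target --
    instructions from those extensions can trigger SIGILL on the target."""
    return cpu_features(build_host) - cpu_features(target_host)
```

Usage: run `cat /proc/cpuinfo` on both machines, feed the outputs to `missing_features`, and a non-empty result points at the extensions the target CPU lacks.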

Reference links

Ubuntu 18.04 — switching GCC versions / fixing the error "unsupported GNU version: gcc later than 10 are not supported"
Solving the problem that PyTorch cannot use GPU (CUDA) acceleration
[C++] Compiling and installing GCC 10

Original post: blog.csdn.net/qq_38189542/article/details/130683020