Convenient, fast, stable, and high-performance! Using GPU instances to demonstrate Alibaba Cloud Linux 3's support for the AI ecosystem | OpenAnolis Technology

Editor's note: Alibaba Cloud Linux 3 has recently received a set of optimizations and upgrades that make AI development more efficient. This article is a preview of the series "Introduction to AI Capabilities of Alibaba Cloud Linux 3" and uses a GPU instance to demonstrate Alibaba Cloud Linux 3's support for the AI ecosystem. Two follow-up articles will cover the cloud marketplace image based on Alinux, which gives users an out-of-the-box AI base software environment, and the differentiated AI capabilities on AMD platforms. Stay tuned. For more information about Alibaba Cloud Linux 3, see the official website: https://www.aliyun.com/product/ecs/alinux


When developing artificial intelligence (AI) applications on Linux operating systems, developers may encounter some challenges, which include but are not limited to:

1. GPU driver: To use an NVIDIA GPU for training or inference on Linux, the correct NVIDIA GPU driver must be installed and configured. Because different operating systems and GPU models may need different driver versions, this can require extra work.

2. AI framework compilation: Programming with an AI framework on Linux requires an appropriate compiler and other dependencies. These frameworks often have to be compiled, so the compiler and its dependencies must be installed and configured correctly.

3. Software compatibility: Linux supports a wide range of software and tools, but versions and distributions are not always compatible with one another. As a result, some programs may fail to run or may be unavailable on some systems, so developers need to understand the compatibility of their environment and make the necessary configuration changes.

4. Performance: The AI software stack is extremely complex and usually needs specialized tuning for different CPU and GPU models to reach its best performance. Co-optimizing software and hardware is challenging and requires a high level of expertise.

Alibaba Cloud Linux 3 (hereinafter "Alinux 3"), Alibaba Cloud's third-generation cloud server operating system, is a commercial distribution built on Anolis OS. Alinux 3 fully supports the mainstream NVIDIA GPU and CUDA ecosystem, making AI development more convenient and efficient. It also supports the mainstream AI frameworks TensorFlow and PyTorch, as well as AI optimizations for CPU platforms such as Intel and AMD. Native support for large-model SDKs such as ModelScope and Hugging Face will be introduced as well, giving developers rich resources and tools. Together, these make Alinux 3 a complete AI development platform that spares developers from repeatedly fiddling with their environment.

To address the challenges listed above, Alinux 3 provides the following optimizations and upgrades:

1. By introducing the OpenAnolis ecosystem software repository (epao), Alinux 3 lets developers install mainstream NVIDIA GPU drivers and the CUDA acceleration libraries with one command, saving the time otherwise spent matching driver versions and installing manually.

2. The epao repository also provides packaged versions of the mainstream AI frameworks TensorFlow and PyTorch. Framework dependencies are resolved automatically during installation, so developers can start working with the system Python environment without any additional compilation.

3. All components of the Alinux 3 AI capabilities are compatibility-tested before release. Developers can install them with one command, which avoids ad hoc changes to system dependencies during environment setup and improves stability in use.

4. Alinux 3 includes AI-specific optimizations for CPUs on different platforms such as Intel and AMD, to make fuller use of the hardware's performance.

5. To keep pace with the rapid iteration of the AIGC industry, Alinux 3 will also introduce native support for large-model SDKs such as ModelScope and Hugging Face, giving developers rich resources and tools.

With these optimizations, Alinux 3 serves as a complete AI development platform that addresses the pain points of AI developers and makes AI development easier and more efficient.

The following uses an Alibaba Cloud GPU instance as an example to demonstrate Alinux 3's support for the AI ecosystem:

1. Purchase a GPU instance


2. Select the Alinux 3 image


3. Install the epao repository configuration

dnf install -y anolis-epao-release

4. Install the NVIDIA GPU driver

Before installing the NVIDIA driver, make sure the kernel-devel package matching the running kernel is installed; the driver installation depends on it.

dnf install -y kernel-devel-$(uname -r)
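
To confirm that the kernel-devel package for the running kernel is actually present before installing the driver, a quick check along these lines can help. It is only a minimal sketch: the helper name kernel_devel_pkg is hypothetical, and it assumes rpm is the package database, as on Alinux 3.

```shell
# Hypothetical helper (not part of Alinux 3): build the kernel-devel package
# name for a given kernel release, defaulting to the running kernel.
kernel_devel_pkg() {
    printf 'kernel-devel-%s\n' "${1:-$(uname -r)}"
}

pkg="$(kernel_devel_pkg)"
if command -v rpm >/dev/null 2>&1 && rpm -q "$pkg" >/dev/null 2>&1; then
    echo "$pkg is installed"
else
    echo "$pkg is missing (or rpm is unavailable); install it before the driver"
fi
```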

Install the NVIDIA driver:

dnf install -y nvidia-driver nvidia-driver-cuda

After the installation is complete, you can view the GPU device status through the nvidia-smi command.

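
For scripted checks, nvidia-smi can also print selected fields as CSV instead of its default table. The sketch below assumes the driver from this step is installed; the wrapper gpu_summary is a hypothetical name, and it falls back to a message when nvidia-smi is not on the PATH.

```shell
# Hypothetical helper: print GPU name, driver version, and total memory,
# one GPU per line, or a notice if nvidia-smi is not available.
gpu_summary() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
    else
        echo "nvidia-smi not found; is the NVIDIA driver installed?"
    fi
}

gpu_summary
```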

5. Install the CUDA libraries

dnf install -y cuda
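
One way to confirm the CUDA toolkit is usable is to locate nvcc and print its version. This is a sketch under assumptions: find_nvcc is a hypothetical helper, and /usr/local/cuda is NVIDIA's usual default prefix, which the epao package may or may not use.

```shell
# Hypothetical helper: locate nvcc on PATH or under /usr/local/cuda
# (an assumed install prefix; adjust if the package uses another path).
find_nvcc() {
    if command -v nvcc >/dev/null 2>&1; then
        command -v nvcc
    elif [ -x /usr/local/cuda/bin/nvcc ]; then
        echo /usr/local/cuda/bin/nvcc
    else
        echo "nvcc not found"
    fi
}

nvcc_path="$(find_nvcc)"
if [ "$nvcc_path" != "nvcc not found" ]; then
    "$nvcc_path" --version
else
    echo "$nvcc_path"
fi
```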

6. Install the AI frameworks TensorFlow and PyTorch

Currently, the CPU versions of TensorFlow and PyTorch are provided; GPU versions of the AI frameworks will be supported in the future.

dnf install tensorflow -y
dnf install pytorch -y

After the installation is complete, you can run a simple command for each framework to check whether the installation was successful.
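
One possible form of such a check is to import each framework and print its version. This is a sketch, not necessarily the command the original screenshots showed: check_py_module is a hypothetical helper, and python3 as the interpreter name is an assumption that can be overridden through the PYTHON variable.

```shell
# Interpreter to use; python3 is an assumption, override with PYTHON=... if needed.
PYTHON="${PYTHON:-python3}"

# Hypothetical helper: print "<module> <version>" if the module imports,
# otherwise "<module> not importable".
check_py_module() {
    "$PYTHON" -c 'import importlib, sys
m = importlib.import_module(sys.argv[1])
print(sys.argv[1], getattr(m, "__version__", "unknown"))' "$1" 2>/dev/null \
        || echo "$1 not importable"
}

check_py_module tensorflow
check_py_module torch   # the pytorch package is imported as "torch"
```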

7. Deploy the model

With Alinux 3's AI ecosystem support in place, the GPT-2 Large model can be deployed to continue writing this article.

Install Git and Git LFS to facilitate subsequent model downloads.

dnf install -y git git-lfs wget

Update pip to facilitate subsequent deployment of the Python environment.

python -m pip install --upgrade pip

Enable Git LFS support.

git lfs install

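Download the write-with-transformer project source code and the pre-trained model. write-with-transformer is a web writing app that uses the GPT-2 Large model to continue text. Run these commands from your home directory, because the following steps assume the project is at ~/write-with-transformer.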

git clone https://huggingface.co/spaces/merve/write-with-transformer
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/gpt2-large
wget https://huggingface.co/gpt2-large/resolve/main/pytorch_model.bin -O gpt2-large/pytorch_model.bin
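
With GIT_LFS_SKIP_SMUDGE=1, git clone leaves small Git LFS pointer files in place of the large files, which is why the weights are fetched separately with wget. If the download is interrupted or skipped, the model file may still be a pointer. The sketch below checks for that; is_lfs_pointer is a hypothetical helper name, and the path assumes the commands above were run in the current directory.

```shell
# Hypothetical helper: succeed if the file is a Git LFS pointer rather than
# the real payload. Pointer files are small text files whose first line is
# "version https://git-lfs.github.com/spec/v1".
is_lfs_pointer() {
    [ -f "$1" ] && head -c 200 "$1" | grep -q '^version https://git-lfs\.github\.com/spec/v1'
}

f="gpt2-large/pytorch_model.bin"
if [ ! -f "$f" ]; then
    echo "$f not found"
elif is_lfs_pointer "$f"; then
    echo "$f is still an LFS pointer; download the real weights"
else
    echo "$f looks like real model data"
fi
```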

Install the dependencies required by write-with-transformer.

cd ~/write-with-transformer
pip install --ignore-installed pyyaml==5.1
pip install -r requirements.txt

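Once the environment is set up, you can run the web app and try writing with the help of GPT-2. The two sed commands point the app at the local ../gpt2-large model directory and change two numeric values on line 34 of app.py (10 to 32 and 30 to 120). Note that GPT-2 currently only supports text generation in English.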

cd ~/write-with-transformer
sed -i 's?"gpt2-large"?"../gpt2-large"?g' app.py
sed -i '34s/10/32/;34s/30/120/' app.py
streamlit run app.py --server.port 7860
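
Because the second sed command above edits line 34 by number, it can silently do the wrong thing if app.py changes upstream. A cautious approach is to preview both substitutions before applying them with -i. A minimal sketch, where preview_patch is a hypothetical helper name:

```shell
# Hypothetical helper: show what the two sed edits would touch, without
# modifying the file.
preview_patch() {
    # Lines where "gpt2-large" would become "../gpt2-large" (shown after substitution).
    sed -n 's?"gpt2-large"?"../gpt2-large"?gp' "$1"
    # Current content of line 34, which the second sed command rewrites.
    sed -n '34p' "$1"
}

if [ -f app.py ]; then
    preview_patch app.py
else
    echo "app.py not found; run this from the write-with-transformer directory"
fi
```

If line 34 does not contain the values 10 and 30, the line-addressed edit should be adjusted rather than applied as written.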

When the output shows External URL: http://<ECS EXTERNAL IP>:7860, the web app is running successfully.

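
To check from the instance itself that the app is answering on port 7860, a request with curl is usually enough. The sketch below is an assumption-laden example: app_url is a hypothetical helper, and it probes localhost rather than the external IP. Reaching the app from outside the instance additionally requires that inbound traffic on port 7860 is allowed, typically through the instance's security group.

```shell
# Hypothetical helper: build the app URL from a host and port,
# defaulting to localhost and the port used in this article.
app_url() {
    printf 'http://%s:%s\n' "${1:-localhost}" "${2:-7860}"
}

url="$(app_url)"
if command -v curl >/dev/null 2>&1; then
    # Print only the HTTP status code; "000" means the connection failed.
    curl -s -o /dev/null -m 5 -w '%{http_code}\n' "$url" || true
else
    echo "curl not available; open $url in a browser instead"
fi
```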

"For more information on dragon lizard products, ecology, and technical cooperation, please send an email to [email protected] and we will contact you as soon as possible."


Origin blog.csdn.net/weixin_60347558/article/details/132685084