ChatGLM-6B-PT: fine-tuning on a specified GPU

I was fine-tuning ChatGLM-6B-PT on a machine with four RTX 3090s. Because gpu:0, gpu:2, and gpu:3 were already occupied, there was not enough memory for fine-tuning and the run failed with:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB (GPU 0; 23.70 GiB total capacity; 8.87 GiB already allocated; 79.81 MiB free; 8.88 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
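
Before picking a card, it is worth confirming which GPUs are actually free. A quick check with nvidia-smi (standard NVIDIA tooling, not part of the original script):

# show per-GPU memory usage so you can spot an idle card
nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv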

The only option was to fine-tune the model on gpu:1. The problem is that the DeepSpeed script ds_train_finetune.sh uses all the cards by default, as the log shows when it starts:

[INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0, 1, 2, 3

This is because ds_train_finetune.sh passes --num_gpus=4 by default, so all four cards are used:

LR=1e-4

MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed --num_gpus=4 --master_port $MASTER_PORT main.py \

To save you the trouble: go straight to the official documentation, Getting Started - DeepSpeed.

It explains in detail how a single node can select specific GPUs for training; the only change needed is in ds_train_finetune.sh:

LR=1e-4

MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed  --include="localhost:1" --master_port $MASTER_PORT main.py \

With this, you can pick gpu:1 on the local machine for fine-tuning.
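
As a side note, the same --include filter can also select several local GPUs at once, and --exclude works the other way around. This is a sketch based on my reading of the DeepSpeed docs; the GPU indices are just examples:

# use only gpu:1 and gpu:3 on this machine
deepspeed --include="localhost:1,3" --master_port $MASTER_PORT main.py \

# or: use every local GPU except gpu:0
deepspeed --exclude="localhost:0" --master_port $MASTER_PORT main.py \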


Everything below happened because I had not read the full document, which cost me a lot of time and effort.

My first idea was to change the script to:

LR=1e-4
CUDA_VISIBLE_DEVICES=1
MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed --num_gpus=1 --master_port $MASTER_PORT main.py \

This does not work. Even though only one card is requested, training still runs on gpu:0 by default; the CUDA_VISIBLE_DEVICES=1 set earlier gets overwritten, as the runtime log shows:

[INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0

Back in the official DeepSpeed documentation, it says that for multi-node training you can specify which GPUs each machine uses.

Following that idea, I tried adding a hostfile and configuring --hostfile and --include:

touch hostfile
vim hostfile

hostfile contents:

host slots=4

Modify ds_train_finetune.sh

vim ds_train_finetune.sh

The first three lines become:

LR=1e-4

MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed --hostfile=hostfile --include="host:1" --master_port $MASTER_PORT main.py \

This errors out: it cannot connect to host via ssh, because host is not an actual hostname.
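
For reference, the hostfile format described in the DeepSpeed docs is one ssh-reachable hostname per line together with its GPU slot count; the worker names below are placeholders for a real multi-node setup:

worker-1 slots=4
worker-2 slots=4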

So check the machine's real hostname:

hostname

which returns:

43090
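
Before wiring that name into the hostfile, a quick manual test of whether it is reachable over ssh (my addition):

ssh 43090 hostname    # if this needs a password or fails, DeepSpeed's hostfile check will fail too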

Update the hostfile and the first three lines of ds_train_finetune.sh accordingly.

hostfile:

43090 slots=4

ds_train_finetune.sh:

LR=1e-4

MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed --hostfile=hostfile --include="43090:1" --master_port $MASTER_PORT main.py \

Still an error: ssh could not connect to 43090.

RuntimeError: Using hostfile at hostfile but host=43090 was not reachable via ssh. If you are running with a single node please remove hostfile or setup passwordless ssh.

Since I was connected to this machine remotely anyway, I figured a local connection should work (ssh localhost does connect), so I changed 43090 to localhost in both files:
hostfile:

localhost slots=4

ds_train_finetune.sh:

LR=1e-4

MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed --hostfile=hostfile --include="localhost:1" --master_port $MASTER_PORT main.py \

If nothing was going to go wrong, something went wrong anyway: the error was still there.

RuntimeError: Using hostfile at hostfile but host=localhost was not reachable via ssh. If you are running with a single node please remove hostfile or setup passwordless ssh.

Then I noticed the last sentence of the error message:

If you are running with a single node please remove hostfile or setup passwordless ssh.

Why was I using a hostfile for a local run at all? I immediately deleted the hostfile and removed --hostfile=hostfile from ds_train_finetune.sh:

LR=1e-4

MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed  --include="localhost:1" --master_port $MASTER_PORT main.py \
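
As an aside, the other route the error message suggests, for cases where you really do want to keep a hostfile on a single node, is passwordless ssh to yourself. A rough sketch with standard OpenSSH commands, not something I ended up needing here:

ssh-keygen -t ed25519        # accept the defaults, empty passphrase
ssh-copy-id localhost        # appends the public key to ~/.ssh/authorized_keys
ssh localhost hostname       # should now succeed without a password prompt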

Once the program was running, I checked the log output and could have cried with joy:

[INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=1

Training is now on the specified gpu:1, and the job can finally be left running in the background with nohup without worry:

nohup bash ds_train_finetune.sh > nohup.log 2>&1 &
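
You can then follow the training log and confirm that only gpu:1 is busy (ordinary shell commands, my addition):

tail -f nohup.log        # follow training output
watch -n 1 nvidia-smi    # memory usage should now grow on GPU 1 only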

If you still run out of GPU memory, set per_device_train_batch_size in ds_train_finetune.sh to 1:

    --per_device_train_batch_size 1 \
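
If you lower the batch size like this, you can optionally raise gradient accumulation to keep the effective batch size roughly the same. gradient_accumulation_steps is a standard HuggingFace Trainer argument, but whether main.py accepts it depends on the script, so treat this as an assumption:

    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \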


Source: blog.csdn.net/Hello_World1023/article/details/130373048