When using the Trainer provided by Hugging Face for model prediction, if output_hidden_states=True was set during training, GPU memory usage keeps growing without bound and eventually triggers a CUDA out of memory error.
Solution:
In the model's final return value, just set hidden_states to None. I don't know the specific reason why this works.
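A minimal sketch of the fix, assuming a custom model whose forward() builds its own return object. SketchOutput and build_return_value are hypothetical names used only for illustration; in a real model the return value would typically be one of the transformers output classes, and the values would be tensors rather than plain Python objects.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class SketchOutput:
    """Hypothetical stand-in for the output object a model's forward() returns."""
    loss: Optional[Any] = None
    logits: Optional[Any] = None
    hidden_states: Optional[Tuple[Any, ...]] = None


def build_return_value(loss, logits, hidden_states):
    """Build the final return value of forward().

    hidden_states can still be used inside forward() before this point
    (for example, for pooling). It is simply not passed on to the caller,
    so nothing downstream can hold on to it.
    """
    return SketchOutput(loss=loss, logits=logits, hidden_states=None)


# Usage: the per-layer hidden states are dropped from the returned object.
out = build_return_value(loss=0.5, logits=[0.1, 0.9], hidden_states=("h0", "h1"))
print(out.hidden_states)
```

If changing the model is not an option, Trainer.predict and Trainer.evaluate also accept an ignore_keys argument (for example ignore_keys=["hidden_states"]), which may have the same effect; I have not verified that it resolves this particular case.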