RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed

Problem:

While training a model, the following error appears: RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time

Cause:

To reduce GPU memory usage, PyTorch automatically frees the intermediate results (the graph's saved buffers) after backward() is called. When backward() is called a second time over the same graph, those intermediate results no longer exist, which raises the error. Adding retain_graph=True to the first backward() call keeps the intermediate results so the graph can be traversed again.
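As a minimal sketch of that fix (the tensor names here are illustrative, not from the original post), passing retain_graph=True to the first backward keeps the saved buffers alive so a second backward through the same graph succeeds:

import torch

x = torch.rand(3, 3, requires_grad=True)
loss = (x * x).sum()

# Keep the saved intermediate buffers so the graph can be walked again
loss.backward(retain_graph=True)

# Without retain_graph=True above, this second call would raise the RuntimeError;
# with it, the call succeeds and the gradients accumulate into x.grad
loss.backward()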

Wait, that doesn't match intuition: shouldn't every forward pass run through the model and build the graph again? Someone in the comments asked exactly this question.

The key point: a variable defined by performing some computation just before the loop (i.e. outside the loop) has its intermediate buffers freed by the first backward(). Because that computation outside the loop is never redone, the next backward() finds those intermediate results missing and raises the error.

import torch
a = torch.rand(3,3, requires_grad=True)

# This will be shared by both iterations and will make the second backward fail!
b = a * a

for i in range(10):
    d = b * b
    res = d.sum()
    # The first backward here will work, but the second will not!
    res.backward()

### Error at runtime
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.


### Changing it to the following form resolves the problem
import torch
# a = torch.rand(3,3, requires_grad=True)   # moved inside the loop
# b = a * a                                  # moved inside the loop: nothing is shared across iterations

for i in range(10):
    a = torch.rand(3,3, requires_grad=True)
    b = a * a
    d = b * b
    res = d.sum()
    # Each iteration now builds a fresh graph, so every backward() succeeds
    res.backward()

Another solution

If part of the model's parameters are not trained (as in MoCo and similar setups), you can try wrapping the code that creates those variables in torch.no_grad(); this also resolves the problem.
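A minimal sketch of this idea (the module names below are illustrative, not the actual MoCo code): running the untrained branch inside torch.no_grad() means no graph is recorded for it, so there is nothing shared across iterations that a backward() could free:

import torch
import torch.nn as nn

encoder_q = nn.Linear(8, 8)   # trained branch
encoder_k = nn.Linear(8, 8)   # frozen/momentum branch, not updated by backprop

for _ in range(10):
    x = torch.rand(4, 8)
    q = encoder_q(x)
    with torch.no_grad():     # no graph is recorded for the frozen branch
        k = encoder_k(x)
    loss = ((q - k) ** 2).mean()
    loss.backward()           # only encoder_q's graph is involved each iteration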

Reference: RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time - PyTorch Forums


Reposted from blog.csdn.net/yangyanbao8389/article/details/130426801