too many values to unpack (expected 4)

Calling env.step(action) on an OpenAI gym reinforcement-learning environment raises this error:

too many values to unpack (expected 4)

Offending code:

observation, reward, done, info = env.step(action)

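Cause: the left-hand side names too few variables. In newer gym versions (the change became the default in 0.26), env.step() returns five values, but the code above only unpacks four, so Python raises the error.

You can write it like this instead: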

observation, reward, terminated, truncated, info = env.step(action)
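
As a minimal sketch of how the five values are usually consumed in an episode loop, the snippet below uses a hypothetical toy environment, LineWorld, that only mimics the five-value step() signature. It is not part of gym; its name, rewards, and step limit are assumptions made for illustration.

```python
class LineWorld:
    """Hypothetical toy environment that mimics the five-value step() API."""

    def __init__(self, goal=3, max_steps=10):
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position, {}  # (observation, info)

    def step(self, action):
        # action is 0 (stay) or 1 (move one cell to the right)
        self.position += action
        self.steps += 1
        terminated = self.position >= self.goal  # terminal state of the task
        truncated = (not terminated) and self.steps >= self.max_steps  # time limit
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated, truncated, {"steps": self.steps}


def run_episode(env, policy):
    """Run one episode and return (total_reward, terminated, truncated, info)."""
    observation, info = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        # The episode is over if either flag is set.
        done = terminated or truncated
    return total_reward, terminated, truncated, info
```

Combining the two flags with `or` reproduces the behaviour of the old done variable while still keeping the reason the episode ended.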

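Alternatively, keep the original variable names and discard the extra value with an underscore. The underscore has to take the fourth position (truncated), not the last one, because info is still the final element of the returned tuple:

observation, reward, done, _, info = env.step(action)

Note that done then holds only terminated, so an episode that ends by truncation (for example a time limit) will not set it.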
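
If the same code has to run against both older and newer gym versions, one option is to normalize the result of step() before unpacking it. The sketch below is only an illustration: the helper name unpack_step is hypothetical, and reading the `TimeLimit.truncated` key from info is an assumption about how the old time-limit wrapper reported truncation.

```python
def unpack_step(result):
    """Normalize a step() result to
    (observation, reward, terminated, truncated, info)."""
    if len(result) == 5:
        # New-style result: already in the desired shape.
        return tuple(result)
    if len(result) == 4:
        # Old-style result: a single done flag.
        observation, reward, done, info = result
        # Assumption: truncation, if any, was flagged inside info.
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = bool(done) and not truncated
        return observation, reward, terminated, truncated, info
    raise ValueError(f"unexpected step() result of length {len(result)}")
```

Usage would then look like `observation, reward, terminated, truncated, info = unpack_step(env.step(action))`.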

Explanation (from the step() docstring):

Returns:
【1】observation (object): this will be an element of the environment's :attr:`observation_space`. This may, for instance, be a numpy array containing the positions and velocities of certain objects. In short, the environment state.
【2】reward (float): The amount of reward returned as a result of taking the action.
【3】terminated (bool): whether a `terminal state` (as defined under the MDP of the task) is reached. In this case further step() calls could return undefined results. This plays a role similar to the old done.
【4】truncated (bool): whether a truncation condition outside the scope of the MDP is satisfied. Typically a timelimit, but could also be used to indicate agent physically going out of bounds. Can be used to end the episode prematurely before a `terminal state` is reached.
【5】info (dictionary): `info` contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent's performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward. It also can contain information that distinguishes truncation and termination, however this is deprecated in favour of returning two booleans, and will be removed in a future version.
(deprecated)
done (bool): A boolean value for if the episode has ended, in which case further :meth:`step` calls will return undefined results. A done signal may be emitted for different reasons: maybe the task underlying the environment was solved successfully, a certain timelimit was exceeded, or the physics simulation has entered an invalid state.
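
One place where the difference between terminated and truncated matters in practice is value bootstrapping. The sketch below is an illustration under assumptions, not something taken from gym: td_target is a hypothetical helper, and the one-step TD target is just one common way to use the flags.

```python
def td_target(reward, next_value, terminated, gamma=0.99):
    """One-step TD target: reward + gamma * V(next_state).

    The bootstrap term is dropped only when the episode truly terminated.
    A truncated episode was cut off for a reason outside the MDP, so the
    next state is still assumed to have value and `truncated` is
    deliberately not consulted here.
    """
    bootstrap = 0.0 if terminated else gamma * next_value
    return reward + bootstrap
```

With a single done flag, a time-limit cutoff would be treated like a real terminal state and the bootstrap term would be dropped by mistake.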

Reposted from blog.csdn.net/weixin_44727682/article/details/128516985