Source Code Analysis of the FrozenLake Environment in OpenAI Gym (4)

Continued from the previous article: Source Code Analysis of the FrozenLake Environment in OpenAI Gym (3)

The previous article used pdb to step through the first key step:

  • env = gym.make("FrozenLake-v1")

This article looks at the second key step:

  • env.reset()

To keep the session easy to follow, quit the previous debugging session and start a fresh one with the following command:

python -m pdb frozen_lake2.py 

The command and its output are as follows:

$ python -m pdb frozen_lake2.py 
> /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py(1)<module>()
-> import numpy as np
(Pdb)

env.reset() is on line 53 of frozen_lake2.py, so set a breakpoint on line 53 of that file. The command and its output are as follows:

$ python -m pdb frozen_lake2.py 
> /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py(1)<module>()
-> import numpy as np
(Pdb) b 53
Breakpoint 1 at /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py:53

Then enter c to let the program continue until it reaches the breakpoint, as shown below:

(Pdb) b 53
Breakpoint 1 at /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py:53
(Pdb) c
The observation space: Discrete(16)
16
The action space: Discrete(4)
4
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
> /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py(53)<module>()
-> state = env.reset()
(Pdb)

The program has stopped at the breakpoint. Enter s to step into the function or method being called (commonly known as Step In), as shown below:

-> state = env.reset()
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/time_limit.py(58)reset()
-> def reset(self, **kwargs):
(Pdb) 

The program has entered reset. Most importantly, pdb reports where this reset is defined: in gym/wrappers/time_limit.py. The code of the reset method is as follows:

    def reset(self, **kwargs):
        """Resets the environment with :param:`**kwargs` and sets the number of steps elapsed to zero.

        Args:
            **kwargs: The kwargs to reset the environment with

        Returns:
            The reset environment
        """
        self._elapsed_steps = 0
        return self.env.reset(**kwargs)
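Stepping in with s is one way to find out where a method is defined. As a side note, the standard inspect module can answer the same question without a debugger. The sketch below uses json.JSONDecoder.decode from the standard library purely as a stand-in, so that it runs without gym installed; the helper name where_defined is hypothetical, and applying it to type(env).reset is an assumption that is not verified here.

```python
import inspect
import json
import os


def where_defined(func):
    """Return (file name, first line number) of the source that defines func."""
    path = inspect.getsourcefile(func)
    _, first_line = inspect.getsourcelines(func)
    return os.path.basename(path), first_line


# json.JSONDecoder.decode is only a stand-in for a method such as reset.
file_name, first_line = where_defined(json.JSONDecoder.decode)
print(file_name, first_line)
```

This reports the same kind of information as the "> .../time_limit.py(58)reset()" line that pdb prints after a --Call--.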

This reset method ends by calling self.env.reset. So what does env refer to? The constructor of the class that owns this reset, class TimeLimit(gym.Wrapper), begins as follows:

    def __init__(
        self,
        env: gym.Env,
        max_episode_steps: Optional[int] = None,
    ):

env is annotated as gym.Env, which suggests that self.env.reset calls gym.Env.reset. Let's verify this guess in the debugger.

In the debugger, enter n to step line by line until execution reaches the self.env.reset line, as shown below:

> /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py(53)<module>()
-> state = env.reset()
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/time_limit.py(58)reset()
-> def reset(self, **kwargs):
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/time_limit.py(67)reset()
-> self._elapsed_steps = 0
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/time_limit.py(68)reset()
-> return self.env.reset(**kwargs)
(Pdb) 

Now enter s to step into the call, as shown below:

> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/time_limit.py(68)reset()
-> return self.env.reset(**kwargs)
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/order_enforcing.py(39)reset()
-> def reset(self, **kwargs):
(Pdb) 

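So the earlier inference was wrong: the call did not land in gym.Env but in the reset method in order_enforcing.py. The annotation only says that env is some gym.Env; here the actual object is another wrapper. The code is as follows: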

    def reset(self, **kwargs):
        """Resets the environment with `kwargs`."""
        self._has_reset = True
        return self.env.reset(**kwargs)
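The delegation seen in these two reset methods can be sketched without gym. All class names below (BaseEnv, Wrapper, TimeLimitLike, OrderEnforcingLike) and the helper wrapper_chain are hypothetical stand-ins written for illustration, not gym's classes. The sketch only shows how each layer does a little bookkeeping and then forwards to self.env, and how the chain can be listed by following the env attribute.

```python
class BaseEnv:
    """Hypothetical stand-in for the innermost environment."""

    def reset(self, **kwargs):
        return 0, {"prob": 1}


class Wrapper:
    """Hypothetical stand-in for a wrapper that forwards reset() to self.env."""

    def __init__(self, env):
        self.env = env

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)


class TimeLimitLike(Wrapper):
    def reset(self, **kwargs):
        self._elapsed_steps = 0  # bookkeeping, then delegate
        return self.env.reset(**kwargs)


class OrderEnforcingLike(Wrapper):
    def reset(self, **kwargs):
        self._has_reset = True  # bookkeeping, then delegate
        return self.env.reset(**kwargs)


def wrapper_chain(env):
    """Return class names from the outermost wrapper down to the base env."""
    names = []
    while True:
        names.append(type(env).__name__)
        if not hasattr(env, "env"):
            return names
        env = env.env


env = TimeLimitLike(OrderEnforcingLike(BaseEnv()))
chain = wrapper_chain(env)
result = env.reset()
print(chain)
print(result)
```

The point of the sketch is that the static type of env says nothing about how many layers sit between the caller and the innermost reset; only the runtime object does.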

Continue debugging:

> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/time_limit.py(68)reset()
-> return self.env.reset(**kwargs)
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/order_enforcing.py(39)reset()
-> def reset(self, **kwargs):
(Pdb) s
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/order_enforcing.py(41)reset()
-> self._has_reset = True
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/order_enforcing.py(42)reset()
-> return self.env.reset(**kwargs)
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/env_checker.py(41)reset()
-> def reset(self, **kwargs):
(Pdb) 

Now we are in wrappers/env_checker.py. The code is as follows:

    def reset(self, **kwargs):
        """Resets the environment that on the first call will run the `passive_env_reset_check`."""
        if self.checked_reset is False:
            self.checked_reset = True
            return env_reset_passive_checker(self.env, **kwargs)
        else:
            return self.env.reset(**kwargs)
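The "check only on the first call" pattern in this reset can be sketched as follows. DummyEnv, CheckOnceWrapper, and the check_count counter are hypothetical names introduced for the demonstration; the check itself is reduced to a single assertion and is not the check that gym performs.

```python
class DummyEnv:
    """Hypothetical inner environment that returns (obs, info)."""

    def reset(self, **kwargs):
        return 0, {"prob": 1}


class CheckOnceWrapper:
    """Runs a validation on the first reset() only, then delegates directly."""

    def __init__(self, env):
        self.env = env
        self.checked_reset = False
        self.check_count = 0  # hypothetical counter, only for demonstration

    def _checked_reset(self, **kwargs):
        self.check_count += 1
        result = self.env.reset(**kwargs)
        # Simplified check: the result must be an (obs, info) pair.
        assert isinstance(result, tuple) and len(result) == 2
        return result

    def reset(self, **kwargs):
        if self.checked_reset is False:
            self.checked_reset = True
            return self._checked_reset(**kwargs)
        return self.env.reset(**kwargs)


wrapped = CheckOnceWrapper(DummyEnv())
first = wrapped.reset()
second = wrapped.reset()
print(first, second, wrapped.check_count)
```

Both calls return the same data; only the first one pays for the validation, which is why the trace takes the env_reset_passive_checker branch on this first reset.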

Keep stepping:

--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/env_checker.py(41)reset()
-> def reset(self, **kwargs):
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/env_checker.py(43)reset()
-> if self.checked_reset is False:
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/env_checker.py(44)reset()
-> self.checked_reset = True
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/env_checker.py(45)reset()
-> return env_reset_passive_checker(self.env, **kwargs)
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/passive_env_checker.py(170)env_reset_passive_checker()
-> def env_reset_passive_checker(env, **kwargs):

This leads to utils/passive_env_checker.py. The code is as follows:

def env_reset_passive_checker(env, **kwargs):
    """A passive check of the `Env.reset` function investigating the returning reset information and returning the data unchanged."""
    signature = inspect.signature(env.reset)
    if "seed" not in signature.parameters and "kwargs" not in signature.parameters:
        logger.warn(
            "Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator."
        )
    else:
        seed_param = signature.parameters.get("seed")
        # Check the default value is None
        if seed_param is not None and seed_param.default is not None:
            logger.warn(
                "The default seed argument in `Env.reset` should be `None`, otherwise the environment will by default always be deterministic. "
                f"Actual default: {seed_param}"
            )

    if "options" not in signature.parameters and "kwargs" not in signature.parameters:
        logger.warn(
            "Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information."
        )

    # Checks the result of env.reset with kwargs
    result = env.reset(**kwargs)

    if not isinstance(result, tuple):
        logger.warn(
            f"The result returned by `env.reset()` was not a tuple of the form `(obs, info)`, where `obs` is a observation and `info` is a dictionary containing additional information. Actual type: `{type(result)}`"
        )
    elif len(result) != 2:
        logger.warn(
            "The result returned by `env.reset()` should be `(obs, info)` by default, , where `obs` is a observation and `info` is a dictionary containing additional information."
        )
    else:
        obs, info = result
        check_obs(obs, env.observation_space, "reset")
        assert isinstance(
            info, dict
        ), f"The second element returned by `env.reset()` was not a dictionary, actual type: {type(info)}"
    return result
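The signature checks at the top of this function rely on inspect.signature. A minimal sketch of the same test is shown below; the helper accepts and the three reset_* functions are hypothetical examples, not part of gym. Note that a parameter declared as **kwargs appears in signature.parameters under the key "kwargs", which is why the checker looks for that literal name.

```python
import inspect


def accepts(func, name):
    """True if func has a parameter called `name` or one called 'kwargs'."""
    params = inspect.signature(func).parameters
    return name in params or "kwargs" in params


def reset_new(*, seed=None, options=None):
    """Hypothetical reset with explicit keyword-only parameters."""


def reset_kwargs(**kwargs):
    """Hypothetical reset that takes arbitrary keyword arguments."""


def reset_old():
    """Hypothetical reset with no parameters at all."""


seed_default = inspect.signature(reset_new).parameters["seed"].default
print(accepts(reset_new, "seed"), accepts(reset_kwargs, "seed"), accepts(reset_old, "seed"))
print(seed_default)
```

A reset shaped like reset_new passes both the "seed" and the "options" test and has a seed default of None, so none of the three warnings would be logged for it.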

Keep following the calls:

--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/passive_env_checker.py(170)env_reset_passive_checker()
-> def env_reset_passive_checker(env, **kwargs):
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/passive_env_checker.py(172)env_reset_passive_checker()
-> signature = inspect.signature(env.reset)
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/passive_env_checker.py(173)env_reset_passive_checker()
-> if "seed" not in signature.parameters and "kwargs" not in signature.parameters:
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/passive_env_checker.py(178)env_reset_passive_checker()
-> seed_param = signature.parameters.get("seed")
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/passive_env_checker.py(180)env_reset_passive_checker()
-> if seed_param is not None and seed_param.default is not None:
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/passive_env_checker.py(186)env_reset_passive_checker()
-> if "options" not in signature.parameters and "kwargs" not in signature.parameters:
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/passive_env_checker.py(192)env_reset_passive_checker()
-> result = env.reset(**kwargs)
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/envs/toy_text/frozen_lake.py(255)reset()
-> def reset(
(Pdb) 

After this long detour, we finally arrive at reset in envs/toy_text/frozen_lake.py. The code is as follows:

    def reset(
        self,
        *,
        seed: Optional[int] = None,
        options: Optional[dict] = None,
    ):
        super().reset(seed=seed)
        self.s = categorical_sample(self.initial_state_distrib, self.np_random)
        self.lastaction = None

        if self.render_mode == "human":
            self.render()
        return int(self.s), {"prob": 1}

This reset method is the one that belongs to class FrozenLakeEnv(Env) itself.
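The line that picks the start state calls categorical_sample, whose body is not shown above. As a rough sketch of what sampling from a categorical distribution involves, the function below draws an index by walking the cumulative probabilities. The name categorical_sample_sketch is hypothetical, it uses random.Random instead of the environment's np_random, and the one-hot start distribution (all probability on state 0) is an assumption about the default 4x4 map rather than something confirmed by the listing.

```python
import random
from itertools import accumulate


def categorical_sample_sketch(probs, rng):
    """Draw an index i with probability probs[i] via the cumulative sums."""
    u = rng.random()  # uniform in [0, 1)
    for index, cumulative in enumerate(accumulate(probs)):
        if cumulative > u:
            return index
    return len(probs) - 1  # guard against floating-point round-off


rng = random.Random(0)
# Assumed start distribution: 16 states, all mass on state 0.
one_hot = [1.0] + [0.0] * 15
samples = {categorical_sample_sketch(one_hot, rng) for _ in range(100)}
print(samples)
```

Under that assumption every draw yields state 0, so reset would always return the same start observation together with the info dictionary {"prob": 1}.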

The detour took some effort, but it paid off: the file that actually implements FrozenLake is gym/envs/toy_text/frozen_lake.py. Later analysis can focus directly on this file.

 

Reposted from blog.csdn.net/phmatthaus/article/details/131727343