OpenAI Gym FrozenLake Environment Source Code Analysis (6)

Continued from the previous article: OpenAI Gym FrozenLake Environment Source Code Analysis (5)

The previous article used pdb to debug the following key step:

  • env.action_space.sample()

This article looks at the next key step:

  • env.step(action)

To keep things clear and easy to debug, quit the previous debugging session and start a new one with the following command:

python -m pdb frozen_lake2.py 

The command and its output are as follows:

$ python -m pdb frozen_lake2.py 
> /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py(1)<module>()
-> import numpy as np
(Pdb) 

The call to env.step(action) is on line 73 of frozen_lake2.py, so set a breakpoint on line 73 of that file. The command and its output are as follows:

$ python -m pdb frozen_lake2.py 
> /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py(1)<module>()
-> import numpy as np
(Pdb) b 73
Breakpoint 1 at /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py:73
(Pdb) 

Next, enter c to let the program continue running until it reaches this breakpoint, as shown below:

(Pdb) c
The observation space: Discrete(16)
16
The action space: Discrete(4)
4
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
> /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py(73)<module>()
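-> new_state, reward, done, truncated, info = env.step(action) # Same here: this raised an error at first; I later found that in the newer library this function returns five values, and online advice says to just add '_' for the last one
(Pdb) 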

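As you can see, the program has stopped at the breakpoint. Enter s to step into the call (commonly known as Step In), that is, to enter the function or method being called, as shown below:

-> new_state, reward, done, truncated, info = env.step(action) # Same here: this raised an error at first; I later found that in the newer library this function returns five values, and online advice says to just add '_' for the last one
(Pdb) s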
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/time_limit.py(39)step()
-> def step(self, action):
(Pdb) 

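As you can see, execution has entered the step function. Most importantly, pdb shows where this step function is defined: the file gym/wrappers/time_limit.py. The code of the step method is as follows: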

    def step(self, action):
        """Steps through the environment and if the number of steps elapsed exceeds ``max_episode_steps`` then truncate.

        Args:
            action: The environment step action

        Returns:
            The environment step ``(observation, reward, terminated, truncated, info)`` with `truncated=True`
            if the number of steps elapsed >= max episode steps

        """
        observation, reward, terminated, truncated, info = self.env.step(action)
        self._elapsed_steps += 1

        if self._elapsed_steps >= self._max_episode_steps:
            truncated = True

        return observation, reward, terminated, truncated, info
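The truncation rule in this method can be tried out in isolation. The following is a minimal sketch, not gym's actual classes: CountingEnv and TimeLimitSketch are hypothetical stand-ins written for illustration, and the constructor signature is an assumption. It shows that only the truncated flag is overridden once the step count reaches the limit, while terminated passes through unchanged.

```python
class CountingEnv:
    """Hypothetical stand-in environment that never terminates on its own."""

    def step(self, action):
        # Returns (observation, reward, terminated, truncated, info)
        return 0, 0.0, False, False, {}


class TimeLimitSketch:
    """Simplified stand-in mirroring the step logic quoted above."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def step(self, action):
        observation, reward, terminated, truncated, info = self.env.step(action)
        self._elapsed_steps += 1
        # Only `truncated` is overridden; `terminated` is passed through as-is.
        if self._elapsed_steps >= self._max_episode_steps:
            truncated = True
        return observation, reward, terminated, truncated, info


env = TimeLimitSketch(CountingEnv(), max_episode_steps=3)
# Collect the `truncated` flag (index 3 of the result) over four steps.
truncated_flags = [env.step(0)[3] for _ in range(4)]
print(truncated_flags)  # [False, False, True, True]
```

With a limit of 3, the third call is the first one for which the elapsed step count is no longer below the limit, so truncation starts there.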

This step method belongs to class TimeLimit(gym.Wrapper). Continue stepping:

--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/time_limit.py(39)step()
-> def step(self, action):
(Pdb) s
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/time_limit.py(50)step()
-> observation, reward, terminated, truncated, info = self.env.step(action)
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/order_enforcing.py(33)step()
-> def step(self, action):
(Pdb) 

This step method is in gym/wrappers/order_enforcing.py. Its code is as follows:

    def step(self, action):
        """Steps through the environment with `kwargs`."""
        if not self._has_reset:
            raise ResetNeeded("Cannot call env.step() before calling env.reset()")
        return self.env.step(action)
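The guard in this method can be illustrated with a small self-contained sketch. The classes below are hypothetical stand-ins rather than gym's own code, and the sketch assumes that the _has_reset flag is set inside reset(), which the excerpt above does not show.

```python
class ResetNeededSketch(Exception):
    """Hypothetical stand-in for the ResetNeeded error raised above."""


class DummyEnv:
    """Hypothetical inner environment with fixed return values."""

    def reset(self):
        return 0, {}

    def step(self, action):
        return 0, 0.0, False, False, {}


class OrderEnforcingSketch:
    """Simplified stand-in: refuses step() until reset() has been called."""

    def __init__(self, env):
        self.env = env
        self._has_reset = False

    def reset(self):
        # Assumption: the flag is flipped here, before delegating to the inner env.
        self._has_reset = True
        return self.env.reset()

    def step(self, action):
        if not self._has_reset:
            raise ResetNeededSketch("Cannot call env.step() before calling env.reset()")
        return self.env.step(action)


env = OrderEnforcingSketch(DummyEnv())
try:
    env.step(0)
    stepped_before_reset = True
except ResetNeededSketch:
    stepped_before_reset = False

env.reset()
result_after_reset = env.step(0)
print(stepped_before_reset, result_after_reset)
```

In the debugging session, reset() had already been called earlier in the script, which is why the check passes and execution moves straight on to the inner step call.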

Continue stepping:

--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/order_enforcing.py(33)step()
-> def step(self, action):
(Pdb) s
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/order_enforcing.py(35)step()
-> if not self._has_reset:
(Pdb) s
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/order_enforcing.py(37)step()
-> return self.env.step(action)
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/env_checker.py(33)step()
-> def step(self, action: ActType):
(Pdb) 

This time execution arrives at the step method of class PassiveEnvChecker(gym.Wrapper) in gym/wrappers/env_checker.py. The code is as follows:

    def step(self, action: ActType):
        """Steps through the environment that on the first call will run the `passive_env_step_check`."""
        if self.checked_step is False:
            self.checked_step = True
            return env_step_passive_checker(self.env, action)
        else:
            return self.env.step(action)
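The "check only on the first call" pattern used here can be sketched as follows. Everything in this example is a hypothetical stand-in: checked_step plays the role of env_step_passive_checker, and the calls list exists only to record which path was taken.

```python
calls = []


def checked_step(env, action):
    """Hypothetical stand-in for the checker: record the call, then delegate."""
    calls.append("checked")
    return env.step(action)


class DummyEnv:
    """Hypothetical inner environment that records each plain step."""

    def step(self, action):
        calls.append("plain")
        return 0, 0.0, False, False, {}


class CheckOnceSketch:
    """Simplified stand-in: run the checker on the first step only."""

    def __init__(self, env):
        self.env = env
        self.checked_step = False

    def step(self, action):
        if self.checked_step is False:
            self.checked_step = True
            return checked_step(self.env, action)
        return self.env.step(action)


env = CheckOnceSketch(DummyEnv())
for _ in range(3):
    env.step(0)
print(calls)  # ['checked', 'plain', 'plain', 'plain']
```

The checker itself calls the inner step, so the first call produces both a "checked" and a "plain" entry, and every later call goes directly to the inner environment.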

Continue stepping:

--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/env_checker.py(33)step()
-> def step(self, action: ActType):
(Pdb) s
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/env_checker.py(35)step()
-> if self.checked_step is False:
(Pdb) s
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/env_checker.py(36)step()
-> self.checked_step = True
(Pdb) s
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/env_checker.py(37)step()
-> return env_step_passive_checker(self.env, action)
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/passive_env_checker.py(211)env_step_passive_checker()
-> def env_step_passive_checker(env, action):

The env_step_passive_checker function is in gym/utils/passive_env_checker.py. Its code is as follows:

def env_step_passive_checker(env, action):
    """A passive check for the environment step, investigating the returning data then returning the data unchanged."""
    # We don't check the action as for some environments then out-of-bounds values can be given
    result = env.step(action)
    assert isinstance(
        result, tuple
    ), f"Expects step result to be a tuple, actual type: {type(result)}"
    if len(result) == 4:
        logger.deprecation(
            "Core environment is written in old step API which returns one bool instead of two. "
            "It is recommended to rewrite the environment with new step API. "
        )
        obs, reward, done, info = result

        if not isinstance(done, (bool, np.bool8)):
            logger.warn(
                f"Expects `done` signal to be a boolean, actual type: {type(done)}"
            )
    elif len(result) == 5:
        obs, reward, terminated, truncated, info = result

        # np.bool is actual python bool not np boolean type, therefore bool_ or bool8
        if not isinstance(terminated, (bool, np.bool8)):
            logger.warn(
                f"Expects `terminated` signal to be a boolean, actual type: {type(terminated)}"
            )
        if not isinstance(truncated, (bool, np.bool8)):
            logger.warn(
                f"Expects `truncated` signal to be a boolean, actual type: {type(truncated)}"
            )
    else:
        raise error.Error(
            f"Expected `Env.step` to return a four or five element tuple, actual number of elements returned: {len(result)}."
        )

    check_obs(obs, env.observation_space, "step")

    if not (
        np.issubdtype(type(reward), np.integer)
        or np.issubdtype(type(reward), np.floating)
    ):
        logger.warn(
            f"The reward returned by `step()` must be a float, int, np.integer or np.floating, actual type: {type(reward)}"
        )
    else:
        if np.isnan(reward):
            logger.warn("The reward is a NaN value.")
        if np.isinf(reward):
            logger.warn("The reward is an inf value.")

    assert isinstance(
        info, dict
    ), f"The `info` returned by `step()` must be a python dictionary, actual type: {type(info)}"

    return result
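The shape of these checks can be approximated in plain Python. The sketch below is a simplification under stated assumptions: describe_step_result is a hypothetical helper, it collects messages in a list instead of calling logger.warn or assert, it skips the observation check, and it only recognises built-in Python types rather than NumPy scalar types.

```python
import math


def describe_step_result(result):
    """Return a list of problems found in a step() result (empty if none)."""
    if not isinstance(result, tuple):
        return [f"expected a tuple, got {type(result).__name__}"]

    problems = []
    if len(result) == 4:
        # Old API: a single `done` flag.
        problems.append("old 4-element step API")
        obs, reward, done, info = result
        flags = {"done": done}
    elif len(result) == 5:
        # New API: separate `terminated` and `truncated` flags.
        obs, reward, terminated, truncated, info = result
        flags = {"terminated": terminated, "truncated": truncated}
    else:
        return [f"expected 4 or 5 elements, got {len(result)}"]

    for name, value in flags.items():
        if not isinstance(value, bool):
            problems.append(f"`{name}` is not a bool")

    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(reward, bool) or not isinstance(reward, (int, float)):
        problems.append("reward is not numeric")
    elif math.isnan(reward) or math.isinf(reward):
        problems.append("reward is NaN or inf")

    if not isinstance(info, dict):
        problems.append("`info` is not a dict")
    return problems


ok = describe_step_result((0, 0.0, False, False, {"prob": 1.0}))
old_api = describe_step_result((0, 0.0, False, {}))
bad_flag = describe_step_result((0, 1.0, 0, False, {}))
print(ok, old_api, bad_flag)
```

The five-element tuple in the first call matches the layout returned by the FrozenLake step function, which is why the author's script unpacks five values on line 73.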

Continue stepping:

--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/passive_env_checker.py(211)env_step_passive_checker()
-> def env_step_passive_checker(env, action):
(Pdb) s
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/passive_env_checker.py(214)env_step_passive_checker()
-> result = env.step(action)
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/envs/toy_text/frozen_lake.py(244)step()
-> def step(self, a):
(Pdb) 

Execution finally arrives in gym/envs/toy_text/frozen_lake.py, just as analyzed in the previous article. The step function in frozen_lake.py is as follows:

    def step(self, a):
        transitions = self.P[self.s][a]
        i = categorical_sample([t[0] for t in transitions], self.np_random)
        p, s, r, t = transitions[i]
        self.s = s
        self.lastaction = a

        if self.render_mode == "human":
            self.render()
        return (int(s), r, t, False, {"prob": p})
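Ahead of the detailed analysis, the core of this method (pick one transition at random according to its probability, then unpack it) can be sketched with the standard library alone. The function categorical_sample_sketch is a hypothetical stand-in for gym's categorical_sample, written here as plain inverse-CDF sampling, and the transition list is made-up illustrative data that follows the (probability, next_state, reward, terminated) layout implied by the unpacking above.

```python
import random
from itertools import accumulate


def categorical_sample_sketch(probs, rng):
    """Pick index i with probability probs[i] using cumulative sums."""
    threshold = rng.random()  # uniform in [0, 1)
    for index, cumulative in enumerate(accumulate(probs)):
        if cumulative > threshold:
            return index
    return len(probs) - 1  # guard against floating-point round-off


# Hypothetical transitions for one (state, action) pair:
# (probability, next_state, reward, terminated)
transitions = [
    (1 / 3, 0, 0.0, False),
    (1 / 3, 4, 0.0, False),
    (1 / 3, 1, 0.0, False),
]

rng = random.Random(0)  # seeded so the run is repeatable
i = categorical_sample_sketch([t[0] for t in transitions], rng)
p, s, r, t = transitions[i]
# Same result layout as above: truncated is always False at this level.
step_result = (int(s), r, t, False, {"prob": p})
print(step_result)
```

Note that the fourth element is hard-coded to False here, just as in the quoted method; truncation is decided further out, in the TimeLimit wrapper seen at the start of the trace.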

For a detailed analysis of the step function, see the next article.

Reposted from blog.csdn.net/phmatthaus/article/details/131748126