Gym's spaces.Discrete and spaces.Box

Original: https://www.jianshu.com/p/cb0839a4d1d3

1. OpenAI Gym installation

installation

My environment is Ubuntu 16.04 + Anaconda + Python 3.6.2

git clone https://github.com/openai/gym
cd gym
sudo pip install -e .[all]

Here pip install -e .[all] installs every environment. If you don't want them all, pip install -e . installs just the basic package, and you can manually install the environments you need afterwards. Note that you must install with administrator privileges, otherwise it will fail!

helloworld

After installation completes, run a small demo to verify it succeeded. Here we run CartPole-v0 for 1000 frames:

import gym
env = gym.make('CartPole-v0')
env.reset()  # reset the environment
for _ in range(1000):  # 1000 frames
    env.render()  # re-render the environment each frame
    env.step(env.action_space.sample()) # take a random action

Running this should open a window rendering the CartPole environment.

Normally we should stop the environment before the pole slides out of the screen; this will be introduced later.

If you want to see what other environments look like, replace CartPole-v0 with MountainCar-v0, MsPacman-v0, and so on; these environments all derive from the Env base class.

You can list all the environments OpenAI Gym provides:

from gym import envs
print(envs.registry.all())

2. OpenAI Gym use

Observations

In the Hello World example above, the actions are random. If you want to act better than randomly at each step, it helps to understand how actions actually affect the environment.

The step function returns the information you need from the environment: four values, observation, reward, done, and info. Specifically:

  • observation (object): an environment-specific object describing your observation of the environment, such as pixel data from a camera, the joint angles and joint velocities of a robot, or the board state in a board game.
  • reward (float): the amount of reward obtained by the previous action. The scale differs between environments, but the goal is always to increase your total reward.
  • done (boolean): whether it is time to reset the environment. Most tasks are divided into well-defined episodes, and done being True indicates the episode has terminated.
  • info (dict): diagnostic information useful for debugging. It can occasionally help learning, but official evaluations of your agent are not allowed to use this information.

This is an implementation of the classic agent-environment loop: at each timestep, the agent chooses an action, and the environment returns an observation and a reward.
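That loop can be sketched without installing anything, using a toy stand-in environment. ToyEnv and its 10-step episode are invented here purely for illustration; it only mimics the shape of gym's reset/step API, it is not a real gym environment:

```python
import random

class ToyEnv:
    """A hypothetical stand-in that mimics gym's reset/step API.
    The 'task' is trivial: the episode always ends after 10 steps."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # initial observation

    def step(self, action):
        self.t += 1
        observation = self.t      # what the agent sees next
        reward = float(action)    # toy reward: 1.0 for action 1, else 0.0
        done = self.t >= 10       # episode terminates after 10 steps
        info = {}                 # diagnostics; empty in this toy
        return observation, reward, done, info

env = ToyEnv()
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = random.choice([0, 1])            # random agent
    obs, reward, done, info = env.step(action)
    total_reward += reward
```

After the loop, obs is 10 and total_reward lies between 0.0 and 10.0 depending on the random actions, which is exactly the bookkeeping the real CartPole loop below performs.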

 

The process gets started by calling reset, which returns an initial observation. A better way to write the Hello World code is therefore to end the current episode when done returns True:

import gym
env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break

Unlike the previous run, the environment now resets each time the pole falls over, that is, whenever done returns True, and every frame returns an observation describing the current state of the pole.

Spaces

The previous examples all used random actions, but how is an action actually represented? Every environment comes with an action_space and an observation_space, Space objects that describe the valid actions and observations:

import gym
env = gym.make('CartPole-v0')
print(env.action_space)
#> Discrete(2)
print(env.observation_space)
#> Box(4,)

The Discrete space allows a fixed range of non-negative numbers; in this example the pole can only be pushed left or right, so the valid actions are 0 and 1. Box represents an n-dimensional box; here the CartPole observation has 4 dimensions, so a valid observation is an array of 4 numbers. You can also check the Box's bounds:

print(env.observation_space.high)
#> array([ 2.4       ,         inf,  0.20943951,         inf])
print(env.observation_space.low)
#> array([-2.4       ,        -inf, -0.20943951,        -inf])
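These bounds mean a valid observation must satisfy low[i] <= x[i] <= high[i] in every component. A plain-Python containment check using the CartPole bounds printed above (in_box is a helper invented here for illustration; gym's own version of this check is env.observation_space.contains):

```python
import math

# CartPole-v0 observation bounds, as printed above
low  = [-2.4, -math.inf, -0.20943951, -math.inf]
high = [ 2.4,  math.inf,  0.20943951,  math.inf]

def in_box(obs, low, high):
    """Componentwise check that obs lies inside the box [low, high]."""
    return all(l <= x <= h for x, l, h in zip(obs, low, high))

print(in_box([0.0, 1.5, 0.1, -3.0], low, high))   # True: every component in range
print(in_box([3.0, 0.0, 0.0,  0.0], low, high))   # False: cart position exceeds 2.4
```

The infinite bounds on the second and fourth components mean the cart velocity and the pole's angular velocity are unconstrained; only the cart position and pole angle are bounded.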

Box and Discrete are the most common Spaces. You can sample from a Space or check whether a value belongs to it:

from gym import spaces
space = spaces.Discrete(8) # Set with 8 elements {0, 1, 2, ..., 7}
x = space.sample()
assert space.contains(x)
assert space.n == 8
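To make concrete what a Discrete space promises, here is a minimal re-implementation sketch of its sample/contains behaviour. This is illustrative only, not gym's actual code; the real spaces.Discrete also handles details such as seeding and numpy integer types:

```python
import random

class MiniDiscrete:
    """Illustrative sketch of a Discrete space over {0, 1, ..., n-1}."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        # draw uniformly from {0, ..., n-1}
        return random.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n

space = MiniDiscrete(8)
x = space.sample()
assert space.contains(x)       # samples always lie inside the space
assert not space.contains(8)   # 8 is outside {0, ..., 7}
```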

In many environments the data in these spaces is not as intuitive as in this simple example, but as long as your model is good enough, it does not need to interpret the data itself.


Author: huyuanda
Link: https://www.jianshu.com/p/cb0839a4d1d3
Source: Jianshu
Copyright belongs to the author. For reproduction in any form, please contact the author for authorization and cite the source.

Origin blog.csdn.net/lxlong89940101/article/details/90696899