【openai_ros】11 - Project: Training a Hopper robot


Unit 4: Project: Training a Hopper robot

Hopper Robot

For this part of the project, you will use the Hopper robot. The Hopper robot is a one-legged robot whose biggest challenge is being able to stand up.

Tasks to accomplish

1. Create all the environments needed to train the Hopper robot. First, generate the Robot Environment. You can start from the template below; a Hopper-specific sketch of how it might be filled in follows the template.

hopper_env.py
from openai_ros import robot_gazebo_env


class HopperEnv(robot_gazebo_env.RobotGazeboEnv):
    """Superclass for all Robot environments.
    """

    def __init__(self):
        """Initializes a new Robot environment.
        """
        # Variables that we give through the constructor.

        # Internal Vars
        self.controllers_list = ['my_robot_controller1','my_robot_controller2', ..., 'my_robot_controllerX']

        self.robot_name_space = "my_robot_namespace"

        reset_controls_bool = True  # or False, depending on whether the controllers should be reset
        
        # We launch the init function of the parent class robot_gazebo_env.RobotGazeboEnv
        super(HopperEnv, self).__init__(controllers_list=self.controllers_list,
                                        robot_name_space=self.robot_name_space,
                                        reset_controls=reset_controls_bool)

    # Methods needed by the RobotGazeboEnv
    # ----------------------------

    def _check_all_systems_ready(self):
        """
        Checks that all the sensors, publishers and other simulation systems are
        operational.
        """
        # TODO
        return True
    
    # Methods that the TrainingEnvironment will need to define here as virtual
    # because they will be used in RobotGazeboEnv GrandParentClass and defined in the
    # TrainingEnvironment.
    # ----------------------------
    def _set_init_pose(self):
        """Sets the Robot in its init pose
        """
        raise NotImplementedError()
    
    
    def _init_env_variables(self):
        """Inits variables needed to be initialized each time we reset at the start
        of an episode.
        """
        raise NotImplementedError()

    def _compute_reward(self, observations, done):
        """Calculates the reward to give based on the observations given.
        """
        raise NotImplementedError()

    def _set_action(self, action):
        """Applies the given action to the simulation.
        """
        raise NotImplementedError()

    def _get_obs(self):
        """Returns the current observations from the robot's sensors.
        """
        raise NotImplementedError()

    def _is_done(self, observations):
        """Checks if episode done based on observations given.
        """
        raise NotImplementedError()
        
    # Methods that the TrainingEnvironment will need.
    # ----------------------------
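
As a reference, here is one way the Robot Environment template might be filled in for the Hopper. This is only a sketch: the controller names, the /monoped namespace, and the sensor topics are assumptions and must match whatever your Hopper simulation launch files actually spawn; the virtual methods already declared in the template above are omitted for brevity.

hopper_env.py (sketch)
import rospy
from sensor_msgs.msg import JointState
from nav_msgs.msg import Odometry
from openai_ros import robot_gazebo_env


class HopperEnv(robot_gazebo_env.RobotGazeboEnv):
    """Robot environment for the one-legged Hopper."""

    def __init__(self):
        # Assumed controllers: one position controller per leg joint
        # (hip abduction, hip flexion, knee) plus the joint state publisher.
        self.controllers_list = ['joint_state_controller',
                                 'haa_joint_position_controller',
                                 'hfe_joint_position_controller',
                                 'kfe_joint_position_controller']
        self.robot_name_space = "monoped"

        super(HopperEnv, self).__init__(controllers_list=self.controllers_list,
                                        robot_name_space=self.robot_name_space,
                                        reset_controls=True)

        # Assumed sensor topics; adjust them to your simulation.
        rospy.Subscriber("/monoped/joint_states", JointState, self._joints_callback)
        rospy.Subscriber("/monoped/odom", Odometry, self._odom_callback)

    def _joints_callback(self, data):
        self.joints = data

    def _odom_callback(self, data):
        self.odom = data

    def _check_all_systems_ready(self):
        """Blocks until the first joint_states message arrives; a full
        implementation would do the same for every sensor topic."""
        self.joints = None
        while self.joints is None and not rospy.is_shutdown():
            try:
                self.joints = rospy.wait_for_message(
                    "/monoped/joint_states", JointState, timeout=1.0)
            except rospy.ROSException:
                rospy.logwarn("Still waiting for /monoped/joint_states...")
        return True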

2. Generate the Task Environment. You can start from the template below; a sketch of possible reward and termination functions follows it.

hopper_stay_up_env.py
from gym import spaces
from gym.envs.registration import register
import rospy
import numpy
import hopper_env

# Importing this module registers the task environment with gym
timestep_limit_per_episode = 1000 # Can be any value

register(
        id='HopperStayUp-v0',
        entry_point='hopper_stay_up_env:HopperStayUpEnv',
        timestep_limit=timestep_limit_per_episode,
    )

class HopperStayUpEnv(hopper_env.HopperEnv):
    def __init__(self):
        
        # Only variable needed to be set here
        number_actions = rospy.get_param('/my_robot_namespace/n_actions')
        self.action_space = spaces.Discrete(number_actions)
        
        # This is the most common case of Box observation type
        high = numpy.array([
            obs1_max_value,
            obs2_max_value,
            ...
            obsN_max_value
            ])
            
        self.observation_space = spaces.Box(-high, high)
        
        # Variables that we retrieve from the parameter server, loaded by the training launch file.
        


        # Here we will add any init functions prior to starting the HopperEnv
        super(HopperStayUpEnv, self).__init__()


    def _set_init_pose(self):
        """Sets the Robot in its init pose
        """
        # TODO

    def _init_env_variables(self):
        """
        Inits variables needed to be initialised each time we reset at the start
        of an episode.
        :return:
        """
        # TODO


    def _set_action(self, action):
        """
        Move the robot based on the action variable given
        """
        # TODO: Move robot

    def _get_obs(self):
        """
        Here we define which sensor data makes up our robot's observations.
        To know which variables we have access to, we need to read the
        HopperEnv API docs.
        :return: observations
        """
        # TODO
        return observations

    def _is_done(self, observations):
        """
        Decide if episode is done based on the observations
        """
        # TODO
        return done

    def _compute_reward(self, observations, done):
        """
        Return the reward based on the observations given
        """
        # TODO
        return reward
        
    # Internal TaskEnv Methods
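
For the "stay up" task, the termination and reward functions could look something like this. It is only a sketch: it assumes the first element of the observation vector is the base height, and that min_height, alive_reward, and fall_penalty are hypothetical parameters loaded from the parameter server in __init__.

    def _is_done(self, observations):
        """
        The episode ends when the robot has fallen, i.e. when its base
        drops below a minimum height.
        """
        base_height = observations[0]  # assumed observation layout
        done = base_height < self.min_height
        return done

    def _compute_reward(self, observations, done):
        """
        Reward staying upright, penalise falling over.
        """
        if not done:
            reward = self.alive_reward       # e.g. +1 for every step upright
        else:
            reward = -1 * self.fall_penalty  # e.g. -100 when the robot falls
        return reward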

3. Create a training script that will use the Qlearn algorithm in order to train the Hopper robot (see the sketch after this list).

4. Create a training script that will use the deepq algorithm in order to train the Hopper robot.

5. Compare the results of the two learning algorithms and check which one performs better. Try to improve the learning script.
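
As a starting point for point 3, here is a minimal sketch of a Qlearn training script. It assumes the same qlearn.py module used in the earlier units (QLearn(actions, epsilon, alpha, gamma) with chooseAction() and learn()), that importing hopper_stay_up_env registers HopperStayUp-v0, and that the hyperparameter values shown are placeholders you would normally load from the parameter server. The deepq script of point 4 would keep the same overall loop, swapping the tabular agent for the deepq trainer used in the corresponding unit.

start_training_qlearn.py (sketch)
#!/usr/bin/env python
import gym
import rospy
import qlearn                 # the course's tabular Q-learning module
import hopper_stay_up_env     # importing it registers HopperStayUp-v0

if __name__ == '__main__':
    rospy.init_node('hopper_qlearn_training', anonymous=True, log_level=rospy.WARN)
    env = gym.make('HopperStayUp-v0')

    # Illustrative hyperparameters; load yours from the parameter server.
    agent = qlearn.QLearn(actions=range(env.action_space.n),
                          epsilon=0.9, alpha=0.1, gamma=0.7)
    nepisodes = 500
    nsteps = 1000

    for episode in range(nepisodes):
        cumulated_reward = 0
        observation = env.reset()
        state = ''.join(map(str, observation))

        for step in range(nsteps):
            action = agent.chooseAction(state)
            observation, reward, done, info = env.step(action)
            cumulated_reward += reward
            next_state = ''.join(map(str, observation))
            agent.learn(state, action, reward, next_state)
            if done:
                break
            state = next_state

        rospy.logwarn("Episode " + str(episode) + " reward: " + str(cumulated_reward))

    env.close()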
