ML-Agents (2): Creating a Learning Environment

I. Introduction

In the previous article we covered how to set up the ML-Agents environment; in this one we create a sample project whose main technique is reinforcement learning (Reinforcement Learning).

(screenshot)

As shown above, in this example we will train a rolling ball to find a randomly placed cube while avoiding falling off the platform.

This example is based on the official ML-Agents example. The official documentation has both a Chinese and an English version; the English version is the most up to date, and while the two are mostly the same there are some differences. This article follows the latest version (v0.15.0, master branch). You can also refer to the official documentation at the addresses below.

English: https://github.com/Unity-Technologies/ml-agents/tree/master/docs

Chinese: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/localized/zh-CN/docs/Learning-Environment-Create-New.md

II. Overview

Using ML-Agents in a Unity project involves the following basic steps:

  1. Create an environment for your agents. The environment can range from a simple physical simulation containing a few objects to an entire game or ecosystem; its form can vary widely;
  2. Implement Agent subclasses. An Agent subclass defines the code the agent needs to observe its environment, perform the specified actions, and compute the rewards used for reinforcement training. You can also implement optional methods to reset the agent when it has completed or failed its task;
  3. Add the Agent subclass script to the appropriate GameObject; when that object is in the scene, it represents the corresponding agent in the simulated environment.

( PS. In the official Chinese documentation, steps 2 and 3 also require implementing Academy and Brain subclasses, but in the new version these two things no longer need to be defined in the scene, so the Agent subclass is the important one. Learn the basic logic here.)
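
As a rough preview (this is only a sketch; the full implementations are developed step by step below), the Agent subclass we build in this article overrides the following methods:

using MLAgents;
using MLAgents.Sensors;

public class RollerAgent : Agent
{
    public override void AgentReset()
    {
        // reset the agent and reposition the target
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // send observations of the environment to the Brain
    }

    public override void AgentAction(float[] vectorAction)
    {
        // apply the Brain's decisions and assign rewards
    }

    public override float[] Heuristic()
    {
        // manual control, used for testing without a trained model
        return new float[2];
    }
}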

III. Setting Up the Unity Project

First, let's create a new Unity project and import the ML-Agents package into it:

  1. Open Unity and create a new project with any name you like, for example "RollerBall";

    (screenshot)

  2. In Unity, open the menu "Edit" -> "Project Settings". In the window that pops up, find "Player" and change "Api Compatibility Level" to ".NET 4.x", as shown below;

    (screenshot)

    (screenshot)

  3. In the previous article we cloned the ml-agents repository to a local folder. If you have not cloned it yet, please refer to section V.1 of "Unity ML-Agents v0.15.0 (1): Deploying the Environment and Test Run". Assuming the repository is already cloned, we now need to import the ML-Agents Unity plug-in. My Unity version here is 2019.2; proceed as follows:

    • Find the Packages folder in the root directory of the project;

    (screenshot)

    • That folder contains a "manifest.json" file, which lists the packages used by the project. Edit it and add the following entry at the end: "com.unity.ml-agents": "file:D:/Unity Projects/ml-agents/com.unity.ml-agents". The path after file: must be the path of your own ml-agents clone, so don't just copy mine unless your path happens to be the same, as shown below;

      (screenshot)

      After saving the change, switch back to Unity. If the path is correct, the package import dialog will appear, and once the import succeeds an "ML Agents" folder will show up under the Packages folder in the Project window, as shown below:

      (screenshot)
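
      For reference, here is a minimal sketch of what the relevant part of manifest.json might look like after the edit (your file will already contain many other entries under "dependencies", which stay unchanged, and the path shown is only an example from my machine):

      {
        "dependencies": {
          "com.unity.ml-agents": "file:D:/Unity Projects/ml-agents/com.unity.ml-agents"
        }
      }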

  4. Create the environment

    Next, we create a simple ML-Agents environment. The "physical" components of the environment consist of a Plane (acting as the floor the agent moves on), a Cube (acting as the target the agent must find) and a Sphere (representing the agent itself).

    • Create a floor

      • Right-click in the Hierarchy window and select 3D Object > Plane.

      • Name the GameObject "Floor".

      • Select the Plane to view its properties in the Inspector window.

      • Set its Transform to Position = (0, 0, 0), Rotation = (0, 0, 0), Scale = (1, 1, 1).

      • Optionally change the Plane's material so it looks nicer.

    I basically copied the steps above from the docs; all you really do is create a Plane and put a nicer material on it. Feel free to set it up however you like.

    (screenshot)

    • Create a target cube

      • In the Hierarchy window, right-click and select 3D Object > Cube.

      • Name the GameObject "Target".

      • Select Target to view its properties in the Inspector window.

      • Set its Transform to Position = (3, 0.5, 3), Rotation = (0, 0, 0), Scale = (1, 1, 1).

      • Optionally change the Cube's material.

      (screenshot)

    • Add Agent sphere

      • Right-click in the Hierarchy window and select 3D Object > Sphere.
      • Name the GameObject "RollerAgent".
      • Select RollerAgent to view its properties in the Inspector window.
      • Set its Transform to Position = (0, 0.5, 0), Rotation = (0, 0, 0), Scale = (1, 1, 1).
      • On the Sphere's Mesh Renderer, expand the Materials property and change the default material to Checker 1.
      • Click Add Component.
      • Add a Physics/Rigidbody component to the Sphere.

    (screenshot)

OK, with the steps above the 3D environment in Unity is set up. Now let's implement the Agent.

IV. Implementing the Agent

And "realize Academy" and "add Brain" in the official Chinese documents, the latest edition no longer needed! Agent directly on the line.

To create an Agent:

  1. Select the RollerAgent GameObject to view it in the Inspector window.
  2. Click Add Component.
  3. In the Components list, click New Script (at the bottom).
  4. Name the script "RollerAgent".
  5. Click Create and Add.

Then, edit the new RollerAgent script:

  1. Open the RollerAgent script;
  2. Make RollerAgent inherit from the Agent class, and add the using MLAgents; and using MLAgents.Sensors; namespace references;
  3. Delete the Update() method, but keep the Start() method for later use.
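
After these three edits, the script should look roughly like this (a bare skeleton for now; we fill in the logic in the following sections):

using MLAgents;
using MLAgents.Sensors;
using UnityEngine;

public class RollerAgent : Agent
{
    // Start() is kept so we can cache component references here later.
    void Start()
    {
    }
}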

The steps so far are the basic steps needed to add ML-Agents to any Unity project. Next, we will add the logic that lets our agent learn to find the cube using reinforcement learning.

Agent initialization and reset

When the agent (the sphere) reaches the target (the cube), it marks its own state as done, and the agent reset function (AgentReset) moves the cube to a new random location. In addition, if the agent falls off the platform, the reset function is also triggered: the agent is re-initialized and the target position is refreshed randomly.

To reset the agent's velocity (and later apply force to move it), we need a reference to the sphere's Rigidbody component. We can obtain this reference in the Start() method. With the logic above, our RollerAgent script looks like this:

using MLAgents;
using MLAgents.Sensors;
using UnityEngine;

public class RollerAgent : Agent
{
    public Transform Target;  // the target cube
    public float speed = 10;  // movement speed of the ball

    private Rigidbody rBody;  // the ball's Rigidbody
    private void Start()
    {
        rBody = GetComponent<Rigidbody>();
    }

    /// <summary>
    /// Reset the agent
    /// </summary>
    public override void AgentReset()
    {
        if (this.transform.position.y < 0)
        {   // if the ball fell off the platform, re-initialize it
            rBody.velocity = Vector3.zero;
            rBody.angularVelocity = Vector3.zero;
            transform.position = new Vector3(0, 0.5f, 0);
        }

        // move the target cube to a random position
        Target.position = new Vector3(Random.value * 8 - 4, 0.5f, Random.value * 8 - 4);
    }
}

Next, we implement the Agent.CollectObservations(VectorSensor sensor) method.

Note that this method differs from the old version: previously the function did not take the VectorSensor sensor parameter, but it is used in the same way.

Observing the Environment

The agent sends the information we collect to the Brain, which uses it to make decisions. When you train the agent (or use a trained model), this data is fed into a neural network as a feature vector. For the agent to learn a task successfully, we must provide the right information. A good rule of thumb is to consider which variables you would need in order to compute an analytical solution to the problem.

The important point is to think about which variables matter for training. Let's look at what information our agent needs to collect in this case:

  • The position of the target

    sensor.AddObservation(Target.position);

  • The position of the agent

    sensor.AddObservation(transform.position);

  • The agent's velocity, which helps the agent learn to control its speed so it does not overshoot the target and roll off the platform

    sensor.AddObservation(rBody.velocity.x);

    sensor.AddObservation(rBody.velocity.z);

This gives a total of eight observations (each position counts as three values: x, y and z), which also needs to be set later in the Behavior Parameters component, as shown below:

(screenshot)

In the Chinese documentation these values are normalized; the latest English documentation does not normalize them and just adds them directly. The overridden function is as follows:

/// <summary>
/// Observations collected by the agent
/// </summary>
/// <param name="sensor"></param>
public override void CollectObservations(VectorSensor sensor)
{
    sensor.AddObservation(Target.position);     // position of the target
    sensor.AddObservation(transform.position);  // position of the ball
    sensor.AddObservation(rBody.velocity.x);    // ball's velocity along x
    sensor.AddObservation(rBody.velocity.z);    // ball's velocity along z
}

The last part of the Agent is the Agent.AgentAction() function. This method receives the decisions from the Brain and assigns rewards depending on the circumstances.

Action (Actions)

The Brain passes its decisions to the AgentAction() function in the form of an array. The elements of this array are determined by the agent's Vector Action settings, Space Type and Space Size, which define the action vector space, its type, and its size. ML-Agents has two types of action spaces: in a Continuous vector action space, each element is a continuously varying number, for example a force or torque applied to the agent's Rigidbody; in a Discrete vector action space, actions are defined as a table, and the value supplied to the agent is an index into that table.

Here we use a Continuous vector action space: we set Space Type to Continuous and Space Size to 2. This means the first element of the decision produced by the Brain, action[0], determines the force applied along the x axis, and action[1] determines the force applied along the z axis (if the agent moved in three dimensions, Space Size would be set to 3). Note that the Brain does not know the specific meaning of each value in the action[] array; during training it just adjusts its actions based on the observations fed in and then sees what reward it gets. The specific settings are shown below, again made in the Behavior Parameters component:

(screenshot)

As an aside, you could also train with the Discrete space type; in that case Space Size would become 4, because there are four directions to control. A rough sketch of that variant follows below.
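
Purely as an illustration, and assuming a single discrete branch of size 4 (this variant is not used in the rest of this article), the discrete case might be handled roughly like this:

// Sketch only: Space Type = Discrete, Space Size = 4 (one branch of 4 actions).
// The Brain delivers the chosen action index in vectorAction[0].
Vector3 controlSignal = Vector3.zero;
switch ((int)vectorAction[0])
{
    case 0: controlSignal.x = 1f; break;   // push along +x
    case 1: controlSignal.x = -1f; break;  // push along -x
    case 2: controlSignal.z = 1f; break;   // push along +z
    case 3: controlSignal.z = -1f; break;  // push along -z
}
rBody.AddForce(controlSignal * speed);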

OK, back to the continuous case; the action code described above is as follows:

//Space Type = Continuous, Space Size = 2
Vector3 controlSignal = Vector3.zero;
controlSignal.x = vectorAction[0];  // force along the x axis
controlSignal.z = vectorAction[1];  // force along the z axis
// The two lines above could be swapped, because the Brain does not know
// what each value in the action[] array specifically means.
rBody.AddForce(controlSignal * speed);

Rewards

Reinforcement learning needs rewards. Rewards (and penalties) are also assigned in the AgentAction() function, implemented together with the actions above. The learning algorithm uses the rewards assigned to the agent at each simulation step to determine whether it is providing the agent with the optimal actions. When the agent completes its task, we reward it. In this example, if the agent (the ball) reaches the target location (the cube), we give it a reward of 1 point.

The RollerAgent computes its distance to the target. When it reaches the target, we use the Agent.SetReward() method to give it a reward of 1 point, and call the Done() method to mark the agent as finished and reset the environment.

// distance between the agent and the target
float distanceToTarget = Vector3.Distance(transform.position, Target.position);

// reward depending on the situation
if (distanceToTarget < 1.42f)
{   // reached the target
    SetReward(1);
    Done();
}

Finally, if the ball falls off the platform, we let the agent reset. No penalty is set here; interested readers can try adding their own penalty.

if (transform.position.y < 0)
{   // the ball fell off the platform
    //SetReward(-1);  // penalty, not set for now
    Done();
}

The AgentAction() method

OK, the actions and rewards above together make up the AgentAction() method; the main thing is to understand why each step is there. The final AgentAction() method is as follows:

public override void AgentAction(float[] vectorAction)
{
    //Space Type = Continuous, Space Size = 2
    Vector3 controlSignal = Vector3.zero;
    controlSignal.x = vectorAction[0];  // force along the x axis
    controlSignal.z = vectorAction[1];  // force along the z axis
    // The two lines above could be swapped, because the Brain does not know
    // what each value in the action[] array specifically means.
    rBody.AddForce(controlSignal * speed);

    // distance between the agent and the target
    float distanceToTarget = Vector3.Distance(transform.position, Target.position);

    // reward depending on the situation
    if (distanceToTarget < 1.42f)
    {   // reached the target
        SetReward(1);
        Done();
    }
    if (transform.position.y < 0)
    {   // the ball fell off the platform
        Done();
    }
}

Final Editor Setup

At this point, all the GameObjects and ML-Agents components are ready. Next we need to add a few scripts to the ball in the scene and adjust some properties.

  1. Select the RollerAgent ball in the scene and first add the Behavior Parameters script. Set the Vector Observation Space Size to 8, the Vector Action Space Type to Continuous, and the Vector Action Space Size to 2. If you already set these in the earlier steps, you don't need to touch them again. There is also a Behavior Name property, which should be the name that identifies this Brain (mine is RollerBallBrain); in the new version multiple Agents can share the same Brain, so the name is used to tell them apart. That is my personal understanding, so please correct me if I'm wrong;

    (screenshot)

  2. This step is essential and did not exist in the old version: you need to add a Decision Requester component and change Decision Period to 10! The English documentation does not say much about this, but without this script your ball will not move at all.

    (screenshot)

Testing the Environment Manually

Before starting a lengthy training run, it is sensible to test your environment manually. For manual testing, we add a Heuristic() method to the RollerAgent script to replace the Brain's decisions, as follows:

/// <summary>
/// Manual testing
/// </summary>
/// <returns></returns>
public override float[] Heuristic()
{
    var action = new float[2];
    action[0] = Input.GetAxis("Horizontal");
    action[1] = Input.GetAxis("Vertical");
    return action;
}

In effect, this fills the action[] array from the keyboard, so that we assign the agent's actions ourselves instead of the Brain.

You also need to set Behavior Type to Heuristic Only in the Behavior Parameters component, as shown below:

(screenshot)

Now we can hit Play. (At this point I suddenly realized I had forgotten to drag the Target onto the RollerAgent, so drag the target cube into the Target field now.) You can then control the ball with WASD or the arrow keys; when the ball gets near the cube, the cube is automatically repositioned, and if the ball falls off, everything resets.

OK, change Behavior Type back to Default, and we're ready to start training ~

V. Training

Open Anaconda3, find the training environment we set up earlier, and start a "Terminal".

(screenshot)

cd to the ml-agents root directory; for example, my path is:

cd /d D:\Unity Projects\ml-agents

(screenshot)

A quick aside: we need to modify the ml-agents config file. Find the config folder in ml-agents, open the trainer_config.yaml configuration file, and add the following at the end:

RollerBallBrain:
    batch_size: 10
    buffer_size: 100

(screenshot)

You can see that RollerBallBrain here is exactly the Behavior Name I set in the Behavior Parameters component. The two parameters added here override the default entries at the top of the config file; making these two hyperparameters this small makes this particular training run much faster. With the original values (batch_size: 1024, buffer_size: 10240) training needs roughly 300,000 steps, but with the modified values it needs fewer than 20,000. Choose values appropriate for your specific project.

After finishing the configuration, go back to the command line and enter:

mlagents-learn config/trainer_config.yaml --run-id=RollerBall-1 --train

(screenshot)

Then press Play in Unity to run the program.

If Unity and the Anaconda training environment communicate successfully, you will see your training configuration printed on the command line:

(screenshot)

At the same time, in Unity you can see the ball start moving quickly on its own, and the cube being repositioned randomly as the state changes.

As time goes on, the command line displays the current step count, elapsed time, mean reward and other information.

(screenshot)

As training progresses, you will find that the ball rarely falls off the platform anymore and keeps heading toward the cube's position:

(animation)

Finally, if training takes too long, you can limit the maximum number of training steps by modifying max_steps in the config file. Here I simply pressed Ctrl+C, which stops training and still saves the trained model.
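
For example, a sketch of what the entry in trainer_config.yaml might look like with a step cap added (the value 5.0e4 is only an illustration; pick whatever fits your project):

RollerBallBrain:
    batch_size: 10
    buffer_size: 100
    max_steps: 5.0e4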

(screenshot)

Find the RollerBallBrain.nn file in the models folder of ml-agents, and drag this .nn file into Unity, as follows:

(screenshot)

Then select the ball in the scene, and in the Behavior Parameters component set the Model property to the model we just trained, and set Behavior Type to Inference Only, as follows:

(screenshot)

Then click Play, and you can see the ball using our trained model to find the cube.


TensorBoard statistics

From the command line you can also view charts of the training run we just did; enter:

tensorboard --logdir=summaries

(screenshot)

Then open the address in a browser, usually http://localhost:6006/.

There you can see how each value changes with the number of training steps.

(screenshot)

The meaning of each value, copied from the official Chinese documentation:

  • Lesson - only meaningful during curriculum training.

  • Cumulative Reward - the mean cumulative episode reward over all agents. Should increase during a successful training run.

  • Entropy - how random the model's decisions are. Should slowly decrease during successful training. If it decreases too quickly, increase the beta hyperparameter.

  • Episode Length - the mean length of each episode over all agents in the environment.

  • Learning Rate - how large a step the training algorithm takes when searching for the optimal policy. Should decrease over time.

  • Policy Loss - the mean loss of the policy function update. Related to how much the policy (the process of deciding actions) is changing. The magnitude of this should decrease during a successful training run.

  • Value Estimate - the mean value estimate over all states visited by the agent. Should increase during a successful training run.

  • Value Loss - the mean loss of the value function update. Relates to how well the model can predict the value of each state. Should decrease during a successful training run.

OK, that is the whole process of training the official small example.

Writing this up was tiring, but you're welcome to leave a comment to discuss. If you repost, please kindly cite the original address, thank you ~

References:

https://github.com/Unity-Technologies/ml-agents/tree/master/docs

https://github.com/Unity-Technologies/ml-agents/blob/master/docs/localized/zh-CN/docs/Learning-Environment-Create-New.md


Original post: www.cnblogs.com/gentlesunshine/p/12507677.html