Unity machine learning: a first ML-Agents example


In the previous section we installed the ML-Agents machine learning development environment. In this section we build a first example to get a feel for what machine learning actually does.
The example is deliberately simple: the agent must learn to move to a target position on its own, without moving off the floor.

First, let's take a brief look at the machine learning process.

Machine learning process

ML-Agents uses reinforcement learning, which loops through four steps:
observation - the agent observes its surroundings
decision - a decision is made based on those observations
action - the agent acts on the decision
reward - the agent is rewarded or punished for the result
In short: observe first, then decide, then act, and finally receive a reward or penalty.
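To make these four steps concrete before we build the real script, here is a minimal, illustrative sketch of how each step maps onto an Agent callback in ML-Agents (the class name LoopSketch and the values are made up; the actual tutorial script is developed step by step below):

using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class LoopSketch : Agent
{
    // observation: hand numeric data to the policy
    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.position);
    }

    // decision happens inside the trainer; action: we act on the numbers it sends back
    public override void OnActionReceived(ActionBuffers actions)
    {
        float moveX = actions.ContinuousActions[0];
        transform.position += new Vector3(moveX, 0f, 0f) * Time.deltaTime;

        // reward: score the result so the trainer knows what to reinforce (value is arbitrary here)
        AddReward(0.01f);
    }
}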

Creating the script
First we create a new script; here it is called MoveToTarget.cs.

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;

public class MoveToTarget : Agent
{
    
}

Every agent script must inherit from the Agent base class.

Observation and Action
First we override the CollectObservations function, which is responsible for collecting observations; in this example we let the agent observe the target's position.
Then we override the OnActionReceived function and act on the data in the received action buffer. Note that the learning algorithm works only with numbers: the machine has no idea what an "object" is or what "moving left or right" means. It only processes numeric data such as float and int values.

Next, in Unity we create an agent (a blue box), a target (a yellow sphere), and a floor plane (a gray box).


Understanding the important parameters
Add the MoveToTarget script to the agent. A BehaviorParameters (behavior parameters) component is added automatically, because the Agent base class requires it.

 
 

The meaning of "discrete"

Let's first look at what a discrete action space means:
if we set the number of discrete branches to 1 and the size of branch 0 to 5, the agent receives one integer action per decision, with a value between 0 and 4.
We override action reception in the code and look at the log. Because there is only one discrete branch, we read DiscreteActions[0]:

using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class MoveToTarget : Agent
{
    public override void OnActionReceived(ActionBuffers actions)
    {
        Debug.Log(actions.DiscreteActions[0]);
    }
}

Because we override OnActionReceived, we also need something to request decisions. We add a DecisionRequester (decision requester) component to the agent object; its DecisionPeriod parameter controls how often a decision is requested.
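Instead of the DecisionRequester component you can also ask for decisions yourself: the Agent base class exposes a RequestDecision() method. A minimal sketch of that alternative (illustrative only; for this tutorial the component is the simpler choice):

using Unity.MLAgents;

public class ManualDecisionSketch : Agent
{
    // Request a new decision every physics step instead of relying on
    // the DecisionRequester component's DecisionPeriod.
    void FixedUpdate()
    {
        RequestDecision();
    }

    // Observations, actions and rewards are omitted in this sketch.
}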
Next we can execute it and see what is output.

Debugging and viewing the output

First open cmd and enter the venv virtual environment we set up in the previous article by running the MLApp\venv\Scripts\activate.bat batch file. Make sure the prompt shows that the virtual environment is active.
Then we enter:

mlagents-learn

A Unity logo appears in the console and the trainer tells us we can now start running Unity.
Once Unity is playing, we can see in the cmd window and in Unity that training has started.


We can see the discrete output: because the branch size is set to 5, the values are only 0-4.
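Although this example will end up using continuous actions, a discrete branch like this would normally be turned into movement by mapping each integer to a direction. A small sketch under that assumption (the mapping itself is made up for illustration):

using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class DiscreteMoveSketch : Agent
{
    public override void OnActionReceived(ActionBuffers actions)
    {
        // Branch 0 has size 5: 0 = stay, 1..4 = move along +x, -x, +z, -z.
        Vector3 dir = Vector3.zero;
        switch (actions.DiscreteActions[0])
        {
            case 1: dir = Vector3.right;   break;
            case 2: dir = Vector3.left;    break;
            case 3: dir = Vector3.forward; break;
            case 4: dir = Vector3.back;    break;
        }
        transform.position += dir * Time.deltaTime * 10f;
    }
}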

Continuous actions

Next we test the continuous type.
In Unity we change the Space Type to Continuous and set the Size to 1.
The script also needs to be changed to read continuous actions:

using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class MoveToTarget : Agent
{
    public override void OnActionReceived(ActionBuffers actions)
    {
        Debug.Log(actions.ContinuousActions[0]);
    }
}

We run again and enter mlagents-learn in cmd.
This time we get an error:

It happens because we tried to train again with the same default run ID. We can either force an overwrite with mlagents-learn --force, or change the ID with mlagents-learn --run-id=test2.

Then, once the trainer is running in the virtual environment, we press Play in Unity.
The log we get looks like this:

We can see that the continuous values are floating point numbers between -1 and 1. At this point we understand the difference between discrete and continuous actions.

Observation and action code
Now let's keep improving the script and collect the observation data.
We need to override the CollectObservations(VectorSensor sensor) function. You can think of this function as defining what data the AI needs in order to solve your problem. In this example we want the box (agent) to move to the position of the ball (target). So, what data does it need?

If the agent is to move to the target, it needs to know where it is and where the target is, so we need the positions of both objects. In the code we therefore pass the two coordinates to the observer through the sensor.

    [SerializeField] Transform targetTfm;

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.position);
        sensor.AddObservation(targetTfm.transform.position);
    }

In OnActionReceived, the actions are the result of the decision step; we act on the decision data.

    // act on the received actions
    float moveSpd = 10f;

    public override void OnActionReceived(ActionBuffers actions)
    {
        float moveX = actions.ContinuousActions[0];
        float moveZ = actions.ContinuousActions[1];
        transform.position += new Vector3(moveX, 0f, moveZ) * Time.deltaTime * moveSpd;
    }

Because we give the observation function two coordinates, i.e. 2 × 3 = 6 float values, the Space Size of Vector Observations must be set to 6. For actions we only need to move the agent on the x and z axes, so the Space Size of Vector Actions is set to 2.

How to make the machine learn
You can think of machine learning as training a puppy: if the puppy performs the required action it gets a bone, otherwise it gets punished.
In this example we surround the floor with four walls: if the agent moves into a wall, points are deducted; if it reaches the target, points are awarded. So in Unity we enclose the Plane with four walls, and we tick IsTrigger on the colliders of the walls and the target so we can handle the contact in a trigger callback.

 

Adding the reward code

    [SerializeField] Renderer plane;    // the floor's renderer, assigned in the Inspector

    private void OnTriggerEnter(Collider other)
    {
        Debug.Log("OnTriggerEnter:" + other.name);
        if (other.name.Equals("target"))
        {
            AddReward(+1f); // reward
            EndEpisode();   // end this episode
            plane.material.color = Color.green;
            Debug.Log("reward");
        }
        else if (other.name.Equals("wall"))
        {
            AddReward(-1f); // punishment
            EndEpisode();   // end this episode
            plane.material.color = Color.grey;
            Debug.Log("punishment");
        }
    }

In the code above, if the agent touches the target we call AddReward(+1), end the episode, and turn the plane green; if it touches a wall we call AddReward(-1) and the plane turns grey.

Handling the end of an episode
After a reward or punishment we call EndEpisode. When an episode ends and we want training to continue, the agent has to be reset, so we also override OnEpisodeBegin.
 

    // called when a new episode begins
    public override void OnEpisodeBegin()
    {
        transform.position = Vector3.zero;
        Debug.Log("episode started");
    }

After starting the mlagents trainer in the virtual environment, we press Play in Unity. We can see the machine beginning to learn how to reach the target position. At first it struggles and often hits the wall; gradually the probability of hitting the target increases.

During the run the agent may look quite clueless at first, mostly spinning in circles; after training for a while it finds the target faster and faster.

A few parameters

There are a few points worth noting here.

MaxStep (maximum steps)


MaxStep is the maximum number of steps in one episode. If we don't want a single attempt to run for too long, we can give it a value; numbers like 1000 or 10000 are worth trying. Once the limit is reached, OnEpisodeBegin is called again. The point of setting it is that if the agent just keeps wandering around avoiding the wall without ever reaching the target, the goal of our training would never be met.
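A related pattern (not used in this tutorial's script, just a common trick worth knowing) is to add a tiny negative reward on every step, so that reaching the target quickly scores better than wandering around until MaxStep runs out. A sketch of how the tutorial's OnActionReceived could be extended, assuming the moveSpd field from our script:

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Small time penalty per step: finishing sooner gives a higher total reward.
        // MaxStep is the value set on the Agent in the Inspector; 0 means no limit.
        if (MaxStep > 0)
        {
            AddReward(-1f / MaxStep);
        }

        float moveX = actions.ContinuousActions[0];
        float moveZ = actions.ContinuousActions[1];
        transform.position += new Vector3(moveX, 0f, moveZ) * Time.deltaTime * moveSpd;
    }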

Heuristic (manual control)

My understanding of this is that it lets you verify your movement logic by controlling the agent yourself; it is essentially a debugging feature.
 

    // heuristic: manual control for debugging
    public override void Heuristic(in ActionBuffers actionsOut)
    {
        ActionSegment<float> continuousActions = actionsOut.ContinuousActions;
        continuousActions[0] = Input.GetAxisRaw("Horizontal");
        continuousActions[1] = Input.GetAxisRaw("Vertical");
    }

We can set the BehaviorType of the agent's BehaviorParameters to Heuristic Only and then run Unity to control the agent ourselves. The agent is still reset whenever it touches the target or a wall, which makes debugging easy.

Speeding up training
Another way to accelerate machine learning is to duplicate the current training area many times and let all the copies run at the same time. We only need to tweak the scene and the script slightly, as follows.
 

In the script we change position to localPosition. That makes the code easy to reuse: each duplicated floor sits under its own parent object, and all positions are relative to that parent.
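For illustration, assuming each duplicated training area is parented under its own empty GameObject, the difference looks like this:

    // Reset relative to this training area's root, so every copy
    // goes back to its own local origin:
    transform.localPosition = Vector3.zero;

    // A world-space reset would instead send the agent of every copy
    // to the same point in the scene:
    // transform.position = Vector3.zero;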

The full code is as follows:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;

public class MoveToTarget : Agent
{
    [SerializeField] Transform targetTfm;
    [SerializeField] Renderer plane;
    float moveSpd = 30f;
    // pass the two positions to the observer through the sensor
    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(targetTfm.transform.localPosition);
    }

    // receive and apply actions
    public override void OnActionReceived(ActionBuffers actions)
    {
        float moveX = actions.ContinuousActions[0];
        float moveZ = actions.ContinuousActions[1];
        transform.localPosition += new Vector3(moveX, 0f, moveZ) * Time.deltaTime * moveSpd;
    }

    // called when a new episode begins
    public override void OnEpisodeBegin()
    {
        transform.localPosition = Vector3.zero;
        Debug.Log("经历开始");
    }

    // heuristic: manual control for debugging
    public override void Heuristic(in ActionBuffers actionsOut)
    {
        ActionSegment<float> continuousActions = actionsOut.ContinuousActions;
        continuousActions[0] = Input.GetAxisRaw("Horizontal") * Time.deltaTime * moveSpd;
        continuousActions[1] = Input.GetAxisRaw("Vertical") * Time.deltaTime * moveSpd;
    }

    private void OnTriggerEnter(Collider other)
    {
        //Debug.Log("OnTriggerEnter:"+other.name);
        if (other.name.Equals("target"))
        { 
            AddReward(+1f); // reward
            EndEpisode();   // end this episode
            plane.material.color = Color.green;
            //Debug.Log("reward");
        }
        else if (other.name.Equals("wall"))
        {
            AddReward(-1f); // punishment
            EndEpisode();   // end this episode
            plane.material.color = Color.grey;
            //Debug.Log("punishment");
        }
    }
   
}


After completing these modifications, we start the mlagents trainer and run Unity. With the batch of copies, training is clearly faster: the floors light up green more and more often, and on my machine, by the end only green lights up.


When the training run finishes, a MoveToTar1.onnx file is generated.

In H:\UnityProject\MLApp\venv\Scripts\results\ you can find all of our mlagents run data, including the trained MoveToTar1.onnx file we need. We copy it into Unity/Assets. This onnx file is the trained neural network, the agent's brain.

We drag this file into the Model field.

 

Set BehaviorType to Inference Only and press Play in Unity; the agent now uses the trained AI to find the target.

Environment settings
The training configuration can be customized; refer to the ml-agents documentation linked in the citations below.
Create a movetarget.yaml file and put it in a Unity/config folder (create the folder if it does not exist):
 

behaviors:
  MoveToTar1:
    trainer_type: ppo
    hyperparameters:
      batch_size: 10
      buffer_size: 100
      learning_rate: 3.0e-4
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
      beta_schedule: constant
      epsilon_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000

The following command runs training with our customized parameters; we will look at what the individual parameters mean another time. To train with this configuration file, enter:

mlagents-learn config/movetarget.yaml --run-id=test5


Further improving the training

Let's continue from the previous test. If we move the target while the trained agent is running, we find that it may no longer be able to find the target. Can we think about why?


Right: because the target's position never changed during training, the AI may treat the target as a fixed point. So we modify the script so that the target changes position every time an episode starts; with that variation, the trained agent becomes smarter.

We modify the code as follows:

    public override void OnEpisodeBegin()
    {
        transform.localPosition = new Vector3(Random.Range(-9f, 0f), 0f, Random.Range(-4f, 4f));
        targetTfm.localPosition = new Vector3(Random.Range(1f, 9f), 0f, Random.Range(-4f, 4f));
        //Debug.Log("经历开始");
    }

Now both the target and the agent are placed at random positions each episode, and because the agent's x range is [-9, 0] and the target's is [1, 9], they never overlap. Then we move on to training.

We enter the following command to run test8, initializing it from the previous test5 run:

mlagents-learn config/movetarget.yaml --initialize-from=test5 --run-id=test8

After the run finishes, we overwrite the onnx file in Unity and run again.

Monitoring in the browser

To monitor statistics about the agent's performance during training, we can use TensorBoard.

Open another cmd window, enter the virtual environment (venv), and run:

tensorboard --logdir results

Then open this address in a browser:

http://localhost:6006/


The page shows various training curves that you can use to adjust your training.

That's it for this chapter. ML-Agents also ships with many official examples; if you are interested, you can study them on your own. If we get the chance, we will expand on this in the next article, or build another interesting demo.

Source code of this chapter

GitHub - thinbug/MLApp

Citations:
https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Create-New.md
Unity Machine Learning 2: ML-Agents first example (CSDN blog)
Training an ML-Agents project in Unity: solving torch and mlagents configuration issues (CSDN blog)
A preliminary study on Unity docking with ML-Agents (CSDN blog)
GitHub - thinbug/MLApp
Unity's ml-agents (1): Environment configuration and preliminary use (CSDN blog)

Originally published at blog.csdn.net/qq_42672770/article/details/134583609