Unity ML-Agents: Learning-Environment-Design-Agents.md code interpretation (1)

Code source: https://github.com/Unity-Technologies/ml-agents/blob/release_19/docs/Learning-Environment-Design-Agents.md#decisions

1. Agent.CollectObservations()

1.1 Code summary

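    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;

    public class Ball3DAgent : Agent
    {
        public GameObject ball;   // the ball, assigned in the Inspector
        Rigidbody m_BallRb;       // the ball's Rigidbody, cached in Initialize()

        public override void Initialize()
        {
            m_BallRb = ball.GetComponent<Rigidbody>();
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            // Cube orientation (2 floats: quaternion components z and x)
            sensor.AddObservation(gameObject.transform.rotation.z);
            sensor.AddObservation(gameObject.transform.rotation.x);
            // Ball position relative to the cube (3 floats)
            sensor.AddObservation(ball.transform.position - gameObject.transform.position);
            // Ball velocity (3 floats)
            sensor.AddObservation(m_BallRb.velocity);
        }
    }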

CollectObservations is one of the key methods of an Agent in Unity ML-Agents. It collects observations of the current state. Those observations go to the agent's neural network (its policy), which uses them to make decisions.

The parameter VectorSensor sensor is a vector sensor that collects observations. What goes into it depends on the game environment and the task design: we decide which values describe the current state, and we add each of them to the sensor.

Specifically, what this code does is collect the following observations:

① The z component of the cube's rotation quaternion (one float; a quaternion component, not an angle in degrees).

② The x component of the cube's rotation quaternion (one float).

③ The position vector of the ball relative to the cube (three floats).

④ The ball's velocity vector (three floats).

Together these observations form a vector of 2 + 3 + 3 = 8 floats, which is passed to the neural network for decision making. During training, the network's weights and biases are adjusted so that its decisions maximize the cumulative reward. In this task, that means keeping the ball balanced on the cube.
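The following plain C# sketch makes the layout concrete and runs without Unity. It flattens the four observations into one float list, in the same order CollectObservations adds them. Tuples stand in for Unity's Quaternion and Vector3, and every number is a made-up sample value, not data from a real scene. The total of 2 + 3 + 3 = 8 floats is the reason the 3DBall example's Behavior Parameters use a vector observation size of 8.

```csharp
using System;
using System.Collections.Generic;

// Stand-ins for Unity data (hypothetical sample values, not from a real scene).
var cubeRotation = (x: 0.05f, y: 0f, z: -0.02f, w: 0.998f); // quaternion components
var cubePosition = (x: 0f, y: 2f, z: 0f);
var ballPosition = (x: 0.5f, y: 4f, z: -0.25f);
var ballVelocity = (x: 0.1f, y: -0.3f, z: 0f);

var observations = new List<float>();

// ① and ②: two quaternion components of the cube's rotation (z first, then x).
observations.Add(cubeRotation.z);
observations.Add(cubeRotation.x);

// ③: ball position relative to the cube, one float per axis.
observations.Add(ballPosition.x - cubePosition.x);
observations.Add(ballPosition.y - cubePosition.y);
observations.Add(ballPosition.z - cubePosition.z);

// ④: ball velocity, one float per axis.
observations.Add(ballVelocity.x);
observations.Add(ballVelocity.y);
observations.Add(ballVelocity.z);

Console.WriteLine($"{observations.Count} floats: [{string.Join(", ", observations)}]");
```

The order matters. The network sees a flat list of numbers, so the same value must appear at the same index at every step.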

1.2 Code decomposition

1.2.1 public GameObject ball

public GameObject ball;

This line of code declares a variable ball of type GameObject, which is used to refer to the ball object in the scene. In the following code, you can use this variable to get the information and state of the ball object.

Q: What is the GameObject type?

In Unity, GameObject is one of the most basic object types: it represents an entity in the scene. Each GameObject can have one or more components (Component). Examples are Transform, which controls the GameObject's position, rotation and scale, and MeshRenderer, which renders the object's mesh. You can get GameObjects in the scene from code in order to read and modify them.

1.2.2 public override void CollectObservations(VectorSensor sensor)

public override void CollectObservations(VectorSensor sensor)

VectorSensor is a class in the ML-Agents SDK that collects an agent's vector observations. The sensor argument of CollectObservations is the VectorSensor that belongs to the current agent. Inside the method, we call sensor.AddObservation(...) to append the current observation values to it. These values are then sent to the agent's neural network, which uses them to choose actions. During training, they are also used to update the model.

1.2.3 sensor.AddObservation(gameObject.transform.rotation.z)

sensor.AddObservation(gameObject.transform.rotation.z);
sensor.AddObservation(gameObject.transform.rotation.x);

These two lines add the cube's orientation to the sensor. gameObject refers to the GameObject the current script is attached to, and transform is that object's Transform component. transform.rotation is a Quaternion, so rotation.z and rotation.x are the z and x components of that quaternion, not Euler angles in degrees. For a pure rotation by angle θ about the z axis, rotation.z equals sin(θ/2). In this task the cube tilts around its z and x axes, and these two values tell the model how far, and in which direction, it is tilted.

sensor.AddObservation is an ML-Agents SDK method for adding observations. In CollectObservations, the sensor parameter is a VectorSensor. AddObservation has overloads for several types, including float, int, bool, Vector2, Vector3 and Quaternion, and the float overload adds a single value. Here two float values are added, gameObject.transform.rotation.z and gameObject.transform.rotation.x, which together describe the current orientation of the agent's cube. Passing the whole quaternion instead, as sensor.AddObservation(gameObject.transform.rotation), would add four floats.
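transform.rotation is a Quaternion, so rotation.z is not an angle. The quaternion for a rotation by angle θ about a unit axis (ax, ay, az) has components (ax·sin(θ/2), ay·sin(θ/2), az·sin(θ/2), cos(θ/2)). The plain C# sketch below uses only System.Math, no Unity. It applies that formula to pure tilts about the z axis to show what value the agent actually observes. RotationAboutZ is a helper written for this sketch, and the tilt angles are arbitrary examples.

```csharp
using System;

// Quaternion (x, y, z, w) for a rotation of `degrees` about the z axis:
// (0, 0, sin(θ/2), cos(θ/2)).
static (double x, double y, double z, double w) RotationAboutZ(double degrees)
{
    double halfAngle = degrees * Math.PI / 180.0 / 2.0;
    return (0.0, 0.0, Math.Sin(halfAngle), Math.Cos(halfAngle));
}

foreach (double tilt in new[] { -30.0, 0.0, 10.0, 30.0 })
{
    var q = RotationAboutZ(tilt);
    // rotation.z is sin(θ/2): signed and within [-1, 1], not the angle itself.
    Console.WriteLine($"tilt {tilt,6:F1} deg -> rotation.z = {q.z:F4}, rotation.w = {q.w:F4}");
}
```

For a 30° tilt, rotation.z is sin(15°) ≈ 0.2588. The sign still tells the agent which way the cube leans, which is what the balancing task needs.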

1.2.4 sensor.AddObservation(ball.transform.position - gameObject.transform.position)

 sensor.AddObservation(ball.transform.position - gameObject.transform.position);

This line adds the position of the ball relative to the cube to the observations. ball.transform.position is the world-space position of the ball. gameObject.transform.position is the world-space position of the object the script is attached to, which is the cube. Subtracting the two gives the ball's position relative to the cube, and adding this vector to the observations lets the agent perceive where the ball is relative to itself.

These three floats carry relative position information. They are collected from the environment each time the agent makes a decision, and they help the agent learn how to tilt the cube so the ball stays on top of it. In the neural network, they are part of the input features from which the model learns its actions.

Q: What is gameObject.transform.position?

gameObject is a property available in every script (component), and it returns the GameObject the script is attached to. gameObject.transform.position is the position of that game object in world space. It is returned as a Vector3 with x, y and z components.

Q: Why three floating point numbers?

gameObject.transform.position is a three-dimensional vector: three floats for the x, y and z coordinates. Likewise, ball.transform.position - gameObject.transform.position, the position of the ball relative to the cube, is also a Vector3, so it also takes three floats.
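Here is a plain C# sketch of the subtraction. Tuples stand in for Unity's Vector3, whose `-` operator subtracts component by component. The positions are hypothetical sample values, and Subtract is a helper written for this sketch, not a Unity API. The sketch also shows that shifting both objects by the same offset leaves the relative vector unchanged. That is one reason a relative position makes a convenient observation: it does not depend on where the cube sits in the scene.

```csharp
using System;

// Component-wise subtraction, the same rule Unity's Vector3 subtraction follows.
static (float x, float y, float z) Subtract((float x, float y, float z) a, (float x, float y, float z) b)
    => (a.x - b.x, a.y - b.y, a.z - b.z);

var cube = (x: 0f, y: 2f, z: 0f);        // hypothetical cube position
var ball = (x: 0.5f, y: 4f, z: -0.25f);  // hypothetical ball position
var relative = Subtract(ball, cube);     // three floats: x, y, z
Console.WriteLine($"relative = ({relative.x}, {relative.y}, {relative.z})");

// Move both objects by the same offset: the relative vector does not change.
var shiftedCube = (x: cube.x + 10f, y: cube.y, z: cube.z - 5f);
var shiftedBall = (x: ball.x + 10f, y: ball.y, z: ball.z - 5f);
var shiftedRelative = Subtract(shiftedBall, shiftedCube);
Console.WriteLine($"after shift = ({shiftedRelative.x}, {shiftedRelative.y}, {shiftedRelative.z})");
```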

1.2.5 sensor.AddObservation(m_BallRb.velocity)

sensor.AddObservation(m_BallRb.velocity);

This line adds the ball's velocity to the sensor as an observation. m_BallRb is the ball's Rigidbody component. In the ML-Agents 3DBall example it is cached with ball.GetComponent<Rigidbody>() in Initialize(). Its velocity is a 3D vector, so three floats are added, and the agent can sense how fast, and in which direction, the ball is currently moving.

Q: Why three floating point numbers?

m_BallRb.velocity is the velocity of the ball. It is a three-dimensional vector with x, y and z components, so three floats are needed to describe it. In the neural network, these three floats become three entries of the input vector.

2. Observable Fields and Properties

2.1 Code summary

using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors.Reflection;

public class Ball3DHardAgent : Agent {

    public GameObject ball;   // the ball in the scene, as in section 1

    [Observable(numStackedObservations: 9)]
    Vector3 PositionDelta
    {
        get
        {
            return ball.transform.position - gameObject.transform.position;
        }
    }
}

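This code uses reflection-based observations from the ML-Agents SDK. The values of fields and properties marked with [Observable] are read through reflection and passed to the neural network as observations, without any call to AddObservation in CollectObservations. These attributes are only picked up when the Observable Attribute Handling option in the agent's Behavior Parameters component is not set to Ignore.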

Specifically, the code defines a Ball3DHardAgent class that inherits from Agent, so it is a trainable agent. It uses a GameObject field named ball that refers to the ball in the scene. The class also defines a property named PositionDelta, which returns the ball's position minus the agent's position. The property carries the Observable attribute with numStackedObservations: 9. Its value is therefore observed through reflection, and the values from the last 9 steps are stacked, which gives the neural network some history to learn from.

2.2 Code decomposition

2.2.1 using Unity.MLAgents.Sensors.Reflection

using Unity.MLAgents.Sensors.Reflection;

This line imports the Unity.MLAgents.Sensors.Reflection namespace, which contains the ObservableAttribute class.

2.2.2 public class Ball3DHardAgent : Agent

public class Ball3DHardAgent : Agent

This code defines a class named Ball3DHardAgent that inherits from the Agent class, which makes it an agent in the Unity ML-Agents framework.

In machine learning, an agent is the entity that perceives its environment, takes actions, and learns a policy that maximizes its cumulative reward. In Unity ML-Agents, an Agent component can be attached to any GameObject in the scene.

The class contains a property named PositionDelta that is marked with the Observable attribute, which adds its value to the neural network's input. PositionDelta is of type Vector3 and computes the ball's position offset relative to the agent. Each value is 3 floats and numStackedObservations is 9, so the property contributes 9 × 3 = 27 floats to the observations.

With reflection-based observations, the Observable attribute marks which fields and properties of the agent are turned into observations. Besides the number of stacked observations, the attribute also accepts an optional name for the sensor generated for that member.

2.2.3 [Observable(numStackedObservations: 9)]

[Observable(numStackedObservations: 9)]

This attribute marks a field or property that should be included in the observations. The numStackedObservations parameter sets how many consecutive steps of the value are kept. In this example, PositionDelta is included in the observations, and its values from the last 9 steps are stacked.

2.2.4 Vector3 PositionDelta

Vector3 PositionDelta

PositionDelta is a read-only property (it has only a get accessor) that returns a 3D vector: the position of the ball relative to the agent. It has no access modifier, so it is private, and [Observable] still handles it. It is marked with [Observable(numStackedObservations: 9)], so the property is observed and its last 9 values are kept.

Q: What does get mean?

get is a C# keyword that defines a property's get accessor, the code that runs when the property's value is read. Here, the get accessor of PositionDelta computes the position difference between the ball and the agent. Every time PositionDelta is read elsewhere, including by ML-Agents when it collects observations, this code runs again and returns the current difference.

Q: What is ObservableAttribute?

ObservableAttribute is the attribute class behind [Observable]. It marks a field or property whose value should be added to the observations. In the ML-Agents SDK, observations are how information about the environment reaches the agent, and the trained neural network uses them to make decisions. The numStackedObservations argument specifies how many steps of history are kept for this member, producing what are called "stacked observations". In this example, PositionDelta is observable, and its values from the current step and the previous 8 steps are stacked.
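Below is a simplified plain C# sketch of what numStackedObservations: 9 means for PositionDelta. It uses no ML-Agents types. It keeps the last 9 Vector3 values, as tuples, in a queue and flattens them into 9 × 3 = 27 floats. Two details are assumptions of this sketch: the buffer starts out zero-padded, and it is flattened from oldest to newest. ML-Agents' own StackingSensor manages the buffer internally, and its exact layout is not reproduced here.

```csharp
using System;
using System.Collections.Generic;

const int NumStacked = 9; // matches numStackedObservations: 9
var history = new Queue<(float x, float y, float z)>();

// Start with a zero-filled history (an assumption of this sketch).
for (int i = 0; i < NumStacked; i++) history.Enqueue((0f, 0f, 0f));

// Record one new PositionDelta per step, dropping the oldest one.
void Record((float x, float y, float z) delta)
{
    history.Dequeue();
    history.Enqueue(delta);
}

// Flatten oldest-to-newest into one float list: 9 steps x 3 floats.
List<float> Flatten()
{
    var flat = new List<float>();
    foreach (var d in history) { flat.Add(d.x); flat.Add(d.y); flat.Add(d.z); }
    return flat;
}

// Simulate three steps with hypothetical deltas.
for (int step = 1; step <= 3; step++)
    Record((0.1f * step, 2f, 0f));

var stacked = Flatten();
Console.WriteLine($"stacked length = {stacked.Count}");
```

After three steps, the first six slots are still padding and the last three hold the recorded deltas.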

3. One-hot encoding categorical information

3.1 Code summary

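enum ItemType { Sword, Shield, Bow, LastItem }

ItemType currentItem;   // the item the agent currently holds

public override void CollectObservations(VectorSensor sensor)
{
    // Loop over all item types (LastItem is only a counter and is excluded)
    for (int ci = 0; ci < (int)ItemType.LastItem; ci++)
    {
        // 1 if the currently held item is of this type, otherwise 0
        sensor.AddObservation((int)currentItem == ci ? 1.0f : 0.0f);
    }
}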

This code defines the item types with an enum and then uses the VectorSensor in CollectObservations to collect observation data. currentItem, the type of item the agent currently holds, is a field of the agent. The loop goes through all item types and compares each type's integer value (0, 1, 2) with (int)currentItem. If they match it adds 1, otherwise it adds 0. The result is a one-hot vector with one entry per item type, not a single integer.

3.2 Code decomposition

3.2.1 enum ItemType { Sword, Shield, Bow, LastItem }

enum ItemType { Sword, Shield, Bow, LastItem }

An enumeration type ItemType is defined here with four members: Sword, Shield, Bow and LastItem. LastItem is not a real item but a sentinel: its integer value equals the number of real item types.

Enumerated types define named constants with specific meanings. Here, ItemType represents the different item types in the game: Sword, Shield and Bow are the three item types, while LastItem only helps count them.

It should be noted that the members of the enumeration type will be assigned an integer value by default, starting from 0 and incrementing. In this example, Sword has a value of 0, Shield has a value of 1, Bow has a value of 2, and LastItem has a value of 3.

3.2.2 public override void CollectObservations(VectorSensor sensor)

public override void CollectObservations(VectorSensor sensor)

This method collects the Agent's observations; its VectorSensor parameter is the sensor that stores them.

In this example, the method adds one observation for each real item type, that is, every enumeration value except LastItem. For each type, it adds 1.0 if the Agent currently holds an item of that type and 0.0 otherwise. This way, the item the Agent holds reaches the neural network as observation information for training.

3.2.3 for (int ci = 0; ci < (int)ItemType.LastItem; ci++)

for (int ci = 0; ci < (int)ItemType.LastItem; ci++)
{
    // 1 if the currently held item is of this type, otherwise 0
    sensor.AddObservation((int)currentItem == ci ? 1.0f : 0.0f);
}

This code runs inside CollectObservations and adds data to the agent's observations. It iterates over all values of the ItemType enum except LastItem. The observation for the type of the currently held item is set to 1, and the observations for the other types are set to 0. The agent's policy uses these observations when deciding its next action.

sensor.AddObservation((int)currentItem == ci ? 1.0f : 0.0f) converts the type of the currently held item to an integer and compares it with the loop variable ci. If they are equal, it adds an observation value of 1 to the observation vector; otherwise it adds 0. This passes the currently held item to the agent's neural network, so the network can perceive which type of item the agent holds.
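Here is a plain C# version of the loop that runs without Unity. C# does not allow an enum declaration among top-level statements, so the enum is replaced by an array of item names plus an integer index. The names itemNames, lastItem, OneHotByLoop and held are all hypothetical, chosen for this sketch only.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for enum ItemType { Sword, Shield, Bow, LastItem }:
// the names are indexed 0, 1, 2, and their count plays the role of (int)ItemType.LastItem.
string[] itemNames = { "Sword", "Shield", "Bow" };
int lastItem = itemNames.Length; // 3

// Same logic as the CollectObservations loop: 1.0f for the held type, 0.0f otherwise.
List<float> OneHotByLoop(int currentItem)
{
    var observation = new List<float>();
    for (int ci = 0; ci < lastItem; ci++)
        observation.Add(currentItem == ci ? 1.0f : 0.0f);
    return observation;
}

int held = Array.IndexOf(itemNames, "Shield"); // index 1
Console.WriteLine($"{itemNames[held]} -> [{string.Join(", ", OneHotByLoop(held))}]");
```

Running it prints `Shield -> [0, 1, 0]`: one float per item type, with the 1 at the Shield position.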

The second way:

// Enum ItemType: the kinds of items the agent can hold
enum ItemType { Sword, Shield, Bow, LastItem }

// NUM_ITEM_TYPES: the number of real item types (LastItem itself excluded), here 3
const int NUM_ITEM_TYPES = (int)ItemType.LastItem;

public override void CollectObservations(VectorSensor sensor)
{
    // AddOneHotObservation encodes an integer as a one-hot vector.
    // First argument: the value to encode; second: the vector length (number of possible values)
    sensor.AddOneHotObservation((int)currentItem, NUM_ITEM_TYPES);
}

This code implements the encoding of a variable of enumeration type into a one-hot vector, which is convenient for neural network processing.

const int NUM_ITEM_TYPES = (int)ItemType.LastItem;

NUM_ITEM_TYPES is a constant set to the integer value of LastItem, the last member of the ItemType enum. Enum members get consecutive integer values starting at 0, so LastItem's value equals the number of members before it, which is the number of real item types (3 here). Setting NUM_ITEM_TYPES to (int)ItemType.LastItem keeps it equal to the number of item types even if new types are added before LastItem. The constant is used in the next line.

sensor.AddOneHotObservation((int)currentItem, NUM_ITEM_TYPES);

This line uses the AddOneHotObservation method, which adds a one-hot vector. The vector has as many floats as the second argument, with a 1 at the position given by the first argument and 0 everywhere else. One-hot encoding is very common for categorical data, such as item types (sword, shield, bow, and so on).

Here, we pass the index of the currently held item type to AddOneHotObservation, along with the number of possible values. That number is the count of real item types in ItemType, obtained by casting LastItem to an integer. The neural network then receives a set of 0s and 1s indicating which type of item is currently held, the same result as the loop in the first way.
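The plain C# sketch below illustrates why a one-hot vector is preferred over the raw integer. OneHot is a hypothetical stand-in that takes the same (observation, range) arguments as VectorSensor.AddOneHotObservation; it is not the library's code. With raw integer codes, Sword (0) looks closer to Shield (1) than to Bow (2), an ordering the items do not really have. With one-hot codes, every pair of distinct items is the same distance apart.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in with the same (observation, range) arguments as
// VectorSensor.AddOneHotObservation: returns `range` floats with a 1 at `observation`.
static float[] OneHot(int observation, int range) =>
    Enumerable.Range(0, range).Select(i => i == observation ? 1f : 0f).ToArray();

// Euclidean distance between two encodings.
static double Distance(float[] a, float[] b) =>
    Math.Sqrt(a.Zip(b, (p, q) => (p - q) * (p - q)).Sum());

const int NUM_ITEM_TYPES = 3; // (int)ItemType.LastItem
float[] sword = OneHot(0, NUM_ITEM_TYPES), shield = OneHot(1, NUM_ITEM_TYPES), bow = OneHot(2, NUM_ITEM_TYPES);

// Raw integer codes: Sword looks "closer" to Shield than to Bow.
Console.WriteLine($"raw: |Sword-Shield| = {Math.Abs(0 - 1)}, |Sword-Bow| = {Math.Abs(0 - 2)}");
// One-hot codes: every pair of distinct items is sqrt(2) apart.
Console.WriteLine($"one-hot: {Distance(sword, shield):F4}, {Distance(sword, bow):F4}, {Distance(shield, bow):F4}");
```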

The third way:

using Unity.MLAgents;
using Unity.MLAgents.Sensors.Reflection;

enum ItemType { Sword, Shield, Bow }

public class HeroAgent : Agent
{
    [Observable]
    ItemType m_CurrentItem;
}

This code defines an enum ItemType for the different kinds of game items, and a HeroAgent class that inherits from Agent. HeroAgent has a field m_CurrentItem that holds the type of item the hero currently has. The field is marked with [Observable], so ML-Agents reads it through reflection and collects it as state information. This version does not loop over the enum manually, so the LastItem sentinel is not needed.

[Observable] is an attribute in Unity ML-Agents that marks fields and properties whose values should be observed. Their values are passed to the neural network as part of the state, both during training and at inference time. Supported types include float, int, bool, Vector2, Vector3, Vector4, Quaternion and enums. The attribute only takes effect when the agent's Behavior Parameters component has Observable Attribute Handling set to something other than Ignore.

 ItemType m_CurrentItem;

This line declares a field m_CurrentItem of type ItemType. It has no access modifier, so it is private, and [Observable] still supports it. ItemType is an enum with three values, Sword, Shield and Bow, representing the equipment the hero currently holds.

The [Observable] attribute tells ML-Agents to pass the field to the agent's neural network as an observation, so there is no need to add it again in CollectObservations(). The enum value is one-hot encoded automatically, one float per enum value, rather than passed as a single integer.


Source: blog.csdn.net/aaaccc444/article/details/130331413