2 Representing Position and Orientation
A fundamental requirement in robotics and computer vision is to represent the position and orientation of objects in an environment. Such objects include robots, cameras, workpieces, obstacles and paths.
Instead of describing the individual points we describe the position and orientation of the object by the position and orientation of its coordinate frame.
The position and orientation of a coordinate frame is known as its pose and is shown graphically as a set of coordinate axes. The relative pose of a frame with respect to a reference coordinate frame is denoted by the symbol ξ.
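The point P in Fig. 2.2 can be described with respect to either coordinate frame. Denoting the two frames {A} and {B}, formally we express this as
$$ {}^{A}p = {}^{A}\xi_{B} \cdot {}^{B}p $$
where the · operator transforms the coordinate vector from frame {B} to frame {A}.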
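An important characteristic of relative poses is that they can be composed or compounded. Consider the case shown in Fig. 2.3. If one frame can be described in terms of another by a relative pose then the relative poses can be applied sequentially
$$ {}^{A}\xi_{C} = {}^{A}\xi_{B} \oplus {}^{B}\xi_{C} $$
which says that the pose of {C} relative to {A} is obtained by compounding the pose of {B} relative to {A} with the pose of {C} relative to {B}.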
So what is ξ? It can be any mathematical object that supports the algebra described above and is suited to the problem at hand. It will depend on whether we are considering a 2- or 3-dimensional problem. Some of the objects that we will discuss in the rest of this chapter include vectors as well as more exotic mathematical objects such as homogeneous transformations, orthonormal rotation matrices and quaternions.
To recap:
- A point is described by a coordinate vector that represents its displacement from a reference coordinate system;
- A set of points that represent a rigid object can be described by a single coordinate frame, and its constituent points are described by displacements from that coordinate frame;
- The position and orientation of an object’s coordinate frame is referred to as its pose;
- A relative pose describes the pose of one coordinate frame with respect to another and is denoted by an algebraic variable ξ;
- A coordinate vector describing a point can be represented with respect to a different coordinate frame by applying the relative pose to the vector using the · operator;
- We can perform algebraic manipulation of expressions written in terms of relative poses.
2.1 Representing Pose in 2-Dimensions
A 2-dimensional world, or plane, is familiar to us from high-school Euclidean geometry. We use a Cartesian coordinate system or coordinate frame with orthogonal axes denoted x and y and typically drawn with the x-axis horizontal and the y-axis vertical. The point of intersection is called the origin. Unit vectors parallel to the axes are denoted $\hat{x}$ and $\hat{y}$. A point is represented by its x- and y-coordinates (x, y) or as a bound vector
$$ p = x\hat{x} + y\hat{y} $$
Figure 2.6 shows a coordinate frame {B} that we wish to describe with respect to the reference frame {A}. We can see clearly that the origin of {B} has been displaced by the vector t = (x, y) and then rotated counter-clockwise by an angle θ. A concrete representation of pose is therefore the 3-vector ${}^{A}\xi_{B} \sim (x, y, \theta)$, where we use the symbol ∼ to denote that the two representations are equivalent. Unfortunately this representation is not convenient for compounding since
$$ (x_1, y_1, \theta_1) \oplus (x_2, y_2, \theta_2) $$
is a complex trigonometric function of both poses. Instead we will use a different way of representing rotation.
The resulting homogeneous transformation matrix has a very specific structure and belongs to the special Euclidean group of dimension 2, or SE(2).
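As a minimal sketch of how an SE(2) matrix makes compounding a simple matrix product, the following Python code assumes the standard 3 × 3 homogeneous form with a 2 × 2 rotation block and a translation column. The function names (`se2`, `compose`, `inverse`, `transform_point`) and the numerical poses are illustrative assumptions, not the Toolbox API.

```python
import math

def se2(x, y, theta):
    """3x3 homogeneous transform for the planar pose (x, y, theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, x],
            [s,  c, y],
            [0.0, 0.0, 1.0]]

def compose(T1, T2):
    """Compound two poses: the matrix product T1 * T2."""
    return [[sum(T1[i][k] * T2[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def inverse(T):
    """Invert using the structure [R t; 0 1]^-1 = [R^T, -R^T t; 0 1]."""
    c, s = T[0][0], T[1][0]
    x, y = T[0][2], T[1][2]
    return [[c,  s, -(c * x + s * y)],
            [-s, c,  (s * x - c * y)],
            [0.0, 0.0, 1.0]]

def transform_point(T, p):
    """Map a point p = (x, y) from the child frame into the parent frame."""
    return (T[0][0] * p[0] + T[0][1] * p[1] + T[0][2],
            T[1][0] * p[0] + T[1][1] * p[1] + T[1][2])

# Example values (assumed): pose of {B} relative to {A}, and {C} relative to {B}.
T_AB = se2(1.0, 2.0, math.pi / 2)
T_BC = se2(2.0, 1.0, 0.0)
T_AC = compose(T_AB, T_BC)                 # pose of {C} relative to {A}
T_swapped = compose(T_BC, T_AB)            # different result: order matters
origin_C_in_A = transform_point(T_AC, (0.0, 0.0))
```

With these example values the origin of {C} lies at (0, 4) in frame {A}, whereas compounding in the opposite order gives a translation of (3, 3).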
2.2 Representing Pose in 3-Dimensions
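A point P is represented by its x-, y- and z-coordinates (x, y, z) or as a bound vector
$$ p = x\hat{x} + y\hat{y} + z\hat{z} $$
where $\hat{x}$, $\hat{y}$ and $\hat{z}$ are unit vectors parallel to the axes.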
2.2.1 Representing Orientation in 3-Dimensions
Euler’s rotation theorem (Kuipers 1999) states that any two independent orthonormal coordinate frames can be related by a sequence of rotations (not more than three) about coordinate axes, where no two successive rotations may be about the same axis.
The implication for the pose algebra we have used in this chapter is that the ⊕ operator is not commutative – the order in which rotations are applied is very important.
Mathematicians have developed many ways to represent rotation and we will discuss several of them in the remainder of this section: orthonormal rotation matrices, Euler and Cardan angles, rotation axis and angle, and unit quaternions.
2.2.1.1 Orthonormal Rotation Matrix
The matrix R belongs to the special orthogonal group of dimension 3, or SO(3). It has the properties of an orthonormal matrix that were mentioned on page 16, such as $R^{T} = R^{-1}$ and $\det(R) = 1$.
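The orthonormal rotation matrices for rotation of θ about the x-, y- and z-axes are
$$ R_x(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix} $$
$$ R_y(\theta) = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix} $$
$$ R_z(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} $$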
The orthonormal matrix has nine elements but they are not independent. The columns have unit magnitude, which provides three constraints. The columns are orthogonal to each other, which provides another three constraints. Nine elements subject to six constraints leave effectively three independent values.
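The properties above can be checked numerically. The sketch below assumes the standard elementary rotation matrices about the x-, y- and z-axes; the helper names (`rotx`, `roty`, `rotz`, `matmul`, `det3`, `is_close`) are illustrative choices and use only the Python standard library.

```python
import math

def rotx(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def roty(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rotz(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    """Product of two 3x3 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def is_close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(3) for j in range(3))

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Rotate by 90 degrees about x then y, and in the opposite order.
R_xy = matmul(rotx(math.pi / 2), roty(math.pi / 2))
R_yx = matmul(roty(math.pi / 2), rotx(math.pi / 2))

orthonormal = is_close(matmul(R_xy, transpose(R_xy)), IDENTITY)  # R R^T = I
determinant = det3(R_xy)                                          # +1 for a rotation
commutes = is_close(R_xy, R_yx)                                   # False
```

The two products differ, which is the non-commutativity of rotation noted earlier, while each product remains orthonormal with unit determinant.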
2.2.1.2 Three-Angle Representations
Euler’s rotation theorem requires successive rotation about three axes such that no two successive rotations are about the same axis. There are two classes of rotation sequence: Eulerian and Cardanian, named after Euler and Cardano respectively.
The Eulerian type involves repetition, but not successive, of rotations about one particular axis: XYX, XZX, YXY, YZY, ZXZ, or ZYZ. The Cardanian type is characterized by rotations about all three axes: XYZ, XZY, YZX, YXZ, ZXY, or ZYX. In common usage all these sequences are called Euler angles and there are a total of twelve to choose from.
The ZYZ sequence
$$ R = R_z(\phi)\, R_y(\theta)\, R_z(\psi) $$
is commonly used in aeronautics and mechanical dynamics, and is used in the Toolbox.
The Euler angles are the 3-vector Γ = (φ, θ, ψ).
Two different sets of Euler angles can correspond to the same rotation matrix. The mapping from rotation matrix to Euler angles is therefore not unique, and the solution returned here always has a positive angle for θ.
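The non-uniqueness can be illustrated with a short sketch. It assumes the ZYZ sequence Rz(φ) Ry(θ) Rz(ψ) and an inverse that chooses the solution with θ in [0, π]; the function names `eul2r_sketch` and `r2eul_sketch` are hypothetical and are not the Toolbox functions.

```python
import math

def roty(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rotz(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def eul2r_sketch(phi, theta, psi):
    """ZYZ Euler angles to rotation matrix: Rz(phi) Ry(theta) Rz(psi)."""
    return matmul(matmul(rotz(phi), roty(theta)), rotz(psi))

def r2eul_sketch(R, eps=1e-9):
    """Rotation matrix to ZYZ Euler angles, choosing theta in [0, pi]."""
    sin_theta = math.hypot(R[0][2], R[1][2])      # non-negative by construction
    theta = math.atan2(sin_theta, R[2][2])
    if sin_theta < eps:
        # Singular case: phi and psi cannot be separated, so set phi = 0.
        return (0.0, theta, math.atan2(R[1][0], R[1][1]))
    phi = math.atan2(R[1][2], R[0][2])
    psi = math.atan2(R[2][1], -R[2][0])
    return (phi, theta, psi)

# A rotation specified with a negative theta (example values).
R = eul2r_sketch(0.1, -0.2, 0.3)
angles = r2eul_sketch(R)            # a different triple, with positive theta
R_again = eul2r_sketch(*angles)     # the same rotation matrix as R
```

For this example the recovered angles are (0.1 − π, 0.2, 0.3 − π), a different triple that reproduces the same rotation matrix.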
Another widely used convention is the roll-pitch-yaw angle sequence
$$ R = R_x(\theta_r)\, R_y(\theta_p)\, R_z(\theta_y) $$
which is intuitive when describing the attitude of vehicles such as ships, aircraft and cars. Roll, pitch and yaw (also called bank, attitude and heading) refer to rotations about the x-, y- and z-axes, respectively. This XYZ angle sequence, technically Cardan angles, is also known as Tait-Bryan angles or nautical angles. For aerospace and ground vehicles the x-axis is commonly defined in the forward direction, the z-axis downward and the y-axis to the right-hand side.
The roll-pitch-yaw sequence allows all angles to have arbitrary sign and it has a singularity when $\theta_p = \pm\pi/2$, which is fortunately outside the range of feasible attitudes for most vehicles.
2.2.1.3 Singularities and Gimbal Lock
A fundamental problem with the three-angle representations just described is singularity. This occurs when the rotational axis of the middle term in the sequence becomes parallel to the rotation axis of the first or third term. This is the same problem as gimbal lock, a term made famous in the movie Apollo 13.
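In mathematical, rather than mechanical, terms this problem can be seen using the definition of the Lunar module’s coordinate system where the rotation of the spacecraft’s body-fixed frame {B} with respect to the stable platform frame {S} is
$$ {}^{S}R_{B} = R_y(\theta_p)\, R_z(\theta_r)\, R_x(\theta_y) $$
For the case when $\theta_r = \pi/2$ we can apply the identity
$$ R_y(\theta)\, R_z(\pi/2) \equiv R_z(\pi/2)\, R_x(\theta) $$
leading to
$$ {}^{S}R_{B} = R_z(\pi/2)\, R_x(\theta_p)\, R_x(\theta_y) $$
which cannot represent rotation about the y-axis.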
The loss of a degree of freedom means that mathematically we cannot invert the transformation; we can only establish a linear relationship between two of the angles. In such a case the best we can do is determine the sum of the pitch and yaw angles. We observed a similar phenomenon with the Euler angle singularity earlier.
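The loss of a degree of freedom can be demonstrated numerically. This sketch assumes the Lunar module attitude is the sequence Ry(pitch) Rz(roll) Rx(yaw); the function name `lm_attitude` and the angle values are illustrative assumptions.

```python
import math

def rotx(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def roty(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rotz(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def lm_attitude(pitch, roll, yaw):
    """Assumed sequence: Ry(pitch) Rz(roll) Rx(yaw)."""
    return matmul(matmul(roty(pitch), rotz(roll)), rotx(yaw))

def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

# At roll = pi/2, two different (pitch, yaw) pairs with the same sum
# produce the same rotation matrix: only pitch + yaw is observable.
locked_a = lm_attitude(0.3, math.pi / 2, 0.4)
locked_b = lm_attitude(0.5, math.pi / 2, 0.2)
diff_locked = max_abs_diff(locked_a, locked_b)   # effectively zero

# Away from the singularity the same two pairs give distinct rotations.
free_a = lm_attitude(0.3, 0.2, 0.4)
free_b = lm_attitude(0.5, 0.2, 0.2)
diff_free = max_abs_diff(free_a, free_b)         # clearly non-zero
```

Because the matrix at the singularity depends only on the sum of pitch and yaw, no inverse mapping can recover the two angles individually.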
All three-angle representations of attitude, whether Eulerian or Cardanian, suffer this problem of gimbal lock when two consecutive axes become aligned. For ZYZ Euler angles this occurs when $\theta = k\pi,\ k \in \mathbb{Z}$ and for roll-pitch-yaw angles when pitch $\theta_p = \pm(2k+1)\pi/2$. The best that can be hoped for is that the singularity occurs for an attitude which does not occur during normal operation of the vehicle; this requires judicious choice of angle sequence and coordinate system. Singularities are an unfortunate consequence of using a minimal representation.
To eliminate this problem we need to adopt different representations of orientation. Many in the Apollo LM team would have preferred a four gimbal system and the clue to success, as we shall see shortly in Sect. 2.2.1.6, is to introduce a fourth parameter.
2.2.1.4 Two Vector Representation
For arm-type robots it is useful to consider a coordinate frame {E} attached to the end-effector as shown in Fig. 2.14. By convention the axis of the tool is associated with the z-axis and is called the approach vector and denoted $\hat{a} = (a_x, a_y, a_z)$. For some applications it is more convenient to specify the approach vector than to specify Euler or roll-pitch-yaw angles.
However specifying the direction of the z-axis is insufficient to describe the coordinate frame: we also need to specify the direction of the x- and y-axes. An orthogonal vector that provides orientation, perhaps between the two fingers of the robot’s gripper, is called the orientation vector, $\hat{o} = (o_x, o_y, o_z)$. These two unit vectors are sufficient to completely define the rotation matrix
$$ R = \begin{pmatrix} n_x & o_x & a_x \\ n_y & o_y & a_y \\ n_z & o_z & a_z \end{pmatrix} $$
since the remaining column can be computed as $\hat{n} = \hat{o} \times \hat{a}$.
Any two non-parallel vectors are sufficient to define a coordinate frame. For a camera we might use the optical axis, by convention the z-axis, and the left side of the camera which is by convention the x-axis. For a mobile robot we might use the gravitational acceleration vector (measured with accelerometers) which is by convention the z-axis and the heading direction (measured with an electronic compass) which is by convention the x-axis.
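A minimal sketch of building a rotation matrix from two vectors follows. It assumes the columns of R are the normal, orientation and approach vectors (n, o, a) with n = o × a, and it re-orthogonalises the orientation vector so that inputs which are not exactly perpendicular still give an orthonormal matrix. The function name `rotation_from_oa` is a hypothetical helper.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    if n < 1e-12:
        raise ValueError("vector has zero length")
    return tuple(c / n for c in v)

def rotation_from_oa(o, a):
    """Rotation matrix whose columns are n, o, a (the x-, y-, z-axes).

    o and a need only be non-parallel; a is kept exactly, and o is
    replaced by its component perpendicular to a.
    """
    a_hat = unit(a)
    n_hat = unit(cross(o, a_hat))   # raises ValueError if o is parallel to a
    o_hat = cross(a_hat, n_hat)     # unit length since a_hat is perpendicular to n_hat
    return [[n_hat[i], o_hat[i], a_hat[i]] for i in range(3)]

# Orientation along y and approach along z gives the identity matrix.
R_identity = rotation_from_oa((0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

# Approach along x with a slightly non-orthogonal orientation vector.
R_tool = rotation_from_oa((0.0, 1.0, 0.1), (1.0, 0.0, 0.0))
```

Keeping the approach vector fixed and adjusting the orientation vector is one possible design choice; it suits tools whose pointing direction matters most.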
2.2.1.5 Rotation about an Arbitrary Vector
Two coordinate frames of arbitrary orientation are related by a single rotation about some axis in space. This information is encoded in the eigenvalues and eigenvectors of R.
An orthonormal rotation matrix will always have one real eigenvalue at λ = 1 and a complex pair λ = cos θ ± i sin θ where θ is the rotation angle. Eigenvalues and eigenvectors are related by the definition Rv = λv. For the case λ = 1 then Rv = v, which implies that the corresponding eigenvector v is unchanged by the rotation. There is only one such vector and that is the one about which the rotation occurs.
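The inverse, converting from angle and vector to a rotation matrix, is achieved using Rodrigues’ rotation formula
$$ R = I_{3\times3} + \sin\theta\, S(v) + (1 - \cos\theta)\, S(v)^2 $$
where v is a unit vector and S(v) is the skew-symmetric matrix
$$ S(v) = \begin{pmatrix} 0 & -v_z & v_y \\ v_z & 0 & -v_x \\ -v_y & v_x & 0 \end{pmatrix} $$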
Alternatively we can multiply the unit vector by the angle to give another 3-parameter representation vθ. While these forms are minimal and efficient in terms of data storage they are analytically problematic. Many variants have been proposed including $v\sin(\theta/2)$ and $v\tan\theta$ but all are ill-defined for θ = 0.
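The conversion in both directions can be sketched as follows. The forward direction assumes Rodrigues’ formula in the form R = I + sin θ S(v) + (1 − cos θ) S(v)², with S(v) the skew-symmetric matrix of the unit vector v. The reverse direction here uses the trace and the skew-symmetric part of R, a common shortcut rather than the eigenvector computation described above, and is valid only for 0 < θ < π. The function names are hypothetical.

```python
import math

def skew(v):
    """Skew-symmetric matrix S(v) such that S(v) w = v x w."""
    return [[0.0, -v[2], v[1]],
            [v[2], 0.0, -v[0]],
            [-v[1], v[0], 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def angvec_to_r(theta, v):
    """Rodrigues' formula; v is normalised internally."""
    norm = math.sqrt(sum(c * c for c in v))
    S = skew([c / norm for c in v])
    S2 = matmul(S, S)
    s, c = math.sin(theta), math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * S[i][j] + (1.0 - c) * S2[i][j]
             for j in range(3)] for i in range(3)]

def r_to_angvec(R):
    """Angle and axis from R; fails at theta = 0 where the axis is undefined."""
    trace = R[0][0] + R[1][1] + R[2][2]
    theta = math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))
    k = 2.0 * math.sin(theta)          # zero when theta = 0: division fails
    axis = ((R[2][1] - R[1][2]) / k,
            (R[0][2] - R[2][0]) / k,
            (R[1][0] - R[0][1]) / k)
    return theta, axis

# Example: rotate by 0.7 rad about the direction (1, 2, 2).
R = angvec_to_r(0.7, (1.0, 2.0, 2.0))
theta, axis = r_to_angvec(R)           # recovers 0.7 and (1/3, 2/3, 2/3)

# The axis is unchanged by the rotation: R v = v.
rotated_axis = tuple(sum(R[i][k] * axis[k] for k in range(3)) for i in range(3))
```

Calling `r_to_angvec` on the identity matrix raises `ZeroDivisionError`, a concrete instance of the representation being ill-defined when there is no rotation.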
2.2.1.6 Unit Quaternion
Quaternions have been controversial since they were discovered by W. R. Hamilton over 150 years ago but they have great utility for roboticists. The quaternion is an extension of the complex number, a hyper-complex number, and is written as a scalar plus a vector
$$ q = s + v = s + v_1 i + v_2 j + v_3 k $$
where $s \in \mathbb{R}$, $v \in \mathbb{R}^3$ and the orthogonal complex numbers i, j and k are defined such that
$$ i^2 = j^2 = k^2 = ijk = -1 $$
We will denote a quaternion as
$$ q = s \langle v_1, v_2, v_3 \rangle $$
One early objection to quaternions was that multiplication was not commutative but as we have seen above this is exactly the case for rotations. Despite the initial controversy quaternions are elegant, powerful and computationally straightforward and widely used for robotics, computer vision, computer graphics and aerospace inertial navigation applications.
To represent rotations we use unit-quaternions. These are quaternions of unit magnitude, that is, those for which $|q| = 1$ or $s^2 + v_1^2 + v_2^2 + v_3^2 = 1$.
The unit-quaternion has the special property that it can be considered as a rotation of θ about the unit vector $\hat{n}$, which are related to the quaternion components by
$$ s = \cos\frac{\theta}{2}, \quad v = \left(\sin\frac{\theta}{2}\right)\hat{n} $$
and is similar to the angle-axis representation of Sect. 2.2.1.5.
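If we write the quaternion as a 4-vector $(s, v_1, v_2, v_3)$ then multiplication can be expressed as a matrix-vector product where
$$ q\, q' = \begin{pmatrix} s & -v_1 & -v_2 & -v_3 \\ v_1 & s & -v_3 & v_2 \\ v_2 & v_3 & s & -v_1 \\ v_3 & -v_2 & v_1 & s \end{pmatrix} \begin{pmatrix} s' \\ v_1' \\ v_2' \\ v_3' \end{pmatrix} $$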
Compounding two orthonormal rotation matrices requires 27 multiplications and 18 additions. The quaternion form requires 16 multiplications and 12 additions. This saving can be particularly important for embedded systems.
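The following sketch implements the quaternion product with exactly 16 multiplications and 12 additions or subtractions, and uses it to compound and apply rotations. It assumes the Hamilton product with quaternions stored as tuples (s, v1, v2, v3), and rotates a vector by forming q v q⁻¹ where the inverse of a unit quaternion is its conjugate. The function names are illustrative.

```python
import math

def qmul(q1, q2):
    """Hamilton product of two quaternions (s, v1, v2, v3)."""
    s1, x1, y1, z1 = q1
    s2, x2, y2, z2 = q2
    return (s1 * s2 - x1 * x2 - y1 * y2 - z1 * z2,
            s1 * x2 + x1 * s2 + y1 * z2 - z1 * y2,
            s1 * y2 - x1 * z2 + y1 * s2 + z1 * x2,
            s1 * z2 + x1 * y2 - y1 * x2 + z1 * s2)

def qconj(q):
    """Conjugate; equal to the inverse for a unit quaternion."""
    return (q[0], -q[1], -q[2], -q[3])

def qnorm(q):
    return math.sqrt(sum(c * c for c in q))

def from_angvec(theta, n):
    """Unit quaternion for a rotation of theta about the unit vector n."""
    half = theta / 2.0
    s = math.sin(half)
    return (math.cos(half), s * n[0], s * n[1], s * n[2])

def rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q."""
    _, x, y, z = qmul(qmul(q, (0.0, v[0], v[1], v[2])), qconj(q))
    return (x, y, z)

# Quarter turns about the z- and x-axes.
q_z = from_angvec(math.pi / 2, (0.0, 0.0, 1.0))
q_x = from_angvec(math.pi / 2, (1.0, 0.0, 0.0))

# Compounding is a product, and the order matters.
q_zx = qmul(q_z, q_x)      # equivalent to the matrix product Rz Rx
q_xz = qmul(q_x, q_z)      # equivalent to the matrix product Rx Rz

v_zx = rotate(q_zx, (0.0, 1.0, 0.0))   # approximately (0, 0, 1)
v_xz = rotate(q_xz, (0.0, 1.0, 0.0))   # approximately (-1, 0, 0)
```

The product of two unit quaternions is again a unit quaternion up to rounding error, so after many compounding steps it is common practice to renormalise.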
2.2.2 Combining Translation and Orientation
We have discussed several different representations of orientation, and we need to combine this with translation, to create a tangible representation of relative pose. The two most practical representations are: the quaternion vector pair and the 4 × 4 homogeneous transformation matrix.
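Alternatively we can use a homogeneous transformation matrix to describe rotation and translation. The derivation is similar to the 2D case of Eq. 2.10 but extended to account for the z-dimension
$$ \begin{pmatrix} {}^{A}x \\ {}^{A}y \\ {}^{A}z \\ 1 \end{pmatrix} = \begin{pmatrix} {}^{A}R_{B} & t \\ 0_{1\times3} & 1 \end{pmatrix} \begin{pmatrix} {}^{B}x \\ {}^{B}y \\ {}^{B}z \\ 1 \end{pmatrix} $$
The Cartesian translation vector between the origins of the coordinate frames is t and the change in orientation is represented by a 3 × 3 orthonormal submatrix R. The vectors are expressed in homogeneous form and we write
$$ {}^{A}\tilde{p} = \begin{pmatrix} {}^{A}R_{B} & t \\ 0_{1\times3} & 1 \end{pmatrix} {}^{B}\tilde{p} = {}^{A}T_{B}\, {}^{B}\tilde{p} $$
and ${}^{A}T_{B}$ is a 4 × 4 homogeneous transformation. The matrix has a very specific structure and belongs to the special Euclidean group of dimension 3, or SE(3).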
The 4 × 4 homogeneous transformation is very commonly used in robotics and computer vision.
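A minimal sketch of the 4 × 4 homogeneous transformation follows, assuming the block structure of a 3 × 3 rotation R, a translation column t and a bottom row (0, 0, 0, 1). The helper names (`se3`, `compose`, `inverse`, `transform_point`) and the example pose are illustrative assumptions.

```python
import math

def rotz(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def se3(R, t):
    """Assemble the 4x4 matrix [R t; 0 0 0 1] from a 3x3 R and a 3-vector t."""
    return [list(R[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def compose(T1, T2):
    """Compound two poses: the 4x4 matrix product T1 * T2."""
    return [[sum(T1[i][k] * T2[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def inverse(T):
    """Invert using the structure [R t; 0 1]^-1 = [R^T, -R^T t; 0 1]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return se3(Rt, t_inv)

def transform_point(T, p):
    """Apply T to the 3D point p via its homogeneous form (x, y, z, 1)."""
    ph = [p[0], p[1], p[2], 1.0]
    return tuple(sum(T[i][k] * ph[k] for k in range(4)) for i in range(3))

# Example pose of {B} relative to {A}: translate by (1, 2, 3), then
# rotate a quarter turn about the z-axis.
T_AB = se3(rotz(math.pi / 2), [1.0, 2.0, 3.0])

p_B = (1.0, 0.0, 0.0)
p_A = transform_point(T_AB, p_B)                 # approximately (1, 3, 3)
p_B_again = transform_point(inverse(T_AB), p_A)  # back to (1, 0, 0)
```

Exploiting the block structure for the inverse avoids a general 4 × 4 matrix inversion, which is one reason this representation is convenient in practice.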
2.3 Wrapping Up
In this chapter we learned how to represent points and poses in 2- and 3-dimensional worlds. Points are represented by coordinate vectors relative to a coordinate frame. A set of points that belong to a rigid object can be described by a coordinate frame, and its constituent points are described by displacements from the object’s coordinate frame. The position and orientation of any coordinate frame can be described relative to another coordinate frame by its relative pose ξ. Relative poses can be applied sequentially (composed or compounded), and we have shown how relative poses can be manipulated algebraically. An important algebraic rule is that composition is non-commutative – the order in which relative poses are applied is important.
Further Reading
The treatment in this chapter is a hybrid mathematical and graphical approach that covers the 2D and 3D cases by means of abstract representations and operators which are later made tangible. The standard robotics textbooks such as Spong et al. (2006), Craig (2004), Siciliano et al. (2008) and Paul (1981) all introduce homogeneous transformation matrices for the 3-dimensional case but differ in their approach. These books also provide good discussion of the other representations such as angle-vector and 3-angle representations. Spong et al. (2006, Sec 2.5.1) have a good discussion of singularities. Siegwart et al. (2011) explicitly cover the 2D case in the context of mobile robot navigation.
Hamilton and his supporters, including Peter Tait, were vigorous in defending Hamilton’s precedence in inventing quaternions, and in muddying the water with respect to vectors which were then beginning to be understood and used. Rodrigues developed the key idea in 1840 and Gauss discovered it in 1819 but, as usual, did not publish it. Quaternions had a tempestuous beginning. The paper by Altmann (1989) is an interesting description of this tussle of ideas, and quaternions have even been woven into fiction (Pynchon 2006).
Quaternions are discussed briefly in Siciliano et al. (2008). The book by Kuipers (1999) is a very readable and comprehensive introduction to quaternions. Quaternion interpolation is widely used in computer graphics and animation and the classic paper by Shoemake (1985) is a very readable introduction to this topic. The first publication about quaternions for robotics is probably Taylor (1979), with subsequent work by Funda (1990).