2 Representing Position and Orientation

A fundamental requirement in robotics and computer vision is to represent
the position and orientation of objects in an environment. Such objects include robots, cameras, workpieces, obstacles and paths.

Instead of describing the individual points we describe the position and orientation of the object by the position and orientation of its coordinate frame.

The position and orientation of a coordinate frame is known as its pose and is shown graphically as a set of coordinate axes. The relative pose of a frame with respect to a reference coordinate frame is denoted by the symbol ξ.
The point P in Fig. 2.2 can be described with respect to either coordinate frame. Formally we express this as
${}^Ap = {}^Aξ_{B} · {}^Bp$

An important characteristic of relative poses is that they can be composed or compounded. Consider the case shown in Fig. 2.3. If one frame can be described in terms of another by a relative pose then they can be applied sequentially
${}^Aξ_{C} = {}^Aξ_{B} \oplus {}^Bξ_{C}$
So what is ξ? It can be any mathematical object that supports the algebra described above and is suited to the problem at hand. It will depend on whether we are considering a 2- or 3-dimensional problem. Some of the objects that we will discuss in the rest of this chapter include vectors as well as more exotic mathematical objects such as homogeneous transformations, orthonormal rotation matrices and quaternions.

To recap:

A point is described by a coordinate vector that represents its displacement from a
reference coordinate system;

A set of points that represent a rigid object can be described by a single coordinate
frame, and its constituent points are described by displacements from that coordinate
frame;

The position and orientation of an object’s coordinate frame is referred to as its
pose;

A relative pose describes the pose of one coordinate frame with respect to another
and is denoted by an algebraic variable ξ;

A coordinate vector describing a point can be represented with respect to a different
coordinate frame by applying the relative pose to the vector using the · operator;

We can perform algebraic manipulation of expressions written in terms of relative
poses.

2.1 Representing Pose in 2-Dimensions

A 2-dimensional world, or plane, is familiar to us from high-school Euclidean geometry. We use a Cartesian coordinate system or coordinate frame with orthogonal axes denoted x and y and typically drawn with the x-axis horizontal and the y-axis vertical. The point of intersection is called the origin. Unit-vectors parallel to the axes are denoted $\hat{x}$ and $\hat{y}$. A point is represented by its x- and y-coordinates (x, y) or as a bound vector
$p = x \hat{x} + y \hat{y}$
Figure 2.6 shows a coordinate frame {B} that we wish to describe with respect to the reference frame {A}. We can see clearly that the origin of {B} has been displaced by the vector t = (x, y) and then rotated counter-clockwise by an angle θ. A concrete representation of pose is therefore the 3-vector ${}^Aξ_B ∼ (x, y, θ)$, where the symbol ∼ denotes that the two representations are equivalent. Unfortunately this representation is not convenient for compounding since
$(x_1, y_1, \theta_1) \oplus (x_2, y_2, \theta_2)$
is a complex trigonometric function of both poses. Instead we will use a different way of representing rotation.
Writing points in homogeneous form, the relative pose can be expressed as a 3 × 3 homogeneous transformation matrix
${}^AT_{B} = \begin{bmatrix} \cos\theta & -\sin\theta & x \\ \sin\theta & \cos\theta & y \\ 0 & 0 & 1 \end{bmatrix}$
The matrix has a very specific structure and belongs to the special Euclidean group of dimension 2 or $T ∈ SE(2) ⊂ R^{3×3}$ .
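As a minimal sketch of how such a matrix is used, the following Python/NumPy snippet builds two SE(2) transforms, compounds them by matrix multiplication and maps a point between frames. The helper name `se2` and the numeric poses are illustrative assumptions, not part of the original text.

```python
import numpy as np

def se2(x, y, theta):
    """Hypothetical helper: SE(2) matrix for a translation (x, y) and a
    counter-clockwise rotation theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0.0, 0.0, 1.0]])

# Example poses (arbitrary values): {B} relative to {A}, {C} relative to {B}
T_AB = se2(1.0, 2.0, np.pi / 2)
T_BC = se2(2.0, 1.0, 0.0)

# Compounding relative poses is matrix multiplication
T_AC = T_AB @ T_BC

# The origin of {C}, written in homogeneous form, expressed in frame {A}
p_C = np.array([0.0, 0.0, 1.0])
p_A = T_AC @ p_C

# Compounding in the opposite order gives a different pose
T_swapped = T_BC @ T_AB
order_matters = not np.allclose(T_AC, T_swapped)
```

With these values the origin of {C} lies at (0, 4) in frame {A}, and swapping the order of multiplication produces a different matrix.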

2.2 Representing Pose in 3-Dimensions

A point P is represented by its x-, y- and z-coordinates (x, y, z) or as a bound vector
$P = x\hat{x} + y\hat{y} + z\hat{z}$

2.2.1 Representing Orientation in 3-Dimensions

Euler’s rotation theorem (Kuipers 1999) states that any two independent orthonormal coordinate frames can be related by a sequence of rotations (not more than three) about coordinate axes, where no two successive rotations may be about the same axis.

The implication for the pose algebra we have used in this chapter is that the ⊕ operator is not commutative – the order in which rotations are applied is very important.

Mathematicians have developed many ways to represent rotation and we will discuss several of them in the remainder of this section: orthonormal rotation matrices, Euler and Cardan angles, rotation axis and angle, and unit quaternions.

2.2.1.1 Orthonormal Rotation Matrix

The matrix R belongs to the special orthogonal group of dimension 3 or $R ∈ SO(3) ⊂ R^{3×3}$ . It has the properties of an orthonormal matrix, such as $R^T= R^{−1}$ and $\det(R) = 1$ .

The orthonormal rotation matrices for rotation of θ about the x-, y- and z-axes are
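$R_x(\theta) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}$
$R_y(\theta) = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}$
$R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$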
The orthonormal matrix has nine elements but they are not independent. The columns have unit magnitude which provides three constraints. The columns are orthogonal to each other which provides another three constraints. Nine elements and six constraints is effectively three independent values.
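The properties of rotation matrices can be checked numerically. The sketch below, in Python/NumPy, defines the three elementary rotations (the function names `rotx`, `roty` and `rotz` are our own choice here), verifies orthonormality and unit determinant for a compound rotation, and shows that the order of rotations matters.

```python
import numpy as np

def rotx(t):
    """Rotation of t radians about the x-axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def roty(t):
    """Rotation of t radians about the y-axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rotz(t):
    """Rotation of t radians about the z-axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# An arbitrary compound rotation
R = rotx(0.3) @ roty(-0.5) @ rotz(1.2)

# Orthonormality (R^T R = I) and determinant +1
is_orthonormal = np.allclose(R.T @ R, np.eye(3))
det_R = np.linalg.det(R)

# Rotations do not commute: x-then-y differs from y-then-x
A = rotx(np.pi / 2) @ roty(np.pi / 2)
B = roty(np.pi / 2) @ rotx(np.pi / 2)
commutes = np.allclose(A, B)
```

Here `is_orthonormal` is true, `det_R` is 1 to within rounding, and `commutes` is false.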

2.2.1.2 Three-Angle Representations

Euler’s rotation theorem requires successive rotation about three axes such that no two successive rotations are about the same axis. There are two classes of rotation sequence: Eulerian and Cardanian, named after Euler and Cardano respectively.

The Eulerian type involves repetition, but not successive, of rotations about one particular axis: XYX, XZX, YXY, YZY, ZXZ, or ZYZ. The Cardanian type is characterized by rotations about all three axes: XYZ, XZY, YZX, YXZ, ZXY, or ZYX. In common usage all these sequences are called Euler angles and there are a total of twelve to choose from.
The ZYZ sequence
$R = R_z(\phi)R_y(\theta)R_z(\psi)$
is commonly used in aeronautics and mechanical dynamics, and is used in the Toolbox.
The Euler angles are the 3-vector $\Gamma = (φ,θ, ψ)$ .

Two different sets of Euler angles can correspond to the same rotation matrix. The mapping from rotation matrix to Euler angles is therefore not unique, and the inverse mapping conventionally returns the solution with a positive angle for θ.
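The non-uniqueness can be seen with a short numerical experiment. The sketch below is a Python/NumPy illustration, not the Toolbox implementation: the function names `eul_to_rot` and `rot_to_eul` are hypothetical, and the inverse assumes $\sin θ ≠ 0$ (away from the singularity) and picks the solution with θ ≥ 0.

```python
import numpy as np

def roty(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rotz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def eul_to_rot(phi, theta, psi):
    """ZYZ Euler angles to rotation matrix: Rz(phi) Ry(theta) Rz(psi)."""
    return rotz(phi) @ roty(theta) @ rotz(psi)

def rot_to_eul(R):
    """Rotation matrix to ZYZ Euler angles, choosing theta >= 0.
    Assumes sin(theta) != 0, i.e. the rotation is not at the singularity."""
    theta = np.arctan2(np.hypot(R[0, 2], R[1, 2]), R[2, 2])
    phi = np.arctan2(R[1, 2], R[0, 2])
    psi = np.arctan2(R[2, 1], -R[2, 0])
    return phi, theta, psi

# Start from a set of angles with a negative theta
original = (0.1, -0.2, 0.3)
R = eul_to_rot(*original)

# The inverse returns a different triple with positive theta ...
recovered = rot_to_eul(R)

# ... which nevertheless produces the same rotation matrix
same_rotation = np.allclose(eul_to_rot(*recovered), R)
```

For this input the recovered angles are (0.1 − π, 0.2, 0.3 − π): a different triple from the original, yet `same_rotation` is true.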

Another widely used convention is the roll-pitch-yaw angle sequence
$R = R_x(\theta_r)R_y(\theta_p)R_z(\theta_y)$
which is intuitive when describing the attitude of vehicles such as ships, aircraft and cars. Roll, pitch and yaw (also called bank, attitude and heading) refer to rotations about the x-, y- and z-axes, respectively. This XYZ angle sequence, technically Cardan angles, is also known as Tait-Bryan angles or nautical angles. For aerospace and ground vehicles the x-axis is commonly defined in the forward direction, the z-axis downward and the y-axis to the right-hand side.

The roll-pitch-yaw sequence allows all angles to have arbitrary sign and it has a singularity when $θ_p= ±\frac{\pi}{2}$ which is fortunately outside the range of feasible attitudes for most vehicles.

2.2.1.3 Singularities and Gimbal Lock

A fundamental problem with the three-angle representations just described is singularity. This occurs when the rotational axis of the middle term in the sequence becomes parallel to the rotation axis of the first or third term. This is the same problem as gimbal lock, a term made famous in the movie Apollo 13.

In mathematical, rather than mechanical, terms this problem can be seen using the definition of the Lunar module’s coordinate system where the rotation of the spacecraft’s body-fixed frame {B} with respect to the stable platform frame {S} is
${}^SR_B = R_y(\theta_p)R_z(\theta_r)R_x(\theta_y)$
For the case when $θ_r= \frac{\pi}{2}$ we can apply the identity
$R_y(\theta_p)R_z(\frac{\pi}{2}) = R_z(\frac{\pi}{2})R_x(\theta_p)$
leading to
${}^SR_B = R_z(\frac{\pi}{2})R_x(\theta_p)R_x(\theta_y) = R_z(\frac{\pi}{2})R_x(\theta_p + \theta_y)$
which cannot represent rotation about the y-axis.

The loss of a degree of freedom means that mathematically we cannot invert the transformation; we can only establish a linear relationship between two of the angles. In such a case the best we can do is determine the sum of the pitch and yaw angles. We observed a similar phenomenon with the Euler angle singularity earlier.

All three-angle representations of attitude, whether Eulerian or Cardanian, suffer this problem of gimbal lock when two consecutive axes become aligned. For ZYZ Euler angles this occurs when $θ = kπ, k ∈ Z$ and for roll-pitch-yaw angles when pitch $θ_p = ±(2k + 1)\frac{\pi}{2}$ . The best that can be hoped for is that the singularity occurs for an attitude which does not occur during normal operation of the vehicle, which requires a judicious choice of angle sequence and coordinate system. Singularities are an unfortunate consequence of using a minimal representation.
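The loss of a degree of freedom is easy to reproduce numerically. The following Python/NumPy sketch uses the sequence $R_y(θ_p)R_z(θ_r)R_x(θ_y)$ from the Lunar module example; the function name `lm_attitude` and the angle values are illustrative assumptions. At $θ_r = \frac{\pi}{2}$ two different pitch and yaw pairs with the same sum give an identical rotation matrix, whereas away from the singularity they do not.

```python
import numpy as np

def rotx(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def roty(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rotz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def lm_attitude(theta_p, theta_r, theta_y):
    """Hypothetical helper: Ry(theta_p) Rz(theta_r) Rx(theta_y)."""
    return roty(theta_p) @ rotz(theta_r) @ rotx(theta_y)

# At the singularity theta_r = pi/2: pitch and yaw pairs with equal sum (0.8)
R1 = lm_attitude(0.3, np.pi / 2, 0.5)
R2 = lm_attitude(0.6, np.pi / 2, 0.2)
same_at_singularity = np.allclose(R1, R2)

# Both equal Rz(pi/2) Rx(theta_p + theta_y)
matches_identity = np.allclose(R1, rotz(np.pi / 2) @ rotx(0.8))

# Away from the singularity the same pairs give different rotations
R3 = lm_attitude(0.3, 0.4, 0.5)
R4 = lm_attitude(0.6, 0.4, 0.2)
same_elsewhere = np.allclose(R3, R4)
```

Only the sum of the two angles can be recovered from `R1`, which is why the inverse mapping is ill-posed at this attitude.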

To eliminate this problem we need to adopt different representations of orientation. Many in the Apollo LM team would have preferred a four gimbal system and the clue to success, as we shall see shortly in Sect. 2.2.1.6, is to introduce a fourth parameter.

2.2.1.4 Two Vector Representation

For arm-type robots it is useful to consider a coordinate frame {E} attached to the end-effector as shown in Fig. 2.14. By convention the axis of the tool is associated with the z-axis and is called the approach vector and denoted $\hat{a} = (a_x, a_y, a_z)$ . For some applications it is more convenient to specify the approach vector than to specify Euler or roll-pitch-yaw angles.
However specifying the direction of the z-axis is insufficient to describe the coordinate frame: we also need to specify the direction of the x- and y-axes. An orthogonal vector that provides orientation, perhaps between the two fingers of the robot’s gripper, is called the orientation vector, $\hat{o} = (o_x, o_y, o_z)$ . These two unit vectors are sufficient to completely define the rotation matrix
$R = \begin{bmatrix} n_x & o_x & a_x \\ n_y & o_y & a_y \\ n_z & o_z & a_z \end{bmatrix}$
since the remaining column, the normal vector, can be computed as $\hat{n} = \hat{o} × \hat{a}$ .
Any two non-parallel vectors are sufficient to define a coordinate frame. For a camera we might use the optical axis, by convention the z-axis, and the left side of the camera which is by convention the x-axis. For a mobile robot we might use the gravitational acceleration vector (measured with accelerometers) which is by convention the z-axis and the heading direction (measured with an electronic compass) which is by convention the x-axis.
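A minimal Python/NumPy sketch of this construction follows. The function name `two_vector_to_rot` is hypothetical, and the re-orthogonalization step (recomputing the orientation vector so that it is exactly perpendicular to the approach vector) is an assumption added here so that any two non-parallel input vectors yield a valid rotation matrix.

```python
import numpy as np

def two_vector_to_rot(o, a):
    """Build a rotation matrix with columns (n, o, a) from an orientation
    vector o and an approach vector a. The inputs need not be unit length
    or exactly orthogonal, but they must not be parallel."""
    o = np.asarray(o, dtype=float)
    a = np.asarray(a, dtype=float)
    a = a / np.linalg.norm(a)
    n = np.cross(o, a)            # normal vector, n = o x a
    n = n / np.linalg.norm(n)
    o = np.cross(a, n)            # recompute o so that it is orthogonal to a
    return np.column_stack((n, o, a))

# Example: approach vector roughly along z, orientation vector in the xy-plane
a_in = np.array([0.0, 0.2, 1.0])
o_in = np.array([1.0, 1.0, 0.0])
R = two_vector_to_rot(o_in, a_in)

is_rotation = np.allclose(R.T @ R, np.eye(3)) and np.isclose(np.linalg.det(R), 1.0)
z_axis_is_approach = np.allclose(R[:, 2], a_in / np.linalg.norm(a_in))
```

The approach vector is preserved exactly (up to normalization), while the orientation vector is adjusted as little as needed to make the frame orthonormal.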

2.2.1.5 Rotation about an Arbitrary Vector

Two coordinate frames of arbitrary orientation are related by a single rotation about some axis in space. This information is encoded in the eigenvalues and eigenvectors of R.

An orthonormal rotation matrix will always have one real eigenvalue at $λ = 1$ and a complex pair $λ = \cos θ ± i \sin θ$ where θ is the rotation angle. From the definition of eigenvalues and eigenvectors, $Rv = λv$ . For the case $λ = 1$ this becomes $Rv = v$ , which implies that the corresponding eigenvector v is unchanged by the rotation. There is only one such vector and that is the one about which the rotation occurs.

The inverse, converting from angle and vector to a rotation matrix, is achieved using Rodrigues’ rotation formula
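$R = I_{3×3} + \sin θ \, S(v) + (1 − \cos θ)(vv^T − I_{3×3})$
where $S(v)$ is the skew-symmetric matrix of the unit vector v, for which $S(v)u = v × u$ .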
Alternatively we can multiply the unit vector by the angle to give another 3-parameter representation vθ. While these forms are minimal and efficient in terms of data storage they are analytically problematic. Many variants have been proposed including $v sin(θ/2)$ and $v tan(θ)$ but all are ill-defined for $θ = 0$ .
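Both directions of the angle-vector conversion can be sketched in a few lines of Python/NumPy. The function names are hypothetical. The forward direction uses Rodrigues' formula in the equivalent form $I + \sin θ \, S + (1 − \cos θ)S^2$, which holds for a unit vector; the inverse reads the angle and axis from the eigenvalues and eigenvectors and assumes $0 < θ < π$ so that the axis is well defined.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix S such that S @ u equals np.cross(v, u)."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def angvec_to_rot(theta, v):
    """Rodrigues' formula for a rotation of theta about the direction v."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    S = skew(v)
    return np.eye(3) + np.sin(theta) * S + (1 - np.cos(theta)) * (S @ S)

def rot_to_angvec(R):
    """Angle and axis from the eigen-decomposition of R (0 < theta < pi)."""
    w, V = np.linalg.eig(R)
    # The complex pair is cos(theta) +/- i sin(theta); the real eigenvalue is 1
    theta = np.max(np.abs(np.angle(w)))
    k = np.argmin(np.abs(w - 1.0))
    v = np.real(V[:, k])
    v = v / np.linalg.norm(v)
    # An eigenvector is defined only up to sign: align it with the
    # skew-symmetric part of R, which equals 2 sin(theta) v
    twice_sin_v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if twice_sin_v @ v < 0:
        v = -v
    return theta, v

R = angvec_to_rot(0.7, [1, 2, 2])
theta, v = rot_to_angvec(R)
```

The round trip recovers the angle 0.7 and the unit vector (1, 2, 2)/3.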

2.2.1.6 Unit Quaternion

Quaternions have been controversial since they were discovered by W. R. Hamilton over 150 years ago but they have great utility for roboticists. The quaternion is an extension of the complex number – a hyper-complex number – and is written as a scalar plus a vector
$\mathring{q} = s + v = s + v_1 i + v_2 j + v_3 k$
where $s ∈ R,v ∈ R^3$ and the orthogonal complex numbers i,j and k are defined such that
$i^2 = j^2 = k^2 = ijk = -1$
We will denote a quaternion as
$\mathring{q} = s \langle v_1, v_2, v_3 \rangle$
One early objection to quaternions was that multiplication was not commutative but as we have seen above this is exactly the case for rotations. Despite the initial controversy quaternions are elegant, powerful and computationally straightforward and widely used for robotics, computer vision, computer graphics and aerospace inertial navigation applications.

To represent rotations we use unit-quaternions. These are quaternions of unit magnitude, that is, those for which $|\mathring{q}| = 1$ or $s^2 + v_1^2 + v_2^2 + v_3^2 = 1$ .

The unit-quaternion has the special property that it can be considered as a rotation of $θ$ about the unit vector $\hat{n}$ which are related to the quaternion components by
$s = \cos\frac{\theta}{2}, v = (\sin\frac{\theta}{2})\hat{n}$
and is similar to the angle-axis representation of Sect. 2.2.1.5.
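If we write the quaternion as a 4-vector $(s,v_1,v_2,v_3)$ then multiplication can be expressed as a matrix-vector product
$\mathring{q} \, \mathring{q}' = \begin{bmatrix} s & -v_1 & -v_2 & -v_3 \\ v_1 & s & -v_3 & v_2 \\ v_2 & v_3 & s & -v_1 \\ v_3 & -v_2 & v_1 & s \end{bmatrix} \begin{bmatrix} s' \\ v_1' \\ v_2' \\ v_3' \end{bmatrix}$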

Compounding two orthonormal rotation matrices requires 27 multiplications and 18 additions. The quaternion form requires 16 multiplications and 12 additions. This saving can be particularly important for embedded systems.
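As an illustration, unit-quaternion compounding and vector rotation can be sketched in Python/NumPy as follows. Quaternions are stored as 4-element arrays $(s, v_1, v_2, v_3)$; the function names are hypothetical, and the rotation of a vector uses the standard product $\mathring{q} \, (0, p) \, \mathring{q}^{-1}$, where the inverse of a unit quaternion is its conjugate.

```python
import numpy as np

def qmul(q1, q2):
    """Quaternion product of two 4-vectors (s, v1, v2, v3)."""
    s1, v1 = q1[0], q1[1:]
    s2, v2 = q2[0], q2[1:]
    s = s1 * s2 - v1 @ v2
    v = s1 * v2 + s2 * v1 + np.cross(v1, v2)
    return np.concatenate(([s], v))

def q_from_angvec(theta, n):
    """Unit quaternion for a rotation of theta about the direction n."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    return np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * n))

def q_rotate(q, p):
    """Rotate the 3-vector p by the unit quaternion q."""
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    p_quat = np.concatenate(([0.0], np.asarray(p, dtype=float)))
    return qmul(qmul(q, p_quat), q_conj)[1:]

# Rotations of pi/2 about the x- and y-axes
qx = q_from_angvec(np.pi / 2, [1, 0, 0])
qy = q_from_angvec(np.pi / 2, [0, 1, 0])

# Compounding, in both orders
q_xy = qmul(qx, qy)
q_yx = qmul(qy, qx)

# Rotating the x unit vector by the compound rotation
rotated = q_rotate(q_xy, [1.0, 0.0, 0.0])
```

Here `q_xy` is (0.5, 0.5, 0.5, 0.5) while `q_yx` is (0.5, 0.5, 0.5, −0.5), showing that the product is not commutative, and `rotated` is the y unit vector, the same result as applying $R_x(\frac{\pi}{2})R_y(\frac{\pi}{2})$ to the x unit vector.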

2.2.2 Combining Translation and Orientation

We have discussed several different representations of orientation, and we need to combine this with translation, to create a tangible representation of relative pose. The two most practical representations are: the quaternion vector pair and the 4 × 4 homogeneous transformation matrix.
Alternatively we can use a homogeneous transformation matrix to describe rotation and translation. The derivation is similar to the 2D case of Sect. 2.1 but extended to account for the z-dimension.

The Cartesian translation vector between the origins of the coordinate frames is t and the change in orientation is represented by a 3 × 3 orthonormal submatrix R. The vectors are expressed in homogeneous form and we write
$\begin{bmatrix} {}^Ap \\ 1 \end{bmatrix} = \begin{bmatrix} {}^AR_B & t \\ 0_{1×3} & 1 \end{bmatrix} \begin{bmatrix} {}^Bp \\ 1 \end{bmatrix} = {}^AT_B \begin{bmatrix} {}^Bp \\ 1 \end{bmatrix}$
where ${}^AT_B$ is a 4 × 4 homogeneous transformation. The matrix has a very specific structure and belongs to the special Euclidean group of dimension 3 or $T ∈ SE(3) ⊂ R^{4×4}$ .
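A short Python/NumPy sketch shows how a 4 × 4 homogeneous transformation is assembled, compounded and inverted. The helper names `transform` and `invert` and the example poses are assumptions made for illustration. The inverse uses the block structure of the matrix: the rotation part becomes $R^T$ and the translation part becomes $−R^T t$.

```python
import numpy as np

def rotz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def transform(R, t):
    """Assemble a 4x4 homogeneous transformation from R (3x3) and t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert(T):
    """Inverse of a homogeneous transformation using its block structure."""
    R, t = T[:3, :3], T[:3, 3]
    return transform(R.T, -R.T @ t)

# Example poses (arbitrary values): {B} relative to {A}, {C} relative to {B}
T_AB = transform(rotz(np.pi / 2), [1.0, 0.0, 0.0])
T_BC = transform(np.eye(3), [0.0, 2.0, 0.0])

# Compounding is matrix multiplication
T_AC = T_AB @ T_BC

# A point given in frame {C}, in homogeneous form, expressed in frame {A}
p_C = np.array([0.0, 0.0, 1.0, 1.0])
p_A = T_AC @ p_C

# The inverse pose maps the point back again
p_back = invert(T_AC) @ p_A
```

With these values `p_A` is (−1, 0, 1) in homogeneous form, and `p_back` reproduces `p_C`.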

The 4 × 4 homogeneous transformation is very commonly used in robotics and computer vision.
Summary of the various concrete representations of pose ξ introduced in this chapter
Conversion between rotational representations

2.3 Wrapping Up

In this chapter we learned how to represent points and poses in 2- and 3-dimensional worlds. Points are represented by coordinate vectors relative to a coordinate frame. A set of points that belong to a rigid object can be described by a coordinate frame, and its constituent points are described by displacements from the object’s coordinate frame. The position and orientation of any coordinate frame can be described relative to another coordinate frame by its relative pose ξ. Relative poses can be applied sequentially (composed or compounded), and we have shown how relative poses can be manipulated algebraically. An important algebraic rule is that composition is non-commutative – the order in which relative poses are applied is important.

Robotics, Vision and Control - 2