Homogeneous Coordinates

A Concrete Explanation

Updated 2016-10-08

Computer graphics programs usually work with homogeneous coordinates, whose main purpose is to allow for some operations like translation to be linear and thus vectorizable. Vectorized operations can be heavily parallelized on the processing unit which results in a tremendous performance boost. Video games make heavy use of this technique.

Homogeneous coordinates are often introduced as a trick. I think this is a misconception and it paves the way for a lot of misunderstandings and errors when working with computer graphics.

In the following article I will detail a more natural approach to the concept.

Notations

In the following we will consider canonical Euclidean planes and spaces. \(x\), \(y\) and \(z\) are the canonical Euclidean dimensions or the vector components depending on the context.

The initial space is called the projective space.

All the transformations are supposed to be endomorphisms.

Usual transformations

Linear transformations in a Euclidean space can be written as a matrix \(M\) so that the transformed point \(P'\) of \(P\) is:

\[ P' = M \cdot P \]

In 2D, the matrix is of size \((2,2)\). A few examples follow:

\[ \begin{pmatrix} s & 0 \\ 0 & s \\ \end{pmatrix} \]

Other linear transformations include shearing and mirroring.

Non-linear operations

In the following section we will consider the projective space to be a 2D plane. This is only to make the explanation more visual, since this can be easily generalized to any dimension.

Some operations are not linear in the projective plane. But what if we consider the projective plane as being a plane in a 3D space? Then we can apply any 3D transformation on the points of this plane.

We say that we promote the plane to the next dimension.

The fundamental principle here is that the higher dimension allows for more complex transformations, e.g. some specific shearing operations in 3D can be seen as a translation on the plane.

After the desired transformations have been applied, we project the resulting points back to the plane.

There are some parameters we need to specify:

In order to answer this we need to analyze how some non-linear operations behave.

Translation operation

Promotion

Translations are not linear in the 2D-Euclidean space. Now if we promote the input points to a 3D space, they will all lie in a plane. It is obviously simpler to keep the \(x\) and \(y\) vectors identical. Then the only parameter for defining the plane is \(z\).

Let’s have a look at the following shearing of some point \(P=(i,j,k)\) by the scalars \(a\) and \(b\):

\[ \begin{pmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 1 \\ \end{pmatrix} \cdot (i, j, k) = (i+ka, j+kb, k) \]

If \(k=1\), then the transformed point is \((i+a, j+b, 1)\), which is a translation on the plane \(z=1\).

Considering the projective plane at \(z=1\) makes it easy to write a translation matrix.

Conclusion: the points from a projective space of any dimension can be promoted to the next dimension by adding a coordinate with the scalar value 1.

Translation of a promoted point

Projection

If the transformed points lie in the plane, the projection back to the projective space is trivial: we simply cut off the last dimension.

However, the transformed points might end up somewhere outside the plane. In that case we need some way to map the 3D points to the 2D points. There are several ways of doing this.

First let’s consider some point \(P=(i,j,k)\) and let’s see what happens if we apply the same transformation as before:

\[ \begin{pmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 1 \\ \end{pmatrix} \cdot (i,j,k) = (i+ka, j+ka, k) = P' \]

Conclusion: the projection back to the projective space is done by dividing all dimensions by the last one when non-zero. This allows for translations in the projective space.

Perspective operation

We have seen one example that covers it all. But what if the above analysis was good enough for translations only? Let us have a look at a very common operation in computer graphics: perspective correction.

In this section we consider a 3D projective space. The higher dimension is 4D and the 4th coordinates we be written \(w\).

When the camera and the screen are centered on the \(z\)-axis, perspective projection results from simple triangular relations, i.e. by dividing \(x\) and \(y\) by \(\alpha+\beta\cdot z\), where \(\alpha\) and \(\beta\) are scalars depending on the distance from the camera and the screen to the origin.

Perspective relations

In the previous scheme, \(i'=i\frac{z_s-z_c}{k-z_c}\) and \(j'=j\frac{z_s-z_c}{k-z_c}\).

In the simplest case the camera is at the origin and the screen is at \(z=1\), as above. We get the simple transformation where \(\alpha=1\) and \(\beta=1\), that is, in our example, \(i'=i/k\) and \(j'=j/k\).

But there is no way a matrix multiplication can induce a division by one of the dimension. After all, matrices are a notation for linear transformations.

There is one way out though: a division by one dimension is performed during the projection back to the projective space. As such, our lead is to “inject” \(z\) into \(w\):

\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & \beta & \alpha \\ \end{pmatrix} \cdot (i, j, k, 1) = (i, j, k, \alpha+\beta k) \]

If we project the result back to the projective space, we get the 3D point \((\frac{i}{\alpha+\beta k}, \frac{j}{\alpha+\beta k}, \frac{k}{\alpha+\beta k})\). If we project the result on the plane \(z=z_s\), we obtain the perspective projection of \((i, j, k)\).

Conclusion: one more time, the division by the last coordinate is a projection technique that allows for operations in the next dimension to be properly mapped to the projective space.

Homogeneous coordinates

The above analysis issued a relation between points in the projective space and points in the next dimension.

Making use of the above reasoning, homogeneous coordinates offer an alternative notation to cartesian coordinates. We insist on the word ‘notation’ as there are no new points nor new spaces. It means that \((i,j,k)\) and \((i,j,k,1)_h\) specify the same point in 3D.

We use the \(_h\) index to avoid confusion between homogeneous coordinates and cartesian coordinates in a higher dimension.

The mapping between the two coordinate systems is central: the cartesian coordinates are obtained by dividing all dimensions by the homogeneous component if not null, and discarding this last component. This has several consequences:

Examples

Rendering pipeline

In computer graphics, homogeneous coordinates play an important role as they allow for perspective transformations in 3D.

The 3D rendering pipeline can be summarized as follows (C stands for cartesian, H stands for homogeneous):

Vertex data (3D C) → Object data (3D H) → Eye coordinates (3D H) → Clip coordinates (3D H) → Normalized device coordinates (3D C) → Window coordinates (2D C)

2D perspective correction

Let us consider the following problem: we want to rectify the perspective of a picture, e.g. we want a tilted building facade picture to appear as if it had been taken orthogonally.

We proceed with control points: we select pixels in the input picture and tell the program where we would like it to be on the transformed picture.

A \((2,2)\)-matrix will not do the trick as it is not able to handle translations for instance.

Using homogeneous coordinates, we can transform the entire plane to the orthogonal picture we would like. In 3D, this can be seen as tilting the plane to make it match the screen.

This transformation can be embodied within a \((3,3)_h\)-matrix \(M\). The answer to our problem is \(M\).

Let us consider a set of known point pairs \(\langle \text{ control point } (i,j), \text{ target point } (I,J)\rangle\), so that

\[ \begin{pmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \\ \end{pmatrix} \cdot (i, j, 1)_h = (\alpha I, \alpha J, \alpha)_h \]

We could be tempted to think we only need the first 2 rows of the matrix: this is wrong! The result of the matrix multiplication yields the homogeneous coordinates of a point, which still needs to be projected to the 2D plane to be significant. Therefore we need to compute the homogeneous component of every point.

The previous matrix operation yields the following equations:

\[ \left\{ \begin{array}{l l} m_{11}i + m_{12}j + m_{13} &= \alpha I \\ m_{21}i + m_{22}j + m_{23} &= \alpha J \\ m_{31}i + m_{32}j + m_{33} &= \alpha \\ \end{array} \right. \]

or, cancelling the scale factor:

\[ \left\{ \begin{array}{l l} \frac{m_{11}i + m_{12}j + m_{13}}{m_{31}i + m_{32}j + m_{33}} &= I \\ \frac{m_{21}i + m_{22}j + m_{23}}{m_{31}i + m_{32}j + m_{33}} &= J \\ \end{array} \right. \]

The matrix has 9 unknown coefficients, but since it is homogeneous, it is defined up to a scale factor. For instance, we have \(M=M/m_{11}\) if \(m_{11}≠0\). As such, only 8 coefficient determines a homogeneous matrix uniquely.

We can solve the system of 8 linearly independent equations constructed as above from 4 well chosen points. The above equation pair yields

\[ \left\{ \begin{array}{l l} m_{11}i + m_{12}j + m_{13} - m_{31}iI - m_{32}jI - m_{33}I &= 0 \\ m_{21}i + m_{22}j + m_{23} - m_{31}iJ - m_{32}jJ - m_{33}J &= 0 \\ \end{array} \right. \]

which can be written as a multiplication

\[ \left\{ \begin{array}{l l} (i, j, 1, 0, 0, 0, -iI, -jI, -I) \cdot m^T &= 0 \\ (0, 0, 0, i, j, 1, -iJ, -jJ, -J) \cdot m^T &= 0 \\ \end{array} \right. \]

with \(m=(m_{11}, m_{12}, m_{13}, m_{21}, m_{22}, m_{23}, m_{31}, m_{32}, m_{33})\).

We can build the following matrix from 4 point sets:

\[ S = \begin{pmatrix} i_1 & j_1 & 1 & 0 & 0 & 0 & -i_1I_1 & -j_1I_1 & -I_1 \\ 0 & 0 & 0 & i_1 & j_1 & 1 & -i_1J_1 & -j_1J_1 & -J_1 \\ ... \\ i_4 & j_4 & 1 & 0 & 0 & 0 & -i_4I_4 & -j_4I_4 & -I_4 \\ 0 & 0 & 0 & i_4 & j_4 & 1 & -i_4J_4 & -j_4J_4 & -J_4 \\ \end{pmatrix} \]

If \(det(S) ≠ 0\), then the result can be found by solving

\[ S\cdot m = 0 \]

which is equivalent to finding the eigenvalues of \(S\). An SVD will numerically find \(m\).

Final note

The present article does not claim to provide any new results when it comes to homogeneous coordinates. It only aims at explaining the theory more visually, and thus making it easier to understand and remember. It all makes perfect sense, it is not just a trick. Mathematically speaking, the key argument that I introduced above is to consider the projective space as an affine hyperplane in the next dimension. Then everything falls into place:

The central idea can be seen as a hook in the process: we momentarily promote our data to the next dimension, manipulate it in there, then project the result back to the original dimension. It generally does not make sense to exploit a result in homogeneous coordinates.