Chessboards arise frequently in computer vision theory and practice because their highly structured geometry is well-suited for algorithmic detection and processing. The appearance of chessboards in computer vision can be divided into two main areas: camera calibration and feature extraction. This article provides a unified discussion of the role that chessboards play in the canonical methods from these two areas, including references to the seminal literature, examples, and pointers to software implementations.
A classical problem in computer vision is three-dimensional (3D) reconstruction, where one seeks to infer 3D structure about a scene from two-dimensional (2D) images of it.[1] Practical cameras are complex devices, and photogrammetry is needed to model the relationship between image sensor measurements and the 3D world. In the standard pinhole camera model, one models the relationship between world coordinates X and image (pixel) coordinates x via the perspective transformation
x = K\begin{bmatrix}R & t\end{bmatrix}X, \quad x \in \mathbb{P}^2, \; X \in \mathbb{P}^3,
where \mathbb{P}^n denotes projective space of dimension n, K is the 3 \times 3 matrix of intrinsic camera parameters, and R and t are the rotation and translation relating world coordinates to camera coordinates.
In this setting, camera calibration is the process of estimating the parameters of the 3 \times 4 projection matrix M = K\begin{bmatrix}R & t\end{bmatrix} of the perspective camera model.
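For concreteness, the intrinsic matrix K is commonly parametrized (under the usual pinhole assumptions) as
K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix},
where f_x and f_y are the focal lengths in pixel units, s is the skew coefficient, and (c_x, c_y) is the principal point.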
Direct linear transformation (DLT) calibration uses correspondences between world points and camera image points to estimate camera parameters. In particular, DLT calibration exploits the fact that the perspective pinhole camera model defines a set of similarity relations that can be solved via the direct linear transformation algorithm.[3] To employ this approach, one requires accurate coordinates of a non-degenerate set of points in 3D space. A common way to achieve this is to construct a camera calibration rig (example below) built from three mutually perpendicular chessboards. Since the corners of each square are equidistant, it is straightforward to compute the 3D coordinates of each corner given the width of each square. The advantage of DLT calibration is its simplicity; arbitrary cameras can be calibrated by solving a single homogeneous linear system. However, the practical use of DLT calibration is limited by the necessity of a 3D calibration rig and the fact that extremely accurate 3D coordinates are required to avoid numerical instability.[1]
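As an illustration, the following MATLAB sketch assembles and solves the homogeneous linear system underlying DLT calibration. The variable names are hypothetical: X is an n-by-3 matrix of known world coordinates (e.g., chessboard corners on the calibration rig) and x is the corresponding n-by-2 matrix of measured image coordinates, with n >= 6 non-degenerate correspondences. It is a minimal sketch; practical implementations additionally normalize the coordinates for numerical stability.
function M = dlt_calibrate(X, x)
% DLT_CALIBRATE  Estimate the 3-by-4 camera matrix M from 3D-2D correspondences.
%   X : n-by-3 world points (n >= 6, non-degenerate configuration)
%   x : n-by-2 image points
n = size(X, 1);
A = zeros(2 * n, 12);
for i = 1:n
    Xh = [X(i, :) 1];                                 % homogeneous world point
    A(2*i - 1, :) = [Xh, zeros(1, 4), -x(i, 1) * Xh]; % u-equation
    A(2*i,     :) = [zeros(1, 4), Xh, -x(i, 2) * Xh]; % v-equation
end
[~, ~, V] = svd(A);                                   % homogeneous least squares: A*m = 0
M = reshape(V(:, end), 4, 3)';                        % rows of M are stacked in the null vector
end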
Multiplane calibration is a variant of camera auto-calibration that allows one to compute the parameters of a camera from two or more views of a planar surface. The seminal work in multiplane calibration is due to Zhang.[4] Zhang's method calibrates cameras by solving a particular homogeneous linear system that captures the homographic relationships between multiple perspective views of the same plane. This multiview approach is popular because, in practice, it is more natural to capture multiple views of a single planar surface - like a chessboard - than to construct a precise 3D calibration rig, as required by DLT calibration. The following figures demonstrate a practical application of multiplane camera calibration from multiple views of a chessboard.[5]
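For reference, the sketch below shows how multiplane chessboard calibration can be carried out with MATLAB's Computer Vision Toolbox. The image file names and the square size are placeholders that must be replaced with the actual views and board dimensions.
% Chessboard images of the same planar target from different viewpoints
% (file names are placeholders).
imageFiles = {'view1.jpg', 'view2.jpg', 'view3.jpg'};
% Detect chessboard corners in every view.
[imagePoints, boardSize] = detectCheckerboardPoints(imageFiles);
% Planar world coordinates of the corners (square size in millimeters, assumed here).
squareSize = 30;
worldPoints = generateCheckerboardPoints(boardSize, squareSize);
% Estimate intrinsic and extrinsic camera parameters from the multiple views.
cameraParams = estimateCameraParameters(imagePoints, worldPoints);
% Visualize reprojection errors as a sanity check.
figure; showReprojectionErrors(cameraParams);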
The second context in which chessboards arise in computer vision is to demonstrate several canonical feature extraction algorithms. In feature extraction, one seeks to identify image interest points, which summarize the semantic content of an image and, hence, offer a reduced dimensionality representation of one's data.[2] Chessboards - in particular - are often used to demonstrate feature extraction algorithms because their regular geometry naturally exhibits local image features like edges, lines, and corners. The following sections demonstrate the application of common feature extraction algorithms to a chessboard image.
Corners are a natural local image feature exploited in many computer vision systems. Loosely speaking, one can define a corner as the intersection of two edges, and a variety of corner detection algorithms formalize this notion in different ways. Corners are a useful image feature because they are necessarily distinct from their neighboring pixels. The Harris corner detector is a standard algorithm for corner detection in computer vision.[6] The algorithm works by analyzing the eigenvalues of the 2D discrete structure tensor matrix at each image pixel and flagging a pixel as a corner when both eigenvalues of its structure tensor are sufficiently large. Intuitively, the eigenvalues of the structure tensor matrix associated with a given pixel describe the gradient strength in a neighborhood of that pixel. As such, a structure tensor matrix with large eigenvalues corresponds to an image neighborhood with large gradients in orthogonal directions - i.e., a corner.
A chessboard contains natural corners at the boundaries between board squares, so one would expect corner detection algorithms to successfully detect them in practice. Indeed, the following figure demonstrates Harris corner detection applied to a perspective-transformed chessboard image. Clearly, the Harris detector is able to accurately detect the corners of the board.
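A comparable result can be reproduced with the corner function of the Image Processing Toolbox, as in the minimal sketch below; I is assumed to be a grayscale chessboard image (the same variable used in the Hough transform code further below).
% Detect Harris corners in the grayscale chessboard image I.
C = corner(I, 'Harris');
% Overlay the detected corners on the image.
figure; imshow(I); hold on;
plot(C(:, 1), C(:, 2), 'r*');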
Straight lines are another natural local image feature exploited in many computer vision systems. Geometrically, the set of all lines in a 2D image can be parametrized by polar coordinates (\rho, \theta) describing the distance and angle, respectively, of their normal vectors with respect to the image origin. The discrete Hough transform exploits this parametrization: each edge pixel (i, j) of an image casts votes for the discretized parameter pairs (\rho_i, \theta_j) of all lines passing through it, and prominent lines are then detected as local maxima of the resulting accumulator array.
The grid structure of a chessboard naturally defines two sets of parallel lines in an image of it. Therefore, one expects that line detection algorithms should successfully detect these lines in practice. Indeed, the following figure demonstrates Hough transform-based line detection applied to a perspective-transformed chessboard image. Clearly, the Hough transform is able to accurately detect the lines induced by the board squares.
The following MATLAB code generates the above images using the Image Processing Toolbox:
% Load chessboard image and convert to grayscale
% (the file name is a placeholder; substitute your own image)
I = im2gray(imread('chessboard.png'));

% Compute edge image
BW = edge(I, 'canny');

% Compute Hough transform
[H, theta, rho] = hough(BW);

% Find local maxima of Hough transform
numpeaks = 19;
thresh = ceil(0.1 * max(H(:)));
P = houghpeaks(H, numpeaks, 'threshold', thresh);

% Extract image lines
lines = houghlines(BW, theta, rho, P, 'FillGap', 50, 'MinLength', 60);

% --------------------------------------------------------------------------
% Display results
% --------------------------------------------------------------------------
% Original image
figure; imshow(I);

% Edge image
figure; imshow(BW);

% Hough transform
figure; image(theta, rho, imadjust(mat2gray(H)), 'CDataMapping', 'scaled');
hold on; colormap(gray(256));
plot(theta(P(:, 2)), rho(P(:, 1)), 'o', 'color', 'r');

% Detected lines
figure; imshow(I); hold on;
n = size(I, 2);
for k = 1:length(lines)
    % Overlay kth line
    x = [lines(k).point1(1) lines(k).point2(1)];
    y = [lines(k).point1(2) lines(k).point2(2)];
    line = @(z) ((y(2) - y(1)) / (x(2) - x(1))) * (z - x(1)) + y(1);
    plot([1 n], line([1 n]), 'Color', 'r');
end
The main limitation of chessboard patterns for geometric camera calibration is that, due to their highly repetitive structure, the pattern must be completely visible in the camera image. This assumption may be violated, for example, when specular reflections caused by inhomogeneous lighting make chessboard detection fail at some of the corners. The requirement of a completely visible chessboard target also hampers the measurement of camera distortions close to the corners of the image.
To address this issue, chessboard targets can be combined with a position encoding. One popular approach is to place ArUco markers[10] inside the light chessboard squares. The main advantage of such ChArUco targets[11] is that all light chessboard squares are uniquely coded and identifiable. This also enables single-image multiplane calibration by placing multiple targets with different ArUco markers in one scene.
An alternative way of adding position encoding to chessboard patterns is the PuzzleBoard pattern:[12] each chessboard edge carries one bit of information, such that local parts of the pattern form a unique bit pattern. In comparison to ChArUco patterns, the position encoding can be read at much lower resolutions.
The following links are pointers to popular implementations of chessboard-related computer vision algorithms.