In signal processing, multidimensional discrete convolution refers to the mathematical operation between two functions f and g on an n-dimensional lattice that produces a third function, also of n-dimensions. Multidimensional discrete convolution is the discrete analog of the multidimensional convolution of functions on Euclidean space. It is also a special case of convolution on groups when the group is the group of n-tuples of integers.
Similar to the one-dimensional case, an asterisk is used to represent the convolution operation. The number of dimensions in the given operation is reflected in the number of asterisks. For example, an M-dimensional convolution would be written with M asterisks. The following represents a M-dimensional convolution of discrete signals:
y(n1,n2,...,nM)=x(n1,n2,...,nM)*\overset{M}{ … }*h(n1,n2,...,nM)
For discrete-valued signals, this convolution can be directly computed via the following:
infty | |
\sum | |
k1=-infty |
infty | |
\sum | |
k2=-infty |
infty | |
...\sum | |
kM=-infty |
h(k1,k2,...,kM)x(n1-k1,n2-k2,...,nM-kM)
The resulting output region of support of a discrete multidimensional convolution will be determined based on the size and regions of support of the two input signals.
Listed are several properties of the two-dimensional convolution operator. Note that these can also be extended for signals of
N
Commutative Property:
x**h=h**x
Associate Property:
(x**h)**g=x**(h**g)
Distributive Property:
x**(h+g)=(x**h)+(x**g)
These properties are seen in use in the figure below. Given some input
x(n1,n2)
h(n1,n2)
g(n1,n2)
y(n1,n2)
w(n1,n2)
w=x**h
Further, that intermediate function is then convolved with the impulse response of the second filter, and thus the output can be represented by:
y=w**g=(x**h)**g
Using the associative property, this can be rewritten as follows:
y=x**(h**g)
meaning that the equivalent impulse response for a cascaded system is given by:
heq=h**g
A similar analysis can be done on a set of parallel systems illustrated below.
In this case, it is clear that:
y=(x**h)+(x**g)
Using the distributive law, it is demonstrated that:
y=x**(h+g)
This means that in the case of a parallel system, the equivalent impulse response is provided by:
heq=h+g
The equivalent impulse responses in both cascaded systems and parallel systems can be generalized to systems with
N
Convolution in one dimension was a powerful discovery that allowed the input and output of a linear shift-invariant (LSI) system (see LTI system theory) to be easily compared so long as the impulse response of the filter system was known. This notion carries over to multidimensional convolution as well, as simply knowing the impulse response of a multidimensional filter too allows for a direct comparison to be made between the input and output of a system. This is profound since several of the signals that are transferred in the digital world today are of multiple dimensions including images and videos. Similar to the one-dimensional convolution, the multidimensional convolution allows the computation of the output of an LSI system for a given input signal.
For example, consider an image that is sent over some wireless network subject to electro-optical noise. Possible noise sources include errors in channel transmission, the analog to digital converter, and the image sensor. Usually noise caused by the channel or sensor creates spatially-independent, high-frequency signal components that translates to arbitrary light and dark spots on the actual image. In order to rid the image data of the high-frequency spectral content, it can be multiplied by the frequency response of a low-pass filter, which based on the convolution theorem, is equivalent to convolving the signal in the time/spatial domain by the impulse response of the low-pass filter. Several impulse responses that do so are shown below.[1]
In addition to filtering out spectral content, the multidimensional convolution can implement edge detection and smoothing. This once again is wholly dependent on the values of the impulse response that is used to convolve with the input image. Typical impulse responses for edge detection are illustrated below.
In addition to image processing, multidimensional convolution can be implemented to enable a variety of other applications. Since filters are widespread in digital communication systems, any system that must transmit multidimensional data is assisted by filtering techniques It is used in real-time video processing, neural network analysis, digital geophysical data analysis, and much more.[2]
One typical distortion that occurs during image and video capture or transmission applications is blur that is caused by a low-pass filtering process. The introduced blur can be modeled using Gaussian low-pass filtering.
A signal is said to be separable if it can be written as the product of multiple one-dimensional signals. Mathematically, this is expressed as the following:
x(n1,n2,...,nM)=x(n1)x(n2)...x(nM)
Some readily recognizable separable signals include the unit step function, and the dirac-delta impulse function.
u(n1,n2,...,nM)=u(n1)u(n2)...u(nM)
\delta(n1,n2,...,nM)=\delta(n1)\delta(n2)...\delta(nM)
Convolution is a linear operation. It then follows that the multidimensional convolution of separable signals can be expressed as the product of many one-dimensional convolutions. For example, consider the case where x and h are both separable functions.
x(n1,n2)**h(n1,n2)=\sum
infty | |
k1=-infty |
infty | |
\sum | |
k2=-infty |
h(k1,k2)x(n1-k1,n2-k2)
By applying the properties of separability, this can then be rewritten as the following:
x(n1,n2)**h(n1,n2)=(\sum
infty | |
k1=-infty |
h(k1)x(n1-k1))(\sum
infty | |
k2=-infty |
h(k2)x(n2-k2))
It is readily seen then that this reduces to the product of one-dimensional convolutions:
x(n1,n2)**h(n1,n2)=[x(n1)*h(n1)][x(n2)*h(n2)]
This conclusion can then be extended to the convolution of two separable M-dimensional signals as follows:
x(n1,n2,...,nM)*\overset{M}{ … }*h(n1,n2,...,nM)=[x(n1)*h(n1)][x(n2)*h(n2)]...[x(nM)*h(nM)]
So, when the two signals are separable, the multidimensional convolution can be computed by computing
nM
The row-column method can be applied when one of the signals in the convolution is separable. The method exploits the properties of separability in order to achieve a method of calculating the convolution of two multidimensional signals that is more computationally efficient than direct computation of each sample (given that one of the signals are separable).[3] The following shows the mathematical reasoning behind the row-column decomposition approach (typically
h(n1,n2)
\begin{align} y(n1,n2)&=\sum
infty | |
k1=-infty |
infty | |
\sum | |
k2=-infty |
h(k1,k2)x(n1-k1,n2-k2)\\ &=\sum
infty | |
k1=-infty |
infty | |
\sum | |
k2=-infty |
h1(k1)h2(k2)x(n1-k1,n2-k2)\\ &=\sum
infty | |
k1=-infty |
h1(k1)[
infty | |
\sum | |
k2=-infty |
h2(k2)x(n1-k1,n2-k2)] \end{align}
The value of
infty | |
\sum | |
k2=-infty |
h2(k2)x(n1-k1,n2-k2)
y
n2
\begin{align} y(n1+\delta,n2)&=\sum
infty | |
k1=-infty |
h1(k1)[
infty | |
\sum | |
k2=-infty |
h2(k2)x(n1-[k1-\delta],n2-k2)]\\ &=\sum
infty | |
k1=-infty |
h1(k1+\delta)[
infty | |
\sum | |
k2=-infty |
h2(k2)x(n1-k1,n2-k2)] \end{align}
Thus, the resulting convolution can be effectively calculated by first performing the convolution operation on all of the rows of
x(n1,n2)
A processor will load in the signal data needed for the given operation. For modern processors, data will be loaded from memory into the processors cache, which has faster access times than memory. The cache itself is partitioned into lines. When a cache line is loaded from memory, multiple data operands are loaded at once. Consider the optimized case where a row of signal data can fit entirely within the processor's cache. This particular processor would be able to access the data row-wise efficiently, but not column-wise since different data operands in the same column would lie on different cache lines.[4] In order to take advantage of the way in which memory is accessed, it is more efficient to transpose the data set and then access it row-wise rather than attempt to access it column-wise. The algorithm then becomes:
h(n1,n2)
h1(n1)
h2(n2)
x(n1,n2)
h1(n1)
g(n1,n2)
g(n1,n2)
g(n1,n2)
y(n1,n2)
Examine the case where an image of size
X x Y
J x K
XYJK
XYJ
XYK
XYJ+XYK
XY(J+K)
The premise behind the circular convolution approach on multidimensional signals is to develop a relation between the Convolution theorem and the Discrete Fourier transform (DFT) that can be used to calculate the convolution between two finite-extent, discrete-valued signals.
For one-dimensional signals, the Convolution Theorem states that the Fourier transform of the convolution between two signals is equal to the product of the Fourier Transforms of those two signals. Thus, convolution in the time domain is equal to multiplication in the frequency domain. Mathematically, this principle is expressed via the following:This principle is directly extendable to dealing with signals of multiple dimensions. This property is readily extended to the usage with the Discrete Fourier transform (DFT) as follows (note that linear convolution is replaced with circular convolution where
⊗
N
y(n)=h(n) ⊗ x(n)\longleftrightarrowY(k)=H(k)X(k)
When dealing with signals of multiple dimensions:The circular convolutions here will be of size
N1,N2,...,NM
The motivation behind using the circular convolution approach is that it is based on the DFT. The premise behind circular convolution is to take the DFTs of the input signals, multiply them together, and then take the inverse DFT. Care must be taken such that a large enough DFT is used such that aliasing does not occur. The DFT is numerically computable when dealing with signals of finite-extent. One advantage this approach has is that since it requires taking the DFT and inverse DFT, it is possible to utilize efficient algorithms such as the Fast Fourier transform (FFT). Circular convolution can also be computed in the time/spatial domain and not only in the frequency domain.
Consider the following case where two finite-extent signals x and h are taken. For both signals, there is a corresponding DFT as follows:
x(n1,n2)\longleftrightarrowX(k1,k2)
h(n1,n2)\longleftrightarrowH(k1,k2)
The region of support of
x(n1,n2)
0\leqn1\leqP1-1
0\leqn2\leqP2-1
h(n1,n2)
0\leqn1\leqQ1-1
0\leqn2\leqQ2-1
The linear convolution of these two signals would be given as:Given the regions of support of
x(n1,n2)
h(n1,n2)
ylinear(n1,n2)
Based on the regions of support of the two signals, a DFT of size
N1 x N2
N1\geqmax(P1,Q1)
N2\geqmax(P2,Q2)
ycircular(n1,n2)=\sum
r1 |
\sum | |
r2 |
Q1-1 | |
[\sum | |
m1=0 |
Q2-1 | |
\sum | |
m2=0 |
h(m1,m2)x(n1-m1-r1N1,n2-m2-r2N2)]
(n1,n2)\in
R | |
N1N2 |
R | |
N1N2 |
\triangleq\{(n1,n2):0\leqn1\leqN1-1,0\leqn2\leqN2-1\}
The result will be that
ycircular(n1,n2)
ylinear(n1,n2)
ycircular(n1,n2)=\sum
r1 |
\sum | |
r2 |
ylinear(n1-r1N1,n2-r2N2){for
Then, in order to avoid aliasing between the spatially aliased replicas,
N1
N2
N1\geqP1+Q1-1
N2\geqP2+Q2-1
If these conditions are satisfied, then the results of the circular convolution will equal that of the linear convolution (taking the main period of the circular convolution as the region of support). That is:
ycircular(n1,n2)=ylinear(n1,n2)
(n1,n2)\in
R | |
N1N2 |
The Convolution theorem and circular convolution can thus be used in the following manner to achieve a result that is equal to performing the linear convolution:
N1
N2
N1\geqP1+Q1-1
N2\geqP2+Q2-1
h(n1,n2)
x(n1,n2)
N1 x N2
h(n1,n2)
x(n1,n2)
Y(k1,k2)=H(k1,k2)X(k1,k2)
Y(k1,k2)
Another method to perform multidimensional convolution is the overlap and add approach. This method helps reduce the computational complexity often associated with multidimensional convolutions due to the vast amounts of data inherent in modern-day digital systems.[6] For sake of brevity, the two-dimensional case is used as an example, but the same concepts can be extended to multiple dimensions.
Consider a two-dimensional convolution using a direct computation:
y(n1,n2)=
infty | |
\sum | |
k1=-infty |
infty | |
\sum | |
k2=-infty |
x(n1-k1,n2-k2)h(k1,k2)
Assuming that the output signal
y(n1,n2)
Instead of performing convolution on the blocks of information in their entirety, the information can be broken up into smaller blocks of dimensions
L1
L2
x(n1,n2)=
P1 | |
\sum | |
i=1 |
P2 | |
\sum | |
j=1 |
xij(n1,n2)
where
x(n1,n2)
N1
N2
P1P2
P1=N1/L1
P2=N2/L2
To produce the output signal, a two-dimensional convolution is performed:
y(n1,n2)=x(n1,n2)**h(n1,n2)
Substituting in for
x(n1,n2)
y(n1,n2)=
P1 | |
\sum | |
i=1 |
P2 | |
\sum | |
j=1 |
xij(n1,n2)**h(n1,n2)
This convolution adds more complexity than doing a direct convolution; however, since it is integrated with an FFT fast convolution, overlap-add performs faster and is a more memory-efficient method, making it practical for large sets of multidimensional data.
Let
h(n1,n2)
M1 x M2
x(n1,n2)
L1 x L2
h(n1,n2)
L1+M1-1
x
L2+M2-1
H(k1,k2)
xij(n1,n2)
L1+M1-1
x
L2+M2-1
Xij(k1,k2)
Yij(k1,k2)=Xij(k1,k2)H(k1,k2)
Yij(k1,k2)
yij(n1,n2)
y(n1,n2)
(M1-1)
x
(M2-1)
yij(n1,n2)
(M1-1)
x
(M2-1)
yi+1,j+1(n1,n2)
In order to visualize the overlap-add method more clearly, the following illustrations examine the method graphically. Assume that the input
x(n1,n2)
(N/2)
x
(N/2)
In the figure below, the first graph on the left represents the convolution corresponding to the component of the input
x0,0
h(n1,n2)
x1,0
h(n1,n2)
The same process is done for the other two inputs respectively, and they are accumulated together in order to form the convolution. This is depicted to the left.
Assume that the filter impulse response
h(n1,n2)
(N/8)
(N/2)
x
(N/8)
n1
n2
(N/2)
+
(N/8)
-
1
(5/8)N-1
in both directions. The lighter blue portion correlates to the overlap between two adjacent convolutions, whereas the darker blue portion correlates to overlap between all four convolutions. All of these overlap portions are added together in addition to the convolutions in order to form the combined convolution
y(n1,n2)
The overlap and save method, just like the overlap and add method, is also used to reduce the computational complexity associated with discrete-time convolutions. This method, coupled with the FFT, allows for massive amounts of data to be filtered through a digital system while minimizing the necessary memory space used for computations on massive arrays of data.
The overlap and save method is very similar to the overlap and add methods with a few notable exceptions. The overlap-add method involves a linear convolution of discrete-time signals, whereas the overlap-save method involves the principle of circular convolution. In addition, the overlap and save method only uses a one-time zero padding of the impulse response, while the overlap-add method involves a zero-padding for every convolution on each input component. Instead of using zero padding to prevent time-domain aliasing like its overlap-add counterpart, overlap-save simply discards all points of aliasing, and saves the previous data in one block to be copied into the convolution for the next block.
In one dimension, the performance and storage metric differences between the two methods is minimal. However, in the multidimensional convolution case, the overlap-save method is preferred over the overlap-add method in terms of speed and storage abilities.[10] Just as in the overlap and add case, the procedure invokes the two-dimensional case but can easily be extended to all multidimensional procedures.
Let
h(n1,n2)
M1 x M2
(M1-1)
(M2-1)
x(n1,n2)
L1+M1-1
x
L2+M2-1
(M1-1)
x
(M2-1)
h(n1,n2)
L1+M1-1
x
L2+M2-1
H(k1,k2)
Xij(k1,k2)
Yij(k1,k2)=Xij(k1,k2)H(k1,k2)
Yij(k1,k2)
yij(n1,n2)
(M1-1)
x
(M2-1)
yij(n1,n2)
y(n1,n2)
(L1 x L2)
yij(n1,n2)
Similar to row-column decomposition, the helix transform computes the multidimensional convolution by incorporating one-dimensional convolutional properties and operators. Instead of using the separability of signals, however, it maps the Cartesian coordinate space to a helical coordinate space allowing for a mapping from a multidimensional space to a one-dimensional space.
To understand the helix transform, it is useful to first understand how a multidimensional convolution can be broken down into a one-dimensional convolution. Assume that the two signals to be convolved are
XM
YK
Z(M
Z(i,j)=
M-1 | |
\sum | |
m=0 |
N-1 | |
\sum | |
n=0 |
X(m,n)Y(i-m,j-n)
Next, two matrices are created that zero pad each input in both dimensions such that each input has equivalent dimensions, i.e.
X'=\begin{bmatrix} X&0\\ 0&0\\ \end{bmatrix}
Y'=\begin{bmatrix} Y&0\\ 0&0\\ \end{bmatrix}
where each of the input matrices are now of dimensions
(M+K-1)
x
(N+L-1)
X''
Y''
X
Y
X''
Y''
lX''=
(M+K-1)
x
(N-1)
M
lY''=
(M+K-1)
x
(L-1)
K
The length of the convolution of these two vectors,
Z''
lZ''=
lY''+
lX''
=(M+K-1)
x
(N+L-1)
This vector length is equivalent to the dimensions of the original matrix output
Z
Z''
When working on a two-dimensional Cartesian mesh, a Fourier transform along either axes will result in the two-dimensional plane becoming a cylinder as the end of each column or row attaches to its respective top forming a cylinder. Filtering on a helix behaves in a similar fashion, except in this case, the bottom of each column attaches to the top of the next column, resulting in a helical mesh. This is illustrated below. The darkened tiles represent the filter coefficients.
If this helical structure is then sliced and unwound into a one-dimensional strip, the same filter coefficients on the 2-d Cartesian plane will match up with the same input data, resulting in an equivalent filtering scheme. This ensures that a two-dimensional convolution will be able to be performed by a one-dimensional convolution operator as the 2D filter has been unwound to a 1D filter with gaps of zeroes separating the filter coefficients.Assuming that some-low pass two-dimensional filter was used, such as:
0 | -1 | 0 | |
-1 | 4 | -1 | |
0 | -1 | 0 |
Then, once the two-dimensional space was converted into a helix, the one-dimensional filter would look as follows:
h(n)=-1,0,...,0,-1,4,-1,0,...,0,-1,0,...
Notice in the one-dimensional filter that there are no leading zeroes as illustrated in the one-dimensional filtering strip after being unwound. The entire one-dimensional strip could have been convolved with; however, it is less computationally expensive to simply ignore the leading zeroes. In addition, none of these backside zero values will need to be stored in memory, preserving precious memory resources.[12]
Helix transformations to implement recursive filters via convolution are used in various areas of signal processing. Although frequency domain Fourier analysis is effective when systems are stationary, with constant coefficients and periodically-sampled data, it becomes more difficult in unstable systems. The helix transform enables three-dimensional post-stack migration processes that can process data for three-dimensional variations in velocity. In addition, it can be applied to assist with the problem of implicit three-dimensional wavefield extrapolation.[13] Other applications include helpful algorithms in seismic data regularization, prediction error filters, and noise attenuation in geophysical digital systems.
One application of multidimensional convolution that is used within signal and image processing is Gaussian convolution. This refers to convolving an input signal with the Gaussian distribution function.
The Gaussian distribution sampled at discrete values in one dimension is given by the following (assuming
\mu=0
\sigma
\mu1=\mu2=...=\muM=0
y(n)=x(n)*G(n)
y(n1,n2,...,nM)=x(n1,n2,...,nM)*...*G(n1,n2,...,nM)
Gaussian convolution can be effectively approximated via implementation of a Finite impulse response (FIR) filter. The filter will be designed with truncated versions of the Gaussian. For a two-dimensional filter, the transfer function of such a filter would be defined as the following:[14]
H(z1,z
|
r1 | |
\sum | |
n1=-r1 |
r2 | |
\sum | |
n2=-r2 |
G(n1,n2){z
-n1 | |
1} |
-n2 | |
{z | |
2} |
where
s(r1,r2)=\sum
r1 | |
n1=-r1 |
r2 | |
\sum | |
n2=-r2 |
G(n1,n2)
Choosing lower values for
r1
r2
Another method for approximating Gaussian convolution is via recursive passes through a box filter. For approximating one-dimensional convolution, this filter is defined as the following:
H(z)= | 1 |
2r+1 |
zr-z-r-1 | |
1-z-1 |
Typically, recursive passes 3, 4, or 5 times are performed in order to obtain an accurate approximation. A suggested method for computing r is then given as the following:[15]
| ||||
\sigma |
K((2r+1)2-1)
Then, since the Gaussian distribution is separable across different dimensions, it follows that recursive passes through one-dimensional filters (isolating each dimension separately) will thus yield an approximation of the multidimensional Gaussian convolution. That is, M-dimensional Gaussian convolution could be approximated via recursive passes through the following one-dimensional filters:
H(z | ||||
|
{z1 | ||||
|
-r1-1 | |
-{z | |
1} |
H(z | ||||
|
{z2 | ||||
|
-r2-1 | |
-{z | |
2} |
\vdots
H(z | ||||
|
{zM | ||||
|
-rM-1 | |
-{z | |
M} |
Gaussian convolutions are used extensively in signal and image processing. For example, image-blurring can be accomplished with Gaussian convolution where the
\sigma