In statistics, Deming regression, named after W. Edwards Deming, is an errors-in-variables model that seeks to find the line of best fit for a two-dimensional data set. It differs from simple linear regression in that it accounts for errors in observations on both the x- and the y-axis. It is a special case of total least squares, which allows for any number of predictors and a more complicated error structure.
Deming regression is equivalent to the maximum likelihood estimation of an errors-in-variables model in which the errors for the two variables are assumed to be independent and normally distributed, and the ratio of their variances, denoted δ, is known. In practice, this ratio might be estimated from related data sources; however, the regression procedure takes no account of possible errors in estimating this ratio.
The Deming regression is only slightly more difficult to compute than the simple linear regression. Most statistical software packages used in clinical chemistry offer Deming regression.
The model was originally introduced by Adcock (1878), who considered the case δ = 1, and then more generally by Kummell (1879) with arbitrary δ. However, their ideas remained largely unnoticed for more than 50 years, until they were revived by Koopmans (1936) and later propagated even more by Deming (1943). The latter book became so popular in clinical chemistry and related fields that the method was even dubbed Deming regression in those fields.
Assume that the available data (y_i, x_i) are measured observations of the "true" values (y_i^*, x_i^*), which lie on the regression line:

\begin{align}
y_i &= y_i^* + \varepsilon_i, \\
x_i &= x_i^* + \eta_i,
\end{align}

where the errors ε and η are independent and the ratio of their variances is assumed to be known:

\delta = \frac{\sigma_\varepsilon^2}{\sigma_\eta^2}.
In practice, the variances of the x and y parameters are often unknown, which complicates the estimate of δ. Note that when the measurement method for x and y is the same, these variances are likely to be equal, so δ = 1 for this case.
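To make the model concrete, here is a minimal simulation sketch in Python with NumPy; the variable names and parameter values are invented for illustration, and δ is chosen rather than estimated:

import numpy as np

rng = np.random.default_rng(0)

# True (unobserved) points lying exactly on the line y* = beta0 + beta1 * x*
beta0, beta1 = 1.0, 2.0
x_star = rng.uniform(0, 10, size=200)
y_star = beta0 + beta1 * x_star

# Independent normal measurement errors on both coordinates;
# delta is the (assumed known) ratio var(epsilon) / var(eta).
sigma_eta = 0.5                          # sd of the x-errors
delta = 1.0                              # delta = 1 gives orthogonal regression
sigma_eps = np.sqrt(delta) * sigma_eta   # sd of the y-errors

# Observed data: true values plus noise on both axes.
x = x_star + rng.normal(0, sigma_eta, size=x_star.shape)
y = y_star + rng.normal(0, sigma_eps, size=y_star.shape)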
We seek to find the line of "best fit"

y^* = \beta_0 + \beta_1 x^*,

such that the weighted sum of squared residuals of the model is minimized:

SSR = \sum_{i=1}^n \left( \frac{\varepsilon_i^2}{\sigma_\varepsilon^2} + \frac{\eta_i^2}{\sigma_\eta^2} \right)
    = \frac{1}{\sigma_\varepsilon^2} \sum_{i=1}^n \left( (y_i - \beta_0 - \beta_1 x_i^*)^2 + \delta (x_i - x_i^*)^2 \right)
    \;\to\; \min_{\beta_0,\, \beta_1,\, x_1^*, \ldots,\, x_n^*} SSR.
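The central step of that derivation can be sketched briefly: for fixed β_0 and β_1, each latent value x_i^* enters SSR only through a one-dimensional quadratic, so it can be minimized out in closed form. Writing r_i = y_i - \beta_0 - \beta_1 x_i, the minimizing value is

x_i^* = x_i + \frac{\beta_1}{\beta_1^2 + \delta}\, r_i,

and substituting it back gives the profiled objective

SSR(\beta_0, \beta_1) = \frac{\delta}{\sigma_\varepsilon^2} \sum_{i=1}^n \frac{(y_i - \beta_0 - \beta_1 x_i)^2}{\beta_1^2 + \delta},

which depends on β_0 and β_1 alone and leads directly to the closed-form estimates given below.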
See Jensen (2007) for a full derivation.
The solution can be expressed in terms of the second-degree sample moments. That is, we first calculate the following quantities (all sums go from i = 1 to n):
\begin{align}
\overline{x} &= \tfrac{1}{n}\sum x_i, & \overline{y} &= \tfrac{1}{n}\sum y_i, \\
s_{xx} &= \tfrac{1}{n}\sum (x_i - \overline{x})^2 &&= \overline{x^2} - \overline{x}^2, \\
s_{xy} &= \tfrac{1}{n}\sum (x_i - \overline{x})(y_i - \overline{y}) &&= \overline{xy} - \overline{x}\,\overline{y}, \\
s_{yy} &= \tfrac{1}{n}\sum (y_i - \overline{y})^2 &&= \overline{y^2} - \overline{y}^2.
\end{align}
Finally, the least-squares estimates of the model's parameters will be

\begin{align}
\hat\beta_1 &= \frac{s_{yy} - \delta s_{xx} + \sqrt{(s_{yy} - \delta s_{xx})^2 + 4\delta s_{xy}^2}}{2 s_{xy}}, \\
\hat\beta_0 &= \overline{y} - \hat\beta_1 \overline{x}, \\
\hat{x}_i^* &= x_i + \frac{\hat\beta_1}{\hat\beta_1^2 + \delta}\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right).
\end{align}
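As an illustration, these formulas translate directly into a few lines of NumPy. The sketch below is not a reference implementation: the function name deming is invented for this example, and it assumes s_xy ≠ 0.

import numpy as np

def deming(x, y, delta=1.0):
    """Closed-form Deming regression estimates.

    delta is the assumed known ratio var(epsilon)/var(eta) of the
    y-error variance to the x-error variance; delta = 1 gives
    orthogonal regression. Assumes s_xy != 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()

    # Second-degree sample moments from the formulas above.
    s_xx = np.mean((x - xbar) ** 2)
    s_yy = np.mean((y - ybar) ** 2)
    s_xy = np.mean((x - xbar) * (y - ybar))

    # Slope and intercept estimates.
    beta1 = (s_yy - delta * s_xx
             + np.sqrt((s_yy - delta * s_xx) ** 2 + 4 * delta * s_xy ** 2)
             ) / (2 * s_xy)
    beta0 = ybar - beta1 * xbar

    # Fitted "true" x-values x_i*.
    x_star = x + beta1 / (beta1 ** 2 + delta) * (y - beta0 - beta1 * x)
    return beta0, beta1, x_star

Applied to the simulated data from the earlier sketch, deming(x, y, delta=1.0) should return estimates close to the true values β_0 = 1 and β_1 = 2.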
For the case of equal error variances, i.e., when δ = 1, Deming regression becomes orthogonal regression: it minimizes the sum of squared perpendicular distances from the data points to the regression line. In this case, denote each observation as a point in the complex plane, i.e., the observation (x_j, y_j) is written as the complex number z_j = x_j + iy_j, where i is the imaginary unit. Denote as

S = \sum (z_j - \overline{z})^2

the sum of the squared differences of the data points from the centroid \overline{z} = \tfrac{1}{n}\sum z_j (also denoted in complex coordinates), which is the point whose horizontal and vertical locations are the averages of those of the data points. Then:

If S = 0, then every line through the centroid is a line of best orthogonal fit.
If S ≠ 0, the orthogonal regression line goes through the centroid and is parallel to the vector from the origin to \sqrt{S}.
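This complex-arithmetic characterization is straightforward to compute. The following sketch (NumPy again, with an invented function name) finds the centroid and a unit direction vector of the orthogonal regression line via √S:

import numpy as np

def orthogonal_fit_direction(x, y):
    """Orthogonal (delta = 1) regression line via complex arithmetic.

    Returns the centroid and a unit direction vector of the line.
    """
    z = np.asarray(x, dtype=float) + 1j * np.asarray(y, dtype=float)
    z_bar = z.mean()                      # centroid in complex coordinates
    S = np.sum((z - z_bar) ** 2)          # sum of squared (complex) deviations
    if S == 0:
        # Every line through the centroid fits equally well;
        # pick the horizontal direction by convention.
        direction = 1.0 + 0.0j
    else:
        direction = np.sqrt(S)            # line is parallel to sqrt(S)
        direction /= abs(direction)       # normalize to unit length
    return z_bar, direction

When direction.real ≠ 0, the slope of the fitted line is direction.imag / direction.real, which can be checked against the general closed-form estimate at δ = 1.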
A trigonometric representation of the orthogonal regression line was given by Coolidge in 1913.
In the case of three non-collinear points in the plane, the triangle with these points as its vertices has a unique Steiner inellipse that is tangent to the triangle's sides at their midpoints. The major axis of this ellipse falls on the orthogonal regression line for the three vertices.

A biological cell's intrinsic cellular noise can be quantified by applying Deming regression to the observed behavior of a two-reporter synthetic biological circuit.
When humans are asked to draw a linear regression on a scatterplot by guessing, their answers are closer to orthogonal regression than to ordinary least squares regression.[1]
The York regression extends Deming regression by allowing correlated errors in x and y.[2]