Root mean square deviation explained

The root mean square deviation (RMSD) or root mean square error (RMSE) is either of two closely related and frequently used measures of the differences between true or predicted values on the one hand and observed values or an estimator on the other. The deviation is typically a simple difference of scalars; it can also be generalized to the vector lengths of a displacement, as in the bioinformatics concept of root mean square deviation of atomic positions.

RMSD of a sample

The RMSD of a sample is the quadratic mean of the differences between the observed values and the predicted ones. These deviations are called residuals when they are computed over the data sample that was used for estimation (and therefore always refer to an estimate), and errors (or prediction errors) when they are computed out-of-sample (that is, on the full set, referencing a true value rather than an estimate). The RMSD aggregates the magnitudes of the prediction errors for the individual data points into a single measure of predictive power. Because it is scale-dependent, RMSD is a measure of accuracy for comparing the forecasting errors of different models on a particular dataset, not across datasets.[1]

RMSD is always non-negative, and a value of 0 (almost never achieved in practice) would indicate a perfect fit to the data. In general, a lower RMSD is better than a higher one. However, comparisons across different types of data would be invalid because the measure is dependent on the scale of the numbers used.

RMSD is the square root of the average of squared errors. The effect of each error on RMSD is proportional to the size of the squared error; thus larger errors have a disproportionately large effect on RMSD. Consequently, RMSD is sensitive to outliers.[2] [3]
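The disproportionate effect of a single large error can be illustrated numerically. The following is a minimal sketch using only the Python standard library; the helper name `rmsd_from_errors` and the error values are made up for illustration and do not come from any cited source.

```python
import math

def rmsd_from_errors(errors):
    """Square root of the mean of the squared errors."""
    if not errors:
        raise ValueError("at least one error is required")
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical errors: four small ones, then the same set plus one large outlier.
small_errors = [1.0, -1.0, 1.0, -1.0]
with_outlier = [1.0, -1.0, 1.0, -1.0, 10.0]

rmsd_small = rmsd_from_errors(small_errors)    # sqrt(4 / 4) = 1.0
rmsd_outlier = rmsd_from_errors(with_outlier)  # sqrt(104 / 5), about 4.56

# Share of the total squared error contributed by the single outlier: 100 / 104.
outlier_share = 10.0 ** 2 / sum(e * e for e in with_outlier)
```

In this made-up case one error out of five accounts for roughly 96% of the squared-error total, which is why it dominates the resulting RMSD.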

Formulas

Estimator

The RMSD of an estimator \hat{\theta} with respect to an estimated parameter \theta is defined as the square root of the mean squared error:

\operatorname{RMSD}(\hat{\theta})=\sqrt{\operatorname{MSE}(\hat{\theta})}=\sqrt{\operatorname{E}((\hat{\theta}-\theta)^2)}.

For an unbiased estimator, the RMSD is the square root of the variance, known as the standard deviation.
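The unbiased case follows from the standard decomposition of the mean squared error into variance and squared bias. The sketch below uses the conventional notation \operatorname{Var} and \operatorname{Bias}, which is an assumption here since the text does not define these symbols.

```latex
\operatorname{MSE}(\hat{\theta})
  = \operatorname{E}\big((\hat{\theta}-\theta)^2\big)
  = \operatorname{Var}(\hat{\theta}) + \big(\operatorname{Bias}(\hat{\theta})\big)^2,
\qquad
\operatorname{Bias}(\hat{\theta}) = \operatorname{E}(\hat{\theta}) - \theta .

% Unbiased case: Bias = 0, so the RMSD reduces to the standard deviation.
\operatorname{RMSD}(\hat{\theta}) = \sqrt{\operatorname{Var}(\hat{\theta})} .
```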

Samples

If X_1, \dots, X_n is a sample of a population with true mean value x_0, then the RMSD of the sample is

\operatorname{RMSD}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i-x_0)^2}.
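As a worked illustration of the sample formula, consider a hypothetical sample of n = 3 values with an assumed true mean; the numbers are invented for the example.

```latex
% Hypothetical sample: X_1 = 2, X_2 = 4, X_3 = 9, with assumed true mean x_0 = 5.
\operatorname{RMSD}
  = \sqrt{\frac{(2-5)^2 + (4-5)^2 + (9-5)^2}{3}}
  = \sqrt{\frac{9 + 1 + 16}{3}}
  = \sqrt{\frac{26}{3}}
  \approx 2.94
```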

The RMSD of predicted values \hat{y}_t for times t of a regression's dependent variable y_t, with variables observed over T times, is computed for T different predictions as the square root of the mean of the squares of the deviations:

\operatorname{RMSD}=\sqrt{\frac{\sum_{t=1}^{T}(y_t-\hat{y}_t)^2}{T}}.

(For regressions on cross-sectional data, the subscript t is replaced by i and T is replaced by n.)
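The regression formula translates directly into a few lines of code. The following is a minimal sketch in plain Python; the function name `rmsd` and the data values are illustrative assumptions, not part of any particular library.

```python
import math

def rmsd(observed, predicted):
    """RMSD between paired observed values and their predictions."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one pair of values is required")
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical observations and predictions for T = 4 time points.
y = [3.0, 5.0, 2.5, 7.0]
y_hat = [2.5, 5.0, 4.0, 8.0]

# Squared deviations are 0.25, 0, 2.25 and 1, so the result is sqrt(3.5 / 4).
value = rmsd(y, y_hat)
```

Because each deviation is squared, swapping the two arguments gives the same result, so the function does not need to know which series is treated as the reference.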

In some disciplines, the RMSD is used to compare differences between two things that may vary, neither of which is accepted as the "standard". For example, when measuring the average difference between two time series x_{1,t} and x_{2,t}, the formula becomes

\operatorname{RMSD}=\sqrt{\frac{\sum_{t=1}^{T}(x_{1,t}-x_{2,t})^2}{T}}.

Normalization

Normalizing the RMSD facilitates the comparison between datasets or models with different scales. Though there is no consistent means of normalization in the literature, common choices are the mean or the range (defined as the maximum value minus the minimum value) of the measured data:[4]

\operatorname{NRMSD}=\frac{\operatorname{RMSD}}{y_{\max}-y_{\min}}

or

\operatorname{NRMSD}=\frac{\operatorname{RMSD}}{\bar{y}}.

This value is commonly referred to as the normalized root mean square deviation or error (NRMSD or NRMSE) and is often expressed as a percentage, where lower values indicate less residual variance. It is also called the coefficient of variation or percent RMS. In many cases, especially for smaller samples, the sample range is likely to be affected by the sample size, which would hamper comparisons.
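The effect of normalization can be checked with a small experiment: rescaling the data changes the RMSD but leaves both normalized values unchanged. The sketch below is a plain-Python illustration; the function names `nrmsd_range` and `nrmsd_mean` and the data are hypothetical.

```python
import math

def rmsd(observed, predicted):
    """Square root of the mean squared difference of paired values."""
    squared = [(y - p) ** 2 for y, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def nrmsd_range(observed, predicted):
    """RMSD divided by the range (max minus min) of the observed data."""
    value_range = max(observed) - min(observed)
    if value_range == 0:
        raise ValueError("range of observed data is zero")
    return rmsd(observed, predicted) / value_range

def nrmsd_mean(observed, predicted):
    """RMSD divided by the mean of the observed data."""
    mean = sum(observed) / len(observed)
    if mean == 0:
        raise ValueError("mean of observed data is zero")
    return rmsd(observed, predicted) / mean

# Hypothetical measurements and predictions (RMSD = sqrt(1.5)).
observed = [10.0, 12.0, 14.0, 20.0]
predicted = [11.0, 12.0, 13.0, 18.0]

# The same data expressed in units that are 100 times smaller.
observed_scaled = [100.0 * y for y in observed]
predicted_scaled = [100.0 * p for p in predicted]

by_range = nrmsd_range(observed, predicted)                       # sqrt(1.5) / 10
by_mean = nrmsd_mean(observed, predicted)                         # sqrt(1.5) / 14
by_range_scaled = nrmsd_range(observed_scaled, predicted_scaled)  # same as by_range
```

The guards against a zero range or zero mean are a design choice of this sketch; either normalization is undefined in those cases.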

Another possible way to make the RMSD a more useful measure for comparison is to divide it by the interquartile range (IQR). Dividing the RMSD by the IQR makes the normalized value less sensitive to extreme values in the target variable.

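\operatorname{RMSDIQR}=\frac{\operatorname{RMSD}}{\operatorname{IQR}}

Here \operatorname{IQR}=Q_3-Q_1, with Q_1=\operatorname{CDF}^{-1}(0.25) and Q_3=\operatorname{CDF}^{-1}(0.75), where \operatorname{CDF}^{-1} is the quantile function.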
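In practice the quartiles have to be estimated from the sample, and several estimation conventions exist. The following sketch assumes the "inclusive" method of `statistics.quantiles` from the Python standard library (Python 3.8 or later); that choice, the helper names, and the data are assumptions made for illustration.

```python
import statistics

def interquartile_range(values, method="inclusive"):
    """Sample estimate of Q3 - Q1 using statistics.quantiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method=method)
    return q3 - q1

def rmsd_iqr(rmsd_value, observed, method="inclusive"):
    """An already computed RMSD divided by the IQR of the observed data."""
    iqr = interquartile_range(observed, method=method)
    if iqr == 0:
        raise ValueError("IQR is zero; the normalization is undefined")
    return rmsd_value / iqr

# Hypothetical target values, and the same values with one extreme observation.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
with_extreme = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 900.0]

iqr_plain = interquartile_range(observed)        # Q1 = 3, Q3 = 7
iqr_extreme = interquartile_range(with_extreme)  # quartiles unchanged

range_plain = max(observed) - min(observed)              # 8.0
range_extreme = max(with_extreme) - min(with_extreme)    # 899.0
```

In this made-up example the extreme value inflates the range by two orders of magnitude while the IQR stays the same, so a range-normalized RMSD would shrink sharply and an IQR-normalized one would not.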

When normalizing by the mean value of the measurements, the term coefficient of variation of the RMSD, CV(RMSD), may be used to avoid ambiguity.[5] This is analogous to the coefficient of variation, with the RMSD taking the place of the standard deviation.

\operatorname{CV(RMSD)}=\frac{\operatorname{RMSD}}{\bar{y}}.

Mean absolute error

Some researchers have recommended the use of the mean absolute error (MAE) instead of the root mean square deviation. MAE possesses advantages in interpretability over RMSD. MAE is the average of the absolute values of the errors. MAE is fundamentally easier to understand than the square root of the average of squared errors. Furthermore, each error influences MAE in direct proportion to the absolute value of the error, which is not the case for RMSD.
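The difference in how the two measures weight individual errors can be seen by comparing two sets of errors with the same total absolute error. The sketch below is a plain-Python illustration; the function names and error values are hypothetical.

```python
import math

def mae(errors):
    """Mean of the absolute values of the errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmsd(errors):
    """Square root of the mean of the squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Two hypothetical error sets with the same total absolute error (8.0).
uniform = [2.0, -2.0, 2.0, -2.0]  # every error has the same magnitude
uneven = [0.0, 0.0, 0.0, 8.0]     # the whole error sits in one data point

mae_uniform, rmsd_uniform = mae(uniform), rmsd(uniform)  # 2.0 and 2.0
mae_uneven, rmsd_uneven = mae(uneven), rmsd(uneven)      # 2.0 and 4.0
```

MAE is identical for both sets, whereas RMSD doubles when the error is concentrated in one point. More generally, RMSD is never smaller than MAE for the same errors, and the two coincide only when all errors have the same magnitude.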

Notes and References

  1. Hyndman, Rob J.; Koehler, Anne B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting. 22 (4): 679–688. doi:10.1016/j.ijforecast.2006.03.001. CiteSeerX 10.1.1.154.9771. S2CID 15947215.
  2. Pontius, Robert; Thontteh, Olufunmilayo; Chen, Hao (2008). Components of information for multiple resolution comparison between maps that share a real variable. Environmental Ecological Statistics. 15 (2): 111–142. doi:10.1007/s10651-007-0043-y. Bibcode:2008EnvES..15..111P. S2CID 21427573.
  3. Willmott, Cort; Matsuura, Kenji (2006). On the use of dimensioned measures of error to evaluate the performance of spatial interpolators. International Journal of Geographical Information Science. 20 (1): 89–102. doi:10.1080/13658810500286976. Bibcode:2006IJGIS..20...89W. S2CID 15407960.
  4. Web site: Coastal Inlets Research Program (CIRP) Wiki - Statistics. 4 February 2015.
  5. Web site: FAQ: What is the coefficient of variation?. 19 February 2019.
  6. Armstrong, J. Scott; Collopy, Fred (1992). Error Measures For Generalizing About Forecasting Methods: Empirical Comparisons. International Journal of Forecasting. 8 (1): 69–80. doi:10.1016/0169-2070(92)90008-w. CiteSeerX 10.1.1.423.508. S2CID 11034360.
  7. Anderson, M.P.; Woessner, W.W. (1992). Applied Groundwater Modeling: Simulation of Flow and Advective Transport (2nd ed.). Academic Press.
  8. http://www.ocgy.ubc.ca/projects/clim.pred/NN/3.1/model.html Ensemble Neural Network Model
  9. http://www.bpi.org/Web%20Download/BPI%20Standards/BPI-2400-S-2012_Standard_Practice_for_Standardized_Qualification_of_Whole-House%20Energy%20Savings_9-28-12_sg.pdf ANSI/BPI-2400-S-2012: Standard Practice for Standardized Qualification of Whole-House Energy Savings Predictions by Calibration to Energy Use History
  10. https://kalman-filter.com/root-mean-square-error/ https://kalman-filter.com/root-mean-square-error