PRESS statistic explained

In statistics, the predicted residual error sum of squares (PRESS) is a form of cross-validation used in regression analysis to provide a summary measure of the fit of a model to a sample of observations that were not themselves used to estimate the model. It is calculated as the sum of squares of the prediction residuals for those observations.^[1] ^[2] ^[3] Specifically, the PRESS statistic is an exhaustive form of cross-validation, as it tests every possible way of dividing the original data into a training set and a validation set consisting of a single observation.

Procedure

Instead of fitting only one model on all data, leave-one-out cross-validation is used to fit n models (one for each of the n observations), where each model is trained with one data point left out of the training set. The out-of-sample predicted value is calculated for the omitted observation in each case, and the PRESS statistic is calculated as the sum of the squares of all the resulting prediction errors:^[4]

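\operatorname{PRESS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_{i,-i} \right)^2

where \hat{y}_{i,-i} is the predicted value for observation i from the model fitted with observation i left out.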
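The procedure can be sketched in Python as follows. This is a minimal illustration, assuming an ordinary least-squares linear model fitted with NumPy. The function name `press_loo` and the toy data are hypothetical and not part of any cited source.

```python
import numpy as np

def press_loo(X, y):
    """PRESS computed by explicitly refitting least squares n times."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i  # training set: every row except i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        y_hat_i = X[i] @ beta     # out-of-sample prediction for row i
        total += (y[i] - y_hat_i) ** 2
    return float(total)

# Hypothetical toy data: an intercept column plus one predictor.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1])
X = np.column_stack([np.ones_like(x), x])

press = press_loo(X, y)
print(f"PRESS = {press:.4f}")
```

Refitting n times keeps the sketch close to the definition. It is not meant as an efficient implementation.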

Usage

Given this procedure, the PRESS statistic can be calculated for a number of candidate model structures for the same dataset, with the lowest values of PRESS indicating the best structures. Models that are over-parameterised (over-fitted) would tend to give small residuals for observations included in the model-fitting but large residuals for observations that are excluded. The PRESS statistic has been extensively used in lazy learning and locally linear learning to speed up the assessment and the selection of the neighbourhood size.^[5] ^[6]
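As an illustration of comparing candidate structures, the sketch below computes PRESS and the ordinary in-sample residual sum of squares (RSS) for polynomial models of increasing degree. The synthetic data, the random seed, the range of degrees and the helper name `press_and_rss` are all assumptions made for this example.

```python
import numpy as np

def press_and_rss(X, y):
    """Return (PRESS, in-sample RSS) for a least-squares fit of y on X."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i  # leave observation i out
        b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        press += float((y[i] - X[i] @ b) ** 2)
    return press, rss

# Hypothetical synthetic data: a quadratic trend plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.2, size=x.size)

results = {}
for degree in range(1, 7):
    # Design matrix with columns 1, x, x^2, ..., x^degree.
    X = np.vander(x, degree + 1, increasing=True)
    results[degree] = press_and_rss(X, y)

# The candidate with the lowest PRESS is preferred.
best_degree = min(results, key=lambda d: results[d][0])

for d, (p, r) in results.items():
    print(f"degree {d}: PRESS = {p:.3f}, RSS = {r:.3f}")
print("lowest PRESS at degree", best_degree)
```

Because the candidate models are nested, the in-sample RSS cannot increase as the degree grows, so it cannot by itself flag over-fitting. PRESS uses only out-of-sample predictions and can rise again once extra parameters stop helping.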

Notes and References

1. "Statsoft Electronic Statistics Textbook - Statistics Glossary". Archived May 10, 2016 at https://web.archive.org/web/20160510081437/http://www.statsoft.com/Textbook/Statistics-Glossary/P#PRESS%20Statistic. Retrieved May 13, 2016.
2. Allen, D. M. (1974), "The Relationship Between Variable Selection and Data Augmentation and a Method for Prediction," Technometrics, 16, 125–127.
3. Tarpey, Thaddeus (2000), "A Note on the Prediction Sum of Squares Statistic for Restricted Least Squares," The American Statistician, Vol. 54, No. 2, May, pp. 116–118.
4. "R Graphical Manual: Allen's PRESS (Prediction Sum-Of-Squares) statistic, aka P-square". Archived February 27, 2018 at https://web.archive.org/web/20180227214135/https://www.rdocumentation.org/packages/qpcR/versions/1.4-0/topics/PRESS. Retrieved February 27, 2018.
5. Atkeson, Christopher G.; Moore, Andrew W.; Schaal, Stefan (1 February 1997), "Locally Weighted Learning," Artificial Intelligence Review, 11 (1), 11–73, doi:10.1023/A:1006559212014, S2CID 9219592, ISSN 1573-7462. Archived 6 May 2021 at https://web.archive.org/web/20210506162200/https://link.springer.com/article/10.1023%2Fa%3A1006559212014. Retrieved 25 September 2020.
6. Bontempi, Gianluca; Birattari, Mauro; Bersini, Hugues (1 January 1999), "Lazy learning for local modelling and control design," International Journal of Control, 72 (7–8), 643–658, doi:10.1080/002071799220830.
