In statistical theory, Chauvenet's criterion (named for William Chauvenet[1]) is a means of assessing whether one piece of experimental data from a set of observations is likely to be spurious – an outlier.[2]
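The idea behind Chauvenet's criterion is to find a probability band, centred on the mean of a normal distribution, that reasonably contains all $n$ samples of a data set. Any data point from the $n$ samples that lies outside this band can then be considered an outlier and removed from the data set. A new mean and standard deviation can then be calculated from the remaining values and the new sample size. Outliers are identified by finding the number of standard deviations that corresponds to the bounds of the probability band around the mean ($D_{\max}$) and comparing that value with the absolute difference between the suspected outlier and the sample mean, divided by the sample standard deviation (Eq.1):

$$D_{\max} \ge \frac{|x - \bar{x}|}{s_x} \qquad \text{(Eq.1)}$$

where $D_{\max}$ is the maximum allowable deviation, $|\cdot|$ denotes the absolute value, $x$ is the value of the suspected outlier, $\bar{x}$ is the sample mean, and $s_x$ is the sample standard deviation. A point that satisfies Eq.1 lies inside the band.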
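As a minimal sketch, the standardized deviation $|x - \bar{x}|/s_x$ can be computed for every point with Python's standard library. The helper name `standardized_deviations` is hypothetical, and the sketch assumes $s_x$ is the sample standard deviation (denominator $n - 1$), which is what `statistics.stdev` returns.

```python
from statistics import mean, stdev


def standardized_deviations(data):
    """Return |x - x_bar| / s_x for every value x in data.

    Hypothetical helper; s_x is the sample standard deviation (n - 1 denominator),
    so at least two values are required.
    """
    x_bar = mean(data)
    s_x = stdev(data)  # sample standard deviation
    return [abs(x - x_bar) / s_x for x in data]


if __name__ == "__main__":
    ratios = standardized_deviations([9, 10, 10, 10, 11, 50])
    print([round(r, 2) for r in ratios])
```

Each returned value is the quantity that is compared against $D_{\max}$.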
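To be considered as including all $n$ observations in the sample, the probability band (centred on the mean) need only account for $n - \tfrac{1}{2}$ samples; for example, if $n = 3$, only 2.5 of the samples must be accounted for in the band. Partial samples do not exist, so $n - \tfrac{1}{2}$ (2.5 for $n = 3$) is approximately $n$. Anything less than $n - \tfrac{1}{2}$ is approximately $n - 1$ (2 for $n = 3$) and is not valid, because the goal is a band that contains $n$ observations, not $n - 1$. In short, we are looking for the probability $P$ that equals $n - \tfrac{1}{2}$ out of $n$ samples (Eq.2):

$$P = \frac{n - \tfrac{1}{2}}{n} = 1 - \frac{1}{2n} \qquad \text{(Eq.2)}$$

where $P$ is the probability band centred on the sample mean and $n$ is the sample size.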
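The quantity $\tfrac{1}{2n}$ is the combined probability of the two tails of the normal distribution that fall outside the probability band $P$. Because the normal distribution is symmetric, each tail holds $\tfrac{1}{4n}$, so the standard deviation level associated with $P$ can be found by analysing only one tail (Eq.3):

$$P_z = 1 - \frac{1}{4n} \qquad \text{(Eq.3)}$$

where $P_z$ is the cumulative probability up to the upper bound of the band (one minus the probability in a single tail) and $n$ is the sample size.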
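The band probability $P = 1 - \tfrac{1}{2n}$ and the one-sided cumulative probability $P_z = 1 - \tfrac{1}{4n}$ depend only on $n$. The sketch below, with hypothetical function names, uses exact fractions to check that splitting the excluded mass $1 - P$ into two equal tails gives $P_z$.

```python
from fractions import Fraction


def band_probability(n: int) -> Fraction:
    """P = (n - 1/2) / n, the central band expected to hold all n points."""
    return (n - Fraction(1, 2)) / n


def upper_cumulative_probability(n: int) -> Fraction:
    """P_z = 1 - 1/(4n): the probability mass below the upper edge of the band."""
    return 1 - Fraction(1, 4 * n)


if __name__ == "__main__":
    for n in (3, 6, 10):
        p = band_probability(n)
        p_z = upper_cumulative_probability(n)
        # The same band written as 1 - 1/(2n).
        assert p == 1 - Fraction(1, 2 * n)
        # The excluded mass 1 - P = 1/(2n) splits into two equal tails of 1/(4n).
        assert p_z == p + (1 - p) / 2
        print(f"n={n}: P={p}, P_z={p_z} (~{float(p_z):.4f})")
```

For $n = 3$ this gives $P = 5/6$ and $P_z = 11/12$.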
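Eq.1 is analogous to the $Z$-score equation (Eq.4):

$$Z = \frac{x - \mu}{\sigma} \qquad \text{(Eq.4)}$$

where $Z$ is the $Z$-score, $x$ is the sample value, $\mu = 0$ is the mean of the standard normal distribution, and $\sigma = 1$ is the standard deviation of the standard normal distribution.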
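Based on Eq.4, to find $D_{\max}$ (Eq.1), find the $Z$-score whose cumulative probability under the standard normal distribution is $P_z$; that $Z$-score is $D_{\max}$:

$$D_{\max} = Q(P_z)$$

where $Q$ is the quantile function (the inverse of the cumulative distribution function) of the standard normal distribution. A suspect value whose standardized deviation $|x - \bar{x}|/s_x$ exceeds $D_{\max}$ violates Eq.1 and is treated as an outlier.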
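A minimal sketch of computing $D_{\max} = Q(P_z)$ with Python's `statistics.NormalDist`, whose `inv_cdf` method is the standard normal quantile function. The name `chauvenet_threshold` is hypothetical.

```python
from statistics import NormalDist


def chauvenet_threshold(n: int) -> float:
    """Return D_max = Q(P_z), where P_z = 1 - 1/(4n) and Q is the standard normal quantile."""
    if n < 2:
        raise ValueError("Chauvenet's criterion needs at least two observations")
    p_z = 1 - 1 / (4 * n)
    # NormalDist() defaults to mu=0, sigma=1; inv_cdf is its quantile function.
    return NormalDist().inv_cdf(p_z)


if __name__ == "__main__":
    for n in (3, 6, 10, 20, 50):
        print(f"n = {n:2d}   D_max = {chauvenet_threshold(n):.4f}")
```

$D_{\max}$ grows slowly with $n$, so larger samples tolerate larger deviations before a point is rejected. For $n = 10$, $P_z = 0.975$ and $D_{\max}$ is about 1.96, the familiar two-sided 95% critical value.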
To apply Chauvenet's criterion, first calculate the mean and standard deviation of the observed data. Based on how much the suspect datum differs from the mean, use the normal distribution function (or a table thereof) to determine the probability that a data point will deviate from the mean by at least as much as the suspect data point, in either direction. Multiply this probability by the number of data points taken. If the result is less than 0.5, the suspicious data point may be discarded; that is, a reading may be rejected if the probability of obtaining a deviation from the mean at least that large is less than $\tfrac{1}{2n}$.
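The procedure above can be sketched in Python as a single, one-time test of one suspect value. The function name `chauvenet_reject` is hypothetical; the sketch assumes the suspect is one of the observations, that the mean and the sample standard deviation are computed with it included, and that the data are approximately normal.

```python
from statistics import NormalDist, mean, stdev


def chauvenet_reject(data: list[float], suspect: float) -> bool:
    """Return True if `suspect` may be discarded under Chauvenet's criterion.

    The suspect must be one of the observations; the mean and the sample
    standard deviation are computed with it included.
    """
    if suspect not in data:
        raise ValueError("suspect must be one of the observations")
    n = len(data)
    x_bar = mean(data)
    s_x = stdev(data)  # sample standard deviation (n - 1 denominator)
    if s_x == 0:
        return False  # all values identical: nothing deviates from the mean
    z = abs(suspect - x_bar) / s_x
    # Probability of a deviation at least this large, in either direction.
    p_two_sided = 2 * (1 - NormalDist().cdf(z))
    # Expected number of such points among n; reject if below one half.
    return n * p_two_sided < 0.5


if __name__ == "__main__":
    readings = [9, 10, 10, 10, 11, 50]
    print(chauvenet_reject(readings, 50), chauvenet_reject(readings, 9))
```

The test $n \cdot p < 0.5$ is the same as $p < \tfrac{1}{2n}$, which in turn is equivalent to comparing the standardized deviation with $D_{\max}$ obtained from the quantile function.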
For instance, suppose a value is measured experimentally in several trials as 9, 10, 10, 10, 11, and 50, and we want to find out if 50 is an outlier.
First, we find $P_z$ for $n = 6$:

$$P_z = 1 - \frac{1}{4n} = 1 - \frac{1}{4 \cdot 6} \approx 0.9583$$
Then we find $D_{\max}$ by evaluating the quantile function at $P_z$:

$$D_{\max} = Q(P_z) \approx 1.7317$$
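Then we find the z-score of 50, using the sample mean $\bar{x} \approx 16.67$ and the sample standard deviation $s_x \approx 16.34$:

$$z = \frac{|50 - \bar{x}|}{s_x} = \frac{|50 - 16.67|}{16.34} \approx 2.04$$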
From there we see that $z > D_{\max}$, so 50 can be rejected as an outlier.
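The worked example can be reproduced step by step with Python's standard library. This is a sketch of the same arithmetic, assuming the sample (n − 1) standard deviation as in the example; the variable names are illustrative only.

```python
from statistics import NormalDist, mean, stdev

# Trial values from the worked example; 50 is the suspect.
data = [9, 10, 10, 10, 11, 50]
suspect = 50

n = len(data)
x_bar = mean(data)                  # sample mean, about 16.67
s_x = stdev(data)                   # sample standard deviation, about 16.34
p_z = 1 - 1 / (4 * n)               # one-sided cumulative probability, about 0.9583
d_max = NormalDist().inv_cdf(p_z)   # D_max = Q(P_z), about 1.7317
z = abs(suspect - x_bar) / s_x      # standardized deviation, about 2.04

print(f"x_bar={x_bar:.2f} s_x={s_x:.2f} P_z={p_z:.4f} D_max={d_max:.4f} z={z:.2f}")
print("outlier" if z > d_max else "keep")
```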
Another method for eliminating spurious data is Peirce's criterion. It was developed a few years before Chauvenet's criterion was published, and it is a more rigorous approach to the rational deletion of outlier data.[3] Other methods, such as Grubbs's test for outliers, are mentioned under the entry for Outlier.
Deletion of outlier data is a controversial practice frowned on by many scientists and science instructors; while Chauvenet's criterion provides an objective and quantitative method for data rejection, it does not make the practice more scientifically or methodologically sound, especially in small sets or where a normal distribution cannot be assumed. Rejection of outliers is more acceptable in areas of practice where the underlying model of the process being measured and the usual distribution of measurement error are confidently known.