Freedman–Diaconis rule
In statistics, the Freedman–Diaconis rule can be used to select the width of the bins used in a histogram.[1] It is named after David A. Freedman and Persi Diaconis.
For a set of empirical measurements sampled from some probability distribution, the Freedman–Diaconis rule is designed to approximately minimize the integral of the squared difference between the histogram (i.e., relative frequency density) and the density of the theoretical probability distribution.
In detail, the Integrated Mean Squared Error (IMSE) is

\mathrm{IMSE} = E\left[\int_I \left(H(x) - f(x)\right)^2 \, dx\right]

where H is the histogram approximation of f on the interval I, computed with n data points sampled from the distribution f. E[\,\cdot\,] denotes the expectation across many independent draws of n data points. Under mild conditions, namely that f and its first two derivatives are L^2, Freedman and Diaconis show that the integral is minimized by choosing the bin width

h^* = \left(6 \Big/ \int f'(x)^2 \, dx\right)^{1/3} n^{-1/3}.
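The quantity being minimized can be checked by simulation. The following sketch (an illustration, not code from the paper; the standard normal target and the interval I = [-3, 3] are arbitrary choices) Monte Carlo estimates the IMSE of a histogram for a few bin widths; the estimated minimum should fall near h^*:

    import numpy as np

    rng = np.random.default_rng(0)                   # arbitrary seed
    n = 1000
    grid = np.linspace(-3.0, 3.0, 2001)              # I = [-3, 3]
    dx = grid[1] - grid[0]
    f = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)    # standard normal density

    def imse(h, draws=200):
        """Monte Carlo estimate of E[ integral_I (H(x) - f(x))^2 dx ]."""
        edges = np.arange(-3.0, 3.0 + h, h)
        total = 0.0
        for _ in range(draws):
            x = rng.normal(size=n)
            counts, _ = np.histogram(x, bins=edges)  # samples outside I are dropped
            density = counts / (n * h)               # relative frequency density H
            idx = np.clip(np.searchsorted(edges, grid, side="right") - 1,
                          0, len(counts) - 1)
            total += np.sum((density[idx] - f) ** 2) * dx
        return total / draws

    # h* from the formula above; for the standard normal, integral of f'(x)^2 is 1/(4*sqrt(pi))
    h_star = (6.0 * 4.0 * np.sqrt(np.pi)) ** (1.0 / 3.0) * n ** (-1.0 / 3.0)
    for h in (h_star / 2, h_star, 2 * h_star):
        print(f"h = {h:.3f}, estimated IMSE = {imse(h):.5f}")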
This formula was derived earlier by Scott.[2] Swapping the order of the integration and the expectation is justified by Fubini's theorem. The Freedman–Diaconis rule is derived by assuming that f is a normal distribution, making it an example of a normal reference rule. In this case

\int f'(x)^2 \, dx = \left(4\sqrt{\pi}\,\sigma^3\right)^{-1}.[3]
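Substituting this value into the expression for h^* makes the normal-reference bin width explicit (a one-line derivation, filled in here for completeness):

h^* = \left(6 \cdot 4\sqrt{\pi}\,\sigma^3\right)^{1/3} n^{-1/3} = \left(24\sqrt{\pi}\right)^{1/3} \sigma n^{-1/3} \approx 3.49\,\sigma n^{-1/3},

which coincides with Scott's rule for normally distributed data.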
Freedman and Diaconis use the interquartile range to estimate the standard deviation:[4]

\sigma \sim \frac{\mathrm{IQR}}{\Phi^{-1}(0.75) - \Phi^{-1}(0.25)}

where \Phi^{-1} is the quantile function (the inverse of the cumulative distribution function) for a normal density. This gives the rule

\text{Bin width} = 2\,\frac{\mathrm{IQR}(x)}{\sqrt[3]{n}}

where \mathrm{IQR}(x) is the interquartile range of the data x and n is the number of observations in the sample x. In fact, if the normal density is used, the factor 2 in front comes out to be approximately 2.59, but 2 is the factor recommended by Freedman and Diaconis.
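A direct implementation of the rule is short. The sketch below is an illustration rather than code from the sources; the seed and the normal test data are arbitrary choices. NumPy exposes the same rule through the string bin specifier "fd":

    import numpy as np
    from scipy.stats import norm

    def fd_bin_width(x):
        """Freedman-Diaconis bin width: 2 * IQR(x) / n^(1/3)."""
        x = np.asarray(x)
        q25, q75 = np.percentile(x, [25, 75])
        return 2.0 * (q75 - q25) / len(x) ** (1.0 / 3.0)

    rng = np.random.default_rng(42)                  # arbitrary seed
    data = rng.normal(size=10_000)

    h = fd_bin_width(data)
    n_bins = int(np.ceil((data.max() - data.min()) / h))

    # NumPy's built-in implementation of the same rule:
    counts, edges = np.histogram(data, bins="fd")

    # The exact normal-reference constant mentioned above (about 2.59):
    factor = (24.0 * np.sqrt(np.pi)) ** (1.0 / 3.0) / (norm.ppf(0.75) - norm.ppf(0.25))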
Other approaches
With the factor 2 replaced by approximately 2.59, the Freedman–Diaconis rule asymptotically matches Scott's rule for data sampled from a normal distribution.
Another approach is to use Sturges's rule: choose a bin width so that there are about 1 + \log_2 n non-empty bins. However, this approach is not recommended when the number of data points is large. For a discussion of the many alternative approaches to bin selection, see Birgé and Rozenholc.[5]
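To see why Sturges's rule is discouraged for large samples, compare the bin counts the two rules produce as the sample grows. The snippet below is an illustrative sketch; the standard normal data and seed are arbitrary choices, not taken from the sources:

    import numpy as np

    rng = np.random.default_rng(0)                   # arbitrary seed
    for n in (100, 10_000, 1_000_000):
        data = rng.normal(size=n)
        # Sturges: about 1 + log2(n) bins, growing only logarithmically in n
        sturges_bins = int(np.ceil(1 + np.log2(n)))
        # Freedman-Diaconis: bin width 2 * IQR / n^(1/3)
        q25, q75 = np.percentile(data, [25, 75])
        h = 2.0 * (q75 - q25) / n ** (1.0 / 3.0)
        fd_bins = int(np.ceil((data.max() - data.min()) / h))
        print(f"n = {n:>9,}: Sturges {sturges_bins:>3} bins, FD {fd_bins:>4} bins")

Because the Sturges count grows only logarithmically, its bins become far wider than the IMSE-minimizing width as n increases.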
Notes and References
- Freedman, David; Diaconis, Persi (December 1981). "On the histogram as a density estimator: L2 theory". Probability Theory and Related Fields. 57 (4): 453–476. doi:10.1007/BF01025868.
- Scott, D.W. (1979). "On optimal and data-based histograms". Biometrika. 66 (3): 605–610. doi:10.1093/biomet/66.3.605.
- Scott, D.W. (2009). "Sturges' rule". WIREs Computational Statistics. 1 (3): 303–306. doi:10.1002/wics.35.
- Scott, D.W. (2010). "Scott's Rule". Wiley Interdisciplinary Reviews: Computational Statistics. 2 (4): 497–502. doi:10.1002/wics.103.
- Birgé, L.; Rozenholc, Y. (2006). "How many bins should be put in a regular histogram". ESAIM: Probability and Statistics. 10: 24–45. doi:10.1051/ps:2006001.