In probability theory, Hoeffding's inequality provides an upper bound on the probability that the sum of bounded independent random variables deviates from its expected value by more than a certain amount. It was proven by Wassily Hoeffding in 1963.
Hoeffding's inequality is a special case of the Azuma–Hoeffding inequality and McDiarmid's inequality. It is similar to the Chernoff bound, but tends to be less sharp, in particular when the variance of the random variables is small. It is similar to, but incomparable with, one of Bernstein's inequalities.
Let X_1, \ldots, X_n be independent random variables such that a_i \leq X_i \leq b_i almost surely. Consider the sum of these random variables,

S_n = X_1 + \cdots + X_n.

Then Hoeffding's theorem states that, for all t > 0,
\begin{align}
\operatorname{P}\left(S_n - \operatorname{E}\left[S_n\right] \geq t\right) &\leq \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right) \\
\operatorname{P}\left(\left|S_n - \operatorname{E}\left[S_n\right]\right| \geq t\right) &\leq 2\exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)
\end{align}
Here \operatorname{E}\left[S_n\right] is the expected value of S_n.
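The bound is easy to evaluate numerically. The following is a minimal sketch (the function name and the example values are illustrative, not from Hoeffding's paper) that computes the one-sided and two-sided bounds for given intervals [a_i, b_i]:

```python
import math

def hoeffding_bound(a, b, t):
    """Upper bounds on P(S_n - E[S_n] >= t) and P(|S_n - E[S_n]| >= t)
    for independent X_i with a[i] <= X_i <= b[i]."""
    denom = sum((bi - ai) ** 2 for ai, bi in zip(a, b))
    one_sided = math.exp(-2 * t ** 2 / denom)
    return one_sided, 2 * one_sided

# Example: 100 variables bounded in [0, 1], deviation t = 10.
print(hoeffding_bound([0.0] * 100, [1.0] * 100, 10.0))
# (0.1353..., 0.2706...), i.e. exp(-2) and 2 exp(-2)
```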
Note that the inequalities also hold when the X_i have been obtained using sampling without replacement; in this case the random variables are not independent anymore. A proof of this statement can be found in Hoeffding's paper. For slightly better bounds in the case of sampling without replacement, see for instance the paper by Serfling (1974).
An equivalent zero-mean formulation is the following. Let Y_1, \ldots, Y_n be independent observations such that \operatorname{E}(Y_i) = 0 and a_i \leq Y_i \leq b_i. Then, for any \epsilon > 0,[1]

\operatorname{P}\left( \sum_{i=1}^n Y_i \geq \epsilon \right) \leq \exp\left( -\frac{2\epsilon^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).
Suppose a_i = 0 and b_i = 1 for all i (for example, when the X_i are independent Bernoulli random variables). Then, for all t \geq 0,

\begin{align}
\operatorname{P}\left(S_n - \operatorname{E}\left[S_n\right] \geq t\right) &\leq \exp(-2t^2/n) \\
\operatorname{P}\left(\left|S_n - \operatorname{E}\left[S_n\right]\right| \geq t\right) &\leq 2\exp(-2t^2/n)
\end{align}
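This special case is simple to verify by simulation. A minimal Monte Carlo sketch (the sample sizes and parameters are arbitrary choices, not from the source) compares the empirical two-sided tail with the bound 2\exp(-2t^2/n):

```python
import math
import random

n, p, t, trials = 200, 0.5, 15.0, 100_000

# Empirical estimate of P(|S_n - E[S_n]| >= t) for Bernoulli(p) summands.
hits = 0
for _ in range(trials):
    s_n = sum(random.random() < p for _ in range(n))
    if abs(s_n - p * n) >= t:
        hits += 1

print("empirical:", hits / trials)                  # ~0.04
print("Hoeffding:", 2 * math.exp(-2 * t ** 2 / n))  # ~0.21, a valid upper bound
```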
Hoeffding's inequality can be extended to random variables that are only bounded from above.
Let X_1, \ldots, X_n be independent random variables such that \operatorname{E} X_i = 0 and X_i \leq b_i almost surely. Denote

C_i^2 = \begin{cases}
\operatorname{E} X_i^2, & \text{if } \operatorname{E} X_i^2 \geq b_i^2, \\
\dfrac{1}{4}\left(b_i + \dfrac{\operatorname{E} X_i^2}{b_i}\right)^2, & \text{otherwise}.
\end{cases}
Hoeffding's inequality for random variables bounded from above states that, for all t \geq 0,
\operatorname{P}\left(\left|\sum_{i=1}^n X_i\right| \geq t\right) \leq 2\exp\left(-\frac{t^2}{2\sum_{i=1}^n C_i^2}\right).
In particular, if \operatorname{E} X_i^2 \geq b_i^2 for all i, then for all t \geq 0,

\operatorname{P}\left(\left|\sum_{i=1}^n X_i\right| \geq t\right) \leq 2\exp\left(-\frac{t^2}{2\sum_{i=1}^n \operatorname{E} X_i^2}\right).
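Given the second moments \operatorname{E} X_i^2 and the upper bounds b_i, the quantities C_i^2 and the resulting tail bound can be computed directly; the sketch below (the function name is illustrative) implements the piecewise definition above:

```python
import math

def bounded_above_tail(second_moments, b, t):
    """Bound on P(|sum X_i| >= t) for zero-mean X_i <= b[i], using
    C_i^2 = E X_i^2 if E X_i^2 >= b_i^2, else (b_i + E X_i^2 / b_i)^2 / 4."""
    c2 = [m if m >= bi ** 2 else 0.25 * (bi + m / bi) ** 2
          for m, bi in zip(second_moments, b)]
    return 2 * math.exp(-t ** 2 / (2 * sum(c2)))

# Example: 50 variables with E X_i^2 = 2 and b_i = 1, so the first
# branch applies and the bound equals 2 exp(-t^2 / 200).
print(bounded_above_tail([2.0] * 50, [1.0] * 50, 20.0))  # 2 exp(-2) ≈ 0.27
```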
The proof of Hoeffding's inequality can be generalized to any sub-Gaussian distribution. Recall that a random variable X is called sub-Gaussian if

\operatorname{P}(|X| \geq t) \leq 2e^{-ct^2}

for some c > 0. Every bounded random variable is sub-Gaussian: if |X| \leq T almost surely, then \operatorname{P}(|X| \geq t) = 0 \leq 2e^{-ct^2} for t > T, while for t \leq T, taking c = \log(2)/T^2 gives 2e^{-cT^2} = 1 and hence \operatorname{P}(|X| \geq t) \leq 1 = 2e^{-cT^2} \leq 2e^{-ct^2}.
For a random variable X, the following norm is finite if and only if X is sub-Gaussian:

\Vert X \Vert_{\psi_2} := \inf\left\{ c \geq 0 : \operatorname{E}\left( e^{X^2/c^2} \right) \leq 2 \right\}.
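When the expectation can be simulated, the infimum defining the \psi_2 norm can be located numerically, since \operatorname{E}(e^{X^2/c^2}) is decreasing in c. A bisection sketch (the bracketing interval, iteration count, and sample size are arbitrary choices):

```python
import math
import random

def psi2_norm(samples, lo=0.5, hi=10.0, iters=50):
    """Estimate ||X||_{psi_2} = inf{c >= 0 : E exp(X^2/c^2) <= 2} by bisection,
    with the expectation replaced by a sample mean; [lo, hi] must bracket it."""
    def small_enough(c):
        # Sample mean is a noisy, heavy-tailed estimate near the threshold.
        mean = sum(math.exp(x * x / (c * c)) for x in samples) / len(samples)
        return mean <= 2

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if small_enough(mid):
            hi = mid  # c is feasible: the infimum lies at or below mid
        else:
            lo = mid
    return hi

# Example: standard normal samples; the exact psi_2 norm of N(0,1) is sqrt(8/3).
xs = [random.gauss(0.0, 1.0) for _ in range(10_000)]
print(psi2_norm(xs), math.sqrt(8.0 / 3.0))  # both ≈ 1.63
```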
Then let X_1, \ldots, X_n be zero-mean independent sub-Gaussian random variables; the general version of Hoeffding's inequality states that

\operatorname{P}\left(\left|\sum_{i=1}^n X_i\right| \geq t\right) \leq 2\exp\left(-\frac{ct^2}{\sum_{i=1}^n \Vert X_i \Vert_{\psi_2}^2}\right),
where c > 0 is an absolute constant.
The proof of Hoeffding's inequality is similar to that of other concentration inequalities such as Chernoff bounds; the main difference is the use of Hoeffding's lemma:
Suppose X is a real random variable such that X \in \left[a, b\right] almost surely. Then, for all s \in \mathbb{R},

\operatorname{E}\left[ e^{s\left(X - \operatorname{E}\left[X\right]\right)} \right] \leq \exp\left(\tfrac{1}{8} s^2 (b - a)^2\right).
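The lemma is easy to sanity-check numerically for a concrete distribution; below, X is Bernoulli(q) on [a, b] = [0, 1] (the values of q and s are arbitrary choices for illustration):

```python
import math

# Check E[exp(s(X - E[X]))] <= exp(s^2 (b - a)^2 / 8) for X ~ Bernoulli(q),
# where [a, b] = [0, 1] and E[X] = q.
q = 0.3
for s in (0.5, 1.0, 2.0, 5.0):
    mgf = (1 - q) * math.exp(s * (0 - q)) + q * math.exp(s * (1 - q))
    bound = math.exp(s ** 2 / 8)  # (b - a)^2 = 1
    print(f"s = {s}: {mgf:.4f} <= {bound:.4f}", mgf <= bound)
```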
Using this lemma, we can prove Hoeffding's inequality. As in the theorem statement, suppose X_1, \ldots, X_n are independent random variables such that X_i \in [a_i, b_i] almost surely for all i, and let S_n = X_1 + \cdots + X_n. Then, for s, t > 0, Markov's inequality and the independence of the X_i imply:
\begin{align}
\operatorname{P}\left(S_n - \operatorname{E}\left[S_n\right] \geq t\right)
&= \operatorname{P}\left(\exp\left(s\left(S_n - \operatorname{E}\left[S_n\right]\right)\right) \geq \exp(st)\right) \\
&\leq \exp(-st)\,\operatorname{E}\left[\exp\left(s\left(S_n - \operatorname{E}\left[S_n\right]\right)\right)\right] \\
&= \exp(-st)\prod_{i=1}^n \operatorname{E}\left[\exp\left(s\left(X_i - \operatorname{E}\left[X_i\right]\right)\right)\right] \\
&\leq \exp(-st)\prod_{i=1}^n \exp\left(\frac{s^2 (b_i - a_i)^2}{8}\right) \\
&= \exp\left(-st + \tfrac{1}{8} s^2 \sum_{i=1}^n (b_i - a_i)^2\right)
\end{align}
This upper bound is best for the value of s that minimizes the expression inside the exponential. This can be done easily by optimizing a quadratic, giving

s = \frac{4t}{\sum_{i=1}^n (b_i - a_i)^2}.
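Explicitly, writing g(s) = -st + \tfrac{1}{8} s^2 \sum_{i=1}^n (b_i - a_i)^2 for the exponent and setting its derivative to zero,

g'(s) = -t + \tfrac{1}{4} s \sum_{i=1}^n (b_i - a_i)^2 = 0 \quad\Longrightarrow\quad s = \frac{4t}{\sum_{i=1}^n (b_i - a_i)^2},

and substituting this value back in gives

g(s) = -\frac{4t^2}{\sum_{i=1}^n (b_i - a_i)^2} + \frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} = -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}.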
Writing the above bound for this value of s, we get the desired bound:
\operatorname{P}\left(S_n - \operatorname{E}\left[S_n\right] \geq t\right) \leq \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).
Hoeffding's inequality can be used to derive confidence intervals. Consider a coin that shows heads with probability p and tails with probability 1 - p. We toss the coin n times, generating samples X_1, \ldots, X_n (which are i.i.d. Bernoulli random variables). The probability that the coin comes up heads at least k times can be quantified exactly by the following expression:
\operatorname{P}(H(n) \geq k) = \sum_{i=k}^n \binom{n}{i} p^i (1 - p)^{n - i},
where H(n) is the number of heads in n coin tosses.
When k = (p + \varepsilon) n for some \varepsilon > 0, Hoeffding's inequality bounds this probability by a term that is exponentially small in \varepsilon^2 n:
\operatorname{P}(H(n) - pn > \varepsilon n) \leq \exp\left(-2\varepsilon^2 n\right).
Since this bound holds on both sides of the mean, Hoeffding's inequality implies that the number of heads that we see is concentrated around its mean, with an exponentially small tail:

\operatorname{P}\left(|H(n) - pn| > \varepsilon n\right) \leq 2\exp\left(-2\varepsilon^2 n\right).
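For concrete parameters, the exact binomial tail can be compared with the Hoeffding bound; a small sketch (the parameter values are arbitrary) using only the expressions above:

```python
import math

def exact_two_sided_tail(n, p, eps):
    """Exact P(|H(n) - pn| > eps * n) for H(n) ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1) if abs(k - p * n) > eps * n)

n, p, eps = 100, 0.5, 0.1
print("exact binomial tail:", exact_two_sided_tail(n, p, eps))  # ≈ 0.035
print("Hoeffding bound:    ", 2 * math.exp(-2 * eps ** 2 * n))  # 2 e^{-2} ≈ 0.271
```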
Thinking of \overline{X} = \frac{1}{n} H(n) as the "observed" mean, this probability can be interpreted as the level of significance \alpha (probability of making an error) for a confidence interval around p of size 2\varepsilon:

\alpha = \operatorname{P}\left( \overline{X} \notin [p - \varepsilon, p + \varepsilon] \right) \leq 2e^{-2\varepsilon^2 n}.
Requiring that this error probability be at most \alpha, i.e. 2e^{-2\varepsilon^2 n} \leq \alpha, and solving for n gives:

n \geq \frac{\log(2/\alpha)}{2\varepsilon^2}.
Therefore, we require at least \frac{\log(2/\alpha)}{2\varepsilon^2} samples to obtain a (1 - \alpha)-confidence interval p \pm \varepsilon.
Hence, the cost of acquiring the confidence interval is logarithmic in the inverse confidence level 1/\alpha and quadratic in the inverse precision 1/\varepsilon. Note that there are more efficient methods of estimating a confidence interval.
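For instance, a minimal sketch of the sample-size calculation (the helper name is illustrative):

```python
import math

def samples_needed(eps, alpha):
    """Smallest n with 2 * exp(-2 * eps^2 * n) <= alpha,
    i.e. n >= log(2 / alpha) / (2 * eps^2)."""
    return math.ceil(math.log(2 / alpha) / (2 * eps ** 2))

# Estimating p to within ±0.05 with 95% confidence (alpha = 0.05):
print(samples_needed(0.05, 0.05))   # 738
# Halving alpha adds only ~139 samples...
print(samples_needed(0.05, 0.025))  # 877
# ...while halving eps quadruples the requirement.
print(samples_needed(0.025, 0.05))  # 2952
```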