In probability theory, concentration inequalities provide mathematical bounds on the probability of a random variable deviating from some value (typically, its expected value). The deviation or other function of the random variable can be thought of as a secondary random variable. The simplest example of the concentration of such a secondary random variable is the CDF of the first random variable which concentrates the probability to unity. If an analytic form of the CDF is available this provides a concentration equality that provides the exact probability of concentration. It is precisely when the CDF is difficult to calculate or even the exact form of the first random variable is unknown that the applicable concentration inequalities provide useful insight.
Another almost universal example of a secondary random variable is the law of large numbers of classical probability theory which states that sums of independent random variables, under mild conditions, concentrate around their expectation with a high probability. Such sums are the most basic examples of random variables concentrated around their mean.
Concentration inequalities can be sorted according to how much information about the random variable is needed in order to use them.
See main article: Markov's inequality. Let
X
a>0
\Pr(X\geqa)\leq
\operatorname{E | |
(X)}{a}. |
Note the following extension to Markov's inequality: if
\Phi
\Pr(X\geqa)=\Pr(\Phi(X)\geq\Phi(a))\leq
\operatorname{E | |
(\Phi(X))}{\Phi |
(a)}.
See main article: Chebyshev's inequality. Chebyshev's inequality requires the following information on a random variable
X
\operatorname{E}[X]
\operatorname{Var}[X]=\operatorname{E}[(X-\operatorname{E}[X])2]
Then, for every constant
a>0
\Pr(|X-\operatorname{E}[X]|\geqa)\leq
\operatorname{Var | |
[X]}{a |
2},
or equivalently,
\Pr(|X-\operatorname{E}[X]|\geqa ⋅ \operatorname{Std}[X])\leq
1 | |
a2 |
,
where
\operatorname{Std}[X]
X
Chebyshev's inequality can be seen as a special case of the generalized Markov's inequality applied to the random variable
|X-\operatorname{E}[X]|
\Phi(x)=x2
See main article: Vysochanskij–Petunin inequality. Let X be a random variable with unimodal distribution, mean μ and finite, non-zero variance σ2. Then, for any
Pr(\left|X-\mu\right|\geqλ\sigma)\leq
4 | |
9λ2 |
.
(For a relatively elementary proof see e.g.[1]).
For a unimodal random variable
X
r\geq0
Pr(X-E[X]\geqr)\leq \begin{cases} \dfrac{4}{9}\dfrac{\operatorname{Var}(X)}{r2+\operatorname{Var}(X)}&forr2\geq\dfrac{5}{3}\operatorname{Var}(X),\\[5pt] \dfrac{4}{3}\dfrac{\operatorname{Var}(X)}{r2+\operatorname{Var}(X)}-\dfrac{1}{3}&otherwise. \end{cases}
See main article: Paley–Zygmund inequality. In contrast to most commonly used concentration inequalities, the Paley-Zygmund inequality provides a lower bound on the deviation probability.
See main article: Cantelli's inequality.
See main article: Gauss's inequality.
See main article: Chernoff bound. The generic Chernoff bound[3] requires the moment generating function of
X
tX | |
M | |
X(t):=\operatorname{E}\left[e |
\right].
t>0
\Pr(X\geqa)\leq
\operatorname{E | |
[e |
tX]}{eta
and for every
t<0
\Pr(X\leqa)\leq
\operatorname{E | |
[e |
tX]}{eta
There are various Chernoff bounds for different distributions and different values of the parameter
t
See main article: article and Mill's Inequality. Let
Z\simN(0,1)
See main article: Hoeffding's inequality, Azuma's inequality, McDiarmid's inequality, Bennett's inequality and Bernstein inequalities (probability theory). Let
X1,X2,...,Xn
ai\leqXi\leqbi
ci:=bi-ai
\foralli:ci\leqC
Sn
En
Vn
Sn:=
n | |
\sum | |
i=1 |
Xi
En:=\operatorname{E}[Sn]=
n | |
\sum | |
i=1 |
\operatorname{E}[Xi]
Vn:=\operatorname{Var}[Sn]=
n | |
\sum | |
i=1 |
\operatorname{Var}[Xi]
It is often interesting to bound the difference between the sum and its expected value. Several inequalities can be used.
1. Hoeffding's inequality says that:
\Pr\left[|Sn-En|>t\right]\le2\exp\left(-
2t2 | |||||||||||||||
|
\right)\le2\exp\left(-
2t2 | |
nC2 |
\right)
2. The random variable
Sn-En
S0-E0=0
\Pr\left[|Sn-En|>t\right]<2\exp\left(-
2t2 | |||||||||||||||
|
\right)<2\exp\left(-
2t2 | |
nC2 |
\right)
3. The sum function,
Sn=f(X1,...,Xn)
bi-ai<C
\Pr\left[|Sn-En|>t\right]<2\exp\left(-
2t2 | |||||||||||||||
|
\right)<2\exp\left(-
2t2 | |
nC2 |
\right)
4. Bennett's inequality offers some improvement over Hoeffding's when the variances of the summands are small compared to their almost-sure bounds C. It says that:
\Pr\left[|Sn-En|>t\right]\leq 2\exp\left[-
Vn | h\left( | |
C2 |
Ct | |
Vn |
\right)\right],
h(u)=(1+u)log(1+u)-u
5. The first of Bernstein's inequalities says that:
\Pr\left[|Sn-En|>t\right]<2\exp\left(-
t2/2 | |
Vn+C ⋅ t/3 |
\right)
6. Chernoff bounds have a particularly simple form in the case of sum of independent variables, since
t ⋅ Sn | |
\operatorname{E}[e |
]=
n | |
\prod | |
i=1 |
t ⋅ Xi | |
{\operatorname{E}[e |
]}
For example,[6] suppose the variables
Xi
Xi\geqE(Xi)-ai-M
1\leqi\leqn
\Pr[Sn-En<-λ]\leq\exp\left(-
λ2 | ||||||||||||||||||
|
\right)
If
Xi
Xi\leqE(Xi)+ai+M
\Pr[Sn-En>λ]\leq\exp\left(-
λ2 | ||||||||||||||||||
|
\right)
If
Xi
|Xi|\leq1
\sigma2
Xi
\Pr[|Sn|\geqk\sigma]\leq
-k2/4n | |
2e |
for0\leqk\leq2\sigma.
7. Similar bounds can be found in: Rademacher distribution#Bounds on sums
The Efron–Stein inequality (or influence inequality, or MG bound on variance) bounds the variance of a general function.
Suppose that
X1...Xn
X1'...Xn'
Xi'
Xi
i
Let
X=(X1,...,Xn),X(i)=(X1,...,Xi-1,Xi',Xi+1,...,Xn).
Var(f(X))\leq
1 | |
2 |
n | |
\sum | |
i=1 |
E[(f(X)-f(X(i)))2].
A proof may be found in e.g.,.[7]
Bretagnolle–Huber–Carol Inequality bounds the difference between a vector of multinomially distributed random variables and a vector of expected values.[8] [9] A simple proof appears in [10] (Appendix Section).
If a random vector
(Z1,Z2,Z3,\ldots,Zn)
(p1,p2,\ldots,pn)
Z1+Z2+...+Zn=M,
\Pr\left(
n | |
\sum | |
i=1 |
|Zi-Mpi|\geq2M\varepsilon\right) \leq2n
-2M\varepsilon2 | |
e |
.
This inequality is used to bound the total variation distance.
The Mason and van Zwet inequality[11] for multinomial random vectors concerns a slight modification of the classical chi-square statistic.
Let the random vector
(N1,\ldots,Nk)
n
(p1,\ldots,pk)
pi>0
i<k.
C>0
\delta>0
a,b,c>0,
n\geq1
λ,p1,\ldots,pk-1
λ>Cnmin\{pi|1\leqi\leqk-1\}
k-1 | |
\sum | |
i=1 |
pi\leq1-\delta,
\Pr\left(
k-1 | |
\sum | |
i=1 |
| |||||||||||||
npi |
>λ\right) \leqaebk-cλ.
See main article: Dvoretzky–Kiefer–Wolfowitz inequality.
The Dvoretzky–Kiefer–Wolfowitz inequality bounds the difference between the real and the empirical cumulative distribution function.
Given a natural number
n
X1,X2,...,Xn
Fn
Fn(x)=
1n | |
\sum |
n | |
i=1 |
1 | |
\{Xi\leqx\ |
F(x)
X
x
Fn(x)
x
Then
\Pr\left(\supx\inRl(Fn(x)-F(x)r)>\varepsilon\right)\le
-2n\varepsilon2 | |
e |
forevery\varepsilon\geq\sqrt{\tfrac1{2n}ln2}.
Anti-concentration inequalities, on the other hand, provide an upper bound on how much a random variable can concentrate, either on a specific value or range of values. A concrete example is that if you flip a fair coin
n
1 | |
\sqrt{n |
\beta,\delta>0
C>0
k
2n(1
x\in\{\pm1\}n
\Pr\left(\langlex,Y\rangle=k\right)\le
C | |
\sqrt{n |
Y
\{\pm1\}n
Such inequalities are of importance in several fields, including communication complexity (e.g., in proofs of the gap Hamming problem[13]) and graph theory.[14]
An interesting anti-concentration inequality for weighted sums of independent Rademacher random variables can be obtained using the Paley–Zygmund and the Khintchine inequalities.[15]