Data processing inequality explained
The data processing inequality is an information-theoretic concept stating that the information content of a signal cannot be increased via a local physical operation. This can be expressed concisely as 'post-processing cannot increase information'.
Statement
Let X → Y → Z be a Markov chain, implying that the conditional distribution of Z depends only on Y and is conditionally independent of X. Specifically, we have such a Markov chain if the joint probability mass function can be written as
p(x,y,z) = p(x) p(y|x) p(z|y) = p(y) p(x|y) p(z|y)
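As a concrete check of this factorization, the following sketch (a hypothetical example with arbitrary binary distributions, not taken from the source) builds the joint pmf of a Markov chain X → Y → Z and verifies that Z is conditionally independent of X given Y:

```python
import numpy as np

# Hypothetical binary Markov chain X -> Y -> Z; all numbers are arbitrary choices.
p_x = np.array([0.4, 0.6])                    # marginal p(x)
p_y_x = np.array([[0.9, 0.1], [0.2, 0.8]])    # channel p(y|x), rows indexed by x
p_z_y = np.array([[0.7, 0.3], [0.1, 0.9]])    # channel p(z|y), rows indexed by y

# Joint pmf via the factorization p(x,y,z) = p(x) p(y|x) p(z|y)
p_xyz = p_x[:, None, None] * p_y_x[:, :, None] * p_z_y[None, :, :]
assert np.isclose(p_xyz.sum(), 1.0)           # a valid joint distribution

# Conditional independence: p(z|x,y) equals p(z|y) for every value of x
p_xy = p_xyz.sum(axis=2)                      # marginal p(x,y)
p_z_given_xy = p_xyz / p_xy[:, :, None]
for x in range(2):
    assert np.allclose(p_z_given_xy[x], p_z_y)
```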
In this setting, no processing of Y, deterministic or random, can increase the information that Y contains about X. Using the mutual information, this can be written as:
I(X;Y) ≥ I(X;Z)
with the equality I(X;Y) = I(X;Z) if and only if I(X;Y|Z) = 0. That is, Z and Y contain the same information about X, and X → Z → Y also forms a Markov chain.[1]
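The inequality can be verified numerically. The sketch below (a hypothetical binary chain with arbitrary parameters, assumed for illustration only) computes I(X;Y) and I(X;Z) from the joint pmf and checks that processing Y into Z does not increase information about X:

```python
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in nats from a joint pmf array p_ab[a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float((p_ab[mask] * np.log(p_ab[mask] / (p_a * p_b)[mask])).sum())

# Hypothetical binary chain X -> Y -> Z (illustrative numbers).
p_x = np.array([0.5, 0.5])
p_y_x = np.array([[0.8, 0.2], [0.3, 0.7]])    # p(y|x)
p_z_y = np.array([[0.9, 0.1], [0.4, 0.6]])    # p(z|y)

# Joint pmf p(x,y,z) = p(x) p(y|x) p(z|y)
p_xyz = p_x[:, None, None] * p_y_x[:, :, None] * p_z_y[None, :, :]

I_xy = mutual_information(p_xyz.sum(axis=2))  # I(X;Y), marginalizing out Z
I_xz = mutual_information(p_xyz.sum(axis=1))  # I(X;Z), marginalizing out Y
assert I_xy >= I_xz                           # the data processing inequality
```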
Proof
One can apply the chain rule for mutual information to obtain two different decompositions of I(X;Y,Z):
I(X;Z) + I(X;Y|Z) = I(X;Y,Z) = I(X;Y) + I(X;Z|Y)
By the relationship X → Y → Z, we know that X and Z are conditionally independent, given Y, which means the conditional mutual information I(X;Z|Y) = 0. The data processing inequality then follows from the non-negativity of I(X;Y|Z), giving I(X;Z) ≤ I(X;Y).
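Both decompositions, and the vanishing of I(X;Z|Y) under the Markov property, can be checked numerically. The following sketch assumes a hypothetical binary chain with arbitrary parameters (not from the source):

```python
import numpy as np

def mi(p_ab):
    """Mutual information I(A;B) in nats from a joint pmf p_ab[a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    m = p_ab > 0
    return float((p_ab[m] * np.log(p_ab[m] / (p_a * p_b)[m])).sum())

def cmi(p_abc):
    """Conditional mutual information I(A;B|C) from a joint pmf p_abc[a, b, c]."""
    total = 0.0
    for c in range(p_abc.shape[2]):
        p_c = p_abc[:, :, c].sum()
        if p_c > 0:
            total += p_c * mi(p_abc[:, :, c] / p_c)  # p(c) * I(A;B | C=c)
    return total

# Hypothetical binary chain X -> Y -> Z (illustrative numbers).
p_x = np.array([0.5, 0.5])
p_y_x = np.array([[0.8, 0.2], [0.3, 0.7]])    # p(y|x)
p_z_y = np.array([[0.9, 0.1], [0.4, 0.6]])    # p(z|y)
p_xyz = p_x[:, None, None] * p_y_x[:, :, None] * p_z_y[None, :, :]

# I(X;Y,Z): treat the pair (Y,Z) as a single variable
I_x_yz = mi(p_xyz.reshape(2, 4))

# Both chain-rule decompositions agree with I(X;Y,Z):
left = mi(p_xyz.sum(axis=1)) + cmi(p_xyz)                            # I(X;Z) + I(X;Y|Z)
right = mi(p_xyz.sum(axis=2)) + cmi(np.transpose(p_xyz, (0, 2, 1)))  # I(X;Y) + I(X;Z|Y)
assert np.isclose(left, I_x_yz) and np.isclose(right, I_x_yz)

# The Markov property makes I(X;Z|Y) vanish, which yields the inequality
assert np.isclose(cmi(np.transpose(p_xyz, (0, 2, 1))), 0.0)
```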
External links
- http://www.scholarpedia.org/article/Mutual_information
Notes and References
- [1] Cover, Thomas (2012). Elements of Information Theory. John Wiley & Sons.