In statistics, the Fisher transformation (or Fisher z-transformation) of a Pearson correlation coefficient is its inverse hyperbolic tangent (artanh). When the sample correlation coefficient r is near 1 or -1, its distribution is highly skewed, which makes it difficult to estimate confidence intervals and apply tests of significance for the population correlation coefficient ρ.[1] [2] [3] The Fisher transformation solves this problem by yielding a variable whose distribution is approximately normal, with a variance that is stable over different values of r.
Given a set of N bivariate sample pairs (Xi, Yi), i = 1, ..., N, the sample correlation coefficient r is given by
r=\frac{\operatorname{cov}(X,Y)}{\sigma_X\sigma_Y}=\frac{\sum_{i=1}^N(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^N(X_i-\bar{X})^2}\sqrt{\sum_{i=1}^N(Y_i-\bar{Y})^2}}.
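Here cov(X, Y) stands for the covariance between the variables X and Y, and σ stands for the standard deviation of the respective variable. Fisher's z-transformation of r is defined as
z={1\over2}\ln\left({1+r\over1-r}\right)=\operatorname{artanh}(r),
where ln is the natural logarithm function and artanh is the inverse hyperbolic tangent function.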
If (X, Y) has a bivariate normal distribution with correlation ρ and the pairs (Xi, Yi) are independent and identically distributed, then z is approximately normally distributed with mean
{1\over2}\ln\left({{1+\rho}\over{1-\rho}}\right),
and standard deviation
{1\over\sqrt{N-3}},
where N is the sample size and ρ is the population correlation coefficient.
This transformation, and its inverse
r=\frac{\exp(2z)-1}{\exp(2z)+1}=\tanh(z),
can be used to construct a large-sample confidence interval for ρ using standard normal theory.
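The interval construction can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names fisher_z and fisher_confidence_interval are hypothetical, the inputs r = 0.5 and N = 50 are made up for the demonstration, and the result relies on the bivariate normal, independent-pairs assumptions stated above.

```python
import math
from statistics import NormalDist


def fisher_z(r: float) -> float:
    """Fisher transformation z = artanh(r) of a sample correlation r."""
    return math.atanh(r)


def fisher_confidence_interval(r: float, n: int, level: float = 0.95):
    """Approximate confidence interval for rho from r and sample size n.

    Assumes bivariate normal, independent pairs, and n > 3.
    """
    if n <= 3:
        raise ValueError("n must exceed 3 for the 1/sqrt(n - 3) standard deviation")
    z = fisher_z(r)
    sd = 1.0 / math.sqrt(n - 3)
    # Two-sided standard normal quantile (about 1.96 for level = 0.95).
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Build the interval on the z scale, then map it back with tanh.
    return math.tanh(z - q * sd), math.tanh(z + q * sd)


# Illustrative inputs only; they do not come from the article.
low, high = fisher_confidence_interval(r=0.5, n=50)
print(f"95% interval for rho: ({low:.3f}, {high:.3f})")
```

Because the interval is symmetric on the z scale but mapped back through tanh, it is generally asymmetric around r, and it always stays inside (-1, 1).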
Hotelling gives a concise derivation of the Fisher transformation.[4]
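To derive the Fisher transformation, one starts by considering an arbitrary increasing, twice-differentiable function of r, say G(r). Finding the first term in the large-N expansion of the corresponding skewness κ3, setting κ3 = 0, and solving the resulting differential equation for G yields the inverse hyperbolic tangent function:
G(\rho)=\operatorname{artanh}(\rho)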
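The steps of this argument can be outlined as follows. The sketch assumes the standard leading-order moments of r under bivariate normality (standard deviation (1-ρ²)/√N and skewness -6ρ/√N) and the delta-method rule for the skewness of a smooth function of a statistic. It is not quoted from Hotelling's paper, so it should be read as an outline under those assumptions rather than as his derivation. The normalization G(0) = 0, G'(0) = 1 is likewise a choice made here.

```latex
\begin{align*}
% Leading-order skewness of G(r) by the delta method (assumed moments of r):
\kappa_3\bigl[G(r)\bigr]
  &\approx \kappa_3[r] + 3\,\sigma_r\,\frac{G''(\rho)}{G'(\rho)}
   = \frac{-6\rho + 3(1-\rho^2)\,G''(\rho)/G'(\rho)}{\sqrt{N}} \\
% Requiring the leading term to vanish gives a differential equation for G:
\frac{G''(\rho)}{G'(\rho)} &= \frac{2\rho}{1-\rho^2} \\
% Integrate once, with G'(0) = 1:
G'(\rho) &= \frac{1}{1-\rho^2} \\
% Integrate again, with G(0) = 0:
G(\rho) &= \frac{1}{2}\ln\frac{1+\rho}{1-\rho} = \operatorname{artanh}(\rho)
\end{align*}
```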
Similarly expanding the mean m and variance v of artanh(r), one gets
m=\operatorname{artanh}(\rho)+\frac{\rho}{2N}+O(N^{-2})
and
v=\frac{1}{N}+\frac{6-\rho^2}{2N^2}+O(N^{-3})
respectively.
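The extra terms are not part of the usual Fisher transformation. For large values of ρ and small values of N they can noticeably improve the accuracy of the approximation. Including the extra terms, that is, computing (z - m)/√v, yields
\frac{z-\operatorname{artanh}(\rho)-\frac{\rho}{2N}}{\sqrt{\frac{1}{N}+\frac{6-\rho^2}{2N^2}}}
which is approximately standard normal.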
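As a small numerical illustration of the expansions for m and v, the following Python sketch evaluates them with the O(N^-2) and O(N^-3) remainders dropped, and compares the resulting standard deviation with the usual 1/√(N-3). The function names corrected_moments and standardized, and the values ρ = 0.8 and N = 20, are hypothetical choices made for this example.

```python
import math


def corrected_moments(rho: float, n: int):
    """Mean and variance of artanh(r) including the extra terms.

    The remainder terms O(N^-2) and O(N^-3) are dropped.
    """
    m = math.atanh(rho) + rho / (2 * n)
    v = 1 / n + (6 - rho ** 2) / (2 * n ** 2)
    return m, v


def standardized(z: float, rho: float, n: int) -> float:
    """(z - m) / sqrt(v), approximately standard normal under the stated model."""
    m, v = corrected_moments(rho, n)
    return (z - m) / math.sqrt(v)


# Illustrative values only.
rho, n = 0.8, 20
m, v = corrected_moments(rho, n)
print(f"mean shift rho/(2N): {m - math.atanh(rho):.4f}")
print(f"sd with extra terms: {math.sqrt(v):.4f}")
print(f"usual 1/sqrt(N-3):   {1 / math.sqrt(n - 3):.4f}")
```

For ρ = 0 the corrected variance 1/N + 3/N² agrees with 1/(N-3) up to terms of order N^-3, which is one way to see where the familiar N-3 comes from.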
The application of Fisher's transformation can be enhanced using a software calculator, as shown in the figure. For example, given an observed r-squared value of 0.80 from 30 data points and a 90% confidence interval, the r-squared value in another random sample from the same population may range from 0.656 to 0.888. An r-squared value outside this range suggests that the sample comes from a different population.
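The calculator example can be reproduced with a short Python sketch. It assumes a positive correlation (so that r is the positive square root of r-squared) and uses the 1/√(N-3) standard deviation; the variable names are chosen here for illustration.

```python
import math
from statistics import NormalDist

r_squared = 0.80
n = 30
level = 0.90

r = math.sqrt(r_squared)              # assumes a positive correlation
z = math.atanh(r)                     # Fisher transformation
sd = 1.0 / math.sqrt(n - 3)           # approximate standard deviation of z
q = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided normal quantile

# Interval on the z scale, mapped back to r with tanh.
r_low = math.tanh(z - q * sd)
r_high = math.tanh(z + q * sd)

# Both bounds are positive here, so squaring preserves their order.
r2_low, r2_high = r_low ** 2, r_high ** 2
print(f"{r2_low:.3f} to {r2_high:.3f}")  # 0.656 to 0.888
```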
The Fisher transformation is an approximate variance-stabilizing transformation for r when X and Y follow a bivariate normal distribution. This means that the variance of z is approximately constant for all values of the population correlation coefficient ρ. Without the Fisher transformation, the variance of r grows smaller as |ρ| gets closer to 1. Since the Fisher transformation is approximately the identity function when |r| < 1/2, it is sometimes useful to remember that the variance of r is well approximated by 1/N as long as |ρ| is not too large and N is not too small. This is related to the fact that the asymptotic variance of √N (r - ρ) is (1 - ρ²)² for bivariate normal data, which is close to 1 when |ρ| is small.
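The variance-stabilizing behavior can be checked with a small simulation. The sketch below uses only the Python standard library; the sample size N = 50, the 2000 replications, the seed, and the function names sample_r and simulate are arbitrary choices made for this illustration, and the printed values depend on them.

```python
import math
import random
from statistics import variance


def sample_r(rho: float, n: int, rng: random.Random) -> float:
    """Draw n bivariate normal pairs with correlation rho; return Pearson r."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        xs.append(x)
        # y has unit variance and correlation rho with x.
        ys.append(rho * x + math.sqrt(1.0 - rho * rho) * e)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(rho: float, n: int = 50, reps: int = 2000, seed: int = 0):
    """Return (variance of r, variance of z) over `reps` simulated samples."""
    rng = random.Random(seed)
    rs = [sample_r(rho, n, rng) for _ in range(reps)]
    zs = [math.atanh(r) for r in rs]
    return variance(rs), variance(zs)


for rho in (0.0, 0.5, 0.9):
    var_r, var_z = simulate(rho)
    print(f"rho={rho:.1f}  var(r)={var_r:.5f}  var(z)={var_z:.5f}  "
          f"1/(N-3)={1 / 47:.5f}")
```

Under these settings the variance of r should shrink markedly as ρ moves toward 1, while the variance of z should stay near 1/(N-3) throughout.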
The behavior of this transform has been extensively studied since Fisher introduced it in 1915. Fisher himself found the exact distribution of z for data from a bivariate normal distribution in 1921; Gayen in 1951[7] determined the exact distribution of z for data from a bivariate Type A Edgeworth distribution. Hotelling in 1953 calculated the Taylor series expressions for the moments of z and several related statistics[8] and Hawkins in 1989 discovered the asymptotic distribution of z for data from a distribution with bounded fourth moments.[9]
An alternative to the Fisher transformation is to use the exact confidence distribution density for ρ,[10] [11] which is expressed in terms of the Gaussian hypergeometric function F and ν = N - 1 > 1.
While the Fisher transformation is mainly associated with the Pearson product-moment correlation coefficient for bivariate normal observations, it can also be applied to Spearman's rank correlation coefficient in more general cases.[12] A similar result for the asymptotic distribution applies, but with a minor adjustment factor: see the cited article for details.