In statistics, biweight midcorrelation (also called bicor) is a measure of similarity between samples. Because it is based on the median rather than the mean, it is less sensitive to outliers and can serve as a robust alternative to other similarity metrics, such as Pearson correlation or mutual information.[1]
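Here we find the biweight midcorrelation of two vectors $x$ and $y$, each with $m$ items indexed by $i=1,2,\ldots,m$, so that the items are $x_1,x_2,\ldots,x_m$ and $y_1,y_2,\ldots,y_m$. First, we define $\operatorname{med}(x)$ as the median of a vector $x$ and $\operatorname{mad}(x)$ as its median absolute deviation (MAD), then define $u_i$ and $v_i$ as
\begin{align}
u_i &= \frac{x_i-\operatorname{med}(x)}{9\operatorname{mad}(x)},\\
v_i &= \frac{y_i-\operatorname{med}(y)}{9\operatorname{mad}(y)}.
\end{align}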
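As a minimal sketch of this first step, the following Python code computes the scaled deviations $u_i$ using only the standard library. The function names `mad` and `scaled_deviations` and the sample data are illustrative assumptions, and the MAD is taken here in its raw form (no consistency constant), which is assumed to match the definition above.

```python
from statistics import median


def mad(values):
    """Raw median absolute deviation: median of |x_i - med(x)| (assumed unscaled)."""
    center = median(values)
    return median([abs(v - center) for v in values])


def scaled_deviations(values):
    """Return u_i = (x_i - med(x)) / (9 * mad(x)) for each item."""
    center = median(values)
    spread = mad(values)
    if spread == 0:
        # The definition divides by mad(x), so a zero MAD leaves u_i undefined.
        raise ValueError("MAD is zero; scaled deviations are undefined")
    return [(v - center) / (9 * spread) for v in values]


# Hypothetical sample with one gross outlier (100.0).
x = [1.0, 2.0, 3.0, 4.0, 100.0]
u = scaled_deviations(x)
```

For this sample the median is 3 and the MAD is 1, so the outlier ends up with $|u_i| > 1$ while the remaining items stay well inside the interval $(-1, 1)$.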
Now we define the weights $w_i^{(x)}$ and $w_i^{(y)}$ as
\begin{align}
w_i^{(x)} &= \left(1-u_i^2\right)^2 I\left(1-|u_i|\right),\\
w_i^{(y)} &= \left(1-v_i^2\right)^2 I\left(1-|v_i|\right),
\end{align}
where $I$ is the indicator function
\begin{align}
I(x)=\begin{cases}1,&\text{if } x>0\\ 0,&\text{otherwise.}\end{cases}
\end{align}
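The weighting step can be sketched in Python as follows. The function name `biweight_weights` is an illustrative assumption, and the input values are the scaled deviations of the hypothetical sample 1, 2, 3, 4, 100 (median 3, MAD 1). The indicator $I(1-|u_i|)$ is written as the condition `abs(u_i) < 1`.

```python
def biweight_weights(u):
    """Return (1 - u_i^2)^2 when |u_i| < 1 and 0 otherwise."""
    return [(1 - ui * ui) ** 2 if abs(ui) < 1 else 0.0 for ui in u]


# Scaled deviations of the hypothetical sample [1, 2, 3, 4, 100].
u = [-2 / 9, -1 / 9, 0.0, 1 / 9, 97 / 9]
w = biweight_weights(u)
```

An item at the median receives weight 1, the weight decreases smoothly as $|u_i|$ grows, and any item with $|u_i| \geq 1$ (more than nine MADs from the median) receives weight 0, which is what removes the influence of gross outliers.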
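Then we normalize the weighted deviations so that the sum of their squares is 1:
\begin{align}
\tilde{x}_i &= \frac{\left(x_i-\operatorname{med}(x)\right)w_i^{(x)}}{\sqrt{\sum_{j=1}^m\left[\left(x_j-\operatorname{med}(x)\right)w_j^{(x)}\right]^2}},\\
\tilde{y}_i &= \frac{\left(y_i-\operatorname{med}(y)\right)w_i^{(y)}}{\sqrt{\sum_{j=1}^m\left[\left(y_j-\operatorname{med}(y)\right)w_j^{(y)}\right]^2}}.
\end{align}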
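A short Python sketch of the normalization is given below. It assumes the usual form of this step, in which each weighted deviation $(x_i-\operatorname{med}(x))\,w_i^{(x)}$ is divided by the square root of the sum of squares of all weighted deviations. The function name `normalize` and the numeric inputs (taken from the hypothetical sample 1, 2, 3, 4, 100) are illustrative.

```python
from math import sqrt


def normalize(weighted):
    """Scale the weighted deviations so that their squares sum to 1."""
    norm = sqrt(sum(a * a for a in weighted))
    if norm == 0:
        raise ValueError("all weighted deviations are zero")
    return [a / norm for a in weighted]


# Deviations from the median and biweight weights for [1, 2, 3, 4, 100].
deviations = [-2.0, -1.0, 0.0, 1.0, 97.0]
weights = [(77 / 81) ** 2, (80 / 81) ** 2, 1.0, (80 / 81) ** 2, 0.0]

x_tilde = normalize([d * w for d, w in zip(deviations, weights)])
```

After this step the normalized vector has unit Euclidean length, and the outlier, whose weight is 0, contributes nothing to it.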
Finally, we define biweight midcorrelation as
\begin{align}
\operatorname{bicor}\left(x,y\right)=\sum_{i=1}^m \tilde{x}_i\tilde{y}_i.
\end{align}
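Putting the steps together, the whole computation can be sketched as a single Python function. This is an illustrative sketch under stated assumptions rather than a reference implementation: it uses the raw (unscaled) MAD, assumes normalization by the square root of the sum of squared weighted deviations, and raises an error when the MAD is zero instead of falling back to another estimator. The helper name `_normalized_weighted_deviations` is hypothetical.

```python
from math import sqrt
from statistics import median


def _normalized_weighted_deviations(values):
    """Return the normalized weighted deviations for one vector."""
    center = median(values)
    spread = median([abs(v - center) for v in values])  # raw MAD (assumption)
    if spread == 0:
        raise ValueError("MAD is zero; bicor is undefined in this sketch")
    weighted = []
    for v in values:
        u = (v - center) / (9 * spread)
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0  # biweight with indicator
        weighted.append((v - center) * w)
    norm = sqrt(sum(a * a for a in weighted))
    return [a / norm for a in weighted]


def bicor(x, y):
    """Biweight midcorrelation of two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    x_tilde = _normalized_weighted_deviations(x)
    y_tilde = _normalized_weighted_deviations(y)
    return sum(a * b for a, b in zip(x_tilde, y_tilde))


# Hypothetical data: y is an increasing linear function of x.
x = [1.0, 2.0, 3.0, 4.0, 100.0]
y = [2 * v + 1 for v in x]
r_same = bicor(x, y)                      # close to 1
r_flipped = bicor(x, [-v for v in x])     # close to -1
r_mixed = bicor(x, [2.0, 4.0, 6.0, 8.0, -50.0])
```

Because both normalized vectors have unit length, the result always lies between -1 and 1, and it is unchanged when either vector is shifted or rescaled by a positive constant.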
Biweight midcorrelation has been shown to be more robust in evaluating similarity in gene expression networks,[2] and is often used for weighted correlation network analysis.
Biweight midcorrelation is implemented in the R statistical programming language as the function bicor, part of the WGCNA package.[3] It is also implemented in the Raku programming language as the function bi_cor_coef, part of the Statistics module.[4]