Biweight midcorrelation explained

In statistics, biweight midcorrelation (also called bicor) is a measure of similarity between samples. It is median-based rather than mean-based, and is thus less sensitive to outliers; it can be used as a robust alternative to other similarity metrics, such as Pearson correlation or mutual information.[1]

Derivation

Here we find the biweight midcorrelation of two vectors $x$ and $y$, each with $m$ items indexed by $i = 1, 2, \ldots, m$, and write the items of the vectors as $x_1, x_2, \ldots, x_m$ and $y_1, y_2, \ldots, y_m$. First, we define $\operatorname{med}(x)$ as the median of a vector $x$ and $\operatorname{mad}(x) = \operatorname{med}\left(|x_i - \operatorname{med}(x)|\right)$ as its median absolute deviation (MAD), then define $u_i$ and $v_i$ as

\begin{align} u_i &= \frac{x_i - \operatorname{med}(x)}{9 \operatorname{mad}(x)}, \\ v_i &= \frac{y_i - \operatorname{med}(y)}{9 \operatorname{mad}(y)}. \end{align}
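As a rough illustration, the Python sketch below computes $u_i$ for a single vector using only the standard library. It assumes that $\operatorname{mad}$ is the raw median absolute deviation with no normal-consistency constant; R's `mad()` function, for example, multiplies by 1.4826 by default, which would change the cutoff. The function name `scaled_deviations` is hypothetical.

```python
from statistics import median


def scaled_deviations(values: list[float]) -> list[float]:
    """Return u_i = (x_i - med(x)) / (9 * mad(x)) for each item of `values`.

    Assumption: mad(x) is the raw median absolute deviation, with no
    consistency constant (such as 1.4826) applied.
    """
    med = median(values)
    # Raw MAD: the median of the absolute deviations from the median.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # u_i is undefined, e.g. when most items share the same value.
        raise ValueError("mad(x) is zero; u_i is undefined")
    return [(v - med) / (9 * mad) for v in values]


x = [1.0, 2.0, 3.0, 4.0, 100.0]
print([round(u, 3) for u in scaled_deviations(x)])
# [-0.222, -0.111, 0.0, 0.111, 10.778]
```

For the vector $(1, 2, 3, 4, 100)$ the median is 3 and the MAD is 1, so the outlier 100 gets $u_5 = 97/9 \approx 10.78$, while the other items stay within about 0.22 of zero.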

Now we define the weights $w_i^{(x)}$ and $w_i^{(y)}$ as
\begin{align} w_i^{(x)} &= \left(1 - u_i^2\right)^2 I\left(1 - |u_i|\right), \\ w_i^{(y)} &= \left(1 - v_i^2\right)^2 I\left(1 - |v_i|\right), \end{align}

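where $I$ is the indicator function (a unit step function), defined as

I(x) = \begin{cases} 1, & \text{if } x > 0, \\ 0, & \text{otherwise.} \end{cases}

Consequently, an item receives weight zero whenever $|u_i| \ge 1$, that is, whenever it lies at least $9 \operatorname{mad}(x)$ from the median, and items closer to the median receive weights closer to 1.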
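The weight and the indicator can be combined into one small Python function. This is a sketch for a single value of $u_i$; the name `biweight_weight` is hypothetical.

```python
def biweight_weight(u: float) -> float:
    """Return (1 - u^2)^2 * I(1 - |u|) for one scaled deviation u."""
    # Indicator I(t) = 1 if t > 0, else 0; here t = 1 - |u|.
    indicator = 1.0 if 1 - abs(u) > 0 else 0.0
    return (1 - u * u) ** 2 * indicator


for u in [0.0, 0.5, -0.5, 1.0, 10.778]:
    print(u, biweight_weight(u))
```

An item with $u = \pm 0.5$ gets weight $(1 - 0.25)^2 = 0.5625$; the weight falls continuously to 0 at $|u| = 1$ and stays 0 beyond it, which is how distant outliers are excluded.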

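Then we subtract the median from each item, multiply the result by the item's weight, and normalize each vector to unit Euclidean length, so that $\sum_{i=1}^m \tilde{x}_i^2 = \sum_{i=1}^m \tilde{y}_i^2 = 1$: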

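\begin{align} \tilde{x}_i &= \frac{\left(x_i - \operatorname{med}(x)\right) w_i^{(x)}}{\sqrt{\sum_{j=1}^m \left[\left(x_j - \operatorname{med}(x)\right) w_j^{(x)}\right]^2}}, \\ \tilde{y}_i &= \frac{\left(y_i - \operatorname{med}(y)\right) w_i^{(y)}}{\sqrt{\sum_{j=1}^m \left[\left(y_j - \operatorname{med}(y)\right) w_j^{(y)}\right]^2}}. \end{align}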
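A minimal sketch of the normalization step in Python follows, again assuming the raw MAD; `normalized_weighted` is a hypothetical helper name. The last line checks numerically that the normalized values have unit sum of squares.

```python
import math
from statistics import median


def normalized_weighted(values: list[float]) -> list[float]:
    """Return the normalized values x~_i for one vector (hypothetical helper)."""
    med = median(values)
    # Raw MAD (assumption): median of |x_i - med(x)|, no 1.4826 constant.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("mad is zero; the weights are undefined")
    weighted = []
    for v in values:
        u = (v - med) / (9 * mad)
        # w_i = (1 - u_i^2)^2 * I(1 - |u_i|)
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        weighted.append((v - med) * w)
    # Divide by the Euclidean norm so the squares sum to 1.
    norm = math.sqrt(sum(t * t for t in weighted))
    return [t / norm for t in weighted]


x_tilde = normalized_weighted([1.0, 2.0, 3.0, 4.0, 100.0])
print(x_tilde)
print(sum(t * t for t in x_tilde))  # 1.0 up to floating-point rounding
```

For the example vector $(1, 2, 3, 4, 100)$, the outlier receives weight zero, so its normalized value $\tilde{x}_5$ is exactly 0. When $\operatorname{mad}(x) > 0$, at least one item has a nonzero weighted deviation, so the denominator cannot be zero.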

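Finally, we define the biweight midcorrelation as

\operatorname{bicor}(x, y) = \sum_{i=1}^m \tilde{x}_i \tilde{y}_i.

Because both normalized vectors have unit length, the Cauchy-Schwarz inequality gives $-1 \le \operatorname{bicor}(x, y) \le 1$, as for Pearson correlation.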
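Putting the steps together, the following Python sketch computes bicor and, for comparison, the Pearson correlation. It is illustrative only: it assumes the raw MAD and ignores refinements, such as missing-value handling, that library implementations may offer. The names `bicor`, `pearson`, and `_normalized` are hypothetical. The data are made up: the first seven pairs are uncorrelated (their Pearson correlation is exactly 0), and both vectors share a single extreme value.

```python
import math
from statistics import median


def _normalized(values: list[float]) -> list[float]:
    """Median-centered, biweight-weighted values scaled to unit length."""
    med = median(values)
    # Raw MAD (assumption): no 1.4826 consistency constant.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("mad is zero; bicor is undefined")
    weighted = []
    for v in values:
        u = (v - med) / (9 * mad)
        # (1 - u^2)^2 * I(1 - |u|): zero weight at or beyond 9 MADs.
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        weighted.append((v - med) * w)
    norm = math.sqrt(sum(t * t for t in weighted))
    return [t / norm for t in weighted]


def bicor(x: list[float], y: list[float]) -> float:
    """Biweight midcorrelation: dot product of the normalized vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(a * b for a, b in zip(_normalized(x), _normalized(y)))


def pearson(x: list[float], y: list[float]) -> float:
    """Ordinary (mean-based) Pearson correlation, for comparison."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: the first seven pairs are uncorrelated; the last pair
# is a shared outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0]
y = [4.0, 2.0, 6.0, 7.0, 1.0, 3.0, 5.0, 100.0]

print(f"pearson = {pearson(x, y):.3f}")
print(f"bicor   = {bicor(x, y):.3f}")
```

Here the Pearson correlation is about 0.997, driven almost entirely by the shared outlier, while bicor is about 0.05, because the outlier lies more than $9 \operatorname{mad}$ from the median in both vectors and receives weight zero.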

Applications

Biweight midcorrelation has been shown to be more robust in evaluating similarity in gene expression networks,[2] and is often used for weighted correlation network analysis.

Implementations

Biweight midcorrelation is implemented in the R programming language as the function bicor in the WGCNA package.[3]

It is also implemented in the Raku programming language as the function bi_cor_coef in the Statistics module.[4]

Notes and References

  1. Wilcox, Rand. Introduction to Robust Estimation and Hypothesis Testing, 3rd ed. Academic Press, 2012-01-12, p. 455. ISBN 978-0123869838.
  2. Song, Lin. "Comparison of co-expression measures: mutual information, correlation, and model based indices". BMC Bioinformatics 13: 328, 2012-12-09. doi:10.1186/1471-2105-13-328. PMID 23217028. PMC 3586947.
  3. Langfelder, Peter. "WGCNA: Weighted Correlation Network Analysis (an R package)". CRAN, 2018-04-06.
  4. Khanal, Suman. "Statistics: Raku module for doing statistics". GitHub, 2022-03-11.