Biweight midcorrelation explained

In statistics, biweight midcorrelation (also called bicor) is a measure of similarity between samples. It is median-based rather than mean-based, which makes it less sensitive to outliers, and it can serve as a robust alternative to other similarity metrics such as Pearson correlation or mutual information.^[1]

Derivation

Here we find the biweight midcorrelation of two vectors $x$ and $y$, each with $m$ items indexed by $i = 1, 2, \ldots, m$ and written as $x_1, x_2, \ldots, x_m$ and $y_1, y_2, \ldots, y_m$. First, we define $\operatorname{med}(x)$ as the median of a vector $x$ and $\operatorname{mad}(x)$ as its median absolute deviation (MAD), and then define $u_i$ and $v_i$ as

\begin{align} u_i &= \frac{x_i - \operatorname{med}(x)}{9 \operatorname{mad}(x)}, \\ v_i &= \frac{y_i - \operatorname{med}(y)}{9 \operatorname{mad}(y)}. \end{align}
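The scaling step can be sketched in Python with NumPy. This is a minimal illustration rather than a reference implementation: the function name `scaled_deviations` is hypothetical, and the sketch assumes the raw MAD (the median of $|x_i - \operatorname{med}(x)|$ with no consistency factor; R's `mad()`, by contrast, multiplies by 1.4826 by default). It also does not guard against $\operatorname{mad}(x) = 0$, which happens, for example, when more than half of the values are identical.

```python
import numpy as np


def scaled_deviations(x):
    """Return u_i = (x_i - med(x)) / (9 * mad(x)) for a 1-D sequence x."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    # Assumption: raw median absolute deviation, median of |x_i - med(x)|, no 1.4826 factor.
    mad = np.median(np.abs(x - med))
    return (x - med) / (9.0 * mad)


# The value 100 lies far from the others, so its u_i is much larger than 1.
u = scaled_deviations([1.0, 2.0, 3.0, 4.0, 100.0])
print(u)
```

Here $\operatorname{med}(x) = 3$ and $\operatorname{mad}(x) = 1$, so the outlier gets $u_5 = 97/9 \approx 10.8$, while the other values stay within $\pm 2/9$.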

Now we define the weights $w_i^{(x)}$ and $w_i^{(y)}$ as

\begin{align} w_i^{(x)} &= \left(1 - u_i^2\right)^2 I\left(1 - |u_i|\right), \\ w_i^{(y)} &= \left(1 - v_i^2\right)^2 I\left(1 - |v_i|\right), \end{align}

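where $I$ is the indicator (Heaviside step) function

I(z)=\begin{cases}1, & \text{if } z>0,\\ 0, & \text{otherwise,}\end{cases}

so a weight is zero whenever $|u_i| \ge 1$ (respectively $|v_i| \ge 1$).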
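A matching sketch of the weight function (Python with NumPy; the function name `biweight_weights` is hypothetical) shows why the indicator term is needed: $(1 - u_i^2)^2$ on its own grows again once $|u_i| > 1$, so without $I$ a distant outlier would receive a large weight instead of zero.

```python
import numpy as np


def biweight_weights(u):
    """Return w_i = (1 - u_i^2)^2 * I(1 - |u_i|), where I(z) = 1 if z > 0 else 0."""
    u = np.asarray(u, dtype=float)
    inside = (1.0 - np.abs(u)) > 0       # indicator: True only where |u_i| < 1
    return (1.0 - u**2) ** 2 * inside    # boolean mask is promoted to 0.0 / 1.0


w = biweight_weights([0.0, 0.5, -0.5, 1.0, 10.0])
print(w)
```

The point at $u = 10$ receives weight 0; without the indicator it would receive $(1 - 100)^2 = 9801$.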

Then we weight the deviations from the median and normalize each vector so that the sum of its squared entries is 1:

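\begin{align} \tilde{x}_i &= \frac{\left(x_i - \operatorname{med}(x)\right) w_i^{(x)}}{\sqrt{\sum_{j=1}^{m} \left[\left(x_j - \operatorname{med}(x)\right) w_j^{(x)}\right]^2}}, \\ \tilde{y}_i &= \frac{\left(y_i - \operatorname{med}(y)\right) w_i^{(y)}}{\sqrt{\sum_{j=1}^{m} \left[\left(y_j - \operatorname{med}(y)\right) w_j^{(y)}\right]^2}}. \end{align}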
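Combining scaling, weighting, and normalization gives the following sketch (Python with NumPy; the function name `normalized_weighted` is hypothetical, and the raw MAD is again an assumption). After this step the vector has unit Euclidean length, and any point with $|u_i| \ge 1$ contributes exactly zero.

```python
import numpy as np


def normalized_weighted(x):
    """Return the normalized, weighted deviations x~_i for a 1-D sequence x."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))                  # raw MAD (assumption: no 1.4826 factor)
    u = (x - med) / (9.0 * mad)
    w = (1.0 - u**2) ** 2 * ((1.0 - np.abs(u)) > 0)   # biweight weights
    a = (x - med) * w                                 # weighted deviations from the median
    return a / np.sqrt(np.sum(a**2))                  # scale to unit Euclidean norm


xt = normalized_weighted([1.0, 2.0, 3.0, 4.0, 100.0])
print(xt, np.sum(xt**2))
```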

Finally, we define biweight midcorrelation as,

\operatorname{bicor}(x,y)=\sum_{i=1}^{m}\tilde{x}_i\tilde{y}_i.
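Putting all the steps together yields a complete, if minimal, Python sketch of the definition. The names `_normalized` and `bicor` are illustrative only and are not the API of WGCNA or any other package; the sketch uses the raw MAD and does not handle $\operatorname{mad} = 0$. The usage example compares it with the Pearson correlation (`numpy.corrcoef`) on an exactly linear relationship in which one value has been replaced by an outlier.

```python
import numpy as np


def _normalized(v):
    """Normalized biweight-weighted deviations from the median (unit Euclidean norm)."""
    v = np.asarray(v, dtype=float)
    med = np.median(v)
    mad = np.median(np.abs(v - med))                  # raw MAD, no consistency factor
    u = (v - med) / (9.0 * mad)
    w = (1.0 - u**2) ** 2 * ((1.0 - np.abs(u)) > 0)   # zero weight where |u_i| >= 1
    a = (v - med) * w
    return a / np.sqrt(np.sum(a**2))


def bicor(x, y):
    """Biweight midcorrelation: sum over i of x~_i * y~_i."""
    return float(np.sum(_normalized(x) * _normalized(y)))


x = np.arange(1.0, 11.0)      # 1, 2, ..., 10
y = 2.0 * x + 1.0             # exact linear relationship
y_out = y.copy()
y_out[-1] = -100.0            # replace the last value with a gross outlier

print("bicor, clean:    ", bicor(x, y))
print("Pearson, outlier:", np.corrcoef(x, y_out)[0, 1])
print("bicor, outlier:  ", bicor(x, y_out))
```

On the clean data bicor is 1 up to rounding. With the outlier, the Pearson correlation drops to about -0.39 (the sign flips), while bicor stays at about 0.8, because the outlier's $|v_i|$ exceeds 1 and its weight is zero.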

Applications

Biweight midcorrelation has been shown to be more robust in evaluating similarity in gene expression networks,^[2] and is often used for weighted correlation network analysis.
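In network analysis the measure is usually needed for every pair of variables (for example, every pair of genes) at once. Because each normalized vector has unit length, a whole bicor matrix can be obtained as a single matrix product of the normalized rows. The sketch below assumes a hypothetical layout with one gene per row and one sample per column; the data are random and purely illustrative, and the function name `bicor_matrix` is not taken from any package.

```python
import numpy as np


def bicor_matrix(data):
    """Pairwise biweight midcorrelation between the rows of a 2-D array."""
    data = np.asarray(data, dtype=float)
    med = np.median(data, axis=1, keepdims=True)
    mad = np.median(np.abs(data - med), axis=1, keepdims=True)  # raw MAD per row
    u = (data - med) / (9.0 * mad)
    w = (1.0 - u**2) ** 2 * ((1.0 - np.abs(u)) > 0)
    a = (data - med) * w
    a /= np.sqrt(np.sum(a**2, axis=1, keepdims=True))           # unit-norm rows
    return a @ a.T                                               # entry (j, k) = bicor(row j, row k)


rng = np.random.default_rng(0)
expr = rng.normal(size=(4, 30))                            # hypothetical: 4 "genes" x 30 samples
expr[1] = 3.0 * expr[0] + rng.normal(scale=0.1, size=30)  # gene 1 closely tracks gene 0
sim = bicor_matrix(expr)
print(np.round(sim, 2))
```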

Implementations

Biweight midcorrelation is implemented in the R programming language as the function bicor in the WGCNA package.^[3]

It is also implemented in the Raku programming language as the function bi_cor_coef in the Statistics module.^[4]

Notes and References

Wilcox, Rand (2012-01-12). Introduction to Robust Estimation and Hypothesis Testing (3rd ed.). Academic Press. p. 455. ISBN 978-0123869838.
Song, Lin (2012-12-09). "Comparison of co-expression measures: mutual information, correlation, and model based indices". BMC Bioinformatics. 13: 328. doi:10.1186/1471-2105-13-328. PMID 23217028. PMC 3586947.
Langfelder, Peter (2018-04-06). "WGCNA: Weighted Correlation Network Analysis (an R package)". CRAN.
Khanal, Suman (2022-03-11). "Statistics: Raku module for doing statistics". GitHub.