Biweight midcorrelation

What is biweight midcorrelation

In statistics, biweight midcorrelation (also called bicor) is a measure of similarity between samples. It is median-based rather than mean-based, which makes it less sensitive to outliers, and it can serve as a robust alternative to other similarity metrics such as Pearson correlation or mutual information.

This correlation was suggested by the author of Weighted Gene Coexpression Network Analysis (WGCNA).

Derivation

$$\underline{x}, \underline{y}\in\mathbb{R}^{1\times m}$$

$$u_i=\frac{x_i-\operatorname{med}(x)}{9\operatorname{mad}(x)}$$
$$v_i=\frac{y_i-\operatorname{med}(y)}{9\operatorname{mad}(y)}$$

where,

$\operatorname{med}(x)$: median of $x$

$\operatorname{mad}(x)$: median absolute deviation of $x$

Define weights

$$w_i^{(x)}=(1-u_i^2)^2\,I(1-|u_i|)$$

$$w_i^{(y)}=(1-v_i^2)^2\,I(1-|v_i|)$$

where,

$I(x)=1$ if $x>0$, else $0$
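As a small worked example of these weights (the sample is made up for illustration and does not come from the source), take $x=(1,2,3,4,100)$:

$$\operatorname{med}(x)=3,\qquad \operatorname{mad}(x)=\operatorname{med}(2,1,0,1,97)=1$$

$$u=\left(-\tfrac{2}{9},\ -\tfrac{1}{9},\ 0,\ \tfrac{1}{9},\ \tfrac{97}{9}\right)$$

$$w^{(x)}=\left(\left(\tfrac{77}{81}\right)^2,\ \left(\tfrac{80}{81}\right)^2,\ 1,\ \left(\tfrac{80}{81}\right)^2,\ 0\right)\approx(0.904,\ 0.975,\ 1,\ 0.975,\ 0)$$

The value 100 has $|u_i|>1$, so the indicator sets its weight to zero and it contributes nothing to the correlation, while points near the median keep weights close to 1.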

Then normalize the weighted, median-centered values so that each vector has unit Euclidean norm (its squared entries sum to 1),

$$\tilde{x}_i=\frac{(x_i-\operatorname{med}(x))\,w_i^{(x)}}{\sqrt{\sum_{j=1}^m\left[(x_j-\operatorname{med}(x))\,w_j^{(x)}\right]^2}}$$

$$\tilde{y}_i=\frac{(y_i-\operatorname{med}(y))\,w_i^{(y)}}{\sqrt{\sum_{j=1}^m\left[(y_j-\operatorname{med}(y))\,w_j^{(y)}\right]^2}}$$

$$\operatorname{bicor}(x, y)=\sum_{i=1}^m\tilde{x}_i\tilde{y}_i$$

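Because the biweight midcovariance estimator is both resistant (changing a small part of the data, even by a large amount, does not change the estimate much) and robust of efficiency (it stays efficient across a variety of distributions), bicor is a robust statistic.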

Half-thresholding method (BMHT)

The soft thresholding method in WGCNA preserves the continuity of connectivity, but it is a poor fit when the correlation values contain a lot of noise. Ultimately, we want to compare two data sets, normal data and disease data, so that we can use the information from both.

Calculate the bicor values separately for the two data sets, one under the normal condition and one under the disease condition. Once the coefficients for all gene pairs have been computed in both data sets, apply the threshold to each pair twice, once per condition. If neither of the two correlation coefficients exceeds the threshold, the pair is a non-informative correlation pair.
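The filtering step can be sketched as follows. This is a hedged illustration: the function name `informative_pairs`, the dictionary layout keyed by gene pair, and the choice to compare absolute values against the threshold are assumptions made for the example, not details given in the text.

```python
def informative_pairs(cor_normal, cor_disease, threshold):
    """Keep gene pairs whose correlation passes the threshold in at least one condition.

    cor_normal and cor_disease map a (gene_a, gene_b) tuple to a bicor value.
    Returns a dict mapping each kept pair to (r_normal, r_disease).
    """
    kept = {}
    # Only pairs measured in both conditions can be compared.
    for pair in cor_normal.keys() & cor_disease.keys():
        r_n = cor_normal[pair]
        r_d = cor_disease[pair]
        # Assumption: strong negative correlations count as informative too,
        # so the absolute value is compared against the threshold.
        if abs(r_n) > threshold or abs(r_d) > threshold:
            kept[pair] = (r_n, r_d)
    return kept


if __name__ == "__main__":
    normal = {("g1", "g2"): 0.9, ("g1", "g3"): 0.1, ("g2", "g3"): -0.8}
    disease = {("g1", "g2"): 0.2, ("g1", "g3"): 0.3, ("g2", "g3"): 0.1}
    print(sorted(informative_pairs(normal, disease, threshold=0.5)))
```

A pair is discarded only when it fails the threshold in both conditions, so a link that is strong under one condition and weak under the other is retained for the comparison.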

After filtering out the non-informative correlation pairs, we can calculate the differential coexpression (dc) value between the two conditions using the following equation.

$$dc_i(\mathrm{BMHT})=\sqrt{\frac{(x_{i1}-y_{i1})^2+(x_{i2}-y_{i2})^2+\cdots+(x_{in}-y_{in})^2}{n}}$$

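Here $n$ is the number of coexpression partners of gene $i$ that remain after the non-informative correlation pairs have been filtered out, and $x_{ij}$ and $y_{ij}$ are the correlations between gene $i$ and its $j$-th remaining partner under the two conditions. The value is the root-mean-square coexpression change between a gene and its informative coexpression genes. We can then use the dc values to rank genes.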
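A minimal sketch of the dc calculation and ranking is shown below. The names `dc_bmht` and `rank_genes` are hypothetical, the input is assumed to be a dictionary of already filtered pairs mapping (gene_a, gene_b) to the two correlation values, and each undirected pair is assumed to count as a partner for both of its genes.

```python
import math
from collections import defaultdict


def dc_bmht(kept_pairs):
    """Compute the dc value of each gene from its informative pairs.

    kept_pairs maps (gene_a, gene_b) to (r_normal, r_disease).
    """
    squared_changes = defaultdict(list)
    for (gene_a, gene_b), (r_n, r_d) in kept_pairs.items():
        change = (r_n - r_d) ** 2
        # Assumption: an undirected pair contributes to both of its genes.
        squared_changes[gene_a].append(change)
        squared_changes[gene_b].append(change)
    # Square root of the mean squared change over the n informative partners.
    return {g: math.sqrt(sum(c) / len(c)) for g, c in squared_changes.items()}


def rank_genes(dc_values):
    """Return gene names ordered from largest to smallest dc value."""
    return sorted(dc_values, key=dc_values.get, reverse=True)


if __name__ == "__main__":
    kept = {
        ("g1", "g2"): (0.9, 0.1),
        ("g1", "g3"): (0.8, 0.8),
        ("g2", "g3"): (0.2, 0.7),
    }
    print(rank_genes(dc_bmht(kept)))
```

Genes with no informative pairs never appear in the result, which matches the idea that only informative links contribute to the ranking.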

References

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271563/
Biweight Midcovariance
https://www.itl.nist.gov/div898/software/dataplot/refman2/auxillar/biwmidc.htm
Biweight Scale
https://www.itl.nist.gov/div898/software/dataplot/refman2/auxillar/biwscale.htm