Biweight midcorrelation
What is biweight midcorrelation
In statistics, biweight midcorrelation (also called bicor) is a measure of similarity between samples. It is median-based rather than mean-based, so it is less sensitive to outliers and can serve as a robust alternative to other similarity metrics, such as Pearson correlation or mutual information.
This correlation measure was suggested by the author of Weighted Gene Coexpression Network Analysis (WGCNA).
Derivation
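For two vectors $x = (x_1, \dots, x_m)$ and $y = (y_1, \dots, y_m)$, define
$$u_i = \frac{x_i - \operatorname{med}(x)}{9\,\operatorname{mad}(x)}, \qquad v_i = \frac{y_i - \operatorname{med}(y)}{9\,\operatorname{mad}(y)}$$
where,
$\operatorname{med}(x)$: median of $x$
$\operatorname{mad}(x)$: median absolute deviation of $x$
About the constant factor 9…
Mosteller and Tukey suggest utilizing the MAD or interquartile range for preliminary analysis where moderate efficiency in diverse circumstances is satisfactory.
Define weights
$$w_i^{(x)} = \left(1 - u_i^2\right)^2 I\left(1 - |u_i|\right), \qquad w_i^{(y)} = \left(1 - v_i^2\right)^2 I\left(1 - |v_i|\right)$$
About $w_i^{(x)}$ (the same holds for $w_i^{(y)}$ with $y$ and $v_i$)
The weight goes to 1 if $x_i$ is near $\operatorname{med}(x)$
The weight goes to 0 if $x_i$ differs from $\operatorname{med}(x)$ by more than $9\,\operatorname{mad}(x)$
Element $i$ is an outlier when $|u_i| \ge 1$; its weight is then exactly 0
where,
$I(t) = 1$ if $t > 0$, else $0$
Then normalize the weighted deviations so that the sum of their squares is 1,
$$\tilde{x}_i = \frac{\left(x_i - \operatorname{med}(x)\right) w_i^{(x)}}{\sqrt{\sum_{j=1}^{m} \left[\left(x_j - \operatorname{med}(x)\right) w_j^{(x)}\right]^2}}, \qquad \tilde{y}_i = \frac{\left(y_i - \operatorname{med}(y)\right) w_i^{(y)}}{\sqrt{\sum_{j=1}^{m} \left[\left(y_j - \operatorname{med}(y)\right) w_j^{(y)}\right]^2}}$$
→ $$\operatorname{bicor}(x, y) = \sum_{i=1}^{m} \tilde{x}_i \tilde{y}_i$$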
Because the biweight midcovariance estimator has both resistance and robustness of efficiency, it is a robust statistic.
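The derivation can be sketched in a few lines of NumPy. This is a minimal illustration of the standard bicor definition (median-centred deviations, scaling by 9 times the MAD, biweight weights, unit-norm normalization), not the WGCNA implementation. The function names `_normalized_biweight` and `bicor` and the sample data are assumptions made for this example, and the case mad = 0 is simply rejected here.

```python
import numpy as np

def _normalized_biweight(x):
    """Return the biweight-weighted deviations of x from its median, scaled to unit norm."""
    x = np.asarray(x, dtype=float)
    dev = x - np.median(x)                     # x_i - med(x)
    mad = np.median(np.abs(dev))               # median absolute deviation
    if mad == 0:
        # Assumption for this sketch: refuse to divide by zero.
        raise ValueError("mad(x) is zero; the biweight is undefined")
    u = dev / (9.0 * mad)                      # scaled deviations u_i
    w = (1.0 - u**2) ** 2 * (np.abs(u) < 1.0)  # weight is 0 where |u_i| >= 1
    weighted = dev * w
    return weighted / np.sqrt(np.sum(weighted**2))

def bicor(x, y):
    """Biweight midcorrelation of two equal-length vectors."""
    return float(np.sum(_normalized_biweight(x) * _normalized_biweight(y)))

# Illustrative data: a perfect linear relation, then the same with one gross outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y_clean = 2.0 * x + 1.0
y_outlier = y_clean.copy()
y_outlier[-1] = 1000.0

print(bicor(x, y_clean))
print(bicor(x, y_outlier), np.corrcoef(x, y_outlier)[0, 1])
```

In the second call the outlying element receives weight 0 in `y_outlier`, so it contributes nothing to the sum, whereas the Pearson coefficient is pulled down by it.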
Half-thresholding method (BMHT)
The soft thresholding method in WGCNA preserves the continuity of connectivity, but it is not a good approach when many of the correlation values are noise. Ultimately, we want to examine two data sets, normal data and disease data, so that the information from both can be used.
Calculate the bicor values separately for the two data sets, one under the normal condition and one under the disease condition. After calculating all the pairs in each data set, apply the threshold to each pair twice, once per condition. If neither of the two correlation coefficients is greater than the threshold, the pair is a non-informative correlation pair.
After filtering out the non-informative correlation pairs, we can calculate the differential coexpression (dc) value for the two conditions using the following equation.
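The filtering rule can be illustrated with a small sketch. It assumes the two sets of coefficients are already available as square gene-by-gene matrices. The name `informative_pairs`, the example matrices, and the threshold 0.5 are hypothetical. The comparison follows the text literally (coefficient greater than threshold); comparing absolute values instead would be a variation not stated in the source.

```python
import numpy as np

def informative_pairs(cor_normal, cor_disease, threshold):
    """Boolean mask that is True where at least one coefficient exceeds the threshold."""
    cor_normal = np.asarray(cor_normal, dtype=float)
    cor_disease = np.asarray(cor_disease, dtype=float)
    # A pair is non-informative only if it fails the threshold in both conditions.
    return (cor_normal > threshold) | (cor_disease > threshold)

# Hypothetical coefficients for three genes under each condition.
cor_normal = np.array([[1.0, 0.9, 0.1],
                       [0.9, 1.0, 0.2],
                       [0.1, 0.2, 1.0]])
cor_disease = np.array([[1.0, 0.1, 0.1],
                        [0.1, 1.0, 0.8],
                        [0.1, 0.8, 1.0]])

mask = informative_pairs(cor_normal, cor_disease, threshold=0.5)
print(mask)
```

Here the pair of genes 0 and 2 is below the threshold in both conditions, so it is the only off-diagonal pair marked non-informative.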
In this equation, the normalizing term is the number of pairs left in the module after the non-informative correlation pairs have been filtered out. The dc value is thus the average coexpression change between a gene and its informative coexpression genes. The dc values can then be used to rank genes.
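The ranking step can be sketched as follows. The exact dc equation is not reproduced in the text, so this example assumes a root-mean-square form: for each gene, the square root of the mean squared difference between its coefficients in the two conditions, taken over its informative pairs only. That form, the function name `dc_values`, and the example data are all assumptions for illustration.

```python
import numpy as np

def dc_values(cor_normal, cor_disease, threshold):
    """Per-gene dc value under an ASSUMED root-mean-square form (not taken from the source)."""
    a = np.asarray(cor_normal, dtype=float)
    b = np.asarray(cor_disease, dtype=float)
    keep = (a > threshold) | (b > threshold)      # informative pairs
    np.fill_diagonal(keep, False)                 # ignore each gene's pair with itself
    sq_diff = np.where(keep, (a - b) ** 2, 0.0)
    k = keep.sum(axis=1)                          # informative pairs per gene
    dc = np.zeros(a.shape[0])
    has_pairs = k > 0                             # genes with no informative pair keep dc = 0
    dc[has_pairs] = np.sqrt(sq_diff.sum(axis=1)[has_pairs] / k[has_pairs])
    return dc

# Hypothetical coefficients for three genes under each condition.
cor_normal = np.array([[1.0, 0.9, 0.1],
                       [0.9, 1.0, 0.2],
                       [0.1, 0.2, 1.0]])
cor_disease = np.array([[1.0, 0.1, 0.1],
                        [0.1, 1.0, 0.8],
                        [0.1, 0.8, 1.0]])

dc = dc_values(cor_normal, cor_disease, threshold=0.5)
ranking = np.argsort(-dc)   # gene indices, largest coexpression change first
print(dc, ranking)
```

Dividing by the per-gene count of informative pairs, rather than by the total number of genes, keeps genes with few surviving pairs comparable to genes with many.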
References