Many methods of identifying differentially expressed genes test, for each gene, the null hypothesis of equal mean expression across two groups, even though a difference in means does not imply a difference in the centers of the distributions. As a result, many genes may be declared differentially expressed when their expression distributions differ only in the tails. A more conservative approach is to test specifically whether the distributions differ in a location parameter that does not depend on the tails. This can be accomplished by bootstrapping outlier-rejecting estimators of location. Genes identified as differentially expressed can then be used in classification.
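The bootstrap comparison described above can be illustrated with a minimal sketch. The abstract does not specify which outlier-rejecting estimator is used, so the sample median stands in for it here as an assumption; the function name `bootstrap_location_diff`, the percentile interval, and the defaults for `n_boot` and `alpha` are likewise hypothetical choices made only for illustration.

```python
import numpy as np

def bootstrap_location_diff(x, y, estimator=np.median, n_boot=2000,
                            alpha=0.05, seed=0):
    """Percentile-bootstrap interval for estimator(x) - estimator(y).

    The median is an assumed stand-in for the outlier-rejecting
    location estimator; any robust estimator can be passed instead.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample each group independently, with replacement.
        xb = rng.choice(x, size=x.size, replace=True)
        yb = rng.choice(y, size=y.size, replace=True)
        diffs[b] = estimator(xb) - estimator(yb)
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    observed = estimator(x) - estimator(y)
    # Flag the gene only if the interval excludes zero.
    significant = not (lo <= 0.0 <= hi)
    return observed, (lo, hi), significant
```

With this sketch, two groups that differ only by a single extreme value in one tail have clearly different means, yet the interval for the difference in medians would be expected to contain zero, so the gene is not flagged.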
In distinguishing microarrays from patients with different types of leukemia, many more genes were found to differ in their mean expression values than in their central values. The data were preprocessed using a transform that approaches a logarithmic transform for large intensities but a linear transform for small intensities. This avoids the effect of spurious ratios of small intensities, and negative AD values did not have to be arbitrarily truncated.
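The abstract does not name the transform. As an illustrative assumption only, the inverse hyperbolic sine has the stated limiting behavior: it is approximately linear near zero, approaches ln(2x) for large x, and is defined for negative inputs. The function name `log_linear_transform` and the `scale` parameter are hypothetical.

```python
import numpy as np

def log_linear_transform(intensity, scale=1.0):
    """Assumed example of a log-like transform: arcsinh(x / scale).

    Behaves like x / scale for small |x| and like log(2 * x / scale)
    for large x, and accepts negative values without truncation.
    """
    x = np.asarray(intensity, dtype=float)
    return np.arcsinh(x / scale)
```

Because the function is odd, a negative intensity maps to the negative of the transformed positive intensity, so no floor or offset has to be imposed before transforming.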