Abstract:
|
Developments in genomics, i.e. the analysis of the entire genome, have led to many difficult statistical problems. Two of these will be discussed. The first is testing whether a comparatively short segment is significantly similar to some segment of DNA in the vast databanks, which cover many species. One commonly used method for doing this is a BLAST search. The theory behind such a search draws on standard statistical testing theory as well as the theory of generalized random walks. An outline of the statistical aspects of these searches will be presented. The second problem relates to the analysis of expression array data.
These data describe the extent to which the genes in a genome are expressed in some tissue, and a frequently asked question is whether a gene is expressed significantly differently in normal and diseased tissue. Because data are available for thousands of genes, multiple testing problems arise, and stringent approaches are required if the genome-wide Type I error is to be controlled at a low level. Some approaches do not attempt to control this error and rely instead on assessing the false discovery rate. Aspects of these issues will be discussed.
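
The contrast between controlling the genome-wide Type I error and working with the false discovery rate can be illustrated with a minimal sketch. It is not taken from the talk: it assumes a Bonferroni correction as one example of a stringent approach and the Benjamini-Hochberg step-up procedure (whose usual guarantee assumes independent or positively dependent tests) as one example of a false discovery rate approach. The function names and the p-values are hypothetical, made up purely for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject gene i if p_i <= alpha / m, which bounds the family-wise Type I error by alpha."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure targeting a false discovery rate of alpha."""
    m = len(p_values)
    # Indices of the genes, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that the k-th smallest p-value is <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for eight genes (illustrative only, not real data).
example_p_values = [0.001, 0.008, 0.012, 0.03, 0.2, 0.5, 0.7, 0.9]

n_bonferroni = sum(bonferroni(example_p_values))
n_bh = sum(benjamini_hochberg(example_p_values))
print("Bonferroni rejections:", n_bonferroni)
print("Benjamini-Hochberg rejections:", n_bh)
```

On these made-up values the Bonferroni threshold is 0.05 / 8 = 0.00625, so only the smallest p-value is declared significant, whereas the step-up procedure declares the three smallest significant. With thousands of genes the gap between the two kinds of approach is typically far larger.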
|