Abstract:
It has been suggested that the future of identifying disease genes resides in association studies, in which thousands of SNPs will be typed in well-matched case and control populations. Making this feasible will require highly efficient methods for SNP typing and scoring. Typically, typing a single SNP yields two quantitative measurements, reflecting the presence or absence of the first and second bases (say A and T), respectively. Thresholds are then set to decide when a base is considered present. However, analyzing these two quantities separately can be highly inefficient. It makes more sense to treat genotype calling as a classification problem in which each individual falls into one of four classes: AA, AT, TT, or failure. We present a Bayesian classifier based on an underlying mixture model that incorporates the prior information available on the measurements to achieve accurate classification. As a by-product, the method yields posterior probabilities that can be used to improve scoring accuracy. The method generalizes easily to more than one measurement per base and can be applied to a variety of typing methods.
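The abstract does not specify the form of the mixture model. As a minimal sketch of the general idea, assume each of the four classes generates the two measurements (A signal, T signal) as independent Gaussians with known means, standard deviations, and prior class probabilities. Bayes' rule then gives a posterior probability for each class, and the genotype call is the class with the largest posterior. This is a generic Gaussian-mixture classifier with fixed, hypothetical parameters, not the authors' model; the names `GenotypeClass`, `posteriors`, and `classify`, and all numeric values, are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GenotypeClass:
    """Hypothetical class model: independent Gaussians on the two measurements."""
    name: str
    prior: float                 # assumed prior probability of this class
    mean: tuple[float, float]    # assumed mean of (A signal, T signal)
    sd: tuple[float, float]      # assumed standard deviations of the two signals


def log_normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Log density of a univariate normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)


# Illustrative parameters only (assumption); in practice they would be
# estimated from the data, possibly informed by prior information.
CLASSES = [
    GenotypeClass("AA", 0.30, (1.0, 0.1), (0.15, 0.10)),
    GenotypeClass("AT", 0.40, (0.6, 0.6), (0.15, 0.15)),
    GenotypeClass("TT", 0.25, (0.1, 1.0), (0.10, 0.15)),
    GenotypeClass("failure", 0.05, (0.05, 0.05), (0.10, 0.10)),
]


def posteriors(a_signal: float, t_signal: float,
               classes: list[GenotypeClass] = CLASSES) -> dict[str, float]:
    """Bayes' rule: P(class | x) is proportional to P(class) * p(x | class)."""
    log_joint = {
        c.name: math.log(c.prior)
        + log_normal_pdf(a_signal, c.mean[0], c.sd[0])
        + log_normal_pdf(t_signal, c.mean[1], c.sd[1])
        for c in classes
    }
    # Normalize with the log-sum-exp trick for numerical stability.
    m = max(log_joint.values())
    total = sum(math.exp(v - m) for v in log_joint.values())
    return {name: math.exp(v - m) / total for name, v in log_joint.items()}


def classify(a_signal: float, t_signal: float,
             classes: list[GenotypeClass] = CLASSES) -> tuple[str, float]:
    """Return the maximum-posterior class and its posterior probability."""
    post = posteriors(a_signal, t_signal, classes)
    call = max(post, key=post.get)
    return call, post[call]


if __name__ == "__main__":
    for a, t in [(0.95, 0.12), (0.6, 0.55), (0.1, 0.9), (0.05, 0.02)]:
        call, p = classify(a, t)
        print(f"A={a:.2f} T={t:.2f} -> {call} (posterior {p:.4f})")
```

Because the classifier returns a posterior for every call, a low maximum posterior could, for example, be used to flag samples for re-typing or manual review, which is one way posterior probabilities can feed back into scoring accuracy. Under the independence assumption in this sketch, additional measurements per base would simply add further log-density terms to each class's joint log probability.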