Abstract:
|
With the introduction of large genetic studies that interrogate many thousands of positions across the human genome, numerous markers have produced more than the three genotype clusters expected from biallelic SNPs. Some complex genetic structures, such as copy number variation (CNV) and multi-allelic markers, are known to produce data with features beyond the standard three-cluster architecture, but other patterns of unknown cause also appear consistently in human genetic data. Such markers must first be recognized as having more than three clusters, and each cluster must then be located correctly, even though clusters may take any size, shape, or position in the genotyping space. We have developed a clustering algorithm that addresses both problems while retaining the flexibility needed to handle clusters of unknown size, shape, and position.
|