Abstract:
|
Imputing genotypes or haplotypes can be slow and can produce genotype sets that are impossible among family members. We improve upon the imputation method of Chi et al., which employs a fast majorization-minimization algorithm based on matrix completion and linkage disequilibrium between neighboring genotypes, by incorporating family relationships. This leads to fewer such inconsistencies with similar accuracy and speed. We explore several approaches. The first varies and tunes a mathematical penalty on rank. The second fixes the rank for a given iterate and compares the results obtained from various fixed ranks. Both can include haplotype reference panels for improved imputation. To improve accuracy through Mendelian error correction and haplotype phasing, we develop pre- and post-processing tools that project to the nearest result consistent with the reference. Pre-haplotyping provides a quick imputation approximation that improves speed and accuracy, while post-haplotyping improves accuracy. The imputed haplotypes can be used as covariates in a regression framework to infer genome regions associated with a disease or trait, thus allowing for potential interactions between adjacent genetic structures.
|