Friday, February 16
PS2 Poster Session 2 and Refreshments
Fri, Feb 16, 5:15 PM - 6:30 PM
Machine Learning Methods for Predicting Zygosity (303620)
Keywords: twin studies, machine learning, classification models
Zygosity is the characterization of twinning in terms of the combination of alleles for particular hereditary traits. The degree of genetic similarity within each twin pair is commonly classified as either monozygotic (identical) or dizygotic (fraternal). The ability to estimate twin zygosity using self-report methods is important in research because DNA-based zygosity testing is often cost-prohibitive for a large sample of twins. Using questionnaire data and DNA-based zygosity results for 787 twin pairs from the Washington State Twin Registry, we developed five algorithms for predicting zygosity. Models were trained on 70% of the data and tested on the remaining 30% to determine accuracy. Compared with a logistic regression model, which correctly classified only 76% of dizygotic twins, the machine learning methods significantly improved accuracy. The random forest model performed best, correctly classifying 94% of dizygotic twins and 96% of monozygotic twins. Some twin pairs remain difficult to classify because of differences in each twin's perception of similarity. Accurate classification models ensure that researchers are analyzing twin populations correctly.
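
The evaluation described above (a 70/30 hold-out split, then accuracy reported separately for each zygosity class) can be illustrated with a minimal Python sketch. This is not the authors' code. The helper names `split_train_test` and `per_class_accuracy`, the random seed, and the eight toy labels are all hypothetical and chosen only for illustration. The abstract does not say how the split was drawn, so a simple random shuffle is assumed here.

```python
import random
from collections import Counter


def split_train_test(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def per_class_accuracy(y_true, y_pred):
    """Return, for each true class, the fraction predicted correctly."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


# 787 placeholder pair IDs, matching only the sample size in the abstract.
pair_ids = list(range(787))
train_ids, test_ids = split_train_test(pair_ids)

# Toy labels only ("MZ" = monozygotic, "DZ" = dizygotic); not registry data.
dna_result = ["MZ", "MZ", "MZ", "MZ", "DZ", "DZ", "DZ", "DZ"]
predicted = ["MZ", "MZ", "MZ", "DZ", "DZ", "DZ", "MZ", "MZ"]
scores = per_class_accuracy(dna_result, predicted)
```

Reporting accuracy per class, as the abstract does, shows whether a model is weaker on one zygosity group. A single pooled accuracy figure could hide that difference.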