Abstract:
|
De novo mutations in probands (the first affected family member) are one of the main causes of Primary Amenorrhea and are very useful in the diagnosis and treatment of this disease. However, when parental data are unavailable, exome sequencing alone cannot distinguish de novo mutations from inherited variants. To separate the two, we started with a fully labeled (de novo or inherited) Autism Spectrum Disorder (ASD) dataset of 2317 trios. We identified 4 features that are informative about de novo mutations and built a classification model using methods for imbalanced datasets. We also propose a new 'retraining' (or transfer learning) method for imbalanced data to make the model fit the Primary Amenorrhea dataset better. Using this model, we found 66 possible de novo Loss of Function mutations and 230 possible de novo missense3 mutations among 7001 rare variants from 100 probands. These results agree well with the labelled part of the Primary Amenorrhea data. Based on them, we also obtained a risk ranking of genes using the TADA-denovo function, which was proposed by Xin He et al. in 2013.
|