Online Program

Friday, May 31
Machine Learning
Machine Learning E-Posters, II
Fri, May 31, 3:00 PM - 4:00 PM
Grand Ballroom Foyer

Statistical Approaches for Identifying Untargeted Metabolites Prognostic for Kidney Disease Progression in Type 2 Diabetic Patients: Application to the Chronic Renal Insufficiency Cohort Study (305178)

Manjula Darshi, Department of Medicine, University of Texas Health Science Center at San Antonio 
Tobias Fuhrer, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland 
Brian Kwan, UCSD Moores Cancer Center 
Daniel Montemayor, Department of Medicine, University of Texas Health Science Center at San Antonio 
Loki Natarajan, Department of Family Medicine and Public Health, University of California, San Diego 
Kumar Sharma, Department of Medicine, University of Texas Health Science Center at San Antonio 
*Jing Zhang, UCSD Moores Cancer Center 

Keywords: Type 2 diabetes, filtering, Lasso, random forest

Background: Diabetic kidney disease (DKD) is a major comorbidity of type 2 diabetes (T2DM). There is an urgent need to identify novel biomarkers that can reliably predict future DKD.

Sample: Urine samples from 995 T2DM CRIC patients and 198 quality control (QC) samples were assayed in duplicate for relative metabolite abundance, yielding 15,434 untargeted features (1,899 annotated).

Data Processing: We developed stringent filtering criteria to eliminate noisy features. Using the technical duplicates of the QC samples, we computed Spearman and Pearson correlations (QC CC), the intraclass correlation (QC ICC), and the coefficient of variation (QC CV) for each metabolite. Using samples from the 995 subjects, we calculated an intraclass correlation (CRIC ICC) for each metabolite. Metabolites with low reliability (QC CC < 0.85, QC ICC = 0.05, QC CV = 0.05) or low biological variation (CRIC ICC < 0.35) were excluded.

Statistical Modeling: After filtering, we fit prognostic models for kidney function decline (defined as eGFR slope) using penalized (Lasso) and machine-learning (random forest) models, with metabolites and clinical predictors (age, gender, race, smoking, baseline BMI, blood pressure, HbA1c, eGFR, and albuminuria) as inputs. The models with the lowest prediction error were further evaluated on the time-to-ESRD outcome using the C-statistic. Five-fold cross-validation was repeated 100 times to obtain the median and 95% CI of the C-statistic.

Results: The sample was 56% male, with mean (SD) age 59.9 (9.4) years, eGFR 40.6 (11.2) mL/min/1.73 m², HbA1c 7.6 (1.5)%, and annual eGFR slope -1.8 (1.9). After filtering, approximately 2,000 reliably measured features remained (700 annotated). The eGFR slope models selected 9 to 122 features, depending on the Lasso penalty and the random forest variable importance metric. The best ESRD model, with 20 metabolites and 9 clinical factors, had a median (95% CI) C-statistic of 0.85 (0.85, 0.86).

Conclusion: Modern statistical methods applied to untargeted metabolomics can reveal novel insights into DKD.