All Times EDT

Abstract Details

Activity Number: 231 - SPEED: SPAAC SESSION I
Type: Topic-Contributed
Date/Time: Wednesday, August 11, 2021 : 10:00 AM to 11:50 AM
Sponsor: Section on Statistics in Epidemiology
Abstract #318822
Title: Machine Learning Algorithm for Diabetes Prediction Using Social Risk Factors in a Nationally Representative Data Set
Author(s): Srikanta Banerjee* and Matthew K Jones
Companies: Walden University and Northwest Emergent Solutions
Keywords: NHANES; Surveillance; Social Disparities; Diabetes; Equity
Abstract:

Diabetes is a major public health problem that significantly affects vulnerable populations. Social determinants of health are critical to understanding the barriers that disadvantaged groups face. While the predictors of diabetes are well established, the relative importance of each sociodemographic risk factor is unclear. We propose a machine learning (ML)-based system for predicting diabetes using a nationally representative sample. We used NHANES 2009-10 data to study cross-sectional associations between diabetes and social (age, income level, education, marital status, race, employment status) and health (smoking status and depression) risk factors. We applied several machine learning algorithms, training each classifier on 80% of the data and testing it on the remaining 20%. Compared with naïve Bayes (NB) and decision tree (DT) classifiers, random forest predicted diabetes with superior accuracy. Age had the strongest association with diabetes; income and education had the second strongest. While age was expected to be a strong predictor, income level also had high relative importance. Socioeconomic status should be incorporated into disease prediction.
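
The workflow described in the abstract (an 80/20 train/test split, a comparison of NB, DT, and random forest classifiers, and a ranking of predictors by relative importance) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn and NumPy, the feature names and numeric codings are hypothetical, the data are randomly generated placeholders rather than NHANES records, the stratified split is an added assumption, and NHANES survey weights are ignored.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical column names mirroring the risk factors listed in the abstract.
FEATURES = ["age", "income_level", "education", "marital_status",
            "race", "employment_status", "smoking_status", "depression"]


def make_synthetic_data(n_rows=2000, seed=0):
    """Generate placeholder data. These are NOT NHANES values."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.integers(20, 80, n_rows),  # age in years
        rng.integers(1, 6, n_rows),    # income level (ordinal code)
        rng.integers(1, 6, n_rows),    # education (ordinal code)
        rng.integers(0, 3, n_rows),    # marital status (category code)
        rng.integers(0, 5, n_rows),    # race (category code)
        rng.integers(0, 2, n_rows),    # employment status (0/1)
        rng.integers(0, 2, n_rows),    # smoking status (0/1)
        rng.integers(0, 2, n_rows),    # depression (0/1)
    ]).astype(float)
    # Arbitrary outcome model, chosen only so the classifiers have a signal.
    logit = (0.06 * (X[:, 0] - 50) - 0.4 * (X[:, 1] - 3)
             - 0.3 * (X[:, 2] - 3) - 1.5)
    y = (rng.random(n_rows) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
    return X, y


def compare_classifiers(X, y, seed=0):
    """Fit NB, DT, and RF on an 80/20 split and rank RF feature importances."""
    # 80% training, 20% test; stratifying on the outcome is an assumption.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=seed)

    models = {
        # Gaussian NB treats the category codes as numeric, a simplification.
        "NB": GaussianNB(),
        "DT": DecisionTreeClassifier(random_state=seed),
        "RF": RandomForestClassifier(n_estimators=200, random_state=seed),
    }

    accuracy = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        accuracy[name] = accuracy_score(y_test, model.predict(X_test))

    # Impurity-based importances from the fitted forest, largest first.
    importances = models["RF"].feature_importances_
    ranking = sorted(zip(FEATURES, importances),
                     key=lambda pair: pair[1], reverse=True)
    return accuracy, ranking, len(X_test)


if __name__ == "__main__":
    X, y = make_synthetic_data()
    accuracy, ranking, n_test = compare_classifiers(X, y)
    print("test rows:", n_test)
    for name, acc in accuracy.items():
        print(f"{name}: accuracy = {acc:.3f}")
    for feature, importance in ranking:
        print(f"{feature}: importance = {importance:.3f}")
```

Because the data are synthetic, the printed accuracies and importance ranking say nothing about the study's findings; the sketch only shows how such a comparison could be organized.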


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program