Online Program

All Times ET

Thursday, June 3
Practice and Applications
Data-Driven Healthcare
Thu, Jun 3, 1:10 PM - 2:45 PM

Identification of latent relationships between disability rates and socio-geographic variables in veterans utilizing Machine Learning methods (309781)

*Gina McKernan, University of Pittsburgh 

Keywords: Socio-geographic, machine learning, disability, veterans, American Community Survey, CHAID, Neural networks

Intro: This study evaluated how well household-level and census-tract-level sociodemographic and health status variables predict veteran disability status, using data from the American Community Survey (ACS) and the CDC 500 Cities Project (CDC). We were particularly interested in the classification accuracy of machine learning (ML) models: neural networks (NN), Chi-Square Automatic Interaction Detector (CHAID) decision trees, and Bayesian Network Analysis (BNA).

Methods: The two independent data sets were obtained from the ACS and the CDC. After data transformations and manipulations, the two data sets were merged by census tract using SAS 9.4. Disability status was coded as binary (yes/no). Feature selection was performed on the combined data set in SPSS Modeler to identify a subset of related features. NN, BNA, and CHAID decision tree models were constructed iteratively, and coincidence matrices were obtained to compare the predicted vs. actual disability classification for each model.

Results: Of the 2,568,009 individuals in the data set, we identified 170,474 veterans; 17.6% (n=29,958) were flagged as having a disability and 82.4% (n=140,516) reported no disability. Of the 513 total features, 117 were included in model building. CHAID, NN, and BNA each correctly predicted disability status at least 80% of the time. A BNA that combined household-level variables (household income, employment, work experience, type of health insurance, and number of vehicles) with geographic health status variables (such as the per capita incidence of prescribed high blood pressure medication and adherence to annual physical and dental checkups) resulted in 81.21% correct classifications of disability status.

Conclusion: The use of ML methods and advanced statistical techniques to combine, manipulate, and examine variable interactions across datasets containing both socio-geographic and disability data provides a level
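As an illustration only, the census-tract merge and the coincidence matrix described in the Methods can be sketched in plain Python. This is not the authors' SAS or SPSS Modeler workflow. The field names (`tract`, `disability`, `bp_med_rate`), the toy records, and the predicted labels are all hypothetical and chosen purely for demonstration.

```python
from collections import Counter


def merge_by_tract(household_rows, tract_rows):
    """Inner-join household records to tract-level records on the 'tract' key."""
    tract_lookup = {row["tract"]: row for row in tract_rows}
    merged = []
    for row in household_rows:
        tract_info = tract_lookup.get(row["tract"])
        if tract_info is not None:  # drop households with no tract-level match
            merged.append({**tract_info, **row})
    return merged


def coincidence_matrix(actual, predicted):
    """Count (actual, predicted) pairs for a binary 0/1 outcome."""
    counts = Counter(zip(actual, predicted))
    return {(a, p): counts.get((a, p), 0) for a in (0, 1) for p in (0, 1)}


def accuracy(matrix):
    """Share of cases on the diagonal (correct classifications)."""
    total = sum(matrix.values())
    correct = matrix[(0, 0)] + matrix[(1, 1)]
    return correct / total if total else 0.0


# Toy, made-up records: disability coded as binary (1 = yes, 0 = no).
household = [
    {"tract": "A", "disability": 1},
    {"tract": "A", "disability": 0},
    {"tract": "B", "disability": 0},
    {"tract": "C", "disability": 1},  # tract "C" has no tract-level row
]
tract = [
    {"tract": "A", "bp_med_rate": 0.31},
    {"tract": "B", "bp_med_rate": 0.22},
]

merged = merge_by_tract(household, tract)
actual = [row["disability"] for row in merged]
predicted = [1, 0, 1]  # hypothetical model output, one label per merged row

matrix = coincidence_matrix(actual, predicted)
print(matrix)
print(accuracy(matrix))
```

The diagonal cells of the matrix, (0, 0) and (1, 1), are the correct classifications. Their share of all cases is the kind of "percent correct" figure reported in the Results.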