Online Program

Tuesday, January 7
Tue, Jan 7, 9:00 AM - 10:45 AM
West Coast Ballroom
Statistical Learning Methods for Health Care Innovation

WITHDRAWN - Using Machine Learning to Identify Factors Affecting Neighborhood Cardiovascular Health (307820)

*Yan Li, The New York Academy of Medicine 

Keywords: Machine Learning, Identify Factors, Affecting, Neighborhood Cardiovascular Health

Background

Cardiovascular disease (CVD) is the leading cause of death in the United States, accounting for one in every four deaths. The burden of CVD is particularly high among older adults, posing a serious threat to healthy aging. Adopting healthy behaviors and effective prevention strategies is important for reducing CVD risk. In addition, neighborhood sociodemographic characteristics and the social environment play an important role in shaping individuals’ cardiovascular health. However, little research has examined how unhealthy behaviors and prevention measures relate to cardiovascular health at the neighborhood level. This study aims to fill that gap by using a machine learning approach to explore the complex relationship between CVD and its related health behaviors and prevention measures at the neighborhood level.

Methods

The study sample came from the 2017 500 Cities data, which include the prevalence of health behaviors, prevention measures, and chronic disease measures for 28,004 census tracts covering about one-third of the population of the United States. We merged sociodemographic data from the 2011-2015 American Community Survey (ACS) 5-Year Estimates with the 500 Cities data by matching census tracts. We used random forest, a widely used machine learning approach, to identify and rank important predictors of neighborhood cardiovascular health outcomes. We focused on two outcomes: the neighborhood prevalence of coronary heart disease (CHD) and of stroke. For each analysis, 500 trees were generated with 6 variables tried at each split. We ranked all predictor variables by mean decrease in the Gini index, a commonly used measure of variable importance.
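The forest configuration described above can be sketched as follows. This is a minimal illustration only: the abstract does not name the software used, so scikit-learn is an assumption, and the data, predictor names, and outcome below are synthetic stand-ins rather than the 500 Cities or ACS variables. Note also that scikit-learn's impurity-based importance for a regression forest is based on variance reduction, so it is an analogue of, not the same quantity as, the mean decrease in Gini named in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for a tract-level table; NOT the 500 Cities or ACS data.
rng = np.random.default_rng(0)
n_tracts = 400
feature_names = [f"predictor_{i}" for i in range(12)]  # hypothetical names
X = rng.normal(size=(n_tracts, len(feature_names)))

# Hypothetical outcome driven mostly by the first two predictors.
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n_tracts)

forest = RandomForestRegressor(
    n_estimators=500,  # 500 trees, as stated in the abstract
    max_features=6,    # 6 candidate variables tried at each split
    oob_score=True,    # out-of-bag R^2 as a rough "variance explained" figure
    random_state=0,
)
forest.fit(X, y)

# Impurity-based importance: the impurity decrease attributed to each variable,
# averaged over trees and normalized to sum to 1.
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, score in ranking[:5]:
    print(f"{name}: {score:.3f}")
print(f"Out-of-bag R^2: {forest.oob_score_:.3f}")
```

Fitting one such forest per outcome (one for CHD prevalence, one for stroke prevalence) and reading off the top of `ranking` would mirror the two analyses described in this section.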

Results

Our analysis showed that demographics, health behaviors, and prevention measures explained the vast majority of the variance: 93.2% for CHD and 96.0% for stroke. For CHD prevalence, the top five predictors, in order of importance, were the prevalence of taking medicine for high blood pressure control, binge drinking, being aged 65 years or older, lack of leisure-time physical activity, and obesity. For stroke prevalence, the top five predictors, in order of importance, were the prevalence of obesity, lack of leisure-time physical activity, taking medicine for high blood pressure control, being Black, and binge drinking.

Conclusion

Machine learning has the potential to help public health practitioners, researchers, and policy makers identify neighborhood-level hotspots and design effective and efficient programs and interventions to improve cardiovascular health and achieve healthy aging.