All Times EDT

Abstract Details

Activity Number: 246 - Data Science
Type: Contributed
Date/Time: Wednesday, August 11, 2021 : 10:00 AM to 11:50 AM
Sponsor: Section on Statistical Computing
Abstract #318230
Title: Statistical Analysis on Factors Influencing Life Expectancy
Author(s): Meichen Huang* and Akash Roy
Companies: The University of Texas at Dallas and Duke University
Keywords: Life Expectancy; Missing Imputation; Multiple Linear Regression; Best subset regression model; Accuracy measures
Abstract:

Over the past 15 years, there has been tremendous progress in reducing human mortality, and living a healthier and longer life has become an emphasis of the modern lifestyle. The dataset, taken from the WHO data repository, documents 22 health-related factors across 193 countries that relate to life expectancy. The primary purpose of this research is to identify the factors that contribute to life expectancy by constructing a multiple linear regression model. Around 83% of the data come from developing countries. The prediction models for life expectancy are reasonably more accurate for the developing countries than for the developed countries. Once the outliers were identified and removed, we proceeded with three subset selection methods to establish the regression model: Forward Selection, Backward Selection, and Best Subset Selection. For model accuracy, RMSE, R², and MAE are used to compare three developed and three developing countries. Several machine learning methods, e.g., decision tree and random forest, are also applied and compared for prediction.
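
The abstract does not give implementation details, so the following is only a minimal sketch of how best subset selection and the three accuracy measures (RMSE, MAE, R²) might fit together. It is not the authors' code. The data are synthetic rather than the WHO dataset, all function and variable names (`fit_ols`, `metrics`, `best_subset`) are hypothetical, and ranking subsets by held-out RMSE is an assumption, since the abstract does not state the selection criterion.

```python
import itertools
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(X, y, cols):
    """Least-squares fit of y on an intercept plus the chosen columns."""
    Z = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(cols) + 1
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(ZtZ, Zty)  # normal equations: (Z'Z) beta = Z'y


def predict(beta, X, cols):
    return [beta[0] + sum(b * row[c] for b, c in zip(beta[1:], cols)) for row in X]


def metrics(y, yhat):
    """Return RMSE, MAE, and R^2 for observed y and predicted yhat."""
    n = len(y)
    resid = [a - b for a, b in zip(y, yhat)]
    sse = sum(r * r for r in resid)
    ybar = sum(y) / n
    sst = sum((a - ybar) ** 2 for a in y)
    return {"rmse": math.sqrt(sse / n),
            "mae": sum(abs(r) for r in resid) / n,
            "r2": 1.0 - sse / sst}


def best_subset(X_tr, y_tr, X_te, y_te, max_size):
    """Try every predictor subset up to max_size; keep the lowest held-out RMSE."""
    best = None
    for size in range(1, max_size + 1):
        for cols in itertools.combinations(range(len(X_tr[0])), size):
            beta = fit_ols(X_tr, y_tr, cols)
            m = metrics(y_te, predict(beta, X_te, cols))
            if best is None or m["rmse"] < best[1]["rmse"]:
                best = (cols, m)
    return best


# Synthetic stand-in data (NOT the WHO dataset): 4 candidate predictors,
# of which only columns 0 and 2 actually drive the response.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [70.0 + 3.0 * r[0] - 2.0 * r[2] + rng.gauss(0, 0.5) for r in X]

cols, scores = best_subset(X[:150], y[:150], X[150:], y[150:], max_size=4)
print("selected columns:", cols)
print("held-out scores:", scores)
```

Exhaustive search fits 2^p - 1 models for p predictors, which is why forward and backward selection are common cheaper alternatives when the number of candidate factors is large.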


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program