Abstract:
Over the past 15 years, there has been tremendous progress in reducing human mortality. The goal of living a healthier and longer life has been emphasized and built into modern lifestyles. The dataset, drawn from the WHO data repository, records 22 health-related factors associated with life expectancy across 193 countries. The primary purpose of this research is to identify the factors that contribute to life expectancy by constructing a multiple linear regression model. Around 83% of the data comes from developing countries, and the life expectancy prediction models are noticeably more accurate for developing countries than for developed countries. After identifying and removing outliers, we use three subset selection methods to build the regression model: forward selection, backward selection, and best subset selection. RMSE, R², and MAE serve as accuracy measures to compare the three models fitted for developed countries with the three fitted for developing countries. Several machine learning methods, such as decision trees and random forests, are also applied and compared for prediction.
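For readers unfamiliar with forward selection and the RMSE, R², and MAE accuracy measures, the following is a minimal pure-Python sketch. It assumes ordinary least squares fitted through the normal equations, and a forward selection rule that greedily adds whichever predictor gives the largest drop in residual sum of squares, stopping at a fixed `max_vars`. That stopping rule, the function names (`solve`, `fit_ols`, `forward_selection`, `metrics`), and the toy data are hypothetical illustrations, not the study's code or the WHO data.

```python
"""Hypothetical sketch: forward selection for multiple linear regression,
scored with RMSE, R^2 and MAE. Standard library only; toy data is synthetic."""
import math


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial
    pivoting. Assumes the matrix is non-singular (no collinear predictors)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(X, y, cols):
    """Least-squares coefficients [intercept, b_cols...] via (D^T D) beta = D^T y."""
    rows = [[1.0] + [x[j] for j in cols] for x in X]  # design matrix D
    p = len(rows[0])
    xtx = [[sum(r[i] * r[k] for r in rows) for k in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(X, cols, beta):
    """Predictions from an intercept plus the selected columns."""
    return [beta[0] + sum(b * x[j] for b, j in zip(beta[1:], cols)) for x in X]


def rss(X, y, cols):
    """Residual sum of squares of the OLS fit on the given columns."""
    y_hat = predict(X, cols, fit_ols(X, y, cols))
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def forward_selection(X, y, max_vars):
    """Greedily add the predictor that most reduces RSS, up to max_vars predictors."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < max_vars:
        best = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


def metrics(y, y_hat):
    """Return (RMSE, R^2, MAE) for observed values y and predictions y_hat."""
    n = len(y)
    resid = [a - b for a, b in zip(y, y_hat)]
    sse = sum(e * e for e in resid)
    mean_y = sum(y) / n
    tss = sum((a - mean_y) ** 2 for a in y)
    return math.sqrt(sse / n), 1.0 - sse / tss, sum(abs(e) for e in resid) / n


if __name__ == "__main__":
    # Synthetic toy data (NOT the WHO data): y depends on predictors 0 and 2 only.
    X = [[float(i), float((7 * i) % 5), float((3 * i) % 4)] for i in range(12)]
    y = [50 + 2 * x[0] - 0.5 * x[2] for x in X]
    cols = forward_selection(X, y, max_vars=2)
    beta = fit_ols(X, y, cols)
    print(cols, metrics(y, predict(X, cols, beta)))
```

Backward selection would run the same loop in reverse, starting from all predictors and dropping the least useful one each step. Best subset selection would instead score every combination of predictors, which is exhaustive but grows as 2^p. In practice the final model size is usually chosen by a criterion such as adjusted R² or cross-validated error rather than a fixed `max_vars`.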