Online Program

All Times EDT

Friday, September 25
Fri, Sep 25, 11:45 AM - 12:45 PM
Virtual
Poster Session

PS40-Supervised Machine Learning to Identify Social Behavioral Health Care Risks for COVID-19–Related Mortality and Inform Targets for Treatment and Prevention (301156)

Brian Griner, Learning Labs, Data Science & Learning Systems LLC 
*Chelsea Jin, Bristol Myers Squibb 

Keywords: supervised learning, artificial intelligence, prediction, COVID-19

The COVID-19 pandemic has had an unprecedented impact on society. In the United States, as of April 13, 2020, the total number of confirmed cases was 576,774, and the mortality rate was 4.05%. To ensure the data were representative, this work was carried out on data already aggregated at the state level.

The aim of this work was to use supervised learning algorithms to identify the key socioeconomic, behavioral, and healthcare risks associated with COVID-19-related mortality in the United States.

The data were collected from the official public postings of several healthcare research centers and agencies: the 2020 incidence and mortality rates from the JHU CSSE, the 2018 BRFSS, the US Health Ranking Data from before 2020, and the hospital capacity data from the Harvard Global Health Institute. All data were already aggregated at the state level, giving a total of 51 records and 242 risks. The risks were collected prior to the coronavirus outbreak, so the analysis was carried out on the state-level data in a cross-sectional fashion.

The data were split into training (n = 36) and test (n = 15) samples. Three-fold cross-validation (CV) was used to train different algorithms to determine which was most predictive of COVID-19-related mortality risk. The algorithms tested were Lasso, Ridge regression, KNN, SVM, GBM, CART, RFR, and NN. The final model, used to examine the relative importance of the model inputs in predicting COVID-19-related mortality risk through permutation importance and partial dependence, was selected by the minimum mean squared prediction error on the test sample. Data processing and modeling were done in Python 3 and R.
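The workflow described above can be sketched roughly as follows. This is a minimal illustration rather than the authors' code. It assumes scikit-learn, uses synthetic data from make_regression in place of the real state-level table, covers only a subset of the listed algorithms, and uses placeholder hyperparameters. The names candidates, results, and top_features are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for the state-level table
# (51 records, 242 candidate risks); not the real data.
X, y = make_regression(n_samples=51, n_features=242, n_informative=10,
                       noise=5.0, random_state=0)

# 36 training and 15 test records, matching the split sizes in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=15, random_state=0)

# ASSUMPTION: a subset of the algorithms, with placeholder hyperparameters.
candidates = {
    "lasso": make_pipeline(StandardScaler(), Lasso(alpha=1.0, max_iter=10000)),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "knn": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    "gbm": GradientBoostingRegressor(n_estimators=100, random_state=0),
    "rfr": RandomForestRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in candidates.items():
    # Three-fold CV mean squared error on the training sample.
    cv_mse = -cross_val_score(model, X_train, y_train, cv=3,
                              scoring="neg_mean_squared_error").mean()
    # Refit on the full training sample and score on the held-out test sample.
    model.fit(X_train, y_train)
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[name] = {"cv_mse": cv_mse, "test_mse": test_mse}

# Select the final model by minimum test-sample MSE, as the abstract describes.
best_name = min(results, key=lambda n: results[n]["test_mse"])
best_model = candidates[best_name]

# Rank inputs by permutation importance on the test sample.
perm = permutation_importance(best_model, X_test, y_test, n_repeats=10,
                              random_state=0,
                              scoring="neg_mean_squared_error")
# Indices of the 23 highest-ranked inputs (23 mirrors the count in the abstract).
top_features = np.argsort(perm.importances_mean)[::-1][:23]

print(best_name, results[best_name])
print(top_features)
```

In this sketch, partial dependence for a selected input could be examined with sklearn.inspection.partial_dependence on best_model. With only 15 test records, both the test MSE and the importance ranking should be expected to vary considerably with the random split.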

Of the 242 factors, 23 were identified as producing the smallest MSE (within 1).

Supervised machine learning algorithms were able to identify a subset of model inputs that can be predictive of COVID-19-related mortality, and some models, e.g., GBM, can address data sparsity.