Abstract Details

Activity Number: 250 - Topics in Statistical Learning
Type: Contributed
Date/Time: Monday, July 30, 2018 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #329499 Presentation
Title: Prediction Using Machine Learning Algorithms by Small Sample Size Data
Author(s): Yan Wang* and Honghu Liu and Jian L Zhang
Companies: Field School of Public Health, UCLA and UCLA and Kaiser Permanente
Keywords: Machine Learning; XGBoost; Small size; Oral health; Noise; Prediction

Machine learning methods are widely used in health services research and biomedical research to predict disease risk and summary measures of health status. For example, the Extreme Gradient Boosting (XGBoost) algorithm is used for classification and ranking prediction. Machine learning algorithms usually require large samples for the training, validation, and test sets. Survey research, however, often faces the problem of a sample too small to cover the response space: response patterns are skewed toward certain categories, while other categories are seldom endorsed by subjects. In this paper, we develop a method that manually introduces Gaussian noise into a simple random sampling with replacement (SRSwR) procedure applied to the original response space. The resulting bootstrap sample with random noise is used to develop a robust machine learning prediction algorithm. Prediction results for categorical variables can be evaluated by sensitivity and specificity; those for continuous variables can be evaluated by the root mean square error (RMSE) and correlation. Finally, the method is applied to a survey on pediatric oral health.
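The resampling and evaluation steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the noise scale, whether noise is added to every variable, or how categorical responses are handled. The function names (`noisy_bootstrap`, `sensitivity_specificity`, `rmse`), the `noise_sd` parameter, and the toy data are all hypothetical, and the sketch assumes numeric item scores and binary 0/1 labels.

```python
import math
import random


def noisy_bootstrap(rows, n_draws, noise_sd, seed=0):
    """Draw rows with replacement (SRSwR) and add Gaussian noise to each value.

    Assumption: zero-mean noise with one shared standard deviation is added
    to every numeric value; the abstract does not specify this detail.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n_draws):
        row = rng.choice(rows)  # simple random sampling with replacement
        sample.append([x + rng.gauss(0.0, noise_sd) for x in row])
    return sample


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def rmse(y_true, y_pred):
    """Root mean square error for continuous predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up item scores standing in for a small survey sample.
responses = [[1.0, 3.0], [2.0, 3.0], [1.0, 4.0], [5.0, 1.0]]
augmented = noisy_bootstrap(responses, n_draws=200, noise_sd=0.1)
```

The augmented rows would then serve as training data for a learner such as XGBoost, with the metrics computed on held-out original responses. In practice the value of `noise_sd` would need tuning, since too much noise can blur the rare response categories the method is meant to cover.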

Authors who are presenting talks have a * after their name.
