Activity Number: 250 - Topics in Statistical Learning
Type: Contributed
Date/Time: Monday, July 30, 2018 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #329499 (Presentation)
Title: Prediction Using Machine Learning Algorithms by Small Sample Size Data
Author(s): Yan Wang* and Honghu Liu and Jian L Zhang
Companies: Field School of Public Health, UCLA and UCLA and Kaiser Permanente
Keywords: Machine Learning; XGBoost; Small size; Oral health; Noise; Prediction
Abstract:
Machine learning methods are widely used in health services research and biomedical research to predict disease risk and summary measures of health status. For example, the Extreme Gradient Boosting (XGBoost) algorithm is used for classification and rank prediction. Machine learning algorithms usually require large samples for the training, validation, and test sets. Survey research, however, often faces the problem of a sample too small to cover the response space: the response patterns are skewed toward certain categories, while other categories are seldom endorsed by the subjects. In this paper, we develop a method that manually introduces Gaussian noise into the original response space through a simple random sampling with replacement (SRSwR) procedure. The resulting bootstrap sample with random noise is used to develop a robust machine learning prediction algorithm. Prediction results for categorical variables can be evaluated by sensitivity and specificity; those for continuous variables can be evaluated by the root mean square error (RMSE) and correlation. Finally, the method is applied to a survey of pediatric oral health.
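The resampling and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the noise scale is chosen, whether noise is added to predictors, responses, or both, or how noisy values are mapped back to response categories. The function names (`noisy_bootstrap`, `sensitivity_specificity`, `rmse`, `pearson_r`), the `noise_sd` parameter, and the toy data are all hypothetical, and the model-fitting step (for example, XGBoost) is left out.

```python
import math
import random


def noisy_bootstrap(rows, n_samples, noise_sd, seed=0):
    """Draw rows by simple random sampling with replacement (SRSwR)
    and add independent Gaussian noise to every value.

    Assumption: one common noise standard deviation for all variables.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n_samples):
        row = rng.choice(rows)  # sampling with replacement
        sample.append([x + rng.gauss(0.0, noise_sd) for x in row])
    return sample


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def rmse(y_true, y_pred):
    """Root mean square error for continuous predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(x, y):
    """Pearson correlation between observed and predicted values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy survey responses (two item scores per subject).
responses = [[1.0, 3.0], [2.0, 3.0], [1.0, 4.0]]
augmented = noisy_bootstrap(responses, n_samples=100, noise_sd=0.1)
```

The augmented sample would then serve as training data for the prediction model, with the metrics above computed on held-out original responses. Setting `noise_sd=0` reduces the procedure to an ordinary bootstrap, which gives a simple baseline for judging what the added noise contributes.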
Authors who are presenting talks have a * after their name.