Online Program

All Times ET

Program is Subject to Change

Monday, June 14
Mon, Jun 14, 10:30 AM - 12:00 PM
Topics in Classification and Frame Development

Predicting the number of freshmen and graduates at Universidad Nacional de Rosario using random forest techniques (308135)

Maria Belen Allasia, Universidad Nacional de Rosario 
Daniela Fernanda Dianda, Universidad Nacional de Rosario 
*Diego Marfetan Molina, Universidad Nacional de Rosario 
Marta Beatriz Quaglino, Universidad Nacional de Rosario 
Maria Eugenia Tesser, Universidad Nacional de Rosario 

Keywords: Regression Trees, Data Mining, Time Series, Forecast, Higher Education

The goal of this study is to improve the overall management of Universidad Nacional de Rosario (UNR) by implementing objective, systematic and comparable quality measures. Such measures make it possible to monitor the institution’s situation, to carry out strategic plans that favorably impact both academic activities and the allocation of material resources and infrastructure, and to administer UNR’s economic resources efficiently and transparently. In this work we obtain predictions for two outcomes of interest: i) the number of undergraduate students enrolling at UNR during 2018, and ii) the number of students graduating from UNR during 2018. We achieve this by applying recently proposed random forest techniques designed for periodically collected data. This method proves to be efficient even when the time series used for forecasting can be regarded as short. One-step-ahead predictive models were developed with the random forest technique, using the lagged values of the variables of interest as predictors. We first determined the optimal number of lagged variables for each of the 12 schools constituting UNR. We then built the random forest models with the 1982-2017 series as the training set, made predictions for 2018 and computed the absolute percentage error between the predictions and the observed values, which are available online in databases published by UNR’s Statistics Department. Computations were performed using the R programming language. Adding up the predictions across schools yields a total of 15,953 forecasted first-year students, only 0.14% away from the observed value of 15,931; the predicted number of graduates is 3,070, which is 4.36% away from the observed value of 3,210. Thus, the overall accuracy of the predictions is very satisfactory. The application of this technique produced practical results that contribute to meeting the goals of this research study.
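
The two mechanical steps in the abstract, turning a yearly series into lagged predictors and scoring a forecast by absolute percentage error, can be illustrated with a minimal sketch. This is not the authors' code: they worked in R, while the sketch below uses plain Python. The function names `make_lagged` and `absolute_percentage_error` and the toy series are assumptions made only for illustration. The final two calls use the totals reported in the abstract.

```python
def make_lagged(series, n_lags):
    """Build one-step-ahead training pairs from a yearly series.

    Each row of X holds the n_lags values that precede the matching entry of y.
    (Hypothetical helper; the abstract does not describe the authors' code.)
    """
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    X = [series[t - n_lags:t] for t in range(n_lags, len(series))]
    y = [series[t] for t in range(n_lags, len(series))]
    return X, y


def absolute_percentage_error(predicted, observed):
    """Absolute percentage error of a single forecast, in percent."""
    return abs(predicted - observed) / abs(observed) * 100


# Toy series with made-up values (NOT UNR data), using two lags.
toy_series = [10, 12, 11, 13, 15]
X, y = make_lagged(toy_series, n_lags=2)

# Totals reported in the abstract: forecast vs. observed for 2018.
ape_enrolled = absolute_percentage_error(15953, 15931)
ape_graduates = absolute_percentage_error(3070, 3210)
print(round(ape_enrolled, 2), round(ape_graduates, 2))
```

In a full workflow the rows of `X` and the targets `y` would be passed to a random forest regressor, and the last `n_lags` observed values would form the input for the 2018 forecast. The number of lags would be tuned separately for each school, as the abstract describes.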