Abstract:
Data from an extensive survey conducted by the National Center for Education Statistics (NCES) are used to predict whether secondary school teachers in U.S. public schools are qualified. The SAS survey family of procedures, such as Proc Surveyfreq and Proc Surveylogistic, is used for all model building and analysis. The residuals from a logistic regression do not necessarily follow the normal distribution that is so often assumed in residual analysis. Furthermore, with survey data the observation weights must be taken into account, because they affect the variance of the estimates. To adjust for this, rather than taking the difference between the observed and predicted values, the difference between the actual and expected counts is computed from each observation's weight and its predicted probability under the logistic model. A simulation study is also performed to better understand the distribution of the residuals under the complex survey design. The purpose is to identify which type of residual best satisfies the assumption of normality while also accounting for the complex survey design.
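As a rough illustration of a count-based residual of the kind described above, the Python sketch below computes, for each group of observations, the weighted actual count, the weighted expected count from the logistic model's predicted probabilities, and their difference. This is not the paper's exact procedure. The function name `weighted_count_residuals`, the grouping of observations, and the standardization by the square root of the sum of w_i^2 p_i (1 - p_i) are hypothetical choices made for illustration. In particular, that standardization treats the weights as fixed and ignores stratification and clustering, so it is not the design-based variance that the SAS survey procedures compute.

```python
import math
from collections import defaultdict


def weighted_count_residuals(y, p, w, groups):
    """Sketch of weighted actual-minus-expected count residuals per group.

    y: 0/1 outcomes (1 = qualified teacher)
    p: predicted probabilities from a fitted logistic model
    w: survey weights
    groups: hypothetical grouping label for each observation

    Returns {group: (actual, expected, raw_residual, standardized_residual)}.
    """
    actual = defaultdict(float)
    expected = defaultdict(float)
    variance = defaultdict(float)
    for yi, pi, wi, gi in zip(y, p, w, groups):
        actual[gi] += wi * yi      # weighted actual count
        expected[gi] += wi * pi    # weighted expected count under the model
        # Assumed simplification: weights treated as fixed constants,
        # no stratification or clustering in the variance.
        variance[gi] += wi * wi * pi * (1.0 - pi)

    result = {}
    for g in actual:
        raw = actual[g] - expected[g]
        sd = math.sqrt(variance[g])
        std = raw / sd if sd > 0 else float("nan")
        result[g] = (actual[g], expected[g], raw, std)
    return result


if __name__ == "__main__":
    # Toy data, invented purely to exercise the function.
    res = weighted_count_residuals(
        y=[1, 0, 1, 1],
        p=[0.8, 0.3, 0.6, 0.9],
        w=[10, 20, 15, 5],
        groups=["a", "a", "b", "b"],
    )
    for group, values in sorted(res.items()):
        print(group, values)
```

In the toy data, group "a" has a weighted actual count of 10 against an expected count of 14, giving a raw residual of -4. A simulation study like the one described would repeat such a calculation over many samples drawn under a known model and then examine how closely the standardized residuals follow a normal distribution.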