Online Program

All Times EDT

Friday, June 5
Practice and Applications
Practice and Applications Posters, Part 1
Fri, Jun 5, 10:00 AM - 1:00 PM
TBD

Two Notes About the Two Faces of R-Squared (308224)

*Gyasi K Dapaa, Indeed Inc 

Keywords: R-Squared, Model Validation, Regression Models

R-squared can be estimated in two main ways: 1) as the quotient of the variances of the predicted and actual outcomes, and 2) as unity minus the ratio of the variance of the residual error to that of the actual outcomes. These two methods yield identical results and are effective linear-model validation measures under two conditions: 1) in-sample validation, i.e., when they are computed on the same data on which the models are fitted; and 2) when the model parameters directly estimated from the data are used in the R-squared computations. In the real world, however, at least one of these conditions is almost always violated. It is recommended practice for modern-day scientists to validate models on unseen data (i.e., out-of-sample validation): most model validation in the machine learning era involves computing goodness-of-fit metrics on unseen data using parameters from competing models. It is also common for scientists to move away from the model parameters estimated from the data for reasons other than statistical significance. For instance, in insurance, an actuary may adjust any subset of the estimated model factors for reasons related to marketing, underwriting, regulation, or anything else he or she deems relevant. When at least one of the two conditions above is violated, the two R-squared methods, contrary to what is discussed in statistical textbooks, yield different results, some of which are too consequential to ignore. My paper discusses two of them.
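The divergence described in the abstract can be illustrated with a minimal sketch. The simulated data, the train/test split, and the 10% "actuarial adjustment" to the slope below are all hypothetical choices made for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a simple linear relationship with noise
n = 200
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

# Split into a training half (in-sample) and a test half (out-of-sample)
x_tr, y_tr = x[:100], y[:100]
x_te, y_te = x[100:], y[100:]

# Fit OLS slope and intercept on the training half
slope, intercept = np.polyfit(x_tr, y_tr, 1)

def r2_variance_ratio(y_true, y_pred):
    # Method 1: quotient of the variances of predicted and actual outcomes
    return np.var(y_pred) / np.var(y_true)

def r2_one_minus_residual(y_true, y_pred):
    # Method 2: unity minus the ratio of residual variance to actual variance
    return 1.0 - np.var(y_true - y_pred) / np.var(y_true)

# Condition 1 and 2 both hold: the two definitions agree (OLS with intercept)
pred_tr = slope * x_tr + intercept
print("in-sample:     ", r2_variance_ratio(y_tr, pred_tr),
      r2_one_minus_residual(y_tr, pred_tr))

# Condition 1 violated (out-of-sample data): the definitions generally differ
pred_te = slope * x_te + intercept
print("out-of-sample: ", r2_variance_ratio(y_te, pred_te),
      r2_one_minus_residual(y_te, pred_te))

# Condition 2 violated (parameter adjusted away from its OLS estimate,
# e.g. a hypothetical 10% actuarial uplift): they differ even in-sample
pred_adj = 1.1 * slope * x_tr + intercept
print("adjusted:      ", r2_variance_ratio(y_tr, pred_adj),
      r2_one_minus_residual(y_tr, pred_adj))
```

The in-sample agreement reflects the usual OLS identity (with an intercept, the predictions and residuals are uncorrelated and share the actual outcomes' mean); violating either condition breaks that orthogonality, which is what drives the two estimates apart.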