Keywords: Logistic Regression, Classification, Sparse Data
Several authors (Walker and Smith 2020) have studied logistic regression classifiers in the presence of actual or induced data sparsity. We focus on computing logistic regression classifiers for sparse contingency tables that are classified in part by discrete latent variables, as is common in record-linkage applications. Such tables can be high-dimensional and typically include a large proportion of zero cells. Identifying logistic regressions that give a probabilistic characterization of the cells in the table can be challenging: often the preferred nominal model is not estimable because of sparsity, and the number of alternative models can be large. We extend the methods of Fienberg and Rinaldo (2012), who estimate log-linear models in the presence of "likelihood zeros". We use their approach to identify estimable logistic regression models that are "optimal" among all submodels of the nominal model, in the sense that they exploit all the residual information characterizing the nominal model and do not introduce constraints on the surviving parameters.
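As a small illustration of why sparsity can leave a nominal model inestimable, the following Python sketch considers the simplest case: a saturated logistic model with one logit per covariate cell. It is not the Fienberg and Rinaldo procedure, only a toy check. The function name `saturated_logits` and the counts are hypothetical, chosen for illustration. In this setting the maximum likelihood estimate of a cell's logit is log(successes / failures), which is finite only when both counts are positive.

```python
import math

def saturated_logits(table):
    """Return the saturated-model logit for each covariate cell.

    `table` maps a cell label to (n_success, n_failure).
    A cell with a zero count has no finite MLE, so it maps to None.
    """
    logits = {}
    for cell, (n_success, n_failure) in table.items():
        if n_success > 0 and n_failure > 0:
            logits[cell] = math.log(n_success / n_failure)
        else:
            # A zero cell pushes the estimate to +/- infinity (or leaves
            # it undefined when the whole stratum is empty).
            logits[cell] = None
    return logits

# Hypothetical sparse table: only cell "a" has both outcomes observed.
counts = {"a": (5, 3), "b": (0, 4), "c": (2, 0), "d": (0, 0)}
logits = saturated_logits(counts)
estimable = sorted(cell for cell, value in logits.items() if value is not None)
print(estimable)
```

In richer models the non-estimable parameters are linear combinations rather than single cells, which is where the more general machinery referred to in the abstract is needed.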