Abstract:

There has been an enormous advance in statistical computing since the time of R.A. Fisher. Inference methods based on likelihood are now implemented with computing tools like the EM algorithm and MCMC sampling, and are used in an array of probability models vastly more complex than those analyzed by Fisher. One important class of models, which I shall call hidden variable models, is the natural partner of the EM algorithm as well as simulation-based methods. There exists a wide variety of data structures in which the use of these models is quite natural. A partial list includes latent class models, missing data models, measurement error models, mixture models, censoring models, and hidden Markov models. In each hidden variable model, there is a hypothetical complete data set (X,H). These complete data might well be described by a probability model with a relatively simple structure. However, the variables H are hidden from the statistician, leaving only the X variables for the data analysis. Although the resulting model for X is often computationally complex, this kind of model offers a large advantage: the complete data model for (X,H) provides a simple framework for the interpretation of the model parameters. In practice, sometimes the hidden variables really exist, and so are truly missing data, but many times their existence is fictional. That is, they have been fabricated to help us tell a story about the data. In this talk I will review advances in statistical knowledge about likelihood methods in hidden variable models. I will focus on the modern computational era, which corresponds roughly with my professional life. There will be particular emphasis on the way that theory, methodology, and applications have interacted to move us forward.
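The (X,H) structure described above can be made concrete with a small sketch. Below is a hedged, minimal illustration (not taken from the talk itself) of EM fitting for one of the listed examples, a two-component Gaussian mixture: the hidden variable H is each observation's component label, the observed data X are the values alone, and the E- and M-steps work with the simple complete-data model for (X,H). All variable names and the known-unit-variance assumption are illustrative choices.

```python
import numpy as np

# Simulate complete data (X, H), then discard H -- as the statistician must.
rng = np.random.default_rng(0)
H = rng.integers(0, 2, size=500)                       # hidden component labels
X = rng.normal(loc=np.where(H == 0, -2.0, 2.0), scale=1.0)  # observed data only

mu = np.array([-1.0, 1.0])   # initial guesses for the two component means
pi = np.array([0.5, 0.5])    # initial mixing weights

for _ in range(50):
    # E-step: posterior probability that each point arose from each component,
    # computed under the simple complete-data model for (X, H).
    dens = np.exp(-0.5 * (X[:, None] - mu[None, :]) ** 2) * pi[None, :]
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: reestimate parameters by maximizing the expected
    # complete-data log likelihood (weighted means and proportions).
    pi = resp.mean(axis=0)
    mu = (resp * X[:, None]).sum(axis=0) / resp.sum(axis=0)

print(np.sort(mu))  # estimated means, close to the true values -2 and 2
```

The point of the sketch is the one made in the abstract: the marginal model for X alone is awkward, but each EM step only ever touches the easy complete-data model for (X,H).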
