Abstract:
|
We consider the problem of multivariate outlier testing for a population from which a training sample is available. Given a new observation, we test whether it came from the same population as the training sample. Problems of this type arise in a number of applications, including nuclear monitoring, handwriting identification, and medical diagnosis. In this paper, we consider a modified likelihood ratio test that is applicable when: (a) the training data follow a mixture-of-normals distribution; (b) all labels in the training sample are missing; (c) some vectors in the training sample have missing information; and (d) the number of components in the mixture is unknown.
The approach often used in practice to handle missing data in this setting is to perform the test using only the complete data vectors, which may discard valuable information. An alternative procedure is to use all available data via the EM algorithm. We use simulations and examples to compare use of the EM algorithm on the entire data set with use of only the complete data vectors.
|