Abstract:
In astronomical data, measurement-error uncertainties are often reported alongside the measurements themselves. However, many popular classification methods cannot account for this information. We propose a model-agnostic method for incorporating heteroscedastic measurement error into existing classification methods. First, we simulate pseudo-datasets from the Bayesian posterior predictive distribution of a measurement error model. Then, the classifier is fit to each simulated dataset. The variation of any quantity of interest across the simulations reflects the uncertainty propagated from the measurement errors in both the training and test sets. We demonstrate the approach with two studies: (1) a simulation study applying the procedure to support vector machines (SVMs) and random forests, and (2) the identification of high-redshift (high-z, 2.9 ≤ z ≤ 5.1) quasars in a merged catalog of the Sloan Digital Sky Survey, the Spitzer IRAC Equatorial Survey, and the Spitzer-HETDEX Exploratory Large-area survey. The proposed method reveals that, of the 10,520 high-z quasars identified by a random forest that ignores measurement error, 2,273 are potential misclassifications. In addition, of the approximately 1.8 million objects not identified as high-z quasars, 765 can be considered new candidates.
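The simulate-then-refit procedure described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: each feature is assumed to be observed with Gaussian error of known, per-object standard deviation, the prior on the true values is assumed flat, and a simple nearest-centroid classifier stands in for the SVM and random forest used in the paper (any classifier can be swapped in, which is the point of a model-agnostic method). All function names and the 0.5 decision threshold are hypothetical.

```python
import numpy as np


def draw_pseudo_dataset(x_obs, sigma, rng):
    """One posterior predictive draw under an ASSUMED Gaussian error model.

    Assumption: x_obs ~ N(x_true, sigma^2) with a flat prior on x_true, so
    x_true | x_obs ~ N(x_obs, sigma^2); a replicate measurement is then
    drawn as x_rep | x_true ~ N(x_true, sigma^2).
    """
    x_true = rng.normal(x_obs, sigma)  # draw plausible latent true values
    return rng.normal(x_true, sigma)   # draw replicate noisy measurements


def fit_nearest_centroid(x, y):
    """Hypothetical stand-in classifier (the paper uses SVM and random forest)."""
    classes = np.unique(y)
    centroids = np.stack([x[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_nearest_centroid(model, x):
    classes, centroids = model
    # Squared Euclidean distance from every point to every class centroid.
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d2, axis=1)]


def propagate_uncertainty(x_train, s_train, y_train, x_test, s_test,
                          n_sims=200, seed=0):
    """Refit the classifier on each pseudo-dataset and return an
    (n_sims, n_test) array of predicted labels."""
    rng = np.random.default_rng(seed)
    preds = np.empty((n_sims, len(x_test)), dtype=y_train.dtype)
    for b in range(n_sims):
        # Errors are propagated from BOTH the training and the test set.
        xtr = draw_pseudo_dataset(x_train, s_train, rng)
        xte = draw_pseudo_dataset(x_test, s_test, rng)
        model = fit_nearest_centroid(xtr, y_train)
        preds[b] = predict_nearest_centroid(model, xte)
    return preds


def flag_changes(baseline, p_pos, threshold=0.5):
    """Hypothetical rule comparing an error-free baseline prediction with
    the fraction of simulations that predict the positive class (label 1)."""
    potential_misclassified = (baseline == 1) & (p_pos < threshold)
    new_candidates = (baseline == 0) & (p_pos >= threshold)
    return potential_misclassified, new_candidates
```

Given `preds = propagate_uncertainty(...)`, the expression `(preds == 1).mean(axis=0)` gives, for each test object, the fraction of simulations that label it positive; comparing that fraction with a baseline fit on the raw measurements is one way to obtain counts analogous to the "potential misclassifications" and "new candidates" above. The two-stage draw (latent true value, then replicate measurement) is what makes each simulated dataset a posterior predictive sample rather than a single resampling of the observed values; a real application would replace the flat-prior Gaussian model with the paper's measurement error model.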