Abstract Details

Activity Number: 667 - Statistics, Science, and Society
Type: Contributed
Date/Time: Thursday, August 2, 2018 : 10:30 AM to 12:20 PM
Sponsor: IMS
Abstract #330656
Title: Classification of Healthcare Data: When Scarcity of Labeled Data Is the Norm Semi-Supervised Learning Methods Can Come to the Rescue
Author(s): Didem Egemen* and Paulo Macedo and Sewit Araia
Companies: The George Washington University and Integrity Management Services Inc. and Integrity Management Services Inc.
Keywords: outlier detection; classification; fraud detection; semi-supervised learning; healthcare fraud
Abstract:

Fighting healthcare fraud is crucial to improving the quality of healthcare services, but it is not a straightforward process. In this study, we propose combining and comparing three methods: Classifier Adjusted Density Estimation (CADE), Mahalanobis distance, and singular value decomposition. Our goal is to identify outliers among healthcare providers by taking into account the different information these three methods provide, and to reduce the number of false positives. In the first part of this presentation, we introduce the methods and compare the observations that each method flags differently, using the multivariate Gini coefficient and multivariate Lorenz curves; the equivalence of these Lorenz curves is tested with the bootstrap. In the second part, we focus on CADE and on improvements to it (variable selection and different classifiers, such as logistic regression, random forest, and support vector machines). We also show an application using CMS's publicly available dataset and validate the results by comparing them with the excluded providers' data.
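
As an illustration of one of the three methods named above, the following is a minimal sketch of Mahalanobis-distance outlier flagging. It is not the authors' implementation: the function names, the synthetic two-feature data, the planted outlier, and the 0.999 chi-square cutoff are all assumptions chosen for the example, and the abstract does not say which features or thresholds were used.

```python
import numpy as np


def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row from the sample mean."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)  # sample covariance (ddof=1)
    # The pseudo-inverse tolerates a near-singular covariance matrix.
    cov_inv = np.linalg.pinv(cov)
    # Row-wise quadratic form: centered[i] @ cov_inv @ centered[i]
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)


def flag_outliers(X, cutoff):
    """Boolean mask of rows whose squared distance exceeds the cutoff."""
    return mahalanobis_sq(X) > cutoff


# Hypothetical data: 200 providers, 2 features, one planted outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X[0] = [8.0, -8.0]

# Assumed cutoff: the 0.999 quantile of a chi-square with 2 degrees of
# freedom, which has the closed form -2 * ln(1 - p).
cutoff = -2.0 * np.log(1.0 - 0.999)
flags = flag_outliers(X, cutoff)
print("flagged rows:", np.flatnonzero(flags))
```

The chi-square cutoff assumes the features are roughly multivariate normal. Because the sample mean and covariance are themselves pulled toward extreme points, a robust covariance estimate may be preferable on real claims data.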


Authors who are presenting talks have a * after their name.
