Keywords: ridit, sampling, importance sampling, fraud, prevalence
A key metric in abuse-fighting is the prevalence of abuse. In addition to measuring its baseline value, it is crucial to monitor it continuously and detect any surges, which may correspond to fraud campaigns. However, because the labeling of abuse cases typically relies on human judgement, limited human review resources often make prevalence infeasible to measure, subject to wide margins of error, or difficult to monitor on an ongoing basis.
Here we develop a statistical methodology that combines optimal sampling with ridit analysis for ongoing monitoring. In several abuse-fighting scenarios, we demonstrate that, with the help of an ML classifier trained on historical data, the prevalence of abuse can be accurately measured and continuously monitored without a continuous supply of human labels.