Abstract Details

Activity Number: 375 - Negative Results: They're Essential!
Type: Invited
Date/Time: Tuesday, August 1, 2017 : 2:00 PM to 3:50 PM
Sponsor: Committee on Professional Ethics
Abstract #322175
Title: Big effects of negative results in Big Data: classification errors with differential effects arise from unmodeled latent classes in value-added modeling
Author(s): Futoshi Yumoto* and Rochelle E Tractenberg
Companies: Collaborative for Research on Outcomes and Metrics; Comcasts and Georgetown University
Keywords: big data ; classification errors ; negative results ; value-added modeling

Classification is an extremely important aspect of the analysis of Big Data, whether natural or simulated and whether "sort of big" or "massively big". Statistical models used in evaluation, classification, or decision-making are used to study or discover patterns of variation. Sources of heterogeneity can complicate data collection design and, if they are neglected, can distort analysis and interpretation of the relationships that the data are meant to reveal. Simulated data can be arbitrarily big/massive, as can modern data in biomedical, educational, and business applications. One type of "negative result" is "classification error". The negative results (classification errors) from models in two examples, one from education and one from epidemiology, are explored in this article. The epidemiology example (a simple model) discusses how decisions based on thyroid hormone levels (T3/T4) in pregnancy differ depending on the assay used to determine the hormone levels. The education example, which is fully developed and described, is a simulation study designed to document bias arising from a complex model used to assess the contributions of individual teachers to student learning. A formal examination of classification errors meets the ASA Guidelines for the integrity of both the professional and the data, as well as the responsibilities to statisticians, the profession, and employers. Ignoring these errors, whether the data are simple or complex, "big" or "small", is not consistent with the ASA Ethical Guidelines for Statistical Practice. Ignoring them may also have important, unanticipated implications for those about whom the modeling is specifically intended to be informative.
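As an illustration only, the mechanism named in the title (classification errors arising from unmodeled latent classes in a value-added setting) can be sketched with a toy simulation. This is not the authors' simulation study: the functions simulate, teacher_scores, and flag_bottom, the two-class structure, and all parameter values (class_gap, the 20% cutoff, the noise levels) are hypothetical choices made for this sketch. The "adjusted" score also assumes latent class membership is observed, which would not hold in practice, where it would have to be estimated.

```python
import random
import statistics


def simulate(n_teachers=200, n_students=30, class_gap=5.0, seed=0):
    """Toy data (all values are assumptions): each teacher has a true effect,
    and each student belongs to one of two latent classes whose expected
    gains differ by `class_gap`. Class mix varies across classrooms."""
    rng = random.Random(seed)
    effects = {}
    rows = []  # (teacher id, latent class, observed gain)
    for t in range(n_teachers):
        effects[t] = rng.gauss(0.0, 1.0)
        share = rng.uniform(0.1, 0.9)  # share of latent class 1 in this classroom
        for _ in range(n_students):
            latent = 1 if rng.random() < share else 0
            gain = 10.0 + effects[t] + class_gap * latent + rng.gauss(0.0, 2.0)
            rows.append((t, latent, gain))
    return rows, effects


def teacher_scores(rows, adjust):
    """Mean student gain per teacher. With adjust=True, each gain is first
    centered on the mean gain of its latent class (assumes class is known)."""
    class_mean = {0: 0.0, 1: 0.0}
    if adjust:
        for c in class_mean:
            class_mean[c] = statistics.mean([g for _, k, g in rows if k == c])
    by_teacher = {}
    for t, k, g in rows:
        by_teacher.setdefault(t, []).append(g - class_mean[k])
    return {t: statistics.mean(v) for t, v in by_teacher.items()}


def flag_bottom(scores, fraction=0.2):
    """Return the set of teachers classified into the lowest `fraction`."""
    k = max(1, round(len(scores) * fraction))
    return set(sorted(scores, key=scores.get)[:k])


if __name__ == "__main__":
    rows, effects = simulate()
    truth = flag_bottom(effects)
    naive = flag_bottom(teacher_scores(rows, adjust=False))
    adjusted = flag_bottom(teacher_scores(rows, adjust=True))
    # Teachers flagged as "low" who are not truly in the bottom group.
    print("wrongly flagged, latent class ignored: ", len(naive - truth))
    print("wrongly flagged, latent class adjusted:", len(adjusted - truth))
```

In this sketch, teachers whose classrooms happen to contain more students from the lower-gain class are pushed toward the "low" category regardless of their true effect, so the errors fall differentially on particular teachers rather than at random.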

Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

Copyright © American Statistical Association