Abstract:
|
In the U.S., oncology survival statistics are estimated from patients who live in SEER registry areas or from clinical trial participants. These estimates may be biased because such patients may differ from the general population (e.g., in socioeconomic status or race) or from a target population of interest. Further, population-level studies of oncology care and quality are difficult to conduct because cancer stage is often unreliably recorded in health care claims data. We develop three-class (stages I-II, stage III, and stage IV) lung cancer stage classification algorithms in linked SEER-Medicare claims data. The classification algorithms are fit using a cohort of individuals who received chemotherapy after a new lung cancer diagnosis in 2010-2011. To explore generalizability, we use a subsequent SEER-Medicare cohort (diagnosed in 2012-2013) for validation and compare against a similar cohort of commercially insured individuals in Massachusetts. Using a set-valued classification approach that leverages split conformal inference, we obtain a prediction set for each observation and assess set coverage for each algorithm.
|
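As a rough illustration of the set-valued approach named in the abstract, the sketch below shows one common form of split conformal classification for a three-class stage label. It is a minimal sketch under stated assumptions, not the authors' implementation: the nonconformity score (one minus the predicted probability of the true class), the miscoverage level `alpha`, the function names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Compute the split conformal score threshold from a held-out calibration set.

    Assumed score: 1 - predicted probability of the true class.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: every class is included.
        return np.inf
    return np.sort(scores)[k - 1]


def prediction_sets(test_probs, qhat):
    """Boolean matrix: entry (i, c) is True if class c is in the set for observation i."""
    return (1.0 - test_probs) <= qhat


def empirical_coverage(sets, labels):
    """Fraction of observations whose prediction set contains the true label."""
    return float(sets[np.arange(len(labels)), labels].mean())


def simulate(n, rng, n_classes=3):
    """Synthetic stand-in for classifier output: class probabilities and labels drawn from them."""
    probs = rng.dirichlet(np.ones(n_classes), size=n)
    u = rng.random(n)[:, None]
    labels = np.minimum((u > probs.cumsum(axis=1)).sum(axis=1), n_classes - 1)
    return probs, labels


# Hypothetical class order: 0 = stages I-II, 1 = stage III, 2 = stage IV.
rng = np.random.default_rng(0)
cal_probs, cal_labels = simulate(1000, rng)
test_probs, test_labels = simulate(1000, rng)

qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
sets = prediction_sets(test_probs, qhat)
print("threshold:", qhat)
print("coverage:", empirical_coverage(sets, test_labels))
print("mean set size:", sets.sum(axis=1).mean())
```

When calibration and test observations are exchangeable, sets built this way contain the true class with probability at least `1 - alpha` on average. That guarantee can fail when the test cohort comes from a different population, which is one reason to check set coverage empirically on each validation cohort.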