Abstract:
|
Pathologic complete response (PCR) is defined as the absence of residual invasive disease in the breast at the completion of chemotherapy. Breast cancer patients who achieve PCR often have a better chance of avoiding disease recurrence. PCR is relatively rare, occurring in around 20% of patients. If the goal is to predict PCR when considering chemotherapy, the fact that non-achievers form a dominant class in the data set must be taken into account. In this paper, we use clinical and microarray data to predict PCR status, with a sample of 271 training cases and 64 testing cases. Roughly 81% of the patients in this data set do not achieve PCR. Modeling without adjustment for the dominant class yields a top classification accuracy of 83%, but does so by overrepresenting non-achievers (25% PPV, 96% NPV). We adjust for the dominant class using three approaches: random removal of PCR-negative data, a modification to standard diagonal linear discriminant analysis (DLDA), and a modified loss function. Using these approaches, we significantly increase the PPV to 75% while still keeping an NPV of 71%.
|
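As a rough illustration of the first approach and of the two reported metrics, the following minimal sketch shows random removal of majority-class cases and the computation of PPV and NPV from predicted labels. The function names, the 1:1 balancing ratio, and the toy labels are assumptions made for illustration only; they are not taken from the paper and do not reproduce its data or results.

```python
import random


def undersample_majority(labels, majority=0, seed=0):
    """Return sorted indices that keep every minority case and a random
    subset of majority cases of equal size (1:1 ratio is an assumption)."""
    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    minority_idx = [i for i, y in enumerate(labels) if y != majority]
    n_keep = min(len(majority_idx), len(minority_idx))
    kept_majority = rng.sample(majority_idx, n_keep)
    return sorted(minority_idx + kept_majority)


def ppv_npv(y_true, y_pred, positive=1):
    """Positive and negative predictive value from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    tn = sum(1 for t, p in pairs if p != positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Toy labels (hypothetical): 1 = PCR achieved, 0 = PCR not achieved.
toy_labels = [0] * 8 + [1] * 2
balanced_idx = undersample_majority(toy_labels)
balanced_labels = [toy_labels[i] for i in balanced_idx]
```

Here `balanced_labels` contains all PCR-positive cases and an equal number of randomly retained PCR-negative cases. A classifier trained on the reduced set would then be evaluated with `ppv_npv` on the untouched test set, so that the reported PPV and NPV reflect the original class proportions.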