Abstract:
|
This talk is motivated by a recent study of the breast cancer micro-environment. In the study, tumor tissue and adjacent normal tissue were collected from cancer patients, and several spots of interest were selected on each tissue slide. The fibers in each spot were measured using customized software, and 19 measurements, including length and curvature, were recorded for each fiber. In the dataset, each slide contained at most 8 spots and each spot had hundreds of fibers. Because the number of fibers per spot is large, it is reasonable to view the data from each spot as an empirical distribution of fiber features. The challenge is that the label of a spot may be unobserved. If a slide is 'tumor', we know only that at least one spot on the slide should be labeled 'tumor'. If a slide is 'normal', all spots on the slide should be labeled 'normal'. Spot labels are therefore unobserved for 'tumor' slides but fully observed for 'normal' slides. We consider developing classification rules based on such data.
|
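The labeling rule described in the abstract can be written down as a minimal Python sketch. This is an illustration only, not the method presented in the talk: the function names `observed_spot_labels` and `is_consistent` are hypothetical, and `None` is an assumed convention for an unobserved spot label. The structure resembles what is often called multiple-instance labeling, where a slide plays the role of a bag and its spots the role of instances.

```python
from typing import List, Optional


def observed_spot_labels(slide_label: str, n_spots: int) -> List[Optional[str]]:
    """Spot labels implied by a slide label; None marks an unobserved label.

    Hypothetical helper for illustration, not part of the study's software.
    """
    if slide_label == "normal":
        # Every spot on a normal slide is known to be normal.
        return ["normal"] * n_spots
    if slide_label == "tumor":
        # A tumor slide only tells us that at least one spot is tumor,
        # so no individual spot label is observed.
        return [None] * n_spots
    raise ValueError(f"unknown slide label: {slide_label!r}")


def is_consistent(slide_label: str, spot_labels: List[str]) -> bool:
    """Check a candidate spot labeling against the slide-level rule."""
    if slide_label == "normal":
        return all(label == "normal" for label in spot_labels)
    if slide_label == "tumor":
        return any(label == "tumor" for label in spot_labels)
    raise ValueError(f"unknown slide label: {slide_label!r}")
```

Under these assumptions, a classification rule for spots would have to produce labelings for which `is_consistent` holds on every slide, while only the spots on 'normal' slides provide directly labeled training examples.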