Keywords: deep learning, medical device, AI, evaluation, statistical methods
Recent advances in deep learning, a subfield of artificial intelligence, have made it possible to build computer models that accurately solve many visual tasks involving object detection, localization, and classification. Within medical imaging, deep learning has shown immense early promise at tasks such as predicting the severity of diabetic retinopathy from retinal fundus images, classifying skin lesions, and analyzing histopathology. This has led to an explosion in the number of medical devices that rely on deep learning techniques, yet the industry's understanding of how to evaluate these devices appropriately lags far behind the innovations. In this talk, we provide a cautionary overview of some of the challenges associated with evaluating the safety and efficacy of deep learning-based medical devices. We begin with a brief introduction to deep learning and compare it with traditional connectionist modeling and statistical machine learning approaches. Then, through a series of real-world case studies, we illustrate how challenges common to these other modeling areas, such as confounding variables, are exacerbated by the black-box and highly expressive nature of standard deep learning models. Finally, we offer recommendations and guidance on how to proactively identify and fix these problems, so that the audience comes away with a basic understanding of how to critically evaluate deep learning software in the medical field.