
All Times EDT

Abstract Details

Activity Number: 74 - Text Analysis in Machine Learning and Statistical Models
Type: Contributed
Date/Time: Monday, August 3, 2020 : 10:00 AM to 2:00 PM
Sponsor: Government Statistics Section
Abstract #313751
Title: Detecting Pharmaceutical Innovations in Text-Based Data Using Machine Learning
Author(s): Devika Mahoney-Nair* and Gizem Korkmaz and Gary Anderson and Neil Alexander and Aaron Schroeder and Sallie Ann Keller
Companies: University of Virginia and University of Virginia and National Science Foundation and University of Virginia and University of Virginia and Distinguished Professor in Biocomplexity, U of Virginia
Keywords: innovation; business; pharmaceutical; sec filings; natural language processing; machine learning
Abstract:

Innovation is traditionally measured through surveys of selected companies, such as the Business R&D and Innovation Survey (BRDIS), which focuses on innovation incidence, i.e., the number of innovating firms. This paper aims to develop machine learning methods that measure business innovation from non-traditional data sources, in order to enrich and complement the innovation measures obtained through these surveys.

We focus on product innovation in the pharmaceutical sector (drugs and medical devices), which is heavily regulated by the Food and Drug Administration (FDA). The non-traditional data sources include publicly available opportunity and administrative data, such as financial filings, as well as news articles obtained from Dow Jones, a business news and data provider. We collect and scrape around 3,000 filings, parse 2 million news articles, and develop text-mining methods to link the datasets by company. We then develop machine learning methods to estimate the number of new products that companies introduce to the market. Our findings are compared with the publicly available approval data provided by the FDA to study the fraction of innovation activity that can be captured using these novel data sources.
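
As an illustration only, the linking and comparison steps described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' method: the abstract does not specify how companies are matched or how the captured fraction is computed, so the suffix list, the record layout (a "company" field), and the function names normalize_company, link_by_company, and capture_fraction are all hypothetical.

```python
import re

# Hypothetical list of legal suffixes to strip; a real pipeline would need
# a longer, curated list. This is an assumption, not taken from the abstract.
SUFFIXES = {"inc", "corp", "corporation", "co", "company", "ltd", "llc", "plc"}


def normalize_company(name: str) -> str:
    """Lowercase, replace punctuation with spaces, strip trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def link_by_company(filings, articles):
    """Group records from both sources under a shared normalized company key.

    Each record is assumed to be a dict with a "company" field (hypothetical layout).
    """
    linked = {}
    for source, records in (("filings", filings), ("articles", articles)):
        for rec in records:
            key = normalize_company(rec["company"])
            entry = linked.setdefault(key, {"filings": [], "articles": []})
            entry[source].append(rec)
    return linked


def capture_fraction(detected, approved):
    """Share of approved products that also appear among the detected ones."""
    approved = set(approved)
    if not approved:
        return 0.0
    return len(approved & set(detected)) / len(approved)
```

Exact matching on a normalized name is the simplest possible linkage rule; fuzzy matching or an identifier crosswalk would likely be needed for name variants that differ by more than punctuation and legal suffixes.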


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program