Abstract:
|
Innovation is traditionally measured through surveys of selected companies, such as the Business R&D and Innovation Survey (BRDIS), which focus on innovation incidence, i.e., the number of innovating firms. This paper develops machine learning methods that measure business innovation from non-traditional data sources, to enrich and complement the innovation measures obtained through these surveys.
We focus on product innovation in the pharmaceutical sector (drugs and medical devices), which is heavily regulated by the Food and Drug Administration (FDA). The non-traditional data sources are publicly available opportunity and administrative data, such as financial filings, together with news articles obtained from Dow Jones, a business news and data provider. We collect and scrape around 3,000 filings, parse 2 million news articles, and develop text-mining methods to link the datasets by company. We then develop machine learning methods to estimate the number of new products that companies introduce to the market. We compare our findings with the publicly available approval data provided by the FDA to study what fraction of innovation activity can be captured using these novel data sources.
|