All Times EDT

Abstract Details

Activity Number: 398 - Beyond Traditional Approaches: Evolving Artificial Intelligence and Machine Learning to Advance Clinical Research and Drug Development
Type: Topic Contributed
Date/Time: Wednesday, August 5, 2020, 1:00 PM to 2:50 PM
Sponsor: Biometrics Section
Abstract #311103
Title: Recent Advances in the Application of Natural Language Processing to Unstructured and Semi-Structured Data in the Pharmaceutical Industry
Author(s): Peter Henstock*
Companies: Pfizer Inc
Keywords: text mining; natural language processing; NLP; classification; data mining

Like companies in most industries, pharmaceutical companies capture the vast majority of their experimental or generated data in structured relational databases. These databases provide rapid access to tabular data suitable for statistical or machine learning analyses, even at scale. However, most non-experimental data come from external sources in unstructured or semi-structured formats such as journal articles, conference proceedings, news, emails, and reports. While most scientific decisions are based on structured experimental data, many strategic decisions rely on unstructured data. Because the volume of this information keeps growing faster than we can effectively consume it, we need analytical techniques that transform it into actionable insights.

This paper focuses on a case study that involves text classification as part of an extended text search. The challenge is to identify and classify sentences related to an extended semantic concept defined by a list of words and phrases. The specific problem is providing oversight of the sales force's online training materials, but the solution applies to a diverse set of problems and industries.
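The candidate-identification step described above (finding sentences that touch a concept defined by a list of words and phrases) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's method: the term list `CONCEPT_TERMS` is hypothetical, sentence splitting is a naive regex, and the downstream classifier the abstract mentions is not shown because its model is not specified.

```python
import re
from typing import Dict, List

# Hypothetical concept lexicon: the abstract says the concept is defined by a
# list of words and phrases but does not publish it; these are illustrative.
CONCEPT_TERMS = ["off-label", "unapproved use", "guarantee", "cure"]


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def build_pattern(terms: List[str]) -> "re.Pattern":
    # Try longer phrases first so a multi-word phrase wins over its prefix,
    # then escape each term and require word boundaries on both ends.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def flag_sentences(text: str, terms: List[str]) -> List[Dict[str, object]]:
    # Return each sentence containing at least one concept term,
    # together with the sorted, lower-cased set of terms it matched.
    pattern = build_pattern(terms)
    results = []
    for sentence in split_sentences(text):
        hits = sorted({m.group(0).lower() for m in pattern.finditer(sentence)})
        if hits:
            results.append({"sentence": sentence, "matches": hits})
    return results


if __name__ == "__main__":
    sample = ("This product may cure the condition. Please review the dosing table. "
              "Do not discuss off-label use with clinicians.")
    for row in flag_sentences(sample, CONCEPT_TERMS):
        print(row)
```

Word boundaries keep a term such as "cure" from firing inside "secure". In a fuller pipeline, the flagged sentences would then be passed to a trained classifier to decide which ones actually express the concept.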

Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program