398 – Beyond Traditional Approaches: Evolving Artificial Intelligence and Machine Learning to Advance Clinical Research and Drug Development
Recent Advances in the Application of Natural Language Processing to Unstructured and Semi-Structured Data in the Pharmaceutical Industry
Peter V. Henstock
Pfizer Inc.
Search and filtering methods are key technologies for working with unstructured or semi-structured text. This paper focuses on a method that draws on the lexical, morphological, semantic, and syntactic levels of linguistics to classify and rank text by a semantic concept that is broader than a typical online query. Our approach uses word2vec to identify the closest words and then applies a random forest classifier. Although the approach can be applied broadly to many domains, the problem addressed in this paper is to rank sentences in a large corpus of online course transcriptions by their need for human review as part of oversight.
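
The two-stage idea in the abstract (nearest words in an embedding space, then a random forest that scores sentences) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the author's implementation: the hand-written `EMB` table stands in for trained word2vec vectors, and the seed word "exam", the helpers `closest_words` and `features`, the two features, and the toy sentences and labels are all hypothetical. It relies on NumPy and scikit-learn's `RandomForestClassifier`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy 3-d vectors standing in for trained word2vec embeddings (hypothetical values).
EMB = {
    "exam":    np.array([0.90, 0.10, 0.00]),
    "test":    np.array([0.80, 0.20, 0.00]),
    "quiz":    np.array([0.85, 0.15, 0.05]),
    "grade":   np.array([0.70, 0.30, 0.10]),
    "lecture": np.array([0.10, 0.90, 0.10]),
    "slides":  np.array([0.00, 0.80, 0.30]),
    "coffee":  np.array([0.00, 0.10, 0.90]),
    "break":   np.array([0.10, 0.00, 0.80]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def closest_words(seed, k):
    """Return the k vocabulary words most similar to the seed word."""
    sims = {w: cosine(EMB[seed], v) for w, v in EMB.items() if w != seed}
    return sorted(sims, key=sims.get, reverse=True)[:k]

def features(sentence, concept):
    """Two illustrative features: count of concept words, best similarity to the concept."""
    tokens = [t for t in sentence.lower().split() if t in EMB]
    hits = sum(t in concept for t in tokens)
    best = max((cosine(EMB[t], EMB[c]) for t in tokens for c in concept), default=0.0)
    return [hits, best]

# Stage 1: expand a seed word into a broader concept via nearest neighbours.
concept = {"exam", *closest_words("exam", 2)}

# Stage 2: train a random forest on a few hand-labelled sentences (1 = needs review).
train = [
    ("the exam covers chapter two", 1),
    ("submit the quiz before friday", 1),
    ("your test grade is posted", 1),
    ("lecture slides are online", 0),
    ("coffee break at ten", 0),
    ("the lecture starts after the break", 0),
]
X = np.array([features(s, concept) for s, _ in train])
y = np.array([label for _, label in train])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Rank unseen sentences by the predicted probability of needing review.
new = [
    "coffee break now",
    "the quiz and exam count toward your test score",
    "lecture slides posted",
]
scores = clf.predict_proba(np.array([features(s, concept) for s in new]))[:, 1]
ranked = [new[i] for i in np.argsort(-scores)]
print(ranked)
```

In a realistic setting the toy table would be replaced by vectors trained on the corpus, and the feature set would be richer than the two shown here; the sketch only illustrates how neighbour expansion can feed a classifier whose probability output yields a ranking.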