All Times EDT

Abstract Details

Activity Number: 65 - Causal Inference with Latent Variables
Type: Invited
Date/Time: Monday, August 9, 2021 : 10:00 AM to 11:50 AM
Sponsor: Mental Health Statistics Section
Abstract #316578
Title: Natural Language Processing Algorithms and Their Relationship to Latent Variable Modeling
Author(s): Brian Lee Egleston* and Slobodan Vucetic
Companies: Fox Chase Cancer Center and Temple University
Keywords: Cluster-Corrected Standard Errors; Electronic Health Records; Natural Language Processing; Probability Models; Word2vec; Latent Variable Modeling

Models for the Pointwise Mutual Information (PMI) statistic underlie many popular natural language processing algorithms. We have found that the models can be used to identify latent classes of patients based on procedures, diagnoses, and health histories using free text notes from electronic health records. We previously demonstrated some of the asymptotic properties of PMI estimators that account for clustering of words within patient notes. In our current work, we further explore the relationship of natural language processing algorithms to latent variable modeling in general. We present examples of how natural language processing using clinical notes from electronic health records can potentially identify latent classes of HIV or cancer patients. We discuss the possibility of extending this work to causal inference applications.
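For readers unfamiliar with the statistic, the PMI of two words compares how often they occur together with how often they would co-occur if independent: PMI(a, b) = log[ p(a, b) / (p(a) p(b)) ]. The following is a minimal sketch only, not the authors' estimator. It assumes co-occurrence is counted at the level of a whole note, uses a hypothetical function name (`pmi_by_note`) and made-up toy notes, and does not address the cluster-corrected standard errors listed in the keywords.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_by_note(notes):
    """Estimate PMI for word pairs, counting co-occurrence within a note.

    Assumption: p(a) is the share of notes containing word a, and
    p(a, b) is the share of notes containing both a and b.
    Pairs that never co-occur are omitted (their PMI is undefined).
    """
    n = len(notes)
    word_counts = Counter()
    pair_counts = Counter()
    for note in notes:
        # Use the set of distinct words so each note counts once per word.
        words = sorted(set(note.lower().split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))

    pmi = {}
    for (a, b), joint in pair_counts.items():
        p_ab = joint / n
        p_a = word_counts[a] / n
        p_b = word_counts[b] / n
        pmi[(a, b)] = math.log(p_ab / (p_a * p_b))
    return pmi


# Hypothetical toy notes, invented purely for illustration.
toy_notes = [
    "hiv antiretroviral therapy",
    "hiv antiretroviral adherence",
    "tumor chemotherapy",
    "tumor chemotherapy radiation",
]
scores = pmi_by_note(toy_notes)
```

In this toy example, word pairs are keyed in alphabetical order, so `scores[("antiretroviral", "hiv")]` is positive because the two words appear together more often than independence would predict, while a pair such as `("hiv", "tumor")` is absent because the words never share a note. Grouping words by such association patterns is one informal way to see the connection to latent classes that the abstract describes.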

Authors who are presenting talks have a * after their name.
