Activity Number: 65 - Causal Inference with Latent Variables
Type: Invited
Date/Time: Monday, August 9, 2021, 10:00 AM to 11:50 AM
Sponsor: Mental Health Statistics Section
Abstract #316578
Title: Natural Language Processing Algorithms and Their Relationship to Latent Variable Modeling
Author(s): Brian Lee Egleston* and Slobodan Vucetic
Affiliations: Fox Chase Cancer Center and Temple University
Keywords: Cluster-Corrected Standard Errors; Electronic Health Records; Natural Language Processing; Probability Models; Word2vec; Latent Variable Modeling
Abstract:
Models for the Pointwise Mutual Information (PMI) statistic underlie many popular natural language processing algorithms, including word2vec. We have found that these models can identify latent classes of patients from the procedures, diagnoses, and health histories recorded in the free-text notes of electronic health records. We previously established some of the asymptotic properties of PMI estimators that account for the clustering of words within patient notes. In the current work, we further explore the relationship between natural language processing algorithms and latent variable modeling more generally. We present examples of how natural language processing of clinical notes from electronic health records may identify latent classes of patients with HIV or cancer. We also discuss possible extensions of this work to causal inference.
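As a rough illustration of the PMI statistic, PMI(x, y) = log[p(x, y) / (p(x) p(y))], the Python sketch below estimates PMI for one word pair from within-window co-occurrence counts pooled across patients. It then attaches a standard error that respects the clustering of words within patients. This is not the authors' estimator. The window-based counting convention (unigram marginals, pair joint), the hypothetical function names (`note_counts`, `pooled_pmi`, `cluster_bootstrap_se`), and the use of a patient-level (cluster) bootstrap in place of analytic cluster-corrected standard errors are all assumptions made for illustration.

```python
import math
import random
import statistics
from collections import Counter


def note_counts(tokens, window=2):
    """Unigram counts and unordered within-window pair counts for one patient.

    Assumption: a pair is any two tokens at most `window` positions apart.
    """
    unigrams = Counter(tokens)
    pairs = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            pairs[tuple(sorted((word, other)))] += 1  # order-free pair key
    return unigrams, pairs


def pooled_pmi(patient_counts, x, y):
    """PMI(x, y) = log[p(x, y) / (p(x) p(y))] from counts pooled over patients."""
    unigrams, pairs = Counter(), Counter()
    for uni, pr in patient_counts:
        unigrams.update(uni)
        pairs.update(pr)
    n_xy = pairs[tuple(sorted((x, y)))]
    if n_xy == 0 or unigrams[x] == 0 or unigrams[y] == 0:
        return None  # PMI is undefined (log of zero) when a count is zero
    p_xy = n_xy / sum(pairs.values())
    n_tokens = sum(unigrams.values())
    p_x = unigrams[x] / n_tokens
    p_y = unigrams[y] / n_tokens
    return math.log(p_xy / (p_x * p_y))


def cluster_bootstrap_se(patient_counts, x, y, n_boot=500, seed=0):
    """Standard error of PMI from resampling whole patients (clusters) with replacement.

    Resampling patients, not individual word pairs, keeps each patient's
    correlated co-occurrences together in every bootstrap replicate.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(patient_counts, k=len(patient_counts))
        est = pooled_pmi(sample, x, y)
        if est is not None:  # drop replicates in which the pair never occurs
            estimates.append(est)
    if len(estimates) < 2:
        raise ValueError("too few replicates with a defined PMI")
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Invented toy notes, one string per patient (not real data).
    notes = [
        "hiv viral load undetectable viral load stable",
        "cancer staging ct chest viral load not checked",
        "hiv viral load rising adherence counseling",
    ]
    counts = [note_counts(text.split()) for text in notes]
    print("PMI(viral, load):", pooled_pmi(counts, "viral", "load"))
    print("cluster bootstrap SE:", cluster_bootstrap_se(counts, "viral", "load"))
```

Resampling at the patient level is one simple way to reflect within-patient dependence. A sandwich-type variance estimator is the analytic alternative, and any serious use would also need smoothing or a rule for rare pairs rather than discarding undefined replicates.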
Authors who are presenting talks have a * after their name.