Online Program

All Times EDT

Wednesday, June 3
Practice and Applications
Applying Network and Graph Analysis
Wed, Jun 3, 1:15 PM - 2:50 PM

Learning Social Network from Text Data (308338)


Nynke Niezink, Carnegie Mellon University 
Rebecca Nugent, Carnegie Mellon University 
*Xiaoyi Yang, Carnegie Mellon University 

Keywords: Poisson Graphical Model, Lasso Penalty, Record Linkage, Text Analysis, Social Networks

Historians often focus on important figures in history, taking an individual-level perspective. However, by considering the networks in which these individuals are embedded, we may discover historical figures in key network positions who have so far received less attention. Text data, such as biographies, can be a useful source for learning the structure of a social network in early history. Identifying links from text data is challenging because (1) people who are mentioned in the same part of a text do not necessarily know each other, since the co-mention may arise only because both know a third person; and (2) duplicated and partial names are ubiquitous in historical texts, and it is unclear how to assign their mentions to individuals. In this work, we compare current network models for text data and incorporate additional biographical information through adaptive penalty adjustment.

First, we explore and compare three conditional independence models (the Local Poisson Graphical Lasso Model, the Gaussian Graphical Model, and the Poisson Log-normal Model) to understand when each works best and which kinds of links each is unable to recover. Second, we adjust the lasso penalty to incorporate information such as sex, birth/death year, family name, and social group membership. Links between people who share a last name or a social group receive a relatively smaller penalty, so they are more likely to be included in the estimated network. This adjustment should help close the gap between the original social networks and their conditional independence representation. Finally, while natural language processing tools are good at recognizing names, they do not uniquely link these names to individuals. We plan to use record linkage techniques and information from each biography to better attribute each name to an individual.
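
The penalty adjustment described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a neighborhood-selection form of the local Poisson graphical lasso, in which each person's mention counts are regressed on the log-transformed, centered counts of everyone else under a weighted L1 penalty, fitted by proximal gradient descent. The function names, the log1p transform, the 0.5 discount for shared family names or groups, and the "or" symmetrization rule are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def weighted_poisson_lasso(X, y, lam, weights, n_iter=300):
    """L1-penalized Poisson regression with per-coefficient penalty weights.

    Minimizes mean(exp(eta) - y * eta) + lam * sum_k weights[k] * |beta[k]|
    by proximal gradient descent with backtracking (intercept unpenalized).
    """
    n, p = X.shape
    beta = np.zeros(p)
    b0 = np.log(y.mean() + 1e-8)

    def nll(b0_, beta_):
        eta = b0_ + X @ beta_
        return np.mean(np.exp(eta) - y * eta)

    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(n_iter):
            resid = np.exp(b0 + X @ beta) - y
            g0, g = resid.mean(), X.T @ resid / n
            f_old, t = nll(b0, beta), 1.0
            while True:
                new_b0 = b0 - t * g0
                new_beta = soft_threshold(beta - t * g, t * lam * weights)
                d0, d = new_b0 - b0, new_beta - beta
                bound = f_old + g0 * d0 + g @ d + (d0 ** 2 + d @ d) / (2 * t)
                if nll(new_b0, new_beta) <= bound + 1e-12:
                    break
                t *= 0.5  # step too large: halve it and retry
            b0, beta = new_b0, new_beta
    return b0, beta


def build_penalty_weights(family_names, groups, discount=0.5):
    """Hypothetical weighting: pairs sharing a family name or a social
    group get a smaller penalty (discount); all other pairs get 1.0."""
    p = len(family_names)
    W = np.ones((p, p))
    for j in range(p):
        for k in range(p):
            if j != k and (family_names[j] == family_names[k]
                           or groups[j] == groups[k]):
                W[j, k] = discount
    return W


def local_poisson_graph(counts, lam, W, rule="or"):
    """Estimate an undirected graph from an (n_documents, n_people) count
    matrix by one penalized Poisson regression per person."""
    n, p = counts.shape
    Z = np.log1p(counts)          # assumption: log1p-transformed predictors
    Z = Z - Z.mean(axis=0)        # centered for numerical stability
    B = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        _, beta = weighted_poisson_lasso(Z[:, others], counts[:, j],
                                         lam, W[j, others])
        B[j, others] = beta
    nz = B != 0
    return (nz | nz.T) if rule == "or" else (nz & nz.T)
```

Calling `local_poisson_graph(counts, lam, build_penalty_weights(names, groups))` returns a boolean adjacency matrix. Lowering `discount` makes links within a family or group easier to select at a given `lam`, which is the effect the abstract describes; how the weights should actually depend on sex or birth/death year is left open here.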