Abstract:
|
Few methods exist for drawing causal conclusions from text when the covariates or outcomes are aspects of the text itself. We summarize the challenges for causal analysis in this setting and propose a general framework for estimating treatment effects in studies where the covariates, the outcomes, or both are summary measures built from text. First, we extend prior work on matching documents on features generated by text analysis methods. After matching, we estimate differential word use and sentiment using other text analysis tools. We demonstrate the procedure by comparing partisan bias across US news sources, as measured by their rates of coverage of issues and, given the same coverage, differences in how they present those topics. Here both the covariates (i.e., topics covered) and the outcomes (i.e., language used and sentiment of covered content) are measured from the text. Our approach allows us to investigate two questions: "Are news sources selecting content at different rates depending on their partisan alignment?" and "When covering the same topics, are news sources presenting content using different language or sentiment?"
|