Abstract Details

Activity Number: 686
Type: Topic Contributed
Date/Time: Thursday, August 4, 2016 : 10:30 AM to 12:20 PM
Sponsor: Social Statistics Section
Abstract #318753
Title: Detecting Text Reuse in State Legislative Bills
Author(s): Joe Walsh* and Matthew Burgess and Eugenia Giraudy and Julian Katz-Samuels and Derek Willis and Rayid Ghani
Companies: The University of Chicago and University of Michigan and YouGov and University of Michigan and ProPublica and The University of Chicago
Keywords: social good ; transparency ; genetic algorithms ; data science

Journalists, researchers, and concerned citizens would like to know who is actually writing legislative bills, but reading those bills, let alone tracing their sources, is tedious and time-consuming. This is especially true at the state level, where important policy decisions are made every day. State legislatures consider roughly 60,000 bills each year, covering taxes, education, healthcare, crime, transportation, and more.

To address this problem, we created a tool called the "Legislative Influence Detector" (LID). LID helps watchdogs turn a mountain of text into digestible insights about the origin and diffusion of policy ideas and the real influence of various lobbying organizations. For each piece of legislation, LID uses Elasticsearch's Lucene relevance scores to choose comparison documents and the Smith-Waterman local alignment algorithm to find matching passages. LID draws on more than 550,000 state bills (collected by the Sunlight Foundation) and 2,400 pieces of model legislation written by lobbyists (collected by us, ALEC Exposed, and other groups), searches for similarities, and flags them for review. LID users can then investigate the matches to look for possible lobbyist and special interest influence.
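
As an illustration of the matching step only, the following is a minimal sketch of Smith-Waterman local alignment applied to word tokens. It is not LID's actual code: the function name, the whitespace tokenization, the scoring values (match=2, mismatch=-1, gap=-1), and the two sample sentences are all assumptions made for this example, and the Elasticsearch step that selects comparison documents is left out.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment of token lists a and b.

    Returns (score, aligned_a, aligned_b), where "-" marks a gap.
    Scoring values are illustrative assumptions, not LID's settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] aligned to a gap
                score[i][j - 1] + gap,       # b[j-1] aligned to a gap
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, out_a[::-1], out_b[::-1]


if __name__ == "__main__":
    # Hypothetical sentences, invented for this example.
    bill = "the state shall not require any employee to join a labor union".split()
    model = "no employer shall require any employee to join a labor organization".split()
    score, bill_part, model_part = smith_waterman(bill, model)
    print(score)
    print(" ".join(bill_part))
    print(" ".join(model_part))
```

Because the alignment is local, the differing openings and endings of the two sentences do not penalize the shared passage in the middle, which is the property that makes this family of algorithms suitable for spotting reused text inside otherwise different documents.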

Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

Copyright © American Statistical Association