Abstract Details

Activity Number: 463 - Novel Uses of Text Analysis in Government Agencies
Type: Topic Contributed
Date/Time: Wednesday, August 1, 2018 : 8:30 AM to 10:20 AM
Sponsor: Government Statistics Section
Abstract #329494 Presentation
Title: The CFR Miner: Natural Language Processing of the Code of Federal Regulations Using R Studio and Shiny
Author(s): Richard Schwinn*
Companies: U.S. Small Business Administration
Keywords: R; Shiny; Natural Language Processing; visualization; interactive; machine learning
Abstract:

A great deal of effort goes into analyzing new rules during the promulgation process, but little attention is paid to the existing stock of rules. The CFR Miner uses a variety of natural language processing (NLP) techniques to help users analyze the Code of Federal Regulations (CFR) easily and effectively. Its interactive visualizations let users browse the 4.5-million-word corpus with a swipe of the mouse. Force vector maps and collapsible trees reveal structure and accentuate the interconnections between rules. Summarization algorithms provide succinct k-sentence summaries of CFR content at any granularity, from paragraphs to volumes. Users can upload, copy and paste, or link to their own content, such as business plans or proposed rules, to identify related CFR entries. Classification and clustering techniques (supervised and unsupervised learning, respectively) surface explicit and implicit CFR features by uncovering the latent structure of the corpus. The CFR Miner is written primarily in R and XPath but relies heavily on C++, JavaScript, HTML, and CSS through Shiny and various R packages.
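
The abstract does not say how user-supplied content is matched to related CFR entries. One common approach, shown here purely as an illustrative sketch, is to rank entries by TF-IDF cosine similarity to the uploaded text. Everything below is an assumption rather than the CFR Miner's actual method: the code is in Python rather than R, the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_related`) are hypothetical, and the sample entries are invented placeholders, not real CFR text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so terms found in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for toks in tokenized:
        term_freq = Counter(toks)
        vectors.append({t: count * idf[t] for t, count in term_freq.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_related(query, entries):
    """Rank entries (dict: label -> text) by similarity to the query text.

    Returns a list of (label, score) pairs, most similar first.
    The query is included when computing IDF weights.
    """
    labels = list(entries)
    vectors = tfidf_vectors([entries[label] for label in labels] + [query])
    query_vec = vectors[-1]
    scored = [(label, cosine(query_vec, vec))
              for label, vec in zip(labels, vectors[:-1])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented placeholder texts (NOT actual CFR content), for illustration only.
example_entries = {
    "entry-A": "Small business loan eligibility requirements and size standards for lenders",
    "entry-B": "Labeling requirements for packaged food products and nutrition information",
    "entry-C": "Airworthiness standards for transport category aircraft engines",
}
example_query = "Our business plan seeks a small business loan to expand operations"

for label, score in rank_related(example_query, example_entries):
    print(f"{label}: {score:.3f}")
```

In this toy example only `entry-A` shares any vocabulary with the query, so it ranks first and the other two entries score zero. A production tool would likely add stop-word removal, stemming, and a prebuilt index over the full corpus.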


Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program