Online Program

All Times EDT

Thursday, October 1
Thu, Oct 1, 10:00 AM - 12:00 PM
Virtual
Poster Session 1

Data Science Corps - Wrangle, Analyze, Visualize: Project VentureWell (309590)

Benjamin S. Baumer, Smith College 
Joyce Huang, Smith College 
Sunni Grace Raleigh, Smith College 
*Emma Semenuk Scott, Smith College 
Rachel Yan, Smith College 
Annabel Yim, Smith College 

Keywords: workforce development, community engagement, text analysis, experiential learning

Data Science Corps - Wrangle, Analyze, Visualize is a multi-institution, NSF-funded workforce development project that supports twenty-five students and five community partners per semester. Our team of five Smith College students is working with VentureWell, an innovation incubator. To maximize creativity, VentureWell’s E-Teams Grant Program accepts applications in an open-ended format. However, because proposals lack a consistent structure or file type, searching and analyzing them manually is inefficient. Our goal was to extract the proposal text, wrangle it into a data frame, and count keywords. Working within the Scrum agile framework, we researched text extraction packages and outlined a workflow. We then wrote a set of R functions that output a data frame of text extracted from .pdf, .doc, or .docx files in a directory. A supplemental script counts the occurrences of keywords. Our final product provides a systematic way to extract, store, and analyze text data. Possible future work includes analyzing the results to increase diversity in application selection and modifying the functions to preserve formatting and to return the sentence surrounding each keyword.
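
The keyword-counting step described above can be illustrated with a minimal sketch. This is not the team's code: their functions are written in R, while this example uses Python and assumes the text has already been extracted from each file. The file names, sample text, keywords, and the function names `count_keywords` and `keyword_table` are all hypothetical.

```python
import re


def count_keywords(text, keywords):
    """Count case-insensitive, whole-word occurrences of each keyword in text."""
    counts = {}
    for kw in keywords:
        # \b anchors keep "water" from matching inside "waterfall".
        pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def keyword_table(documents, keywords):
    """Build one row per document: the file name plus a count per keyword.

    `documents` maps a file name to its already-extracted text (assumed input).
    The list of row dicts stands in for a data frame.
    """
    rows = []
    for filename, text in documents.items():
        row = {"file": filename}
        row.update(count_keywords(text, keywords))
        rows.append(row)
    return rows


# Hypothetical extracted proposal text, for illustration only.
documents = {
    "proposal_a.pdf": "Our prototype targets clean water. The prototype is low cost.",
    "proposal_b.docx": "We propose a solar device; water use is minimal.",
}

table = keyword_table(documents, ["prototype", "water", "solar"])
for row in table:
    print(row)
```

Keeping one row per file and one column per keyword mirrors the data-frame layout mentioned in the abstract, so the counts can be filtered or summarized with ordinary tabular tools.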