All Times EDT
Virtual
Data Science Corps - Wrangle, Analyze, Visualize: Project VentureWell (309590)
Benjamin S. Baumer, Smith College
Joyce Huang, Smith College
Sunni Grace Raleigh, Smith College
*Emma Semenuk Scott, Smith College
Rachel Yan, Smith College
Annabel Yim, Smith College
Keywords: workforce development, community engagement, text analysis, experiential learning
Data Science Corps - Wrangle, Analyze, Visualize is a multi-institution, NSF-funded workforce development project that supports twenty-five students and five community partners per semester. Our team of five Smith College students is working with VentureWell, an innovation incubator. To maximize creativity, VentureWell’s E-Teams Grant Program accepts applications in an open-ended format. However, without consistency in structure or filetype, searching and analyzing proposals manually is inefficient. Our goal was to extract proposal text data, wrangle it into a data frame, and count keywords. Using the Scrum agile framework, we researched text extraction packages and outlined a workflow. We wrote a set of R functions that output a data frame of extracted text from a .pdf, .doc, or .docx file in a directory. A supplemental script counts the occurrence of keywords. Our final product provides a way to systematically extract, store, and analyze text data. Possible future work includes analyzing results to increase diversity in application selection and modifying the functions to preserve formatting and to return each keyword’s surrounding sentence.
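The keyword-counting step described above can be illustrated with a minimal sketch. It is written in Python rather than R and is not the team's code: the function name `count_keywords`, the file names, and the sample text are all hypothetical. It assumes the text has already been extracted from each proposal and that keywords are single words.

```python
import re
from collections import Counter


def count_keywords(documents, keywords):
    """Return one row per (file, keyword) pair with a case-insensitive word count.

    `documents` maps a file name to its already-extracted text.
    The list of dicts stands in for a data frame (an assumption of this sketch).
    """
    rows = []
    for filename, text in documents.items():
        # Split into lowercase words; hyphens and apostrophes stay inside a word.
        tokens = Counter(re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower()))
        for keyword in keywords:
            rows.append(
                {"file": filename, "keyword": keyword, "count": tokens[keyword.lower()]}
            )
    return rows


# Hypothetical extracted text, made up for illustration only.
documents = {
    "proposal_a.pdf": "Our prototype reduces water use. The prototype is low-cost.",
    "proposal_b.docx": "We plan customer interviews before building a prototype.",
}

rows = count_keywords(documents, ["prototype", "customer"])
for row in rows:
    print(row)
```

Keeping one row per file and keyword gives a long-format table that is straightforward to filter or summarize later. Multi-word keywords would need a different matching approach than the whole-word tokenization used here.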