Keywords: Knowledge Discovery, Development, Text Mining Machine, Crowdsourcing
I will first talk about how to develop professionally and move out of one's comfort zone, and then present TextM. TextM: Crowdsourcing has become an important tool for rapidly amassing large amounts of data for scientific research. However, crowdsourcing approaches have not been standardized across applications. We investigate challenges, trends, and classifications within Crowdsourcing in Epidemiology (CrowdEpi) and provide guidelines for an effective crowdsourcing protocol with quality control. To obtain data for CrowdEpi, we developed a Java-based text mining machine, "TextM", and used an XML crawler to collect and query relevant articles from 5 sources. To study the trends, we identified 4 utilities within CrowdEpi and developed a classifier that automatically groups articles into these 4 CrowdEpi areas and a fifth, non-CrowdEpi area. TextM can serve as a general-purpose text mining tool that is customizable to a particular study. Our guidelines for crowdsourcing protocols should have applications beyond crowdsourcing in epidemiology. This work is a product of the interface between statistics and computer science for data science. TextM is joint work with Pezzort, Wang & Carter.