Online Program

All Times ET

Thursday, June 3
Practice and Applications
Classification and Simulation: Methods, Analyses, and Applications
Thu, Jun 3, 10:00 AM - 11:35 AM
TBD

How SciLine Solved its Multi-Label Classification Problem (309835)

*Joshua Logan Colburn, SciLine, AAAS 

Keywords: multi-label, classification, Naïve Bayes, prior probabilities, NLP, text classification

For organizations dealing with large databases of customer profiles, accurately labeling those customers is crucial to a variety of practices, from marketing segmentation to new product strategy. At SciLine, a free service based at AAAS that connects journalists with subject-matter experts who can help with their news stories, many "customers" are scientists, and knowing their scientific discipline is critical to matching them to specific reporter needs. When a journalist asks to speak with someone about a recent hurricane in the Gulf, for example, readily knowing who within SciLine’s large database of experts is a climatologist, remote-sensing expert, or meteorologist allows the organization to respond more efficiently. Separately, knowing that the database contains, say, 600 climatologists but 3,000 anthropologists can help SciLine decide where to advertise for recruitment. The problem SciLine currently faces is that while it has some information about the research expertise of every one of the more than 20,000 experts in its database, only a little over half have a discipline label. Even more challenging, most experts do not fall under just one discipline: depending on their research interests, a social psychologist might fall under psychology, sociology, and public health, for example. Thus the question: how can SciLine use the information it has about a given expert to determine which discipline label(s) appropriately describe them? Manual labeling is highly accurate but both cost- and time-intensive. In testing, most multi-label classification algorithms produced unusable results, largely due to under-classification stemming from bias in the training data. But by leveraging a pre-built aspect of a common single-label model, the prior probabilities of Naïve Bayes, SciLine developed a methodology that yields promising results without intensive processing, programming, or manual review.
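The abstract does not spell out the implementation, but one plausible reading of "leveraging the prior probabilities of Naïve Bayes" can be sketched as follows: train an ordinary single-label Naïve Bayes classifier, then, instead of taking only the top-scoring class, assign every discipline whose posterior probability exceeds its prior, i.e., every discipline the expert's text makes more likely than the baseline rate in the training data. The data, decision rule, and function names below are illustrative assumptions, not SciLine's actual pipeline.

```python
# Hypothetical sketch: multi-label output from a single-label Naive Bayes
# model by comparing class posteriors against class priors.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-ins for expert research descriptions (invented data).
docs = [
    "hurricane storm climate ocean",
    "climate warming carbon sea",
    "culture kinship ethnography fieldwork",
    "ethnography ritual kinship culture",
    "cognition behavior memory perception",
    "behavior emotion memory therapy",
]
labels = [
    "climatology", "climatology",
    "anthropology", "anthropology",
    "psychology", "psychology",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)
clf = MultinomialNB().fit(X, labels)

def discipline_labels(text):
    """Return every discipline whose posterior beats its prior."""
    posterior = clf.predict_proba(vec.transform([text]))[0]
    prior = np.exp(clf.class_log_prior_)  # priors learned from label counts
    return [c for c, post, pri in zip(clf.classes_, posterior, prior)
            if post > pri]

# An expert whose text mixes climate and anthropology vocabulary
# gets both labels; psychology stays below its baseline rate.
print(discipline_labels("climate storm culture kinship"))
```

The appeal of this rule, consistent with the abstract's claim, is that it reuses machinery the model already computes: no per-label binary classifiers, no manually tuned thresholds, and experts whose text supports several disciplines naturally receive several labels.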