Abstract Details

Activity Number: 432
Type: Contributed
Date/Time: Tuesday, August 2, 2016, 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #321027
Title: Efficient Sampling Strategy for SVM Through Semi-Supervised Active Learning
Author(s): Yaru Shi* and Yoonsang Kim and Ganna Kostygina and Sherry Emery
Companies: University of Illinois at Chicago (all authors)
Keywords: active learning; support vector machine; semi-supervised learning

Active learning is a machine learning method that actively chooses its training data in order to improve a classifier's performance while minimizing the size of the training set. One of the main strategies, uncertainty sampling, queries the instance about which the classifier is most uncertain. Simple Margin and MaxMin Margin sampling were proposed for SVMs, but both are inefficient because they query a single uncertain instance at a time. To improve efficiency, we propose a semi-supervised active learning method that queries a set of uncertain instances selected via K-means clustering. We compare its efficiency with Simple Margin, MaxMin Margin, and random sampling. We collected 9,584 tweets containing little-cigar/cigarillo (LCC) related keywords in October 2014 via Gnip's PowerTrack, and the response was labeled for LCC relevance. We trained an SVM on a random sample of tweets, updated it by adding training instances according to each method, and assessed the SVM's performance with F1 scores. To reach F1 = 0.9, random sampling needed 1,200 instances, Simple Margin needed 700, and the proposed method needed only 500. The proposed method demonstrated a more efficient learning curve, reducing the need for large training data.
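The batch-query idea described above can be sketched in a few lines: rank the unlabeled pool by distance to the SVM decision boundary, keep the most uncertain candidates, cluster them with K-means, and query one representative per cluster. This is a minimal illustrative sketch using scikit-learn; the function name, parameter values, and the synthetic data standing in for the tweet features are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of batch-mode uncertainty sampling for an SVM,
# diversified via K-means, per the strategy the abstract describes.
# All names/parameters here are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification

def query_batch(clf, X_pool, n_candidates=100, batch_size=10, seed=0):
    """Pick a diverse batch of uncertain instances from the unlabeled pool.

    1. Rank pool points by |decision_function| (distance to the margin).
    2. Keep the n_candidates most uncertain points.
    3. Cluster them with K-means; return the point nearest each centroid.
    """
    margins = np.abs(clf.decision_function(X_pool))
    candidates = np.argsort(margins)[:n_candidates]
    km = KMeans(n_clusters=batch_size, n_init=10, random_state=seed)
    labels = km.fit_predict(X_pool[candidates])
    batch = []
    for k in range(batch_size):
        members = candidates[labels == k]
        dists = np.linalg.norm(
            X_pool[members] - km.cluster_centers_[k], axis=1)
        batch.append(members[np.argmin(dists)])
    return np.array(batch)

# Toy demonstration on synthetic data standing in for tweet features.
# (A real loop would also exclude already-labeled indices from the pool.)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labeled = np.arange(50)  # small initial random training sample
clf = SVC(kernel="linear").fit(X[labeled], y[labeled])
query = query_batch(clf, X, batch_size=10)  # indices to send for labeling
```

Querying one representative per cluster is what lets the method label a diverse batch per round instead of one instance at a time, which is the efficiency gain over Simple Margin and MaxMin Margin.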

Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

Copyright © American Statistical Association