Abstract Details

Activity Number: 127
Type: Contributed
Date/Time: Monday, August 1, 2016 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Computing
Abstract #320818
Title: The Biglasso Package: Extending Lasso Model Fitting to Big Data in R
Author(s): Yaohui Zeng* and Patrick Breheny
Companies: University of Iowa and University of Iowa
Keywords: Lasso ; Big data ; High-dimensional ; Memory-mapping ; GWAS ; Data analysis
Abstract:

Penalized regression models such as the Lasso have been applied extensively to high-dimensional data sets. However, because of memory limitations, existing R packages such as glmnet cannot fit Lasso models to the ultrahigh-dimensional, multi-gigabyte data sets that are increasingly common in areas such as genetics, biomedical imaging, and high-frequency finance. In this study, we implement an R package called biglasso that tackles this challenge. Built upon existing APIs, biglasso uses memory-mapped files to store the massive data on disk and reads the data into memory only when necessary during model fitting. Benchmarking experiments demonstrate that biglasso is roughly equivalent to glmnet in computation speed but much more memory-efficient. This advantage opens the door to carrying out powerful big data analysis procedures on an ordinary laptop. Using real data from large-scale genome-wide association studies of prematurity and its complications, we further demonstrate the package's ability to analyze massive data sets that existing R packages cannot accommodate.
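
The general idea of fitting a Lasso model against a memory-mapped design matrix can be sketched as follows. This is an illustrative Python sketch using only the standard library, not the biglasso implementation or its API: the function names (write_column_major, lasso_cd_mmap, demo), the raw column-major float64 file layout, the plain cyclic coordinate descent solver, and the toy data are all assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile


def write_column_major(path, columns):
    """Write equal-length columns to disk as raw little-endian float64, one column after another."""
    with open(path, "wb") as f:
        for col in columns:
            f.write(struct.pack(f"<{len(col)}d", *col))


def soft_threshold(z, lam):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd_mmap(path, n, p, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    X (n rows, p columns) lives in a memory-mapped file; only one column
    is decoded into Python objects at a time.
    """
    beta = [0.0] * p
    resid = list(y)          # residual y - X b, starts at y because b = 0
    col_bytes = n * 8        # each float64 takes 8 bytes
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for _ in range(n_iter):
            for j in range(p):
                # The OS pages in just the bytes backing column j.
                x = struct.unpack_from(f"<{n}d", mm, j * col_bytes)
                norm = sum(v * v for v in x) / n
                if norm == 0.0:
                    continue
                rho = sum(xi * ri for xi, ri in zip(x, resid)) / n + norm * beta[j]
                new = soft_threshold(rho, lam) / norm
                delta = new - beta[j]
                if delta != 0.0:
                    resid = [ri - delta * xi for xi, ri in zip(x, resid)]
                    beta[j] = new
    return beta


def demo():
    """Fit a tiny orthogonal design; the numbers are made up for illustration."""
    columns = [[1.0, -1.0, 1.0, -1.0],
               [1.0, 1.0, -1.0, -1.0],
               [1.0, -1.0, -1.0, 1.0]]
    y = [2.05, -2.05, 1.95, -1.95]   # equals 2 * column 0 + 0.05 * column 2
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_column_major(path, columns)
        return lasso_cd_mmap(path, n=4, p=3, y=y, lam=0.1)
    finally:
        os.remove(path)
```

Calling demo() on this toy design should shrink the first coefficient from 2 to about 1.9 and set the other two to zero. Storing the matrix column by column suits coordinate descent, since each update touches exactly one contiguous column; a production implementation would also add convergence checks and a path of penalty values.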


Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

 
 
Copyright © American Statistical Association