Abstract Details

Activity Number: 240 - Computationally Intensive Methods for Estimation and Inference
Type: Contributed
Date/Time: Monday, July 31, 2017 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Computing
Abstract #323315
Title: Efficient Solution of Large Fixed Effects Problems with Clustered Standard Errors
Author(s): Thomas Balmat* and Jerry Reiter
Companies: Duke University and Department of Statistical Science, Duke University
Keywords: Large data least squares ; fixed effects estimation ; clustered standard errors ; sparse matrix methods ; high performance computing ; parallel computing

Large fixed effects regression problems, involving 100 million observations and thousands of effects levels, present special computational challenges but also a special performance opportunity, because a large proportion of the entries in the expanded design matrix are zero. For many problems, the proportion of zero entries exceeds 0.99995, making the design matrix extremely sparse. In this presentation, we demonstrate an efficient method for solving large, sparse fixed effects OLS problems that does not create the expanded design matrix and that avoids computations involving zero-level effects. This leads to minimal memory usage and optimal execution time. A feature often desired in social science applications is the estimation of parameter standard errors clustered on a key identifier, such as employee ID. For large problems, with ID counts in the millions, this presents a significant computational challenge. We present a sparse matrix indexing algorithm that produces clustered standard error estimates and that, for large fixed effects problems, is many times more efficient than standard sandwich matrix operations.
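The general idea of working from level indices rather than an expanded design matrix can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify; it is a small NumPy illustration under stated assumptions: a single fixed effect factor, dense continuous covariates, a dense solve of the normal equations (practical only for a small number of levels), and no finite-sample correction to the cluster-robust variance. The function name `fe_ols_clustered` and its arguments are hypothetical.

```python
import numpy as np

def fe_ols_clustered(X, y, lev, cl):
    """OLS of y on X plus one set of level indicators, with cluster-robust SEs.

    Illustrative sketch only (assumed setup, not the presented method).
    X   : (n, p) dense covariates, no intercept (the level effects absorb it)
    lev : (n,) integer fixed effect level of each observation, 0..L-1
    cl  : (n,) integer cluster ID of each observation, 0..G-1
    """
    n, p = X.shape
    L = lev.max() + 1
    G = cl.max() + 1

    # Cross products involving the indicator block D are accumulated by index.
    # D itself (n x L, a single 1 per row) is never formed.
    XtD = np.zeros((L, p))
    np.add.at(XtD, lev, X)                          # row l: sum of X over level l
    DtD = np.bincount(lev, minlength=L)             # diagonal: observations per level
    Dty = np.bincount(lev, weights=y, minlength=L)  # sum of y within each level

    # Assemble and solve the normal equations (dense here, for brevity).
    A = np.zeros((p + L, p + L))
    A[:p, :p] = X.T @ X
    A[:p, p:] = XtD.T
    A[p:, :p] = XtD
    A[p:, p:] = np.diag(DtD)
    b = np.concatenate([X.T @ y, Dty])
    coef = np.linalg.solve(A, b)

    # Residuals: each row picks up exactly one level coefficient.
    resid = y - X @ coef[:p] - coef[p + lev]

    # Cluster score sums S[g] = sum over i in cluster g of z_i * e_i,
    # again accumulated by index instead of by matrix products with D.
    SX = np.zeros((G, p))
    np.add.at(SX, cl, X * resid[:, None])
    SD = np.zeros((G, L))
    np.add.at(SD, (cl, lev), resid)
    S = np.hstack([SX, SD])

    # Sandwich estimator: A^{-1} (S'S) A^{-1}, without small-sample scaling.
    Ainv = np.linalg.inv(A)
    V = Ainv @ (S.T @ S) @ Ainv
    return coef, np.sqrt(np.diag(V))
```

At the scale described in the abstract, the dense arrays `A` and `S` in this sketch would themselves need sparse or blocked storage; the sketch only shows that every quantity involving the indicator columns can be obtained from the integer vectors `lev` and `cl`.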

Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

Copyright © American Statistical Association