All Times EDT

Key:

Computational Statistics

Data Visualization

Education

Machine Learning

Practice and Applications

Software & Data Science Technologies

Thursday, June 4

Computing in Data Privacy
Invited

Thu, Jun 4, 10:00 AM - 11:35 AM

Organizer(s): Aleksandra Slavkovic, Penn State University

Chair(s): Aleksandra Slavkovic, Penn State University

10:05 AM

Formally Private Microdata at Scale: Reducing the Magnitude of Upward Bias
Philip Leclerc, United States Census Bureau

10:35 AM

OpenDP: An Open-Source Suite of Differential Privacy Tools
James Honaker, Harvard University

11:05 AM

Encode, Shuffle, Analyze Revisited: Strong Privacy Despite High Epsilon
Abhradeep Guha Thakurta, Google Research Brain Team and UC Santa Cruz

Visualization for Big Data and AI
Invited

Thu, Jun 4, 10:00 AM - 11:35 AM

Organizer(s): Andee Kaplan, Colorado State

Chair(s): Haley Jeppson, Iowa State University

10:05 AM

Telling a Visual Story Within Big Data: Case Studies on Interactive Visualizations for Supercomputer Data
Claire McKay Bowen, Urban Institute

10:35 AM

Protoshiny: Interactive Exploration of Dendrograms with Prototypes
Jacob Bien, University of Southern California

11:05 AM

Visualizing Complex Science
Samuel F. Way, Spotify

Data Science Using R
Invited

Thu, Jun 4, 10:00 AM - 11:35 AM

Organizer(s): Brad Price, West Virginia University

Chair(s): Jim Harner, West Virginia University

10:05 AM

Bayesian Methods for Data Science Using R
Christina Knudson, University of St. Thomas

10:35 AM

Process Automation as the Backbone of Reproducible Science
Brian Lee Yung Rowe, Pez.AI

11:05 AM

Training Large Deep Learning Models Using Spark, TensorFlow, and R
Javier Luraschi, RStudio

Anomaly Detection in Complex Data
Invited

Thu, Jun 4, 10:00 AM - 11:35 AM

Organizer(s): Sarah Rajtmajer, Penn State University

Chair(s): Sarah Rajtmajer, Penn State University

10:05 AM

High Temperature Structure Detection in Ferromagnets
Matey Neykov, University of Pittsburgh

10:35 AM

Toward Secure and Interpretable AI: Scalable Methods, Interactive Visualizations, and Practical Tools
Polo Chau, Georgia Tech

11:05 AM

Detecting Anomalies in Graph-Structured Data
James Sharpnack, UC Davis

Education 1
Contributed Refereed

Thu, Jun 4, 10:00 AM - 11:35 AM

Chair(s): Donna LaLonde, American Statistical Association

10:05 AM

Teaching the Gestalt Principles to Help Undergraduate Students Design Effective Tables and Graphs
Silas Bergen, Winona State University

10:20 AM

Bringing Visual Inference to the Classroom
Adam Loy, Carleton College

10:35 AM

Data Management with Data Verbs
Todd Iverson, Winona State University

10:50 AM

Beyond NYC Flights in Intro to Data Science: Curtis Flowers and the Role of Race in Jury Selection
Paul Roback, St. Olaf College

11:05 AM

Q&A

Practice and Applications 1
Contributed Refereed

Thu, Jun 4, 10:00 AM - 11:35 AM

Chair(s): David Hunter, Penn State University

10:05 AM

Leveraging Methods for Subsampling: Toward a Realistic Evaluation
Changrui Liu, University of Kentucky

10:35 AM

Improving Cloud Infrastructure Capacity Planning Decisions with Scalable Human-in-the-loop Scenario Forecasting
Jiaping Zhang, Salesforce

11:05 AM

Finite Sample Properties of an Exponential-Compound Symmetric Covariance Structure
Amber K. Weydert, University of West Florida

Education and Data Visualization Posters
E-Poster

Thu, Jun 4, 10:00 AM - 1:00 PM

Poster Q&A will be available during these designated hours as part of the virtual conference.

Exploring Technical Competencies Needs for Future Information Technology Workforce
Ana Valentin, Marymount University

WITHDRAWN: The Arcus Learning Exchange: Cross-Departmental Education Development at the Children's Hospital of Philadelphia

Increasing Diversity in Biomedical Data Science: Implementation and Impact of Best Practices
Judith E Canner, California State University, Monterey Bay

ACM Draft 2 Computing Competencies for Undergraduate Data Science
Karl Schmitt, Valparaiso University

A Statistician Teaches Deep Learning: From Fundamentals to Applications
David Han, The University of Texas at San Antonio

Identifying Academic At-Risk Students with Consistence Validation Using Predictive Analytics
Jianbin Zhu, University of Central Florida

Educational Tool and Active-Learning Class Activity for Teaching Agglomerative Hierarchical Clustering
Xizhen Cai, Williams College

REDCap and RShiny Together to Survey and Deliver Personalized Feedback of a Well-Being Assessment
Duncan Grade Vos, WMU School of Medicine

Modified Box Plots for Arithmetic, Geometric, and Harmonic Observations
Mian Arif Shams Adnan, Bowling Green State University

Geometries of the Connections of the Graphical Presentations of Several Statistical Tools
Mian Arif Shams Adnan, Bowling Green State University

CatViz for Visual Exploration of High-Dimensional Categorical Data Sets
Raif Rustamov, AT&T Labs Research

A Range-Based Box Plot
Mian Arif Shams Adnan, Bowling Green State University

Building an Open-Sourced Geospatial Visualization Shiny Application in R for Healthcare Providers and Evaluators
Dar'ya Y Pozhidayeva, Oregon Health & Science University

Harnessing the Power of Data to Promote Institutional Change at Higher Education Institutions
Invited

Thu, Jun 4, 11:40 AM - 12:45 PM

Organizer(s): Kameryn Denaro, UC Irvine

Chair(s): Wendy Martinez, Bureau of Labor Statistics

11:45 AM

Making an Impact in an Institutional Research Office: On Data Champions and Machine Learning
Richard A. Levine, San Diego State University

12:15 PM

A Data-Driven Approach to Promoting Innovation and Excellence in Teaching at Higher Education Institutions
Kameryn Denaro, UC Irvine

Parallel Computing
Invited

Thu, Jun 4, 11:40 AM - 12:45 PM

Organizer(s): Sean Blanchard, Los Alamos National Laboratory

Chair(s): Sean Blanchard, Los Alamos National Laboratory

11:45 AM

Democratizing Calculations in the Cloud
Andrew Glenn Shewmaker, OpenEye Scientific

12:15 PM

Adaptive MCMC for Everyone
Jeffrey S. Rosenthal, University of Toronto

Modern Inference in Statistical Machine Learning
Invited

Thu, Jun 4, 11:40 AM - 12:45 PM

Organizer(s): Ryan Tibshirani, Carnegie Mellon University

Chair(s): Nicholas Schmidt, BLDS

11:45 AM

Predictive Inference with Random Forests
Lucas Mentch, University of Pittsburgh

12:15 PM

Semiparametric Estimation in High Dimensions
Mladen Kolar, U Chicago Booth

Machine Learning 5
Contributed Refereed

Thu, Jun 4, 11:40 AM - 12:45 PM

Chair(s): Thomas Carpenito, Northeastern University

11:45 AM

Modernizing k-Nearest Neighbors Software
Norm Matloff, UC Davis

12:15 PM

Heterogeneous Treatment Effects of Medicaid and Efficient Policies
Shishir Shakya, West Virginia University

Machine Learning 6
Contributed Refereed

Thu, Jun 4, 11:40 AM - 12:45 PM

Chair(s): Yirui Hu, Geisinger

11:45 AM

Functional Singular Spectrum Analysis
Mehdi Maadooliat, Department of MSSC at Marquette University

12:15 PM

Statistical Learning and Energy Statistics for High-Dimensional Time Series
John Steven Schuler, George Mason University

Practice and Applications 5
Contributed Refereed

Thu, Jun 4, 11:40 AM - 12:45 PM

Chair(s): Lauren Alpert Sugden, Duquesne University

11:45 AM

Estimation Graphics: Essential Data Analysis for Biomedical Science
Joses Ho, Institute for Molecular and Cell Biology

12:00 PM

Trial-by-Trial Mid-Frontal Theta Power Predicts Emotional Decision Processes in Response Inhibition Task
Siddharth Nayak, Institute of Statistical Science, Academia Sinica

12:15 PM

A Paradigm for Managing Computational Reproducibility in a Changing Software Package Landscape
Kiegan Rice, Iowa State University

Interactive Machine Learning
Invited

Thu, Jun 4, 1:20 PM - 2:55 PM

Organizer(s): James Sharpnack, UC Davis

Chair(s): James Sharpnack, UC Davis

1:25 PM

On the Global Convergence of Policy Optimization in Deep Reinforcement Learning
Zhaoran Wang, Northwestern University

1:55 PM

WITHDRAWN: Marginal Posterior Sampling for Slate Bandits

2:25 PM

Interactive Learning Using Labels and Comparisons
Aarti Singh, Carnegie Mellon University

Community Engagement Through Data Science Education
Invited

Thu, Jun 4, 1:20 PM - 2:55 PM

Organizer(s): Leah Jager, Johns Hopkins Bloomberg School of Public Health

Chair(s): Leah Jager, Johns Hopkins Bloomberg School of Public Health

1:25 PM

Can Data Science Education Be Used as a Tool for Upward Mobility?
Aboozar Hadavand, Johns Hopkins University, Bloomberg School of Public Health

1:55 PM

Incorporating Community-Based Learning Into the Classroom
Lynne Steuerle Schofield, Swarthmore College

2:25 PM

Statistics in the Community: Community-University Partnerships Fostering Data Science Education
Stephen Salerno, Department of Biostatistics, University of Michigan

Cloud Computing: The Future for Data Science Applications
Invited

Thu, Jun 4, 1:20 PM - 2:55 PM

Organizer(s): Ming Li, Amazon

Chair(s): Ruth Hummel, JMP

1:25 PM

End-to-End Data Science Project Cycle
Ming Li, Amazon

1:55 PM

Machine Learning and Cloud Computing for Statisticians
Robert Winston Blanchard, SAS

2:25 PM

Q&A

Divide and Recombine for Big Data Analysis and Visualization
Invited

Thu, Jun 4, 1:20 PM - 2:55 PM

Organizer(s): Susan Vanderplas, Iowa State University

Chair(s): Susan Vanderplas, Iowa State University

1:25 PM

Divide and Recombine (D&R) with R-RHIPE-Hadoop Software
William S. Cleveland, Purdue University

1:55 PM

Rethinking Climate Data Analysis and Visualization in the Era of Big Data
Wen-wen Tung, Purdue University

2:25 PM

Distributed Bayesian Varying Coefficient Modeling Using a Gaussian Process Prior
Sanvesh Srivastava, University of Iowa

Computational Statistics 1
Contributed Refereed

Thu, Jun 4, 1:20 PM - 2:55 PM

Chair(s): Sujay Datta, University of Akron

1:25 PM

Nonparametric Estimation of Blood Alcohol Concentration from Transdermal Alcohol Measurements Using Alcohol Biosensor Devices
Bryan Edward Vader, Naval Base Ventura County

1:55 PM

Parameter-Expanded Data Augmentation for Analyzing Correlated Binary Data Using Multivariate Probit Models
Xiao Zhang, Michigan Technological University

2:25 PM

Streaming Data Analysis with Dynamic Regression Trees
Simon Paul Wilson, Trinity College Dublin

2:40 PM

Q&A

Practice and Applications 2
Contributed Refereed

Thu, Jun 4, 1:20 PM - 2:55 PM

Chair(s): Mitra Devkota, University of North Georgia

1:25 PM

Scaleable Correlated Topic Modelling for Job Matching
Simon Paul Wilson, Trinity College Dublin

1:40 PM

Bayesian Inference for Polycrystalline Materials
James Matuk, The Ohio State University

1:55 PM

SVM Model for Blood Cell Classification Using Interpretable Features Outperforms CNN-Based Approaches
William Franz Lamberti, George Mason University

2:10 PM

Visualizing the Food Landscape of Durham with Tableau
Joseph Lewis Graves, NCAT

2:25 PM

A Spatiotemporal Case Crossover Model of Asthma Attacks in the City of Houston
Julia Schedler, Rice University

2:40 PM

Q&A

Machine Learning and Software and Data Science Technologies Posters
E-Poster

Thu, Jun 4, 2:00 PM - 5:00 PM

Poster Q&A will be available during these designated hours as part of the virtual conference.

WITHDRAWN: Prediction of Hospital Readmissions: A Comparison of Predictive Methods on Binary and Survival Outcomes

WITHDRAWN: Prediction of Inpatient Quality Indicators: A Comparison of Predictive Methods with and Without Random Hospital Effect

Can Big Data Algorithms Be Used to Improve Cybersecurity?
Allen Sina Rahrooh, University of Central Florida

WITHDRAWN: BLNN: An R Package for Training Neural Networks

WITHDRAWN: Decision Tree Model-Based Gene Selection and Classification for Breast Cancer Risk Prediction

Learning the Stock Market States via a Logistic Regression Model and Its Applications
Qiyu Wang, Zhejiang Univ of Finance and Econ

Multiple Sequence Alignment Using Tensor Analysis
Mian Arif Shams Adnan, Bowling Green State University

Investigation of the Interplay Between Random Forest and Kernel Methods in Big Data
Richard Baumgartner, Merck & Co., Inc.

Interfacing Statistical Software Packages with R and Python
Neil Polhemus, Statgraphics Technologies, Inc.

TF-IDF-Weighted Similarity Estimates for Unseen Categories
Handong David Bang, UNC Chapel Hill Department of Biostatistics

Developing a Computational Framework for Precise TAD Boundary Prediction Using Genomic Elements
Spiro C Stilianoudakis, Virginia Commonwealth University

Predicting 30-Day Readmission After Surgery Among Colorectal Cancer Patients
Anshul Saxena, Baptist Health South Florida

R Package mase
Iris Griffith, Reed College

Ethics and Bias in Algorithms
Panel Discussion

Thu, Jun 4, 3:00 PM - 4:30 PM

Chair(s): Wendy Martinez, Bureau of Labor Statistics

In this mini-workshop, we discuss some of the social and ethical challenges of statistical and machine learning algorithms with a panel of experts from academia and industry.

3:05 PM

Ethics and Bias in Algorithms
Jie Chen, Wells Fargo; Jim Rosenberger, NISS; Aleksandra Slavkovic, Penn State University; Robert Tibshirani, Stanford University

SC1 - SOLD OUT: Big Data, Data Science, and Deep Learning for Statisticians, Part 2 (Ticket Required)
Short Course

Thu, Jun 4, 3:00 PM - 6:30 PM

Instructor(s): Ming Li, Amazon

Continuation of course.

SC5 - CANCELLED: Building Advanced Computer Vision Models Using SAS Software (Ticket Required)
Short Course

Thu, Jun 4, 3:00 PM - 6:30 PM

SC6 - Data Science Workflows Using R and Spark (Ticket Required)
Short Course

Thu, Jun 4, 3:00 PM - 6:30 PM

Instructor(s): Jim Harner, West Virginia University

R is a flexible, extensible statistical computing environment, but it is limited to single-core execution. Spark is a distributed computing environment that treats R as a first-class programming language. This course introduces data structures in R and their use in functional programming workflows relevant to data science.

The course covers the initial steps in the data science process, i.e., ETL: extracting data from source systems; transforming data into a tidy form; and loading data into distributed file systems, distributed data warehouses, and NoSQL databases.

These R-based workflows are illustrated by using dplyr directly and as a frontend to SQL databases. The sparklyr package, with its dplyr interface to Spark, is then used for modeling big data using regression and classification supervised learning methods. Unsupervised learning methods, such as clustering and dimension reduction, are also covered. Finally, methods for analyzing streaming data are presented. Student accounts are provided to allow attendees to interactively run the R Markdown content in Amazon’s cloud (AWS). The computing infrastructure and the content are containerized, which allows the complete course environment to be downloaded and run on Docker-supported laptops.

SC7 - Visualizing Big Data (Ticket Required)
Short Course

Thu, Jun 4, 3:00 PM - 5:00 PM

Instructor(s): Leland Wilkinson, H2O.ai and University of Illinois at Chicago

Big datasets (many rows, many columns, many items, ...) present special problems for visualization. Even when trying to plot simple rectangular datasets, we encounter complexity (many functions are polynomial or exponential in rows or columns), the curse of dimensionality (distances approach a constant as dimensionality heads toward infinity), choke points (data bus or network bandwidth), and limited display resolution (even with megapixel displays). This workshop covers recent strategies that exploit aggregation and projection to reduce datasets to manageable proportions. It also covers graphic representations that are most suitable for exploring multivariate data.
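
As a rough illustration of two ideas in this abstract, the Python sketch below (not course material) shows distances concentrating as dimensionality grows and a simple aggregation step that reduces many values to a few bin counts. The function names `relative_contrast` and `bin_counts`, the uniform random data, and the parameter choices are all assumptions made for this example.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """Return (max - min) / min over distances from one random query
    point to n_points random points in the unit cube of dimension dim.

    A small value means all points look roughly equally far away.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


def bin_counts(values, n_bins, lo, hi):
    """Aggregate values into n_bins equal-width bins on [lo, hi].

    Only n_bins counts need to be drawn, however many values there are.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(200, dim), 3))
    rng = random.Random(1)
    sample = [rng.random() for _ in range(100_000)]
    print(bin_counts(sample, 10, 0.0, 1.0))
```

Under these assumptions, the printed contrast should shrink as the dimension increases, and the 100,000 sample values reduce to ten counts that fit comfortably on any display.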
