Fairness in Data Science: Criteria, Algorithms, and Open Problems — Professional Development Continuing Education Course
ASA, Section on Statistics in Epidemiology
Instructor(s): Ilya Shpitser, Johns Hopkins University; Daniel Malinsky, Columbia University; Razieh Nabi, Emory University
Systematic biases present in our society influence the way data is collected and stored, the way variables are defined, and the way scientific findings are put into practice as policy. Automated decision procedures and learning algorithms applied to such data may serve to perpetuate existing injustice or unfairness in our society. Increasing commoditization of statistical and machine learning methods has led to a series of highly publicized instances of learning algorithms producing inappropriate, discriminatory, or otherwise harmful outputs. In response, a flurry of research activity has aimed to quantitatively describe various aspects of fairness and bias in data science, as well as to develop new approaches to learning and estimation from data that take fairness criteria into account. In this one-day short course, we will review a variety of fairness criteria that have been developed, along with algorithms that aim to be ‘fairness-aware’ in various ways, with a particular emphasis on methods rooted in causal inference. We will conclude by describing a variety of methodological and translational problems that remain in this rapidly growing subfield of data science. The course assumes basic familiarity with statistical inference, maximum likelihood, and basic predictive modeling (classification/regression). Some knowledge of causal inference is a plus, but not necessary.
Recent Advances in Statistical Methods Applied to Racial Equity Research — Invited Papers
ENAR, American Public Health Association, Section on Statistics in Epidemiology, Justice Equity Diversity and Inclusion Outreach Group, Caucus for Women in Statistics
Organizer(s): Ruby Lee Bayliss, Dornsife School of Public Health, Drexel University
Chair(s): Loni Philip Tabb, Dornsife School of Public Health, Drexel University
2:05 PM
Predicting Asthma Morbidity in Multiethnic Urban Rhode Island Children Anarina Murillo, New York University; Loni Philip Tabb, Dornsife School of Public Health, Drexel University; Rachel Gaither, Brown University; Sheryl J Kopel, Brown University; Michelle L Rogers, Brown University; Melanie Morales Aquino, Brown University; Patrick M Vivier, Brown University; Daphne Koinis-Mitchell, Brown University
2:20 PM
A Spatial Assessment of the Impact of Residential Segregation on Racial/Ethnic Cardiovascular Health Inequities Ruby Lee Bayliss, Dornsife School of Public Health, Drexel University; Harrison Quick, Drexel University; Loni Philip Tabb, Dornsife School of Public Health, Drexel University; Sharrelle Barber, Dornsife School of Public Health, Drexel University; Mahasin Mujahid, University of California, Berkeley School of Public Health; Kiarri Kershaw, Northwestern University, Feinberg School of Medicine
Yu B Chen, CDC; Sarah Conderino, NYU Grossman School of Medicine; Imaani Easthausen, Aetion; Emily Pfaff, University of North Carolina at Chapel Hill; Judy Zhong, New York University Grossman School of Medicine; Bo Cai, University of South Carolina, Arnold School of Public Health
Data Fusion for Time-to-Event Outcomes Fatema Shafie Khorassani, University of Michigan; Jeremy Taylor, University of Michigan; Xu Shi, University of Michigan
Practical Solutions for Working with Electronic Health Records Data — Professional Development Continuing Education Course
ASA, Section on Statistics in Epidemiology
Instructor(s): Rebecca Hubbard, University of Pennsylvania; Yong Chen, University of Pennsylvania
This short course will introduce participants to the basic structure of EHR data and provide a practical set of tools to analyze this rich data resource through a combination of lecture and hands-on exercises in R. The first part of the course will cover issues related to the structure and quality of EHR data, including data types and methods for extracting variables of interest; sources of missing data; error in covariates and outcomes extracted from EHR data; and data capture considerations such as informative visit processes and medical records coding procedures. In the second half of the course, we will discuss statistical methods to mitigate data quality issues arising in EHR, including missing data, error in EHR-derived covariates and outcomes, and data integration across multiple clinical practices. R code will be provided for implementation of the presented methods, and hands-on exercises will be used to compare results of alternative approaches. This short course is of interest to researchers without prior experience working with EHR data as well as more experienced individuals interested in learning practical solutions to some common analytic challenges.
Statistics for Mobile and Wearable Device Data — Invited Papers
Business and Economic Statistics Section, Section on Medical Devices and Diagnostics, Section on Statistics in Epidemiology
Organizer(s): Nathaniel Josephs, Yale University
Chair(s): Nathaniel Josephs, Yale University
10:35 AM
Impact of Close Interpersonal Contact on COVID-19 Incidence: Evidence from One Year of Mobile Device Data Forrest W. Crawford, Yale University; Sydney A. Jones, CDC/CT DPH; Matthew Cartter, CT DPH; Samantha Dean, Yale School of Public Health; Joshua L. Warren, Yale School of Public Health; Zehang Richard Li, UCSC; Jacqueline Barbieri, Whitespace Ltd; Jared Campbell, Whitespace Ltd; Patrick Kenney, Whitespace Ltd; Thomas Valleau, Whitespace Ltd; Olga Morozova, Stony Brook University
Heterogeneous Causal Effects Estimation via Semiparametric Bayesian Models Xinyi Xu, The Ohio State University; Bo Lu, The Ohio State University; Steven MacEachern, The Ohio State University; Ling Wang, Michigan State University; Yuxuan Xin, The Ohio State University; Rui Zhang, The Ohio State University
Novel Approaches for Assessment of Health Outcomes and Multi-Cohort Data Integration Using Wearable Devices in Large-Scale Biomedical Studies — Topic Contributed Papers
Section on Statistics in Epidemiology, Biometrics Section, ENAR
Organizer(s): Lucia Tabacu, Old Dominion University
Harmonization of Open-Source and Proprietary Accelerometry-Based Physical Activity Measures Marta Karas, Johns Hopkins University; John Muschelli, Johns Hopkins University; Andrew Leroux, University of Colorado; Jacek K. Urbanek, Johns Hopkins University; Amal A. Wanigatunga, Johns Hopkins University; Jiawei Bai, Johns Hopkins University; Ciprian M. Crainiceanu, Johns Hopkins University; Jennifer A. Schrack, Johns Hopkins University
Novel Methodology Development in High-Dimensional Longitudinal Data Analysis — Invited Papers
ENAR, Section on Statistics in Epidemiology, Section on Statistics and the Environment
Organizer(s): Lance Ford, The University of Oklahoma Health Sciences Center
Chair(s): Chao Xu, University of Oklahoma Health Sciences Center
8:35 AM
Regression Analysis of Correlations for Correlated Data
Jie Hu, University of Science and Technology of China; Yu Chen, University of Science and Technology of China; Chenlei Leng, University of Warwick; Cheng Yong Tang, Temple University
Dallas Wayne Anderson, National Institute on Aging, National Institutes of Health; Kiros T Berhane, Columbia University; Douglas Landsittel, Indiana University School of Public Health-Bloomington; Michelle D Shardell, University of Maryland School of Medicine; Donna Spiegelman, Yale School of Public Health; Molin Wang, Harvard T.H. Chan School of Public Health
R2-Based Mediation Analysis with High-Dimensional Omics Mediators Peng Wei, The University of Texas MD Anderson Cancer Center; Tianzhong Yang, University of Minnesota; Sunyi Chi, The University of Texas MD Anderson Cancer Center; Zhichao Xu, The University of Texas MD Anderson Cancer Center; Chunlin Li, University of Minnesota; Bin Shi, The University of Texas MD Anderson Cancer Center; Xuelin Huang, The University of Texas MD Anderson Cancer Center
Estimating Random Effects in a Finite Markov Chain with Absorbing States: Application to Cognitive Data
Pei Wang, Miami of Ohio; Erin L Abner, University of Kentucky; Changrui Liu, University of Kentucky; David W Fardo, University of Kentucky; Frederick A Schmitt, University of Kentucky; Gregory A Jicha, University of Kentucky; Linda J Van Eldik, University of Kentucky; Richard Kryscio, University of Kentucky
All-Cause and Cause-Specific Mortality in a Cohort of WTC-Exposed and Non-WTC-Exposed Firefighters Ankura Singh, Fire Department of the City of New York; Rachel Anna Zeig-Owens, Montefiore Medical Center; Madeline Cannon, Fire Department of the City of New York; Mayris P Webber, Fire Department of the City of New York; David J Prezant, Fire Department of the City of New York; Paolo Boffetta, Stony Brook Cancer Center; Charles B Hall, Albert Einstein College of Medicine
Developing Weighting Methodology for an Epidemiological Study Among American Indians: The Strong Heart Liver Study Jean Leidner, University of Oklahoma Health Sciences Center; Sixia Chen, University of Oklahoma Health Sciences Center; Michael Middleton, University of California, San Diego; Claude Sirlin, University of California, San Diego; Walter Henderson, University of California, San Diego; Justin Dvorak, University of Oklahoma Health Sciences Center; Tauqeer Ali, University of Oklahoma Health Sciences Center; Alvin C Silva, Mayo Clinic-Phoenix; Jason G Umans, Georgetown University Medical Center; Shelley A Cole, Texas Biomedical Research Institute; Ying Zhang, University of Oklahoma Health Sciences Center
Accommodating Population Differences in Model Validation Ruth Pfeiffer, National Cancer Institute; Yiyao Chen, Technical University of Munich; Mitchell Gail, National Cancer Institute; Donna P. Ankerst, Technical University of Munich
What We Know About What We Don’t Know: Overcoming Incomplete Data in Practice — Invited Papers
ENAR, Caucus for Women in Statistics, Section on Statistics in Epidemiology
Organizer(s): Sarah C. Lotspeich, University of North Carolina at Chapel Hill
Chair(s): Marissa C. Ashner, University of North Carolina at Chapel Hill
2:05 PM
Missing Data in the Baseline Health Surveys of the All of Us Research Program and the Opportunity from Multiple Information Sources Qingxia Chen, Vanderbilt University Medical Center; Robert M Cronin, The Ohio State University; Xiaoke Feng, Vanderbilt University Medical Center; Lina Sulieman, Vanderbilt University Medical Center; Brandy Mapes, Vanderbilt University Medical Center; Shawn Garbett, Vanderbilt University Medical Center; Ashley Able, Vanderbilt University Medical Center; Rebecca Johnston, Vanderbilt University Medical Center; Mick P. Couper, University of Michigan; Brian K Ahmedani, Henry Ford Health System
Innovative Approaches for Modeling Time-to-Event Data in the Presence of Competing Risks and/or Time-Varying Covariates — Topic Contributed Papers
Biometrics Section, Section on Statistics in Epidemiology, ENAR
Organizer(s): Mulugeta Gebregziabher, Medical University of South Carolina; Ralph Ward, Health Equity and Rural Outreach Innovation Center, Ralph H. Johnson VAMC
Chair(s): Valerie Durkalski-Mauldin, Medical University of South Carolina
2:05 PM
Weighted Least-Squares Regression with Competing Risks Data Dipankar Bandyopadhyay, Virginia Commonwealth University; Sangbum Choi, Korea University, Seoul, South Korea; Taehwa Choi, Korea University; Hyunsoon Choi, National Cancer Center, South Korea
2:25 PM
Incorporating Cross-Sectional Information into a Joint Model of Longitudinal and Survival Data Using a Power Prior Juned Siddique, Northwestern University Feinberg School of Medicine; Michael Daniels, University of Florida; Hongyan Ning, Northwestern University Feinberg School of Medicine; Norrina Allen, Northwestern University Feinberg School of Medicine; John Wilkins, Northwestern University Feinberg School of Medicine; Donald Lloyd-Jones, Northwestern University Feinberg School of Medicine
Air Quality Modeling for Exposure Assessment Qi Ying, Texas A&M University; Xiaohui Xu, Texas A&M University; Eun Sug Park, TTI; Richard Smith, University of North Carolina Chapel Hill; Eric Whitsel, University of North Carolina at Chapel Hill; James Stewart, University of North Carolina at Chapel Hill; Melinda Power, George Washington University
3:05 PM
Understanding Critical Windows of Exposure in Longitudinal Analysis of Air Pollution and Cognitive Function
Xiaohui Xu, Texas A&M University; Eric Whitsel, University of North Carolina at Chapel Hill; Qi Ying, Texas A&M University; Richard Smith, University of North Carolina Chapel Hill; James Stewart, University of North Carolina at Chapel Hill; Eun Sug Park, TTI; Erin Bennett, George Washington University; Katie Lynch, George Washington University; Melinda Power, George Washington University; Vixey Fang, Texas A&M University
3:25 PM
Accounting for Exposure Measurement Error in Air Pollution and Neuroimaging Analysis Eun Sug Park, TTI; Richard Smith, University of North Carolina Chapel Hill; Xiaohui Xu, Texas A&M University; Eric Whitsel, University of North Carolina at Chapel Hill; James Stewart, University of North Carolina at Chapel Hill; Qi Ying, Texas A&M University; Katie Lynch, George Washington University; Erin Bennett, George Washington University; Melinda Power, George Washington University
Minimizing Estimation Bias in Analyses Utilizing Electronic Health Records Data Zhibao Mi, VA Cooperative Studies Program Coordinating Center; Ellen J Dematt, VA Cooperative Studies Program Coordinating Center; Eileen M Stock, VA Cooperative Studies Program Coordinating Center; Min Zhan, VA Cooperative Studies Program Coordinating Center; Kousick Biswas, VA Cooperative Studies Program Coordinating Center
Melody Goodman, New York University; Michele Andrasik, Fred Hutch Cancer Research Center; Yates Coley, Kaiser Permanente Washington Health Research Institute; Sahar Z Zangeneh, RTI International
Analyzing Lipidomic Profiling and Perceived Stress Data Collected in the Strong Heart Family Study Megan Eisele, The University of Oklahoma Health Sciences Center; Guanhong Miao, University of Florida, Gainesville; Oliver Fiehn, University of California, Davis; Tauqeer Ali, University of Oklahoma Health Sciences Center; Shelley A Cole, Texas Biomedical Research Institute; Amanda Fretts, University of Washington, Seattle; Jason G Umans, Georgetown University Medical Center; Jessica Reese, University of Oklahoma Health Sciences Center; Kimberly Malloy, University of Oklahoma Health Sciences Center; Richard B Devereux, Weill Cornell Medicine; Lyle G Best, Missouri Breaks Industries Research Inc.; Barbara V Howard, MedStar Research Institute; Elisa T Lee, University of Oklahoma Health Sciences Center; Jinying Zhao, University of Florida, Gainesville; Ying Zhang, University of Oklahoma Health Sciences Center
Dynamic Single-Index Scalar-on-Function Models Yiwei Li, New York University; Yuyan Wang, New York University Department of Population Health; Mengling Liu, New York University Grossman School of Medicine
9:20 AM
Quantifying Bacterial Strain-Host Associations with ANPAN Andrew Ghazi, Broad Institute; Yan Yan, Harvard T.H. Chan School of Public Health; Eric A. Franzosa, Harvard T.H. Chan School of Public Health; Curtis Huttenhower, Harvard T.H. Chan School of Public Health
Cancer Incidence, Latency, and Survival in World Trade Center Rescue/Recovery Workers Charles B Hall, Albert Einstein College of Medicine; Andrew Christian Todd, Icahn School of Medicine at Mount; James E Cone, New York City Department of Health and Mental Hygiene, World Trade Center Health Registry; Jiehui Li, New York City Department of Health and Mental Hygiene, World Trade Center Health Registry; David G Goldfarb, Montefiore Medical Center, Fire Department of the City of New York, City University of NY; Rachel Anna Zeig-Owens, Montefiore Medical Center; Paolo Boffetta, Stony Brook Cancer Center
Using Propensity Scores in Convenience Samples Olivia M. Bernstein Morgan, University of California, Irvine; Brian G. Vegetabile, RAND Corporation; Joshua D. Grill, University of California, Irvine; Daniel L Gillen, University of California Irvine
Mike Baiocchi, Stanford University; Andrew Gelman, Columbia University; Debashis Ghosh, Colorado School of Public Health; Arman Oganisian, Brown University; Ani Eloyan, Brown University; Elizabeth Ogburn, Johns Hopkins University; Anna Neufeld, University of Washington
Bhramar Mukherjee, University of Michigan; Dean Follmann, National Institute of Allergy and Infectious Diseases; Alex Luedtke, University of Washington; Jeffrey Morris, University of Pennsylvania; Usha Govindarajulu, Icahn School of Medicine at Mount Sinai
Omitted Variable Bias in Machine-Learned Causal Models
Victor Chernozhukov, MIT Economics; Carlos Cinelli, University of Washington; Whitney Newey, MIT Economics; Amit Sharma, Microsoft; Vasilis Syrgkanis, Microsoft Research
Nonparametric Estimation of the Potential Impact Fraction and the Population Attributable Fraction Colleen Elise Chan, Yale University; Rodrigo Zepeda-Tello, National Institute of Public Health of Mexico; Dalia Camacho-García-Formentí, National Institute of Public Health of Mexico; Frederick Cudhea, Tufts University; Rafael Meza, University of Michigan School of Public Health; Eliane Rodrigues, Universidad Nacional Autónoma de México; Donna Spiegelman, Yale School of Public Health; Tonatiuh Barrientos-Gutiérrez, National Institute of Public Health of Mexico; Xin Zhou, Yale School of Public Health
3:20 PM
The Functional Synthetic Control Method Aaron Shev, University of California, Davis; Andrew Farris, University of California, Davis; Chris McCort, University of California, Davis; Veronica Pear, University of California, Davis; Hannah Laqueur, University of California, Davis; Rose Kagawa, University of California, Davis