This course describes and demonstrates effective strategies for using propensity score methods to address the potential for selection bias in observational studies comparing the effectiveness of treatments or exposures. We review the main analytical techniques associated with propensity score methods (focusing on matching, weighting and doubly robust techniques) and describe key strategic concerns related to effective propensity score estimation, assessment and display of covariate balance, choice of analytic technique, stability and sensitivity analyses, and communicating results effectively.
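To make the weighting idea concrete, here is a minimal Python sketch. It is an illustration only, not workshop material (the workshop demonstrations use R). It assumes a single binary covariate, so the propensity score can be estimated as the share treated within each covariate level, and it uses a hypothetical toy data set built so that the true treatment effect is 1. The function names are illustrative.

```python
from collections import defaultdict

def stratum_propensity(records):
    """Estimate P(treated | x) as the share treated within each level of x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_treated, n_total]
    for x, t, _ in records:
        counts[x][0] += t
        counts[x][1] += 1
    return {x: treated / total for x, (treated, total) in counts.items()}

def naive_difference(records):
    """Unadjusted difference in mean outcome, treated minus control."""
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def ipw_effect(records):
    """Inverse-probability-weighted (normalized) estimate of the average effect."""
    e = stratum_propensity(records)
    if any(p <= 0.0 or p >= 1.0 for p in e.values()):
        raise ValueError("every stratum needs both treated and control units")
    num1 = den1 = num0 = den0 = 0.0
    for x, t, y in records:
        if t == 1:
            w = 1.0 / e[x]          # weight treated units by 1 / e(x)
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - e[x])  # weight controls by 1 / (1 - e(x))
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0

# Hypothetical records (x, treated, outcome) with outcome = treated + 2 * x.
# Units with x = 1 are treated more often, which confounds the naive contrast.
records = (
    [(1, 1, 3.0)] * 3 + [(1, 0, 2.0)]
    + [(0, 1, 1.0)] + [(0, 0, 0.0)] * 3
)

naive = naive_difference(records)   # 2.0, biased upward by x
adjusted = ipw_effect(records)      # approximately 1.0, the built-in effect
```

In real applications the propensity score is usually estimated with a model over many covariates, and covariate balance should be checked after weighting; this sketch skips both steps.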
The workshop provides a useful experience for statisticians and others who are familiar with regression models and have some experience with studies comparing treatments, but does not require prior experience with applying propensity methods. Attendees will receive access to R Markdown files and other software tools to carry out techniques demonstrated in the session. Demonstrations will use the latest versions of R and RStudio, and attendees are free to either follow along on their own machines or review the code later, at their convenience.
This workshop is a new version of the speaker's prior award-winning workshops in this field, revised to make deeper use of modern software tools available in R. Although we focus on established approaches to dealing with design and analytical challenges, we conclude the session by reviewing recent methodological advances in propensity scores and the application of propensity score methods to problems in health policy research.
The speaker is Thomas E. Love, Ph.D., who is Professor of Medicine, Epidemiology & Biostatistics at Case Western Reserve University, and has wide experience with applying and teaching statistical methods for observational studies to diverse audiences. Dr. Love is a Fellow of the American Statistical Association.
The purpose of this Workshop is to facilitate the use of the Medical Expenditure Panel Survey Household Component (MEPS HC) public use data files by the health services research community. To meet this objective, participants are provided with a general overview of the MEPS, a description of available data files, information about on-line data tools, and some examples of the type of research projects the MEPS data can support. Major changes have taken place in the Nation's health care delivery system over the last decade. Consider the rapid expansion of managed care arrangements such as health maintenance organizations, preferred provider organizations, and other provider networks that seek to minimize increases in health care costs. The MEPS is a vital national data resource designed to continually provide health service researchers, policymakers, health care administrators, businesses, and others with timely, comprehensive information about health care use and costs in the United States. Newly released MEPS public use files provide analysts with opportunities to create unique analytic files for policy relevant analysis in the field of health services research, such as access to care and health disparities. In order to capture the unparalleled scope and detail of the MEPS HC, analysts need to understand the complexities of MEPS data files and data file linkages. This workshop will provide the knowledge necessary to formulate research plans utilizing the various MEPS HC files and linkage capabilities.
Development of national policies on health must be informed by valid and current national estimates. For over four decades, the National Health and Nutrition Examination Survey (NHANES) has been monitoring the Nation’s nutrition and health through a series of cross-sectional surveys of the U.S. noninstitutionalized population. By combining home interviews with in-person examinations and bio-specimen collection, NHANES gathers comprehensive data, including data on major chronic and infectious diseases, environmental exposures, food consumption, and dietary supplement and prescription medication use, that inform national policies. In the proposed session, scientists from the U.S. Department of Health and Human Services (the Division of Health and Nutrition Examination Surveys (DHANES), National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention (CDC); and the Office of Disease Prevention and Health Promotion, Office of Assistant Secretary of Health (OASH)) and the University of Maryland School of Public Health will describe the NHANES program (components, data collection and public release) and demonstrate the key uses of NHANES data in nutrition and health policies in the U.S. In addition, access to publicly released NHANES data online and to restricted-use data through DHANES will be demonstrated, and case studies and examples of working with such NHANES data will be shared. Guidance on the use of NHANES tutorials and of weights to obtain nationally representative estimates will be provided.
This session is relevant to the conference on health policy statistics. It will illustrate access to, and use of, NHANES data in informing researchers and policy makers who develop and track key policies on chronic disease, infectious disease, and environmental health, as well as nutrition policies focusing on the dietary intake and nutritional status of Americans.
The session is planned as four presentations with opportunities for Q&A and discussion.
1. NHANES: Overview and role in informing nutrition policy over 50 years: Data release, access, and tutorials for data use and analysis. Namanjeet Ahluwalia PhD, Nutrition Monitoring Advisor, DHANES, NCHS (30 min)
2. Dietary Guidelines for Americans: An example illustrating NHANES nutrition data informing policy. Kellie Casavale PhD RD, Nutrition Advisor, ODPHP, OASH (15 min)
3. NHANES in informing health policies: NHANES data on health and disease prevalence, bio-specimen program and requesting restricted use data. Susan Lukacs DO, Science Officer, DHANES, NCHS (30 min)
4. Case studies on working with NHANES public and restricted use data. Edmond Shenassa, PhD, Associate Professor, University of Maryland (45 min)
In observational studies of causal effects, methods to address selection bias are used to approximate the ideal study that would be conducted if it were possible to do it by controlled experimentation. In this workshop, we will discuss new advancements in matching methods for this purpose. The first hour of the workshop will focus on new approaches to more flexibly attain covariate balance and build representative matched samples, and the second hour will focus on matching methods extended to longitudinal studies with time-varying treatment initiation. Specifically, the first hour will present new methods that allow the investigator to overcome three limitations of standard matching approaches by: (i) directly obtaining flexible forms of covariate balance; (ii) producing self-weighting matched samples that are representative of target populations by design; and (iii) handling multiple treatment doses without resorting to a generalization of the propensity score (PS). As time permits, we will also discuss extensions to matching with instrumental variables and in discontinuity designs. These methods will be illustrated with the statistical software package 'designmatch' for R.
In the second hour of the workshop, we will consider extensions to matching procedures appropriate for longitudinal data, when treatment may be initiated at varying points over a longer span of time. Extending methods to address selection bias for longitudinal studies requires additional considerations; these include determining measurement timing of time-varying confounders, defining “time zero”, determining time-varying eligibility, and handling individuals who change treatment course during follow-up. Recent work in this area has contributed new methods, including sequential stratification matching and PS matching with time-dependent covariates. We will introduce and review these methods, distinguishing different causal questions which require different methods. With the goal of mimicking a randomized trial, we will walk through the steps of implementing these approaches. For each, we will highlight decisions and alternatives, and discuss advantages, disadvantages, and the implications for results and interpretation. Detailed case studies will walk through the implementation of each method and the interpretation of its results.
Speakers: Jose Zubizarreta, Harvard University; Valerie Smith, Duke University, Department of Veterans Affairs; Laine Thomas, Duke University
Objective: To introduce health policy stakeholders to recent enhancements and new data products that greatly improve the accessibility and utility of the MCBS, including a downloadable Public Use File. Abstract: The MCBS, a continuous, longitudinal survey of a nationally representative sample of the Medicare population conducted by the Centers for Medicare and Medicaid Services (CMS), has been the definitive resource for researchers to study the Medicare program and enrolled beneficiaries for 25 years. The survey, linked to CMS administrative data, is a uniquely rich resource which allows one to determine expenditures and sources of payment for all services, as well as to examine self-reported measures of health status, access to care, satisfaction with care and functional limitations. The release of the 2015 MCBS (Summer/Fall 2017) will include a number of enhancements that greatly improve the accessibility and utility of the survey. The 2015 MCBS includes updated content to better align with HHS standards, a refreshed sample design to allow for an accelerated release timeline, a new oversample of Hispanic beneficiaries, and a downloadable MCBS Public Use File. Attendees of the 2018 ICHPS would be among the first to receive detailed information on the 2015 enhancements and new user resources.
Presenters: Debra Reed-Gillette (CMS), Nicholas Schluterman (CMS), Felicia LeClere (NORC)
This workshop discusses approaches for interpreting patient-reported outcome (PRO) data that are intended for labeling and promotional claims. PRO measures used for claims must have interpretation guidelines to be useful as efficacy endpoints in clinical trials. We describe two ways to interpret PRO scores: anchor-based and distribution-based methods. Anchor-based approaches use a criterion measure that is clinically interpretable and correlated with the targeted PRO measure of interest. Approaches include percentages based on thresholds, criterion-group interpretation, content-based interpretation, clinically important difference and clinically important responder. Responder analyses help stakeholders interpret between-treatment group differences based on a continuous PRO measure. Distribution-based approaches use the statistical distribution of the data to gauge the meaning of PRO scores. Examples include effect size, probability of relative benefit and cumulative distribution functions. We also discuss the interpretation of PRO data in the presence of missing data and approaches for imputing only certain missing items on a questionnaire (missing items) or all items on a questionnaire that is administered daily (missing days). We discuss the strengths and weaknesses of single imputation and multiple imputation approaches. We first describe proration, a single imputation technique. An example is the “50% imputation rule,” which assigns the mean of the non-missing items of a questionnaire to the missing item(s) if less than 50% of a questionnaire’s items are missing. Proration is also used for missing days. However, proration does not allow for the uncertainty introduced by missing items or days. We describe other methods, such as multiple imputation, that account for the uncertainty associated with imputation and which may be more appropriate than proration. The workshop concludes with a discussion of regulatory thinking about labeling claims for PRO endpoints.
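The “50% imputation rule” described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the workshop's own code: missing items are represented as `None`, the scale score is taken to be the sum of the items, and the function name and example responses are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence

def prorated_score(items: Sequence[Optional[float]],
                   max_missing_fraction: float = 0.5) -> Optional[float]:
    """Sum score with missing items replaced by the mean of the observed items.

    Returns None when no items were administered or when the share of
    missing items is not below the threshold (50% by default).
    """
    if not items:
        return None
    observed = [x for x in items if x is not None]
    missing_fraction = (len(items) - len(observed)) / len(items)
    if missing_fraction >= max_missing_fraction:
        return None  # too many items missing to prorate
    item_mean = mean(observed)
    return sum(item_mean if x is None else x for x in items)

# One of four items missing (25%): the gap is filled with the mean of 2, 4, 3.
score_ok = prorated_score([2, 4, None, 3])          # 12
# Two of four items missing (50%): the rule does not apply, so no score.
score_none = prorated_score([1, None, None, 4])     # None
```

Because every missing item receives the same single value, the resulting score carries no information about imputation uncertainty, which is the limitation that motivates multiple imputation.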
SCOPE AND OBJECTIVES: This workshop presents an overall framework for formulating and evaluating a comparative effectiveness research (CER) question using observational data and causal inference methods. The last half-hour will feature a panel discussion of speakers from the other sessions within the "Methods for Observational Studies of Comparative Effectiveness" workshops.
OUTLINE: Participants will learn how to: 1) ask a CER question and draw causal graphs, 2) assess adequacy of the design and assumptions, 3) define potential outcomes and causal effects, 4) model the assignment mechanism and form the pseudo population, 5) state the outcomes model and estimate effectiveness, 6) conduct sensitivity analyses, 7) identify and apply an instrument for unmeasured confounding, and 8) describe alternative approaches and define an analysis plan. The workshop leverages our Decision Tool for Causal Inference and Observational Data Analysis Methods in Comparative Effectiveness Research (DECODE CER) and online course in CER (funded by PCORI and AHRQ, respectively).
RELEVANCE TO THE CONFERENCE THEME: Evaluating comparative effectiveness, and eventually influencing policies and care and improving subsequent outcomes and health, increasingly depends on appropriate use of observational data and causal inference. Although such methods are, in some sense, well described in the literature, many variations exist, and the sheer volume of that literature greatly complicates our ability to weigh the relative strengths and benefits of the available approaches, and thus to select the most reasonable approach(es) for a given question and corresponding data set.
SPEAKERS: Douglas Landsittel, PhD, is the PI of the noted PCORI and AHRQ funding, directs the CER Track of the Institute for Clinical Research Education, and has given a number of invited talks on related methods.
Most of the data used to inform health policy decisions come from observational studies such as large-scale surveys conducted by federal, state and local governments and agencies. Examples include the National Health Interview Survey (NHIS), the Behavioral Risk Factor Surveillance System (BRFSS) and the National Immunization Survey (NIS). Because these survey data violate the i.i.d. assumptions of standard statistical methods, they require special analysis methods, which are the topic of this workshop.
This workshop provides a crash course in complex survey data analysis for health researchers, statisticians and specialists who need to analyze health data collected through complex survey designs. (If the study design description contains keywords like "multistage sampling", "random digit dialing", "nonresponse adjustment" or "final weights", it is a complex survey data set.) The workshop will highlight the issues associated with complex survey data for researchers who have had no exposure to the topic. If you took courses in sampling or survey data analysis, and you remember the material well, this workshop will have limited benefit for you. If you are using weighted survey data but are not entirely sure how the weights were created or whether to run weighted or unweighted analyses, or if you are confused about the meaning of your results and what to report, this is the right workshop for you.
1. Examples of complex health survey data. Survey design trade-offs: frames, coverage, modes and costs.
2. Features of complex surveys: weights, clusters, strata, nonresponse adjustments, and their impact on estimates and standard errors. (2-hour mark)
3. Available software: R, Stata, SAS, SUDAAN. Syntax specification basics.
4. Fitting statistical models with survey data.
5. Survey data quality control: nonresponse biases, coverage biases, mode effects. (4-hour mark)
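As a small illustration of why weights matter for estimates (item 2 of the outline), the following Python sketch compares an unweighted and a weighted prevalence estimate and computes Kish's approximate effective sample size. It is a sketch under stated assumptions, not course material: the responses and final weights are hypothetical, and it ignores strata and clusters, which dedicated survey software uses when computing standard errors.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Hypothetical respondents: 1 = has the condition, 0 = does not.
has_condition = [1, 0, 0, 1]
# Hypothetical final weights: respondents 2 and 3 each represent more people.
final_weights = [100, 300, 300, 100]

unweighted = sum(has_condition) / len(has_condition)        # 0.5
weighted = weighted_mean(has_condition, final_weights)      # 0.25
n_effective = kish_effective_sample_size(final_weights)     # 3.2 (of 4 respondents)
```

The gap between 0.5 and 0.25 shows how ignoring the weights can misstate a population estimate, and the effective sample size below 4 hints at the precision lost to unequal weighting.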
Instructor: Stanislav (Stas) Kolenikov, Ph.D., is Senior Survey Scientist at Abt Associates.
Social network analysis (SNA) is an emerging area of research that involves multiple disciplines, and the breadth of problems it encompasses is underappreciated. This workshop will describe several types of studies involving social networks, with examples drawn from medicine and allied fields. The methods used for analyzing each study type will be described, along with the challenges to drawing reliable statistical inferences. Because many questions involving society and organizational structure may be represented by networks of relationships, SNA has tremendous potential to advance research and practice in these fields in novel ways. Whilst social networks have been commonplace for some time in social science disciplines, where numerous descriptive methods for analyzing them have been proposed, interest among statisticians and other quantitative researchers has only recently blossomed. The workshop will consist of four parts: (i) Introduction to the forms of network data, basic network statistics, and common descriptive measures; (ii) “Regular statistical analyses” applied to studies involving a sample of networks; (iii) Relational models in which the network itself constitutes a multivariate dependent variable; (iv) Models or analyses in which networks are fundamental to the construction of explanatory variables, including estimation methods that seek to distinguish social influence (e.g., peer effects) from other social phenomena such as homophily. The workshop will be at a level that relies on a general, as opposed to an in-depth, knowledge of statistics.
Poster session will take place from 6:00 p.m. - 7:00 p.m.
Poster session will take place from 8:00 a.m. - 8:45 a.m.
Prerequisite: Analysis of Complex Health Survey Data, Part 1
Format: One 2-hour session, including practicum. Laptop is strongly recommended.
Target Audience: Epidemiologists, data scientists, informaticians, data analysts, and statisticians.
The phrase “big data” has become widespread, but what does this mean for the practicing healthcare analyst? How does the presence of big data impact the actual workflow of a practicing analyst in health care? In this workshop, attendees will be exposed to multiple tools useful in the analysis of big healthcare data, including Python, SQL, Hadoop, and Spark. The workshop will consist of a lecture/discussion of these technologies, followed by practical examples with code. Students will have the opportunity to follow along and run code throughout the workshop. Through instructor-led examples, we will discuss and demonstrate the efficiency of various analytic frameworks for a binary classification problem using synthetic EMR data. We will begin with examples of managing data in SQL and, alternatively, in a NoSQL environment. Various examples of dimensionality reduction for data relevant to healthcare in the pre-modeling environment will be covered. We will further consider more complex dimensionality reduction techniques requiring an analytic platform beyond a simple database management system. In order to explore different approaches to a classification problem, penalized regression (LASSO), Random Forests, and support vector machines (SVM) will be presented. We will contrast traditional serial optimization approaches (such as Newton-Raphson) with parallel optimization approaches (such as stochastic gradient descent). Students will be provided with code to run all models ahead of the workshop; thus, no experience in these languages is required. All software used will be open source. Students will be expected to set up their computing environment prior to the workshop; further details and guidance will be sent to attendees.
Specific Learning Objectives:
• Understand options for managing large healthcare data sources in the pre-modeling environment.
• Understand the difference between traditional RDBMS and NoSQL alternatives.
• Understand and define the differences between the model, loss function, regularizer, and optimization.
• Understand serial versus parallel model optimization techniques and the implications for practical approaches to analysis in the presence of big data.
• Understand the impact of increasing dimensionality on different analytic approaches.
• Gain a basic understanding of fitting models in Python.
• Gain a basic understanding of fitting models in Apache Spark (using the Python API).
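The distinction between model, loss function, regularizer, and optimization can be previewed with a minimal Python sketch of a binary classifier fitted by stochastic gradient descent. This is an illustration only, not the workshop's code: the data are hypothetical, the regularizer shown is an L2 penalty (switched off by default) rather than the LASSO penalty named in the abstract, and all function names are assumptions made for this example.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic_sgd(xs, ys, learning_rate=0.1, epochs=500, l2=0.0, seed=0):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by stochastic gradient descent.

    Model:        sigmoid(w * x + b)
    Loss:         log loss (negative Bernoulli log-likelihood)
    Regularizer:  (l2 / 2) * w ** 2, off by default
    Optimization: one gradient step per observation, in shuffled order
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            error = sigmoid(w * xs[i] + b) - ys[i]  # gradient of log loss w.r.t. the linear score
            w -= learning_rate * (error * xs[i] + l2 * w)
            b -= learning_rate * error
    return w, b

def accuracy(xs, ys, w, b):
    """Share of observations classified correctly at a 0.5 threshold."""
    hits = sum((sigmoid(w * x + b) >= 0.5) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)

# Hypothetical one-feature data: the label is 1 when the feature is positive.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic_sgd(xs, ys)
train_accuracy = accuracy(xs, ys, w, b)
```

Each update here touches a single observation, which is what lets stochastic gradient descent be distributed across workers; Newton-Raphson, by contrast, needs the full gradient and Hessian over all observations at every step.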