A common goal of studies of preventive vaccines is to estimate whether and how their efficacy for preventing infection and/or disease varies by the strain of the infecting pathogen. This issue has come to the forefront of public interest over the past year as new strains of SARS-CoV-2 have emerged. One of the challenges in learning about strain-specific vaccine efficacy from randomized trials is that many breakthrough infections have missing sequence data. This missingness can bias estimation of strain-specific efficacy. For example, vaccines often cause less virulent forms of infection, leading to lower levels of viral genetic material in samples taken from breakthrough infections and, in turn, a higher probability of sequencing failure. On the other hand, some strains of a pathogen may be naturally less virulent irrespective of host vaccination status. Thus, viral load may be an exposure-induced confounder in the estimation of strain-specific efficacy. In this talk, we will describe methodology to address this bias using flexible semiparametric methods for causal inference and illustrate these methods using examples from modern vaccine studies.