In this talk I will discuss statistical challenges and opportunities with joint analysis of electronic health records and genomic data through "Phenome-Wide Association Studies (PheWAS)". I will posit a modeling framework that helps us to understand the effect of both selection bias and outcome misclassification in assessing genetic associations across the medical phenome. I will use data from the UK Biobank and the Michigan Genomics Initiative, a longitudinal biorepository at Michigan Medicine, launched in 2012 to illustrate the analytic framework. The examples illustrate that understanding sampling design and selection bias matters for big data, and are at the heart of doing good science with data. This is joint work with Lauren Beesley and Lars Fritsche at the University of Michigan.