Sources of information on human health include randomized trials, surveys, observational cohort studies, disease registries and administrative databases. Such information can be used to estimate disease incidence rates or average treatment effects, to develop models for individual risk prediction, and for related purposes. Essentially all studies have limitations with respect to specific objectives, owing to factors such as study size, inclusion criteria, the definition and measurement of variables, and missing data. For scientific advancement and for the development of predictive models, the integration of data from separate sources is crucial, but population heterogeneity, differences between studies and the complexity of health-related processes make integration challenging. I will discuss the need for clear objectives, the relevance of specific models and methodology, and then bias and validity issues, distinguishing between internal bias induced by a study's design or analysis and biases that arise when data from separate sources are compared or integrated. Illustrations will involve the estimation of average treatment effects and predictive models for women with node-negative breast cancer.