Individual cohort studies of cognitive aging are not large enough for genetic analyses or for assessing less common risks and outcomes, so we would like to combine data from several studies. Studies have rarely administered the same cognitive tests, but when tests share items, scores for global cognition or for a single cognitive domain can be placed on the same metric across cohorts using item response theory and a structural equation model, even if the tests have method effects or residual correlations. Unlike the z-score techniques used previously, this method measures everyone on the same metric, so that a given score represents the same underlying level of cognition in every study. We have used these methods to co-calibrate many large cohort studies of dementia in the United States and Europe, using items common to at least two of the cohorts. Test items unique to a particular study are still included in the final score to maximize precision.
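
To make the contrast with z-scores concrete, the following is a minimal simulation sketch, not the model described above. It assumes a two-parameter logistic (2PL) item response model whose item parameters are already known on a common metric, and it scores each person with an expected a posteriori (EAP) estimate under a standard normal prior. It does not estimate item parameters, fit a structural equation model, or handle method effects or residual correlations. The cohorts, item parameters, sample sizes, within-cohort z-scoring of sum scores, and all function names (`two_pl_prob`, `eap_scores`, `simulate`, `z_within`) are hypothetical and chosen only for illustration.

```python
import numpy as np

GRID = np.linspace(-4.0, 4.0, 81)  # quadrature points for ability (assumed range)


def two_pl_prob(theta, a, b):
    """P(correct) under a 2PL model; theta has shape (n,), a and b have shape (k,)."""
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))


def eap_scores(responses, a, b):
    """EAP ability estimates with a standard normal prior.

    responses: (n_people, n_items) array of 0/1, with np.nan where an item
    was not administered. Items that were not given contribute nothing.
    """
    p = two_pl_prob(GRID, a, b)          # (n_grid, n_items)
    observed = ~np.isnan(responses)      # mask of administered items
    x = np.nan_to_num(responses)         # nan -> 0, removed by the mask below
    loglik = (observed * x) @ np.log(p).T + (observed * (1 - x)) @ np.log1p(-p).T
    logpost = loglik - 0.5 * GRID**2     # add log of the N(0, 1) prior
    post = np.exp(logpost - logpost.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    return post @ GRID                   # posterior mean per person


rng = np.random.default_rng(0)
n_items = 9
a = np.full(n_items, 1.5)                # hypothetical discriminations
b = np.linspace(-2.0, 2.0, n_items)      # hypothetical difficulties


def simulate(theta, item_idx):
    """Simulate 0/1 responses to the items in item_idx; other items are np.nan."""
    full = np.full((theta.size, n_items), np.nan)
    p = two_pl_prob(theta, a[item_idx], b[item_idx])
    full[:, item_idx] = rng.random(p.shape) < p
    return full


def z_within(responses):
    """Standardize sum scores within one cohort (the comparison approach)."""
    s = np.nansum(responses, axis=1)
    return (s - s.mean()) / s.std()


# Cohort A takes items 0-5, cohort B takes items 3-8; items 3-5 are shared.
# Cohort B is simulated with lower average ability than cohort A.
theta_a = rng.normal(0.0, 1.0, 2000)
theta_b = rng.normal(-1.0, 1.0, 2000)
resp_a = simulate(theta_a, np.arange(0, 6))
resp_b = simulate(theta_b, np.arange(3, 9))

eap_a, eap_b = eap_scores(resp_a, a, b), eap_scores(resp_b, a, b)
z_a, z_b = z_within(resp_a), z_within(resp_b)

print("mean EAP score:    A = %.2f, B = %.2f" % (eap_a.mean(), eap_b.mean()))
print("mean within-z:     A = %.2f, B = %.2f" % (z_a.mean(), z_b.mean()))
```

In this sketch the within-cohort z-scores have mean zero in both cohorts by construction, so the simulated difference in ability disappears, whereas scores computed from shared item parameters keep the two cohorts on one scale. The EAP estimates are shrunk toward the prior mean, so the recovered difference is smaller than the simulated one.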