Keywords: Harmonization, measurement, inferences, compatibility, data sets
In the era of big data, multiple sources of health care data are often merged to make inferences and answer complicated questions. If done appropriately, this process can open novel opportunities for policy makers by providing richer databases and more powerful results. However, these data are often inconsistent because of differences in measurements, analysis methods, collection schedules, clinical specifications, or systematic designs. Data harmonization methods aim to generate mutually compatible measurements and sets of variables based on statistical models, and eventually turn the big database into thick information. We propose a two-stage method for data harmonization, at the measurement level and at the data set level, together with a visualization of the harmonization process. The harmonization score metric between data sets can be used to make inferences about the compatibility of the data sources. Finally, these methods are applied to a real-world problem: creating a common HIV medication adherence measure from multiple data sources.