Activity Number: 287 - Contributed Poster Presentations: Government Statistics Section
Type: Contributed
Date/Time: Tuesday, August 9, 2022 : 10:30 AM to 12:20 PM
Sponsor: Government Statistics Section
Abstract #322129
Title: Linking Records and Imputing Missing Values for Multi-Source Data
Author(s): Mary Munro*, Hongxun Qin, Dr. Timothy Champney, Angus Chen, Sam Cohen, Yueh Quach, and Dr. Yolande Tra
Companies: The MITRE Corporation (all authors)
Keywords: Record link; Missing value; Imputation; Geodistance; Statistical matching; Fuzzy matching
Abstract:
Missing values have long been a challenging problem. Variables with missing values may not be independent of one another, and imputing them independently can distort the integrity of the data. We will explore sequential imputation methods and estimate their computational cost for large datasets. We will also test filling missing values for variables in one dataset by using variables from other datasets. Research in most disciplines now requires joining datasets from multiple sources. Even data from the same source may span many years, and data elements may vary and lack proper identifiers for direct matching. Many techniques for probabilistic record linkage are available, but they become resource-intensive for large datasets. For instance, matching addresses by estimating text match probabilities is computationally intensive for large data. In this research, we combine numerical methods using latitude and longitude with parsed address parts to demonstrate and test alternative inexact matching approaches for imputing missing values from secondary data sources.
Approved for Public Release; Distribution Unlimited. Public Release Case Number 22-1164
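As an illustration only, the idea of combining latitude and longitude with parsed address parts might be sketched as follows. This is a minimal sketch and not the authors' method: the record layout, the field names (`house_number`, `lat`, `lon`, `year_built`), the function names, and the 50-metre threshold are all hypothetical assumptions chosen for the example. The great-circle distance uses the standard haversine formula.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def impute_from_secondary(primary, secondary, field, max_km=0.05):
    """Fill missing primary[field] values from the nearest secondary record.

    Hypothetical rule: a candidate must share the parsed house number and lie
    within max_km of the primary record; the closest such candidate is used.
    Records with no qualifying candidate are left unchanged.
    """
    result = []
    for rec in primary:
        rec = dict(rec)  # copy so the input list is not mutated
        if rec.get(field) is None:
            best, best_d = None, max_km
            for cand in secondary:
                if cand.get(field) is None:
                    continue  # nothing to borrow from this candidate
                if cand["house_number"] != rec["house_number"]:
                    continue  # parsed address part must agree exactly
                d = haversine_km(rec["lat"], rec["lon"], cand["lat"], cand["lon"])
                if d <= best_d:
                    best, best_d = cand, d
            if best is not None:
                rec[field] = best[field]
        result.append(rec)
    return result

# Toy data, invented for this example.
primary = [{"house_number": "12", "lat": 38.9000, "lon": -77.0000, "year_built": None}]
secondary = [
    {"house_number": "12", "lat": 38.9001, "lon": -77.0001, "year_built": 1985},
    {"house_number": "14", "lat": 38.9000, "lon": -77.0000, "year_built": 1990},
]
linked = impute_from_secondary(primary, secondary, "year_built")
```

Comparing a numeric distance and one exact address part avoids string-similarity scoring over full addresses, which is the computational cost the abstract points to. A nested loop like this one is still quadratic, so a large dataset would need blocking or a spatial index on top of it.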
Authors who are presenting talks have a * after their name.