JSM 2013 Home

Abstract Details

Activity Number: 429
Type: Contributed
Date/Time: Tuesday, August 6, 2013 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistics in Marketing
Abstract - #307826
Title: Data Fusion in Several Algorithms
Author(s): Stan Lipovetsky*+
Companies: GfK Custom Research North America
Keywords: data fusion; distances; weighting; dual canonical correlations and PLS; ridge regression; regression with dummies

Data fusion is the process of integrating several datasets that share some common variables, while other variables are available only in some of the datasets. The main problem of data fusion can be described as follows. One source provides the X0 and Y0 datasets (N0 observations on n x-variables and m y-variables, respectively), and another source provides the X1 data (N1 observations on the same n x-variables). We need to estimate the missing Y1 data (N1 observations by m variables) in order to combine all the data into one set. Several algorithms are considered in this work. One estimates weights proportional to the distances from each i-th observation in the X1 "recipients" dataset to all observations in the X0 "donors" dataset. Alternatively, we can use a sample balancing technique with the maximum effective base, performed by applying ridge regression to the Gifi system of binaries obtained from the x-variables, to find the best fit of the "donors" X0 data to the margins defined by each respondent in the "recipients" X1 dataset. Weighted regressions of each y in the Y0 dataset on all variables in X0 are then constructed. For each i-th observation in the dataset X0, these regressions are used to predict the y-variables in the Y1 "recipients" dataset. If X and Y are the same n variables from different sources, the dual partial least squares technique and a special regression model with dummies defining each of the three available sets are used to predict the Y1 data.
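
The distance-weighting variant can be illustrated with a minimal sketch. It is not the author's implementation, and it rests on several assumptions not stated in the abstract: Euclidean distance, weights taken as the inverse of the distance (so that closer donors count more), a separate weighted least-squares fit for each recipient row, and an intercept term. The function name `fuse_by_weighted_regression` and the parameter `eps` are hypothetical.

```python
import numpy as np


def fuse_by_weighted_regression(X0, Y0, X1, eps=1e-8):
    """Estimate Y1 (N1 x m) from donors (X0, Y0) and recipients X1.

    ASSUMPTIONS (not from the abstract): Euclidean distances,
    inverse-distance weights, one weighted fit per recipient row.
    """
    X0, Y0, X1 = (np.asarray(a, dtype=float) for a in (X0, Y0, X1))
    n_recipients = X1.shape[0]
    # Donor design matrix with an intercept column.
    A0 = np.column_stack([np.ones(X0.shape[0]), X0])
    Y1 = np.empty((n_recipients, Y0.shape[1]))
    for i in range(n_recipients):
        # Distance from recipient i to every donor observation.
        d = np.linalg.norm(X0 - X1[i], axis=1)
        # Hypothetical weighting rule: closer donors get larger weights.
        w = 1.0 / (d + eps)
        w /= w.sum()
        # Weighted least squares via row scaling by sqrt(weight).
        sw = np.sqrt(w)[:, None]
        beta, *_ = np.linalg.lstsq(A0 * sw, Y0 * sw, rcond=None)
        # Predict all m y-variables for recipient i.
        Y1[i] = np.concatenate([[1.0], X1[i]]) @ beta
    return Y1


# Usage with synthetic data: N0 = 200 donors, N1 = 5 recipients, n = 3, m = 2.
rng = np.random.default_rng(0)
X0_demo = rng.normal(size=(200, 3))
Y0_demo = X0_demo @ rng.normal(size=(3, 2)) + rng.normal(scale=0.1, size=(200, 2))
X1_demo = rng.normal(size=(5, 3))
Y1_demo = fuse_by_weighted_regression(X0_demo, Y0_demo, X1_demo)
print(Y1_demo.shape)
```

Fitting one regression per recipient keeps the sketch close to the description of weights defined by each recipient, at the cost of N1 separate least-squares solves. The sample balancing variant would replace only the line that computes `w`.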

Authors who are presenting talks have a * after their name.

