341 – Daily Predictions of Key Estimates and Models to Detect Nonsampling Errors in Census Bureau Household Surveys

Data Fusion in Several Algorithms

Sponsor: Section on Statistics in Marketing
Keywords: Data Fusion, Distances, Weighting, Dual Canonical Correlations and PLS, Ridge Regression, Regression with Dummies

Stan Lipovetsky

GfK Custom Research North America

Data fusion is the process of integrating several datasets that share some common variables, while other variables are available only in some of the datasets. The main problem of data fusion can be stated as follows. One source provides the datasets X0 and Y0, with N0 observations on n x-variables and m y-variables, respectively. Another source provides the dataset X1, with N1 observations on the same n x-variables. The task is to estimate the missing block Y1 (N1 observations by m variables) so that all the data can be combined into one set.

Several algorithms are considered in this work. One estimates weights proportional to the distances from each i-th observation in the "recipients" dataset X1 to all observations in the "donors" dataset X0. Alternatively, a sample balancing technique with the maximum effective base can be used: ridge regression is applied to the Gifi system of binary variables obtained from the x-variables, so that the "donors" data X0 best fit the margins defined by each respondent in the "recipients" dataset X1. Weighted regressions of each y-variable in Y0 on all variables in X0 are then constructed, and for each i-th recipient observation these regressions are used to predict its y-variables in the Y1 dataset. If X and Y consist of the same n variables measured in different sources, the dual partial least squares technique and a special regression model with dummy variables identifying each of the three available datasets (X0, Y0, and X1) are used to predict the Y1 data.
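The distance-weighting route described above can be sketched in Python with NumPy. This is a minimal sketch under assumptions the abstract does not state: the x-variables are numeric, distances are Euclidean, and each donor's weight is taken as an inverse distance normalized to sum to one (the abstract says only that the weights are built from the distances). The function names `distance_weights`, `weighted_regression_predict`, and `fuse` are hypothetical and are not the author's code.

```python
import numpy as np


def distance_weights(x_recipient, X0, eps=1e-8):
    """Weights over the N0 donors for one recipient row.

    Assumption: inverse Euclidean distance, normalized to sum to 1,
    so that donors closer to the recipient receive larger weights.
    """
    d = np.linalg.norm(X0 - x_recipient, axis=1)  # distance to every donor
    w = 1.0 / (d + eps)  # eps guards against a zero distance
    return w / w.sum()


def weighted_regression_predict(x_recipient, X0, Y0, w):
    """Fit every y-column of Y0 on X0 (with intercept) by weighted least
    squares using donor weights w, then predict at x_recipient."""
    A = np.column_stack([np.ones(len(X0)), X0])  # N0 x (n+1) design
    sw = np.sqrt(w)[:, None]  # scale rows by sqrt(weight)
    # Minimizes sum_j w_j * ||y_j - a_j B||^2 for all m columns at once
    B, *_ = np.linalg.lstsq(A * sw, Y0 * sw, rcond=None)  # (n+1) x m
    return np.concatenate([[1.0], x_recipient]) @ B  # length-m prediction


def fuse(X0, Y0, X1):
    """Estimate the missing Y1 block (N1 x m), one recipient at a time."""
    Y1_hat = np.empty((X1.shape[0], Y0.shape[1]))
    for i, x in enumerate(X1):
        w = distance_weights(x, X0)  # recipient-specific donor weights
        Y1_hat[i] = weighted_regression_predict(x, X0, Y0, w)
    return Y1_hat
```

Because a separate weighted fit is solved for every recipient, the cost grows linearly with N1. The weighting function is deliberately separate from the regression step, so it could be replaced by balancing weights (such as the ridge-based scheme the abstract mentions) without changing `weighted_regression_predict`.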


© 2013 CadmiumCD