340 – SPEED: Applications of Advanced Statistical Techniques in Complex Survey Data Analysis: Small Area Estimation, Propensity Scores, Multilevel Models, and More

Machine Learning to Evaluate the Quality of Patient Reported Epidemiological Data

Sponsor: Survey Research Methods Section
Keywords: data quality, machine learning, Bayesian Network, data trustworthiness, mutual information, data assessment

Futoshi Yumoto, PhD

Collaborative for Research on Outcomes and -Metrics

Robert L. Wood

Resonate & Wichita State University

Rochelle E. Tractenberg

Georgetown University

Patient reported epidemiological data are becoming more widely available. One such new dataset, the Fox Insight (FI) project, was launched in 2017 to encourage the study of Parkinson's disease and will be released for public access in 2019. Analyses of responses from the earliest participants suggest that there may be significant fatigue effects on elements that occur later in the surveys. These trends point to potential violations of the assumptions that data are missing at random (MAR) or missing completely at random (MCAR), which can limit the inferences that might otherwise be drawn from analyses of these data. Here we discuss a machine learning approach that can be used to evaluate the likelihood that an individual respondent is "doing their best" vs. not. Bayesian network structural learning was used to identify the network structure, and data quality scores (DQS) were estimated and analyzed within and across each section of a set of seven patient reported instruments. The proportion of respondents whose DQS fell below the cutoff (threshold) at which data are considered unacceptably or unexpectedly similar to random responses ranged from a low of 13% to a high of 66%. Our results suggest that the method is not unduly influenced by the length of instruments or their internal consistency scores. The method can be used to detect and quantify nonresponse bias, if it exists, and then to plan or choose a method of addressing it, in any dataset an investigator may choose, including the FI dataset once it is made available. The method can also be used to diagnose challenges in one's own dataset, possibly stemming from a misalignment of patient and investigator perspectives on the relevance or resonance of the data being collected.
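The abstract does not spell out how the DQS is computed, so the following is only a rough sketch of the general idea, not the authors' method. It assumes, purely for illustration, that (a) the network is a Chow-Liu tree, a simple tree-shaped Bayesian network learned from pairwise mutual information, (b) a respondent's score is the log-likelihood of their answers under that tree, and (c) the cutoff is the 95th percentile of scores from simulated uniformly random responders. All function names, the synthetic data, and the 0.95 quantile are hypothetical.

```python
import math
import random
from collections import Counter

def mutual_information(data, i, j):
    """Empirical mutual information (nats) between item columns i and j."""
    n = len(data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    pij = Counter((row[i], row[j]) for row in data)
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def chow_liu_parents(data):
    """Maximum-MI spanning tree via Prim's algorithm; item 0 is the root."""
    k = len(data[0])
    parent = {0: None}
    while len(parent) < k:
        _, child, par = max((mutual_information(data, p, c), c, p)
                            for p in parent for c in range(k) if c not in parent)
        parent[child] = par
    return parent

def fit_scorer(data, levels, alpha=1.0):
    """Fit Laplace-smoothed conditional tables; return a row -> log-likelihood function."""
    parent = chow_liu_parents(data)
    counts, parent_counts = Counter(), Counter()
    for row in data:
        for i, p in parent.items():
            pv = None if p is None else row[p]
            counts[(i, pv, row[i])] += 1
            parent_counts[(i, pv)] += 1

    def score(row):
        total = 0.0
        for i, p in parent.items():
            pv = None if p is None else row[p]
            total += math.log((counts[(i, pv, row[i])] + alpha)
                              / (parent_counts[(i, pv)] + alpha * levels))
        return total
    return score

def random_cutoff(score, n_items, levels, rng, n_sim=1000, quantile=0.95):
    """Assumed cutoff: a high quantile of scores from uniformly random rows."""
    sims = sorted(score([rng.randrange(levels) for _ in range(n_items)])
                  for _ in range(n_sim))
    return sims[int(quantile * (n_sim - 1))]

def simulate(rng, n_attentive, n_careless, n_items=8, levels=5):
    """Synthetic data: attentive rows follow a latent level; careless rows are random."""
    rows = []
    for _ in range(n_attentive):
        t = rng.randrange(levels)
        rows.append([t if rng.random() < 0.9 else rng.randrange(levels)
                     for _ in range(n_items)])
    for _ in range(n_careless):
        rows.append([rng.randrange(levels) for _ in range(n_items)])
    return rows

rng = random.Random(0)
LEVELS, N_ATTENTIVE, N_CARELESS = 5, 300, 100
rows = simulate(rng, N_ATTENTIVE, N_CARELESS, levels=LEVELS)
score = fit_scorer(rows, LEVELS)
cutoff = random_cutoff(score, len(rows[0]), LEVELS, rng)

# A row is flagged when its score is no better than what random answering achieves.
flags = [score(r) <= cutoff for r in rows]
attentive_rate = sum(flags[:N_ATTENTIVE]) / N_ATTENTIVE
careless_rate = sum(flags[N_ATTENTIVE:]) / N_CARELESS
print(f"flagged: attentive {attentive_rate:.0%}, careless {careless_rate:.0%}")
```

In this toy setup the flagged proportion plays the role of the 13% to 66% figures reported above: it is the share of respondents whose answers the fitted network cannot distinguish from random responding. A full analysis would presumably use a general structure-learning algorithm and real instrument sections rather than a tree and simulated items.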

© 2018 CadmiumCD