Abstract:
|
Patient-reported epidemiological data are becoming more widely available. One such new dataset, the Fox Insight (FI) project, was launched in 2017 to encourage the study of Parkinson's disease and will be released for public access in 2019. Early analyses of responses from the earliest participants suggest that there may be significant fatigue effects on items that occur later in the surveys. These trends point to potential violations of the assumptions of missing at random (MAR) and missing completely at random (MCAR), which can limit the inferences that might otherwise be drawn from analyses of these data. Here we discuss a machine learning approach that can be used to evaluate the likelihood that an individual respondent is "doing their best" vs. not. Bayesian network structural learning is used to identify the network structure, and data quality scores (DQS) are estimated and analyzed within and across each section of a set of seven patient-reported instruments. The proportion of respondents whose DQS fell below a cutoff (threshold) for data that is unacceptably or unexpectedly similar to random responses ranged from a low of 13% to a high of 66%. Our results suggest that the method is not unduly influenced by the length of the instruments or their internal consistency scores. The method can be used to detect, quantify, and then plan or choose a method of addressing nonresponse bias, if it exists, in any dataset an investigator may choose, including the FI dataset once it is made available. The method can also be used to diagnose challenges that may arise in one's own dataset, possibly stemming from a misalignment of patient and investigator perspectives on the relevance or resonance of the data being collected.
|