Abstract:
|
Synthetic data provides a privacy-safe mechanism for developing, benchmarking, testing, and showcasing analysis plans and data processing pipelines. Many crucial types of data, however, consist of interrelated rectangular datasets, with columns in one table acting as keys into another. This is notably true of the CDISC data standard for clinical trial data, in which some tables contain one row per patient (ADSL) while in others a patient might have zero, one, or many rows (ADAE).
The synthetic.cdisc.data package uses our novel respectables framework, implemented in the open-source package of the same name, to create fully synthetic clinical trial readout data that follows the CDISC data standard. Furthermore, respectables can be used to customize the recipes implemented in synthetic.cdisc.data, specifying the behavior of variables individually, conditionally on other variables, or jointly with other variables.
Both respectables and synthetic.cdisc.data will be released as open source (approval was granted prior to abstract submission) and will be available on GitHub at the time of the presentation.
|