Abstract:
|
Pairwise dependency graphs are useful tools in data science that summarize which pairs among $d$ variables are dependent, but it is difficult to choose an estimator for this task from the many available. For example, certain estimators capture only linear relations, degrade in the presence of outliers, or require large sample sizes to assess dependency reliably. This work describes a model diagnostics system for efficiently exploring pairwise relationships in a dataset through scatter plots, in order to select the most appropriate estimator. The system learns how the data scientist judges whether two variables are dependent by having her label a small number of scatter plots, and it automatically constructs a pairwise dependency graph from these learned preferences. By comparing this graph with the graphs produced by numerical estimators, the system can indicate which estimator is best suited to the dataset at hand. We design each component of the system to be interpretable, so that its output can serve as supporting evidence for why one estimator was chosen over another.
|
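To make the comparison step concrete, the following is a minimal sketch, not the system described in this work. It assumes that the graph learned from the scientist's labels is already available as a boolean adjacency matrix (`reference_adj`, a hypothetical name), and it uses thresholded Pearson and Spearman correlations as two stand-in estimators. The function names, the threshold of 0.5, and the edge-agreement score are all illustrative assumptions.

```python
import numpy as np


def pearson_graph(X, threshold=0.5):
    """Adjacency matrix with an edge where |Pearson correlation| > threshold."""
    corr = np.corrcoef(X, rowvar=False)
    adj = np.abs(corr) > threshold
    np.fill_diagonal(adj, False)  # no self-loops
    return adj


def column_ranks(X):
    """Rank each column independently (assumes no tied values)."""
    return np.argsort(np.argsort(X, axis=0), axis=0).astype(float)


def spearman_graph(X, threshold=0.5):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson_graph(column_ranks(X), threshold)


def edge_agreement(adj_a, adj_b):
    """Fraction of variable pairs on which two graphs agree."""
    upper = np.triu_indices_from(adj_a, k=1)
    return float(np.mean(adj_a[upper] == adj_b[upper]))


def best_estimator(X, reference_adj, estimators):
    """Score each estimator's graph against the reference graph."""
    scores = {name: edge_agreement(fn(X), reference_adj)
              for name, fn in estimators.items()}
    return max(scores, key=scores.get), scores


# Toy data: x1 follows x0 except for one large outlier; x2 is unrelated.
x0 = np.arange(20, dtype=float)
x1 = x0.copy()
x1[0] = 1000.0
x2 = (7 * np.arange(20) % 20).astype(float)
X = np.column_stack([x0, x1, x2])

# Hypothetical reference graph: the scientist judged only (x0, x1) dependent.
reference_adj = np.zeros((3, 3), dtype=bool)
reference_adj[0, 1] = reference_adj[1, 0] = True

estimators = {"pearson": pearson_graph, "spearman": spearman_graph}
best, scores = best_estimator(X, reference_adj, estimators)
print(best, scores)
```

In this toy dataset the single outlier pulls the Pearson correlation between the first two variables below the threshold, while the rank-based estimator still recovers the edge, so the rank-based graph agrees more closely with the reference. A real system would need a more careful agreement measure and proper handling of ties and thresholds.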