Abstract:
|
Dysmorphic features are variations of body structures that occur at low frequency in the general population and are more common among individuals with syndromes. In a case-control study, we examined the distributions of more than 300 dysmorphic features in 3- to 6-year-old children with developmental disabilities to identify clusters of features for further analysis. Analyzing such data is challenging and exceeds the capacity of many classical statistical methods because of the large number of variables and the possible redundancies among them. We propose a stepwise procedure to cluster variables and select the important ones for further analysis. First, we filtered out variables classified as dysmorphic in ≤10 children. Second, we screened out variables that contribute little predictive information, using Information Value (< 0.02). Lastly, we clustered the remaining variables using Oblique Component Analysis and selected important variables from each cluster based on their between- and within-cluster correlations and 1-R² ratios. We demonstrate this method with an example analysis of 80 ear features, which yielded 3 clusters and 6 variables selected for further analysis.
|
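The two screening steps and the 1-R² ratio can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each feature is coded 0/1 (1 = dysmorphic) and the outcome is coded 0/1 (1 = case). The function names, the 0.5 smoothing constant, and the treatment of the count threshold as inclusive are assumptions made for illustration. The Information Value is computed in its usual weight-of-evidence form, and the 1-R² ratio follows the common definition (1 - R² with own cluster) / (1 - R² with nearest other cluster).

```python
import math


def information_value(feature, outcome, smoothing=0.5):
    """Information Value of a 0/1 feature against a 0/1 outcome.

    The smoothing constant (an assumption, not from the source) avoids
    taking the log of zero when a cell of the 2x2 table is empty.
    """
    n_case = sum(outcome)
    n_ctrl = len(outcome) - n_case
    iv = 0.0
    for level in (0, 1):
        case = sum(1 for f, y in zip(feature, outcome) if f == level and y == 1)
        ctrl = sum(1 for f, y in zip(feature, outcome) if f == level and y == 0)
        p_case = (case + smoothing) / (n_case + 2 * smoothing)
        p_ctrl = (ctrl + smoothing) / (n_ctrl + 2 * smoothing)
        # Each term is (difference in proportions) x (weight of evidence).
        iv += (p_case - p_ctrl) * math.log(p_case / p_ctrl)
    return iv


def screen_features(data, outcome, min_count=10, min_iv=0.02):
    """Return names of features that pass both screening steps.

    data maps a feature name to its list of 0/1 values.
    """
    kept = []
    for name, values in data.items():
        # Step 1: drop features seen in too few children
        # (inclusive threshold is an assumption).
        if sum(values) <= min_count:
            continue
        # Step 2: drop features with Information Value below the cutoff.
        if information_value(values, outcome) < min_iv:
            continue
        kept.append(name)
    return kept


def one_minus_r2_ratio(r2_own, r2_nearest):
    """1-R² ratio; smaller values indicate a better cluster representative."""
    return (1.0 - r2_own) / (1.0 - r2_nearest)


# Hypothetical toy data: 20 cases followed by 20 controls.
outcome = [1] * 20 + [0] * 20
data = {
    "feature_a": [1] * 15 + [0] * 5 + [1] * 3 + [0] * 17,  # frequent, predictive
    "feature_b": [1] * 6 + [0] * 14 + [1] * 6 + [0] * 14,  # frequent, uninformative
    "feature_c": [1] * 5 + [0] * 15 + [0] * 20,            # too rare
}
selected = screen_features(data, outcome)
```

In this toy data only `feature_a` passes both steps: `feature_c` is dropped by the frequency filter, and `feature_b` is dropped because its distribution is identical in cases and controls. Within each cluster produced by the later clustering step, variables with the lowest 1-R² ratio are the natural candidates to keep.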