Abstract:
|
Classical multidimensional scaling is an important tool for data reduction in many applications. It takes a distance matrix as input and outputs low-dimensional embedded samples that preserve the pairwise distances between the original data points, treating those points as deterministic in the ambient space. When the data are noisy, we find that the quality of the embedded samples produced by classical multidimensional scaling breaks down when either the ambient dimensionality or the noise variance is large. This motivates us to propose a modified multidimensional scaling procedure that applies a nonlinear shrinkage to the sample eigenvalues. The nonlinear transformation depends on the dimensionality, the sample size, and the moment of the noise. We show that modified multidimensional scaling followed by various clustering algorithms can achieve exact recovery, i.e., all cluster labels are recovered correctly with probability tending to one. Numerical simulations and two real data applications lend strong support to the proposed methodology.
|
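For orientation, the classical (unmodified) procedure referred to in the abstract can be sketched as follows. This is a minimal illustration of textbook classical multidimensional scaling, not the modified procedure proposed here: the nonlinear eigenvalue shrinkage is not specified in the abstract and is deliberately left out. The function name `classical_mds`, the target dimension `k`, and the toy data are assumptions made for the example.

```python
import numpy as np

def classical_mds(D, k):
    """Textbook classical MDS (hypothetical helper, not the paper's modified method).

    D : (n, n) array of pairwise Euclidean distances.
    k : target embedding dimension.
    Returns an (n, k) array of embedded samples.
    """
    n = D.shape[0]
    # Centering matrix J = I - (1/n) 11^T
    J = np.eye(n) - np.ones((n, n)) / n
    # Double-centered Gram matrix recovered from squared distances
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns eigenvalues in ascending order; keep the k largest
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:k]
    # Clip small negative eigenvalues caused by round-off
    lam = np.clip(eigvals[idx], 0.0, None)
    # Embedded samples: top-k eigenvectors scaled by sqrt of their eigenvalues
    return eigvecs[:, idx] * np.sqrt(lam)

# Toy, noise-free example (assumed data): 6 points in the plane
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

Y = classical_mds(D, 2)
D_hat = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```

In this noise-free setting the embedding reproduces the input distances up to round-off, which is the sense in which pairwise distances are preserved. The eigenvalues computed in the `eigh` step are the sample eigenvalues that a shrinkage step would act on.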