Data Mining Methods for Clustering Large Two-way Data to Identify Local Structures and Global Patterns
*Minho Chae, FDA  Chun-houh Chen, Institute of Statistical Science, Academia Sinica  Wen Zou, FDA  James J Chen, FDA 

Keywords: Clustering, Visualization, RSVD

In microarray data analysis, hierarchical clustering (HC) is often used to group objects (samples) according to their observations (gene expression profiles) in order to discern possible patterns in the data. While HC can quickly reveal local structure in the data, it rarely shows global patterns or smooth transitional trends. Our approach overcomes this shortcoming of HC by employing robust singular value decomposition (RSVD) to smoothly sort the internal nodes of the HC tree, since RSVD can reveal the bilinear structure of data that contain missing values and outliers. In addition to microarray data, the approach is applied to data from the FDA's Adverse Event Reporting System (AERS), and the results are presented with intuitive matrix visualizations such as the generalized association plot (GAP).
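
The idea of sorting the internal nodes of an HC tree by a decomposition-based score can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes an ordinary SVD for RSVD (so it does not handle missing values or outliers), and the function names `svd_scores` and `sorted_leaf_order`, the average-linkage choice, and the flipping rule (place the child subtree with the lower mean score first) are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree


def svd_scores(X):
    """Score each row by its projection on the first right singular vector.

    Assumption: ordinary SVD on column-centered data stands in for RSVD.
    """
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, 0] * s[0]


def sorted_leaf_order(X, method="average"):
    """Return a leaf order for the HC tree with children flipped by score."""
    Z = linkage(X, method=method)   # hierarchical clustering of the rows
    scores = svd_scores(X)          # one global score per row
    root = to_tree(Z)

    def visit(node):
        # Returns (ordered leaf ids, sum of scores) for this subtree.
        if node.is_leaf():
            return [node.id], float(scores[node.id])
        left, lsum = visit(node.get_left())
        right, rsum = visit(node.get_right())
        # Flip so that the subtree with the lower mean score comes first.
        if lsum / len(left) > rsum / len(right):
            left, right = right, left
        return left + right, lsum + rsum

    order, _ = visit(root)
    return order
```

The tree topology, and therefore the local cluster structure, is left unchanged; only the left/right orientation at each internal node is chosen, so the global trend captured by the leading singular vector shows up in the leaf order. Because the sign of a singular vector is arbitrary, the resulting order may run in either direction along that trend.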