Online Program

Saturday, February 23
PS3 Poster Session 3 & Continental Breakfast Sat, Feb 23, 7:30 AM - 9:00 AM
Napoleon Ballroom

Two Matrix Factorization Widgets for Orange Data Mining (302565)

Chris Beecher, NextGen Metabolomics, Inc 
Fajwel Fogel, Ecole Polytechnique Paris 
Paul Fogel, National Institute of Statistical Sciences 
Douglas A. Marsteller, PepsiCo 
*S. Stanley Young, NISS 

Keywords: non-negative matrix factorization, robust singular value decomposition, data mining, Python, data processing, two-way tables, unsupervised analysis

Orange is open-source data mining software built on Python and C++. One of its most useful features is visual programming: a drag-and-drop interface for building and visualizing the data processing and analysis pipeline. On the canvas of this modular program, analysis or processing objects called “widgets” are dropped and connected to create a simple yet powerful data processing flow. In this project we added two matrix factorization methods to Orange: non-negative matrix factorization (NMF) and robust singular value decomposition (rSVD). There is considerable evidence that matrix factorization methods are useful for understanding two-way tables, but they would be more useful if integrated into a software package that is readily and freely available. To support these matrix factorization widgets, we added pre- and post-processing widgets to our module. The pre-processing widgets handle normalization, missing data, and similar tasks. The post-processing widgets are designed to view and assess the results of the matrix factorization. Both synthetic and real, publicly available datasets are analyzed to compare the results with those of the R NMF package and to show how to set up a workflow. Ultimately, the widgets allow the data analyst to better understand the information in a two-way data table while minimizing the programming required.
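
For readers unfamiliar with NMF, the idea of factoring a non-negative two-way table V into two non-negative factors W and H can be illustrated with a minimal sketch. It uses the standard multiplicative-update rules for the Frobenius-norm objective and plain NumPy. It is not the code behind the Orange widgets or the R NMF package, and the function name `nmf`, its parameters, and the synthetic table are all assumptions made for illustration.

```python
import numpy as np


def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Illustrative NMF via multiplicative updates (hypothetical helper).

    Approximates a non-negative matrix V (n x m) as W @ H, with
    W (n x rank) and H (rank x m) both non-negative.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Random strictly positive starting factors.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update rescales the factor elementwise, so entries stay >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic non-negative two-way table with an exact rank-3 structure.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 8))

W, H = nmf(V, rank=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

The rows of H can be read as non-negative "profiles" over the table's columns, and each row of W gives the weights with which those profiles combine to approximate the corresponding row of V. The rank and the number of iterations are choices the analyst makes, and the values used here are arbitrary.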