Online Program

Friday, October 21
Knowledge
Fri, Oct 21, 3:30 PM - 4:20 PM
Salon 3
Statistical Interface with Computer Science for Data Science

Statistical Interface with Computer Science for Data Science: Some Recent Development (303356)

*Jiayang Sun, Case Western Reserve University 

Keywords: big data, R, interface, computer science tool, programming language, clustering, network, machine learning, feature selection

The triad of big data challenges consists of modern analytics, infrastructure, and crowdsourcing. Crowdsourcing relies on mass human input online and can be a savior when traditional data collection and computing fall short. This talk will sample some of our recent work in machine learning and crowdsourcing: a) Robust Non-Negative Matrix Factorization (rNMF), b) Numerical Formal Concept Analysis (nFCA), c) Subsampling Winner Algorithm (SWA), and d) Crowdsourcing in Epidemiology. We shall show rNMF as a competitor or complement to PCA, rPCA, and rSVD; nFCA, to typical clustering and network analysis; and SWA, to other feature selection procedures. We will also demonstrate a crowdsourcing application and the code we developed for it. Much of this work features statistical paradigm changes for analyzing large and complex data, and takes advantage of the statistical interface with computer science. Our applications include analyses of social data, ovarian cancer data, and cardiovascular and NHDS claims data (time permitting).