Abstract:
|
An important class of learning problems, one that has attracted attention from disciplines as varied as neuroscience, theoretical computer science, information theory, statistical physics, and machine learning, concerns learning from samples drawn from a distribution over a large domain. To address the practical problem of `big data, little time,' a key challenge is to design time- and storage-efficient algorithms for large-k probability distributions, i.e., distributions over domains of large size k.
In this talk, I will discuss the `Big Data Triangle' challenge and present a new viewpoint that integrates many fundamental concepts and tools, one that may give us a taxonomic way of thinking about this general research field. Central to our approach is a new functional representation scheme (nonparametric harmonic analysis) that reduces the size of the problem (the dimension of the sufficient statistics), with an eye toward understanding and developing algorithms that are fast enough and accurate enough to make a big difference. Our modeling approach works "reasonably well" in practice and often outperforms the recent `breakthrough' algorithms from theoretical computer science.
|