Abstract:
|
Modern technological advances produce data of breathtaking scale and complexity, such as the images and videos on the web. Such big data require highly expressive models for their representation, understanding, and prediction. To fit such models to big data, it is essential to develop practical learning methods and fast inference algorithms. In this talk, I will first briefly present our latest development of a visual Turing test system for deep scene and event understanding in videos. The system is formulated under the statistical visual modeling and computing framework. Among its many components, I will focus on online object tracking to explain my methods for tracking, learning, and parsing with a unified representation. Then, to address the limited bandwidth in the perception of visual content, I will present my ongoing and future work on lifelong communicative learning based on situated dialogue, which is inspired by human communication and integrates the deep perception of visual content with the perception of "dark matter," including humans' beliefs, intents, goals, and even values.
|