Abstract:
|
In recent years it has become common to hear that Big Data, the availability of very large datasets, is revolutionising science. This applies to a wide variety of areas, but it is often forgotten that the breakthroughs achieved with these data come not only from their volume, but especially from the capability to perform a meaningful data analysis with them. This capability requires the large processing power of computers but also, and more critically, a proper understanding of the statistical properties of the samples and the ability to design statistical analysis tools to extract knowledge from the data. A clear example is the datasets produced by the Gaia mission of the European Space Agency. Gaia is generating very large astrometric catalogues (two billion objects) with unprecedented accuracy, and in this talk I will discuss the challenges faced by the astronomical community in fully exploiting their scientific potential. These challenges range from the basic need to understand the properties of the data (data censoring, variable transformations, random errors, systematics) to the design and implementation of analysis tools appropriate to handle them.
|