Online Program

Saturday, May 19
Data Science
Data Science in Practice
Sat, May 19, 10:30 AM - 12:00 PM
Grand Ballroom G

Spatial Analysis of Crowdsourced Mobile Data (304582)

Presentation

*Arnab Chakraborty, North Carolina State University 
Soumendra Nath Lahiri, North Carolina State University 
Alyson Wilson, North Carolina State University 

Keywords: Veracity Scoring, Geostatistics, Kriging, Hyper-local forecasting

Crowdsourced data on weather elements such as ambient temperature and air pressure, captured by sensors installed in mobile devices, are a potential data source for analyzing environmental processes. In densely populated regions where cellphones are ubiquitous, crowdsourced data are available at very fine spatial resolution, which can improve the accuracy of 'hyper-local' spatial interpolation. However, because of low-quality sensors, uncontrolled (non-laboratory) measurement environments, and interactions with unknown external and internal processes, data crowdsourced from mobile sensors are not fully reliable. This paper addresses the challenge of analyzing such varying-quality spatial data. We introduce a score for assessing the quality of the observations, the 'Veracity Score'. Incorporating this score into the geostatistical analysis of the crowdsourced data makes inference and prediction robust to contamination. We demonstrate the merits of the proposed methodology by applying it to a real crowdsourced dataset, taking the daily average temperature over the land area of the United States of America on a particular day as the process of interest. Cross-validated comparisons and simulation studies show that predictions based on Veracity Scoring are uniformly better than those of the standard geostatistical approach. In addition, daily average temperature readings from NOAA ground stations are combined with the crowdsourced data for 'hyper-local' prediction of the station-measured process at locations between the stations.
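The abstract does not define how the Veracity Score is computed or how it enters the kriging equations. Purely as a hedged illustration of the general idea that a per-observation quality score can make kriging less sensitive to contaminated readings, here is a minimal sketch of ordinary kriging with heterogeneous measurement error. It assumes a hypothetical score s_i in (0, 1] for each observation and gives observation i a measurement-error (nugget) variance of tau2 / s_i, so a low score shrinks that point's kriging weight. The exponential covariance, its parameters (sill, rng, tau2), the score-to-variance mapping, and the names exp_cov and weighted_ordinary_kriging are all assumptions made for this sketch. It is not the authors' method.

```python
import math


def exp_cov(h, sill=1.0, rng=1.0):
    """Assumed exponential covariance C(h) = sill * exp(-h / rng) of the latent field."""
    return sill * math.exp(-h / rng)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def weighted_ordinary_kriging(points, values, scores, target,
                              sill=1.0, rng=1.0, tau2=0.1):
    """Predict the noise-free field at `target` by ordinary kriging.

    Hypothetical scheme: observation i has measurement-error variance
    tau2 / scores[i], so a low score inflates its nugget and reduces its weight.
    """
    n = len(points)
    # Kriging matrix [[K, 1], [1^T, 0]] with K = C(d_ij) + diag(tau2 / s_i)
    a = []
    for i in range(n):
        row = [exp_cov(math.dist(points[i], points[j]), sill, rng) for j in range(n)]
        row[i] += tau2 / scores[i]
        row.append(1.0)
        a.append(row)
    a.append([1.0] * n + [0.0])
    # Covariances to the target (no nugget: we predict the latent field),
    # plus the unbiasedness constraint that the weights sum to 1.
    b = [exp_cov(math.dist(p, target), sill, rng) for p in points] + [1.0]
    sol = solve(a, b)
    weights = sol[:n]
    return sum(w * z for w, z in zip(weights, values)), weights


# Toy example: four readings on a unit square; the last one looks contaminated.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
temps = [20.0, 21.0, 20.5, 35.0]
pred_equal, w_equal = weighted_ordinary_kriging(pts, temps, [1.0, 1.0, 1.0, 1.0], (0.5, 0.5))
pred_scored, w_scored = weighted_ordinary_kriging(pts, temps, [1.0, 1.0, 1.0, 0.05], (0.5, 0.5))
print(round(sum(w_scored), 6))  # weights sum to 1 by construction
```

With equal scores the four corners are symmetric about the target and each receives weight 0.25. Giving the suspicious 35-degree reading a low score moves weight onto the other three readings and pulls the prediction toward them. Modeling low quality as extra nugget variance, rather than discarding points, keeps every observation in the system while letting the covariance structure decide how much each one counts.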