Online Program

Thursday, February 14
Thu, Feb 14, 5:30 PM - 7:00 PM
St. James Ballroom
Poster Session 1 and Opening Mixer

Using Google APIs to Automate Data Extraction for Sampling Frames (303865)

Heather Driscoll, ICF 
*Adam Lee, ICF 
Robynne Locke, ICF 
Randy ZuWallack, ICF 

Keywords: API, Python, Web Scraping, Data Science, Sampling, Data Cleaning, ETL

In this poster, we present a novel approach that uses data science techniques to build an area sampling frame (a geocoded dataset) from data extracted through the Google Maps and Street View APIs. In 2018, we began work on a study of Multi-Unit Residential Buildings (MURBs) across 8 major metropolitan areas in Canada. No definitive data source identified MURBs, so we had to build our own sampling frame. The traditional approach to developing this type of frame is to physically canvass an area and review buildings to determine their eligibility for the study. Because the study area spanned 6 provinces, covering it with field staff would have been a significant undertaking. As an alternative, we virtualized the canvassing by creating a custom interface to the Google Maps API that allowed us to extract geocoding data for buildings, along with a secondary program that bulk-extracted Google Street View images so that each building's eligibility could be evaluated. This process saved significant time and resources.
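
The abstract does not describe how the two programs were implemented, so the following is only a minimal Python sketch of the general idea, not the authors' code. It builds request URLs for the Google Geocoding API and the Street View Static API without sending them. The function names, the default image size, and the (building_id, lat, lng) tuple layout are hypothetical choices made for illustration; a real run would also need a valid API key and an HTTP client to fetch the responses.

```python
from urllib.parse import urlencode

# Public endpoints of the Geocoding API and the Street View Static API.
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"
STREETVIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"


def geocode_url(address, api_key):
    """Build a Geocoding API request URL for one street address."""
    params = {"address": address, "key": api_key}
    return GEOCODE_ENDPOINT + "?" + urlencode(params)


def streetview_url(lat, lng, api_key, size="640x640"):
    """Build a Street View Static API image URL for one coordinate pair."""
    params = {"location": f"{lat},{lng}", "size": size, "key": api_key}
    return STREETVIEW_ENDPOINT + "?" + urlencode(params)


def bulk_streetview_urls(points, api_key):
    """Map hypothetical (building_id, lat, lng) tuples to image request URLs."""
    return {bid: streetview_url(lat, lng, api_key) for bid, lat, lng in points}
```

Keeping URL construction separate from the download step means the list of requests can be reviewed, de-duplicated, and costed before any API quota is spent, which matters when a frame covers many thousands of buildings.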