Previous DataFests
2023 – American Bar Association
Goal: Analyze data to provide advice to the American Bar Association on how best to ensure the appropriate legal experts are available to support their pro bono legal advice site.
Data consisted of files pertaining to questions posed to the site.
2022 – Play2Prevent Lab, Yale School of Medicine
Goal: Analyze game logs of the Elm City game to determine if there are coherent styles of play that might be useful for characterizing middle school students’ attitudes towards risky behaviors.
2021 – Rocky Mountain Poison Control Center
Goal: Provide advice for medical professionals that could identify potential misuse of prescription drugs.
Data consisted of over 10,000 responses to an international survey about prescription drug use.
2020 – Virtual ASA DataFest
Goal: The 2020 DataFest was held as a virtual data challenge in which students worked in teams to explore an impact of the COVID-19 pandemic. Given the variety of potential topics, part of what made the 2020 challenge unique was that participants had to find a data set for their analysis.
2019 – Canadian National Women’s Rugby Team
Goal: How do we quantify the role of fatigue and workload in a team’s performance in Rugby 7s? How reliable are the subjective wellness data? Should the quality of the opponent or the outcome of the game be considered when examining fatigue during a game? Can widely used measurements of training load and fatigue be improved? How reliable are GPS data in quantifying fatigue?
2018 – Indeed
Goal: What advice would you give a new high school student about what major to choose in college? How does Indeed’s data compare to official government data on the labor market? Can it be used to provide good economic indicators?
2017 – Expedia.com
Goal: How do visitors' searches relate to the choices of hotels booked or not booked? What role do external factors play in hotel choice?
Expedia provided DataFesters with data from search results from millions of visitors around the world who were interested in traveling to destinations all over the world. The data were in two files, one of which included data collected on search results from visitors' sessions, and another which contained detailed information about the destinations that visitors searched for.
2016 – TicketMaster
Goal: How can site visits be converted to ticket sales, and how can TicketMaster identify "true fans" of an artist or band?
Data consisted of three sets. One included events from the last 12 months that tracked customer travel through the website. Another provided information about advertising campaigns on Google, and the third included data on the events themselves.
2015 – Edmunds.com
Goal: Detect insights into the process of car shopping that can help make the process easier for customers.
Data consisted of visitor 'pathways' through a website that helps customers configure car features and shop for cars. Five data files were linked by a customer key and included data about the customer, about his or her visits to the webpage, and, when applicable, about the car purchased and the dealership where the car was purchased.
2014 – GridPoint
Goal: Help understand how customers can best save money and energy.
Data consisted of a random sample of customers, with five-minute aggregates over a year of energy consumption that was then aggregated across important features of the commercial properties, as well as supporting climate and location data.
2013 – eHarmony.com
Goal: Help understand what qualities people look for in prospective dates.
The DataFest students worked with a large sample of prospective matches. For each customer, data were provided on his or her preferences, as well as four matches, their preferences, and information about whether parties contacted one another.
2012 – Kiva.com
Goal: Help understand what motivates people to lend money to developing-nation entrepreneurs and what factors are associated with repaying these loans.
Several data sets were provided, including characteristics of lenders and borrowers and loan pay-back data.
2011 – Los Angeles Police Department
Goal: Make a data-based policy proposal to reduce crime.
Data consisted of arrest records for every arrest in Los Angeles from 2005-2010, including time, location, and weapons involved.