All Times EDT

Abstract Details

Activity Number: 283 - Deep Learning Methods
Type: Contributed
Date/Time: Tuesday, August 9, 2022 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #322700
Title: How Much Data Do We Need? Predicting Deep Learning Model Performance and Training Data Sizes
Author(s): Jelena Frtunikj* and Thomas Muehlenstaedt and Rajat Mehta
Companies: ArgoAI and ArgoAI and ArgoAI
Keywords: deep learning; power law; prediction; automated driving; data annotation

Collecting and annotating data is typically the most expensive part of developing supervised Deep Learning (DL) algorithms. It is also widely accepted that adding more training data improves DL model performance. However, it is usually unclear how much data, or which data, is required to reach a given performance goal.

This paper presents a method that fits power law and inverted power law models to data points consisting of a performance metric (e.g., accuracy, error, or F1 score) computed on training sets of increasing size. The fitted models are then used to 1) predict model performance at sample sizes larger than those available and 2) predict the training dataset size needed to reach a desired model performance. Important aspects of this process are the choice of data set sizes and the sampling strategy within each data set. The method is applied to two DL case studies: 1) an open-source image classification algorithm and 2) an object detection algorithm used in automated driving vehicles (AVs).
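
The two prediction steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the functional form, so the sketch assumes the simplest power law, error = a * n^(-b), with no irreducible-error offset, fitted by least squares in log-log space. The function names and the learning-curve points are hypothetical; the points are generated synthetically from known parameters rather than measured.

```python
import math


def fit_power_law(sizes, errors):
    """Fit error ~= a * n**(-b) by ordinary least squares on (log n, log error).

    Assumption: no constant offset term, so the curve is a straight line
    in log-log space.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope  # (a, b)


def predict_error(n, a, b):
    """Use 1: extrapolate the error to a training set of size n."""
    return a * n ** (-b)


def required_size(target_error, a, b):
    """Use 2: invert the power law to get the size needed for target_error."""
    return (target_error / a) ** (-1.0 / b)


# Synthetic (not measured) learning-curve points, generated from a = 2.0, b = 0.5.
sizes = [1000, 2000, 4000, 8000, 16000]
errors = [2.0 * n ** -0.5 for n in sizes]

a, b = fit_power_law(sizes, errors)
error_at_64k = predict_error(64000, a, b)   # beyond the largest available size
size_needed = required_size(0.01, a, b)     # size needed for a target error of 0.01
```

With real learning curves the points are noisy and the curve typically flattens toward a nonzero floor, so a model with an offset term fitted by nonlinear least squares may be more appropriate; that choice is outside what the abstract specifies.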

Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program