All Times EDT

Abstract Details

Activity Number: 353 - Research and Educational Tools
Type: Contributed
Date/Time: Thursday, August 12, 2021 : 10:00 AM to 11:50 AM
Sponsor: Section on Statistics and Data Science Education
Abstract #318590
Title: The Importance of Good Coding Practices for Data Scientists
Author(s): Maria-Cristiana Girjau* and Randall Pruim and Nicholas Horton
Companies: Amherst College and Calvin University and Amherst College
Keywords: R; code style; data science education; statistical practice
Abstract:

Many students and practitioners are reluctant to adopt good coding practices as long as the code "works". However, code standards are an important part of modern data science practice, and they play an essential role in the development of "data acumen". Good coding practices lead to more reliable code and save more time than they cost, making them important even for beginners. We illustrate key aspects of coding practices (both good and bad), focusing primarily on the R language. The lessons distilled from the examples are organized into a top ten list:

1. Follow a style guide
2. Copy and paste is not a workflow
3. Don't impose coding paradigms from other languages
4. Use R Markdown for documents and webpages
5. Choose your toolkit wisely
6. Expect that it might not work: Fail safely
7. How do you know it works? Sanity checks and unit tests
8. When it doesn't work: Debug
9. Use version control
10. R isn't always the best choice
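
As a rough illustration of items 2, 6, and 7, the following base R sketch is not taken from the talk itself. The function name `rescale01` and the example inputs are hypothetical, chosen only to show how a small function can replace copy-pasted code, refuse input it cannot handle, and be checked against known answers.

```r
# Hypothetical helper: rescale a numeric vector to the range [0, 1].
# Writing it once replaces repeated copy-pasted rescaling code (item 2).
rescale01 <- function(x) {
  # Fail safely (item 6): stop early on input the function cannot handle.
  stopifnot(is.numeric(x), length(x) > 0)
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    stop("cannot rescale a constant vector", call. = FALSE)
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Sanity checks (item 7): inputs whose correct answers are known in advance.
stopifnot(isTRUE(all.equal(rescale01(c(2, 4, 6)), c(0, 0.5, 1))))
stopifnot(is.na(rescale01(c(1, NA, 3))[2]))

# Fail safely at the call site: recover from the error instead of halting.
result <- tryCatch(rescale01(c(5, 5, 5)), error = function(e) NA_real_)
stopifnot(is.na(result))
```

In a larger project, checks like these would more likely live in a dedicated test file run by a testing package such as testthat, rather than inline with the function definition.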

Good coding practices are vital for statistics and data science. In academic programs, it is important for instructors to begin establishing these practices early, to reinforce them often, and to hold themselves to a higher standard while guiding students.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program