Online Program

All Times ET

Thursday, June 3
Software & Data Science Technologies
Software and Technology Shaping Data Science
Thu, Jun 3, 1:10 PM - 2:45 PM
TBD
 

The Importance of Good Coding Practices for Data Scientists (309699)

Maria-Cristiana Girjau, Amherst College 
Nicholas Jon Horton, Amherst College 
*Randall Pruim, Calvin University 

Keywords: R, code style, data science education, statistical practice

Many students, as well as some practitioners, are reluctant to adopt good coding practices. As long as the code "works", they are satisfied and ready to move on. But how do they know that it "works"? For data analysts, these issues are an important part of data science practice. Recent reports have highlighted the role of code standards as a component of "data acumen".

- Good coding practices lead to more reliable code.
- Good coding practices save more time than they cost.
- Good coding practices are important, even for beginners.
- Good coding practices focus attention.

We will illustrate key aspects of coding practices (both good and bad) and suggest ways that these practices can be included in the statistics and data science curricula. Examples will be taken from a variety of sources and focus primarily on the R programming language.

The lessons distilled from these examples can be organized into a "top ten list" of good coding practices:

1. You gotta have style: The importance of following a style guide;
2. Copy and paste is not a workflow;
3. Don't impose coding paradigms from other languages;
4. Take advantage of R Markdown for constructing documents and webpages;
5. Choose wisely: Sometimes less is more;
6. Expect that it might not work: Fail safely;
7. How do you know it works? Sanity checks and unit tests;
8. When it doesn't work: Learn to use a debugger;
9. Get (version) control of the situation;
10. R may not always be the best choice.
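
As a rough illustration of items 2, 6, and 7, the following base R sketch wraps a repeated calculation in a function, validates its input, checks the result against a known answer, and handles a failure without stopping the whole script. The helper name `rescale01` and the example data are hypothetical and are not taken from the talk; this is one possible reading of those items, not the authors' own example.

```r
# Hypothetical helper (not from the talk): rescale a numeric vector to [0, 1].
# Writing it once as a function avoids copying and pasting the same formula.
rescale01 <- function(x) {
  # Fail early, with a clear error, if the input is not usable.
  stopifnot(is.numeric(x), length(x) > 0)
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    stop("cannot rescale a constant vector")
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Sanity check: compare the output with a result worked out by hand.
scaled <- rescale01(c(2, 4, 6))
stopifnot(isTRUE(all.equal(scaled, c(0, 0.5, 1))))

# Fail safely: catch the error, report it, and continue with a missing value.
result <- tryCatch(
  rescale01(c(3, 3, 3)),
  error = function(e) {
    message("Skipping input: ", conditionMessage(e))
    NA_real_
  }
)
```

In a course or project setting, the hand-worked check could move into a unit test file (for example with the testthat package), so that it runs every time the code changes.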

Good coding practices are vital for good statistics and data science. In academic programs, it is particularly important to begin establishing these practices early, to reinforce them often, and to expect students to adopt more and more habits of good programmers as they progress through their programs of study. To avoid student frustration, it is important that instructors hold themselves to a higher standard while gently guiding students towards better coding practices.