Many students and practitioners are reluctant to adopt good coding practices as long as the code "works". However, code standards are an important part of modern data science practice, and they play an essential role in the development of "data acumen". Good coding practices lead to more reliable code and save more time than they cost, making them important even for beginners. We illustrate key aspects of coding practices (both good and bad), focusing primarily on the R language. The lessons distilled from the examples are organized into a top ten list:
1. Follow a style guide 2. Copy and paste is not a workflow 3. Don't impose coding paradigms from other languages 4. Use R Markdown for documents and webpages 5. Choose your toolkit wisely 6. Expect that it might not work: Fail safely 7. How do you know it works? Sanity checks and unit tests 8. When it doesn't work: Debug 9. Use version control 10. R isn't always the best choice
Good coding practices are vital for statistics and data science. In academic programs, it is important for instructors to begin establishing these practices early, to reinforce them often, and to hold themselves to a higher standard while guiding students.
|