Keywords: reproducible research, SEER registry data, RStudio
The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute (NCI) provides publicly available cancer registry data. These registry data contain more than 9 million records covering the last four decades. Generating analysis reports from SEER data can be highly iterative when the target study population changes over time, for example when selecting a particular cancer type, years of diagnosis, race/ethnicity, or age group. To handle this challenge, we developed an R program, used together with RStudio, to generate reports. We will present reports in HTML, PDF, MS Word, and MS PowerPoint formats with built-in R code and summarize the strengths and weaknesses of each format. In addition, we will provide examples and code for drawing customizable geographic maps from cartographic boundary shapefiles in the same R environment, since investigators may wish to accompany a report with a visual display of spatial data. Using our R program helps enhance the integrity of the statistical analysis. We hope that this program makes more efficient use of statisticians' time, reduces potential errors, and enhances research reproducibility in SEER data analysis.