Online Program

Thursday, February 15
SC1 Introduction to Big Data Analysis
Thu, Feb 15, 8:00 AM - 5:30 PM
Salon A
Instructor(s): Fulya Gokalp Yavuz, Yildiz Technical University; Mark Daniel Ward, Purdue University
This one-day introductory workshop is geared toward CSP participants who want to revitalize or improve their data analysis skills, with an emphasis on big data. Ward and Gokalp Yavuz will present tools and techniques for the most fundamental, low-level aspects of data analysis. We are well versed in teaching such techniques to students who have no background in data analysis or programming. This workshop will bring people up to speed with powerful techniques for data analysis. This one-day course has no prerequisites. The workshop will be hands-on and driven by examples that use large data sets. The intended participants are people who work in a data-driven environment and have an increasing need to analyze large data sets. Before data can be analyzed, a great deal of data manipulation is necessary, especially for big data sets. Sometimes the data need to be scraped from remote sources and then parsed into more natural forms. This process often involves munging and cleaning the data. Being able to reproduce and reliably verify every method used in the data wrangling is more important than ever.

Outline & Objectives

R will be the main tool used in the workshop. The workshop is geared toward practitioners with only limited knowledge of R, or none at all. For instance, someone who has previously used only Excel, SAS, or Tableau for data analysis is a perfect candidate for this all-day immersive workshop. We will use R and its XML scraping and parsing libraries to pull raw data from disparate sources on the internet and wrangle them into forms that are amenable to data analysis.

The entire workshop will be example-driven. Participants should bring a laptop computer (Mac, Windows, and UNIX are all welcome). We will work in RStudio. Instructions for installing the necessary software can be sent to the participants before the workshop starts. We will use R Markdown for creating reproducible documents.

By the end of the one-day workshop, participants will have learned how to scrape data sets from the web, parse the desired portions of the data, wrangle it into a desired form for data analysis, and also perform some cleaning and verifying of the data. Reproducible paradigms and reliability will be emphasized throughout the workshop.

About the Instructors

Dr. Mark Daniel Ward is an Associate Professor of Statistics at Purdue University. Ward has years of experience teaching fundamental data analysis techniques to students who often have no previous experience with such tools. He emphasizes new computational tools, including R, data visualization, UNIX, bash shell scripting, regular expressions, SQL, and XML. Ward firmly believes in team-oriented environments for learning data analysis. He coordinates the Statistics Living Learning Community at Purdue, a program funded by a $1.5 million NSF grant in which students are immersed in a year-long data analysis environment that blends undergraduate Statistics coursework with research opportunities, professional development, and extracurricular data analysis activities.

Dr. Fulya Gokalp Yavuz is currently a postdoctoral researcher in the Department of Statistics at Purdue University. She has been working with Dr. Ward since July 2016 on training in data science, and she is enthusiastic about teaching new data science topics with new methods. She has experience teaching statistical topics such as multivariate statistics and statistical software such as R.

Relevance to Conference Goals

This workshop fits squarely within Theme 3 of the CSP workshop, namely, "Big Data and Data Science". The workshop should, according to CSP's description, "help practitioners working in these fields stay current with state-of-the-art methods". The workshop should be especially appealing to people who yearn to move into more data-oriented tasks in the workplace, but who have not (yet) moved beyond traditional spreadsheet or database tools for data analysis.

The workshop will have a learning-by-doing methodology, in which the participants will be actively learning, rather than listening to lectures.

By understanding the fundamental tools for data analysis, participants will be better prepared to move on to statistical methods, having learned a great deal about the computational resources needed for reproducible data wrangling at the earliest stages of the data analysis cycle.

Software Packages

R, RStudio, XML libraries, R Markdown. Ward and Gokalp Yavuz will provide computational resources for the participants to use. If participants bring a laptop computer, they can utilize our computational environment within a web browser. There is no need to install any software before the workshop, and no previous background is required.

SC2 An Introduction to D3.js: From Scattered to Scatterplot
Thu, Feb 15, 8:00 AM - 5:30 PM
Salon B
Instructor(s): Scott Murray, O’Reilly Media

Interested in coding data visualizations on the web, but don't know where to start? This workshop will have you transforming data into visual images in no time at all, starting from scratch and building an interactive scatterplot by the end of the session. We'll use d3.js, the web's most powerful library for data visualization, to load data and translate values into SVG elements — drawing lines, points, and scaled axes to label our data. We’ll learn how to use motion and visual transitions, and introduce simple interactivity to make our charts more explorable.

All methods and examples will be up-to-date for the current version of D3 (4.x as of this writing).

Outline & Objectives

Audience and Prerequisites:

Intended for absolute beginners new to D3, yet with some prior programming experience (though not necessarily JavaScript), and some prior web experience (HTML, CSS). Participants should also be comfortable working with basic data formats (such as CSV files).


- Intro to D3 as a tool
- Set up with empty page template
- Selecting elements
- Creating elements
- SVG images and elements
- Data in JavaScript (arrays)
- Binding data to create elements
- Using transitions between states
- Using scales to position elements
- Adding axes
- Transitions and motion
- Interactivity
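
To make the "scales" and "binding data" items above a little more concrete, here is a minimal sketch in plain JavaScript of the idea behind a linear scale: a value in a data domain is mapped proportionally onto a pixel range, and each data point is then turned into an SVG element. The helper `linearScale`, the variables `xScale` and `yScale`, and the sample numbers are hypothetical and exist only for illustration. D3 provides its own scale functions (for example `d3.scaleLinear` in version 4), so this is not how the workshop code is written, only a rough picture of what a scale does.

```javascript
// Hypothetical helper (not part of D3): returns a function that maps
// values from the data domain onto the output range proportionally.
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return function (value) {
    const t = (value - d0) / (d1 - d0); // fraction of the way through the domain
    return r0 + t * (r1 - r0);
  };
}

// Made-up sample data: [x, y] pairs.
const dataset = [[0, 10], [50, 40], [100, 20]];

// Data x values 0..100 spread across 500 pixels.
const xScale = linearScale([0, 100], [0, 500]);
// SVG y coordinates grow downward, so the pixel range is reversed.
const yScale = linearScale([0, 50], [300, 0]);

// Map each data point to SVG circle markup, one element per datum.
const circles = dataset.map(([x, y]) =>
  `<circle cx="${xScale(x)}" cy="${yScale(y)}" r="4"></circle>`);

console.log(circles.join("\n"));
```

Reversing the range for `yScale` is a common choice for scatterplots, since larger data values should appear higher on the page.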

About three-quarters of the day will be devoted to the core concepts listed above. The remainder of our time will be spent on topics most relevant to participants. This could include additional topics (different data types, visual or interaction design concerns, geographic maps) or small group exercises and consultation for participants sharing similar concerns.


Participants will leave comfortable using D3 to load data into the browser and map that data to visual elements.

About the Instructor

Scott Murray is a designer, creative coder, and artist who writes software to create data visualizations and other interactive phenomena. His work incorporates elements of interaction design, systems design, and generative art. Scott is in the Learning Group at O’Reilly Media, is author of the O’Reilly title “Interactive Data Visualization for the Web” (the second edition of which will be published in 2017), and has presented two video courses on D3. Scott is also affiliated with the Visualization and Graphics Lab at the University of San Francisco, where he has taught data visualization and interaction design. He is also a Senior Developer for Processing, and is writing a new book with O’Reilly, “Creative Coding and Data Visualization with p5.js: Drawing on the Web with JavaScript.” Scott earned an A.B. from Vassar College and an M.F.A. from the Dynamic Media Institute at the Massachusetts College of Art and Design. His work can be seen at

Relevance to Conference Goals

By the end of this course, participants will be familiar with the most powerful tool for web-based data visualization, and therefore in a good position to better communicate their findings to a global audience. D3 familiarity is in demand with employers, and the skills learned in this session can be applied immediately, for a wide range of projects.

That said, please note that D3 is intended for custom visualization—it has no “templates” or preset “views” or chart types. Exploratory tools like Tableau already serve this purpose. This course is about learning the core concepts of D3, so you can use it to design and develop your own highly customized, interactive data visualizations.

Software Packages

This course will rely heavily on:

- Web standard technologies built into every browser (HTML, CSS, SVG, JavaScript)
- D3 (free, and will be provided; also see

Please bring to the workshop a laptop with the following installed:

- Chrome
- A code editor (I recommend Atom, which is free)

You will also need the ability to run a local web server. You can accomplish this either by:

- Installing a web server application (such as MAMP or WAMP). This is the friendliest, GUI approach, but requires you to download and install everything in advance of the course.
- Using Python or another tool to run a simple server via terminal commands. This requires no additional installation if you are using Mac OS.
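
As one illustration of the "another tool" option, the sketch below serves files from a folder using only Node.js built-in modules. It assumes Node.js is installed, which the course itself does not require. The names `createStaticServer`, `resolveFile`, and `contentTypeFor` and the port number 8000 are hypothetical choices for this example. It is a bare-bones sketch for local practice, not a production server.

```javascript
// Minimal static file server sketch using only Node.js built-in modules.
const http = require("http");
const fs = require("fs");
const path = require("path");

// Small, incomplete table of content types; extend as needed.
const CONTENT_TYPES = {
  ".html": "text/html",
  ".css": "text/css",
  ".js": "application/javascript",
  ".csv": "text/csv",
  ".json": "application/json",
  ".svg": "image/svg+xml",
};

function contentTypeFor(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return CONTENT_TYPES[ext] || "application/octet-stream";
}

// Resolve a request URL to a file under rootDir.
// Returns null if the URL is malformed or would escape rootDir.
function resolveFile(rootDir, requestUrl) {
  let pathname;
  try {
    pathname = decodeURIComponent(requestUrl.split("?")[0]);
  } catch (err) {
    return null;
  }
  // Requests for a folder are answered with its index.html.
  const relative = pathname.endsWith("/") ? pathname + "index.html" : pathname;
  const root = path.resolve(rootDir);
  const filePath = path.resolve(root, "." + relative);
  if (filePath !== root && !filePath.startsWith(root + path.sep)) return null;
  return filePath;
}

function createStaticServer(rootDir) {
  return http.createServer((req, res) => {
    const filePath = resolveFile(rootDir, req.url);
    if (filePath === null) {
      res.writeHead(403);
      res.end("Forbidden");
      return;
    }
    fs.readFile(filePath, (err, data) => {
      if (err) {
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      res.writeHead(200, { "Content-Type": contentTypeFor(filePath) });
      res.end(data);
    });
  });
}

// To serve the current folder, uncomment the next line and run this file with node:
// createStaticServer(".").listen(8000);
```

With the last line uncommented, the pages in the current folder would be reachable in the browser at `http://localhost:8000/`.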

Code examples will be distributed at the event. All code examples will be updated and tested with the current version of the software (version 4.x at the time of this writing) as of February 2018.