Datasets and Stories: Introduction and Guidelines

Robin H. Lock, St. Lawrence University
Tim Arnold, North Carolina State University

Journal of Statistics Education v.1, n.1 (1993)


Abstract

We describe the purpose of the "Datasets and Stories" section of this journal. Guidelines for submitting datasets and articles to this section are discussed. Instructions are provided for retrieving data from the JSE data archives.

1. Introduction

The purpose of this section is to provide a forum for exchanging interesting data and discussing ways that such data can be used effectively in teaching statistics. In each issue, we intend to feature one or two datasets with full articles describing their use. In addition, an archive of these and other datasets has been established as a resource for readers. Below we describe procedures for accessing these datasets and guidelines for submitting your favorite data.

IMPORTANT: The success of this section is critically dependent upon the willingness of its readers to contribute both interesting datasets and descriptive articles.

2. Guidelines for Submitting Data

At least two files are associated with each archived dataset. A "doc" file should contain adequate documentation to explain the structure of the data, give the source, describe all variable codings, provide sufficient narrative to put the data in context, and suggest some interesting questions to pursue. A blank template for such a documentation file is stored in the data archives and appears as an appendix below.

A second "dat" file contains the raw data as a flat ASCII text file. The "doc" file should contain any format information needed to process the raw data by standard computer packages. In some cases, a dataset might require more than one raw data file.
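As an illustration of how format information in a "doc" file might be applied, the following is a minimal Python sketch of reading a flat ASCII data file. It assumes whitespace-delimited fields and a missing-value code of "*". Both are assumptions made for this example; the actual delimiter and codings for any dataset are whatever its "doc" file states. The function name and the sample lines are invented for illustration.

```python
def parse_flat_ascii(text, delimiter=None, missing="*"):
    """Parse flat ASCII data into a list of rows.

    delimiter=None splits on runs of whitespace. `missing` is the code
    that marks a missing value (assumed here; check the "doc" file).
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        row = []
        for field in line.split(delimiter):
            field = field.strip()
            if field == missing:
                row.append(None)  # missing value
            else:
                try:
                    row.append(float(field))
                except ValueError:
                    row.append(field)  # keep non-numeric codes as text
        rows.append(row)
    return rows


# Invented sample lines, not taken from any archived dataset.
sample = "A x 1.5 10\nB y * 20\n"
rows = parse_flat_ascii(sample)
```

For a comma-delimited file, the same function could be called with delimiter=",".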

These two files are required for a dataset to be entered into the data archive. Some contributors will, in addition, have experiences to share using the dataset in the classroom. We encourage these contributors to write an article for the "Datasets and Stories" section of JSE.

3. Guidelines for Submitting Dataset Articles

An article for the "Datasets and Stories" section should be an expansion of the narrative which is found in the "doc" file. It should follow the general guidelines for any JSE article and will be subject to a similar review process. Authors are encouraged to emphasize the "story" aspect of this section by elaborating on the circumstances and questions which led to the collection of the data. We also encourage descriptions of creative ways the data might be used in teaching statistics, particularly those that are based on actual experiences.

4. Criteria for Suitable Datasets

We hesitate to define in advance what are or are not "good" datasets, but several criteria will be considered before making data available in the archives.

(a) Copyright issues. It is the responsibility of the contributor to secure any permissions needed to make the data freely available to all.

(b) Reality. In general, we prefer "real" as opposed to "artificial" data, although we acknowledge the usefulness of some well-crafted "fake" data in certain teaching situations.

(c) Size. Very large datasets (e.g., greater than 1 megabyte of storage) are discouraged unless they have particularly interesting pedagogical appeal. On the other hand, very small datasets (e.g., a two-way table demonstrating Simpson's paradox) may not require computer analysis, but are still useful examples to have for teaching and should be included in the archives.

(d) General appeal. We are seeking datasets which other instructors might find useful. While that does not exclude examples which are specific to a given discipline, we caution contributors to avoid technical jargon and arcane situations which might be accessible to, or appeal to, only a very limited audience of students.

(e) Other JSE articles. Authors of other articles in this journal may choose to make raw data relevant to their articles available through the JSE data archives.

(f) Textbook data. In general, datasets appearing in textbooks would require specific permission from the publisher in order to be included in the JSE data archives. We will consider requests from authors or publishers to make data files for an entire text available through the JSE data archives.

5. Accessing Archived Datasets

Documentation and data files are retrievable through e-mail by sending a message to the address:
archive@jse.stat.ncsu.edu

Both the "doc" and "dat" files are found with a common root name in the directory "jse/data". Thus, a typical message to retrieve a "doc" or "dat" file should look like:

send jse/data/93cars.doc
send jse/data/93cars.dat.txt

A special index file (http://jse.amstat.org/archive.htm) contains a listing of datasets currently available in the JSE data archives. Descriptive articles (if available) are found in the appropriate JSE volumes.

To serve as an example for submissions to the "Datasets and Stories" section, this issue includes a description by Robin Lock of some data on 1993 model automobiles. The full article is found at http://jse.amstat.org/v1n1/datasets.lock.html. The raw data and documentation are at http://jse.amstat.org/datasets/93cars.dat.txt and http://jse.amstat.org/datasets/93cars.txt.
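The two addresses above suggest a naming pattern: a common root name followed by ".txt" for the documentation and ".dat.txt" for the raw data. The Python sketch below builds both addresses from a root name. It assumes that other archived datasets follow the same pattern as 93cars, which this article does not guarantee, and the function names are hypothetical.

```python
from urllib.request import urlopen

# Base address taken from the 93cars links in this article.
BASE_URL = "http://jse.amstat.org/datasets/"


def dataset_urls(root):
    """Return the documentation and data addresses for a root name.

    Assumes the 93cars naming pattern applies to other datasets.
    """
    return {
        "doc": f"{BASE_URL}{root}.txt",
        "dat": f"{BASE_URL}{root}.dat.txt",
    }


def fetch_text(url):
    """Download a file and decode it as ASCII text (not called here)."""
    with urlopen(url) as response:
        return response.read().decode("ascii", errors="replace")


urls = dataset_urls("93cars")
```

Calling fetch_text(urls["dat"]) would attempt the download; whether a given address still resolves should be checked against the index file.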

In future issues we will use this space to list new additions to the JSE data archives and to direct readers to descriptive articles in the "Datasets and Stories" section.

6. Contributions and Comments

Data for the archives, articles for the "Datasets and Stories" section, and questions or suggestions should be directed to either of the section editors:

Robin H. Lock
Mathematics Department
St. Lawrence University
Canton, NY 13617
(315) 379-9021 (office)
(315) 379-5804 (fax)
rlock@stlawu.bitnet

Tim Arnold
Department of Statistics, Box 8203
North Carolina State University
Raleigh, NC 27695-8203
(919) 515-1927 (office)
(919) 515-7591 (fax)
arnold@stat.ncsu.edu


Addendum (added July 7, 1999)

In November 1998, "doc" files were renamed "txt" files to avoid confusion with Word files. In July 1999, some obsolete file names and links in this article were updated.

E-mail access to data and documentation files has been replaced by access through the World Wide Web. Thus the e-mail instructions at the beginning of Section 5 of this paper are no longer correct.


Addendum (added November 2010)

In November 2010, Datasets and Stories Editor Dex Whittinghill made a change to the template for data documentation files. Please use the November 2010 Updated Template for Data Documentation Files that supersedes the former Appendix A in the original Lock and Arnold paper. Do not use the template referenced in the original Lock and Arnold paper.



November 2010 Updated Template for Data Documentation Files

This form is available at http://jse.amstat.org/v18n3/datasets_template.htm

NAME: A descriptive name for the dataset file (.txt or .dat.txt)
TYPE: e.g., Random sample, Census, Time series, Designed experiment,...
SIZE: Number of observations, number of variables
ARTICLE TITLE: Title of the article, when appropriate

DESCRIPTIVE ABSTRACT:
A brief (no more than 10 lines) description of the dataset.

SOURCES:
Acknowledge any published data sources or give brief description of origins of the data.

VARIABLE DESCRIPTIONS:
Provide a "key" for reading the ASCII data file. Explain how the data are delimited (tab, comma, space, etc.), any variable codings (including missing values), and/or measurement units.

SPECIAL NOTES:
Describe any special circumstances which should be brought to the attention of persons attempting to analyze the data.

STORY BEHIND THE DATA:
A brief narrative describing the origins of the data and the reasons they were collected. This is a good place to supply any background needed to understand the underlying variables, describe relevant issues, and suggest questions which might be of interest. This and the next section should be fairly concise. If you find them getting too long -- it's time to write a full "Datasets" article!

PEDAGOGICAL NOTES:
Suggest some ways an instructor might use the data in class. Describe any interesting features and/or statistical concepts which are well illustrated.

REFERENCES:
Include any references not in the SOURCES section.

SUBMITTED BY:
Name
Affiliation
Surface address
e-mail address

(This gives you credit and provides a source for instructors who find the data useful to get clarifications if needed.)
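Because each field in this template begins with an upper-case label followed by a colon, a completed documentation file can be split into its fields mechanically. The Python sketch below shows one way this might be done. It is not part of the JSE archive tools; the function name is hypothetical, the sample text is invented, and the pattern would misread any body line that itself begins with an upper-case label and a colon.

```python
import re

# A field label: two or more upper-case letters (spaces allowed), then a colon.
FIELD_HEADER = re.compile(r"^([A-Z][A-Z ]*[A-Z]):\s*(.*)$")


def parse_doc_file(text):
    """Split a documentation file into a dict keyed by field label."""
    fields = {}
    current = None
    for line in text.splitlines():
        match = FIELD_HEADER.match(line)
        if match:
            current = match.group(1)
            first = match.group(2)
            # Text on the label line, if any, starts the field body.
            fields[current] = [first] if first else []
        elif current is not None and line.strip():
            fields[current].append(line.strip())
    return {name: "\n".join(lines) for name, lines in fields.items()}


# Invented sample, used only to exercise the parser.
sample_doc = """NAME: Example data
TYPE: Random sample
SIZE: 10 observations, 3 variables

DESCRIPTIVE ABSTRACT:
A made-up description used only
to exercise the parser.
"""
doc_fields = parse_doc_file(sample_doc)
```

A contributor could use such a check to confirm that no required field has been left empty before submitting.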


Appendix

EDITOR'S NOTE: The form below is no longer in use. Use the November 2010 update, available at http://jse.amstat.org/v18n3/datasets_template.htm


A Template for Data Documentation (DOC) Files

This form is no longer in use.

NAME: A descriptive title
TYPE: e.g., Random sample, Census, Time series, Designed experiment,...
SIZE: Number of observations, number of variables

DESCRIPTIVE ABSTRACT:
A brief (no more than 10 lines) description of the dataset.

SOURCES:
Acknowledge any published data sources or give brief description of origins of the data.

VARIABLE DESCRIPTIONS:
Provide a "key" for reading the ASCII data file. Explain any variable codings (including missing values) and/or measurement units.

SPECIAL NOTES:
Describe any special circumstances which should be brought to the attention of persons attempting to analyze the data.

STORY BEHIND THE DATA:
A brief narrative describing the origins of the data and the reasons they were collected. This is a good place to supply any background needed to understand the underlying variables, describe relevant issues, and suggest questions which might be of interest. This and the next section should be fairly concise. If you find them getting too long -- it's time to write a full "Datasets" article!

PEDAGOGICAL NOTES:
Suggest some ways an instructor might use the data in class. Describe any interesting features and/or statistical concepts which are well illustrated.

REFERENCES:
Include any references not in the SOURCES section.

SUBMITTED BY:
Name
Affiliation
Surface address
e-mail address

(This gives you credit and provides a source for instructors who find the data useful to get clarifications if needed.)

