Mary C. Meyer

University of Georgia

Journal of Statistics Education Volume 14, Number 1 (2006), ww2.amstat.org/publications/jse/v14n1/datasets.meyer.html

Copyright © 2006 by Mary C. Meyer, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.

**Key Words:**.

Upon complaining about this state of affairs, I was told by sales representatives in two stores that boys actually had wider feet than girls, so needed wider shoes. Being very skeptical, I thought I would test this claim. The data I collected have subsequently been used in my statistics classes at several levels from introductory statistics to linear models.

I asked the children to design a study to see if boys have wider feet than girls.
Of course we could not measure only foot width. I asked the class, “what if the
boys in this class are, on average, bigger than the girls, and wear bigger shoe sizes?” That way,
the average width for the boys’ feet might be bigger than that for the girls, but
maybe not *for a given shoe size*. After some discussion, we all agreed to
measure foot lengths as well, because shoes are fit according to length.

Should we collect any other data about the kids? One of the students suggested that age of the child should be important, because older kids have bigger feet. After some lengthy discussion, the kids decided that both birth month and year should be recorded, because “some kids were nine and a half, and others were only nine.”

Then we turned to the problem of the actual measurements. The instrument used to measure feet was constructed to resemble the familiar foot-sizing device often seen at shoe stores. It was cut out of cardboard and had a ruler glued to the surface. A block of wood was fastened to the end, behind the ruler, for the children to place their heels against. The length and width were to be measured in centimeters. But should we measure right feet, left feet, or both? The kids decided, “measure the longer foot” and I went along with that. Someone wanted also to record whether the left or right foot was longer, which led to some discussion about whether kids who were right handed had right feet that were bigger. Feeling that we were straying from the topic, I suggested that we also record whether each child was right or left handed, so we could get on with the measuring.

Everyone was willing to step up to the measuring device, and hold very still. After yet more discussion of how much weight should be on the foot measured, the kids agreed to stand as evenly as possible on both feet. The width of the foot was the widest measurement perpendicular to the length. After we went through all the feet in my daughter’s class, we invited another class to get measured as well, so we ended up with measurements for 39 fourth graders.

Figure 1. Widths of kids’ feet, for boys and girls. The horizontal lines mark the average width for each group.

Table 1. Summary statistics for foot widths, in centimeters, for boys and girls.

Statistic | Boys | Girls |
---|---|---|
mean | 9.190 | 8.784 |
standard deviation | 0.4518 | 0.4936 |
sample size | 20 | 19 |

If we let $\mu_b$ represent the average foot width for fourth-grade boys, and $\mu_g$ represent the average foot width for fourth-grade girls, we can write the appropriate hypotheses as

$H_0\colon \mu_b = \mu_g$, versus $H_a\colon \mu_b > \mu_g$.

A two-sample *t*-test with samples assumed to be independent can be performed
using the summary statistics. Note that the pooled sample standard deviation is $S_p$ = 0.4725.
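The pooled two-sample calculation can be sketched directly from the summary statistics in the table above. This is a minimal illustration using only the reported means, standard deviations, and sample sizes; the variable names are chosen here for illustration, and small differences from the reported pooled standard deviation are to be expected because the inputs are rounded.

```python
import math

# Summary statistics for foot width (cm), copied from the summary table.
n_boys, mean_boys, sd_boys = 20, 9.190, 0.4518
n_girls, mean_girls, sd_girls = 19, 8.784, 0.4936

# Pooled variance: each sample variance is weighted by its degrees of freedom.
df = n_boys + n_girls - 2
pooled_var = ((n_boys - 1) * sd_boys**2 + (n_girls - 1) * sd_girls**2) / df
pooled_sd = math.sqrt(pooled_var)

# Standard error of the difference in means under the equal-variance assumption.
se_diff = pooled_sd * math.sqrt(1 / n_boys + 1 / n_girls)
t_stat = (mean_boys - mean_girls) / se_diff

print(f"pooled SD = {pooled_sd:.4f}, df = {df}, t = {t_stat:.2f}")
```

With these rounded inputs the statistic comes out near 2.68 on 37 degrees of freedom, to be compared with the upper tail of the *t* distribution because the alternative is one-sided.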

Of course, as the fourth graders figured out with some prompting, it could be that the boys are actually just larger than the girls, on average. In fact, this seemed to be the case just from glancing over the group. We need to control for foot length in the model.

To get an idea of the relationship between foot width and foot length for boys and girls, we examine the scatter plot shown in Figure 2. The boys’ measurements are represented as circles and the girls’ as triangles. We see right away that the points in the upper right (larger measurements of both width and length) tend to be for boys, while the points in the lower left tend to be for girls. The question of “do boys have wider feet than girls” does not have so clear an answer once foot length is considered.

Figure 2. Widths of kids’ feet, plotted against length, for boys and girls.

If we assume that foot width increases linearly with length over the range of these data, then we can fit a standard analysis of covariance model to the data. The model can be written as

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 d_i + \epsilon_i,$$

where

$y_i$ is the foot width in centimeters for the $i$th child,

$x_i$ is the foot length in centimeters for the $i$th child,

$d_i$ is a dummy variable so that $d_i = 1$ if the $i$th child is a boy and $d_i = 0$ if the $i$th child is a girl, and

$\epsilon_i$ is the random variation associated with the $i$th measurement.

We assume that the errors have mean zero, and are independent and normally
distributed with equal variances.
Note that for girls, we have $d_i = 0$, so that the equation representing the
relationship is $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$. For boys, the equation representing
the relationship is $y_i = (\beta_0 + \beta_2) + \beta_1 x_i + \epsilon_i$.
We interpret the parameters as follows.

The intercept $\beta_0$ is the expected foot width for girls when the foot length is zero; not a useful interpretation!

The slope $\beta_1$ represents the expected number of centimeters increase in foot width, for every one centimeter increase in foot length, for 4th grade children.

The parameter $\beta_2$ represents the average difference in mean foot width for boys and girls, for a given length of foot.

The hypotheses of interest are then

$H_0\colon \beta_2 = 0$, versus $H_a\colon \beta_2 > 0$.
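To make the dummy-variable coding concrete, here is a minimal least squares sketch of a parallel-lines model with an intercept, a slope for foot length, and a 0/1 indicator for boys. Everything in it is hypothetical: the six lengths, the assumed coefficients `B0`, `B1`, `B2`, and the helper `solve` are invented for illustration and are not the study data or the author's code. The widths are generated without noise, so the fit should recover the assumed coefficients.

```python
# Hypothetical illustration only: these values are made up, not the study data.
B0, B1, B2 = 3.0, 0.25, 0.5  # assumed intercept, slope, and boy effect

lengths = [21.0, 22.0, 23.0, 22.0, 23.0, 24.0]  # foot lengths (cm)
is_boy = [0, 0, 0, 1, 1, 1]                     # dummy: 1 = boy, 0 = girl
widths = [B0 + B1 * x + B2 * d for x, d in zip(lengths, is_boy)]


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Design matrix: a column of ones, the foot length, and the 0/1 dummy.
X = [[1.0, x, float(d)] for x, d in zip(lengths, is_boy)]

# Normal equations (X'X) b = X'y for the least squares estimates.
xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
xty = [sum(row[i] * y for row, y in zip(X, widths)) for i in range(3)]
b0, b1, b2 = solve(xtx, xty)

print(f"intercept = {b0:.3f}, slope = {b1:.3f}, boy effect = {b2:.3f}")
```

The third coefficient is the vertical gap between the two parallel lines, which is the quantity the one-sided test addresses. In practice a statistics package would be used in place of the hand-rolled solver.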

Table 2. Analysis of variance and parameter estimates for the analysis of covariance model.

Source | SS | df | MS | F-stat | p-value |
---|---|---|---|---|---|
Model | 4.535 | 2 | 2.267 | 15.31 | <0.0001 |
Error | 5.333 | 36 | 0.148 | | |
Total | 9.868 | 38 | | | |

Parameter | estimate | std err | t-stat | p-value |
---|---|---|---|---|
Intercept | 3.851 | 1.113 | 3.46 | 0.0014 |
Foot Length | 0.221 | 0.0496 | 4.45 | <0.0001 |
Boy | 0.233 | 0.129 | 1.80 | 0.0806 |

The regression results are shown in Table 2. The $R^2$ for
the model is 4.535/9.868 = 0.459, so that 45.9% of the variation in foot width is explained
by the linear relationship with foot length and gender. The foot length is a
highly significant predictor of foot width, but the gender variable is of
borderline significance. Remembering that we had a one-sided test, we divide
the two-sided *p*-value for the gender variable by two, which gives 0.0806/2 = 0.0403, so the null hypothesis is rejected at the 0.05 level.

Figure 3. Widths of kids’ feet, plotted against length, for boys and girls, with least squares regression function estimates superimposed.
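The two fitted lines can be evaluated from the reported coefficient estimates. The sketch below assumes the estimates 3.851 (intercept), 0.221 (foot length), and 0.233 (boy) from the regression table; the function name `predicted_width` and the example length of 22.5 centimeters are illustrative choices, not values from the data set.

```python
# Coefficient estimates copied from the regression table (widths and lengths in cm).
INTERCEPT = 3.851
SLOPE_LENGTH = 0.221
BOY_EFFECT = 0.233


def predicted_width(length_cm: float, is_boy: bool) -> float:
    """Fitted mean foot width (cm) for a child with the given foot length."""
    return INTERCEPT + SLOPE_LENGTH * length_cm + (BOY_EFFECT if is_boy else 0.0)


# 22.5 cm is an assumed illustrative length, not a value taken from the data set.
example_length = 22.5
girl_width = predicted_width(example_length, is_boy=False)
boy_width = predicted_width(example_length, is_boy=True)
gap = boy_width - girl_width

print(f"girl: {girl_width:.2f} cm, boy: {boy_width:.2f} cm, gap: {gap:.3f} cm")
```

Because the lines are parallel, the gap between the boys' and girls' fitted widths is the same 0.233 centimeters at every foot length.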

The estimate of the model variance is 0.148; using this we can estimate how much foot widths for boys or girls vary. We expect about 95% of foot widths to be within two model standard deviations of the mean width for a given foot length and gender; with a model standard deviation of about 0.385 centimeters, this range spans about 1.54 centimeters in total. Finally, a normal probability plot shows no deviations from the assumed error distribution, and other residual plots similarly support the model assumptions.
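The arithmetic behind these summary figures is short enough to check by hand or in a few lines. The sketch below uses only values reported in the regression table; the variable names are illustrative, and results may differ from the quoted figures in the last digit because the table entries are rounded.

```python
import math

# Values copied from the regression table.
ss_model, ss_total = 4.535, 9.868
ms_error = 0.148           # estimate of the model variance
two_sided_p_boy = 0.0806   # two-sided p-value for the Boy coefficient

# Proportion of variation in foot width explained by the model.
r_squared = ss_model / ss_total

# Model standard deviation and the full width of a +/- 2 SD band.
model_sd = math.sqrt(ms_error)
range_95 = 4 * model_sd

# Halving is appropriate here because the estimate (0.233) is positive,
# which is the direction specified by the one-sided alternative.
one_sided_p_boy = two_sided_p_boy / 2

print(f"R^2 = {r_squared:.3f}")
print(f"model SD = {model_sd:.3f} cm, 95% range = {range_95:.2f} cm")
print(f"one-sided p-value = {one_sided_p_boy:.4f}")
```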

The results surprised me, as I was expecting not to be able to reject the null hypothesis at $\alpha$ = 0.05. The power of the ANCOVA test is large, about 0.9 if = 0.25. Note that the estimated average difference in mean width between boys’ and girls’ feet, for a given length, is about 2.3 millimeters. The difference in actual shoe widths (measured at local shoe stores) can be seen to be almost half a centimeter, for sizes in the appropriate range. The difference in the average of measured foot widths, while perhaps of statistical significance, may not be of practical significance, considering that the difference is well within the estimate of the model standard deviation. The variation of foot widths within gender is more substantial than the variation between genders.

The data set can be used in higher-level data analysis classes to make a point about keeping the purpose of the study in mind when the model is chosen. This is a good example of a case where an observational study is entirely appropriate, and possible confounders are irrelevant.

For my advanced classes, including linear models and consulting, I tell the students the story, including the purpose of the study and the motivation for collecting the data, and give them the data set with no clues about the hypotheses to be tested. Typically, the students do a sophisticated variable selection routine, using all variables including whether the child is right or left handed. They discover, for example, that age is a significant predictor of foot width. They present the “best model” in terms of minimizing some criterion such as AIC. They discuss covariates and confounding, two issues I tend to emphasize as very important in data analysis.

However, this more sophisticated analysis does not answer the purpose of the
study! When selecting a shoe size for a child, the length of the foot is
measured. No one asks how old the child is (except perhaps in the interests of
polite conversation), or whether the child is right or left handed. We should
not build a model using these variables, because the only issue concerns foot
width, foot length, and gender. We don’t care about possible confounding
factors, because we do not wish to make a cause and effect conclusion about
feet. We simply want to know whether shoe manufacturers are justified in their
decision to make boys’ shoes wider than girls’ shoes, for the same length feet.
Even though age is a significant predictor of foot width, it should *not* be
included in the model.

In conclusion, we estimate that the mean foot width for fourth-graders is about 2.3 millimeters larger for boys than for girls of the same foot length, and this difference is of borderline statistical significance. It would be interesting to see if a repeat study would again reject the one-sided null hypothesis at $\alpha$ = 0.05.

Shoe size charts for men and women can be found on the web at www.bravesurf.com/knowledge/shoe_sizing.htm#D.

For example, it is found that a women’s size 9.5 corresponds to a foot length of 25.4 centimeters, with standard width (B) of 8.6 centimeters, while a man’s size 8 is for the same foot length but a standard width (D) of 9.7 centimeters. Determining if this difference in adult shoe widths is reasonable for physiological differences between men and women would be a nice exercise for a statistics class of about 40 students. Perhaps an ANCOVA model would be appropriate for adult feet as well. The exercise of collecting data, formulating hypotheses, and doing the analyses provides a useful synthesis of various topics covered in a statistics classroom.

Mary C. Meyer

Department of Statistics

University of Georgia

Athens, GA

U.S.A.
*mmeyer@stat.uga.edu*
