Journal of Statistics Education Volume 14, Number 3 (2006), jse.amstat.org/v14n3/datasets.kern.html
Copyright © 2006 by John C. Kern II, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.
Key Words:
In this research, we use the data collected from 6000 rolls in combination with a multinomial-Dirichlet model to make Bayesian inference on the configuration probabilities. Our analysis is intended for students of Bayesian inference, usually advanced undergraduate mathematics/statistics majors or first-year statistics graduate students, as well as their instructors. Aside from providing an entertaining multinomial-Dirichlet application, this analysis is of particular pedagogical interest because specification of prior parameters comes so naturally: the point values for each configuration, as defined on the game’s packaging, scorepad, and instructions, are used to directly determine the parameter values of the Dirichlet prior distribution. Before providing the details of this model and prior specification, it is necessary to clarify both the rules of the game and the data collection process.
Any turn begins with the rolling of both pig-shaped dice. The configuration of the pigs in this roll, or any other, must fall under exactly one of the following three categories:
If the initial roll on a turn is positive-scoring, the player may choose to immediately roll the pair of pigs again. Such a choice remains available to the player provided the previous roll in that turn was positive-scoring. In this way, the points a player earns on a turn are the sum of the point values of an unbroken string of positive-scoring rolls. The end of a player’s turn is determined by whichever of the following three events occurs first:
End-Turn Event I: The roll is zero-scoring. In this case the player loses all points accumulated on that turn and must pass the pigs to the next person.
End-Turn Event II: The roll is positive-scoring and the player chooses to pass the pigs to the next person. In this case the player retains all points accumulated on that turn.
End-Turn Event III: The roll finds the pigs in physical contact with each other. In this case the player loses all points accumulated on that turn as well as the points accumulated on all previous turns. The pigs are then passed to the next person.
It is worthwhile to note that this game does allow a player to incorporate strategy, but only through End-Turn Event II. For example, a player may choose to roll the pigs only once per turn. This is the most conservative strategy in the sense that points earned on a turn are never at risk of being lost to a zero-scoring roll. Conversely, the extreme risk strategy views each turn as an all-or-nothing opportunity. Those adopting this strategy will continue to roll until either a non-positive-scoring roll is obtained or at least 100 points are accumulated. Each turn taken under this strategy will end with the player earning either zero points or victory. Analysis of both extreme strategies is given in Section 2.3. Before presenting the data from these 6000 rolls (and the collection method), we now detail the configuration-to-point-value mapping.
Position | Name | Description |
---|---|---|
1 | Dot Up | Pig lies on its left side |
2 | Dot Down | Pig lies on its right side |
3 | Trotter | Pig stands on all fours |
4 | Razorback | Pig lies on its spine, with feet skyward |
5 | Snouter | Pig balances on front two legs and snout |
6 | Leaning Jowler | Pig balances on front left-leg, snout, and left-ear |
[Images of the six pig positions: 1 (Dot Up), 2 (Dot Down), 3 (Trotter), 4 (Razorback), 5 (Snouter), 6 (Leaning Jowler)]
Although we have just identified the positions assumed by the roll of a single pig, we remind the reader that a player will always roll both pigs. The points awarded to (or taken from) a player are therefore based on the combined positions of the rolled pigs. If, for example, one pig lands Dot Up, and the other lands Trotter, then the player earns 5 points. Shown in Table 3 are the point values awarded for all of the thirty-six possible position combinations, as specified by the instructions. This table assumes the pigs have (arbitrarily) been assigned labels of “Pig 1” and “Pig 2,” and that once rolled, the pigs are not touching each other. Notice that higher point values are given darker background shading.
Pig 2 Position (rows) / Pig 1 Position (columns) | 1 (Dot Up) | 2 (Dot Down) | 3 (Trotter) | 4 (Razorback) | 5 (Snouter) | 6 (Leaning Jowler) |
---|---|---|---|---|---|---|
1 (Dot Up) | 1 | 0 | 5 | 5 | 10 | 15 |
2 (Dot Down) | 0 | 1 | 5 | 5 | 10 | 15 |
3 (Trotter) | 5 | 5 | 20 | 10 | 15 | 20 |
4 (Razorback) | 5 | 5 | 10 | 20 | 15 | 20 |
5 (Snouter) | 10 | 10 | 15 | 15 | 40 | 25 |
6 (Leaning Jowler) | 15 | 15 | 20 | 20 | 25 | 60 |
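The configuration-to-point-value mapping of Table 3 can be encoded as a simple lookup. The following is a minimal sketch; the names `TABLE3`, `roll_score`, `ZERO_SCORING`, and `POSITIVE_SCORES` are illustrative and do not come from the original article.

```python
# Point values from Table 3, indexed as TABLE3[pig2 - 1][pig1 - 1],
# with positions numbered 1-6 as in Table 1. Applies to non-touching rolls only.
TABLE3 = [
    [1, 0, 5, 5, 10, 15],
    [0, 1, 5, 5, 10, 15],
    [5, 5, 20, 10, 15, 20],
    [5, 5, 10, 20, 15, 20],
    [10, 10, 15, 15, 40, 25],
    [15, 15, 20, 20, 25, 60],
]


def roll_score(pig1: int, pig2: int) -> int:
    """Return the Table 3 score for a roll in which the pigs do not touch."""
    return TABLE3[pig2 - 1][pig1 - 1]


# Position combinations (pig1, pig2) that score zero (End-Turn Event I).
ZERO_SCORING = [
    (p1, p2)
    for p1 in range(1, 7)
    for p2 in range(1, 7)
    if roll_score(p1, p2) == 0
]

# Distinct positive point values that a roll can yield.
POSITIVE_SCORES = sorted({v for row in TABLE3 for v in row if v > 0})
```

For example, `roll_score(1, 3)` looks up the Dot Up and Trotter combination mentioned in the text.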
From Table 3 we see that only two of the thirty-six position combinations are zero-scoring; any roll that finds the pigs lying on opposite sides results in End-Turn Event I. Aside from the two configurations (both pigs lying on the same side) worth 1 point, all other configurations are positive-scoring and worth some multiple of 5 points. Note that a positive-scoring roll must yield a point value from the set {1, 5, 10, 15, 20, 25, 40, 60}. We will refer to Table 3 often, especially in Section 2 when determining a prior distribution for a Bayesian multinomial data model. Before discussing this model, we finish this introduction by describing the data and the method by which it was collected.
Due to variability in rolling technique across people, we decided to standardize the rolling technique by using a trap-door style rolling apparatus. This apparatus was constructed in such a way as to impart realistic rolling movement to the pigs. It consisted of nothing more than a four-inch square sheet of sturdy cardboard, well-creased to divide its area into two equal-sized rectangles. This sheet was then placed on a level, eight-inch tall wooden platform, such that the crease was parallel to an edge of the platform. Rolling the pigs was accomplished by placing the pigs on one half of the crease-divided cardboard (in the trotting position, 0.25 inches apart, facing away from the crease and toward the parallel platform edge), and using the other half of the cardboard as a handle to push-slide the cardboard toward the parallel platform edge - making sure to always keep the crease and platform edge parallel. When the cardboard is moved far enough for the crease to overlap the edge of the platform, the pushing-sliding stops, and the weight of the pigs causes their half of the creased cardboard to drop in trap-door fashion. Even with no pigs on the cardboard, the crease was such that the cardboard weight itself would cause the drop. The other half of the cardboard is anchored securely under the fingers of the roller; hence only the pigs tumble to the table below. In this way, the pigs are not simply dropped to the table. Rather, they are dropped with the forward momentum gained from the pushing-sliding of the creased cardboard. Rolls that saw either pig touch a platform support were ignored.
Variation in the rolling technique is introduced from a variety of sources. A source of variation we intentionally impose is that of platform height: One person rolled the pair of pigs 3000 times from the aforementioned eight-inch tall platform, while the other pair were rolled 3000 times from a similar five-inch tall platform. Natural sources of variation not imposed by the author include:
This analysis treats these sources of variation - imposed or natural - as negligible.
Shown in Table 4 is the number of times each of the thirty-six possible position combinations was observed in the 6000 rolls. When comparing these data with the scores from Table 3, we see that combinations that occur more often are generally associated with lower point values. Note that the frequencies in this table sum to 5977; exactly 23 of the 6000 rolls resulted in the two pigs touching each other (corresponding to End-Turn Event III). If on a given roll we treat the positions of the black and pink pigs as discrete random variables, then Table 4 can be viewed as their empirical joint distribution by dividing each cell by 6000.
Black Pig Position (rows) / Pink Pig Position (columns) | 1 (Dot Up) | 2 (Dot Down) | 3 (Trotter) | 4 (Razorback) | 5 (Snouter) | 6 (Leaning Jowler) |
---|---|---|---|---|---|---|
1 (Dot Up) | 573 | 656 | 139 | 360 | 56 | 12 |
2 (Dot Down) | 623 | 731 | 185 | 449 | 58 | 17 |
3 (Trotter) | 155 | 180 | 45 | 149 | 17 | 5 |
4 (Razorback) | 396 | 473 | 124 | 308 | 45 | 8 |
5 (Snouter) | 54 | 67 | 13 | 47 | 2 | 1 |
6 (Leaning Jowler) | 10 | 10 | 0 | 7 | 1 | 1 |
This data is used in the following section to estimate the probabilities of observing each of the eight possible positive scores, a zero-scoring roll, and a roll in which the pigs are touching each other.
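As a sketch of how the thirty-six cell counts collapse into counts for the ten scoring outcomes, the following combines the point values of Table 3 with the frequencies of Table 4. The names `POINTS`, `COUNTS`, `TOUCHING`, and `score_counts` are illustrative, and the label -1 for a touching roll follows the convention the article uses for End-Turn Event III.

```python
from collections import Counter

# Point values (Table 3) and observed frequencies (Table 4), same row/column order.
POINTS = [
    [1, 0, 5, 5, 10, 15],
    [0, 1, 5, 5, 10, 15],
    [5, 5, 20, 10, 15, 20],
    [5, 5, 10, 20, 15, 20],
    [10, 10, 15, 15, 40, 25],
    [15, 15, 20, 20, 25, 60],
]
COUNTS = [
    [573, 656, 139, 360, 56, 12],
    [623, 731, 185, 449, 58, 17],
    [155, 180, 45, 149, 17, 5],
    [396, 473, 124, 308, 45, 8],
    [54, 67, 13, 47, 2, 1],
    [10, 10, 0, 7, 1, 1],
]
TOUCHING = 23  # rolls in which the pigs touched (End-Turn Event III)


def score_counts() -> dict:
    """Number of rolls observed for each score k, with k = -1 for touching pigs."""
    n = Counter()
    for row_points, row_counts in zip(POINTS, COUNTS):
        for k, count in zip(row_points, row_counts):
            n[k] += count
    n[-1] = TOUCHING
    return dict(sorted(n.items()))


if __name__ == "__main__":
    for k, n_k in score_counts().items():
        print(f"score {k:>3}: {n_k:>4} rolls")
```

Aggregating in this way gives one count per element of the set of possible scores, which is the form the multinomial model in the next section requires.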
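Let $\theta_k$ denote the probability that a roll scores $k$ points, where

$$\sum_{k \in S} \theta_k = 1 \tag{1}$$

and k is restricted to the ten integer set S defined by

$$S = \{-1, 0, 1, 5, 10, 15, 20, 25, 40, 60\}.$$

Thus, for example, $\theta_{15}$ represents the probability of observing a 15-point roll, $\theta_0$ the probability of a 0-point roll (End-Turn Event I), and $\theta_{-1}$ the probability of End-Turn Event III.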
Bayesian analyses distinguish themselves from their classical counterparts by incorporating into parameter inference information supplied by the researcher before data have been observed. This information is presented in the form of a probability distribution - called the prior distribution - on the parameter(s) of interest. In our application, the parameters of interest are the scoring probabilities $\theta = (\theta_{-1}, \theta_0, \theta_1, \ldots, \theta_{60})$ and the data are the scoring outcomes of the 6000 rolls. What makes this analysis so appealing from a Bayesian pedagogical perspective is that through the very scores it assigns to each configuration, the game provides prior information about $\theta$ that can be easily expressed through a probability distribution on $\theta$. Too often, students of Bayesian inference are confronted with examples where prior distributions are presented with no motivation or explanation except a distancing phrase such as “suppose our prior distribution for $\theta$ is ...,” or, “expert opinion suggests the prior distribution for $\theta$ to be ... .” In this example, simply reading the instructions empowers one to give “expert opinion” on the unknown scoring probabilities $\theta$.
The method by which the provided scores are used to make a priori statements about $\theta$ may differ according to individual philosophy; we believe readers will find the approach presented here among the most reasonable and intuitive. Based on three assumptions, we translate the provided scores into point estimates for the scoring probabilities. The first of these assumptions is the least imposing (and most intuitive):
Assumption 1: Positive roll scores and their corresponding roll probabilities are inversely related.
Thus, a 60-point roll is less likely than any other positive-scoring roll (as it has the highest point value); a 1-point roll is more likely than any other positive-scoring roll (as it has the lowest point value); the 5-point roll is more likely than the 10-point roll, but less likely than the 1-point roll, and so on. Our second assumption about $\theta$ tells exactly how much more likely a 1-point roll is, relative to a k-point roll, for k > 0:
Assumption 2: The probability of observing a 1-point roll is k times the probability of observing a k-point roll, for k > 0; that is, $\theta_1 = k\,\theta_k$.
The yet unmentioned zero-scoring and End-Turn Event III rolls are accounted for in our final assumption:
Assumption 3: 1-point rolls and zero-scoring rolls have the same probability ($\theta_0 = \theta_1$), while a 1-point roll is c times as likely (c > 0) as an End-Turn Event III roll, labeled k = -1 ($\theta_1 = c\,\theta_{-1}$).
That $\theta_0 = \theta_1$ follows naturally from our willingness to assume the dot-up position has the same, or nearly the same, probability as the dot-down position. If one assumes the dot-up and dot-down probabilities differ greatly, then take $\theta_0 = w\,\theta_1$ for some w > 0. As for the relationship between $\theta_1$ and $\theta_{-1}$, we are inclined to set c equal to a number near 60 in an effort to equate the chances that the most detrimental event (score of k = -1) and the most beneficial event (score of k = 60) occur.
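From these assumptions, we formally recognize the relationship between $\theta_1$ and each of the nine remaining probabilities as follows:

$$\theta_k = \frac{\theta_1}{k}, \quad k \in \{5, 10, 15, 20, 25, 40, 60\} \tag{2}$$

$$\theta_0 = \theta_1 \tag{3}$$

$$\theta_{-1} = \frac{\theta_1}{c} \tag{4}$$

Using these relationships, we express the sum in (1) in terms of only $\theta_1$ to yield

$$\theta_1 \left( \frac{1}{c} + 1 + 1 + \frac{1}{5} + \frac{1}{10} + \frac{1}{15} + \frac{1}{20} + \frac{1}{25} + \frac{1}{40} + \frac{1}{60} \right) = \theta_1 \left( \frac{1499}{600} + \frac{1}{c} \right) = 1 \tag{5}$$

Solving (5) for $\theta_1$ gives our a priori point estimate of $\theta_1$ (and hence of $\theta_0$):

$$\theta_1 = \theta_0 = \frac{600c}{1499c + 600} \tag{6}$$

Using this value of $\theta_1$ with (2) and (4) gives the following prior point estimates for the remaining eight probabilities:

$$\theta_k = \frac{600c}{k\,(1499c + 600)}, \quad k \in \{5, 10, 15, 20, 25, 40, 60\}; \qquad \theta_{-1} = \frac{600}{1499c + 600} \tag{7}$$

We remind the reader that these point estimates come directly from the scores supplied by the game as well as our assumptions (2), (3), and (4) about the relationships among the scoring probabilities $\theta$. When specifying the prior distribution on $\theta$, we will choose parameter values that set the marginal expectation of each $\theta_k$ equal to the corresponding point estimate in (6) or (7). We choose the functional form of the prior distribution (i.e. the family of prior distributions) to be Dirichlet; the motivation and density for this distribution will be presented upon explicit presentation of the multinomial data model.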
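The prior point estimates implied by Assumptions 1 to 3 can be checked numerically. The sketch below uses exact rational arithmetic: each score k > 0 receives weight 1/k, the zero-scoring roll receives weight 1, the touching roll receives weight 1/c, and the weights are normalized to sum to one. The function name `prior_point_estimates` is illustrative, and the value c = 50 used in the example is the one the article adopts later when specifying the prior.

```python
from fractions import Fraction

POSITIVE_SCORES = [1, 5, 10, 15, 20, 25, 40, 60]


def prior_point_estimates(c: int) -> dict:
    """Prior point estimates of the ten scoring probabilities for a given c > 0."""
    # Relative weights: theta_k is proportional to 1/k for k > 0 (Assumption 2),
    # theta_0 equals theta_1, and theta_{-1} equals theta_1 / c (Assumption 3).
    weights = {k: Fraction(1, k) for k in POSITIVE_SCORES}
    weights[0] = Fraction(1)
    weights[-1] = Fraction(1, c)
    total = sum(weights.values())
    return {k: w / total for k, w in sorted(weights.items())}


if __name__ == "__main__":
    for k, p in prior_point_estimates(50).items():
        print(f"k = {k:>3}: {p} (about {float(p):.4f})")
```

Working with `Fraction` keeps the normalization exact, so the estimates sum to one without rounding error.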
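Let $n_k$ denote the number of k-point rolls observed among the 6000 rolls. Under the multinomial data model, the probability of observing these counts, for a vector $(n_{-1}, n_0, n_1, \ldots, n_{60})$ of nonnegative integers satisfying $\sum_{k \in S} n_k = 6000$, is

$$\frac{6000!}{\prod_{k \in S} n_k!} \prod_{k \in S} \theta_k^{n_k} \tag{8}$$

When we view the right hand side of (8) as a function of $\theta$ and treat $(n_{-1}, \ldots, n_{60})$ as known, we obtain the multinomial likelihood function for $\theta$, denoted by $L(\theta)$:

$$L(\theta) \propto \prod_{k \in S} \theta_k^{n_k} \tag{9}$$

The proportionality symbol is used to remind the reader that any function of $\theta$ proportional to the right hand side of (9) can be labeled “the” likelihood function, and will yield the same inference on $\theta$.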
Standard classical inference finds the values $\hat{\theta}_k$ that maximize the likelihood function. These values are given by

$$\hat{\theta}_k = \frac{n_k}{6000}, \quad k \in S,$$

as shown in various texts (see Berry and Lindgren 1996, or Lange 1999). Thus, the maximum likelihood estimates for the $\theta_k$’s are simply the respective proportions of k-point rolls observed among the 6000. These proportions, as obtained from the data in Table 4, are:

$$\hat{\theta}_{-1} = \tfrac{23}{6000},\ \hat{\theta}_{0} = \tfrac{1279}{6000},\ \hat{\theta}_{1} = \tfrac{1304}{6000},\ \hat{\theta}_{5} = \tfrac{2337}{6000},\ \hat{\theta}_{10} = \tfrac{508}{6000},\ \hat{\theta}_{15} = \tfrac{171}{6000},\ \hat{\theta}_{20} = \tfrac{373}{6000},\ \hat{\theta}_{25} = \tfrac{2}{6000},\ \hat{\theta}_{40} = \tfrac{2}{6000},\ \hat{\theta}_{60} = \tfrac{1}{6000} \tag{10}$$

Note that these estimates are based completely on the likelihood function for $\theta$. Bayesian inference on $\theta$, however, is based on the conditional distribution of $\theta$ given the data. This distribution - called the posterior distribution and denoted here by $p(\theta \mid \mathbf{n})$ - combines information in the data (through $L(\theta)$) with the prior distribution for $\theta$. We denote the prior distribution by $\pi(\theta)$. Once the family and parameter values of the prior distribution have been chosen, we use Bayes’s Theorem to give the posterior distribution $p(\theta \mid \mathbf{n})$ as a function of the likelihood $L(\theta)$ and prior $\pi(\theta)$:

$$p(\theta \mid \mathbf{n}) \propto L(\theta)\,\pi(\theta) \tag{11}$$

Thus the conditional distribution of $\theta$ given the data is proportional to the product of the likelihood function and prior distribution for $\theta$. Estimates analogous to (10) can be obtained from the posterior distribution by finding the marginal expected value of each $\theta_k$, and will reflect a balance between the maximum likelihood estimates and the prior point estimates for each $\theta_k$. The following section details our choice of $\pi(\theta)$.

2.2 Dirichlet Prior

We choose $\pi(\theta)$ to be Dirichlet with parameter vector $\mathbf{a} = (a_{-1}, a_0, \ldots, a_{60})$. The corresponding density is

$$\pi(\theta) = \frac{\Gamma(A)}{\prod_{k \in S} \Gamma(a_k)} \prod_{k \in S} \theta_k^{a_k - 1} \tag{12}$$

where $A = \sum_{k \in S} a_k$. Recognizing the marginal expectation and variance of any $\theta_k$ as

$$E[\theta_k] = \frac{a_k}{A}, \qquad \mathrm{Var}[\theta_k] = \frac{a_k (A - a_k)}{A^2 (A + 1)} \tag{13}$$

makes the task of specifying $\mathbf{a}$ straightforward. Specifically, we set these marginal expected values equal to their corresponding prior point estimates in (6) or (7) by choosing c = 50 - thereby assuming the a priori probability of k = -1 to fall between that of k = 40 and k = 60 - and letting

$$a_k = \frac{600m}{k} \ \text{ for } k \in \{1, 5, 10, 15, 20, 25, 40, 60\}, \qquad a_0 = 600m, \qquad a_{-1} = \frac{600m}{c} = 12m \tag{14}$$

for some m > 0. Note that the choice of m > 0 does not influence the marginal expectation of any $\theta_k$. The choice of m does, however, influence the marginal variance of every $\theta_k$. In particular, because the marginal variance of $\theta_k$ is roughly proportional to 1/m, we can express greater prior uncertainty in a $\theta_k$ with smaller values of m, and greater prior certainty in a $\theta_k$ with larger values of m.

It is now clear that one of the factors motivating our Dirichlet prior choice is our ability to use the available prior point estimates for $\theta$ to easily determine the values of $\mathbf{a}$, up to a strength-of-prior-belief parameter m. Inference on $\theta$ in the following sections will be a function of this (now) lone prior parameter m. We will also refer to m as a hyperparameter, as is common for parameters of a prior distribution.

That the Dirichlet prior for $\theta$ is conjugate for the multinomial likelihood is a second reason for its use. A prior density $\pi(\theta)$ is conjugate for a likelihood if the resulting posterior density is of the same family as $\pi(\theta)$. Substituting into (11) our likelihood from (9) and our prior density from (12) gives the following posterior distribution on $\theta$:

$$p(\theta \mid \mathbf{n}) \propto \prod_{k \in S} \theta_k^{n_k + a_k - 1} \tag{15}$$

Comparing this posterior density with the prior density in (12), we see $p(\theta \mid \mathbf{n})$ is indeed Dirichlet, with parameter vector $\mathbf{b} = (b_{-1}, b_0, \ldots, b_{60})$, where $b_k = n_k + a_k$.

While a Dirichlet prior on $\theta$ does afford an easily obtainable (and manageable) posterior distribution for $\theta$, it necessarily reflects a somewhat limited range of prior beliefs (O’Hagan, 1994). For example, the correlation between a particular $(\theta_j, \theta_k)$ pair, as given by our Dirichlet prior, is fixed for every m > 0. Other prior distributions on $\theta$, such as the multivariate normal family, can eliminate such a limitation at the cost of introducing additional hyperparameters. Ultimately, limitations associated with use of the Dirichlet prior are deemed here as insignificant in comparison with the ease-of-implementation advantage its use carries. Having recognized the Dirichlet posterior distribution for $\theta$ in (15), we are now ready to make Bayesian inference on these ten scoring probabilities.

2.3 Posterior Analysis

The marginal posterior expectation $E[\theta_k \mid \mathbf{n}]$ of any $\theta_k$ is found easily from the Dirichlet posterior in (15):

$$E[\theta_k \mid \mathbf{n}] = \frac{b_k}{\sum_{j \in S} b_j} = \frac{n_k + a_k}{6000 + A} \tag{16}$$

Note from (14) that each $a_k$ value (and hence A) is proportional to the hyperparameter m. Thus, as this strength-of-prior-belief parameter m goes to zero, the posterior point estimate of $\theta_k$ in (16) converges to $n_k / 6000$. As $m \to \infty$, the posterior is dominated by the prior and $E[\theta_k \mid \mathbf{n}]$ converges to $a_k / A$.
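The conjugate update and its two limiting cases can be illustrated with a short sketch. It makes several assumptions that should be read as such: the prior parameters are taken to be proportional to m times the prior point estimates with c = 50, and the proportionality constant 600 (the `scale` argument) is an assumed value, chosen because it reproduces the 74.5% posterior probability of a positive-scoring roll that the article reports for m = 1. The counts are the score totals obtained by aggregating Table 4. All names are illustrative.

```python
# Observed number of rolls for each score k (k = -1 denotes touching pigs),
# obtained by aggregating the cell counts of Table 4.
COUNTS = {-1: 23, 0: 1279, 1: 1304, 5: 2337, 10: 508,
          15: 171, 20: 373, 25: 2, 40: 2, 60: 1}


def prior_params(m: float, c: float = 50.0, scale: float = 600.0) -> dict:
    """Dirichlet prior parameters a_k; `scale` is an assumed constant."""
    a = {k: scale * m / k for k in COUNTS if k > 0}
    a[0] = scale * m        # zero-scoring roll: same prior weight as a 1-point roll
    a[-1] = scale * m / c   # touching roll: 1/c of the 1-point weight
    return a


def posterior_mean(m: float) -> dict:
    """Posterior expectation (n_k + a_k) / (N + A) of each scoring probability."""
    a = prior_params(m)
    total = sum(COUNTS.values()) + sum(a.values())
    return {k: (COUNTS[k] + a[k]) / total for k in sorted(COUNTS)}


if __name__ == "__main__":
    for m in (1e-6, 1.0, 15.0, 1e6):
        means = posterior_mean(m)
        positive = sum(p for k, p in means.items() if k > 0)
        print(f"m = {m:g}: P(positive-scoring roll) is about {positive:.4f}")
```

Very small values of m return estimates close to the observed proportions, while very large values return estimates close to the prior point estimates, which is the behavior described above.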
Influence of m on Posterior Point Estimates
Recognizing that, for fixed m,

$$\sum_{k \in S} E[\theta_k \mid \mathbf{n}] = 1$$

allows us to partition the unit interval into ten segments whose lengths represent the posterior expectations of the $\theta_k$. The curves shown in Figure 1 partition the unit interval - presented along the vertical axis - into ten segments so that the vertical space between two curves represents the posterior expected value of a $\theta_k$ at fixed m. As $m \to 0$, the vertical separations between the curves correspond directly to the maximum likelihood estimates in (10), and represent posterior estimates that ignore prior information. As $m \to \infty$, the vertical separations between curves correspond directly to the prior point estimates in (6) and (7), and represent posterior estimates that ignore the data collected from the 6000 rolls. For values of m greater than 15, the curves have practically converged to horizontal lines whose separations are equal to the prior point estimates.
Figure 1: Partition of the (0, 1) vertical axis based on the posterior expectation of $\theta$. The partition is a function of the hyperparameter m.
Comparing the curve separations where m is near zero with those as $m \to \infty$ reveals great discrepancies between the maximum likelihood (or “empirical”) estimates and the prior point estimates:
Figure 2 shows a rescaled (i.e. magnified) version of Figure 1; only the curves presented in the [0.9, 1.0] subset of the Figure 1 unit interval are shown in Figure 2. From this magnified Figure we notice the following:
Figure 2: Partition of (0.9,1.0) vertical axis as magnified from Figure 1. The partition is a function of the hyperparameter m.
We conclude from these two figures that the scores assigned by the game - in combination with our assumptions (2), (3), and (4) - do not reflect well the relative frequencies we observed in 6000 rolls.
Posterior Predictive Distribution and Extreme Strategies
Let $\tilde{y}$ be the yet to be recorded score of the next roll of the pigs. We use the posterior distribution $p(\theta \mid \mathbf{n})$ to sample from the distribution of $\tilde{y}$ - called the posterior predictive distribution - as follows:
Repeating steps 1 and 2 will yield a distribution of predicted scores that incorporates uncertainty in $\theta$ as given by the posterior distribution - each prediction we simulate comes from a multinomial distribution with a different $\theta$, as sampled from the Dirichlet posterior $p(\theta \mid \mathbf{n})$. We must also point out that such a simulation requires specification of the hyperparameter m. In the following analysis we use m = 1, as it reflects our belief that data from the 6000 rolls are more influential in estimating $\theta$ than the accompanying scores.
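A minimal sketch of posterior predictive sampling follows. It assumes that the two steps referred to above are (1) draw a probability vector from the Dirichlet posterior and (2) draw a single score from the multinomial distribution with that probability vector. The Dirichlet draw uses the standard construction of normalizing independent gamma variates. The prior scaling constant 600 and c = 50 are assumptions consistent with the 74.5% figure reported in the article, and all function names are illustrative.

```python
import random

SCORES = [-1, 0, 1, 5, 10, 15, 20, 25, 40, 60]
# Observed counts per score, aggregated from Table 4 (23 touching rolls).
COUNTS = [23, 1279, 1304, 2337, 508, 171, 373, 2, 2, 1]


def posterior_params(m: float = 1.0, c: float = 50.0, scale: float = 600.0) -> list:
    """Dirichlet posterior parameters b_k = n_k + a_k (scale is an assumed constant)."""
    prior = [scale * m / c, scale * m] + [scale * m / k for k in SCORES[2:]]
    return [n + a for n, a in zip(COUNTS, prior)]


def sample_dirichlet(b: list, rng: random.Random) -> list:
    """Draw one probability vector from Dirichlet(b) via normalized gamma variates."""
    gammas = [rng.gammavariate(b_k, 1.0) for b_k in b]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_next_score(b: list, rng: random.Random) -> int:
    """One posterior predictive draw of the next roll's score."""
    theta = sample_dirichlet(b, rng)                    # step 1: sample theta
    return rng.choices(SCORES, weights=theta, k=1)[0]   # step 2: sample a score


if __name__ == "__main__":
    rng = random.Random(2006)
    b = posterior_params(m=1.0)
    draws = [sample_next_score(b, rng) for _ in range(10_000)]
    print("share of positive-scoring draws:", sum(d > 0 for d in draws) / len(draws))
```

Because a fresh probability vector is drawn for every predicted score, the simulated scores reflect posterior uncertainty in the scoring probabilities rather than a single plug-in estimate.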
We now use $10^5$ simulated values of $\tilde{y}$ to investigate the effectiveness of the two extreme strategies mentioned at the end of Section 1.1:
Extreme-Conservative: Player rolls the pigs only once per turn.
Extreme-Risk: Player continues rolling until either 100 points are accumulated, or a non-positive score is obtained.
Shown in Figure 3 are histograms representing the distribution of the number of turns necessary to obtain 100 points under the extreme-conservative strategy.
Figure 3 (left) | Figure 3 (right) |
Figure 3: Empirical distribution (left) and posterior predictive distribution (right) of the number of turns necessary to reach 100 points, assuming the conservative, one-roll-per-turn strategy.
The histogram on the left is based only on the 6000 rolls; the histogram on the right is based on $10^5$ posterior predictive realizations of $\tilde{y}$. Five-number summaries for these distributions are given in Table 5.
Distribution | Min | Q1 | Q2 | Q3 | Max |
---|---|---|---|---|---|
Empirical | 10 | 18.75 | 22 | 26.25 | 61 |
Posterior Predictive | 5 | 20 | 24 | 29 | 103 |
Notice in this table the impact of our prior distribution. By choosing m = 1, we allow the prior scores to have a slight influence on our posterior for $\theta$. As a result, the posterior expectations of $\theta_0$ and $\theta_{-1}$ will be higher, on average, than their corresponding maximum likelihood estimates. This greater probability of a non-positive scoring - or penalty - roll will require more extreme-conservative turns to attain 100 points; hence the quartiles and maximum in Table 5 are greater for the distribution based on posterior predictive realizations.
The argument that our posterior with m = 1 estimates the probability of a penalty roll to be greater than our empirical estimate also implies that extreme-risk turns should yield zero points more often if based on posterior predictive realizations. Figure 4 shows the distribution of the number of points obtained before the first penalty roll as observed under the extreme-risk strategy.
Figure 4 (left) | Figure 4 (right) |
Figure 4: Empirical distribution (left) and posterior predictive distribution (right) of the number of points accumulated on a single turn before the first penalty roll is observed.
The histogram on the left is based only on the 6000 rolls; the histogram on the right is based on $10^5$ posterior predictive realizations of $\tilde{y}$. Notice from Table 6 that, indeed, fewer points are accumulated before the first penalty roll when using posterior predictive realizations. Furthermore, the empirical proportion of turns resulting in 100 or more points before the first penalty roll is 0.022, while our posterior predictive realizations estimate this proportion to be 0.011.
Distribution | Min | Q1 | Q2 | Q3 | Max |
---|---|---|---|---|---|
Empirical | 0 | 1.25 | 12 | 31 | 215 |
Posterior Predictive | 0 | 0 | 10 | 25 | 199 |
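Both extreme strategies are straightforward to simulate. The sketch below draws roll scores independently from the observed proportions (an assumption; the article's empirical summaries may have been computed differently, for example from the recorded sequence of rolls), so its output should be read as a rough check rather than a reproduction of Tables 5 and 6. A touching roll is taken to reset the player's total to zero, as End-Turn Event III specifies. All names are illustrative.

```python
import random
import statistics

SCORES = [-1, 0, 1, 5, 10, 15, 20, 25, 40, 60]
# Observed counts per score, aggregated from Table 4 (23 touching rolls).
COUNTS = [23, 1279, 1304, 2337, 508, 171, 373, 2, 2, 1]


def roll(rng: random.Random) -> int:
    """Draw one roll score from the observed proportions."""
    return rng.choices(SCORES, weights=COUNTS, k=1)[0]


def conservative_turns_to_100(rng: random.Random) -> int:
    """Extreme-conservative: one roll per turn; count turns needed to reach 100."""
    total, turns = 0, 0
    while total < 100:
        turns += 1
        score = roll(rng)
        if score == -1:
            total = 0        # End-Turn Event III wipes out all points
        else:
            total += score   # a zero-scoring roll simply adds nothing
    return turns


def points_before_first_penalty(rng: random.Random) -> int:
    """Points accumulated on one turn before the first non-positive roll."""
    points = 0
    while True:
        score = roll(rng)
        if score <= 0:
            return points
        points += score


rng = random.Random(2006)
turns = [conservative_turns_to_100(rng) for _ in range(10_000)]
points = [points_before_first_penalty(rng) for _ in range(10_000)]
median_turns = statistics.median(turns)
share_100_plus = sum(p >= 100 for p in points) / len(points)
print("median turns (conservative):", median_turns)
print("share of turns reaching 100+ before a penalty:", share_100_plus)
```

Replacing `roll` with a posterior predictive sampler would give the corresponding posterior predictive summaries.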
The Dirichlet posterior for the ten scoring probabilities found in (15) combines information from 6000 rolls with the provided scores, and reveals that these scores do not reflect well what happens in practice. For example, there are instances where a higher scoring roll is estimated to have a substantially higher probability than a roll that scores lower: a 5-point roll is more likely than a 1-point roll, and a 20 point roll is more likely than a 15-point roll. This does not agree with the basic intuition that a roll’s score and corresponding probability should be inversely related. In this same spirit, the 60-point roll is so rare that, relative to the other roll scores, it is quite undervalued. This same argument can be made for the 40- and 25-point rolls.
Our Dirichlet posterior estimates the chance of a positive scoring roll to be 74.5%. Thus the probability that your next roll is a penalty roll is approximately 1/4. Using our Dirichlet posterior to simulate posterior predictive realizations of the next, unobserved roll score allows us to investigate the effectiveness of various strategies. The extreme-conservative strategy will yield 100 points every 24 turns, on average, while those adopting the extreme-risk strategy are expected to attain 100 points only once per 100 turns.
In the spirit of Shi (2000), we present here a brief, strategy-defining expected value calculation for Pass the Pigs®. Let X be the random variable representing the gain in score on a particular player’s next roll. Note that this gain can be negative. Possible values of X and their corresponding probabilities are given in Table 7, where S represents the player’s score before starting the current turn, and T the score gained, so far, by the player on the current turn.
x | -(S + T) | -T | 1 | 5 | 10 | 15 | 20 | 25 | 40 | 60 |
P(X = x) | 23/6000 | 1279/6000 | 1304/6000 | 2337/6000 | 508/6000 | 171/6000 | 373/6000 | 2/6000 | 2/6000 | 1/6000 |
We can now represent the expected gain in score - with just one additional toss of the pigs on the current turn - as the expected value of the random variable X. Using EX to denote this expectation, straightforward calculation gives

$$EX = \frac{28284 - 23S - 1302T}{6000} \tag{17}$$

This expected value suggests a simple strategy: choose to roll again only when the expected gain from the next roll is positive. Since (17) is positive if and only if

$$T < \frac{28284 - 23S}{1302},$$

this strategy dictates rolling again until $T \ge 22$ when starting with zero points (S = 0). Notice that the cutoff for T does not change much with S, as End-Turn Event III is so rare. For example, starting a turn with 79 points (S = 79) dictates rolling again until $T \ge 21$. Players in situations where $S \ge 80$ would, of course, abandon this strategy, as (100 - S) is less than $(28284 - 23S)/1302$; that is, it is possible for both $(S + T) \ge 100$ and EX > 0 to occur simultaneously.
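The expected-gain calculation and the resulting cutoff can be verified with exact arithmetic. The sketch below uses the score counts obtained by aggregating Table 4 (including 373 rolls worth 20 points, so that the probabilities sum to one); the function names `expected_gain` and `hold_threshold` are illustrative.

```python
from fractions import Fraction

# Observed counts per score, aggregated from Table 4 (k = -1: touching pigs).
COUNTS = {-1: 23, 0: 1279, 1: 1304, 5: 2337, 10: 508,
          15: 171, 20: 373, 25: 2, 40: 2, 60: 1}


def expected_gain(S: int, T: int) -> Fraction:
    """Expected change in score from one more roll, given banked S and turn total T."""
    total = sum(COUNTS.values())
    gain = sum(k * n for k, n in COUNTS.items() if k > 0)
    loss = COUNTS[-1] * (S + T) + COUNTS[0] * T
    return Fraction(gain - loss, total)


def hold_threshold(S: int) -> int:
    """Smallest turn total T at which one more roll no longer has positive expected gain."""
    T = 0
    while expected_gain(S, T) > 0:
        T += 1
    return T


if __name__ == "__main__":
    for S in (0, 50, 79):
        print(f"S = {S}: roll again while T < {hold_threshold(S)}")
```

Evaluating `hold_threshold` over a range of S shows how little the cutoff moves, which reflects how rarely the pigs touch.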
Does the starting position of the pigs have an effect on score distribution? The first 1500 rolls from both platforms saw both pigs in a “face-forward” initial position. The second 1500 rolls from the eight-inch platform had one pig in the “face-forward” initial position, and the other in the “face-backward” initial position. The second 1500 rolls from the five-inch platform saw both pigs in the “face-backward” initial position.
As our final suggestion for further analysis, we point out that sampling from the posterior predictive distribution provides an endless stream of “next roll” scores that can be used to ascertain the effectiveness of any strategy, even those conditional upon the scores of other players. This is perhaps the most engaging analysis for students, as the personal strategy of each student can be combined in a program that simulates an entire game. Several game simulations reveal the student with the most successful strategy.
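One possible shape for such a game-simulating program is sketched below. It is a simplified illustration under stated assumptions: roll scores are drawn from the observed proportions rather than from the posterior predictive distribution, each strategy is a function of the player's own banked score S and turn total T only, the first player in the list always moves first, and the first player to bank 100 points wins. The strategy names `hold_at` and `roll_once` are hypothetical.

```python
import random

SCORES = [-1, 0, 1, 5, 10, 15, 20, 25, 40, 60]
# Observed counts per score, aggregated from Table 4 (23 touching rolls).
COUNTS = [23, 1279, 1304, 2337, 508, 171, 373, 2, 2, 1]


def hold_at(threshold: int):
    """Strategy: keep rolling until the turn total T reaches the threshold."""
    return lambda S, T: T < threshold


def roll_once(S: int, T: int) -> bool:
    """Extreme-conservative strategy: never roll a second time."""
    return False


def play_turn(strategy, S: int, rng: random.Random) -> int:
    """Play one turn and return the player's banked score afterwards."""
    T = 0
    while True:
        score = rng.choices(SCORES, weights=COUNTS, k=1)[0]
        if score == -1:
            return 0          # End-Turn Event III: all points lost
        if score == 0:
            return S          # End-Turn Event I: this turn's points lost
        T += score
        if S + T >= 100 or not strategy(S, T):
            return S + T      # victory, or End-Turn Event II (player passes)


def play_game(strategies, rng: random.Random) -> int:
    """Return the index of the first player to bank at least 100 points."""
    banked = [0] * len(strategies)
    while True:
        for i, strategy in enumerate(strategies):
            banked[i] = play_turn(strategy, banked[i], rng)
            if banked[i] >= 100:
                return i


def tally(strategies, n_games: int, seed: int = 0) -> list:
    """Number of wins for each strategy over n_games simulated games."""
    rng = random.Random(seed)
    wins = [0] * len(strategies)
    for _ in range(n_games):
        wins[play_game(strategies, rng)] += 1
    return wins


if __name__ == "__main__":
    print(tally([hold_at(22), roll_once], n_games=2000))
```

Strategies that condition on opponents' scores would need a richer signature than `(S, T)`; passing the full list of banked scores to each strategy is one way to extend the sketch.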
Bowker, A. H. (1948), “A test for symmetry in contingency tables,” Journal of the American Statistical Association, 43(244), 572-574.
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (1995), Bayesian Data Analysis, London; New York: Chapman and Hall.
Lange, K. (1999), Numerical Analysis for Statisticians, New York: Springer-Verlag.
Neller, T., and Presser, C. (2004), “Optimal play of the dice game Pig,” The UMAP Journal, 25(1), 25-47.
O’Hagan, A. (1994), Kendall’s Advanced Theory of Statistics: Bayesian Inference, New York; Toronto: Halsted Press.
Shi, Y. (2000), “The game PIG: Making decisions based on mathematical thinking,” Teaching Mathematics and its Applications, 19(1), 30-34.
John C. Kern II
Department of Mathematics and Computer Science
Duquesne University
Pittsburgh, PA 15282
U.S.A.
kern@mathcs.duq.edu