NAME: U.S. Senate Votes on Clinton Removal
TYPE: Census
SIZE: 100 observations (Senators), 10 variables

DESCRIPTIVE ABSTRACT:

For each U.S. Senator, his or her votes on whether to remove President Clinton on each of the two articles of impeachment (plus a summary variable representing each Senator's number of "guilty" votes) are provided, as well as each Senator's values on several variables that could be predictive of vote (e.g., Senator's degree of conservatism, how well Clinton did in the Senator's state in the 1996 Presidential election).

SOURCES:

Senators' votes on removal were obtained from the _USA Today_ website (http://usatoday.com/news/index/clinton/senvote2.htm).

Senators' degree of conservatism was obtained from ratings issued by the American Conservative Union (http://www.conservative.org/new_ratings/1997/97senate-preview.htm).

Other information needed to create the variables (Senators' party, when first elected, when up for re-election, Clinton's percentages in Senators' states in 1996) can be obtained from numerous political almanacs and general almanacs put out annually by many news organizations. This information is also available on several websites. One that contains most of this information is http://www.vote-smart.org

VARIABLE DESCRIPTIONS:

Columns
 1 -  8   Name of senator
10 - 11   State (postal code)
13        Vote on Article I, Perjury: 0 = Not Guilty, 1 = Guilty
15        Vote on Article II, Obstruction of Justice: 0 = NG, 1 = G
17        Number of votes for guilt
19        Party: 0 = Democrat, 1 = Republican
21 - 23   Senator's degree of ideological conservativism (0-100)
25 - 26   Percent of the vote Clinton received in the 1996 Presidential election in each state
28 - 31   The year each Senator's seat is up and he/she must run for re-election (or retire)
33        First-term senator? 0 = no, 1 = yes

SPECIAL NOTES:

Name of Senator was limited to eight characters, so some names are cut off.
Also, because multiple Senators often have the same or similar last names, nicknames were sometimes created to avoid confusion. For example, there is both a Tim Hutchinson and a Kay Bailey Hutchison; the former is referred to as "timhutch" and the latter as "kaybhut".

Each Senator's degree of ideological conservatism is based on 1997 voting records as judged by the American Conservative Union (see SOURCES above), where 100 is most conservative. For Senators who were first elected in November 1998, I came up with various substitutions to give them an ideology score, to avoid missing data. Contact me for details, if interested.

STORY BEHIND THE DATA:

On February 12, 1999, for only the second time in the nation's history, the U.S. Senate voted on whether to remove a President, based on impeachment articles passed by the U.S. House. Dozens of political talk shows featured analyses of why Senators may have voted the way they did, but such discourse was rarely (if ever) informed by systematic statistical analysis of the votes. This dataset allows for such analysis. Further, the magnitude of this event should ensure that classroom students have some familiarity with it, making the dataset a nice one for illustrating statistical principles.

PEDAGOGICAL NOTES:

These data can be used to illustrate both advanced and introductory types of statistical analyses. In terms of advanced techniques, the main approach would be to use multiple variables to predict Senators' votes on each of the two counts. Given the dichotomous nature of the vote variables, you would run a separate logistic regression for each count: one analysis with the Article I vote as the dependent variable, and one with the Article II vote. Note, however, that a logistic regression for Article II reveals a "perfect fit" when the conservatism score is used as one of the predictors. The "number of guilty votes" is an ordinal variable (0, 1, or 2), which might be used to illustrate logistic regression for an ordinal response.
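As a rough illustration of the logistic-regression approach described above, the sketch below reads records laid out as in VARIABLE DESCRIPTIONS and fits a one-predictor logistic regression (Article I vote on Clinton's 1996 percentage) by Newton-Raphson, using only the Python standard library. Several things here are assumptions rather than part of the dataset: the ten records are invented placeholders (not actual Senators), the names parse_record, fit_logistic, and the field labels are hypothetical, and a single predictor is used only to keep the example short. In a class setting a statistics package would normally do the fitting.

```python
import math

# Column layout from VARIABLE DESCRIPTIONS: (label, first column, last column),
# 1-indexed and inclusive. The labels themselves are hypothetical.
LAYOUT = [
    ("name", 1, 8), ("state", 10, 11), ("art1", 13, 13), ("art2", 15, 15),
    ("n_guilty", 17, 17), ("party", 19, 19), ("conserv", 21, 23),
    ("clinton96", 25, 26), ("reelect", 28, 31), ("first_term", 33, 33),
]
TEXT_FIELDS = {"name", "state"}

# Invented placeholder records in the documented layout (NOT real data).
PLACEHOLDER_LINES = [
    "fake0001 XX 1 1 2 1  92 38 2000 0",
    "fake0002 XX 1 1 2 1 100 41 2002 0",
    "fake0003 XX 0 0 0 0  12 44 2004 1",
    "fake0004 XX 1 0 1 1  68 46 2000 0",
    "fake0005 XX 1 1 2 1  84 48 2002 1",
    "fake0006 XX 0 0 0 0   4 50 2004 0",
    "fake0007 XX 1 0 1 1  60 52 2000 0",
    "fake0008 XX 0 0 0 0   8 54 2002 1",
    "fake0009 XX 0 0 0 0  16 57 2004 0",
    "fake0010 XX 0 0 0 0   0 60 2000 0",
]


def parse_record(line):
    """Slice one fixed-width line into a dict; numeric fields become ints."""
    record = {}
    for label, first, last in LAYOUT:
        raw = line[first - 1:last].strip()
        record[label] = raw if label in TEXT_FIELDS else int(raw)
    return record


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Score vector (g) and information matrix (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system h * step = g by Cramer's rule.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


records = [parse_record(line) for line in PLACEHOLDER_LINES]
x = [r["clinton96"] for r in records]
y = [r["art1"] for r in records]
b0, b1 = fit_logistic(x, y)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

With the placeholder records the slope comes out negative, but that says nothing about the real data. Note also that this loop has no safeguard for a "perfect fit": if a predictor separates the two outcomes completely, finite maximum-likelihood estimates do not exist and the iterations would not settle.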
Another important concept for any type of multiple regression technique is multicollinearity: when two or more predictor variables are highly correlated with each other, the estimates and tests for individual coefficients can become very unstable. In the present dataset, political party and the conservatism rating are highly correlated with each other (r = .906), so it would be very questionable to use both as predictors in the same regression equation.

As noted above, these data can also be used in teaching introductory statistics. Two-way cross-tabulations and the chi-square test can be used for categorical variables, such as party by the vote on Article I, Perjury. Relationships between quantitative and categorical variables can also be illustrated, such as by comparing conservatism in Democrats versus Republicans. This could be done either graphically, by plotting the frequency distribution of the quantitative variable (conservatism) on the same scale separately for each group of the categorical variable (party), or statistically, with an independent-samples t-test.

Finally, broader statistical issues can also be addressed in class discussions, such as the difference between a sample and a population. Some might argue that because the entire population of U.S. Senators was studied, there would be no need for significance tests that use sample statistics to make inferences for the larger population.

SUBMITTED BY:

Alan Reifman
Department of Human Development and Family Studies
College of Human Sciences
Texas Tech University
Lubbock, TX 79409-1162
AReifman@hs.ttu.edu
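ILLUSTRATIVE SKETCH:

Two of the analyses mentioned under PEDAGOGICAL NOTES, the chi-square test for party by the vote on Article I and the correlation check for multicollinearity, could be sketched as follows in standard-library Python. The function names are hypothetical. The 2x2 counts are the publicly reported Article I totals (all 45 Democrats voting Not Guilty; 45 of 55 Republicans voting Guilty), typed in by hand rather than read from the data file, so they should be checked against the data. The party and conservatism vectors are invented placeholders and will not reproduce the r = .906 reported above.

```python
import math
from statistics import mean


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 count table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def pearson_r(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Rows: party (Democrat, Republican); columns: Article I vote (Not Guilty, Guilty).
# Publicly reported totals, entered by hand; verify against the data file.
party_by_article1 = [[45, 0], [10, 45]]
chi2, p_value = chi_square_2x2(party_by_article1)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3g}")

# Invented placeholder values (NOT from the dataset) to show the correlation check.
party = [0, 0, 0, 1, 1, 1]
conservatism = [4, 12, 20, 68, 84, 100]
r = pearson_r(party, conservatism)
print(f"r(party, conservatism) = {r:.3f}")
```

A correlation this close to 1 between two candidate predictors is the warning sign described in the multicollinearity paragraph. Whether the chi-square p-value is meaningful for a census of all 100 Senators is the sample-versus-population question raised at the end of PEDAGOGICAL NOTES.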