NAME: Pricing the C's of Diamond Stones
TYPE: Observational Regression Analysis Data
SIZE: 308 observations, 5 variables

DESCRIPTIVE ABSTRACT:
The objective is to infer a sensible pricing model for diamond stones based on data pertaining to their weight (in carats), their colour (either D, E, F, G, H or I) and clarity (either IF, VVS1, VVS2, VS1 or VS2). Of interest is the relative worth of the different grades of colour and clarity and whether differences in prices can be attributed to the 3 different certification bodies (either GIA, IGI or HRD).

SOURCE:
The data appeared in Singapore's _Business Times_ edition of February 18, 2000.

VARIABLE DESCRIPTIONS:

Dataset 4c.dat
Columns
 1 -  4   Carat - Weight of diamond stones in carat units
 6        Colour - D, E, F, G, H or I
 8 - 11   Clarity - IF, VVS1, VVS2, VS1 or VS2
13 - 15   Certification Body - GIA, IGI or HRD
18 - 21   Price (Singapore $)

Dataset 4c1.dat
Columns
 1 -  4   Carat - Weight of diamond stones in carat units
 6        Indicator for colour D
 8        Indicator for colour E
10        Indicator for colour F
12        Indicator for colour G
14        Indicator for colour H
16        Indicator for clarity IF
18        Indicator for clarity VVS1
20        Indicator for clarity VVS2
22        Indicator for clarity VS1
24        Indicator for certification body GIA
26        Indicator for certification body IGI
28        Indicator for medium stones weighing from 0.5 to less than 1 carat
30        Indicator for large stones weighing 1 carat or more
32 - 35   Interaction variable med*carat
37 - 40   Interaction variable large*carat
42 - 48   Carat squared
50 - 53   Price (Singapore $)
55 - 65   Ln(Price)

Values are aligned and delimited by blanks. There are no missing values.

STORY BEHIND THE DATA:
Assessing the worth of a diamond stone is no easy task in view of the four C's, namely caratage, colour, clarity and cut. Statistics offers an avenue to infer the pricing of these characteristics.
Additional information about these data can be found in the "Datasets and Stories" article "Pricing the C's of Diamond Stones" in the _Journal of Statistics Education_ (Chu 2001).

PEDAGOGICAL NOTES:
Multiple linear regression is employed to construct a pricing model. Of note in this dataset is the presence of nominal, ordinal and quantitative data. Operationally, the challenge is how to code the nominal and ordinal data in order to proceed with the analysis. A dataset containing the suggested codes is made available as 4c1.dat. The exercise demonstrates the plausibility of at least 2 models.

SUBMITTED BY:
Singfat Chu
Faculty of Business Administration
National University of Singapore
10 Kent Ridge Crescent
Singapore 119260
fbachucl@nus.edu.sg
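
The coding mentioned in the pedagogical notes can be sketched in a few lines of Python. This is only an illustration, not the submitter's procedure: it assumes each 4c.dat record can be split on blanks (the file is described as blank-delimited with no missing values), and it assumes that colour I, clarity VS2 and certification body HRD serve as the baseline levels, since they are the ones with no indicator listed for 4c1.dat. The function names `parse_row` and `encode_row` and the two sample records are hypothetical and were made up for the example; the sample values are not taken from the data.

```python
import math

# Levels that receive an indicator column, in the order listed for 4c1.dat.
# Assumption: the omitted level of each factor is the baseline.
COLOURS = ["D", "E", "F", "G", "H"]        # baseline: I
CLARITIES = ["IF", "VVS1", "VVS2", "VS1"]  # baseline: VS2
CERTS = ["GIA", "IGI"]                     # baseline: HRD


def parse_row(line):
    """Split one blank-delimited 4c.dat record into typed fields."""
    carat, colour, clarity, cert, price = line.split()
    return float(carat), colour, clarity, cert, float(price)


def encode_row(carat, colour, clarity, cert, price):
    """Return the 19 values of a 4c1.dat-style record for one stone."""
    medium = int(0.5 <= carat < 1.0)  # medium: 0.5 to less than 1 carat
    large = int(carat >= 1.0)         # large: 1 carat or more
    row = [carat]
    row += [int(colour == c) for c in COLOURS]
    row += [int(clarity == c) for c in CLARITIES]
    row += [int(cert == c) for c in CERTS]
    row += [medium, large, medium * carat, large * carat, carat ** 2]
    row += [price, math.log(price)]
    return row


# Hypothetical records, invented for illustration only (not from the dataset).
sample = ["0.40 E VS1 GIA 2000", "1.00 I VS2 HRD 9000"]
encoded = [encode_row(*parse_row(s)) for s in sample]
for r in encoded:
    print(r)
```

A stone at the baseline level of every factor, such as the second sample record, has zeros in all eleven colour, clarity and certification indicators, so its price is explained by the intercept and the carat terms alone.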