Shai Linn

School of Public Health, Haifa University

and Rambam Medical Center, Haifa, Israel

Journal of Statistics Education Volume 12, Number 3 (2004), ww2.amstat.org/publications/jse/v12n3/linn.html

Copyright © 2004 by Shai Linn, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.

**Key Words:** Bayes' Theorem; Diagnosis; Predictive value

However, we typically do not have the information on this population because it is often unfeasible and unethical to perform both the diagnostic tests and an additional definitive test to determine the true diagnosis according to the gold standard (Sackett and Haynes, 2002). For example, using angiography as a gold standard for diagnosing cardiac ischemia by electrocardiographic changes is “not a very attractive alternative in terms of discomfort, risk and cost” (Sackett, Haynes, Guyatt, and Tugwell 1991, p. 101). Therefore, the PPV and NPV are calculated from the sensitivity and specificity and the prevalence of the disease in the target population, using Bayes’ Theorem. Thus, a presentation in one table (Table 1) for analyses in two populations may be pedagogically misleading. A new approach, using two tables (Table 2 and Table 3) instead of one table (Table 1) and specific notations for each table, is hereby proposed.

the results of applications of the clinical test to the patients (target) population.

Table 1.

| Clinical Test | Gold Standard S+ | Gold Standard S- | Total |
|---|---|---|---|
| T+ | a = True Positive | b = False Positive | a+b |
| T- | c = False Negative | d = True Negative | c+d |
| Total | a+c | b+d | |

Note: The table demonstrates a misleading presentation in that all test characteristics are calculated in one table.

Sensitivity = a/(a+c)

Specificity = d/(b+d)

Positive Predictive Value (PPV) = a/(a+b)

Negative Predictive Value (NPV) = d/(c+d)

This may be misleading to many students for the following reasons:

- A presentation of the definitions in one table fails to demonstrate the conceptual distinction between the “study population”, in which the test characteristics are determined, and the “patient (target) population”, to which the test is applied afterward to obtain the posterior probabilities. Thus, a presentation in a single table erroneously implies a cross-sectional situation, and does not convey the sequence of analyses. Therefore, it is often not clear why one should not calculate the PPV (or NPV) directly from Table 1.
- The need for using Bayes’ Theorem and prevalence to calculate PPV (or NPV) is not obvious from a presentation of calculating sensitivity and PPV in one table. The presentation in Table 1 erroneously implies simultaneous calculations in both axes of the table: the “vertical” disease axis AND the “horizontal” test-results axis. In fact, calculation of the sensitivity and specificity is possible only when the “diseased and non-diseased persons are sampled” (Baron 2001, p. 243), and direct evaluation of the PPV from Table 1 would be “misleading” unless “the proportion of patients with the disease in the study population equals the proportion of patients with the disease in the population in which the test is applied” (Weinstein and Finberg 1980, pp. 86-87). In practice, the analyses are performed in two stages: first, one uses a selected (study) population in which the sensitivity and specificity are calculated; then, Bayes’ Theorem and the prevalence are used, together with the sensitivity and specificity, to calculate PPV or NPV (Hirsch and Riegelman 1996; Greenberg, Daniels, Flanders, Eley and Boring 2001).
- A presentation of the two stages in one table may be difficult for most students to comprehend. It should be emphasized that Bayes’ Theorem enables students to comprehend the interrelationships between the prevalence, sensitivity and specificity, a fundamental feature of clinical epidemiology. When predictive values are calculated from the sensitivity and specificity, their magnitude depends on the prevalence of the disease. Students who do not understand the two stages of analysis cannot appreciate the importance of the prevalence of the disease in calculations based on Bayes’ Theorem.
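The dependence of the predictive value on the prevalence can be made concrete with a short sketch. The Python below is illustrative only: the function name `ppv_from_bayes` and the sensitivity and specificity of 0.90 are assumptions chosen for the example, not values taken from any study.

```python
def ppv_from_bayes(sensitivity, specificity, prevalence):
    """PPV = P(S+|T+), obtained with Bayes' Theorem."""
    true_pos = sensitivity * prevalence                   # P(T+|S+) P(S+)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # P(T+|S-) P(S-)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics, held fixed while the prevalence varies.
SENSITIVITY = 0.90
SPECIFICITY = 0.90

prevalences = [0.001, 0.01, 0.10, 0.50]
ppv_by_prevalence = {
    p: ppv_from_bayes(SENSITIVITY, SPECIFICITY, p) for p in prevalences
}

for p, ppv in ppv_by_prevalence.items():
    print(f"prevalence {p:7.3%} -> PPV {ppv:7.2%}")
```

With the test characteristics held fixed, the PPV rises steadily as the disease becomes more common, which is the point a single table tends to hide.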

Finally, using one table to teach diagnostic test characteristics often makes the definitions of rates unclear (Riffenburgh 1993; Weinstein and Finberg 1980; Hirsch and Riegelman 1996). Does the “true positive rate” refer to the sensitivity (as often defined)? Or does it refer to the predictive value (as often understood by students, physicians or patients)? Analogous considerations apply to true negative rates, false positive rates and false negative rates. Because of this confusion, Hirsch and Riegelman (1996, pp. 11-12) recommended not using these terms at all. We offer a way to overcome these difficulties by presenting the analyses in two tables.

We use S to denote sickness rather than D, which could have served as an abbreviation for diseased, because the letter D is already used for a cell in the tables.

Table 2.

| Clinical Test | Gold Standard S+ | Gold Standard S- |
|---|---|---|
| T+ | a = True Positive | b = False Positive |
| T- | c = False Negative | d = True Negative |
| Total | a+c | b+d |

Note: The table demonstrates a more appropriate presentation for the study (selected) population.

Sensitivity = a/(a+c)

Specificity = d/(b+d)

*fpr* = b/(b+d)

*fnr* = c/(a+c)
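The column-wise definitions above translate directly into code. The following is a minimal sketch, assuming a hypothetical helper named `study_population_measures`; the counts passed to it are those of the skin cancer example given later in the article.

```python
def study_population_measures(a, b, c, d):
    """Measures defined within the columns (disease status) of the study table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    diseased = a + c        # column total for S+
    non_diseased = b + d    # column total for S-
    return {
        "sensitivity": a / diseased,
        "specificity": d / non_diseased,
        "fpr": b / non_diseased,   # equals 1 - specificity
        "fnr": c / diseased,       # equals 1 - sensitivity
    }


measures = study_population_measures(a=63, b=6, c=10, d=112)
for name, value in measures.items():
    print(f"{name}: {value:.1%}")
```

Every denominator is a column total, so none of these measures depends on how many diseased and non-diseased persons the investigator chose to sample.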

A second table with uppercase notations (Table 3) is used to explain the predictive values in the target population, in which the test would be applied for screening or clinical diagnosis. Sampling of this population is done horizontally, i.e., by test result: those with positive and those with negative tests. Thus, it is appropriate to have *A+B* and *C+D* as totals in this table if a physician monitors the success of the clinical test by ascertaining the disease status (the gold standard status) of persons with positive and/or negative test results. However, it is inappropriate to include totals along the “vertical” (disease) axis.
Thus, we define the Positive Predictive Value (PPV) and the Negative Predictive Value (NPV) as follows:

Table 3.

| Clinical Test | Gold Standard S+ | Gold Standard S- | Total |
|---|---|---|---|
| T+ | A = True Positive | B = False Positive | A+B |
| T- | C = False Negative | D = True Negative | C+D |

Note: The table demonstrates a more appropriate presentation for the patient target population.

Positive Predictive Value (PPV) = A/(A+B)

Negative Predictive Value (NPV) = D/(C+D)

False Positive Rate (FPR) = B/(A+B)

False Negative Rate (FNR) = C/(C+D)

It is now obvious that the translation of information on sensitivity and specificity to PPV or NPV must be done by using Bayes’ Theorem and the prevalence P(S+).

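Positive Predictive Value, PPV:

$$\mathrm{PPV} = P(S{+} \mid T{+}) = \frac{P(T{+} \mid S{+})\,P(S{+})}{P(T{+} \mid S{+})\,P(S{+}) + P(T{+} \mid S{-})\,P(S{-})}$$

Similarly, Negative Predictive Value, NPV:

$$\mathrm{NPV} = P(S{-} \mid T{-}) = \frac{P(T{-} \mid S{-})\,P(S{-})}{P(T{-} \mid S{-})\,P(S{-}) + P(T{-} \mid S{+})\,P(S{+})}$$

Here P(T+ | S+) is the sensitivity, P(T- | S-) is the specificity, P(T+ | S-) = 1 - specificity and P(T- | S+) = 1 - sensitivity.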
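A minimal sketch of this translation follows. The function name `predictive_values` and the counts are hypothetical, chosen only for illustration. The sketch also checks the condition quoted earlier from Weinstein and Finberg: Bayes’ Theorem reproduces the row-wise proportions of the study table only when the prevalence equals the proportion of diseased persons in the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) computed with Bayes' Theorem."""
    pos_and_sick = sensitivity * prevalence                  # P(T+|S+) P(S+)
    pos_and_well = (1.0 - specificity) * (1.0 - prevalence)  # P(T+|S-) P(S-)
    neg_and_well = specificity * (1.0 - prevalence)          # P(T-|S-) P(S-)
    neg_and_sick = (1.0 - sensitivity) * prevalence          # P(T-|S+) P(S+)
    ppv = pos_and_sick / (pos_and_sick + pos_and_well)
    npv = neg_and_well / (neg_and_well + neg_and_sick)
    return ppv, npv


# Hypothetical study counts (not from the article).
a, b, c, d = 90, 20, 10, 80
sensitivity = a / (a + c)
specificity = d / (b + d)

# Case 1: target prevalence equal to the study proportion of diseased persons.
study_proportion = (a + c) / (a + b + c + d)
ppv_same, npv_same = predictive_values(sensitivity, specificity, study_proportion)

# Case 2: a much lower, assumed prevalence in the target population.
ppv_low, npv_low = predictive_values(sensitivity, specificity, 0.01)

print(f"study proportion: PPV {ppv_same:.3f}, NPV {npv_same:.3f}")
print(f"prevalence 1%:    PPV {ppv_low:.3f}, NPV {npv_low:.3f}")
```

In the first case the results coincide with a/(a+b) and d/(c+d); in the second they do not, which is why the predictive values should not be read off the study table.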

When the diseased and non-diseased are sampled, as in a case-control study, the definitions are:

False positive rate among persons without the disease is

$$\mathit{fpr} = P(T{+} \mid S{-}) = \frac{b}{b+d}$$

i.e., *fpr=1-specificity*

and

False negative rate among persons with the disease is

$$\mathit{fnr} = P(T{-} \mid S{+}) = \frac{c}{a+c}$$

i.e., *fnr=1-sensitivity*.

These definitions of the *fpr* and *fnr*, which are based on Table 2,
appear in most of the above-mentioned textbooks.

Following Fleiss (1981), we can define these measures of interest in the general
patient population (Table 3), using uppercase notations:

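False positive rate among persons with a positive test is

$$\mathrm{FPR} = P(S{-} \mid T{+}) = \frac{B}{A+B}$$

i.e., *FPR=1-PPV*.

This statistic indicates the proportion of persons with a positive test who are non-diseased and would therefore erroneously be classified as having the disease by the clinical diagnostic test.

Clearly, using Bayes’ Theorem:

$$\mathrm{FPR} = \frac{P(T{+} \mid S{-})\,P(S{-})}{P(T{+} \mid S{+})\,P(S{+}) + P(T{+} \mid S{-})\,P(S{-})}$$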

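Similarly,

False negative rate among persons with a negative test is

$$\mathrm{FNR} = P(S{+} \mid T{-}) = \frac{C}{C+D}$$

i.e., *FNR=1-NPV*.

This statistic indicates the proportion of persons with a negative test who are diseased and would therefore erroneously be classified as not having the disease by the clinical diagnostic test.

Clearly, using Bayes’ Theorem:

$$\mathrm{FNR} = \frac{P(T{-} \mid S{+})\,P(S{+})}{P(T{-} \mid S{+})\,P(S{+}) + P(T{-} \mid S{-})\,P(S{-})}$$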

Thus, the two-table presentation enables clear pedagogical distinction of the definitions of error rates in the two
different populations, the selected case-control study population (Table 2),
i.e., *fpr* and *fnr*, and the target population (Table 3), i.e.,
FPR and FNR.

Table 4.

| Clinical Test | Gold Standard (final diagnosis by pathology): Skin cancer S+ | Gold Standard (final diagnosis by pathology): No skin cancer S- |
|---|---|---|
| Diagnosis of skin cancer T+ | 63 | 6 |
| No diagnosis of skin cancer T- | 10 | 112 |
| Total | 73 | 118 |

The data for this study indicate a sensitivity of 86.3%, a specificity of 94.9%, a *fpr* of 5.1% and a *fnr* of 13.7%. However, the PPV, the NPV and the error rates in the general population cannot be calculated from Table 4. Estimates calculated directly from the table would apply to the physician study population alone, and would yield an uninformative (and misleading) PPV of 91.3%, NPV of 91.8%, FPR of 8.7% and FNR of 8.2%. Such a single-table presentation would be misleading, because it is incorrect to calculate the PPV and NPV of clinical examinations in the general population from these data. Rather, based on the sensitivity and specificity, a national prevalence of skin cancer of, say, 0.08%, and Bayes’ Theorem, the calculated PPV would be approximately 1.3407%, quite different from the PPV for the physician in a dermatology clinic. This discrepancy occurs because of the low prevalence of the disease in the general population. Similar calculations would yield an NPV of 99.98845%, an FPR of 98.6593% and an FNR of 0.01155%. The data for the general population could be reconstructed by first determining the margins according to the prevalence, i.e., 8 patients with melanoma per 10000 persons in the general population. Then, the sensitivity and specificity can be used to yield Table 5, which is the correct presentation for the general population (because of rounding to integers in constructing the table, direct calculations from Table 5 would yield estimates slightly different from the above calculations, which are based on Bayes’ Theorem).
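The general-population figures can be checked with a few lines of Python. This is a sketch rather than the author's own computation: it assumes that the sensitivity and specificity enter as the exact fractions 63/73 and 112/118 from Table 4 (not the rounded percentages) and uses the prevalence of 0.08% suggested in the text.

```python
from fractions import Fraction

# Counts from the physician study (Table 4).
a, b, c, d = 63, 6, 10, 112
sensitivity = Fraction(a, a + c)    # 63/73
specificity = Fraction(d, b + d)    # 112/118
prevalence = Fraction(8, 10000)     # prevalence of 0.08% used in the text

# Joint probabilities of disease status and test result in the target population.
sick_pos = sensitivity * prevalence
well_pos = (1 - specificity) * (1 - prevalence)
well_neg = specificity * (1 - prevalence)
sick_neg = (1 - sensitivity) * prevalence

PPV = sick_pos / (sick_pos + well_pos)
NPV = well_neg / (well_neg + sick_neg)
FPR = 1 - PPV
FNR = 1 - NPV

for name, value in [("PPV", PPV), ("NPV", NPV), ("FPR", FPR), ("FNR", FNR)]:
    print(f"{name} = {float(value) * 100:.5f}%")
```

Exact fractions are used so that rounding does not blur the comparison; the printed percentages can be set against the general-population figures quoted in the text.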

Table 5.

| Clinical Test | Gold Standard (final diagnosis by pathology): Skin cancer S+ | Gold Standard (final diagnosis by pathology): No skin cancer S- | Total |
|---|---|---|---|
| Diagnosis of skin cancer T+ | 7 | 510 | 517 |
| No diagnosis of skin cancer T- | 1 | 9482 | 9483 |
| Calculated margins | 8 | 9992 | 10000 |

The prevalence is 0.08%; thus, in 10000 persons we expect 8 patients (a rounded number) with melanoma and 9992 healthy persons.

Using a sensitivity of 86.3%, we calculate A=7 (0.863*8).

Using a specificity of 94.9%, we calculate D=9482 (0.949*9992).
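The construction of Table 5 can be sketched as follows, assuming a hypothetical helper named `expected_table`. The margins are fixed first from the prevalence, and the cells are then filled from the sensitivity and specificity, with rounding to integers as described above.

```python
def expected_table(sensitivity, specificity, prevalence, population):
    """Rebuild the 2x2 table for a target population of the given size."""
    sick = round(prevalence * population)   # column total for S+
    well = population - sick                # column total for S-
    A = round(sensitivity * sick)           # true positives
    C = sick - A                            # false negatives
    D = round(specificity * well)           # true negatives
    B = well - D                            # false positives
    return {"A": A, "B": B, "C": C, "D": D}


table5 = expected_table(0.863, 0.949, 0.0008, 10000)
print(table5)

# Row-wise proportions taken from the rounded table.
ppv_rounded = table5["A"] / (table5["A"] + table5["B"])
npv_rounded = table5["D"] / (table5["C"] + table5["D"])
print(f"PPV from the rounded table: {ppv_rounded:.3%}")
print(f"NPV from the rounded table: {npv_rounded:.3%}")
```

Because the cells are rounded, the proportions computed from the table differ slightly from those obtained directly with Bayes’ Theorem.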

As mentioned above, most textbooks present the sensitivity and specificity as well as the PPV and NPV in a single table. Moreover, some would prefer, pedagogically, to begin with a single, simpler 2×2 table and then proceed to a conceptually more correct, but perhaps more complex, presentation in two 2×2 tables. We suggest using a two-table presentation for advanced students, or including a transition from a one-table to a two-table presentation even if one begins teaching with a single table. Ultimately, using two tables to describe diagnostic test characteristics is, in our experience, pedagogically and conceptually more acceptable to students.

Using the two tables and the derived equations demonstrates clearly the use of Bayes’ Theorem, test characteristics (the sensitivity and specificity) and the prevalence to calculate PPV. It is more obvious that the analyses are done in two stages, for two different populations: the selected study population and the target population. This approach makes it easier to discuss and define two different types of false negative rates and false positive rates in the two populations.

P(T+) = probability of the diagnostic test being positive

P(T-) = probability of the diagnostic test being negative

P(S+) = probability of the disease, i.e., the prevalence

P(S-) = probability of no disease, i.e., 1-prevalence

vertical line ( | ) stands for "given that"

Altman, D.G. (1991), *Practical Statistics for Medical Research*, London: Chapman & Hall.

Baron, J.A. (2001), "Clinical epidemiology," in *Teaching Epidemiology*, eds. Olsen, J., Saracci, R., and Trichopoulos, D.,
Oxford: Oxford University Press, pp. 237-249.

Beaglehole, R., Bonita, R., and Kjellstrom, T. (1993), *Basic Epidemiology*, Geneva: World Health Organization.

Bhopal, R. (2002), *Concepts of Epidemiology*, Oxford: Oxford University Press.

Bradley, G.W. (1993), *Disease Diagnosis and Decision*, New York: John Wiley & Sons.

Dawson, B., and Trapp, R.G. (1994), *Basic and Clinical Biostatistics*, New York: Lange–McGraw-Hill.

Dawson, B., and Trapp, R.G. (2001), *Basic and Clinical Biostatistics*, New York: Lange Medical Books-McGraw Hill.

Greenberg, R.S., Daniels, S.R., Flanders, W.D., Eley, J.W., and Boring, J.R. (2001), *Medical Epidemiology*,
London: Lange-McGraw-Hill.

Essex-Sorlie, D. (1995), *Medical Biostatistics and Epidemiology*, New York: Appleton & Lange/McGraw Hill.

Fleiss, J.L. (1981), *Statistical Methods for Rates and Proportions (2nd ed.)*, New York: John Wiley & Sons.

Hirsch, R.P., and Riegelman, R.K. (1996), *Statistical Operations*, Oxford: Blackwell Science.

Jenicek, M. (1995), *The Logic of Modern Medicine*, Montreal: EPIDEM International.

Kraemer, H.C. (1992), *Evaluation of Medical Tests: Objective and quantitative guidelines*, London: Sage Publications.

Pepe, M. S. (2003), *The Statistical Evaluation of Medical Tests for Classification and Prediction*,
Oxford Statistical Science Series 28, Oxford: Oxford University Press.

Riegelman, R.K. (2000), *Studying a Study and Testing a Test*, Philadelphia: Lippincott Williams & Wilkins.

Riffenburgh, R.H. (1993), *Statistics in Medicine*, San Diego: Academic Press.

Sackett, D.L., Haynes, R.B., Guyatt, G.H., and Tugwell, P. (1991), *Clinical Epidemiology (2nd ed.)*,
Boston: Little Brown & Company.

Sackett, D., and Haynes, R.B. (2002), "The Architecture of Diagnostic Research," in *The Evidence Base of Clinical
Diagnosis*, ed. J.A. Knottnerus, London: BMJ Publishing.

Silva, S.I. (1999), *Cancer Epidemiology: Principles and Methods*, Geneva: International Agency for Research on Cancer,
World Health Organization.

Sox, H.C., Blatt, M.A., Higgins, M.C., and Marton, K.I. (1988), *Medical Decision Making*, Boston: Butterworth-Heinemann.

Wassertheil, S. (1995), *Biostatistics and Epidemiology*, New York: Springer-Verlag.

Weinstein, M.C., and Finberg, H.V. (1980), *Clinical Decision Analysis*, Philadelphia: W.B. Saunders Co.

Weiss, N.S. (1996), *Clinical Epidemiology*, Oxford: Oxford University Press.

Shai Linn

School of Public Health

Faculty of Welfare and Health Studies

Haifa University

and Unit of Clinical Epidemiology,

Rambam Medical Center

Haifa

Israel
*slinn@univ.haifa.ac.il*
