Archives Conversation 2 - Committee on Professional Ethics

Conversations on Ethical Topics #2:
Bayesian Considerations in the Ethical Guidelines

You are invited to read and, if interested, participate in this moderated conversation. Please send your contributions in the form of an e-mail to the Ethics Committee Chair, specifying in the subject line of your message the conversation you wish to contribute to.

Resolution of Wording Issues in the Draft ASA Ethical Guidelines for Statistical Practice
between Jay Kadane representing the Section on Bayesian Analysis
and John Gardenier representing the Committee on Professional Ethics

Reported to the ASA Board of Directors
by John Gardenier, and confirmed by Jay Kadane, August 4, 1999

This last-minute discussion arose from some Bayesians' concerns that opposing lawyers might unfairly discredit the ethics of Bayesian expert witnesses by citing elements of the Ethical Guidelines for Statistical Practice (EGSP) to claim that only frequentist methods are ethically acceptable. Several of the points of contention required extensive analysis and discussion to resolve.

II.A.1. As most recently written, the first sentence said, "Strive for practical significance, not just statistical significance." The change we agreed to says simply, "Strive for practical relevance." The problem was that the previous wording implied that one should AT LEAST show "statistical significance," and one really should "go beyond that" to achieve practical significance. All of this implies acceptance of the principles of frequentist hypothesis testing, which Bayesians may reject and which even thoughtful frequentists approach with considerable caution. Insofar as Bayesians may make no effort to produce what seems to them an illusory "statistical significance," the only shared concern is for the practical relevance of the statistical result.

A corresponding change was then needed in the final clause, which now reads, "to justify both the practical relevance (not 'significance') of the study and the amount of data (not 'sample size') to be used."

Another issue is Tukey's admonition to become familiar with the nature of the data before attempting definitive analyses of them. We do not want to preclude exploratory data analysis!

The form of II.A.1. to which we agreed now reads:

A.1. Strive for practical relevance. Typically, each study should be based on a competent understanding of the subject matter issues, statistical protocols that are clearly defined for the stage (exploratory, intermediate, or final) of analysis before looking at those data which will be decisive for that stage, and technical criteria to justify both the practical relevance of the study and the amount of data to be used.

The most difficult issue in the entire guidelines is II.A.2. In terms of philosophy of science, this embodies the heart of Karl Popper's "falsifiability" principle. We want not only to outlaw deliberate deception of others through sneaky manipulation of data and methodology; we also want to minimize the ever-present potential for self-deception on the part of the investigator. To do so, we have to recognize that all investigators inherently bring to a study some predisposition toward some characteristic of the result. They may want to support a cherished hypothesis, reject the cherished hypothesis of an intellectual rival, or enhance their potential of winning a Nobel Prize (or at least getting published). This basic tendency is neither ethical nor unethical; it is simply part of what we are as human beings. Many generations of scientists have recognized that the necessary defense against this tendency is rigorous discipline in observation and analysis. Books have been written on this subject; how are we to express it in just a few words?

The most recent attempt stated, "Guard against possible technically inappropriate bias on the part of the investigator or the data provider(s). Employ data selection or sampling methods and analytic approaches which are designed to assure valid analyses in either frequentist or Bayesian approaches." "Technically inappropriate bias" had replaced "subjective bias" in an attempt to forestall criticism of Bayesian subjective priors. The problem was that "technically inappropriate" does not really define anything. Furthermore, there have been previous criticisms of the use of the term "bias" because there are numerous types of technical bias, all of which deserve appropriate controls. Singling out just one form of bias appears to some statisticians to slight a host of important technical issues.

The best resolution we could come to after a comprehensive review of the complex issues involved is the following:

A.2. Guard against the possibility that a predisposition by investigators or data providers might predetermine the analytic result. Employ data selection or sampling methods and analytic approaches which are designed to assure valid analyses in either frequentist or Bayesian approaches.

This has the advantage that it thoroughly protects both frequentist and Bayesian thinking while expressing the essential core of the falsifiability principle. I was tempted to go further by guarding against anything that "could either predetermine the analytic result or, conversely, preclude potentially valid results that may be unexpected or not wanted." Jay wisely pointed out a trap implied by that wording: any reasonable study design inherently limits the type and range of results which are feasible. It then follows that any study must, in some sense or scope, "preclude" some "potentially valid results." Yes, that could occur through unethical study design, but it will necessarily also occur through honest and competent focus on a specific research issue. We cannot state this additional aspect broadly as an ethical obligation. At least we cannot do so with any conciseness comparable to the rest of the document.

In thoughtful pursuit of the crucial issues in A.1. and A.2., Jay consulted with other faculty at Carnegie Mellon, including philosophers and computer scientists. They pointed out that there are promising avenues of research in artificial intelligence applied to statistics which are threatened by the recent II.A.7. statement, "Recognize that automated statistical computation alone does not constitute adequate statistical analysis; . . ." They recommended a rewording which is admittedly more correct:

A.7. The fact that a procedure is automated does not ensure its correctness or appropriateness; it is also necessary to understand the theory, the data, and the methods used in each statistical study. This goal is served best when a competent statistical practitioner is included early in the research design, preferably in the planning stage.

II.A.8. was relatively noncontroversial, except: . . .the term "statistical test" can be interpreted as a generic preference for frequentist over Bayesian approaches. The problem is easily cured by inserting the word "frequentist" before "statistical test." With this one minor addition the paragraph explanation of A.8. stands as originally submitted.

II.B.5. replicated the undefinable "technically inappropriate bias" discussed in II.A.2. above. The corresponding correction is:

B.5. Apply statistical sampling and analysis procedures scientifically, without predetermining the outcome.

Similarly, "technically inappropriate bias" appeared in II.C.8. It was easily eliminated by shortening the first sentence, which now reads:

C.8. Clearly and fully report the steps taken to guard validity. Address the suitability of the analytic methods and their inherent assumptions relative to the circumstances of the specific study. Identify the computer routines used to implement the analytic methods.

Although not specifically a Bayesian issue, Jay noted in II.C.11. that the words ", both random and systematic" after "possible sources of error" open questions they cannot answer. Eliminating them results in:

C.11. Report the limits of statistical inference of the study and possible sources of error. For example, disclose any significant failure to follow through fully on an agreed sampling or analytic plan and explain any resulting adverse consequences.

II.E.11. contained the subtle trap of not allowing any estimates or approximations in teaming situations where a fully valid statistical result might not be available when needed. A slight relaxation of this guideline results in:

E.11. Avoid compromising statistical validity for expediency, but use reasonable approximations as appropriate.

Again not specifically a Bayesian issue, Jay raised three problems with the anti-discrimination guideline, II.F.6. He correctly pointed out that there is an unnecessary overstatement in saying that "professional qualifications and the contributions of the individual" should be "the" (implicitly only) "basis for decisions . . ." Apart from inappropriate discrimination, a boss might reasonably reject a professionally well qualified individual on the basis of an obnoxious or grossly uncooperative personality, a demonstrated lack of supervisory ability, or other characteristics apart from either "professional qualifications" or issues of discrimination. Jay worried also that the wording "Strive to avoid . . ." (discrimination) might present too easy an "out." I countered that other commenters have rightly pointed out that many statistical practitioners lack the power in their organizations to ensure that they can assuredly "Avoid" harassment or discrimination. Certain government-funded work may be legally restricted to those of certain nationalities or may reflect (discriminatory) affirmative action policies. Some professional awards are specifically oriented to those of particular races or genders. We compromised on "Avoid as best you can . . ." Finally, Jay pointed out that we cannot ethically advocate only nondiscrimination against "statistical practitioners;" we have to apply the same principle to secretaries, mail room orderlies, and others whose careers we control or influence. The result:

F.6. Use professional qualifications and the contributions of the individual as an important basis for decisions regarding statistical practitioners' hiring, firing, promotion, work assignments, publications and presentations, candidacy for offices and awards, funding or approval of research, and other professional matters. Avoid as best you can harassment of or discrimination against statistical practitioners (or anyone else) on professionally irrelevant bases such as race, color, ethnicity, sex, sexual orientation, national origin, age, religion, nationality, or disability.

Lastly, the draft II.H.2. showed evidence of good intentions but faulty reasoning in stating, "Valid findings can result only from competent work in a moral environment." The associated issues are deep and complex; they are best left to the next revision under William Seltzer, which will more deeply address ethical issues with governments and other employers. For now, the moral intent is retained and the logic is less controversial with the following simplification:

H.2. Valid findings result from competent work in a moral environment. Pressure on a statistical practitioner to deviate from these guidelines is likely to damage both the validity of study results and the professional credibility of the practitioner.
