Conversations on Ethical Topics #2:
Bayesian Considerations in the Ethical Guidelines
You are invited
to read and, if interested, participate in this moderated conversation.
Please send your contributions in the form of an e-mail to
the Ethics Committee Chair,
specifying in the subject line of your message the conversation you wish
to contribute to.
Resolution of Wording Issues in the Draft ASA Ethical Guidelines for Statistical Practice
between Jay Kadane, representing the Section on Bayesian Analysis,
and John Gardenier, representing the Committee on Professional Ethics
Reported to the ASA Board of Directors by John Gardenier, and confirmed by Jay Kadane, August 4, 1999
This last-minute discussion arose from some Bayesians' concerns that
opposing lawyers might unfairly discredit the ethics of Bayesian expert
witnesses by citing elements of the draft Ethical Guidelines for Statistical
Practice (EGSP) to claim that only frequentist methods are ethically
acceptable. Several of the points of contention required extensive analysis
and discussion to resolve.
II.A.1. As most recently written, the first sentence said, "Strive
for practical significance, not just statistical significance." The change
we agreed to says simply, "Strive for practical relevance." The problem
was that the previous wording implied that one should AT LEAST show "statistical
significance," and one really should "go beyond that" to achieve practical
significance. All of this implies acceptance of the principles of frequentist
hypothesis testing, which Bayesians may reject and which even thoughtful
frequentists approach with considerable caution. Insofar as Bayesians may
make no effort to produce what seems to them an illusory "statistical significance,"
the only shared concern is for the practical relevance of the statistical
result.
A corresponding change was then needed in the final clause, which now
reads, "to justify both the practical relevance (not 'significance') of
the study and the amount of data (not 'sample size') to be used."
Another issue is Tukey's admonition to become familiar with the nature
of the data before attempting definitive analyses of it. We do not want
to preclude exploratory data analysis!
The form of II.A.1. to which we agreed now reads:
A.1. Strive for
practical relevance. Typically, each study should be based on a competent
understanding of the subject matter issues, statistical protocols that
are clearly defined for the stage (exploratory, intermediate, or final)
of analysis before looking at those data which will be decisive for that
stage, and technical criteria to justify both the practical relevance of
the study and the amount of data to be used.
The most difficult issue in the entire guidelines is II.A.2. In terms
of philosophy of science, this embodies the heart of Karl Popper's "falsifiability"
principle. We want not only to outlaw deliberate deception of others through
sneaky manipulation of data and methodology; we also want to minimize the
ever-present potential for self-deception on the part of the investigator.
To do so, we have to recognize that all investigators inherently bring
to a study some predisposition toward some characteristic of the result.
They may want to support a cherished hypothesis, reject the cherished hypothesis
of an intellectual rival, or improve their chances of winning a Nobel
Prize (or at least of getting published). This basic tendency is neither ethical
nor unethical; it is simply part of what we are as human beings. Many generations
of scientists have recognized that the necessary defense against this tendency
is rigorous discipline in observation and analysis. Books have been written
on this subject; how are we to express it in just a few words?
The most recent attempt stated, "Guard against possible technically
inappropriate bias on the part of the investigator or the data provider(s).
Employ data selection or sampling methods and analytic approaches which
are designed to assure valid analyses in either frequentist or Bayesian
approaches." "Technically inappropriate bias" had replaced "subjective
bias" in an attempt to forestall criticism of Bayesian subjective priors.
The problem was that "technically inappropriate" does not really define
anything. Furthermore, there have been previous criticisms of the use of
the term "bias" because there are numerous types of technical bias, all
of which deserve appropriate controls. Singling out just one form of bias
appears to some statisticians to slight a host of important technical issues.
The best resolution we could come to after a comprehensive review of
the complex issues involved is the following:
A.2. Guard against
the possibility that a predisposition by investigators or data providers
might predetermine the analytic result. Employ data selection or sampling
methods and analytic approaches which are designed to assure valid analyses
in either frequentist or Bayesian approaches.
This has the advantages that it thoroughly protects both frequentist
and Bayesian thinking while expressing the essential core of the falsifiability
principle. I was tempted to go further by guarding against anything that
"could either predetermine the analytic result or, conversely, preclude
potentially valid results that may be unexpected or not wanted." Jay wisely
pointed out a trap implied by that wording: any reasonable study design
inherently limits the type and range of results which are feasible. It
then follows that any study must, in some sense or scope, "preclude" some
"potentially valid results." Yes, that could occur through unethical study
design, but it will necessarily also occur through honest and competent
focus on a specific research issue. We cannot state this additional aspect
broadly as an ethical obligation. At least we cannot do so with any conciseness
comparable to the rest of the document.
In thoughtful pursuit of the crucial issues in A.1. and A.2., Jay consulted
with other faculty at Carnegie Mellon, including philosophers and computer
scientists. They pointed out that there are promising avenues of research
in artificial intelligence applied to statistics which are threatened by
the recent II.A.7. statement, "Recognize that automated statistical computation
alone does not constitute adequate statistical analysis; . . ." They recommended
a rewording which is admittedly more correct:
A.7. The fact that
a procedure is automated does not ensure its correctness or appropriateness;
it is also necessary to understand the theory, the data, and the methods
used in each statistical study. This goal is served best when a competent
statistical practitioner is included early in the research design, preferably
in the planning stage.
II.A.8. was relatively noncontroversial, except that the term "statistical
test" can be interpreted as a generic preference for frequentist over Bayesian
approaches. The problem is easily cured by inserting the word "frequentist"
before "statistical test." With this one minor addition the paragraph explanation
of A.8. stands as originally submitted.
II.B.5. replicated the undefinable "technically inappropriate bias"
discussed in II.A.2. above. The corresponding correction is:
B.5. Apply statistical
sampling and analysis procedures scientifically, without predetermining
the outcome.
Similarly, "technically inappropriate bias" appeared in II.C.8. It
was easily eliminated by shortening the first sentence, which now reads:
C.8. Clearly and
fully report the steps taken to guard validity. Address the suitability
of the analytic methods and their inherent assumptions relative to the
circumstances of the specific study. Identify the computer routines used
to implement the analytic methods.
Although not specifically a Bayesian issue, Jay noted in II.C.11. that
the words ", both random and systematic" after "possible sources of error"
open questions they cannot answer. Eliminating them results in:
C.11. Report the
limits of statistical inference of the study and possible sources of error.
For example, disclose any significant failure to follow through fully on
an agreed sampling or analytic plan and explain any resulting adverse consequences.
II.E.11. contained the subtle trap of not allowing any estimates or
approximations in teaming situations where a fully valid statistical result
might not be available when needed. A slight relaxation of this guideline
results in:
E.11. Avoid compromising
statistical validity for expediency, but use reasonable approximations
as appropriate.
Although it is again not specifically a Bayesian issue, Jay raised three
problems with the anti-discrimination guideline, II.F.6. He correctly pointed out
that there is an unnecessary overstatement in saying that "professional
qualifications and the contributions of the individual" should be "the"
(implicitly only) "basis for decisions . . ." Apart from inappropriate
discrimination, a boss might reasonably reject a professionally well qualified
individual on the basis of an obnoxious or grossly uncooperative personality,
a demonstrated lack of supervisory ability, or other characteristics apart
from either "professional qualifications" or issues of discrimination.
Jay worried also that the wording "Strive to avoid . . ." (discrimination)
might present too easy an "out." I countered that other commenters have
rightly pointed out that many statistical practitioners lack the power
in their organizations to ensure that they can assuredly "Avoid" harassment
or discrimination. Certain government-funded work may be legally restricted
to those of certain nationalities or may reflect (discriminatory) affirmative
action policies. Some professional awards are specifically oriented to
those of particular races or genders. We compromised on "Avoid as best
you can . . ." Finally, Jay pointed out that we cannot ethically advocate
only nondiscrimination against "statistical practitioners"; we have to
apply the same principle to secretaries, mail room orderlies, and others
whose careers we control or influence. The result:
F.6. Use professional
qualifications and the contributions of the individual as an important
basis for decisions regarding statistical practitioners' hiring, firing,
promotion, work assignments, publications and presentations, candidacy
for offices and awards, funding or approval of research, and other professional
matters. Avoid as best you can harassment of or discrimination against
statistical practitioners (or anyone else) on professionally irrelevant
bases such as race, color, ethnicity, sex, sexual orientation, national
origin, age, religion, nationality, or disability.
Lastly, the draft II.H.2. showed evidence of good intentions but faulty
reasoning in stating, "Valid findings can result only from competent work
in a moral environment." The associated issues are deep and complex; they
are best left to the next revision under William Seltzer, which will more
deeply address ethical issues with governments and other employers. For
now, the moral intent is retained and the logic is less controversial with
the following simplification:
H.2. Valid findings
result from competent work in a moral environment. Pressure on a statistical
practitioner to deviate from these guidelines is likely to damage both
the validity of study results and the professional credibility of the practitioner.