34 – Advances in Analysis of Categorical Data
On the Discovery and Use of Disease Risk Factors with Logistic Regression: New Prostate Cancer Risk Factors
David E. Booth
Kent State University
Venugopal Gopalakrishna-Remani
The University of Texas at Tyler
Matthew Cooper
Washington University
Fiona R. Green
University of Manchester
Margaret P. Rayman
University of Surrey
We begin by arguing that stepwise logistic regression, the most commonly used algorithm for the discovery and use of disease risk factors, is unstable. We then argue that other available algorithms, such as the lasso, are much more stable and reliable. We propose a protocol for the discovery and use of disease risk factors that applies lasso variable selection with logistic regression, followed by boosting. We illustrate the protocol with a set of prostate cancer data and show that it recovers known risk factors. Finally, we use the protocol to identify new risk factors for prostate cancer.
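For orientation, the lasso variable selection step named above is usually written as an L1-penalized logistic regression. The form below is the standard textbook objective, not a formula taken from this paper; the symbols ($n$ subjects, binary outcome $y_i$, candidate risk factors $x_i$, intercept $\beta_0$, coefficients $\beta_j$, tuning parameter $\lambda$) are assumptions introduced here for illustration.

$$
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta} \left\{ -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \left( \beta_0 + x_i^{\top} \beta \right) - \log\left( 1 + e^{\beta_0 + x_i^{\top} \beta} \right) \right] + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}
$$

Under this formulation, a sufficiently large $\lambda$ shrinks some coefficients exactly to zero, and the candidate risk factors with nonzero $\hat{\beta}_j$ are the ones retained. The value of $\lambda$ is typically chosen by cross-validation, though the choice used in this work is not stated here.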