Separation is a characteristic of data in which a linear combination of predictors perfectly discriminates the binary outcome. Because finite-valued maximum likelihood parameter estimates of a binary regression model do not exist under separation, Bayesian regressions with informative shrinkage of the regression coefficients offer a suitable alternative. Little focus has been given on whether and how to shrink the intercept parameter. Using classical studies of separation, we argue that efficiency in estimating regression coefficients may vary with the intercept prior. We adapt alternative prior distributions for the intercept that downweight implausibly extreme regions of the parameter space so as to be less sensitive to separation. Through simulation and the analysis of several exemplar datasets, we quantify differences across different priors stratified by established statistics measuring the degree of separation. Relative to diffuse priors, our recommendations result in more efficient estimation of the regression coefficients themselves when the data are nearly separated. They are equally efficient in non-separated datasets, making them suitable for default use.