With the increasing availability of both new large-scale data sources and computational power, machine learning algorithms for analysing these data have seen widespread use and many refinements. Compared with traditional modelling approaches in the social sciences, especially (generalized) regression models, these algorithms often allow for more flexible modelling strategies and offer additional desirable features, such as automatic variable selection or implicit specification of interaction effects.
Research questions in the social sciences, however, usually aim at population inference about the effects of certain variables of interest rather than at prediction. We propose an approach that adapts the idea of average marginal effects to draw inference from a variety of classical machine learning algorithms, e.g. decision trees and naïve Bayes. We will provide results from a simulation study focusing on the bias and variance of the different estimators. In addition, we will present practical examples using large-scale datasets from the Institute for Employment Research (Germany) to compare different modelling approaches for substantive social science research.
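To make the notion of an average marginal effect concrete for a model without coefficients, the following is a minimal sketch and not the estimator proposed here. It assumes a binary variable of interest and computes, for each observation, the difference between the model's predictions with that variable set to 1 and to 0, then averages these differences over the sample. The function `average_marginal_effect`, the stand-in model `toy_tree`, the variable names and the data are all hypothetical and serve illustration only.

```python
from statistics import mean


def average_marginal_effect(predict, rows, variable, low=0, high=1):
    """Average change in the prediction when `variable` moves from `low`
    to `high`, holding every other covariate at its observed value."""
    differences = []
    for row in rows:
        at_high = dict(row, **{variable: high})  # counterfactual copy, variable = high
        at_low = dict(row, **{variable: low})    # counterfactual copy, variable = low
        differences.append(predict(at_high) - predict(at_low))
    return mean(differences)


def toy_tree(row):
    """Hypothetical stand-in for a fitted decision tree that returns a
    predicted probability; the splits and leaf values are made up."""
    if row["age"] < 40:
        return 0.75 if row["trained"] == 1 else 0.25
    return 0.5 if row["trained"] == 1 else 0.25


# Made-up sample; only the covariate values matter for the calculation.
sample = [
    {"age": 25, "trained": 0},
    {"age": 35, "trained": 1},
    {"age": 45, "trained": 0},
    {"age": 55, "trained": 1},
]

ame = average_marginal_effect(toy_tree, sample, "trained")
print(ame)  # 0.375
```

Because the function only needs a `predict` callable, the same calculation could in principle be applied to any fitted learner. Note that the stand-in model contains an interaction between `age` and `trained`, which the averaged effect summarises in a single number.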