Propensity score methods are the gold standard for comparing two interventions using observational data, yet they are less well established for comparisons among three or more interventions. The extensions of propensity scoring to multiple interventions are conceptually sound, but practical issues make them difficult to apply. Some researchers resort to multiple regression or to multiple pairwise comparisons. In pairwise analyses, the region of common covariate support may differ from one analysis to the next, complicating attempts to draw simultaneous inferences across all interventions. In addition, when baseline covariate imbalances are large, regression methods rely on extrapolation and are highly sensitive to model assumptions. Yang et al. (2016) developed a generalized propensity score (GPS) procedure that avoids the computational complexities of matching in many dimensions. However, practical issues remain, such as obtaining reliable variance estimates and identifying an optimal region of common covariate support. It is also unclear in which settings GPS matching is superior to other methods. We will discuss recent research and remaining gaps, with the aim of moving toward best practices for observational analyses with three or more interventions.
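
To make the common-support issue concrete, the following is a minimal sketch of one possible rule for three interventions, not necessarily the rule used by Yang et al. (2016). It assumes the GPS, here written r(t, X) = P(T = t | X), has already been estimated for every unit and every intervention, and it keeps a unit only if each of its GPS values lies inside the range shared by all intervention groups (a rectangular region). The function names, the data layout, and the GPS values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def common_support_bounds(
    gps: Sequence[Sequence[float]], treatment: Sequence[int]
) -> List[Tuple[float, float]]:
    """Return one (low, high) interval per intervention t.

    gps[i][t] is the (already estimated) probability that unit i receives
    intervention t; treatment[i] is the intervention unit i actually received.
    For each t, low is the largest group-wise minimum of r(t, X) and high is
    the smallest group-wise maximum, so the interval is shared by all groups.
    """
    n_arms = len(gps[0])
    arms = sorted(set(treatment))
    bounds = []
    for t in range(n_arms):
        group_mins, group_maxs = [], []
        for arm in arms:
            values = [row[t] for row, a in zip(gps, treatment) if a == arm]
            group_mins.append(min(values))
            group_maxs.append(max(values))
        bounds.append((max(group_mins), min(group_maxs)))
    return bounds


def in_common_support(
    gps: Sequence[Sequence[float]], treatment: Sequence[int]
) -> List[bool]:
    """Flag units whose GPS values fall inside every shared interval."""
    bounds = common_support_bounds(gps, treatment)
    return [
        all(low <= row[t] <= high for t, (low, high) in enumerate(bounds))
        for row in gps
    ]


# Made-up GPS values for nine units, three per intervention (illustration only).
gps = [
    [0.80, 0.10, 0.10], [0.40, 0.30, 0.30], [0.30, 0.35, 0.35],  # received 0
    [0.10, 0.80, 0.10], [0.35, 0.40, 0.25], [0.40, 0.30, 0.30],  # received 1
    [0.10, 0.10, 0.80], [0.30, 0.30, 0.40], [0.40, 0.35, 0.25],  # received 2
]
treatment = [0, 0, 0, 1, 1, 1, 2, 2, 2]

keep = in_common_support(gps, treatment)
print(common_support_bounds(gps, treatment))
print(sum(keep), "of", len(keep), "units retained")
```

In this toy example only three of the nine units are retained, which illustrates how quickly a simultaneous support requirement can shrink the analysis sample as the number of interventions grows, and why the choice of support region is a practical concern.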