Two-stage randomization is a powerful tool for learning about interference between units. Estimating these effects, however, is more complex when one or both stages of the design are not randomized. While recent work has approached such observational studies using inverse probability weighting (IPW) and related estimators, these can have practical drawbacks and lack well-defined methods for sensitivity analysis. In this paper, we instead investigate both matching methods and alternative propensity-score-based estimators. First, we define the relevant estimands, which can differ in subtle ways both from those of fully randomized designs and from the estimands implied by IPW approaches. Second, we describe complications that arise when forming subclasses or matched samples, including computational complexity. Third, we explore methods to probe the sensitivity of our analysis to possible unmeasured biases. Sensitivity analysis is difficult in this setting because the two-level data structure allows for confounding at both the individual and cluster levels. We assess these methods via simulation and on several canonical two-stage evaluations.