Non-stationary, anisotropic spatial processes are often used when modelling, analysing and predicting complex environmental phenomena. One such class of processes considers a stationary, isotropic process on a warped spatial domain. The warping function is generally difficult to fit and not constrained to be bijective, often resulting in 'space-folding.' Here, we propose modelling a bijective warping function through a composition of multiple elemental bijective functions in a deep-learning framework. The model bears several similarities to deep neural network models used for regression and classification and, crucially, ensures that there is no space-folding by construction. We discuss inference both when the resulting deep spatial model is a non-stationary Gaussian process and when it is a general non-Gaussian process composed from a set of processes that are conditionally Gaussian. Through experiments in one and two dimensions, we show that the deep compositional spatial models are quick to fit and provide better predictions and uncertainty quantification than other deep stochastic models of similar complexity, on both univariate and multivariate data.
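The compositional idea can be illustrated in one dimension with a minimal Python sketch. It is an illustration under stated assumptions, not the model described above: the layer form (identity plus non-negatively weighted sigmoids), the exponential covariance, and the names `sigmoid_warp`, `compose` and `warped_exponential_cov` are all hypothetical choices made for the example. Each layer is strictly increasing, so any composition of layers is strictly increasing as well, and the warped domain cannot fold.

```python
import numpy as np

def sigmoid_warp(weights, centers, slope=20.0):
    """Hypothetical elemental layer: s -> s + sum_i w_i * sigmoid(slope * (s - c_i)).

    With all w_i >= 0 the derivative is at least 1, so the map is strictly
    increasing and therefore injective on the real line.
    """
    weights = np.asarray(weights, dtype=float)
    centers = np.asarray(centers, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative to keep the map monotone")

    def layer(s):
        s = np.asarray(s, dtype=float)
        bumps = 1.0 / (1.0 + np.exp(-slope * (s[:, None] - centers[None, :])))
        return s + bumps @ weights

    return layer

def compose(*layers):
    """Apply the layers in order; a composition of increasing maps is increasing."""
    def warp(s):
        for layer in layers:
            s = layer(s)
        return s
    return warp

def warped_exponential_cov(s, warp, length_scale=0.3):
    """Stationary exponential covariance evaluated on the warped locations."""
    w = warp(s)
    return np.exp(-np.abs(w[:, None] - w[None, :]) / length_scale)

# Two stacked layers with illustrative (assumed) parameter values.
s = np.linspace(0.0, 1.0, 50)
warp = compose(sigmoid_warp([0.5, 0.2], [0.3, 0.7]), sigmoid_warp([0.4], [0.9]))
warped = warp(s)
K = warped_exponential_cov(s, warp)

# The ordering of locations is preserved, i.e. there is no space-folding.
print(bool(np.all(np.diff(warped) > 0)))
```

In this sketch the covariance is stationary in the warped coordinates but non-stationary in the original ones: pairs of points near a sigmoid centre are stretched apart by the warping and so are less correlated than equally spaced pairs elsewhere.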