We present a new framework for learning Granger causality networks for multivariate categorical time series, based on the mixture transition distribution (MTD) model. Traditionally, MTD is plagued by a nonconvex objective, non-identifiability, and presence of many local optima. To circumvent these problems, we recast inference in the MTD as a convex problem. The new formulation facilitates the application of MTD to high-dimensional multivariate time series. As a baseline, we also formulate a multi-output logistic autoregressive model (mLTD), which while a straightforward extension of autoregressive Bernoulli generalized linear models, has not been previously applied to the analysis of multivariate categorial time series. We develop novel identifiability conditions of the MTD model and compare them to those for mLTD. We further devise novel and efficient optimization algorithm for the MTD based on the new convex formulation, and compare the MTD and mLTD in both simulated and real data experiments. Our approach simultaneously provides a comparison of methods for network inference in categorical time series and opens the door to modern, regularized inference with the MTD model.