Understanding the complex tumor microenvironment requires studying the tumor, stromal, and immune cells that make up a tumor sample. Digital deconvolution methods are used for this purpose, but few datasets exist in which the proportions and gene expression of the individual cell types are known. To address this, we devised a procedure for creating benchmarking data with which to compare and evaluate deconvolution methods. The model was trained on pancreatic gene expression data from The Cancer Genome Atlas and generates multivariate normal gene expression with a realistic correlation structure for each cell type. Adjustable properties of the model include disease-specific information, the similarity of gene expression between cell types, noise, and sample size. We used positive and negative controls to create deconvolution scenarios, including scenarios that resemble real tumor gene expression data. With this design, we can simulate gene expression data representing mixtures of multiple cell types that differ in variance and gene expression. This in silico design for simulating data with varying properties can be used to evaluate existing methods and to develop new ones for studying tumor heterogeneity.
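As a rough illustration of the kind of simulation described here, the following minimal sketch draws cell-type-specific expression from multivariate normal distributions and combines it into mixed samples with known proportions. It is not the authors' implementation. The function name `simulate_mixtures`, the Dirichlet draw for the proportions, the additive Gaussian noise, the linear mixing scale, and the toy means and covariances are all assumptions made for illustration; in the procedure above, the means and correlation structure would instead come from the trained model.

```python
import numpy as np


def simulate_mixtures(means, covs, n_samples, noise_sd=0.1, seed=0):
    """Simulate mixed expression as proportion-weighted sums of cell types.

    means: (n_types, n_genes) mean expression per cell type
    covs:  (n_types, n_genes, n_genes) covariance matrix per cell type
    Returns (mixed, proportions, cell_expr), so the ground truth is known.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    n_types, _ = means.shape

    # Cell-type-specific expression for every sample, drawn from a
    # multivariate normal: shape (n_types, n_samples, n_genes).
    cell_expr = np.stack([
        rng.multivariate_normal(means[k], covs[k], size=n_samples)
        for k in range(n_types)
    ])

    # Known mixing proportions per sample, shape (n_samples, n_types).
    # A flat Dirichlet is an assumption, not taken from the source.
    proportions = rng.dirichlet(np.ones(n_types), size=n_samples)

    # Weighted sum over cell types, then additive Gaussian noise
    # (assumed noise model) controlled by noise_sd.
    mixed = np.einsum("sk,ksg->sg", proportions, cell_expr)
    mixed = mixed + rng.normal(0.0, noise_sd, size=mixed.shape)
    return mixed, proportions, cell_expr


# Toy inputs (hypothetical values): two cell types, three genes.
means = np.array([[5.0, 2.0, 1.0],
                  [1.0, 4.0, 3.0]])
covs = np.stack([
    np.array([[0.2, 0.1, 0.0],
              [0.1, 0.2, 0.0],
              [0.0, 0.0, 0.2]]),   # genes 1 and 2 correlated in type 1
    np.eye(3) * 0.5,               # uncorrelated genes in type 2
])
mixed, proportions, cell_expr = simulate_mixtures(means, covs, n_samples=4)
```

Because the function returns the proportions and the per-cell-type expression alongside the mixed samples, a deconvolution method's estimates can be scored directly against them. In this sketch, moving the rows of `means` closer together, raising `noise_sd`, or changing `n_samples` would correspond to the similarity, noise, and sample-size properties mentioned above.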