Image-on-image regression analysis, using images to predict images, is a challenging task due to the high dimensionality and the complex spatial dependence structures in both image predictors and image outcomes. In this work, we propose a novel spatial Bayesian latent factor model for image-on-image regression, in which low-dimensional latent factors link the high-dimensional image outcomes to the image predictors. We assign Gaussian process priors to the spatially-varying regression coefficients in the model, which capture the complex spatial dependence among the image outcomes as well as among the image predictors. We perform simulations under different scenarios to evaluate the out-of-sample prediction performance of our method against linear regression and voxel-wise regression methods. The proposed method achieves better prediction accuracy by effectively accounting for the spatial dependence and by efficiently reducing the image dimension with latent factors. We apply the proposed method to the analysis of multimodal image data from the Human Connectome Project, where we predict task-related contrast maps using sub-cortical volumetric seed maps.
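To make the model structure concrete, the following is a minimal toy sketch of the idea described above: spatially smooth coefficient maps drawn from a Gaussian process connect a high-dimensional image predictor to a high-dimensional image outcome through a small number of latent factors. All sizes, the 1-D voxel grid, the squared-exponential kernel, and the ridge plug-in estimator are illustrative assumptions, not the paper's actual MCMC-based fitting procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (illustrative only, not from the paper):
V = 50   # voxels per image
K = 3    # latent factors
n = 100  # subjects

# 1-D voxel locations and a squared-exponential GP covariance,
# standing in for the Gaussian process priors on the
# spatially-varying regression coefficients.
s = np.linspace(0.0, 1.0, V)
length_scale = 0.2
Kcov = np.exp(-0.5 * (s[:, None] - s[None, :]) ** 2 / length_scale**2)
Kcov += 1e-6 * np.eye(V)  # jitter for numerical stability
L = np.linalg.cholesky(Kcov)

# Spatially smooth loading maps: each column is one GP draw
# over the voxel grid.
B = L @ rng.standard_normal((V, K))    # predictor image -> latent factors
Lam = L @ rng.standard_normal((V, K))  # latent factors -> outcome image

# Generate predictor images X (n x V), latent factors eta (n x K),
# and outcome images Y (n x V) under the low-rank structure
#   eta_i = B' x_i + noise,   y_i = Lam eta_i + noise.
X = rng.standard_normal((n, V))
eta = X @ B + 0.1 * rng.standard_normal((n, K))
Y = eta @ Lam.T + 0.1 * rng.standard_normal((n, V))

# A simple plug-in predictor for comparison: ridge regression of the
# outcome images on the predictor images; the true map has rank K.
ridge = 1.0
C_hat = np.linalg.solve(X.T @ X + ridge * np.eye(V), X.T @ Y)
Y_hat = X @ C_hat
rmse = np.sqrt(np.mean((Y - Y_hat) ** 2))
print(f"in-sample RMSE of the ridge plug-in: {rmse:.3f}")
```

The low-rank structure is what makes the problem tractable: instead of a full V-by-V coefficient matrix, the model only needs two V-by-K loading maps, and the GP prior shares information across neighboring voxels.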