Numerous empirical studies have found that integrative analysis of multimodal data can yield better statistical performance. However, little is known theoretically about when and why including more variables in a statistical model improves prediction. In the context of two-class classification, we provide a theoretical guarantee that integrative linear discriminant analysis on multimodal data achieves a smaller misclassification error than linear discriminant analysis run on each individual data type alone. We explicitly characterize the trade-off between the additional information that multimodal data provide and the additional estimation error they incur. We further show that this guarantee extends to several other classifiers. In addition, we address the issues of outliers and block-missing values that frequently arise in multimodal data.
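To make the comparison concrete, the following is a minimal numerical sketch, not the paper's actual method or theory: it simulates two Gaussian modalities (the dimensions, mean shift, and sample sizes are arbitrary choices for illustration), fits standard pooled-covariance linear discriminant analysis on each modality separately and on the concatenated features, and compares test misclassification errors. Under this favorable setup, where both modalities carry independent signal and the sample size is large relative to the dimension, the integrative classifier attains a lower error, consistent with the regime the guarantee describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, d1=5, d2=5, shift=1.5):
    """Two modalities; each class shifts the first coordinate of each modality."""
    y = rng.integers(0, 2, n)
    x1 = rng.standard_normal((n, d1))
    x1[:, 0] += shift * y
    x2 = rng.standard_normal((n, d2))
    x2[:, 0] += shift * y
    return x1, x2, y

def lda_fit(X, y):
    """Two-class LDA with pooled sample covariance (balanced priors assumed)."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    Xc = np.vstack([X[y == 0] - mu0, X[y == 1] - mu1])
    Sigma = Xc.T @ Xc / (len(X) - 2)          # pooled covariance estimate
    w = np.linalg.solve(Sigma, mu1 - mu0)     # discriminant direction
    b = -0.5 * w @ (mu0 + mu1)                # midpoint threshold
    return w, b

def lda_err(w, b, X, y):
    """Misclassification rate of the rule: predict class 1 if w'x + b > 0."""
    return np.mean((X @ w + b > 0).astype(int) != y)

x1_tr, x2_tr, y_tr = simulate(1000)
x1_te, x2_te, y_te = simulate(2000)

err1 = lda_err(*lda_fit(x1_tr, y_tr), x1_te, y_te)   # modality 1 alone
err2 = lda_err(*lda_fit(x2_tr, y_tr), x2_te, y_te)   # modality 2 alone
X_tr = np.hstack([x1_tr, x2_tr])                      # integrative: concatenate
X_te = np.hstack([x1_te, x2_te])
errc = lda_err(*lda_fit(X_tr, y_tr), X_te, y_te)

print(f"modality 1: {err1:.3f}  modality 2: {err2:.3f}  integrative: {errc:.3f}")
```

Note that the trade-off the abstract mentions is visible in this sketch's parameters: shrinking the training size or inflating the dimensions makes the extra estimation error of the concatenated model dominate, and the integrative advantage can disappear.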