Abstract:
|
Music is used extensively to evoke emotion throughout the customer journey. This paper develops a theory-based, interpretable deep learning convolutional neural network (CNN) classifier, MusicEmoCNN, to predict the dynamically varying emotional response to music. We first transform the raw music data into a representation that accounts for human auditory response and use it as the input to the CNN. Next, we design and construct novel CNN filters for higher-order music features; these filters are based on the physics of sound waves and are associated with perceptual features of music, such as consonance and dissonance, that are known to affect emotion. The key advantage of our theory-based filters is that they let us relate the predicted emotional response (valence and arousal) to human-interpretable features of the music. Our model outperforms traditional machine learning models and performs comparably to state-of-the-art black-box deep learning CNN models. Finally, we apply our model to digital advertising. Motivated by YouTube’s mid-roll advertising, we use the model's predictions to identify the emotion-based ad insertion positions in videos that are optimal in terms of ad memorability.
|