Abstract Details

Activity Number: 529 - Regression Trees and Random Forests
Type: Contributed
Date/Time: Wednesday, August 1, 2018 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #329336
Title: Assessing Authorship of Beatles Songs from Musical Content: Bayesian Classification Modeling from Bags-Of-Words Representations
Author(s): Mark Glickman* and Jason Brown and Ryan Song
Companies: Harvard University and Dept of Mathematics, Dalhousie University and School of Engineering and Applied Science, Harvard University
Keywords: bag-of-words; Bayes; multinomial; stylometry; music

The songwriting duo of John Lennon and Paul McCartney, the two founding members of the Beatles, has arguably written some of the most popular and memorable songs of the last century. Although the songs were published under the joint credit of Lennon-McCartney, it is well documented that most of their songs, or portions of songs, were written primarily by only one of the two. In some cases, the authorship of Lennon-McCartney songs is in dispute. For Lennon-McCartney songs of known and unknown authorship written and recorded over the period 1962-1966, we constructed five different bags-of-words representations of each song or song portion: unigrams of melodic notes, unigrams of chords, bigrams of melodic note pairs, bigrams of chord changes, and four-note melody contours. We developed a Bayesian classification model for dependent bags-of-words representations. The model assumes correlated multinomial counts for the bags-of-words as a function of authorship, which is then inverted using Bayes' rule to obtain authorship probabilities. Out-of-sample classification accuracy for songs with known authorship was 80%. We apply the model to songs from the study period whose authorship is unknown.
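
The bag-of-words idea and the Bayes' rule inversion can be illustrated with a minimal Python sketch. It is not the authors' model: it treats the counts as independent multinomials with add-one (Dirichlet) smoothing, whereas the abstract describes correlated multinomial counts across the five representations. The function names, the up/down/same encoding of a four-note contour, the generic author labels "A" and "B", and the toy counts are all assumptions made for illustration.

```python
from collections import Counter
import math


def unigrams(seq):
    """Bag of single symbols (melodic notes or chords)."""
    return Counter(seq)


def bigrams(seq):
    """Bag of adjacent pairs (note pairs or chord changes)."""
    return Counter(zip(seq, seq[1:]))


def contours(pitches, n=4):
    """Bag of n-note contours, encoded (by assumption) as the direction
    of each step: U = up, D = down, S = same."""
    def step(a, b):
        return "U" if b > a else "D" if b < a else "S"

    return Counter(
        "".join(step(a, b) for a, b in zip(pitches[i:i + n], pitches[i + 1:i + n]))
        for i in range(len(pitches) - n + 1)
    )


def fit(bags_by_author, alpha=1.0):
    """Estimate smoothed multinomial log-probabilities per author.
    Simplification: one independent multinomial per author."""
    vocab = set()
    for bags in bags_by_author.values():
        for bag in bags:
            vocab.update(bag)
    model = {}
    for author, bags in bags_by_author.items():
        total = Counter()
        for bag in bags:
            total.update(bag)
        denom = sum(total.values()) + alpha * len(vocab)
        model[author] = {w: math.log((total[w] + alpha) / denom) for w in vocab}
    return model


def posterior(model, bag, prior=None):
    """Invert the likelihoods with Bayes' rule to get P(author | bag).
    Symbols unseen in training are ignored."""
    authors = list(model)
    prior = prior or {a: 1.0 / len(authors) for a in authors}
    logp = {
        a: math.log(prior[a])
        + sum(c * model[a][w] for w, c in bag.items() if w in model[a])
        for a in authors
    }
    m = max(logp.values())  # subtract the max for numerical stability
    z = sum(math.exp(v - m) for v in logp.values())
    return {a: math.exp(logp[a] - m) / z for a in authors}


# Toy, made-up chord counts for two hypothetical authors "A" and "B".
toy_model = fit({
    "A": [Counter({"C": 3, "G": 1})],
    "B": [Counter({"C": 1, "G": 3})],
})
toy_posterior = posterior(toy_model, Counter({"C": 2}))
```

With equal priors, `toy_posterior` assigns probability 0.8 to author "A", because the smoothed likelihood ratio for two occurrences of "C" is (2/3)^2 : (1/3)^2 = 4 : 1. Modeling dependence between the five representations, as the abstract describes, would require a richer likelihood than this sketch provides.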

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program