A few months ago my friend and neighbor Olav was fiddling around with a dataset of movie plot descriptions he downloaded from the Internet Movie Database (IMDb). If I recall correctly, he was taking a stab at the Netflix Prize. We discussed this for a while over coffee, but (as usual) our conversation was all over the place, and somewhere along the line we wondered which songs are used most often in movies.
What is that song they always play? The one that goes like ‘#dun dun dun dun dudun dun dun duuuuun#’. You know?
The IMDb site offers lots of different datasets for download, and we quickly found that one of them contains soundtrack listings (the aptly named file soundtracks.list.gz). Now it was just a matter of filtering out the unnecessary contextual data and counting songs. Olav, who does data mining for a living, quickly got this done using spiffy point-and-click tools. I went on to ask Twitter what people thought the answer would be.
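Olav used point-and-click tools, so there is no original script to show. For readers who prefer code, here is a minimal Python sketch of the same filter-and-count idea. It assumes that in soundtracks.list each song appears on its own line of the form `- "Song Title"`, with movie headers and indented credit lines around it. That layout, the Latin-1 encoding, and the helper names `count_songs` and `count_songs_in_file` are my own assumptions, not anything IMDb documents.

```python
import gzip
import re
from collections import Counter

# Assumed layout: a song line starts with '- "' and the title is the quoted text.
# Movie header lines and indented credit lines do not match and are skipped.
SONG_LINE = re.compile(r'^-\s+"([^"]+)"')


def count_songs(lines):
    """Count how often each song title appears across all soundtrack listings."""
    counts = Counter()
    for line in lines:
        match = SONG_LINE.match(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts


def count_songs_in_file(path="soundtracks.list.gz"):
    # Assumption: the dump is Latin-1 encoded; adjust if your copy differs.
    with gzip.open(path, "rt", encoding="latin-1") as handle:
        return count_songs(handle)


if __name__ == "__main__":
    # Hypothetical sample input, not real IMDb data.
    sample = [
        '# Movie A (1990)',
        '- "Jingle Bells"',
        '  Written by someone',
        '# Movie B (2001)',
        '- "Jingle Bells"',
        '- "Auld Lang Syne"',
    ]
    print(count_songs(sample).most_common(2))
```

Running the sample prints `[('Jingle Bells', 2), ('Auld Lang Syne', 1)]`. On the real file, `count_songs_in_file().most_common(5)` would give a top five. Titles that differ only in spelling or capitalization are counted separately here, so a real run would likely need some normalization.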
The top five results turned out to be a collection of classics. The songs played most often in movies (according to the IMDb data) are as follows:
- “Jingle Bells” (220x)
- “William Tell Overture” (204x)
- “Home Sweet Home” (160x)
- “Auld Lang Syne” (149x)
- “Rock-a-Bye Baby” (140x)

Not at all what we were expecting, but quite obvious when you think about how many Christmas movies are out there. Data mining is very often like that: you find answers that are unexpected, yet obvious in hindsight.
It’s the same song, but it never gets old.
[Much later, a friend (I can’t remember exactly who) noted that the song played most often in theaters is probably not listed in the IMDb dataset at all. It’s the 20th Century Fox intro.]