Friday, September 12, 2008

Real Genius 

So I have been fanatically thinking about Apple's new Genius playlist and recommendation technology and here's what I think is happening.

A little background:

About a year ago, Apple quietly applied for a patent on “fuzzy string matching of media meta-data” (info available here, patent available here), which indicated that they were working on ways to take text information about a specific song and match it against a larger database. (In the patent they even mention All Media Guide as an example of a large music metadata database.)

This got me to thinking that Apple was likely going to try to do text matching of someone's music file to a larger data set in order to offer some kind of recommendation on purchases from the iTunes Music Store, but this would also be very handy in trying to determine what music someone has in their library and link it to this master database.


What's happening in Genius

(Please note: a good chunk of this is speculation on my part)

Once you've loaded the iTunes 8 software, it asks you if you want to turn on the Genius feature which involves allowing Apple to "Gather Information" about your collection. My friend Paul gives a quick overview.


Anecdotal evidence says that this takes about a minute and a half per GB of music. I have something like 30GB of music and it indeed took over a half-hour. What is likely happening during this process is that it is sending information about your iTunes library (your .itl file) up to the Apple intarwebs for massive comparison with other people's libraries. Apparently it is not just the track titles and artist names, but play counts, ratings, your custom-built playlists, whatever's in there.


They combine your info with your pals' info and they find connections between what you've got and what people like you listen to. This is called collaborative filtering and is similar to how Amazon does their "People who bought this also bought this" recommendation technology, but I think Apple is adding some more stuff to the mix:


So here's my goofy representation of three of the collections loaded up to the Apples:


In this figure, we've got my collection, Steve's collection and Bill's collection (c'mon, you know Bill Gates has iTunes loaded somewhere). The first stage of the upload is likely using their fuzzy text matching to link the text in each of our libraries to the info they have from the iTunes store. This way all versions of "Under My Thumb" by the Rolling Stones (or the "Rooling Stones" as is pointed out in this patent application) can be recognized and reconciled, and the data can be normalized. Also from here they can apply their own standardized data like Genre, Popularity, BPM (Beats Per Minute), Sales Ranking, and any number of other goodies they might be sitting on. The iTunes Music Store is also a licensee of AMG data so they'll also have album-level styles (sub-genres) and artist-level styles and moods (not to mention the editorial Similar Artist data).
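To make the fuzzy matching idea concrete, here is a minimal sketch using Python's standard difflib module. This is only my guess at the general shape of the technique, not what the patent or Apple actually does; the catalog list, the normalize helper and the 0.8 cutoff are all made up for illustration.

```python
import difflib

# Hypothetical canonical artist names, standing in for a store catalog.
CATALOG_ARTISTS = ["The Rolling Stones", "Aerosmith", "Kiss", "Miles Davis"]

def normalize(name):
    """Lowercase, drop a leading 'the', and collapse whitespace."""
    words = name.lower().split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)

def match_artist(raw_name, catalog=CATALOG_ARTISTS, cutoff=0.8):
    """Return the closest canonical artist name, or None if nothing is close enough."""
    by_key = {normalize(c): c for c in catalog}
    hits = difflib.get_close_matches(
        normalize(raw_name), list(by_key), n=1, cutoff=cutoff
    )
    return by_key[hits[0]] if hits else None

print(match_artist("Rooling Stones"))
print(match_artist("Led Zeppelin"))
```

With this toy catalog, the misspelled "Rooling Stones" resolves to "The Rolling Stones", while an artist that is not in the catalog at all comes back as None rather than being forced onto the nearest name.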

The three of us all have "Dream On" by Aerosmith in our collections. This will be the key to looking at what other things we may be interested in. We also have "Beth" by Kiss in our collections. By this logic, Apple considers these songs to be "Similar" or at least songs that all of us like, and may like together. Similarly, we've all got Miles Davis songs and Rolling Stones songs in our collections, so songs by these artists may also seem to be in the same pile of "People who like this song also like this song."
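The "people who have this song also have that song" idea boils down to counting co-occurrences across libraries. Here is a rough sketch; the three libraries are invented stand-ins for the collections described above, and co_occurrence is a hypothetical helper, not anything Apple has documented.

```python
from collections import Counter

# Hypothetical libraries, loosely based on the three collections above.
libraries = {
    "me":    {"Dream On", "Beth", "So What", "Under My Thumb"},
    "steve": {"Dream On", "Beth", "So What"},
    "bill":  {"Dream On", "Beth", "Under My Thumb"},
}

def co_occurrence(seed, libraries):
    """Count how often each other song shares a library with the seed song."""
    counts = Counter()
    for songs in libraries.values():
        if seed in songs:
            counts.update(songs - {seed})
    return counts

counts = co_occurrence("Dream On", libraries)
print(counts.most_common())
```

In this toy data "Beth" shares a library with "Dream On" three times, and the Miles Davis and Rolling Stones tracks twice each, so all of them land in the same raw "also like" pile before any filtering happens.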

The gritty part of collaborative filtering is that just because you have beer and diapers in the same shopping cart, that doesn't make them "similar" so at this point Apple applies a layer of filters to trim down the lists. Genre seems to be a big one, which makes it less likely that we'd see a Miles Davis song in our "Dream On" playlist. I speculate that they're also using their BPM data from the store, making it less likely that "Beth" (Kiss' wonderfully sappy love ballad) will be grouped in with an epic jam like "Dream On."

This also may be why songs by The Beatles and AC/DC (two long-time iTunes Music Store holdouts) are not available to build playlists on, nor will they show up in playlists anywhere.

Hey Dude

Without matching Genre and determining BPM in their internal store data, they "don't know what the song is" so to speak.

So at this point Apple likely has a whole huge list of songs in the same Genre (Rock) as "Dream On" and the same approximate BPM range (roughly 80 beats per minute), with maybe some special sauce regarding the styles of the artist or albums the song appears on (Hard Rock, Heavy Metal, Album Rock, Arena Rock). Prioritize the list by sales rank numbers or overall popularity based on the number of times they've seen the song in other people's collections, and you have a pretty solid list of Similar Songs.
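Here is a small sketch of that filter-then-rank step as I imagine it. Everything in it is an assumption: the Track fields, the 15 BPM window, and especially the BPM and popularity numbers, which are invented for the example (only the "roughly 80" for "Dream On" comes from above).

```python
from dataclasses import dataclass

@dataclass
class Track:
    title: str
    genre: str
    bpm: int          # made-up tempo values for illustration
    popularity: int   # e.g. how many libraries the track was seen in

def similar_songs(seed, candidates, bpm_window=15):
    """Keep same-genre tracks near the seed's tempo, most popular first."""
    same_feel = [
        t for t in candidates
        if t.title != seed.title
        and t.genre == seed.genre
        and abs(t.bpm - seed.bpm) <= bpm_window
    ]
    return sorted(same_feel, key=lambda t: t.popularity, reverse=True)

seed = Track("Dream On", "Rock", 80, 1500)
candidates = [
    Track("Shooting Star", "Rock", 84, 900),
    Track("You Really Got Me", "Rock", 92, 1200),
    Track("Beth", "Rock", 60, 1100),      # same genre, but too slow
    Track("So What", "Jazz", 82, 1000),   # right tempo, wrong genre
]
print([t.title for t in similar_songs(seed, candidates)])
```

The genre filter knocks out the Miles Davis track, the tempo window knocks out "Beth", and whatever survives is ordered by popularity.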

The problem now is that they're likely to have a bunch of songs in the list that I don't have in my collection (and therefore can't show up in a playlist). Maybe the second best match to "Dream On" is "Shooting Star" by Bad Company, but I don't have that in my collection, so it has to be skipped. Apple then needs to reconcile their Master List of "Dream On" matches with the info from my collection, and then send that back down to me.
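The reconciliation step could be as simple as walking the master list in rank order and keeping only what I own. A minimal sketch, with a hypothetical playable_matches helper and example titles that are just placeholders:

```python
def playable_matches(master_list, my_library, limit=100):
    """Keep master-list titles that are in my library, preserving rank order."""
    owned = set(my_library)
    return [title for title in master_list if title in owned][:limit]

# Hypothetical ranked matches for "Dream On" and a tiny local library.
master = ["Shooting Star", "You Really Got Me", "Sweet Emotion"]
mine = {"You Really Got Me", "Sweet Emotion", "Beth"}
print(playable_matches(master, mine))
```

"Shooting Star" drops out because it isn't in the local library, and the remaining matches keep their relative order.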

Then you do that 8000 times, once for every song in my iTunes library.

They're clearly generating (at least) 100-song static playlists for each eligible song, since you can bump the number of songs up from 25 to 100 without making another call up to the servers. I now have a file called iTunes Library Genius.itdb which is roughly 5 times the size of my iTunes library, so the guts are residing on my laptop. The popularity or sales data seems to factor pretty highly since as my buddy Brian pointed out, most of the songs that bubble up in the playlist seem to be "the hits" or "the singles" (or maybe even the AllMusic Track Picks?).

They're clearly "ranking" the songs and trying to ensure that the best matches come back most often near the top (but not always). In the image below, I refreshed my "Dream On" playlist 5 times and "You Really Got Me" by Van Halen came back as the #2 song most of the time.

Got me

The rest of the playlist changes a lot, so it's interesting to see the best matches linger near the top most of the time. This kind of randomization is tough to get right (and I might argue that they don't have it spot-on just yet).
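One way to get this "best matches usually near the top, but not always" behavior is a weighted shuffle, where higher-ranked songs are more likely to be drawn early. This is purely a sketch of that idea; the decay parameter and the function name are my own inventions, and I have no idea what scheme Apple really uses.

```python
import random

def shuffled_playlist(ranked_titles, size=25, decay=0.8, rng=None):
    """Draw titles without replacement, favoring the top of the ranking.

    Intended for short lists (around 100 songs); with very long lists the
    decayed weights would shrink toward zero.
    """
    rng = rng or random.Random()
    pool = list(ranked_titles)
    weights = [decay ** i for i in range(len(pool))]
    playlist = []
    while pool and len(playlist) < size:
        # Pick an index with probability proportional to its weight.
        i = rng.choices(range(len(pool)), weights=weights)[0]
        playlist.append(pool.pop(i))
        weights.pop(i)
    return playlist

ranked = [f"Song {n}" for n in range(1, 11)]
print(shuffled_playlist(ranked, size=5))
```

A smaller decay keeps the order closer to the raw ranking; a decay near 1.0 approaches a plain shuffle. Getting that balance right is exactly the tricky part.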

They're doing some interesting stuff regarding what we call "Pruning" (making sure that no one artist totally dominates the list and making it less likely that two songs by the same artist get played back to back). When you build a playlist, only very rarely do you see two songs by the same artist "touching" each other, and even then it is often something like "Alison Krauss" tagged on one song and "Alison Krauss and Union Station" tagged on the next song.
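Here is a rough sketch of what that kind of pruning could look like: cap the number of songs per artist, then reorder greedily so the same artist doesn't play twice in a row when it can be avoided. The prune function and the max_per_artist value are assumptions for illustration only.

```python
from collections import Counter

def prune(tracks, max_per_artist=3):
    """tracks: list of (artist, title) tuples in ranked order."""
    # Step 1: cap how many songs any one artist can contribute.
    per_artist = Counter()
    remaining = []
    for artist, title in tracks:
        if per_artist[artist] < max_per_artist:
            per_artist[artist] += 1
            remaining.append((artist, title))

    # Step 2: greedily avoid back-to-back songs by the same artist.
    playlist = []
    while remaining:
        last_artist = playlist[-1][0] if playlist else None
        # Take the highest-ranked track by a different artist, if there is one.
        pick = next((t for t in remaining if t[0] != last_artist), remaining[0])
        remaining.remove(pick)
        playlist.append(pick)
    return playlist

ranked = [
    ("Aerosmith", "A1"), ("Aerosmith", "A2"), ("Aerosmith", "A3"),
    ("Aerosmith", "A4"), ("Van Halen", "V1"), ("Kiss", "K1"),
]
print(prune(ranked, max_per_artist=2))
```

Because this compares artist tags as exact strings, "Alison Krauss" and "Alison Krauss and Union Station" would count as two different artists, which would explain the occasional "touching" pair described above.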

I Will

This one is interesting in that the seed song (a cover of "I Will" by The Beatles) is very slow and quiet, and it dug out a solid handful of slow, quiet songs. These are indicated by the arrows next to the tracks I feel match roughly the same tempo and feel of the seed song.


The second list ("Little Liza Jane" an uptempo bluegrassy number) offers many more upbeat songs (as indicated by the arrows again). 11 slow songs in list one, 19 fast songs in list 2. That isn't likely to be random, since it seems pretty consistent when I refresh the lists.

Looking at a playlist from Simon & Garfunkel's "The Only Living Boy in New York" gets interesting since it is really a quiet folk-pop song, but also appeared on the "Alternative" soundtrack for 'Garden State'...the playlist is an interesting mix of alt-rock like Iron & Wine and the Shins, but also The Velvet Underground and the Beach Boys.


The fascinating thing is that it picked "Pale Blue Eyes" by the Velvets and "God Only Knows" by the Beach Boys -- two very quiet, slow, "Searching" and "Poignant" songs. Why didn't it grab the arguably more popular "Rock & Roll" or "California Girls"? This one gets 19 winners out of 24, with the other 5 still making some sense in my mind.

Not all is joy in geniusville though. Look at this stinker:


Good Lord in Heaven, what is happening here? New Order? Deee-Lite? It's like Aerosmith was sucked into some bizarro world. Even the other Aerosmith choices the Genius made are rotten. (Note this is from the rockin' original version of "Walk This Way", not the Run D.M.C. version.)

Doo Wop

"Sincerely" by the Moonglows brings back a pretty mediocre list, especially since I actually have a lot of old Doo Wop in my collection. Chuck Berry and Prince seem to show up in a whole lot of lists. It seems like Apple piled all of my mainstream R&B and Soul into a bucket and poured it out onscreen.

Wipe Out

"Wipe Out" by The Surfaris also brings back some weird choices (although "Teach Your Children" by CSNY going into "Girls, Girls, Girls" by Mötley Crüe is worth it just for the chuckle factor). I have a pretty good collection of surf rock and '60s pop that would have been more appropriate...I just wonder if they're so obscure that a lot of other iTunes users would not have them loaded up to the hive mind yet.


This one turned out pretty well. The Alt-Country song "Windfall" brings back a pretty solid list of twang-influenced rock songs. Even the Stones song, the Tom Petty song and the Byrds song have dusty back roads elements to them. With the exception of the (currently popular) Hold Steady song, bravo Genius.

Still, I can't help but think that the service would benefit from more in-depth track-level information. In working on the Tapestry project for AMG/Macrovision I see better results from our track-based information:


The Tapestry playlist for the Paul Simon song approaches it less from a "This appeared on the Garden State Soundtrack" perspective and more from a singer-songwriter perspective.


Some quiet Neil Young, the same Velvet Underground song, a Beatles song from Rubber Soul (which you won't see in any iTunes Genius playlist).

Walk THIS Way

The Tapestry playlist for "Walk This Way" fares much better than its Genius brother. Gone are the Deee-Lite and New Order, replaced with solid rock jams from that era and a couple of more modern rockers (Oasis, The Donnas).


"Sincerely" by the Moonglows brings back a really great list of old Doo Wop and street corner soul. Even the James Brown tunes are more from his passionate R&B days as opposed to his passionate funk era.


"Wipe Out" by The Surfaris gives a great list of surf tunes and summertime pop. This is one of those cases where having the data pre-built before launch avoids some of the "cold start" problems that iTunes is probably having with some of these more obscure tunes and styles.


Tapestry provides a comparable list of Alt-Country songs. I like some of the serendipity that the Genius playlist brought in, but there's nothing wrong with this list in my opinion.

(one more...just because we can:)

Ah yes, a little band called The Beatles. "Tomorrow Never Knows" is kind of a droning and haunting song, and Tapestry brings back a really solid list of some of the trippier and more elaborate stuff from that era, many of which feature somebody playing the Sitar. Ker-pow.


Recommendation and The Genius Sidebar

There is also a feature along the side where Genius tells you other songs that you should buy either relating to the artist you're listening to, or other top sellers in that artist's category. This is pretty scattershot in almost everybody's opinion, with only the occasional gem poking out of the dirt.


Final Thoughts:

Overall I am excited about this technology. It works OK now and will likely get better as it crunches more numbers. Still, I can't help but think that more specific track-level information would make it a whole lot better.

I think that the tech website Ars Technica put it best when they said: "Admittedly, Genius is a 1.0, and Apple has proven persistent when it comes to tweaking the iTunes experience. The rampant success of similar music-analysis projects like Pandora, and sheer amount of chatter about Genius since its introduction, shows that there is an obvious interest in automated systems that can get to know our music libraries better than we do. For the time being, though, Genius won't be near the top of the class until some more polish is applied and Apple can do something about all the false positives."
