last.fm is an internet radio and music suggestion service. Registered users can also use last.fm to 'scrobble' tracks they've been listening to. last.fm then keeps track of a user's statistics in terms of top artists, albums and tracks.

Luckily, last.fm also has an API which is accessible as soon as you get a key for it. Thanks to this API, there are a lot of cool web-based applications for last.fm.

Today, I want to show you a few little things we can do with this API using R. I used (and modified) the R package RLastFM by Greg Hirson (thanks again, Greg!) to access the API and get the information.

I had the idea to group countries based on the listening habits ('scrobbles') of the people living there. Hierarchical clustering is the way to go here, I guess. As a similarity measure, we could just use the number of overlapping artists in the top 50 artists of each country: the more artists two countries share, the smaller the distance between them.

First, we will need a function to access the API. This is just a convenience function wrapping the functions by Greg Hirson, which already work great.

library(RLastFM)

get.country.artists <- function (country) {
  geo.getTopArtists(country)$artist }

Now, we select some countries (I selected all OECD countries; that's kind of arbitrary, but it's a start). Note that the country names are defined by the ISO 3166-1 country names standard.

oecd.countries <- c("Belgium", "Denmark", "Germany", "France", "Greece", "Ireland", "Iceland", "Italy", "Canada", "Luxembourg", "Netherlands", "Norway", "Austria", "Portugal", "Sweden", "Switzerland", "Spain", "Turkey", "USA", "United Kingdom", "Japan", "Finland", "Australia", "New Zealand", "Mexico", "Czech Republic", "Korea, Republic of", "Hungary", "Poland", "Slovakia", "Chile", "Slovenia", "Israel", "Estonia")

Now, I access the last.fm API and put the results into a list.

countries <- sort(oecd.countries)

art.list <- list()
for (coun in countries) {
  cat(coun,"\n")
  art.list[[coun]] <- get.country.artists(coun) }
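
A single failed API call would abort the whole loop above. Here is a minimal sketch of how one could skip failing countries with tryCatch(). Note that fake.fetch() and safe.fetch() are hypothetical helpers I made up for illustration; fake.fetch() only stands in for get.country.artists() so the example runs without an API key.

```r
# Hypothetical stand-in for get.country.artists(), so this runs offline;
# it fails on purpose for one name to mimic a failed API call
fake.fetch <- function (country) {
  if (country == "Atlantis") stop("no data for ", country)
  paste("Artist", 1:3, "from", country) }

# Return NULL (and print a message) instead of stopping on an error
safe.fetch <- function (country, fetch = fake.fetch) {
  tryCatch(fetch(country),
           error = function (e) {
             message("Skipping ", country, ": ", conditionMessage(e))
             NULL }) }

toy.list <- list()
for (coun in c("Germany", "Atlantis", "France")) {
  toy.list[[coun]] <- safe.fetch(coun) }  # assigning NULL adds no entry
print(names(toy.list))
```

With the real function, one would pass fetch = get.country.artists. Countries that fail simply do not show up in the list.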

Afterwards, we need to create a distance matrix based on the number of overlapping artists of two countries. First, I define a function to intersect two artist lists:

intersect.countries <- function (country1.artists, country2.artists) {
  length(intersect(country1.artists, country2.artists)) }
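
To see what this overlap measure does, here is a small toy example. The two artist vectors are made up for illustration (they are not API results), and the lists have 5 entries instead of 50.

```r
# Toy top 5 lists (made up for illustration, not API data)
a <- c("Coldplay", "Radiohead", "Muse", "The Beatles", "Daft Punk")
b <- c("Coldplay", "Muse", "Air", "Phoenix", "The Beatles")

# Same helper as in the post: number of artists two lists share
intersect.countries <- function (country1.artists, country2.artists) {
  length(intersect(country1.artists, country2.artists)) }

overlap <- intersect.countries(a, b)  # Coldplay, Muse, The Beatles
toy.dist <- 1 - overlap / length(a)   # share of non-overlapping artists
print(c(overlap = overlap, distance = toy.dist))
```

Two identical lists get a distance of 0, two lists without any common artist get a distance of 1.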

Now, I use the function on every possible pair of countries, write the results into a matrix and convert this matrix into a distance matrix.

# Distance between two countries: 1 - (shared artists / 50),
# so identical top 50 lists get 0 and completely different lists get 1
result.mat <- c()
for (coun in countries) {
  new.vec <- c()
  for (i in seq_along(countries)) {
    new.dist <- 1 - (intersect.countries(art.list[[coun]], art.list[[countries[i]]]) / 50)
    new.vec <- c(new.vec, new.dist) }
  result.mat <- rbind(result.mat, new.vec) }
colnames(result.mat) <- countries
rownames(result.mat) <- countries
dists <- as.dist(result.mat, diag = TRUE, upper = TRUE)
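
The nested loop above grows its vectors step by step. The same kind of matrix can also be built with two nested sapply() calls. This is just a sketch on toy data: the three lists and the list length n.top are assumptions I made up for illustration.

```r
# Toy "top 4" lists for three made-up countries (not API data)
toy.list <- list(
  A = c("a1", "a2", "a3", "a4"),
  B = c("a1", "a2", "b3", "b4"),
  C = c("c1", "c2", "c3", "a4"))
n.top <- 4  # length of each list (50 in the post)

# One column per country, one row per country; names come from the list
toy.mat <- sapply(toy.list, function (x) {
  sapply(toy.list, function (y) 1 - length(intersect(x, y)) / n.top) })
toy.dists <- as.dist(toy.mat)
print(toy.mat)
```

Because the list is named, the row and column names of the matrix are set automatically.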

Now, I'm doing the hierarchical clustering. I'm choosing the Ward method.

# Since R 3.1.0, the method formerly called "ward" is named "ward.D"
dists.clust <- hclust(dists, method = "ward.D")
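
If you want the group memberships as data and not just as a picture, cutree() cuts the tree into a given number of groups. Here is a small sketch with a made-up distance matrix (the five "countries" A to E and their distances are assumptions for illustration). Recent R versions call the method used here "ward.D".

```r
# Toy distances between five made-up "countries" (not API data)
toy.mat <- matrix(c(0.0, 0.1, 0.2, 0.9, 0.8,
                    0.1, 0.0, 0.3, 0.9, 0.9,
                    0.2, 0.3, 0.0, 0.8, 0.9,
                    0.9, 0.9, 0.8, 0.0, 0.1,
                    0.8, 0.9, 0.9, 0.1, 0.0),
                  nrow = 5, dimnames = list(LETTERS[1:5], LETTERS[1:5]))

toy.clust <- hclust(as.dist(toy.mat), method = "ward.D")
groups <- cutree(toy.clust, k = 2)  # cut the tree into two groups
print(groups)
```

A, B and C end up in one group, D and E in the other. The same call on dists.clust would give a group number for each country.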

And now for the plot (finally!)...

plot(dists.clust, main = "Clustering Dendrogram, Method: Ward", xlab = "Similarities based on number of overlapping artists in top 50 artists", sub = "", cex = 0.9)


It makes sense, doesn't it? Countries with many overlapping artists in the top 50 share one branch of the clustering tree. Other groups of countries are 'clustering in' later. In the right-most branch, large portions of Scandinavia (except Iceland) are clustering together. For some countries, I don't have an explanation (Iceland and Portugal?).

Currently, I'm experimenting with some visualization techniques using the nice R maps package.

last.fm also supplies metro charts, i.e., separate charts for specific cities. Let's play around with them. First, we're going to need some new functions (these are adaptations from the RLastFM package, and you're going to need to insert your own API key to make them work).

library(RCurl)  # getForm()
library(XML)    # xmlParse(), xpathSApply()

# Names of all metros last.fm knows for a country
get.all.metros <- function (country, lastapi = RLastFM:::baseurl) {
  xpathSApply(xmlParse(getForm(lastapi, method = "geo.getMetros", country = country,
                               api_key = "<your_key_here>"), asText = TRUE),
              "//metro/name", xmlValue) }

# Parse the XML response of geo.getMetroArtistChart
# (note: the 'playcount' element holds the listeners count)
p.geo.getMetroArtistChart <- function (f) {
  doc <- xmlParse(f, asText = TRUE)
  list(artist = xpathSApply(doc, "//artist/name", xmlValue),
       playcount = xpathSApply(doc, "//artist/listeners", xmlValue)) }

# Artist chart (up to n artists) for one metro
get.metro.artist <- function (metro, country = "germany", n = 100) {
  p.geo.getMetroArtistChart(
    getForm(RLastFM:::baseurl,
            method = "geo.getMetroArtistChart",
            country = country,
            metro = metro,
            limit = n,
            api_key = "<your_key_here>")) }

Now, let's use them to extract all metros supported in Germany and France. Afterwards, build two lists with metro charts.

de.metros <- get.all.metros(country = "germany")
fr.metros <- get.all.metros(country = "france")

# Fetch the artist chart for every metro; the result is a list named by metro
build.metro.chart.list <- function (metros, country) {
  metro.chart.list <- list()
  for (metro in metros) {
    cat(metro, "\n")
    metro.chart.list[[metro]] <- get.metro.artist(metro, country = country) }
  metro.chart.list }

de.metro.charts <- build.metro.chart.list(de.metros, "germany")
fr.metro.charts <- build.metro.chart.list(fr.metros, "france")

Now, load the maps package and the dataset of cities that comes with it. Then, draw Germany and France.

library(maps)
data(world.cities)
map(database = "world", regions = c("Germany", "France"), exact = T)


Here comes the fun part: look up each metro in world.cities and write the top artist of each metro at the location of the city (under the city's name). Please note that there are two Frankfurts and two Lilles in world.cities, so I have to select the correct ones.

for (city in names(de.metro.charts)) {
  city.info <- world.cities[world.cities$name == city,]
  if (nrow(city.info) == 0) next  # metro not found in world.cities
  if (city == "Frankfurt") city.info <- city.info[1,]  # first of the two Frankfurts
  text(x = city.info$long, y = city.info$lat, labels = city.info$name, cex = .6)
  # Top artist in red, slightly below the city name
  text(x = city.info$long, y = city.info$lat - 0.25,
       labels = de.metro.charts[[city]]$artist[1],
       col = "#FF0000FF", cex = .6)
}

for (city in names(fr.metro.charts)) {
  city.info <- world.cities[world.cities$name == city,]
  if (nrow(city.info) == 0) next  # metro not found in world.cities
  if (city == "Lille") city.info <- city.info[2,]  # second of the two Lilles
  text(x = city.info$long, y = city.info$lat, labels = city.info$name, cex = .6)
  text(x = city.info$long, y = city.info$lat - 0.25,
       labels = fr.metro.charts[[city]]$artist[1],
       col = "#FF0000FF", cex = .6)
}
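
Picking duplicates by row number works, but it depends on the row order of the data. A possible alternative is to filter by country as well and, if there is still more than one hit, take the largest city. The sketch below uses a toy data frame with the same column names as world.cities (name, country.etc, pop, lat, long); all values in it are rough and made up for illustration, and find.city() is a hypothetical helper, not part of the maps package.

```r
# Toy stand-in with the same column names as world.cities (values made up)
toy.cities <- data.frame(
  name        = c("Lille", "Lille", "Paris"),
  country.etc = c("Belgium", "France", "France"),
  pop         = c(15000, 230000, 2100000),
  lat         = c(51.2, 50.6, 48.9),
  long        = c(4.8, 3.1, 2.3),
  stringsAsFactors = FALSE)

# Hypothetical helper: match name and country, keep the most populous hit;
# returns a data frame with zero rows if nothing matches
find.city <- function (city, country, cities = toy.cities) {
  hits <- cities[cities$name == city & cities$country.etc == country, ]
  hits[which.max(hits$pop), ] }

hit <- find.city("Lille", "France")
print(hit)
```

With the real data, one would pass cities = world.cities. If both cities with the same name are in the same country, the population tie-break decides, which may or may not be the city you want.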



So much for today, I'm too shocked by Coldplay all over Germany to go on :)


