If you're interested in the visualisation of networks or graphs, you might've heard of the great package "visNetwork". I think it's a really great package and I love playing around with it. The scenarios of graph-based analyses are many and diverse: whenever you can describe your data in terms of "outgoing" and "receiving" entities, a graph-based analysis and/or visualisation is possible. During my work as a linguist, I already used graphs for different purposes like linking-structures within dictionaries, visualising co-occurence patterns of words and so on.

Today, I want to show you something completely different: transfers of male football players in the German "1. Bundesliga", the first division of male football in Germany. We can also describe this data in terms of outgoing and receiving entities (the nodes in the network): the clubs who are selling the players and the clubs who are buying the players. The edges (connections) within the network are the players themselves. And there are further attributes associated with the edges, e.g. the price of the player.

I'll spare you the boring details of getting the data (please write a comment if you would like more details on that). I start with the raw data structure created by the scraping process. It's a dataframe called transfer.df that looks like this:


"abloese" (or "Ablöse") means "transfer fee" and currently holds a string with certain codes and the currency: "ablösefrei" means that no transfer fee had to be payed (remember the famous Bosman ruling?). "-" means that the information on a transfer fee doesn't make sense (e.g., when a player finishes his career). "?" means that no information about the transfer fee is available. "Mio." and "Tsd." just encodes "million" or "thousand", we have to deal with that later.

But we have to take care of something else first. In the dataframe, a player always appears twice if he changed teams within the 1. Bundesliga: the first time for his old club (as outgoing) and the second time for his new club (as incoming). I dealt with it this way (also, I loaded the required packages):

library(visNetwork)
library(igraph)
library(stringr)


transfer.df2 <- data.frame()
all.players <- unique(transfer.df$player)
for (pi in all.players) {
  vork <- grep(pi, transfer.df$player, fixed = T)
  if (length(vork) == 1) {
    transfer.df2 <- rbind(transfer.df2, transfer.df[vork,])
  } else {
    transfer.df2 <- rbind(transfer.df2, transfer.df[vork[1],])
  }
}

This basically means that, whenever a player appears more than once in transfer.df, the first appearance is kept and the second appearance is deleted. The resulting network wouldn't be any different if we would keep the second appearance. So, now we are using transfer.df2 as our data structure.

Now, we have to deal with the "abloese" (transfer fee) column:

transfer.df2$abloese.num <- sapply(transfer.df2$abloese, USE.NAMES = F, FUN = function (x) {
  if (x %in% c("-", "?")) NA else {
    if (x == "ablösefrei") 0 else {
      mio <- grepl("Mio.", x, fixed = T)
      tsd <- grepl("Tsd.", x, fixed = T)
      x2 <- gsub(",", ".", x, fixed = T)
      x3 <- gsub("Mio. €", "", x2, fixed = T)
      x4 <- as.numeric(str_trim(gsub("Tsd. €", "", x3, fixed = T)))
      if (mio) x4*1000000 else {
        if (tsd) x4*1000 else { "FEHLER" }
      }
    }
  }

})

Basically, this is what we are doing:

  • If abloese is "-" or "?", we are using NA
  • If abloese is "ablösefrei" we are putting in 0
  • Then we see whether "Mio." appears in the string.
  • Then we see whether "Tsd." appears in the string.
  • Then we are deleting these substrings and the EUR sign and
  • trim the string and convert it to a numeric value.
  • If "Mio." appeared in the string, we are multiplying the result with one million and if "Tsd." appeared in the string, we are multiplying the result with one thousand (both will never appear in the string, it doesn't make sense).
Alright, now we have the column abloese.num and can move on to group the different transfer fees because we want to assign different colours to the edges in the network dependent on the transfer sum. The thresholds are arbitrary.

transfer.df2$abl.group <- cut(transfer.df2$abloese.num, c(0, 200*1000, 1000*1000, 2000*1000, 5000*1000, 10000*1000, 60000*1000), include.lowest = T)

transfer.df2$abl.col <- ifelse(transfer.df2$abloese.num == 0, "green",
                               ifelse(transfer.df2$abl.group == "[0,2e+05]", "#ffffcc",
                                      ifelse(transfer.df2$abl.group == "(2e+05,1e+06]", "#fed976",
                                             ifelse(transfer.df2$abl.group == "(1e+06,2e+06]", "#feb24c",
                                                    ifelse(transfer.df2$abl.group == "(2e+06,5e+06]", "#fc4e2a",
                                                           ifelse(transfer.df2$abl.group == "(5e+06,1e+07]", "#e31a1c",
                                                                  ifelse(transfer.df2$abl.group == "(1e+07,6e+07]", "#800026", "grey")))))))
transfer.df2$abl.col <- ifelse(is.na(transfer.df2$abl.group), "grey", transfer.df2$abl.col) 

Now, I am converting the dataframe to an igraph object and this object to visNetwork object. I'm sure the igraph step could be skipped, but this works like a charm and doesn't take much time.

graph <- graph.data.frame(transfer.df2)

vn <- toVisNetworkData(graph)

I am assigning color codes to the nodes:

vn$nodes$color <- ifelse(vn$nodes$id %in% clubs, "tomato",
                         ifelse(vn$nodes$id == "Vereinslos", "green",
                                ifelse(vn$nodes$id == "Karriereende", "blue", "grey")))

All clubs in the 1. Bundesliga get "tomato" (clubs is an object I defined earlier) all clubs that are not in the 1. Bundesliga (e.g., Hamburger SV) get "grey". There are two other special "clubs": "Karriereende" for "end of career" and "Vereinslos" for "no club", both get "green".

Three things left to be done:

vn$edges$title <- paste(vn$edges$player, vn$edges$abloese, sep = " - ")
vn$edges$color <- vn$edges$abl.col
vn$edges$width <- 4
  • Assign a title to the edges that consists of the player name and the transfer fee. This appears upon hovering the edge.
  • Assign the grouped transfer fee color we defined earlier.
  • Increase the width of the edges to make the color more visible.
No, for creating the HTML file for the graph:

visNetwork(nodes = vn$nodes, edges = vn$edges, height = "1000px", width = "100%") %>%
  visOptions(highlightNearest = TRUE) %>%
  #visIgraphLayout(layout = "layout_with_dh") %>%
  visEdges(arrows = "to", arrowStrikethrough = F) %>% visSave(file = "~/Desktop/transfers.html", selfcontained = T)

Please visit my personal webspace for the final result. The "redder" an edge in the network is, the more expensive the transfer was. You can also click on the nodes to only highlight all adjacent nodes (selling and buying clubs), drag nodes around (graph physics!) and hover over edges to see the specific player being transfered. Of course, zooming is enabled. visNetwork does all that. I love that package!

12

View comments

  1. This looks really cool. I am wondering whether I could visualize a bipartite network with visNetwork pack?

    ReplyDelete
    Replies
    1. Hi, thanks for your comment. Please try adding this line in the final creation sequence of pipes:

      visIgraphLayout(layout = "layout_as_bipartite") %>%

      If your network is bipartite, this layout function should work.

      Delete
  2. thanks for your post, really inspiring

    ReplyDelete
  3. Thanks for nice post! That is really amazing!I am curious (bot bored) about how you acquire all the data for your analyzing. Could you please write more about this? Did you collect them programmatically or is there any place to download the datafile(s)?

    ReplyDelete
    Replies
    1. Hi! Thanks for your comment. Actually, I took the data from a tabular overview on transfermarkt.de. This is a popular (commercial) website in Germany which is collecting information on all players and (rumors about) transfers. The table itself was not very good to grab automatically, so I copied it to a text file, made some minor alterations and then read this text table in to R. If you want to have this data and the code for reading it, please send me a message to sascha.wolfer[]posteo.de

      Delete
    2. Hi, many thanks for your answer! It would be nice if you can send it. I will send you an email.

      Delete
  4. Hello. Thanks a lot. I had studied about network analysis and your paper have been of grat help for me.

    ReplyDelete
  5. Fantastic works!! This really helps me. Could you share more information? In the final result, whenever mouse moves on a node, player and aloese data will be shown next to the node. It's an elegant and easy way to show details in each node. I've tried visEvent function but I can't get the samw output as your result. How can you do that? Thank you!

    ReplyDelete
    Replies
    1. Hi, thanks for your comment. Actually, the mouse-over label with player and transfer fee is not attached to the NODE. It's attached to the EDGE - and it is set with this line of code:

      vn$edges$title <- paste(vn$edges$player, vn$edges$abloese, sep = " - ")

      Delete
    2. It works!Thank you very much!
      By the way, I found that "visTree" function for tree-based learning result can do similar works and the layout of dialog frame can list multiple feaures with nice arrangement.(see https://datastorm-open.github.io/visNetwork/tree.html)
      Is there any way to get similar result base on regular network?
      Sorry for more questions, thanks!

      Delete
    3. Oh, wow. That really looks quite nice. Of course, you can also 'enhance' a regular network with a lot of these features. However, it heavily depends on which information you want to include in your network visualization. Basically, all features of the graph (e.g. color and size of nodes and edges) are customizable. I guess you have to comb through the documentation of visNetwork to achieve this. You could start here: https://datastorm-open.github.io/visNetwork/nodes.html and go through the following pages.

      Delete
    4. I will try more on these works!! Thank you!!

      Delete
I've been using the ggplot2 package a lot recently. When creating a legend or tick marks on the axes, ggplot2 uses the levels of a character or factor vector. Most of the time, I am working with coded variables that use some abbreviation of the "true" meaning (e.g.
png("goodbye.png", height = 625, width = 500)

par(col = "purple")

plot(1, 1, xlim = c(0,800), ylim = c(0,1600), type = "n", bty = "n", xaxt = "n", yaxt = "n", xlab = "", ylab = "")

symbols(x = 400, y = 1200, circles = 400, add = T, lwd = 40)

lines(x = c(400, 400), y = c(900, 100), lwd = 40, lend
R is great, and you can do a LOT OF stuff with it.

However, sometimes you want to do really basic stuff with huge or a lot of files. At work, I have to do that a lot because I am mostly dealing with language data that often needs some pre-processing.
I work with R on both Mac OS and Windows. On Windows, you get the option to copy the path of a file or folder by holding Shift while right-clicking on the file or folder. As useful as this feature is, it copies paths to your clipboard in Windows format, e.g.
I recently encountered some functionality in R which most of you might already know. Nevertheless, I want to share it here, because it might come in handy for those of you who do not know this yet.

Suppose you want to read in a large number of very large text tables in R.
I used knitr to hack together a very short tutorial about XML in R.

It's in German. And it's not very long. But, hey, it's free :)

I hope it can be of help to someone who wants to get started with XML processing in R.

Please feel free to post or send any comments about the thing.
As long as I can't find the time to post my newest adventuRes, why don't you check out the great collection of other R-blogs on the web:

www.r-bloggers.com 

Have fun!
Just a fast note: I came across the R-package "knitr" which enables you to generate PDF files by mixing LaTeX and R code in one document. The result looks very nice and is great to create documentations, manuals and so on.
Blog Archive
BlogRoll
BlogRoll
  • Thanks, on its way to CRAN The generic seal of approval from the CRAN team – countless hours spent tabbing between R CMD check and R CMD build logs, ‘Writing R Extensions’ and Stac...

    6 hours ago
Loading
Dynamic Views theme. Powered by Blogger. Report Abuse.