Alright, now we have all the data we need in one dataframe. For the code below to work, you need to have run the code from Part 1, which creates the dataframe big.tab.

Everything presented here is based on data from 18/10/2012. You can rerun the analysis with current data, or I can do it at some point later in the season.

Let's plot some stuff. How about the old German saying about soccer, "Geld schießt keine Tore" (money doesn't score goals)? Let's look into this.


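# Set up an empty plot (type = "n") without axes; team names are drawn instead of points
plot(big.tab$Value, big.tab$Goals.for, type = "n", axes = FALSE, xlab = "Value", ylab = "Goals")
# Place each team name at (value, goals) in semi-transparent grey
text(x = big.tab$Value, y = big.tab$Goals.for, labels = big.tab$Team, cex = 0.7, col = "#65656599")
# Add the x and y axes by hand
axis(side = 1)
axis(side = 2)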

We get this...
Sorry for the overlapping team names. But you get the gist: It looks like the value of a team covaries with the number of goals for that team.

Now we add a regression line, that is, we predict the number of goals a team has scored from the value of that team. We also add the Pearson correlation coefficient (r) and its associated p value as the subtitle of the plot. We get this...
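The code for this second plot is not shown here, so the following is only a minimal sketch of how it could be done in base R. It assumes that big.tab from Part 1 has the columns Value, Goals.for and Team used in the first plot; the function name value.goals.plot and the subtitle format are my own choices for illustration.

```r
# Hypothetical helper (not from the original post): redraws the value/goals
# plot with a dashed red regression line and "r = ..., p = ..." as subtitle.
# Assumes the columns Value, Goals.for and Team exist in 'tab'.
value.goals.plot <- function(tab) {
  fit <- lm(Goals.for ~ Value, data = tab)     # predict goals from team value
  ct <- cor.test(tab$Value, tab$Goals.for)     # Pearson r and its p value
  sub.txt <- sprintf("r = %.2f, p = %.3f", ct$estimate, ct$p.value)
  plot(tab$Value, tab$Goals.for, type = "n", axes = FALSE,
       xlab = "Value", ylab = "Goals", sub = sub.txt)
  text(x = tab$Value, y = tab$Goals.for, labels = tab$Team,
       cex = 0.7, col = "#65656599")
  axis(side = 1)
  axis(side = 2)
  abline(fit, lty = 2, col = "red")            # the dashed red "best guess" line
  invisible(list(fit = fit, test = ct))        # return model and test for later use
}

# Only draw the plot if big.tab from Part 1 is available in the workspace
if (exists("big.tab")) value.goals.plot(big.tab)
```

cor.test() uses the Pearson method by default, and abline() accepts the fitted lm object directly, so no manual extraction of intercept and slope is needed.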
How do we interpret this? There are several conclusions that could be drawn.

(1) The value of a team in the English Premier League is reliably correlated with the number of goals that team has scored in the league so far (after 7 games). Beware: Correlation does not imply causation.

(2) The "best guess" for predicting the number of goals from the value of a team is visualized by the dashed red line in the second plot. This means that there are teams that "over-perform" (above the line) and teams that "under-perform" (below the line) in relation to their value. Fulham FC, for example, scored way "too many" goals given its value. Liverpool FC, on the other hand, should have scored more goals, because they are below the red line.

(3) One could infer from this plot that it is quite difficult for very valuable teams (e.g., ManU, ManCity and Chelsea FC) to over-perform, because the regression line rises so steadily. They have to score a great many goals to end up above their predicted level on the regression line.
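The "over-" and "under-performers" from point (2) can also be read off numerically, as the residuals of the regression. This is just a sketch under the same assumptions as before (big.tab with the columns Value, Goals.for and Team); the helper name performance.rank is made up for this example.

```r
# Hypothetical helper (not from the original post): ranks teams by their
# distance from the regression line. Positive residual = more goals than
# predicted from the team's value, negative residual = fewer.
performance.rank <- function(tab) {
  # na.exclude keeps the residual vector aligned with the rows of 'tab'
  fit <- lm(Goals.for ~ Value, data = tab, na.action = na.exclude)
  out <- data.frame(Team = tab$Team, Residual = unname(residuals(fit)))
  out[order(out$Residual, decreasing = TRUE), ]  # biggest over-performers first
}

# Only print the ranking if big.tab from Part 1 is available in the workspace
if (exists("big.tab")) print(performance.rank(big.tab))
```

Teams at the top of this table sit furthest above the red line, teams at the bottom furthest below it.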

By the way: This also works quite well for the value of a team and the points it has earned in the league (win = 3 points, draw = 1 point).
In the next post, I will do some more analyses and plots with this dataset. And I will try to compare different European leagues.



