Alright, now we have all the data we need in one data frame. The code below assumes that you ran the code from Part 1, because we need the data frame big.tab.

Everything presented here is based on the data from 18/10/2012. You can rerun the analysis with current data, or I can do it at some point later in the season.

Let's plot some stuff. How about the old German saying about soccer, "Geld schießt keine Tore" (money doesn't score goals)? Let's look into this.


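# Set up an empty plot (type = "n") so team names can be drawn instead of points
plot(big.tab$Value, big.tab$Goals.for, type = "n", axes = FALSE, xlab = "Value", ylab = "Goals")
# Write each team name at its (value, goals) position in semi-transparent grey
text(x = big.tab$Value, y = big.tab$Goals.for, labels = big.tab$Team, cex = 0.7, col = "#65656599")
# Draw the x and y axes by hand, since axes = FALSE suppressed them
axis(side = 1)
axis(side = 2)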

We get this: a scatterplot of team value against goals scored, with the team names as labels.
Sorry for the overlapping team names. But you get the gist: it looks like the value of a team covaries with the number of goals that team has scored.

Now we add a regression line. That is, we predict the number of goals a team has scored from the value of that team. We also put the Pearson correlation coefficient (r) and its associated p-value in the subtitle of the plot. We get the same scatterplot again, this time with a dashed red regression line.
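A minimal sketch of how the regression line and the subtitle could be added. This is not necessarily the exact code behind the plot. It assumes a data frame with the columns Value, Goals.for and Team (like big.tab) and no missing values; the function name draw.value.goals is made up for illustration.

```r
# Sketch: value vs. goals with a fitted line and r / p in the subtitle.
# Assumes columns Value, Goals.for and Team, as in big.tab from Part 1.
draw.value.goals <- function(tab) {
  fit <- lm(Goals.for ~ Value, data = tab)       # simple linear regression
  ct <- cor.test(tab$Value, tab$Goals.for)       # Pearson correlation is the default
  sub.text <- sprintf("r = %.2f, p = %.3f", ct$estimate, ct$p.value)

  plot(tab$Value, tab$Goals.for, type = "n", axes = FALSE,
       xlab = "Value", ylab = "Goals", sub = sub.text)
  text(x = tab$Value, y = tab$Goals.for, labels = tab$Team,
       cex = 0.7, col = "#65656599")
  axis(side = 1)
  axis(side = 2)
  abline(fit, lty = 2, col = "red")              # dashed red regression line

  # Return the model and the test invisibly so they can be inspected later
  invisible(list(fit = fit, test = ct))
}
```

Calling draw.value.goals(big.tab) should draw the plot; assigning the result, as in res <- draw.value.goals(big.tab), also gives you summary(res$fit) and res$test.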
How do we interpret this? There are several conclusions that could be drawn.

(1) The value of a team in the English Premier League is reliably correlated with the number of goals that team has scored this season so far (after 7 games). Beware: correlation does not imply causation.

(2) The "best guess" for predicting the number of goals from the value of a team is visualized by the dashed red line in the second plot. This means that there are teams that "over-perform" and teams that "under-perform" in relation to their value. Fulham FC, for example, scored way "too many" goals given its value. Liverpool FC, on the other hand, should have scored more goals given its value, because it lies below the red line.

(3) One could infer from this plot that it is quite difficult for very valuable teams (e.g., ManU, ManCity and Chelsea FC) to over-perform, because the predicted number of goals keeps rising with value. So they have to score a lot of goals to get above their level on the regression line.
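The over- and under-performers from point (2) can also be read off the residuals of the regression instead of the plot. Here is a rough sketch, again assuming a data frame with the columns Team, Value and Goals.for (like big.tab) and no missing values; rank.performance is a made-up name.

```r
# Sketch: rank teams by how far they sit above or below the regression line.
rank.performance <- function(tab) {
  fit <- lm(Goals.for ~ Value, data = tab)
  out <- data.frame(
    Team = tab$Team,
    Expected = fitted(fit),      # goals predicted from value alone
    Residual = residuals(fit)    # actual minus predicted goals
  )
  # Positive residuals first: teams that scored more than their value predicts
  out[order(out$Residual, decreasing = TRUE), ]
}
```

head(rank.performance(big.tab)) would then list the teams furthest above the line, and tail() the ones furthest below it.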

By the way: this also works quite well for the value of a team and the points it has collected in the league (win = 3 points, draw = 1 point).
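If your table only has wins and draws rather than a points column, the points rule above is easy to compute. A small sketch: the helper league.points and the column names Wins and Draws are assumptions for illustration and may not match what big.tab actually contains.

```r
# Sketch: league points from wins and draws (win = 3 points, draw = 1 point).
league.points <- function(wins, draws) {
  3 * wins + 1 * draws
}

# Hypothetical usage, assuming big.tab had columns named Wins and Draws:
# cor.test(big.tab$Value, league.points(big.tab$Wins, big.tab$Draws))
```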
In the next post, I will do some more analyses and plots with this dataset, and I will try to compare different European leagues.




