Alright, now we have all the data we need in one dataframe. To run the code below, you need to have run the code from Part 1 first, because we need the dataframe big.tab.

All the data presented here is based on the data from 18/10/2012. You can rerun the analysis with current data, or I might do it at some point later in the season.

Let's plot some stuff. How about the old German saying about soccer, "Geld schießt keine Tore" (money doesn't score goals)? Let's look into this.


# Empty plot (type = "n"); team names and axes are added below
plot(big.tab$Value, big.tab$Goals.for, type = "n", axes = FALSE, xlab = "Value", ylab = "Goals")
text(x = big.tab$Value, y = big.tab$Goals.for, labels = big.tab$Team, cex = 0.7, col = "#65656599")
axis(side = 1)
axis(side = 2)

We get this...
Sorry for the overlapping team names. But you get the gist: it looks like the value of a team covaries with the number of goals that team has scored.

Now we add a regression line. This means we predict the number of goals a team has scored from the value of that team. We also add the Pearson correlation coefficient (r) and its associated p value to the subtitle of the plot. We get this...
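Here is a minimal sketch of how the regression line and the subtitle could be added. It is my reconstruction under a few assumptions, not necessarily the exact code behind the plot: it uses lm() for the line and cor.test() for r and p, and the red dashed line style follows the description further down. If big.tab from Part 1 is not in your workspace, the sketch falls back on a small made-up stand-in with hypothetical teams and numbers, so that it runs on its own.

```r
# Fallback ONLY: a made-up stand-in for big.tab (hypothetical teams and
# numbers, NOT the real data from Part 1), so the sketch runs by itself.
if (!exists("big.tab")) {
  big.tab <- data.frame(
    Team      = c("Team A", "Team B", "Team C", "Team D", "Team E", "Team F"),
    Value     = c(40, 65, 90, 150, 260, 380),   # arbitrary units
    Goals.for = c(6, 11, 7, 10, 13, 17),
    stringsAsFactors = FALSE
  )
}

# Linear model: predict goals scored from team value
fit <- lm(Goals.for ~ Value, data = big.tab)

# Pearson correlation between value and goals, with its p value
ct <- cor.test(big.tab$Value, big.tab$Goals.for, method = "pearson")

# Same plot as before, now with r and p in the subtitle
plot(big.tab$Value, big.tab$Goals.for, type = "n", axes = FALSE,
     xlab = "Value", ylab = "Goals",
     sub = sprintf("r = %.2f, p = %.3f", ct$estimate, ct$p.value))
text(x = big.tab$Value, y = big.tab$Goals.for, labels = big.tab$Team,
     cex = 0.7, col = "#65656599")
axis(side = 1)
axis(side = 2)

# Regression line as a dashed red line
abline(fit, lty = 2, col = "red")
```

abline() accepts the fitted lm object directly and draws the line from its intercept and slope, so there is no need to compute predicted values by hand.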
How do we interpret this? There are several conclusions that could be drawn.

(1) The value of a team in the English Premier League is reliably correlated with the number of goals that team has scored in the season so far (after 7 games). Beware: correlation does not imply causation.

(2) The "best guess" when predicting the number of goals from the value of a team is visualized by the dashed red line in the second plot. This means that there are teams who "over-perform" (above the line) and teams who "under-perform" (below the line) in relation to their value. Fulham, for example, scored way "too many" goals given its value. Liverpool, on the other hand, should have scored more goals, because they are below the red line.

(3) One could infer from this plot that it is quite difficult for very valuable teams (e.g., Manchester United, Manchester City and Chelsea) to over-perform, since the regression line rises so steadily. So they have to score a lot of goals to get above their level on the regression line.
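The "over-performing" and "under-performing" from point (2) can be put into numbers with the residuals of the regression, i.e. the actual goals minus the goals predicted from the team's value. The following is a minimal sketch, not part of the original analysis. The column name Residual is my own choice, and the made-up stand-in data (hypothetical teams and numbers) is only used if big.tab from Part 1 is not available.

```r
# Fallback ONLY: a made-up stand-in for big.tab (hypothetical teams and
# numbers, NOT the real data from Part 1), so the sketch runs by itself.
if (!exists("big.tab")) {
  big.tab <- data.frame(
    Team      = c("Team A", "Team B", "Team C", "Team D", "Team E", "Team F"),
    Value     = c(40, 65, 90, 150, 260, 380),   # arbitrary units
    Goals.for = c(6, 11, 7, 10, 13, 17),
    stringsAsFactors = FALSE
  )
}

# na.exclude keeps the residuals aligned with the rows of big.tab,
# even if some teams have missing values
fit <- lm(Goals.for ~ Value, data = big.tab, na.action = na.exclude)

# Residual = actual goals minus predicted goals
# positive: above the line ("over-performing")
# negative: below the line ("under-performing")
big.tab$Residual <- residuals(fit)

# Sort teams from strongest over-performer to strongest under-performer
ranked <- big.tab[order(big.tab$Residual, decreasing = TRUE),
                  c("Team", "Value", "Goals.for", "Residual")]
print(ranked)
```

Teams at the top of this table sit furthest above the dashed line, teams at the bottom furthest below it.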

By the way: this also works quite well for the value of a team and the points it has earned so far this season (win = 3 points, draw = 1 point).

In the next post, I will do some more analyses and plots with this dataset. And I will try to compare different European leagues.



