I recently encountered some functionality in R which most of you might already know. Nevertheless, I want to share it here, because it might come in handy for those of you who do not know this yet.

Suppose you want to read a large number of very large text tables into R. There is the great function fread() in the data.table package, which is really fast at reading those large tables. However, it is still under development and sometimes fails (e.g., if an entry contains unbalanced quotes).

I guess this will be fixed in the future. In the meantime, I wrote a little function which catches the error and tries something else.

The following function reads in a file (I stored it on some private webspace for you if you want to try this out) with fread().
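The post's own function is not shown in this excerpt, so here is only a minimal sketch of the idea: try fread() first and fall back to something slower if it throws an error. The function name read_table_safely, the tab separator and the choice of read.table() with quoting disabled as the fallback are my assumptions, not the original code.

```r
# Hypothetical helper: try the fast reader first, fall back on error.
read_table_safely <- function(path, sep = "\t") {
  tryCatch(
    # Fast path: fread() from data.table, returning a plain data.frame.
    data.table::fread(path, sep = sep, data.table = FALSE),
    error = function(e) {
      # Report why the fast path failed, then use the slower base reader.
      message("fread() failed (", conditionMessage(e),
              "); falling back to read.table()")
      # quote = "" switches off quote handling, so unbalanced quotes
      # in an entry no longer break the import.
      utils::read.table(path, header = TRUE, sep = sep, quote = "",
                        stringsAsFactors = FALSE)
    }
  )
}

# Usage on a small temporary file (toy data, for illustration only).
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tname", "1\talpha", "2\tbeta"), tmp)
tab <- read_table_safely(tmp)
```

Because the fallback lives in the error handler, it only costs time for the files on which fread() actually fails.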

The biggest German railway company, the 'Deutsche Bahn', is the subject of frequent emotional discussions about its trains being late all the time. A big German newspaper, the Süddeutsche Zeitung, built the so-called 'train monitor' (Zugmonitor). The data is (or was) made available in cooperation with OpenDataCity: http://www.opendatacity.de/zugmonitor-api/

This API provided information about trains up until September 29th, 2013. After that, no data is available because the Deutsche Bahn changed its system.

I used knitr to hack together a very short tutorial about XML in R.

It's in German. And it's not very long. But, hey, it's free :)

I hope it can be of help to someone who wants to get started with XML processing in R.

Please feel free to post or send any comments about the thing. If it is actually of use to someone I'll consider extending it.

Which function rbinds dataframes together fastest?

First competitor: classic rbind in a for loop over a list of dataframes

Second competitor: do.call("rbind", <list of dataframes>)

Third competitor: rbind.fill(<list of dataframes>) from the plyr package
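The three competitors can be sketched roughly as follows. This is not the benchmark code from the post: the toy data, its size and the helper name rbind_loop are assumptions made for illustration, and rbind.fill() is only timed if plyr is installed.

```r
set.seed(1)

# Toy data: one splitting factor plus three columns of normal random data.
df <- data.frame(g = factor(rep(1:10, each = 100)),
                 x = rnorm(1000), y = rnorm(1000), z = rnorm(1000))
dfs <- split(df, df$g)  # list of 10 data frames

# Competitor 1: classic rbind in a for loop.
rbind_loop <- function(lst) {
  out <- lst[[1]]
  for (i in seq_along(lst)[-1]) {
    out <- rbind(out, lst[[i]])
  }
  out
}

t_loop   <- system.time(res_loop   <- rbind_loop(dfs))
# Competitor 2: a single rbind call over the whole list.
t_docall <- system.time(res_docall <- do.call("rbind", dfs))

# Competitor 3: rbind.fill() from plyr, skipped if plyr is missing.
if (requireNamespace("plyr", quietly = TRUE)) {
  t_fill <- system.time(res_fill <- plyr::rbind.fill(dfs))
}
```

The timings for such a small list are not meaningful; the differences only become visible with the row counts used in the actual job.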

The job:

- rbinding a list of dataframes with 4 columns each: one column is the splitting factor, the other 3 hold normally distributed random data

- the number of rows of the original dataframe is varied between 20,000; 50,000; 100,000; 20

I already introduced some stuff I did with the last.fm API. But did you ever wonder if your taste in music changes over the year? Sunny music in the sunny months and dark music in the darker months? Well, I did. And I want to check it out with the RLastFM package and some additional functions.

First, we load the package and assign an API key to the global variable api.key. You have to get yourself an API key to test this stuff.

I want to share a function I wrote for my dissertation. The function is useful for putting up to two R tables into one TeX table.

You have to load the package 'languageR' to have the dataset 'dative' available.

Let's suppose you have two tables, one with means and another one with standard deviations. Of course, these two tables have the same number of rows and columns - this is also checked in the function.
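The dissertation function itself is not part of this excerpt, so the following is only a minimal sketch of the idea: check that both tables have the same dimensions, then write each cell as "mean (sd)" into one tabular. The name combine_tables, the cell layout and the use of the built-in mtcars data (instead of 'dative') are my assumptions.

```r
# Hypothetical helper: merge a table of means and a table of SDs
# into the lines of one LaTeX tabular, one "mean (sd)" cell each.
combine_tables <- function(m, s, digits = 2) {
  # Both tables must have the same number of rows and columns.
  stopifnot(identical(dim(m), dim(s)))
  cells <- matrix(
    paste0(formatC(m, format = "f", digits = digits), " (",
           formatC(s, format = "f", digits = digits), ")"),
    nrow = nrow(m), dimnames = dimnames(m)
  )
  header <- paste(c("", colnames(cells)), collapse = " & ")
  body   <- paste(rownames(cells),
                  apply(cells, 1, paste, collapse = " & "),
                  sep = " & ")
  c(paste0("\\begin{tabular}{l", strrep("r", ncol(cells)), "}"),
    paste0(header, " \\\\"),
    "\\hline",
    paste0(body, " \\\\"),
    "\\end{tabular}")
}

# Toy input: mean and SD of mpg by cylinders (rows) and transmission (columns).
means <- with(mtcars, tapply(mpg, list(cyl, am), mean))
sds   <- with(mtcars, tapply(mpg, list(cyl, am), sd))
tex   <- combine_tables(means, sds)
writeLines(tex)
```

Returning the lines as a character vector keeps the sketch flexible: they can be printed with writeLines() or written straight into a .tex file.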

last.fm is an internet radio and music suggestion service. Registered users can also use last.fm to 'scrobble' tracks they've been listening to. last.fm then keeps track of a user's statistics in terms of top artists, albums and tracks.

Luckily, last.fm also has an API which is accessible as soon as you get a key for it. Thanks to this API, there are a lot of cool web-based applications for last.fm.

Today, I want to show you a few little things we can do with this API using R.
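As a first taste, here is a minimal sketch of how a request to the API could be put together by hand. It only builds the request URL and does not send it. The function name lastfm_url, the service root and the method user.gettopartists are assumptions based on the public last.fm web service and should be checked against the API documentation; the key and user name are placeholders.

```r
# Hypothetical helper: build a last.fm request URL (nothing is sent).
lastfm_url <- function(method, user, api.key,
                       root = "https://ws.audioscrobbler.com/2.0/") {
  paste0(root,
         "?method=", method,
         "&user=", utils::URLencode(user, reserved = TRUE),
         "&api_key=", api.key,
         "&format=json")
}

# Placeholder values; replace them with your own key and user name.
api.key <- "YOUR_API_KEY"
url <- lastfm_url("user.gettopartists", "some user", api.key)
```

The resulting URL could then be fetched with any HTTP client and the JSON response parsed into a data frame.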

Let's get back to the age-value relationship from my last post. I did some more plotting to see for which position this inverted U-shaped relationship is strongest. Please note that I use a dataframe called eu.players throughout this post, which holds downloaded football player information from transfermarkt.de.
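For readers without the data, an inverted U-shape can be illustrated with a quadratic fit on simulated values. Everything in this sketch is made up for illustration: toy.players stands in for eu.players, and its columns age and value as well as the simulated peak at 27 are assumptions, not results from the transfermarkt.de data.

```r
set.seed(42)

# Simulated stand-in for eu.players: value peaks at age 27 by construction.
toy.players <- data.frame(age = sample(17:38, 500, replace = TRUE))
toy.players$value <- 10 - 0.08 * (toy.players$age - 27)^2 +
  rnorm(500, sd = 1)

# A negative coefficient on age^2 indicates an inverted U-shape.
fit <- lm(value ~ age + I(age^2), data = toy.players)

# Age at which the fitted curve reaches its maximum (vertex of the parabola).
peak.age <- -coef(fit)[["age"]] / (2 * coef(fit)[["I(age^2)"]])
```

Fitting the same model separately per position would give one quadratic coefficient per position, which is one way to compare how pronounced the curve is.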

But first, let us get back to the original graph.

It's been some time since my last post on football. And we're talking about European soccer here.

So I finally managed to write some functions which allow me to extract player stats from www.transfermarkt.de. The site tracks lots of stats in the world of soccer. For each player, there is information about the dominant foot, height, age, the estimated market value of the player and a load more.

I extracted stats for all registered players from the five major national championships in Europe.