May 14
The rbinding race: for vs. do.call vs. rbind.fill
Which function rbinds dataframes together fastest?
First competitor: classic rbind in a for loop over a list of dataframes
Second competitor: do.call("rbind", <list of dataframes>)
Third competitor: rbind.fill(<list of dataframes>) from the plyr package
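A minimal sketch of the three competitors as R functions. It assumes `dfs` is a non-empty list of dataframes with identical columns and that plyr is installed. The wrapper names `bind_for`, `bind_docall` and `bind_fill` are hypothetical and used only for illustration.

```r
library(plyr)  # provides rbind.fill()

# 1. Classic rbind in a for loop: a new, larger dataframe is built
#    on every iteration, so earlier rows are copied again and again.
bind_for <- function(dfs) {
  result <- dfs[[1]]
  for (piece in dfs[-1]) {
    result <- rbind(result, piece)
  }
  result
}

# 2. do.call passes every list element as an argument to one rbind call.
bind_docall <- function(dfs) {
  do.call("rbind", dfs)
}

# 3. plyr's rbind.fill accepts the list of dataframes directly.
bind_fill <- function(dfs) {
  rbind.fill(dfs)
}

# Tiny usage example: two pieces with two rows each
dfs <- list(data.frame(g = "a", x = 1:2), data.frame(g = "b", x = 3:4))
bind_docall(dfs)
```

All three return the same rows in the same order; they can differ in row names, so compare results by column values rather than with `identical()` on the whole dataframe.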
The job:
- rbinding a list of dataframes with 4 columns each: one column is the splitting factor and the other 3 hold normally distributed random data
- the number of rows of the original dataframe takes the values 20,000; 50,000; 100,000; 200,000; 300,000; 400,000; 500,000 and 600,000
- the number of levels of the splitting factor (hence the number of list elements after splitting) takes the values 6, 12 and 24, while the total number of rows of the original dataframe is held constant
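The job above can be reproduced along these lines. This is a sketch under assumptions: the helper names `make_data` and `time_binds`, the column names `group`, `x1`, `x2`, `x3`, the round-robin assignment of factor levels, and timing with the elapsed seconds of `system.time()` are illustrative choices, not necessarily the setup behind the original measurements.

```r
library(plyr)  # provides rbind.fill()

# Hypothetical helper: one splitting factor plus three columns of N(0, 1) data.
# Levels are assigned round-robin, so each piece gets about n_rows / n_levels rows
# and the total row count stays fixed when n_levels changes.
make_data <- function(n_rows, n_levels) {
  data.frame(
    group = factor(rep(seq_len(n_levels), length.out = n_rows)),
    x1 = rnorm(n_rows),
    x2 = rnorm(n_rows),
    x3 = rnorm(n_rows)
  )
}

# Hypothetical helper: split by the factor, then time each competitor.
# Returns elapsed seconds per method.
time_binds <- function(n_rows, n_levels) {
  df <- make_data(n_rows, n_levels)
  pieces <- split(df, df$group)  # one list element per factor level

  t_for <- system.time({
    res <- pieces[[1]]
    for (p in pieces[-1]) res <- rbind(res, p)
  })
  t_docall <- system.time(do.call("rbind", pieces))
  t_fill <- system.time(rbind.fill(pieces))

  c(for_loop = t_for[["elapsed"]],
    do_call = t_docall[["elapsed"]],
    rbind_fill = t_fill[["elapsed"]])
}

set.seed(42)
# Full grid from the job description: 8 row counts x 3 level counts
grid <- expand.grid(
  n_rows = c(20000, 50000, 100000, 200000, 300000, 400000, 500000, 600000),
  n_levels = c(6, 12, 24)
)
# Timing every cell can take a while; here only the smallest case is run.
time_binds(grid$n_rows[1], grid$n_levels[1])
```

To cover the whole grid, apply `time_binds` to each row of `grid`, for example with `mapply(time_binds, grid$n_rows, grid$n_levels)`.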
The machine:
- A blazing fast late 2008 MacBo