The machine:
MacBook Air (mid-2013) with 8 GB of RAM and the i7 CPU (Intel i7 Haswell 4650U). This CPU is hyper-threaded, meaning (at least that's my understanding of it) that it has two physical cores but can run up to four threads.
The task:
Draw a number of cases from a normal distribution with a mean of 10 and a standard deviation of 30. Do this a hundred times and combine the result in one vector. The number of cases per draw is varied from half a million to two million, in steps of half a million. The number of cores used by R is also varied (between 1 and 4). All this is done 5 times, so we get multiple estimates of each run's properties. Altogether, 80 runs are made: 5 repetitions x 4 n-cores x 4 n-cases = 80 runs.
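The design can also be written out as a grid, which confirms the run count. This is only a sketch of the bookkeeping; the column names are chosen here for illustration and the grid itself is not used by the benchmark code:

```r
# One row per benchmark run: 5 repetitions x 4 core counts x 4 sample sizes
design <- expand.grid(
  run     = 1:5,
  n.cores = 1:4,
  n.cases = c(500000, 1000000, 1500000, 2000000)
)
nrow(design)  # 5 x 4 x 4 = 80
```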
The results:
This is quite interesting: We clearly see that there is virtually no performance gain for the 3- and 4-core runs over the 2-core runs. I guess this is because we do not really have 4 physical cores available on the hyper-threaded CPU. So, for this task at least, it does not really make a difference if we assign 2 or 3 or 4 cores on a hyper-threaded CPU. The performance gain from 1 to 2 cores, however, is quite clear.
Code (plotting code not supplied):
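library(doParallel)  # also attaches foreach
library(parallel)

result.df <- data.frame()
for (run in 1:5) {
  cat(run, "\n")
  for (n.cases in c(500000, 1000000, 1500000, 2000000)) {
    cat(n.cases, "\n")
    for (n.cores in c(1, 2, 3, 4)) {
      # start a fresh cluster with the requested number of workers
      cluster <- makeCluster(n.cores)
      registerDoParallel(cluster)
      t1 <- Sys.time()
      # i only counts the 100 repetitions; the body never uses it
      result.vec <- foreach(i = 1:100, .combine = c) %dopar% {
        rnorm(n.cases, mean = 10, sd = 30)
      }
      difft <- as.numeric(difftime(Sys.time(), t1, units = "secs"))
      # shut the workers down so they do not pile up across the 80 runs
      stopCluster(cluster)
      # named columns keep the result table readable
      result.df <- rbind(result.df,
                         data.frame(n.cores = n.cores, n.cases = n.cases, secs = difft))
    }
  }
}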
"This CPU is hyper-threaded, meaning (at least that's my understanding of it) that it has two physical cores but can run up to four threads." Not exactly; you can always run (just about) as many threads as you want, but hyperthreading reduces contention between two threads running on the same core. The key insight is that on some tasks, your code above would indeed show that four threads was faster than two, because of that reduced contention; but going to higher numbers of threads than four should never be faster than four threads, for any task, because it will always increase contention (on your machine, with two physical cores and four virtual cores). It might be interesting to look at larger numbers of threads than four, using your code; you should see performance go down, but I don't know by how much.
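A rough sketch of that experiment, assuming the same doParallel setup as in the post. The helper time.run and the worker counts are made up for illustration, and the problem size is kept much smaller than in the post so that it finishes quickly; no timings are claimed here, they have to be measured on the machine in question:

```r
library(doParallel)  # attaches foreach and parallel as well

# Time one benchmark run with a given number of workers.
# n.cases and n.reps are kept small here; the post uses up to 2e6 and 100.
time.run <- function(n.workers, n.cases = 100000, n.reps = 20) {
  cluster <- makeCluster(n.workers)
  on.exit(stopCluster(cluster))  # always shut the workers down again
  registerDoParallel(cluster)
  t1 <- Sys.time()
  result.vec <- foreach(i = 1:n.reps, .combine = c) %dopar% {
    rnorm(n.cases, mean = 10, sd = 30)
  }
  as.numeric(difftime(Sys.time(), t1, units = "secs"))
}

# Go beyond the four virtual cores to see whether performance drops
worker.counts <- c(1, 2, 4, 6, 8)
timings <- data.frame(
  n.workers = worker.counts,
  secs      = sapply(worker.counts, time.run)
)
print(timings)
```

Note that makeCluster starts separate R processes rather than threads, so this measures oversubscription with processes; the contention argument should apply in the same way, but that is an assumption.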
Intel claims that hyperthreading can result in a speedup of 15-30% for some applications, but it is extremely dependent on details of exactly what the threads are doing, on their memory usage patterns, and a million other factors. If you want to know whether a given task will benefit from hyperthreading, you basically have to try it and see. I use hyperthreading quite often on my 8-physical-core Mac Pro desktop, but the tasks I'm running are quite heterogeneous, which would tend to make hyperthreading more beneficial. Your code is doing a task that is extremely homogeneous (probably spending almost all of its time running a tight loop inside C code called by rnorm); hyperthreading might not be able to help much there because all four threads are trying to use exactly the same processor resources, and even with hyperthreading, since there are only two physical cores, a given processor resource (such as, I might speculate, the physical circuitry that calculates an exponential, in your case) will only have two physical instantiations. It would be interesting to try a more heterogeneous task – something like fitting a linear model to a large dataset, for example.
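One way such a test could look, as a sketch only: the simulated data, the bootstrap resampling and the helper names make.data and time.lm are assumptions made for illustration, and whether this workload is heterogeneous enough to benefit from hyperthreading is exactly what would have to be measured.

```r
library(doParallel)  # attaches foreach and parallel as well

# Simulated data; the model and its coefficients are made up for illustration.
make.data <- function(n) {
  d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
  d$y <- 1 + 2 * d$x1 - 0.5 * d$x2 + rnorm(n)
  d
}

# Time n.fits bootstrap fits of a linear model on n.workers workers.
# dat is passed as an argument so that foreach exports it to the workers.
time.lm <- function(n.workers, dat, n.fits = 8) {
  cluster <- makeCluster(n.workers)
  on.exit(stopCluster(cluster))
  registerDoParallel(cluster)
  t1 <- Sys.time()
  coefs <- foreach(i = 1:n.fits, .combine = rbind) %dopar% {
    idx <- sample(nrow(dat), replace = TRUE)  # bootstrap resample
    coef(lm(y ~ x1 + x2, data = dat[idx, ]))
  }
  as.numeric(difftime(Sys.time(), t1, units = "secs"))
}

dat <- make.data(100000)
lm.timings <- sapply(1:4, time.lm, dat = dat)
print(lm.timings)
```

The measured time includes shipping dat to the workers, so for a fair comparison the dataset should be large enough that the model fits dominate.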
Thanks for your comments and clarifications, Ben. I will try some other tasks and let you know the results. It would be interesting to find some tasks in R which benefit from hyper-threading and others that don't.
Thanks for sharing this. I understand everything except for the foreach line. You never use the "i = 1:100". What is that for?
Hey Ben, thanks for the question. The corresponding text in my post to this line of code is:
"Do this a hundred times and combine the result in one vector."
So, the iteration variable i is not used - it is only there to do the task a hundred times.
Best, Sascha
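A small illustration of that point, using the sequential %do% operator so that no cluster is needed; the sizes are reduced from the post's values to keep it readable:

```r
library(foreach)

# i takes the values 1, 2, 3, but the body never refers to it,
# so the same expression is simply evaluated three times
draws <- foreach(i = 1:3, .combine = c) %do% {
  rnorm(2, mean = 10, sd = 30)
}
length(draws)  # 3 repetitions x 2 draws = 6
```

With .combine = c the three results are concatenated into one vector, which is what the benchmark does with its hundred draws.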