Reversal of code performance optimizations from Ruby 1.8 to 1.9
As some of you already know I am keenly playing with custom Ruby configurations from disabling pthreads in 1.8.6 Linux builds to numerous build variations of Ruby 1.9 (trunk).
Last night I noticed something quite interesting, but before we get to that point, let me tell you a story.There was once a fastidious Ruby developer who payed the closest attention to detail. Code optimization of all kinds were his obsession. So much so that he took great pains to “optimize code”. The results, in my opinion were not very Rubyish and often much less readable than the original code. One day last year, while running on Ruby 1.8.x, he found after running a script that using thefor item in list...endconstruct performed slightly better than the very Rubyish (and IMHO more pleasant)list.each { |item| ... }. On that day he made my life a misery by insisting that I replace all my.eachcode blocks tofor .... After an age of whinging (yes, that is a word – from Middle English, which means to excessively whine), I completed this useless task. The next day he somehow determined that.collectblocks were also inefficient and then insisted on another round of micro optimization that produce very little real world performance benefit if any.
Now fast forward a year. While playing with my recent Ruby 1.9 build on a CentOS server, I noticed the .each enumerable iteration construct performs faster than the for item in list...end construct.
Executive Summary
For those that don’t have data analysis stamina here are the basic results for Ruby 1.8.6 (with pthreads on CentOS) to two decimal places:- The
for i in list...endcase took 96.65% of the time to run as did theeachcase. Not usually worth the code optimization for 99.99% of the time. - The
collectcase took 92.97% of the time to run as did thelist (aka <<)case. So use the more Ruby-ish constructs to improve readability and reduce chance of human error in coding style.
- The
for i in list...endcase took 125.76% of the time to run as did theeachcase. Oreachcase took 79.51% of the time to run as did thefor i in list...endcase. Well, well. Who sucks now!? - The
collectcase took 84.08% of the time to run as did thelist (aka <<)case.
Benchmark Script
Now let’s have a look at my benchmark script for completeness (also just in case any coding errors got in to make the benchmark unfair):
require 'benchmark'
hash = {}
1_000_000.times {|i| hash["name#{i}"] = "value#{i}" }
def concat(value)
"value is #{value}"
end
Benchmark.bm do |b|
b.report("for ") { for key in hash.keys; concat(hash[key]); end; }
b.report("each") { hash.keys.each {|key| concat(hash[key]) } }
end
Benchmark.bm do |b|
b.report("<< ") do
list = []
for key in hash.keys
list << concat(hash[key])
end
list
end
b.report("collect") do
hash.keys.collect { |key| concat(hash[key]) }
end
end
See Array#<< vs. Array#collect benchmark in Ruby 1.8.6 here.
Definitely not the most interesting benchmark script in the world, but it should demonstrate base performance (without too many other variables factoring into performance).
Ruby 1.8.6 Results
First I ran this benchmark script using a Ruby 1.8.6 build I had in /usr. This build used the default./configure options which in Ruby 1.8.x means pthreads is enabled:
First run...
user system total real
for 4.560000 1.400000 5.960000 ( 5.962244)
each 4.800000 1.360000 6.160000 ( 6.158466)
user system total real
<< 5.410000 1.450000 6.860000 ( 6.855109)
collect 4.980000 1.430000 6.410000 ( 6.423387)
Second run...
user system total real
for 4.570000 1.350000 5.920000 ( 5.916540)
each 4.800000 1.370000 6.170000 ( 6.165264)
user system total real
<< 5.550000 1.360000 6.910000 ( 6.918924)
collect 4.970000 1.440000 6.410000 ( 6.398015)
Third run...
user system total real
for 4.500000 1.480000 5.980000 ( 5.986721)
each 4.810000 1.320000 6.130000 ( 6.135532)
user system total real
<< 5.460000 1.420000 6.880000 ( 6.871511)
collect 4.980000 1.420000 6.400000 ( 6.392726)
Fourth run...
user system total real
for 4.490000 1.410000 5.900000 ( 5.900563)
each 4.730000 1.400000 6.130000 ( 6.131811)
user system total real
<< 5.560000 1.360000 6.920000 ( 6.912522)
collect 4.970000 1.440000 6.410000 ( 6.407939)
Results of Array#<< vs. Array#collect benchmark for Ruby 1.8.6.
Well where to I start? First of all the differences in the results above running on Ruby v1.8.6 (with pthreads enabled) are quite small anyway. Let us simply analyze the small amount of data I collected to put things in perspective:
In the R shell I did something like...
> for_real_totals <- c(5.962244, 5.916540, 5.986721, 5.900563)
> each_real_totals <- c(6.158466, 6.165264, 6.135532, 6.131811)
> list_real_totals <- c(6.855109, 6.918924, 6.871511, 6.912522)
> collect_real_totals <- c(6.423387, 6.398015, 6.392726, 6.407939)
On to ratios...
> c1_to_c2_ratio <- for_real_totals / each_real_totals
> c1_to_c2_ratio
[1] 0.9681378 0.9596572 0.9757460 0.9622872
> c3_to_c4_ratio <- list_real_totals / collect_real_totals
> c3_to_c4_ratio
[1] 1.067211 1.081417 1.074895 1.078743
Since c4 (aka "collect") is faster take the inverse ratio...
> c4_to_c3_ratio <- collect_real_totals / list_real_totals
> c4_to_c3_ratio
[1] 0.9370219 0.9247124 0.9303232 0.9270045
On to standard deviations....
> stdev.for <- sd(for_real_totals)
> stdev.for
[1] 0.0398919
> stdev.each <- sd(each_real_totals)
> stdev.each
[1] 0.01658215
> stdev.list <- sd(list_real_totals)
> stdev.list
[1] 0.03110267
> stdev.collect <- sd(collect_real_totals)
> stdev.collect
[1] 0.01347952
> avg.for <- mean(for_real_totals)
> avg.for
[1] 5.941517
> avg.each <- mean(each_real_totals)
> avg.each
[1] 6.147768
> avg.list <- mean(list_real_totals)
> avg.list
[1] 6.889517
> avg.collect <- mean(collect_real_totals)
> avg.collect
[1] 6.405517
> avg.c1_to_c2_ratio <- avg.for / avg.each
> avg.c1_to_c2_ratio
[1] 0.966451
> avg.c3_to_c4_ratio <- avg.list / avg.collect
> avg.c3_to_c4_ratio
[1] 1.075560
> avg.c4_to_c3_ratio <- avg.collect / avg.list
> avg.c4_to_c3_ratio
[1] 0.9297484
See R analysis code here.
Interestingly the standard deviations for the for and list (<<) cases were more than double the standard deviation of their Rubish counterparts (each and collect respectively). Though I am not 100% sure what this means if anything yet.
Ruby 1.9 (trunk) Results
Ok, now here are the results for running the benchmark script against Ruby 1.9 (trunk) on the same machine, again four times: user system total real
for 4.130000 0.070000 4.200000 ( 4.227468)
each 3.310000 0.000000 3.310000 ( 3.314974)
user system total real
<< 4.350000 0.030000 4.380000 ( 4.384898)
collect 3.570000 0.100000 3.670000 ( 3.701988)
user system total real
for 4.150000 0.060000 4.210000 ( 4.210407)
each 3.350000 0.010000 3.360000 ( 3.363656)
user system total real
<< 4.340000 0.020000 4.360000 ( 4.361846)
collect 3.560000 0.110000 3.670000 ( 3.668885)
user system total real
for 4.110000 0.060000 4.170000 ( 4.163847)
each 3.320000 0.000000 3.320000 ( 3.321745)
user system total real
<< 4.340000 0.020000 4.360000 ( 4.355446)
collect 3.550000 0.110000 3.660000 ( 3.668928)
user system total real
for 4.160000 0.070000 4.230000 ( 4.224626)
each 3.370000 0.000000 3.370000 ( 3.379422)
user system total real
<< 4.390000 0.020000 4.410000 ( 4.401145)
collect 3.570000 0.110000 3.680000 ( 3.676394)
Results of Array#<< vs. Array#collect benchmark for Ruby 1.9.
Below is my simple data analysis in R shell session:
> real_totals.for <- c(4.227468, 4.210407, 4.163847, 4.224626)
> real_totals.each <- c(3.314974, 3.363656, 3.321745, 3.379422)
> real_totals.list <- c(4.384898, 4.361846, 4.355446, 4.401145)
> real_totals.collect <- c(3.701988, 3.668885, 3.668928, 3.676394)
> ratios.for_to_each <- real_totals.for / real_totals.each
> ratios.each_to_for <- real_totals.each / real_totals.for
> ratios.list_to_collect <- real_totals.list / real_totals.collect
> ratios.collect_to_list <- real_totals.collect / real_totals.list
> ratios.for_to_each
[1] 1.275264 1.251735 1.253512 1.250103
> ratios.each_to_for
[1] 0.7841512 0.7988909 0.7977587 0.7999340
> ratios.list_to_collect
[1] 1.184471 1.188875 1.187117 1.197136
> ratios.collect_to_list
[1] 0.8442586 0.8411313 0.8423771 0.8353267
> stdev.for <- sd(real_totals.for)
> stdev.each <- sd(real_totals.each)
> stdev.list <- sd(real_totals.list)
> stdev.collect <- sd(real_totals.collect)
> stdev.for
[1] 0.02945461
> stdev.each
[1] 0.03149215
> stdev.list
[1] 0.02108821
> stdev.collect
[1] 0.01569489
> avg.for <- mean(real_totals.for)
> avg.each <- mean(real_totals.each)
> avg.list <- mean(real_totals.list)
> avg.collect <- mean(real_totals.collect)
> avg.for
[1] 4.206587
> avg.each
[1] 3.344949
> avg.list
[1] 4.375834
> avg.collect
[1] 3.679049
> avg.for / avg.each
[1] 1.257594
> avg.each / avg.for
[1] 0.7951694
> avg.list / avg.collect
[1] 1.189393
> avg.collect / avg.list
[1] 0.8407652
See R analysis code here.
Of course, we might also be able to say something about the relative speed of Ruby 1.9 to Ruby 1.8.6 as well (since they were run on the same host, with same script and no noticeable external run-time load differences). However, I will leave that as an exercise to the reader.
In the meantime, let me emphasize this point. DO NOT MAKE MICRO CODE OPTIMIZATIONS WITHOUT JUSTIFICATION! It is usually fruitless and you may well be making more work down the line for yourself and creating a less maintainable system at the same time.
I am not advocating changing to the each coding style if you honestly find it less readable than for item in list, but there should be no excuse for not using collect since (a) it is the Ruby way; (b) less error prone; and© more readable!