Reversal of code performance optimizations from Ruby 1.8 to 1.9

written by supo on October 31st, 2007 @ 12:19 AM

As some of you already know I am keenly playing with custom Ruby configurations from disabling pthreads in 1.8.6 Linux builds to numerous build variations of Ruby 1.9 (trunk).

Last night I noticed something quite interesting, but before we get to that point, let me tell you a story.
There was once a fastidious Ruby developer who payed the closest attention to detail. Code optimization of all kinds were his obsession. So much so that he took great pains to “optimize code”. The results, in my opinion were not very Rubyish and often much less readable than the original code. One day last year, while running on Ruby 1.8.x, he found after running a script that using the for item in list...end construct performed slightly better than the very Rubyish (and IMHO more pleasant) list.each { |item| ... }. On that day he made my life a misery by insisting that I replace all my .each code blocks to for .... After an age of whinging (yes, that is a word – from Middle English, which means to excessively whine), I completed this useless task. The next day he somehow determined that .collect blocks were also inefficient and then insisted on another round of micro optimization that produce very little real world performance benefit if any.

Now fast forward a year. While playing with my recent Ruby 1.9 build on a CentOS server, I noticed the .each enumerable iteration construct performs faster than the for item in list...end construct.

Executive Summary

For those that don’t have data analysis stamina here are the basic results for Ruby 1.8.6 (with pthreads on CentOS) to two decimal places:
  • The for i in list...end case took 96.65% of the time to run as did the each case. Not usually worth the code optimization for 99.99% of the time.
  • The collect case took 92.97% of the time to run as did the list (aka <<) case. So use the more Ruby-ish constructs to improve readability and reduce chance of human error in coding style.
For those that don’t have the data analysis stamina here are the basic results for Ruby 1.9.0 (trunk) (with pthreads on CentOS):
  • The for i in list...end case took 125.76% of the time to run as did the each case. Or each case took 79.51% of the time to run as did the for i in list...end case. Well, well. Who sucks now!?
  • The collect case took 84.08% of the time to run as did the list (aka <<) case.

Benchmark Script

Now let’s have a look at my benchmark script for completeness (also just in case any coding errors got in to make the benchmark unfair):

require 'benchmark'

hash = {}
1_000_000.times {|i| hash["name#{i}"] = "value#{i}" }

def concat(value)
  "value is #{value}" 
end

Benchmark.bm do |b|
  b.report("for ") { for key in hash.keys; concat(hash[key]); end; }
  b.report("each") { hash.keys.each {|key| concat(hash[key]) } }
end

Benchmark.bm do |b|
  b.report("<<     ") do
    list = []
    for key in hash.keys
      list << concat(hash[key])
    end
    list
  end

  b.report("collect") do
    hash.keys.collect { |key| concat(hash[key]) }
  end
end
See Array#<< vs. Array#collect benchmark in Ruby 1.8.6 here. Definitely not the most interesting benchmark script in the world, but it should demonstrate base performance (without too many other variables factoring into performance).

Ruby 1.8.6 Results

First I ran this benchmark script using a Ruby 1.8.6 build I had in /usr. This build used the default ./configure options which in Ruby 1.8.x means pthreads is enabled:
First run...
      user     system      total        real
for   4.560000   1.400000   5.960000 (  5.962244)
each  4.800000   1.360000   6.160000 (  6.158466)
      user     system      total        real
<<       5.410000   1.450000   6.860000 (  6.855109)
collect  4.980000   1.430000   6.410000 (  6.423387)

Second run...
      user     system      total        real
for   4.570000   1.350000   5.920000 (  5.916540)
each  4.800000   1.370000   6.170000 (  6.165264)
      user     system      total        real
<<       5.550000   1.360000   6.910000 (  6.918924)
collect  4.970000   1.440000   6.410000 (  6.398015)

Third run...
      user     system      total        real
for   4.500000   1.480000   5.980000 (  5.986721)
each  4.810000   1.320000   6.130000 (  6.135532)
      user     system      total        real
<<       5.460000   1.420000   6.880000 (  6.871511)
collect  4.980000   1.420000   6.400000 (  6.392726)

Fourth run...
      user     system      total        real
for   4.490000   1.410000   5.900000 (  5.900563)
each  4.730000   1.400000   6.130000 (  6.131811)
      user     system      total        real
<<       5.560000   1.360000   6.920000 (  6.912522)
collect  4.970000   1.440000   6.410000 (  6.407939)
Results of Array#<< vs. Array#collect benchmark for Ruby 1.8.6. Well where to I start? First of all the differences in the results above running on Ruby v1.8.6 (with pthreads enabled) are quite small anyway. Let us simply analyze the small amount of data I collected to put things in perspective:

In the R shell I did something like...
> for_real_totals <- c(5.962244, 5.916540, 5.986721, 5.900563)
> each_real_totals <- c(6.158466, 6.165264, 6.135532, 6.131811)
> list_real_totals <- c(6.855109, 6.918924, 6.871511, 6.912522)
> collect_real_totals <- c(6.423387, 6.398015, 6.392726, 6.407939)

On to ratios...
> c1_to_c2_ratio <- for_real_totals / each_real_totals
> c1_to_c2_ratio
[1] 0.9681378 0.9596572 0.9757460 0.9622872

> c3_to_c4_ratio <- list_real_totals / collect_real_totals
> c3_to_c4_ratio
[1] 1.067211 1.081417 1.074895 1.078743

Since c4 (aka "collect") is faster take the inverse ratio...
> c4_to_c3_ratio <- collect_real_totals / list_real_totals
> c4_to_c3_ratio
[1] 0.9370219 0.9247124 0.9303232 0.9270045

On to standard deviations....
> stdev.for <- sd(for_real_totals)
> stdev.for
[1] 0.0398919

> stdev.each <- sd(each_real_totals)
> stdev.each
[1] 0.01658215

> stdev.list <- sd(list_real_totals)
> stdev.list
[1] 0.03110267

> stdev.collect <- sd(collect_real_totals)
> stdev.collect
[1] 0.01347952

> avg.for <- mean(for_real_totals)
> avg.for
[1] 5.941517

> avg.each <- mean(each_real_totals)
> avg.each
[1] 6.147768

> avg.list <- mean(list_real_totals)
> avg.list
[1] 6.889517

> avg.collect <- mean(collect_real_totals)
> avg.collect
[1] 6.405517

> avg.c1_to_c2_ratio <- avg.for / avg.each
> avg.c1_to_c2_ratio
[1] 0.966451

> avg.c3_to_c4_ratio <- avg.list / avg.collect
> avg.c3_to_c4_ratio
[1] 1.075560

> avg.c4_to_c3_ratio <- avg.collect / avg.list
> avg.c4_to_c3_ratio
[1] 0.9297484
See R analysis code here. Interestingly the standard deviations for the for and list (<<) cases were more than double the standard deviation of their Rubish counterparts (each and collect respectively). Though I am not 100% sure what this means if anything yet.

Ruby 1.9 (trunk) Results

Ok, now here are the results for running the benchmark script against Ruby 1.9 (trunk) on the same machine, again four times:
      user     system      total        real
for   4.130000   0.070000   4.200000 (  4.227468)
each  3.310000   0.000000   3.310000 (  3.314974)
      user     system      total        real
<<       4.350000   0.030000   4.380000 (  4.384898)
collect  3.570000   0.100000   3.670000 (  3.701988)

      user     system      total        real
for   4.150000   0.060000   4.210000 (  4.210407)
each  3.350000   0.010000   3.360000 (  3.363656)
      user     system      total        real
<<       4.340000   0.020000   4.360000 (  4.361846)
collect  3.560000   0.110000   3.670000 (  3.668885)

      user     system      total        real
for   4.110000   0.060000   4.170000 (  4.163847)
each  3.320000   0.000000   3.320000 (  3.321745)
      user     system      total        real
<<       4.340000   0.020000   4.360000 (  4.355446)
collect  3.550000   0.110000   3.660000 (  3.668928)

      user     system      total        real
for   4.160000   0.070000   4.230000 (  4.224626)
each  3.370000   0.000000   3.370000 (  3.379422)
      user     system      total        real
<<       4.390000   0.020000   4.410000 (  4.401145)
collect  3.570000   0.110000   3.680000 (  3.676394)
Results of Array#<< vs. Array#collect benchmark for Ruby 1.9. Below is my simple data analysis in R shell session:

> real_totals.for <- c(4.227468, 4.210407, 4.163847, 4.224626)
> real_totals.each <- c(3.314974, 3.363656, 3.321745, 3.379422)
> real_totals.list <- c(4.384898, 4.361846, 4.355446, 4.401145)
> real_totals.collect <- c(3.701988, 3.668885, 3.668928, 3.676394)
> ratios.for_to_each <- real_totals.for / real_totals.each
> ratios.each_to_for <- real_totals.each / real_totals.for
> ratios.list_to_collect <- real_totals.list / real_totals.collect
> ratios.collect_to_list <- real_totals.collect / real_totals.list
> ratios.for_to_each
[1] 1.275264 1.251735 1.253512 1.250103

> ratios.each_to_for
[1] 0.7841512 0.7988909 0.7977587 0.7999340

> ratios.list_to_collect
[1] 1.184471 1.188875 1.187117 1.197136

> ratios.collect_to_list
[1] 0.8442586 0.8411313 0.8423771 0.8353267

> stdev.for <- sd(real_totals.for)
> stdev.each <- sd(real_totals.each)
> stdev.list <- sd(real_totals.list)
> stdev.collect <- sd(real_totals.collect)
> stdev.for
[1] 0.02945461

> stdev.each
[1] 0.03149215

> stdev.list
[1] 0.02108821

> stdev.collect
[1] 0.01569489

> avg.for <- mean(real_totals.for)
> avg.each <- mean(real_totals.each)
> avg.list <- mean(real_totals.list)
> avg.collect <- mean(real_totals.collect)
> avg.for
[1] 4.206587

> avg.each
[1] 3.344949

> avg.list
[1] 4.375834

> avg.collect
[1] 3.679049

> avg.for / avg.each
[1] 1.257594

> avg.each / avg.for
[1] 0.7951694

> avg.list / avg.collect
[1] 1.189393

> avg.collect / avg.list
[1] 0.8407652
See R analysis code here. Of course, we might also be able to say something about the relative speed of Ruby 1.9 to Ruby 1.8.6 as well (since they were run on the same host, with same script and no noticeable external run-time load differences). However, I will leave that as an exercise to the reader.

In the meantime, let me emphasize this point. DO NOT MAKE MICRO CODE OPTIMIZATIONS WITHOUT JUSTIFICATION! It is usually fruitless and you may well be making more work down the line for yourself and creating a less maintainable system at the same time.

I am not advocating changing to the each coding style if you honestly find it less readable than for item in list, but there should be no excuse for not using collect since (a) it is the Ruby way; (b) less error prone; and© more readable!

Comments are closed