Re: Sun's benchmarking ways to pass options to a method in Ruby

written by supo on February 2nd, 2008 @ 05:39 PM

This is a response to Bryan Donovan’s blog post called ways to pass options.

Bryan’s original benchmark was run on Ruby 1.8.4. I will be running the benchmarks on Ruby 1.8.6 and Ruby 1.9 to demonstrate performance reversal of code level optimizations between Ruby versions (as I did in Ruby performance reversal benchmark 1.8 to 1.9).

I will use Bryan’s Ruby benchmark code to run for both Ruby 1.8.6 and Ruby 1.9.0 (official).

Ruby 1.8.6 Results

Below are the results of running the options passing benchmark in Ruby 1.8.6:

With no option values passed
Rehearsal -------------------------------------------
delete:   0.290000   0.050000   0.340000 (  0.428564)
merge:    0.700000   0.190000   0.890000 (  1.017415)
or_nil:   0.370000   0.050000   0.420000 (  0.509209)
---------------------------------- total: 1.650000sec

              user     system      total        real
delete:   0.300000   0.050000   0.350000 (  0.428401)
merge:    0.760000   0.130000   0.890000 (  1.010474)
or_nil:   0.360000   0.050000   0.410000 (  0.490170)

With some option values passed
Rehearsal -------------------------------------------
delete:   0.360000   0.060000   0.420000 (  0.505094)
merge:    0.890000   0.130000   1.020000 (  1.150165)
or_nil:   0.390000   0.070000   0.460000 (  0.563713)
---------------------------------- total: 1.900000sec

              user     system      total        real
delete:   0.340000   0.060000   0.400000 (  0.489191)
merge:    0.920000   0.110000   1.030000 (  1.149708)
or_nil:   0.350000   0.090000   0.440000 (  0.528287)

With all option values passed
Rehearsal -------------------------------------------
delete:   0.420000   0.050000   0.470000 (  0.547446)
merge:    0.970000   0.130000   1.100000 (  1.228093)
or_nil:   0.420000   0.050000   0.470000 (  0.574593)
---------------------------------- total: 2.040000sec

              user     system      total        real
delete:   0.400000   0.050000   0.450000 (  0.485888)
merge:    0.960000   0.140000   1.100000 (  1.237509)
or_nil:   0.400000   0.060000   0.460000 (  0.536063)
Results of benchmark for Ruby 1.8.6.

Ruby 1.9.0 Results

Below are the results of running the options passing benchmark in Ruby 1.9.0:

With no option values passed
Rehearsal -------------------------------------------
delete:   0.150000   0.000000   0.150000 (  0.230391)
merge:    0.540000   0.010000   0.550000 (  0.645735)
or_nil:   0.120000   0.000000   0.120000 (  0.188468)
---------------------------------- total: 0.820000sec

              user     system      total        real
delete:   0.160000   0.000000   0.160000 (  0.238828)
merge:    0.520000   0.000000   0.520000 (  0.610940)
or_nil:   0.120000   0.000000   0.120000 (  0.180744)

With some option values passed
Rehearsal -------------------------------------------
delete:   0.220000   0.000000   0.220000 (  0.294110)
merge:    0.690000   0.010000   0.700000 (  0.803382)
or_nil:   0.220000   0.000000   0.220000 (  0.297869)
---------------------------------- total: 1.140000sec

              user     system      total        real
delete:   0.210000   0.000000   0.210000 (  0.283983)
merge:    0.700000   0.000000   0.700000 (  0.802766)
or_nil:   0.230000   0.000000   0.230000 (  0.295433)

With all option values passed
Rehearsal -------------------------------------------
delete:   0.270000   0.010000   0.280000 (  0.348052)
merge:    0.770000   0.040000   0.810000 (  0.947252)
or_nil:   0.240000   0.010000   0.250000 (  0.336790)
---------------------------------- total: 1.340000sec

              user     system      total        real
delete:   0.280000   0.000000   0.280000 (  0.660810)
merge:    0.750000   0.000000   0.750000 (  1.618925)
or_nil:   0.240000   0.000000   0.240000 (  0.478159)
Results of benchmark for Ruby 1.9.0.

In Ruby 1.8.6 the delete code is consistently more efficient, whereas in Ruby 1.9.0 the or_nil code is mostly more efficient, except for the With some option values passed scenario.

This once again shows that when optimizing performance on the code level, you need to be careful that you justify your micro-optimizations before you create code spaghetti just “because….”. Readability and maintainability is most important and when needed, only then should you optimize.

Reversal of code performance optimizations from Ruby 1.8 to 1.9

written by supo on October 31st, 2007 @ 12:19 AM

As some of you already know I am keenly playing with custom Ruby configurations from disabling pthreads in 1.8.6 Linux builds to numerous build variations of Ruby 1.9 (trunk).

Last night I noticed something quite interesting, but before we get to that point, let me tell you a story.
There was once a fastidious Ruby developer who payed the closest attention to detail. Code optimization of all kinds were his obsession. So much so that he took great pains to “optimize code”. The results, in my opinion were not very Rubyish and often much less readable than the original code. One day last year, while running on Ruby 1.8.x, he found after running a script that using the for item in list...end construct performed slightly better than the very Rubyish (and IMHO more pleasant) list.each { |item| ... }. On that day he made my life a misery by insisting that I replace all my .each code blocks to for .... After an age of whinging (yes, that is a word – from Middle English, which means to excessively whine), I completed this useless task. The next day he somehow determined that .collect blocks were also inefficient and then insisted on another round of micro optimization that produce very little real world performance benefit if any.

Now fast forward a year. While playing with my recent Ruby 1.9 build on a CentOS server, I noticed the .each enumerable iteration construct performs faster than the for item in list...end construct.

Executive Summary

For those that don’t have data analysis stamina here are the basic results for Ruby 1.8.6 (with pthreads on CentOS) to two decimal places:
  • The for i in list...end case took 96.65% of the time to run as did the each case. Not usually worth the code optimization for 99.99% of the time.
  • The collect case took 92.97% of the time to run as did the list (aka <<) case. So use the more Ruby-ish constructs to improve readability and reduce chance of human error in coding style.
For those that don’t have the data analysis stamina here are the basic results for Ruby 1.9.0 (trunk) (with pthreads on CentOS):
  • The for i in list...end case took 125.76% of the time to run as did the each case. Or each case took 79.51% of the time to run as did the for i in list...end case. Well, well. Who sucks now!?
  • The collect case took 84.08% of the time to run as did the list (aka <<) case.

Benchmark Script

Now let’s have a look at my benchmark script for completeness (also just in case any coding errors got in to make the benchmark unfair):

require 'benchmark'

hash = {}
1_000_000.times {|i| hash["name#{i}"] = "value#{i}" }

def concat(value)
  "value is #{value}" 
end

Benchmark.bm do |b|
  b.report("for ") { for key in hash.keys; concat(hash[key]); end; }
  b.report("each") { hash.keys.each {|key| concat(hash[key]) } }
end

Benchmark.bm do |b|
  b.report("<<     ") do
    list = []
    for key in hash.keys
      list << concat(hash[key])
    end
    list
  end

  b.report("collect") do
    hash.keys.collect { |key| concat(hash[key]) }
  end
end
See Array#<< vs. Array#collect benchmark in Ruby 1.8.6 here. Definitely not the most interesting benchmark script in the world, but it should demonstrate base performance (without too many other variables factoring into performance).

Ruby 1.8.6 Results

First I ran this benchmark script using a Ruby 1.8.6 build I had in /usr. This build used the default ./configure options which in Ruby 1.8.x means pthreads is enabled:
First run...
      user     system      total        real
for   4.560000   1.400000   5.960000 (  5.962244)
each  4.800000   1.360000   6.160000 (  6.158466)
      user     system      total        real
<<       5.410000   1.450000   6.860000 (  6.855109)
collect  4.980000   1.430000   6.410000 (  6.423387)

Second run...
      user     system      total        real
for   4.570000   1.350000   5.920000 (  5.916540)
each  4.800000   1.370000   6.170000 (  6.165264)
      user     system      total        real
<<       5.550000   1.360000   6.910000 (  6.918924)
collect  4.970000   1.440000   6.410000 (  6.398015)

Third run...
      user     system      total        real
for   4.500000   1.480000   5.980000 (  5.986721)
each  4.810000   1.320000   6.130000 (  6.135532)
      user     system      total        real
<<       5.460000   1.420000   6.880000 (  6.871511)
collect  4.980000   1.420000   6.400000 (  6.392726)

Fourth run...
      user     system      total        real
for   4.490000   1.410000   5.900000 (  5.900563)
each  4.730000   1.400000   6.130000 (  6.131811)
      user     system      total        real
<<       5.560000   1.360000   6.920000 (  6.912522)
collect  4.970000   1.440000   6.410000 (  6.407939)
Results of Array#<< vs. Array#collect benchmark for Ruby 1.8.6. Well where to I start? First of all the differences in the results above running on Ruby v1.8.6 (with pthreads enabled) are quite small anyway. Let us simply analyze the small amount of data I collected to put things in perspective:

In the R shell I did something like...
> for_real_totals <- c(5.962244, 5.916540, 5.986721, 5.900563)
> each_real_totals <- c(6.158466, 6.165264, 6.135532, 6.131811)
> list_real_totals <- c(6.855109, 6.918924, 6.871511, 6.912522)
> collect_real_totals <- c(6.423387, 6.398015, 6.392726, 6.407939)

On to ratios...
> c1_to_c2_ratio <- for_real_totals / each_real_totals
> c1_to_c2_ratio
[1] 0.9681378 0.9596572 0.9757460 0.9622872

> c3_to_c4_ratio <- list_real_totals / collect_real_totals
> c3_to_c4_ratio
[1] 1.067211 1.081417 1.074895 1.078743

Since c4 (aka "collect") is faster take the inverse ratio...
> c4_to_c3_ratio <- collect_real_totals / list_real_totals
> c4_to_c3_ratio
[1] 0.9370219 0.9247124 0.9303232 0.9270045

On to standard deviations....
> stdev.for <- sd(for_real_totals)
> stdev.for
[1] 0.0398919

> stdev.each <- sd(each_real_totals)
> stdev.each
[1] 0.01658215

> stdev.list <- sd(list_real_totals)
> stdev.list
[1] 0.03110267

> stdev.collect <- sd(collect_real_totals)
> stdev.collect
[1] 0.01347952

> avg.for <- mean(for_real_totals)
> avg.for
[1] 5.941517

> avg.each <- mean(each_real_totals)
> avg.each
[1] 6.147768

> avg.list <- mean(list_real_totals)
> avg.list
[1] 6.889517

> avg.collect <- mean(collect_real_totals)
> avg.collect
[1] 6.405517

> avg.c1_to_c2_ratio <- avg.for / avg.each
> avg.c1_to_c2_ratio
[1] 0.966451

> avg.c3_to_c4_ratio <- avg.list / avg.collect
> avg.c3_to_c4_ratio
[1] 1.075560

> avg.c4_to_c3_ratio <- avg.collect / avg.list
> avg.c4_to_c3_ratio
[1] 0.9297484
See R analysis code here. Interestingly the standard deviations for the for and list (<<) cases were more than double the standard deviation of their Rubish counterparts (each and collect respectively). Though I am not 100% sure what this means if anything yet.

Ruby 1.9 (trunk) Results

Ok, now here are the results for running the benchmark script against Ruby 1.9 (trunk) on the same machine, again four times:
      user     system      total        real
for   4.130000   0.070000   4.200000 (  4.227468)
each  3.310000   0.000000   3.310000 (  3.314974)
      user     system      total        real
<<       4.350000   0.030000   4.380000 (  4.384898)
collect  3.570000   0.100000   3.670000 (  3.701988)

      user     system      total        real
for   4.150000   0.060000   4.210000 (  4.210407)
each  3.350000   0.010000   3.360000 (  3.363656)
      user     system      total        real
<<       4.340000   0.020000   4.360000 (  4.361846)
collect  3.560000   0.110000   3.670000 (  3.668885)

      user     system      total        real
for   4.110000   0.060000   4.170000 (  4.163847)
each  3.320000   0.000000   3.320000 (  3.321745)
      user     system      total        real
<<       4.340000   0.020000   4.360000 (  4.355446)
collect  3.550000   0.110000   3.660000 (  3.668928)

      user     system      total        real
for   4.160000   0.070000   4.230000 (  4.224626)
each  3.370000   0.000000   3.370000 (  3.379422)
      user     system      total        real
<<       4.390000   0.020000   4.410000 (  4.401145)
collect  3.570000   0.110000   3.680000 (  3.676394)
Results of Array#<< vs. Array#collect benchmark for Ruby 1.9. Below is my simple data analysis in R shell session:

> real_totals.for <- c(4.227468, 4.210407, 4.163847, 4.224626)
> real_totals.each <- c(3.314974, 3.363656, 3.321745, 3.379422)
> real_totals.list <- c(4.384898, 4.361846, 4.355446, 4.401145)
> real_totals.collect <- c(3.701988, 3.668885, 3.668928, 3.676394)
> ratios.for_to_each <- real_totals.for / real_totals.each
> ratios.each_to_for <- real_totals.each / real_totals.for
> ratios.list_to_collect <- real_totals.list / real_totals.collect
> ratios.collect_to_list <- real_totals.collect / real_totals.list
> ratios.for_to_each
[1] 1.275264 1.251735 1.253512 1.250103

> ratios.each_to_for
[1] 0.7841512 0.7988909 0.7977587 0.7999340

> ratios.list_to_collect
[1] 1.184471 1.188875 1.187117 1.197136

> ratios.collect_to_list
[1] 0.8442586 0.8411313 0.8423771 0.8353267

> stdev.for <- sd(real_totals.for)
> stdev.each <- sd(real_totals.each)
> stdev.list <- sd(real_totals.list)
> stdev.collect <- sd(real_totals.collect)
> stdev.for
[1] 0.02945461

> stdev.each
[1] 0.03149215

> stdev.list
[1] 0.02108821

> stdev.collect
[1] 0.01569489

> avg.for <- mean(real_totals.for)
> avg.each <- mean(real_totals.each)
> avg.list <- mean(real_totals.list)
> avg.collect <- mean(real_totals.collect)
> avg.for
[1] 4.206587

> avg.each
[1] 3.344949

> avg.list
[1] 4.375834

> avg.collect
[1] 3.679049

> avg.for / avg.each
[1] 1.257594

> avg.each / avg.for
[1] 0.7951694

> avg.list / avg.collect
[1] 1.189393

> avg.collect / avg.list
[1] 0.8407652
See R analysis code here. Of course, we might also be able to say something about the relative speed of Ruby 1.9 to Ruby 1.8.6 as well (since they were run on the same host, with same script and no noticeable external run-time load differences). However, I will leave that as an exercise to the reader.

In the meantime, let me emphasize this point. DO NOT MAKE MICRO CODE OPTIMIZATIONS WITHOUT JUSTIFICATION! It is usually fruitless and you may well be making more work down the line for yourself and creating a less maintainable system at the same time.

I am not advocating changing to the each coding style if you honestly find it less readable than for item in list, but there should be no excuse for not using collect since (a) it is the Ruby way; (b) less error prone; and© more readable!

How do we scale?

written by supo on October 27th, 2007 @ 03:39 PM

This is perhaps the most frequently asked question we get at Managed Open Source (MOS). To answer this we always find it necessary to push back on our clients and ask, “how to scale what?”. This is when most of our clients go silent and realize they haven’t put much thought into what questions they really should be asking about their current environment.

The problem with most organizations is that they do not consider exactly what it is they want or need to scale. For example, we recently had a startup client that used the Ruby on Rails framework to build their application and wanted us to “scale Rails”. Scaling open source application stacks is what we do best, but in reality we do not actually “scale Rails”. We scale the application, which has specific needs and intricacies. Ruby and Rails are only two parts of the whole stack. Not to mention there is more to scaling than simply software stack. If you are trying to reach an Asian audience, but have data centers in Europe, that wouldn’t be the optimal way to scale from a data delivery perspective.

Sometimes application optimizations only need to deliver more consistent response times or more correct responses, rather than delivering higher throughput or lightning speed. For example, if your current application configuration can fulfill up to 5,000 requests per second, but only 93% of them get fulfilled without errors, that may not be an acceptable level of response quality. Instead the application may be better off providing throughput of up to 4,000 requests per second but 99.8% responses are successful.

The moral of these examples is when asking “how should we scale?” take a few steps back and ask yourself what is unique to your specific application. What we often find is that there are a few options to tweak at each level of the application stack (including hardware and data center levels) to get to the point where your application can scale better yet not restrict the configuration too much you end up painting your way into a corner.

Think about what “scaling” means for your application and your business needs rather than just in terms of performance and/or throughput.

Managed Open Source’s next set of posts will be on scalability vs. performance in more detail since so many people in IT equate the two.