Efficiency of Arrays vs Ranges in Ruby

Issue

While working on something recently, I started to think about the efficiency of Arrays and Ranges in Ruby. I tried to research this but could find very little information on it, or even on how I could test it myself.

I came across some code that checks which range an HTTP status code falls in. It is written something like this:

SUCCESS = (200...300)
REDIRECTION = (300...400)

if SUCCESS.include?(status_code)
  status = 'success'
elsif REDIRECTION.include?(status_code)
  status = 'redirection'
end

This got me thinking that it seems wasteful to use 200...300 when we essentially only need 200...207. Would there be a big efficiency difference between the two, if any at all?

Also, what about the 4XX codes? Since they are not always a straight run, I thought maybe I should turn the range into an array, so I could write it one of two ways:

As a straight range
CLIENT_ERROR = (400..429)

or as an array
CLIENT_ERROR = [*(400...419), 422, 429]

I’m assuming the first option is the better and more efficient approach, but I’m not sure how to validate that, so any advice or input would be greatly appreciated.

Solution

TL;DR

Ranges are generally faster and more memory-efficient than Arrays reified from them. However, specific use cases may vary.
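As a rough illustration of why: for numeric endpoints, Range#include? only compares the value against the two ends, so the width of the range should not affect the cost of a lookup, whereas Array#include? walks the elements. This is a minimal sketch; the constant and variable names are illustrative and not from the question.

```ruby
# Illustrative constants; the names are not from the original question.
WIDE_RANGE   = (200...300)      # holds only its two endpoints
NARROW_RANGE = (200..207)
WIDE_ARRAY   = WIDE_RANGE.to_a  # holds all 100 integers

# For numeric ranges, include? compares against the endpoints,
# so a wider range does not mean a slower lookup.
wide_hit   = WIDE_RANGE.include?(204)
narrow_hit = NARROW_RANGE.include?(204)
excluded   = WIDE_RANGE.include?(300)   # the ... form excludes the end value

# Array#include? checks elements one by one until it finds a match.
array_hit = WIDE_ARRAY.include?(204)

# A very wide Integer range is still answered without iterating it.
huge_hit = (1..10**100).include?(10**50)
```

So narrowing 200...300 down to the codes actually in use is unlikely to buy anything for a Range, while it would shrink an Array.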

If in doubt, benchmark. You can use irb’s relatively new measure command, or use the Benchmark module to compare different approaches. In general, reifying a Range as an Array takes more memory and is slower than comparing against a Range (or even a small Array of Range objects), but unless this code runs in a hot loop, optimizing it is likely premature.
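For the non-contiguous 4XX case, one possible way to keep Ranges without reifying them is a small list of matchers compared with ===, which tests membership for a Range and equality for an Integer. The sketch below is an assumption about how this could be written, not code from the question: client_error? and status_category are hypothetical helpers, and the list of codes is illustrative rather than complete.

```ruby
# Hypothetical matcher list: one Range plus individual codes.
CLIENT_ERROR = [400..418, 422, 429].freeze

# Hypothetical helper: true if any matcher accepts the status code.
def client_error?(status_code)
  # Range#=== tests membership; Integer#=== tests equality.
  CLIENT_ERROR.any? { |matcher| matcher === status_code }
end

# Hypothetical helper: case/when also uses ===, and the splat
# expands the matcher list into individual when conditions.
def status_category(status_code)
  case status_code
  when 200...300 then 'success'
  when 300...400 then 'redirection'
  when *CLIENT_ERROR then 'client error'
  end
end
```

With this shape, adding another code or sub-range is a one-element change to the list, and nothing is expanded into a full Array.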

Benchmarks

Using Ruby 3.1.0, the Range approach is roughly 36.5 times faster on my system (about 0.020 s versus 0.740 s of real time). For example:

require 'benchmark'

n = 100_000

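Benchmark.bmbm do
  _1.report("Range") do
    n.times do
      # Builds a two-element Array of Range objects on every iteration.
      # Note: Array#include? compares elements with ==, so this call
      # returns false; it does not test whether 404 lies in a Range.
      client_error = [200..299, 400..499]
      client_error.include? 404
    end
  end

  _1.report("Array") do
    n.times do
      # Reifies both Ranges into a 200-element Array on every iteration,
      # then scans it for 404.
      client_error = [*(200..299), *(400..499)]
      client_error.include? 404
    end
  end
end

Rehearsal -----------------------------------------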
Range   0.022570   0.000107   0.022677 (  0.022832)
Array   0.707742   0.041499   0.749241 (  0.750012)
-------------------------------- total: 0.771918sec

            user     system      total        real
Range   0.020184   0.000043   0.020227 (  0.020245)
Array   0.701911   0.037541   0.739452 (  0.740037)

While the overall times are better with JRuby and TruffleRuby, the Range approach is only about 3-7x faster than the Array approach on those engines. Meanwhile, Ruby 3.0.1 shows an approximate 37x speed improvement using a non-reified Range rather than an Array, so the Range approach is the clear winner either way.

Your specific values will vary based on system specs, system load, and Ruby version and engine. For smaller values of n, I can’t imagine it will make any practical difference, but you should definitely benchmark against your own systems to determine if the juice is worth the squeeze.
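When adapting the benchmark to your own code, two adjustments may be worth considering. Array#include? matches elements with ==, so checking whether a value falls inside one of several Ranges needs any? combined with cover? (or include?). And if your collections are constants, as in the question, building them once outside the loop measures only the lookup. The following is a sketch under those assumptions; RANGES, VALUES, in_ranges? and in_values? are illustrative names, and no particular timings are implied.

```ruby
require 'benchmark'

# Illustrative constants, built once rather than on every iteration.
RANGES = [200..299, 400..499].freeze
VALUES = [*(200..299), *(400..499)].freeze

# Hypothetical helper: true if any Range covers the code.
def in_ranges?(code)
  RANGES.any? { |range| range.cover?(code) }
end

# Hypothetical helper: scan of the reified Array.
def in_values?(code)
  VALUES.include?(code)
end

n = 100_000

Benchmark.bmbm do |x|
  x.report('Ranges') { n.times { in_ranges?(404) } }
  x.report('Array')  { n.times { in_values?(404) } }
end
```

Because the lookups are wrapped in methods and the collections are frozen constants, the numbers reflect membership checks alone, which is closer to how the question’s SUCCESS and REDIRECTION constants would be used.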

Answered By – Todd A. Jacobs

Answer Checked By – Pedro (AngularFixing Volunteer)
