Ruby Concurrency, Parallelism, and GIL


Ihor T.
RoR Developer

Today we’re going to discuss processes, threads, and fibers, the difference between concurrency and parallelism, and how all of this applies in Ruby. We’ll also look at what the GIL is and at the newer feature called Ractors.

Ruby Processes vs. Threads vs. Fibers

Process:

  • A process is a running instance of a program. Every process starts with one thread, called the main thread.
  • A process is an isolated unit of execution: it has its own memory and does not share data with other processes by default.
  • The main drawback of processes is their cost: creating, terminating, and managing a process all require system calls.
  • Because processes do not share memory, their in-memory data does not need to be synchronized.

Thread:

  • A thread is a unit of execution within a process. In other words, threads are lightweight "processes" within a process.
  • Threads require fewer system calls than processes.
  • Threads can communicate easily because they share the same memory.
  • The downside is thread safety: since data is shared between threads, concurrent writes can corrupt it.
  • Threads must therefore be synchronized to avoid unexpected behavior.

Fiber:

  • A fiber is an even lighter primitive whose code can be paused and resumed manually, so scheduling is up to the program.
  • A fiber is similar to a thread, but it gives the programmer more control: the program, not the interpreter, decides when it runs and when it yields.
  • A fiber consumes far fewer resources than a thread.
  • While a thread runs in the background, a resumed fiber takes over the current thread and runs until it yields or finishes.

Creating Threads in Ruby

To create a new thread in Ruby, use the Thread class.

# creating the first thread
thread_1 = Thread.new { p 'Thread 1' }

# creating the second thread
thread_2 = Thread.new { p 'Thread 2' }

If you want to wait for these threads to complete before continuing with the rest of your program, you need to call the join method on the newly created threads.

[thread_1, thread_2].each(&:join)

p 'I will only execute after thread_1 and thread_2 complete'

If you don’t join the threads, they still run, but only until the main thread finishes: when the program exits, all of its remaining threads are killed.

It is also worth noting that there is no guarantee that thread_1 runs before thread_2. The threads can execute in any order.
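
Joining is also how results come back from threads. As a minimal sketch (the squaring work and the variable names are only illustrative), Thread#value waits for a thread the same way join does and then returns the value of its block:

```ruby
# Each thread computes one result. Thread#value joins the thread and
# returns the last expression evaluated in its block.
threads = [1, 2, 3].map do |n|
  Thread.new { n * n }
end

squares = threads.map(&:value)
p squares # => [1, 4, 9]
```

The results come back in the order of the threads array, regardless of which thread finished first.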

Ruby Thread synchronization using Mutex

When two or more threads read and write the same variable concurrently, updates can be lost, so the value at the end of execution may not be what you expect. This is known as a race condition.

A mutex lets one thread lock the resource while it works with it; other threads that need the resource must wait until the lock is released.

Example 1 (without Mutex):

array = [0, 0, 0]

threads = 2.times.map do # creating 2 threads.
  Thread.new do
    100.times do # each thread must increase the value of each element by 100 times.
      array.map! { |counter| counter + 1 }
    end
  end
end
threads.each(&:join)

p array # => We expect [200, 200, 200], but without synchronization updates can be lost, giving e.g. [197, 199, 200], depending on timing.

Example 2 (with Mutex):

array = [0, 0, 0]
mutex = Mutex.new # a single lock shared by both threads

threads = 2.times.map do
  Thread.new do
    100.times do
      # only one thread at a time can hold the mutex, so the update below is never interleaved
      mutex.synchronize do
        array.map! { |counter| counter + 1 }
      end
    end
  end
end
threads.each(&:join)

p array # => Now we see [200, 200, 200].

Creating a Process in Ruby

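pid = fork do # fork runs the block in a new child process and returns the child's pid to the parent
  p "Hello, I'm a process"
end

Process.waitall # wait for all child processes to finish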

Unlike a thread, a child process is not killed automatically when the parent process exits. Unless the parent waits for it or terminates it explicitly, the child keeps running on its own.
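
Because a forked child works on its own copy of the parent's memory, changes made in the child are invisible to the parent, and data has to be passed explicitly. The following is a minimal sketch of that isolation, assuming a Unix-like system where fork is available; the pipe and the counter variable are illustrative choices, not the only way to communicate between processes.

```ruby
counter = 0
reader, writer = IO.pipe

pid = fork do
  reader.close
  counter += 100       # changes only the child's copy of counter
  writer.puts counter  # send the child's value to the parent through the pipe
  writer.close
end

writer.close
child_value = reader.gets.to_i
reader.close
Process.wait(pid)

p child_value # => 100
p counter     # => 0, the parent's variable was never touched
```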

Creating a Fiber in Ruby

  • Unlike a thread or a process, a fiber does not start automatically. You have to start it explicitly with Fiber#resume.
f = Fiber.new do
  puts 'Fiber 1'
  Fiber.yield # pauses the running fiber and passes control back to the caller
  puts 'Fiber 2'
end

f.resume # => starts the fiber, which prints 'Fiber 1' and pauses at Fiber.yield
f.resume # => resumes after Fiber.yield and prints 'Fiber 2'; the fiber is now finished
f.resume # => a third resume raises FiberError because the fiber is dead (the exact message depends on the Ruby version)
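
Fiber.yield can also hand a value back to the caller, which makes fibers a natural fit for generators. Here is a small sketch of that idea; the Fibonacci sequence and the variable names are just an illustration.

```ruby
# An endless generator: each resume runs the fiber up to the next
# Fiber.yield and returns the value passed to it.
fib = Fiber.new do
  a, b = 0, 1
  loop do
    Fiber.yield a
    a, b = b, a + b
  end
end

first_five = 5.times.map { fib.resume }
p first_five # => [0, 1, 1, 2, 3]
```

The loop inside the fiber never terminates, but that is harmless: the fiber only advances when the caller resumes it.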

To run several fibers concurrently without blocking on I/O, you need a fiber scheduler. Ruby defines the scheduler interface but doesn’t ship a default implementation.

The fiber_scheduler gem is one implementation you can use.

require "fiber_scheduler"
require "open-uri"

FiberScheduler do # inside this block a scheduler is set, so I/O in scheduled fibers is non-blocking
  Fiber.schedule do
    # This HTTP request takes 2 seconds
    URI.open("https://httpbin.org/delay/2")
  end

  Fiber.schedule do
    # This HTTP request takes 2 seconds
    URI.open("https://httpbin.org/delay/2")
  end
end

This example runs the two HTTP requests concurrently, so the whole block takes about 2 seconds instead of 4.

This requires Ruby 3.0 or later.

Ruby Concurrency vs. Parallelization

Concurrency:

  • Several tasks are in progress over the same period, but only one of them is executing at any given instant. For example, if we start task 1 and task 2, task 2 can run while task 1 is waiting (for I/O, say), so the two take turns rather than run simultaneously.
  • Concurrency can speed up your code even on a single CPU, because time spent waiting in one task is used to run another.
  • MRI (the main implementation of Ruby) offers only concurrency for threads because of the global interpreter lock (GIL).
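
The speed-up from concurrency is easy to observe with tasks that mostly wait. In the sketch below, sleep stands in for a network call or disk read (an assumption made purely for illustration); run one after another, the three waits would take about 0.6 seconds.

```ruby
# Three simulated I/O waits of 0.2 seconds each, started in separate threads.
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

threads = 3.times.map do
  Thread.new { sleep 0.2 }
end
threads.each(&:join)

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
puts format('elapsed: %.2f seconds', elapsed)
```

Because the waits overlap, the elapsed time should land close to 0.2 seconds rather than 0.6, even though only one thread runs Ruby code at a time.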

Parallelization:

  • Many tasks can execute at the same instant, up to the number of available CPUs or CPU cores.
  • Requires multiple CPUs or CPU cores.
  • You can break CPU-heavy work into smaller tasks and run them simultaneously to finish faster.
  • JRuby (an alternative implementation of Ruby) offers Parallelization for threads.
  • Ruby 3.0 added parallel execution via Ractor, a concurrency abstraction designed to allow parallel execution without the usual thread-safety concerns.

GIL (GVL)

  • GIL stands for Global Interpreter Lock. In MRI it is also called the Global VM Lock (GVL).
  • It limits MRI by allowing only one thread to execute Ruby code at any given time.
  • Unlike Go or Java, which can run threads on several cores at once, MRI runs only one thread at a time even when multiple cores are available.

If you want to dig deeper into GIL (GVL), I recommend reading this article.

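The GIL is the reason MRI threads offer Concurrency and not Parallelization. A thread that is waiting on I/O releases the lock, which is why threads still speed up I/O-bound work.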
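
One way to see the effect of the GIL is to time CPU-bound work run sequentially and in threads. The sketch below is illustrative only: busy_work and timed are hypothetical helpers, and the actual timings depend on the machine and the Ruby implementation, so no numbers are shown.

```ruby
# A deliberately CPU-heavy helper (hypothetical, for illustration only).
def busy_work
  (1..300_000).sum { |i| i % 7 }
end

# Runs the block and returns its result together with the elapsed seconds.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

sequential, sequential_time = timed { 4.times.map { busy_work } }
threaded, threaded_time = timed do
  4.times.map { Thread.new { busy_work } }.map(&:value)
end

puts format('sequential: %.3fs, threaded: %.3fs', sequential_time, threaded_time)
p sequential == threaded # => true
```

On MRI the two timings are typically close to each other, since the threads take turns holding the lock; on an implementation without a GIL, such as JRuby, the threaded version can use several cores.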

Ractors

  • Ractors provide true parallelism.
  • They don’t share everything as threads do.
  • Shareable objects are protected by the interpreter or by a locking mechanism.
  • Ractors are still experimental.

In 2020, when Ruby 3.0.0 was released, Matz said the following:

It’s a multi-core age today. Concurrency is very important. With Ractor and Async Fiber, Ruby will be an actual concurrent language.

Ruby Ractor example:

ractor = Ractor.new do
  p "I'm a ractor"

  num = Ractor.receive # we receive a message that was sent to ractor
  Ractor.yield(num**num) # we send a message from the ractor to the outside
end

ractor.send(2) #  we send a message to ractor
# => #<Ractor:#10 (irb):67 blocking>
ractor.take # we receive a message from ractor
# => 4
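
Several Ractors can also work side by side, each on its own input. The sketch below sums three ranges in separate Ractors. The collect lambda is a hypothetical helper added on the assumption that newer Ruby versions expose the block's result through Ractor#value, while Ruby 3.0 to 3.4 use Ractor#take as in the example above.

```ruby
# Hypothetical helper: read a Ractor's result with whichever method this
# Ruby version provides (assumption: #value on newer versions, #take on 3.0-3.4).
collect = ->(r) { r.respond_to?(:value) ? r.value : r.take }

# Each Ractor receives its limit as an argument and returns the sum 1..limit.
ractors = [10, 20, 30].map do |n|
  Ractor.new(n) { |limit| (1..limit).sum }
end

sums = ractors.map { |r| collect.call(r) }
p sums # => [55, 210, 465]
```

Ruby prints a warning that Ractor is experimental the first time a Ractor is created.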

A Ractor cannot access local variables defined outside its block.

msg = 'Ractor'
ractor = Ractor.new do
  p msg # => Ractor.new raises ArgumentError because the block refers to the outer variable msg
end

But you can still pass objects in as arguments.

msg = 'Ractor'
ractor = Ractor.new msg do |msg|
  p msg # => prints 'Ractor'
end

Shareable and unshareable objects in Ractors

  • Objects sent between Ractors can be shareable or unshareable.
  • When an object is shareable, the same object is passed by reference.
  • When it is not shareable, the Ractor sends a deep copy instead, so the original object is not modified.
msg = 'Ractor'
Ractor.shareable?(msg) # => false
Ractor.shareable?(msg.freeze) # => true
arr = [21]
arr.frozen? # => false
Ractor.make_shareable(arr) # => [21]
arr.frozen? # => true

arr = [1, 2, 3]
ractor = Ractor.new arr do |arr|
  arr.map!(&:to_s) # => the original object is not modified since we are working with a full copy
end
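
Ractor.make_shareable, used above on a flat array, also freezes nested objects, which is what makes a whole structure safe to share. A short sketch with a made-up config hash (the keys and host names are purely illustrative):

```ruby
config = { hosts: ['a.example', 'b.example'], retries: 3 }

p Ractor.shareable?(config) # => false, the hash and its strings are mutable
Ractor.make_shareable(config)

# The hash, the nested array, and every string inside it are now frozen.
deeply_frozen = config.frozen? &&
                config[:hosts].frozen? &&
                config[:hosts].all?(&:frozen?)

p deeply_frozen             # => true
p Ractor.shareable?(config) # => true
```

This assumes the file does not enable frozen string literals; with that setting the strings would already be frozen, but the hash and array would still start out unshareable.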

Check out the ractor documentation.

Summary

To sum up, Ruby provides several concurrency mechanisms:

  • Processes
  • Threads
  • Fibers
  • Ractors (>= Ruby 3.0)

Which one to choose depends on the problem you are trying to solve.