Today we’re going to discuss processes, threads, fibers, and concurrency versus parallelism, and how all of this applies to Ruby. We’ll also look at what the GIL is and at the newer feature called Ractors.
Ruby Processes vs. Threads vs. Fibers
Process:
- A process is an instance of a running program. Every process starts with one thread, called the main thread.
- Processes are isolated from each other. Each one has its own memory, so processes do not share data directly and must communicate through mechanisms such as pipes or sockets.
- The main drawback of processes is that creating, terminating, and managing them requires many system calls, which makes them relatively expensive.
- Because processes don’t share memory, they don’t need locks to protect shared data.
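To make the isolation concrete, here is a small sketch showing that a forked child works on its own copy of the parent’s memory. It assumes a platform that supports `fork` (Linux or macOS, not Windows), and the variable name `counter` is purely illustrative.

```ruby
# Sketch: a forked child gets its own copy of the parent's memory.
# Assumes fork is available (Linux/macOS; not Windows).
counter = 0

pid = fork do
  counter += 100                  # changes the child's copy only
  puts "child sees #{counter}"    # prints "child sees 100"
end

Process.wait(pid)                 # wait for the child to exit
puts "parent sees #{counter}"     # prints "parent sees 0"
```

The parent never sees the child’s change, which is why processes need no locks but also cannot simply share variables.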
Thread:
- A thread is a unit of execution within a process. In other words, threads are lightweight processes that live inside a process.
- Threads require fewer system calls to create and manage than processes.
- Threads can communicate easily because they share the same memory.
- Sharing memory creates a problem called “thread safety”: if several threads modify the same data without coordination, they can leave it in a corrupted state.
- Threads must therefore be synchronized to avoid unexpected behavior.
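By contrast with processes, every thread sees the same objects. The sketch below has three threads write into one shared Hash, using a `Mutex` to keep the writes safe. The names `results` and `lock` are illustrative only.

```ruby
# Sketch: all threads in a process share the same objects.
results = {}          # one Hash visible to every thread
lock = Mutex.new      # guards writes to the shared Hash

threads = %w[a b c].map do |name|
  Thread.new do
    value = name.upcase
    lock.synchronize { results[name] = value }   # write into shared memory
  end
end

threads.each(&:join)
puts results.keys.sort.join(",")   # prints a,b,c
```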
Fiber:
- A fiber is also a lightweight primitive, but it is paused, resumed, and scheduled manually.
- A fiber is similar to a thread, except that your code, not the Ruby VM, decides when it runs and when it gives up control (cooperative rather than preemptive scheduling).
- A fiber consumes far fewer resources than a thread.
- A thread runs in the background alongside the main program, whereas a resumed fiber takes over execution until it yields control back (for example, with Fiber.yield) or finishes.
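The “more control” point can be illustrated with a tiny hand-written round-robin loop that decides exactly when each fiber runs. This is only a sketch of cooperative scheduling; `make_worker` and the loop acting as a “scheduler” are hypothetical names, not a Ruby API.

```ruby
# Sketch: two fibers that take turns only when they explicitly yield.
log = []

make_worker = lambda do |name|
  Fiber.new do
    3.times do |i|
      log << "#{name}#{i}"
      Fiber.yield              # hand control back to whoever resumed us
    end
  end
end

fibers = [make_worker.call("A"), make_worker.call("B")]

# A tiny round-robin "scheduler": keep resuming live fibers in turn.
until fibers.none?(&:alive?)
  fibers.each { |f| f.resume if f.alive? }
end

p log # => ["A0", "B0", "A1", "B1", "A2", "B2"]
```

Because nothing preempts a fiber, the interleaving is fully deterministic, unlike with threads.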
Creating Threads in Ruby
To create a new thread in Ruby, use the Thread class.
# creating the first thread
thread_1 = Thread.new { p 'Thread 1' }
# creating the second thread
thread_2 = Thread.new { p 'Thread 2' }
If you want to wait for these threads to finish before the rest of your program continues, call the join method on each of them.
[thread_1, thread_2].each(&:join)
p 'I will only execute after thread_1 and thread_2 complete'
If you don’t join them, the threads keep running only as long as the main thread does: when the main program finishes, Ruby kills every thread that is still running.
It is also worth noting that there is no guarantee that thread_1 runs first and thread_2 second; the execution order is not deterministic.
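Even though threads may finish in any order, you can still collect their results in a predictable order. The sketch below uses `Thread#value`, which joins a thread and returns the value of its block; the sleep durations are arbitrary stand-ins for work of different lengths.

```ruby
# Sketch: threads finish in whatever order they like,
# but Thread#value lets us collect results in a fixed order.
threads = [3, 1, 2].map do |n|
  Thread.new(n) do |units|
    sleep(units * 0.01)   # simulate work of different lengths
    units * 10            # the block's return value becomes Thread#value
  end
end

# Thread#value waits for each thread (like join) and returns its result.
results = threads.map(&:value)
p results # => [30, 10, 20]
```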
Ruby Thread synchronization using Mutex
When two or more threads read and write the same variable concurrently, the value stored in the variable at the end of execution is often not what it should be. This is called a race condition.
A mutex (mutual exclusion lock) solves this: a thread locks the resource while it uses it and unlocks it only when it is done, so other threads have to wait their turn.
Example 1 (without Mutex):
array = [0, 0, 0]
threads = 2.times.map do # create 2 threads
  Thread.new do
    100.times do # each thread increments every element 100 times
      array.map! { |counter| counter + 1 }
    end
  end
end
threads.each(&:join)
p array # => We expect [200, 200, 200], but without synchronization this is not guaranteed; the result might be [197, 199, 200] or similar.
Example 2 (with Mutex):
array = [0, 0, 0]
mutex = Mutex.new # one lock shared by both threads
threads = 2.times.map do
  Thread.new do
    100.times do
      # synchronize the threads: the array is locked and can only be accessed by one thread at a time
      mutex.synchronize do
        array.map! { |counter| counter + 1 }
      end
    end
  end
end
threads.each(&:join)
p array # => Now we always get [200, 200, 200].
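The lost update is easier to see with a plain read-modify-write on a counter. In this sketch `Thread.pass` invites a thread switch between the read and the write, so unsynchronized runs usually lose increments, while the Mutex-protected run always reaches the expected total. The helper `run_counter` is hypothetical, written only for this illustration.

```ruby
# Sketch: a read-modify-write race made easy to observe.
# Thread.pass asks the scheduler to switch threads between the read and the write.
def run_counter(iterations, mutex = nil)
  counter = 0
  increment = lambda do
    current = counter       # read
    Thread.pass             # invite a thread switch mid-update
    counter = current + 1   # write (may overwrite another thread's update)
  end

  threads = 2.times.map do
    Thread.new do
      iterations.times { mutex ? mutex.synchronize(&increment) : increment.call }
    end
  end
  threads.each(&:join)
  counter
end

unsafe = run_counter(1_000)
safe = run_counter(1_000, Mutex.new)
puts "without mutex: #{unsafe}"   # often less than 2000, but not guaranteed
puts "with mutex:    #{safe}"     # always 2000
```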
Creating a Process in Ruby
pid = fork do
  p "Hello, I'm a process"
end
Process.waitall # wait for all child processes to finish
Unlike a thread, a child process is not automatically killed when the parent program terminates; it keeps running on its own.
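Since processes don’t share memory, a child has to send results back explicitly. The sketch below uses `IO.pipe` as one such channel, and like any `fork` example it assumes a non-Windows platform. The computation, a simple sum, is arbitrary.

```ruby
# Sketch: the child sends its result to the parent through a pipe.
# Requires fork (not available on Windows).
reader, writer = IO.pipe

pid = fork do
  reader.close                 # the child only writes
  writer.puts((1..10).sum)     # send the result as text
  writer.close
end

writer.close                   # the parent only reads
result = reader.read.to_i      # reads until the child closes its end
reader.close
Process.wait(pid)

puts result                    # prints 55
puts $?.success?               # prints true (the child exited with status 0)
```

The parent closes its own copy of `writer` before reading; otherwise `read` would never see end-of-file.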
Creating Fiber in Ruby
- Unlike a thread or a process, a fiber does not start automatically. You have to run it explicitly with Fiber#resume.
f = Fiber.new do
  puts 'Fiber 1'
  Fiber.yield # pauses the fiber and passes control back to the caller
  puts 'Fiber 2'
end
f.resume # => runs the fiber until Fiber.yield, printing 'Fiber 1'
f.resume # => resumes after Fiber.yield, printing 'Fiber 2'; the fiber then finishes
f.resume # => a third call raises FiberError because the fiber has already finished
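`Fiber#resume` and `Fiber.yield` can also pass values in both directions: the argument to `resume` becomes the return value of `Fiber.yield` inside the fiber, and the argument to `Fiber.yield` becomes the return value of `resume`. The running-total fiber below is just an illustrative example of that handshake.

```ruby
# Sketch: values flow into the fiber via resume and out via Fiber.yield.
running_total = Fiber.new do |first|
  total = first                     # the first resume's argument arrives as a block param
  loop do
    incoming = Fiber.yield(total)   # hand out the total, wait for the next number
    total += incoming
  end
end

results = [5, 10, 1].map { |n| running_total.resume(n) }
p results # => [5, 15, 16]
```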
To run several fiber-based operations concurrently (for example, overlapping I/O), you need a Fiber scheduler implementation. Ruby 3.0 defines the scheduler interface but doesn’t ship a default scheduler.
The fiber_scheduler gem is one implementation you can use.
require "fiber_scheduler"
require "open-uri"
FiberScheduler do # inside this block a scheduler is set, so the fibers below are non-blocking
  Fiber.schedule do
    # This HTTP request takes 2 seconds
    URI.open("https://httpbin.org/delay/2")
  end
  Fiber.schedule do
    # This HTTP request takes 2 seconds
    URI.open("https://httpbin.org/delay/2")
  end
end
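This example runs the two HTTP requests concurrently, so the whole block takes roughly 2 seconds instead of 4.
It requires Ruby 3.0 or later, since the Fiber scheduler interface was introduced in Ruby 3.0.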
Ruby Concurrency vs. Parallelization
Concurrency:
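- Tasks make progress in overlapping time periods, but only one task is executing at any given instant. For example, if we start task 1 and task 2, task 1 can run for a while, pause (say, while it waits for I/O), let task 2 run, and then resume; task 2 does not have to wait for task 1 to complete.
- Concurrency can speed up your code even with the same number of CPUs, especially when tasks spend much of their time waiting on I/O.
- MRI (the main implementation of Ruby) offers only concurrency for threads because of the global interpreter lock (GIL).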
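Here is a small sketch of concurrency paying off on a single core. `sleep` stands in for blocking I/O such as a network call; MRI releases the GIL while a thread is blocked like this, so the three waits overlap. The `timed` helper is hypothetical, and the timings in the comments are approximate.

```ruby
# Sketch: sleep stands in for blocking I/O (e.g. a network request).
# A blocked thread releases the GIL, so the waits can overlap.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

sequential = timed { 3.times { sleep 0.2 } }
concurrent = timed do
  3.times.map { Thread.new { sleep 0.2 } }.each(&:join)
end

puts format("sequential: %.2fs", sequential)  # about 0.6s
puts format("concurrent: %.2fs", concurrent)  # about 0.2s
```

The same trick does not help CPU-bound work on MRI, because only one thread can execute Ruby code at a time.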
Parallelization:
- Many tasks can execute simultaneously; how many depends on the number of CPUs and CPU cores available.
- Requires multiple CPUs or CPU cores.
- You can break up CPU-heavy tasks into smaller tasks to complete them faster.
- JRuby (an alternative implementation of Ruby) has no GIL, so its threads can run in parallel.
- Ruby 3.0 introduced parallel execution in MRI via Ractor, a concurrency abstraction that allows parallel execution without thread-safety issues.
GIL (GVL)
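- GIL stands for Global Interpreter Lock; in MRI it is also called the GVL (Global VM Lock).
- It limits MRI by allowing only one thread to execute Ruby code at any given time.
- Unlike Go or Java, whose threads can truly run in parallel, MRI runs only one thread at a time even when multiple cores are available. A thread that is blocked on I/O releases the lock, which is why threads still help with I/O-bound work.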
If you want to dig deeper into GIL (GVL), I recommend reading this article.
The GIL is the reason MRI threads offer concurrency but not parallelism.
Ractors
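- Ractors provide true parallelism.
- Unlike threads, Ractors don’t share everything with each other; most objects belong to a single Ractor.
- The few objects that can be shared are protected by the interpreter or by a locking mechanism.
- Ractors are still experimental, and Ruby prints a warning when you create one.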
In 2020, when Ruby 3.0.0 was released, Matz said the following:
It’s a multi-core age today. Concurrency is very important. With Ractor and Async Fiber, Ruby will be an actual concurrent language.
Ruby Ractor example:
ractor = Ractor.new do
  p "I'm a ractor"
  num = Ractor.receive # receive a message that was sent to the ractor
  Ractor.yield(num**num) # send a message from the ractor to the outside
end
ractor.send(2) # send a message to the ractor
# => #<Ractor:#10 (irb):67 blocking>
ractor.take # receive the message from the ractor
# => 4
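To show the parallelism itself, here is a sketch that splits a CPU-bound sum of squares across four Ractors and adds up their results. The chunking is arbitrary. As far as I know, `Ractor#take` exists in Ruby 3.0 to 3.4 and newer releases replaced it with `Ractor#value`, so the hypothetical helper `ractor_result` picks whichever is available; treat that compatibility shim as an assumption.

```ruby
# Sketch: split CPU-bound work across several Ractors (experimental feature;
# Ruby prints a warning on first use).
# Assumption: Ruby 3.0-3.4 use Ractor#take, newer versions use Ractor#value.
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

chunks = [1..250_000, 250_001..500_000, 500_001..750_000, 750_001..1_000_000]

ractors = chunks.map do |range|
  # The range is passed as an argument, since the block may not use outer locals.
  Ractor.new(range) { |r| r.sum { |x| x * x } }
end

total = ractors.sum { |r| ractor_result(r) }
puts total == (1..1_000_000).sum { |x| x * x }  # prints true
```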
A Ractor’s block cannot access local variables defined outside of it.
msg = 'Ractor'
ractor = Ractor.new do # raises <internal:ractor>:....(ArgumentError) because the block refers to the outer variable msg
  p msg
end
You can still pass objects in as arguments, though.
msg = 'Ractor'
ractor = Ractor.new msg do |msg|
  p msg # => prints 'Ractor'
end
Ractors: shareable and unshareable objects
- Objects sent between Ractors can be shareable or unshareable.
- When an object is shareable (for example, a deeply frozen object), the same object is passed; no copy is made.
- When it is not shareable, Ractor sends a deep copy instead by default, so the receiver cannot modify the sender’s original object.
msg = 'Ractor'
Ractor.shareable?(msg) # => false
Ractor.shareable?(msg.freeze) # => true
arr = [21]
arr.frozen? # => false
Ractor.make_shareable(arr) # => [21]
arr.frozen? # => true
arr = [1, 2, 3]
ractor = Ractor.new arr do |arr|
  arr.map!(&:to_s) # modifies the ractor's full copy, not the original object
end
ractor.take # => ["1", "2", "3"]
p arr # => [1, 2, 3]
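`Ractor.make_shareable` goes further than `freeze`: it deep-freezes the object together with everything it references, which is what makes it safe to share. The `config` Hash below is a made-up example.

```ruby
# Sketch: make_shareable freezes an object and everything it references.
config = { hosts: ["db1", "db2"], port: 5432 }

p Ractor.shareable?(config)      # => false
Ractor.make_shareable(config)

p config.frozen?                 # => true
p config[:hosts].frozen?         # => true (the nested Array is frozen too)
p config[:hosts].first.frozen?   # => true (and so are the Strings inside it)
p Ractor.shareable?(config)      # => true

begin
  config[:hosts] << "db3"
rescue FrozenError => e
  puts "cannot modify: #{e.class}"   # prints "cannot modify: FrozenError"
end
```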
Check out the Ractor documentation.
Summary
To sum up, Ruby provides several concurrency mechanisms:
- Processes
- Threads
- Fibers
- Ractors (>= Ruby 3.0)
Which one to choose depends on the problem you are trying to solve.