Why do we need fibers
Unlike closures, which have a defined entry and exit point, fibers can preserve their state and return (yield) many times:
f = Fiber.new do
puts 'some code'
param = Fiber.yield 'return' # sent parameter, received parameter
puts "received param: #{param}"
Fiber.yield #nothing sent, nothing received
puts 'etc'
end
puts f.resume
f.resume 'param'
f.resume
prints this:
some code
return
received param: param
etc
Implementation of this logic with other ruby features will be less readable.
With this feature, good fibers usage is to do manual cooperative scheduling (as Threads replacement). Ilya Grigorik has a good example on how to turn an asynchronous library (eventmachine
in this case) into what looks like a synchronous API without losing the advantages of IO-scheduling of the asynchronous execution. Here is the link.
Fibers are something you will probably never use directly in application-level code. They are a flow-control primitive which you can use to build other abstractions, which you then use in higher-level code.
Probably the #1 use of fibers in Ruby is to implement Enumerator
s, which are a core Ruby class in Ruby 1.9. These are incredibly useful.
In Ruby 1.9, if you call almost any iterator method on the core classes, without passing a block, it will return an Enumerator
.
irb(main):001:0> [1,2,3].reverse_each
=> #<Enumerator: [1, 2, 3]:reverse_each>
irb(main):002:0> "abc".chars
=> #<Enumerator: "abc":chars>
irb(main):003:0> 1.upto(10)
=> #<Enumerator: 1:upto(10)>
These Enumerator
s are Enumerable objects, and their each
methods yield the elements which would have been yielded by the original iterator method, had it been called with a block. In the example I just gave, the Enumerator returned by reverse_each
has a each
method which yields 3,2,1. The Enumerator returned by chars
yields "c","b","a" (and so on). BUT, unlike the original iterator method, the Enumerator can also return the elements one by one if you call next
on it repeatedly:
irb(main):001:0> e = "abc".chars
=> #<Enumerator: "abc":chars>
irb(main):002:0> e.next
=> "a"
irb(main):003:0> e.next
=> "b"
irb(main):004:0> e.next
=> "c"
You may have heard of "internal iterators" and "external iterators" (a good description of both is given in the "Gang of Four" Design Patterns book). The above example shows that Enumerators can be used to turn an internal iterator into an external one.
This is one way to make your own enumerators:
class SomeClass
def an_iterator
# note the 'return enum_for...' pattern; it's very useful
# enum_for is an Object method
# so even for iterators which don't return an Enumerator when called
# with no block, you can easily get one by calling 'enum_for'
return enum_for(:an_iterator) if not block_given?
yield 1
yield 2
yield 3
end
end
Let's try it:
e = SomeClass.new.an_iterator
e.next # => 1
e.next # => 2
e.next # => 3
Wait a minute... does anything seem strange there? You wrote the yield
statements in an_iterator
as straight-line code, but the Enumerator can run them one at a time. In between calls to next
, the execution of an_iterator
is "frozen". Each time you call next
, it continues running down to the following yield
statement, and then "freezes" again.
Can you guess how this is implemented? The Enumerator wraps the call to an_iterator
in a fiber, and passes a block which suspends the fiber. So every time an_iterator
yields to the block, the fiber which it is running on is suspended, and execution continues on the main thread. Next time you call next
, it passes control to the fiber, the block returns, and an_iterator
continues where it left off.
It would be instructive to think of what would be required to do this without fibers. EVERY class which wanted to provide both internal and external iterators would have to contain explicit code to keep track of state between calls to next
. Each call to next would have to check that state, and update it before returning a value. With fibers, we can automatically convert any internal iterator to an external one.
This doesn't have to do with fibers persay, but let me mention one more thing you can do with Enumerators: they allow you to apply higher-order Enumerable methods to other iterators other than each
. Think about it: normally all the Enumerable methods, including map
, select
, include?
, inject
, and so on, all work on the elements yielded by each
. But what if an object has other iterators other than each
?
irb(main):001:0> "Hello".chars.select { |c| c =~ /[A-Z]/ }
=> ["H"]
irb(main):002:0> "Hello".bytes.sort
=> [72, 101, 108, 108, 111]
Calling the iterator with no block returns an Enumerator, and then you can call other Enumerable methods on that.
Getting back to fibers, have you used the take
method from Enumerable?
class InfiniteSeries
include Enumerable
def each
i = 0
loop { yield(i += 1) }
end
end
If anything calls that each
method, it looks like it should never return, right? Check this out:
InfiniteSeries.new.take(10) # => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
I don't know if this uses fibers under the hood, but it could. Fibers can be used to implement infinite lists and lazy evaluation of a series. For an example of some lazy methods defined with Enumerators, I have defined some here: https://github.com/alexdowad/showcase/blob/master/ruby-core/collections.rb
You can also build a general-purpose coroutine facility using fibers. I've never used coroutines in any of my programs yet, but it's a good concept to know.
I hope this gives you some idea of the possibilities. As I said at the beginning, fibers are a low-level flow-control primitive. They make it possible to maintain multiple control-flow "positions" within your program (like different "bookmarks" in the pages of a book) and switch between them as desired. Since arbitrary code can run in a fiber, you can call into 3rd-party code on a fiber, and then "freeze" it and continue doing something else when it calls back into code you control.
Imagine something like this: you are writing a server program which will service many clients. A complete interaction with a client involves going through a series of steps, but each connection is transient, and you have to remember state for each client between connections. (Sound like web programming?)
Rather than explicitly storing that state, and checking it each time a client connects (to see what the next "step" they have to do is), you could maintain a fiber for each client. After identifying the client, you would retrieve their fiber and re-start it. Then at the end of each connection, you would suspend the fiber and store it again. This way, you could write straight-line code to implement all the logic for a complete interaction, including all the steps (just as you naturally would if your program was made to run locally).
I'm sure there's many reasons why such a thing may not be practical (at least for now), but again I'm just trying to show you some of the possibilities. Who knows; once you get the concept, you may come up with some totally new application which no-one else has thought of yet!