Can someone explain Clojure Transducers to me in Simple terms?
Transducers are recipes for what to do with a sequence of data, written without knowledge of what the underlying sequence is (that is, how to do it). The source can be any seq, an async channel, or perhaps an observable.
They are composable and polymorphic.
The benefit is that you don't have to reimplement all the standard combinators every time a new data source is added. As a result, you as a user can reuse those recipes on different data sources.
Prior to version 1.7 of Clojure you had three ways to write dataflow queries:
nested calls
(reduce + (filter odd? (map #(+ 2 %) (range 0 10))))
functional composition
(def xform
  (comp
    (partial filter odd?)
    (partial map #(+ 2 %))))
(reduce + (xform (range 0 10)))
threading macro
(defn xform [xs]
  (->> xs
       (map #(+ 2 %))
       (filter odd?)))
(reduce + (xform (range 0 10)))
With transducers you will write it like:
(def xform
  (comp
    (map #(+ 2 %))
    (filter odd?)))
(transduce xform + (range 0 10))
They all do the same thing. The difference is that you never call transducers directly; you pass them to another function. Transducers know what to do, and the function that receives a transducer knows how. The combinators are applied in the order you write them, just as with the threading macro (natural order). Now you can reuse xform with a channel:
(chan 1 xform)
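As a rough sketch of that reuse that needs only clojure.core (so no core.async dependency is assumed), the same recipe can be handed to into, sequence, and transduce. The names as-vector, as-seq, and total are made up for this illustration:

```clojure
;; Same recipe as above, redefined here so the snippet stands alone.
(def xform
  (comp
    (map #(+ 2 %))
    (filter odd?)))

;; Eagerly pour the transformed values into a vector.
(def as-vector (into [] xform (range 0 10)))   ; [3 5 7 9 11]

;; Build a lazy sequence from the same recipe.
(def as-seq (sequence xform (range 0 10)))     ; (3 5 7 9 11)

;; Reduce with + directly over the source.
(def total (transduce xform + (range 0 10)))   ; 35
```

In each case the receiving function (into, sequence, transduce) decides how the data is walked and where results go; xform itself stays unchanged.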
Transducers improve efficiency, and allow you to write efficient code in a more modular way.
This is a decent run through.
Compared to composing calls to the old map, filter, reduce, etc., you get better performance because you don't need to build intermediate collections between each step and repeatedly walk those collections.
Compared to reducers, or manually composing all your operations into a single expression, you get easier-to-use abstractions, better modularity, and reuse of processing functions.
Say you want to use a series of functions to transform a stream of data. The Unix shell lets you do this kind of thing with the pipe operator, e.g.
cat /etc/passwd | tr '[:lower:]' '[:upper:]' | cut -d: -f1| grep R| wc -l
(The above command counts the number of users with the letter r, in either upper- or lowercase, in their username.) This is implemented as a set of processes, each of which reads from the previous process's output, so there are four intermediate streams. You could imagine a different implementation that composes the five commands into a single aggregate command, which would read from its input and write its output exactly once. If intermediate streams were expensive and composition were cheap, that might be a good trade-off.
The same kind of thing holds for Clojure. There are multiple ways to express a pipeline of transformations, but depending on how you do it, you can end up with intermediate streams passing from one function to the next. If you have a lot of data, it's faster to compose those functions into a single function. Transducers make it easy to do that. An earlier Clojure innovation, reducers, let you do that too, but with some restrictions. Transducers remove some of those restrictions.
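To make the analogy concrete, here is a minimal sketch of the shell pipeline above written as one composed transducer. The sample lines are invented for illustration, and the names lines, r-usernames, and r-user-count are hypothetical:

```clojure
(require '[clojure.string :as str])

;; Hypothetical sample data in /etc/passwd format (made up for this example).
(def lines
  ["root:x:0:0:root:/root:/bin/bash"
   "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin"
   "alice:x:1000:1000::/home/alice:/bin/bash"
   "maria:x:1001:1001::/home/maria:/bin/bash"])

;; One recipe covering the tr, cut, and grep stages.
(def r-usernames
  (comp
    (map str/upper-case)                   ; tr '[:lower:]' '[:upper:]'
    (map #(first (str/split % #":")))      ; cut -d: -f1
    (filter #(str/includes? % "R"))))      ; grep R

;; wc -l: count the items that survive, without intermediate collections.
(def r-user-count
  (transduce r-usernames
             (completing (fn [n _] (inc n)))
             0
             lines))                        ; 2 (ROOT and MARIA)
```

Each line passes through all three stages before the next line is read, which corresponds to the "single aggregate command" described above.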
So to answer your question, transducers won't necessarily make your code shorter or more understandable, but your code probably won't be longer or less understandable either, and if you're working with a lot of data, transducers can make your code faster.
This is a pretty good overview of transducers.
Transducers are a means of combination for reducing functions.
Example:
Reducing functions are functions that take two arguments: a result so far and an input. They return a new result (so far). Take +, for example: with two arguments, you can think of the first as the result so far and the second as the input.
A transducer could now take the + function and turn it into a twice-plus function (one that doubles every input before adding it). This is what that transducer would look like (in the most basic terms):
;; Note: this name shadows clojure.core/double, which is fine for illustration.
(defn double
  [rfn]
  (fn [r i]
    (rfn r (* 2 i))))
For illustration, substitute rfn with + to see how + is transformed into twice-plus:
(def twice-plus ;; result of (double +)
  (fn [r i]
    (+ r (* 2 i))))
(twice-plus 1 2) ;-> 5
(= (twice-plus 1 2) ((double +) 1 2)) ;-> true
So
(reduce (double +) 0 [1 2 3])
would now yield 12.
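The two-argument version above is enough for reduce, but transduce and into also call the returned function with one argument when the input is exhausted, and transduce calls the reducing function with no arguments when no initial value is given. A fuller sketch therefore passes those arities through. The names doubling, doubled-sum, and doubled-vec are hypothetical and chosen to avoid shadowing clojure.core/double:

```clojure
(defn doubling
  [rfn]
  (fn
    ([] (rfn))                    ; init: delegate to the wrapped function
    ([r] (rfn r))                 ; completion: nothing to flush, pass through
    ([r i] (rfn r (* 2 i)))))     ; step: double the input, then accumulate

(def doubled-sum (transduce doubling + [1 2 3]))   ; 12
(def doubled-vec (into [] doubling [1 2 3]))       ; [2 4 6]
```

This behaves like the built-in (map #(* 2 %)) transducer for these inputs.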
Reducing functions returned by transducers are independent of how the result is accumulated, because they accumulate with whatever reducing function is passed to them, without knowing how it works. Here we use conj instead of +. conj takes a collection and a value and returns a new collection with that value added (appended, in the case of a vector).
(reduce (double conj) [] [1 2 3])
would yield [2 4 6]
They are also independent of what kind of source the input is.
Multiple transducers can be chained into a single (still chainable) recipe for transforming reducing functions.
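As a small sketch of such chaining, in the same basic two-argument style used above: two hand-written transducers are combined with comp and the result is applied to different reducing functions. The names double-input, only-odd, and odd-then-double are made up for illustration:

```clojure
;; Doubles every input before handing it to the wrapped reducing function.
(defn double-input
  [rfn]
  (fn [r i]
    (rfn r (* 2 i))))

;; Passes only odd inputs on; for even inputs the result is returned unchanged.
(defn only-odd
  [rfn]
  (fn [r i]
    (if (odd? i)
      (rfn r i)
      r)))

;; Inputs flow through only-odd first, then through double-input.
(def odd-then-double (comp only-odd double-input))

(def chained-sum (reduce (odd-then-double +) 0 [1 2 3]))      ; 8
(def chained-vec (reduce (odd-then-double conj) [] [1 2 3]))  ; [2 6]
```

Because comp wraps from right to left, the leftmost transducer ends up outermost and sees each input first, which is why the processing order reads left to right.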
Update: Since there is now an official page about it, I highly recommend reading it: http://clojure.org/transducers