Composing traversals
· Pablo ·This post assumes basic knowledge on Haskell, including what Functor
and Monad
are.
The problem
A few months ago, I was at a functional programming conference where the following problem1 was posed at a talk
Given a container
Tree
, write a functionupdate :: Tree a -> [a] -> Maybe (Tree a)
that, when called asupdate t as
, updates the leaves oft
by the new tags inas
(listed in preorder).If there are not enough elements on
as
, just returnNothing
This post attempts to explain a concise solution to that problem. You might want to try it on your own at first and then come back and read my solution. If you think your solution is better, please send it to me!
We will need the following modules:
import Data.Tree
import Control.Monad.State
import Data.Maybe
import Data.Functor.Compose
Multi-way Tree
s
In order to solve the problem, we will make use of multi-way trees, also known as rose trees, which are defined as
data Tree a = Node a [Tree a]
That is, a Tree
is made from the value of the root node (of type a
) and a (possibly empty) list of children trees.
To warm-up we will need a simple function2 that creates a singleton tree from a value.
singleton :: a -> Tree a
singleton x = Node x []
That was easy! Now, to simplify the problem a little, let’s first try writing an
unsafeUpdate :: Tree a -> [a] -> Tree a
function. If you don’t know how to approach this problem, you can think of it on imperative terms.
The following is a plausible solution on Python, assuming some tree type with methods tag
and children
and a predicate isLeaf
that tells you whether a tree is a leaf.
def unsafeUpdate(t, as):
if isLeaf(t):
t.tag = as.pop(0)
else:
for child in t.children:
unsafeUpdate(t,as)
As we shall see, the solution present in this post is strikingly similar in appearance to that one, yet it remains purely functional. Of course, other possibilities exist, yet I believe this one is concise, efficient and it is easily transformed into a safe version.
Purely functional State
This section introduces the State
monad.
If you already know about it, you may skip it.
The Control.Monad.State
module can be scary-looking at first, yet we will make simple use of it.
It defines a monad3 State
, that can be thought of as having this definition:
newtype State s a = State {runState :: s -> (a,s) }
A value m :: State s a
encapsulates a transformation that makes use of a state s
, it (possibly) transforms it and obtains a value of type a
.
For example, a function from System.Random
such as random
:: StdGen -> (a, StdGen)
could be encapsulated inside a randomState :: State StdGen a
value that produces a random value and has as an internal state the seed of type StdGen
.
By encapsulating state transformations in this way, we can remain pure while simulating computations that use some sort of state.
Unsurprisingly, for any possible type s
, State s
is a functor, with mapping given by
f <$> m = State $ \s -> let (a,s) = runState m s in (f a, s)
That is, we run the state modifying computation m
and then apply f
to its result, leaving the state unchanged.
It also is an Applicative
and a Monad
.
For simplicity, we only show the definitions of pure
and >>=
, leaving <*>
as an exercise4.
pure x
is the computation that returns x
, leaving the state unchanged.
pure x = State $ \s -> (x,s)
The bind operator is a bit more complicated; on this setting its type would be
>>= :: State s a -> (a -> State s b) -> State s b
that is, it takes a computation that would produce a value of type a
, runs it and applies the given function.
Its definition is equivalent to
m >>= f = State $ \s ->
let (a, s') = runState m s
in runState (f a) s'
State also defines a set of functions that allow us to get and manipulate the internal state, and to get the final value of the computation. We will use (with simplified type signatures):
get
, that gets the state,get :: State s s get = State $ \s -> (s,s)
modify
, that applies a function to the internal state,modify :: (s -> s) -> State s () modify f = State $ \s -> ((), f s)
evalState
, that gets the final value of the computation,evalState :: State s a -> s -> a evalState = fst . runState
If you feel like you need further examples on State
before moving on, you can check the Wikibooks tutorial.
The base case
We are now ready to write a skeleton of the unsafeUpdate
function, and to write a the base case.
As you might expect, the idea is to write it in terms of a function using State
.
The internal state will be the list of remaining new leaves, and thus we want to write a function with type
unsafeUpdateSt :: Tree a -> State [a] (Tree a)
By using what we learned in the last section, we could then write:
unsafeUpdate = evalState . unsafeUpdateSt
But let’s not get ahead of ourselves.
If we look at the base case on the Python version we want to simulate the line t.tag = as.pop(0)
, that is, extracting the first value of the as
list and returning it.
Let’s try and write a version of pop
(named unsafePop
) using the State
monad.
What would its type be?
We don’t really need to specify the index here, since we will only take the first element, hence, it is just a State
value.
The internal state would be the list of tags [a]
and the result of the computation would be the first element of the list, with type a
. Therefore:
unsafePop :: State [a] a
Now, using do
notation we can write the function easily: we first get the internal state, then we delete the first element from it and lastly we return this element:
unsafePop = do
xs <- get
modify tail
pure (head xs)
That almost looks like imperative code!
Now, for tackling the base case, we just need to get a State [a] (Tree a)
from that State [a] a
value, therefore:
unsafeUpdateSt (Node _ []) = singleton <$> unsafePop
The inductive case will need further machinery.
List traversals
If we look at the Python code, we can see that the gist of the inductive case is to apply the function unsafeUpdateSt
recursively to the list of children,
for child in t.children:
unsafeUpdate(t,as)
and pass the list of remaining new tags around.
Of course, there are no for loops on Haskell, (at least not exactly) so we need to resort to (implicit) recursion.
Now, the usual way to apply a function to a list is by treating the list as a functor and using the map
function, whose definition (with :
written in prefix notation) is
map :: (a -> b) -> [a] -> [b]
map _ [] = []
map f (x:xs) = (:) (f x) (map f xs)
yet in this case or function unsafeUpdateSt :: Tree a -> State [a] (Tree a)
has a type that looks more like a -> f b
for a certain container f
(namely, State [a]
) and thus map
is not useful in this situation5. A generalization is needed.
The Traversable
class provides just that generalization (and much more).
We will use it only on lists, but it can be used on any Functor
that supports an iterator-like interface (for example, most data structures on the containers
library). Its type (specialized to lists) is
traverse :: (Applicative f) => (a -> f b) -> [a] -> f [b]
Its definition6 is pretty straightforward, namely,
traverse - [] = pure []
traverse f (x:xs) = (:) <$> f x <*> traverse f xs
which (excluding the use of <*>
and <$>
) looks very much like the map
definition.
It is a generalization since we can use the Identity
functor to get back
map f = runIdentity . traverse (Identity . f)
Using this function we can apply unsafeUpdateSt
recursively to the list of children and then reconstruct the tree by mapping the Node
function appropriately. That is,
unsafeUpdateSt (Node x children) = Node x <$> traverse unsafeUpdateSt children
This feels very similar to the imperative code! In fact, the Data.Traversable
module defines for = flip traversable
so we could define
unsafeUpdateSt (Node x children) = Node x <$> for children unsafeUpdateSt
So, to recall, we can define unsafeUpdate
in terms of unsafePop
as:
unsafeUpdateSt :: Tree a -> State [a] (Tree a)
unsafeUpdateSt (Node _ []) = singleton <$> unsafePop
unsafeUpdateSt (Node x children) = Node x <$> traverse unsafeUpdateSt children
unsafeUpdate :: Tree a -> [a] -> Tree a
unsafeUpdate = evalState . unsafeUpdateSt
Now on to the actual problem!
Composing functors
The Data.Functor.Compose
module defines the Compose
newtype,
newtype Compose f g a = Compose { getCompose :: f (g a) }
This type allows us to compose two type constructors (with kind f, g :: Type -> Type
) into a new one.
The interesting feature that Compose
allows is to define instances for Functor
, Applicative
or Traversable
for the composition of two Functor
s/Applicative
s/Traversable
s.
As an example, suppose f, g :: Type -> Type
are both functors.
Then Compose f g
is also a functor, where the fmap
is given by
fmap :: (a -> b) -> Compose f g a -> Compose f g b
fmap f (Compose x) = Compose (fmap (fmap f) x)
where the first fmap
is the one for f
and the second the one for g
.
Similar instances can be defined for Applicative
and Traversable
(check the source code if you are curious!).
How can we take advantage of Compose
for our problem?
Let’s start by refining the pop function.
Recall from the previous sections that we defined a function unsafePop :: State [a] a
.
The problem with this function is that it is unsafe: if the state is the empty list, the function will throw an error.
To solve this, we change the type to State [a] (Maybe a)
, returning Nothing
if there are no remaining elements in the list.
We wrap it up with the Compose
constructor, having
pop :: Compose (State [a]) Maybe a
pop = Compose $ do
xs <- get
modify' (drop 1)
pure (listToMaybe xs)
Now, our unsafeUpdateSt
function can be used as is for the safe version of the function.
We only need to replace the unsafePop
function by pop
:
updateSt :: Tree a -> Compose (State [a]) Maybe (Tree a)
updateSt (Node _ []) = singleton <$> pop
updateSt (Node x children) = Node x <$> traverse updateSt children
Notice that both <$>
and traverse
here are using the instance of Functor
and Traversable
for the type Compose (State [a]) Maybe
.
Finally, the solution to our problem will be to unwrap the one provided by updateSt
.
update :: Tree a -> [a] -> Maybe (Tree a)
update = evalState . getCompose . updateSt
As you can see, once we had the unsafe version, creating the safe one was just a matter of updating the pop
function appropriately and dealing with the wrapping and unwrapping of Compose
.
Conclusions
In this post we have seen a possible solution to a problem dealing with trees using Haskell.
The use of Compose
, Traversable
and Functor
allowed us to translate an imperative-looking solution into purely functional code, producing a concise yet complete solution.
Furthermore, the solution could be built step-by-step, by considering a simplified problem first.
This might not be the best most efficient solution for this problem7, but it is an excellent way to present such type-classes’ usefulness.
As I said at the beginning, any improvements or alternative solutions are welcome. Until next time!
-
That was not the exact statement; the point of the talk was to show how to use dependent types to statically avoid the possibility of not having the correct number of tags on the list, but this version is better suited for the purpose of this post. It also used binary trees while we will use multi-way trees. ↩
-
The
Data.Tree
module already includes such function on theApplicative
instance (i.e., thepure
function), but I think that makes the code slightly more confusing (sinceTree
’s behavior as anApplicative
is far from trivial). ↩ -
Well, technically a class of monads, so as to allow for the use of “monad transformers”, but we will not make use of transformers on this post. ↩
-
Although at present time it might be more beginner-friendly to restrict the talk to Monads and use
return
instead ofpure
, the “Monad of no return” proposal seems likely to be accepted, thus in the long term sticking topure
is the right choice.return
is also discouraged in other Haskell-inspired languages such as Idris. ↩ -
If you have used monads (specially if you did so before the Burning Bridges proposal of GHC 7.10 was put in place) it is likely that you have met with
mapM
. AlthoughmapM
can be used for definingunsafeUpdateSt
, it will not be applicable (directly) in the safe version of update, because of theMonad
restriction. Furthermore, I believe it should be discouraged in favor oftraverse
, since in current versions of GHCmapM
is justtraverse
with a restricted type. ↩ -
One should use
foldr
here and that is what the Prelude does but for didactic purposes I chose this definition. ↩ -
In particular, although I have not checked it, it seems likely that you would need to deal with laziness to avoid a space leak. ↩