Shared publicly  - 
 
A common trick in data structures design for trees is to annotate nodes with information about the subtree. For example, if you want to efficiently query the sum of a range in a tree, if sums are stored in subtrees you only need to query logarithmically many.

Now here's a question: why do we not have any extensible, modular way of adding this metadata to vanilla data structures? Why do I have to reimplement red-black trees plus bonus features if I want to do this?
2
Dan Doel's profile photoNoah Easterly's profile photoMichael Sloan's profile photoChris Lesniewski's profile photo
6 comments
 
Don't finger trees let you attach an arbitrary Monoid? Unfortunately I think the implementation in Data.Sequence is hardcoded to length. I wonder why it doesn't expose that?
 
There's a separate finger tree package with the general definition.

I believe the reason Data.Sequence is specialized is speed.
 
Seems like it would be difficult to do this in a way that was completely general, but I bet there's a paper in doing it for binary trees balanced using rotation operations.
 
It's probably possible so long as you assume a specific constraint on the contained type when you write the operations for manipulating the container.
 
This would indeed be pretty useful! This comment kinda turned into an essay - sorry!

One way to implement a data structure that would support this sort of extension would be to hide the constructors, and only use them to export a class instance. This class could be pretty simple, like a (forgetful) bijection, or more complicated, providing a lens (implementing this lens efficiently would likely require efficient equality tests, though). The annotation datatype would also implement this typeclass, and could be as simple as "data (FBij a, Default b) => Ann a b = Ann a b". The forgetful bijection would just yield the "a" for pattern matches, and "flip Ann def" for construction.

(yes, forgetful bijection is likely not a technical term)

I've thought about stuff related to this while working on monoidal rope representations of text buffers. Some other ideas I had:

* It'd be convenient to be able to have different annotations to the same structure co-exist, despite not knowing about eachother. Of course, a tuple of the annotations In other words, plugin A doesn't need to know about plugin B, but they both store annotations.

* Data dependencies between annotations - this would allow for expressing what annotations depend on, so that we know when they need to be updated / have been invalidated.

My initial (WIP) solution involved storing a "Map AnnID Dynamic" - not very ideal. I'm not sure if there's a much better solution if it needs to be done at runtime.
 
Perhaps what's necessary is for the original data structure implementer to parameterize over their recursive container (rather than assuming the Identity). So that, instead of:

newtype RedBlackTree a = RedBlackTree (forall n. Black n a)
data Black n a where
Leaf :: Black Zero a |
B :: a -> Node n a -> Node n a -> Black (Succ n) a
data Red n a = R a (Black n a) (Black n a)
type Node n a = Either (Black n a) (Red n a)

You have

newtype RedBlackTree' f a = RedBlackTree' (forall n. f (Black' n f a))
data Black' n f a where
Leaf' :: Black' Zero f a
B' :: a -> Node' n f a -> Node' n f a -> Black' (Succ n) f a
data Red' n f a = R' a (f (Black' n f a)) (f (Black' n f a))
type Node' n f a = Either (f (Black' n f a)) (f (Red' n f a))

And the types of the various data structure manipulation functions (`insert`, `delete`, etc) are updated to put the necessary constraints (Functor, Monad, Monoid) on the container, now that the Identity can no longer be assumed.
Add a comment...