Pipes - observation, comment, and question

I've had a chance to digest the Pipes package, and I have three basic thoughts I'd like to share.

The observation: a driving motivator for iteratees has always been making it easier to reason about resource usage. In support of this, a general property of enumeration-IO packages has been that scarce resources (Handles/sockets/file descriptors) will never be held for longer than necessary, and indeed it's basically been impossible to circumvent this (at least in "iteratee", "enumerator", and Oleg's code). However, by reifying enumerators (producers) to data, the pipes package no longer has this property. A Pipe may hold an open Handle until it's garbage collected, which may be much later than you would expect (or desire). I don't think it will be easy to regain this property without significantly altering some core design choices.
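To make the observation concrete, here is a minimal sketch in the pipes-1.x `Control.Pipe` vocabulary (`readLines` is a hypothetical producer written for illustration, not part of the package):

import System.IO
import Control.Monad (unless)
import Control.Monad.Trans.Class (lift)
import Control.Pipe

-- Hypothetical producer: opens a Handle and yields lines from it.
readLines :: FilePath -> Pipe () String IO ()
readLines path = do
    h <- lift $ openFile path ReadMode
    let loop = do
            eof <- lift $ hIsEOF h
            unless eof $ do
                lift (hGetLine h) >>= yield
                loop
    loop
    lift $ hClose h  -- runs only if the pipe is driven to completion

If a downstream consumer stops after the first yield, nothing ever reaches the hClose; the reified pipe value simply holds the open Handle until it is garbage collected.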

The comment: I believe that the decision to require that all data producers be explicitly drained is a mistake. This is mostly because at present, I don't see an easy way to short-circuit the necessary IO to consume e.g. a file if it's determined at an early stage that further processing is unnecessary. Also, out of all the programming mistakes I've made, and bugs I've introduced (both categories are unfortunately quite large), I don't believe I've ever made an error that would have been prevented by this requirement. However, this would be relatively simple to change and/or work around should the implementers desire to do so.

The question: for anyone who's used the pipes library for any significant work, how does performance compare to iteratee/enumerator/iterIO? An element-based implementation, such as pipes uses, is unquestionably more elegant than the block-based implementations of other libraries, but I've yet to see an element-based implementation that matches the performance of a block-based version. Changing this probably wouldn't be too difficult, but the result would definitely not look as pretty.
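The element-versus-block gap is visible even without any iteratee machinery; compare pushing each byte through a generic per-element interface against a single fused pass over the chunk (a sketch for illustration, not a benchmark):

import qualified Data.ByteString as B
import Data.Word (Word8)

-- Element-based: every byte crosses the abstraction boundary individually.
elementWise :: (Word8 -> Word8) -> B.ByteString -> B.ByteString
elementWise f = B.pack . map f . B.unpack

-- Block-based: one tight loop over the whole chunk.
blockWise :: (Word8 -> Word8) -> B.ByteString -> B.ByteString
blockWise = B.map

An element-based stream library pays something like the elementWise cost at every stage unless it is carefully engineered around chunks.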

In conclusion, I'm very glad to see some new libraries expanding the design space, as most other packages hew fairly closely to Oleg's work. However, I do think it's fair to say that the "pipes" design makes some tradeoffs that may impact some use-cases heavily. I don't think the ultimate iteratee design, if indeed one exists, has yet been found.
I can't speak for pipes, but conduits also avoids chunking, and performance has been fine. But that's not really a fair statement, since the majority of the use cases for conduits (or iteratee/enumerator/pipes) involve data which has already been chunked, either via ByteString or Text.

Chunking was originally in conduit, but I removed it (+Greg Weber's idea) to solve the data loss issue. I consider it a very minor point in the API design, and quite orthogonal to other decisions (chunking could be added back without much difficulty).
+Michael Snoyman - thanks for your input. You actually touch on my only criticism of "enumerator": IMHO it gets the stream abstraction wrong by presenting chunks to users as essentially first-class data. Conduit looks more general in that it isn't necessary to work with chunks, however it seems that you do so anyway. Essentially, what I would like to see is that you change to `sourceHandle :: ResourceIO m => Handle -> Source m Word8`, etc. At least that's my vision for iteratee; sometimes the performance lags behind chunk-aware code.

Getting chunking right without losing data is hard; data will need to be discarded at certain points. Properly defining those times is tricky.
I definitely understand the desire to have such an API; it just seems that in practice it complicates things too much. Here's a possible approach that I haven't really digested fully: the type of `sourceHandle` isn't really that important, it's the type of all the operations that matters. I suppose the practical ramification is that all of the helper functions in enumerator and conduit need to be written three times: in the List, Binary and Text modules.

Perhaps we can unify those operations via an associated type, like `type family Content a`, with instances like `type instance Content ByteString = Word8` and `type instance Content Text = Char`. Then we could define `head :: Resource m => Sink a m (Content a)`. We'd have to be careful that we still implement things efficiently.
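A rough sketch of that idea (the names here are the hypotheticals under discussion, not anything conduit actually exports):

{-# LANGUAGE TypeFamilies #-}
import Data.Word (Word8)
import qualified Data.ByteString as B
import qualified Data.Text as T

-- The element type carried by each chunk type.
type family Content a
type instance Content B.ByteString = Word8
type instance Content T.Text       = Char
type instance Content [a]          = a

-- One signature would then cover the List, Binary and Text modules:
-- head :: Resource m => Sink a m (Content a)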

Is there another advantage to have sourceHandle work directly on Word8 that I haven't realized?
I don't understand how just the API presents a complication. The combination of my desired API and high performance does provide for a somewhat messy implementation unfortunately, but that's a price I'm willing to pay.

Unifying the operations via an AT is exactly what iteratee does, through the ListLike package (technically a fundep, but that doesn't matter). There's only one declaration for most functions, and the List, ByteString, and Text versions all come for free (vector-based versions too, which I use a lot). The only real performance pitfall is with enumeratees; element-wise enumeratees tend to be much less efficient than chunk-wise versions. It can be harder to write chunk-wise enumeratees, because you need to either manually handle leftover data or decide to drop it. But you're used to doing that from enumerator/conduit anyway.

The only other advantage I'm aware of (which shouldn't be underestimated) is having a single implementation point for a very large number of functions over a variety of stream types.
I think Paolo summed up most of what I wanted to say. In Pipes, exception handling (as in Control.Exception) can be delegated to the base monad with no special considerations. Same thing goes for EitherT/ErrorT (which I personally prefer). However, Paolo and I are working on automatic resource finalization. I think his blog post on guarded pipes is on the right track but we are reworking it to make it more elegant and easier to reason about. Our goal is that automatic resource finalization is layered on top of lazy finalization so that you don't have to drain resources to still get automatic finalization.

I believe that the strongest advantage of Pipes over other iteratee libraries is that they are easier to reason about both in terms of their performance characteristics and their behavior because they rely heavily on category theory abstractions. Everybody in this thread is an expert on Pipes because it's so easy to understand. Notice that in the Pipes documentation I never have to explain how the Pipe data type is implemented or how the monad instance works and you never see Stack Overflow questions asking to explain various vagaries of Pipes behavior. Paolo and I are working to ensure that the final resource finalization implementation is equally easy to reason about.
I'm sorry, but that's a lot of hand-waving. Firstly, there's more to exception handling than resource deallocation. What if you want to run a pipe and catch an exception? There doesn't seem to be any way to do so. And due to the nature of pipes (similar to enumerator), you will need to have your code living in the Pipe monad for the majority of your program.

I have first-hand experience with the pain this causes: we had to jump through hoops in Yesod to get exception handling right. I can go into details if you're not familiar with the situation.

As far as "everyone's an expert" and it's so simple... I'm sorry, but the documentation makes me very concerned about pipes. You have the breakdown between strict and lazy, and if you use them the wrong way, things break. It's all well and good that you have a Category instance, but that's an incredibly minor point, and Conduit could have one as well if we were willing to swap around type variables and make things inconsistent. And all we get out of that is the ability to replace (=$=) with (.).

As far as different types: that's an advantage of conduits, not a disadvantage. Type errors are direct and clear. It's easy to understand exactly what's going on: a Source just provides data without consuming it. I don't see where the clunkiness is that you're referring to, and mentioning such vague accusations is really not going to advance arguments at all.

But my main gripe about pipes: there's nothing serious to back it up. There are a whole bunch of claims about how composable it is, how it's a better enumerator/iteratee, but you've not actually defined the problems you're solving. I can surmise that you believe it makes things easy to reason about, but there's no real incentive to switch besides just trusting you that it's going to solve problems.

I've made it clear that I don't believe pipes can scale up to the real problems large projects face. I don't think pipes could handle something like an HTTP proxy elegantly. I would recommend that, if you believe otherwise, you actually show serious code demonstrating it.
+Michael Snoyman FWIW, I get a "more elegant" feeling out of Pipes than conduits, and +Paolo Capriotti's post resonated a lot with my initial impression of conduits. It's just one man's opinion, but conduits feels ugly (enumerator even worse) and pipes seems nice.

I wouldn't be so quick to dismiss criticisms of conduits that come from this direction. Some abstractions are better than others.
Actually, I do tend to dismiss criticisms like that without any kind of backing. We hear them all the time in the Haskell world: people think imperatively, there's too much time spent worrying about types, etc. These are baseless criticisms, and they don't advance arguments at all.

My strong belief, and I have yet to be shown otherwise, is that the elegance of pipes (which I'm not denying) is due to the fact that it doesn't solve the real problems we have. There's no question that conduit could be made simpler. I could completely remove BufferedSource, and get rid of an entire extra abstraction. But that would mean the library can't do as much as it can now.

So instead of saying "pipes are more elegant than conduits" without backing, what I'm looking for is "pipes are more elegant, and they can do everything conduits do." At that point, you'll have my attention. Until then, it's a bunch of hot air.

Alternatively, you could prove that pipes do everything necessary and that some of the features of conduit needn't exist, but I think that will be a hard sell. Each feature of conduit came as the result of some actual problem we were trying to solve.
I misunderstood. I didn't realize that the clunkiness claim was directed at the different types. It's true: we'll have to agree to disagree. I don't think having separate operators is confusing. I also don't think that you're correct about different behaviors for different kinds of composition. Or more to the point: if you are correct, it's a universal problem that pipes can't solve.

What I mean is that there is an inherent issue of data loss that cannot be overcome. If you don't believe me, go read the relevant section of the conduit chapter, it spells out data loss in terms of plain lists.
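The list version of the data-loss point goes roughly like this (my paraphrase of the argument, not the chapter's code):

-- A stage that consumes part of a stream naturally produces leftovers:
take2 :: [a] -> ([a], [a])
take2 = splitAt 2

-- take2 "hello" == ("he", "llo")
-- If the composition operator only passes along the result ("he") and
-- the stage is then discarded, the "llo" leftovers have nowhere to go;
-- some point in the design must decide to drop them.

Every chunked stream library faces the same choice; the disagreement is only about where the drop happens.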

Let me give you a completely valid use case that demonstrates the shortcomings we had with enumerator, and which I assume, based on everything I've seen, applies to pipes as well. We have a web application that needs to read a request body and pipe it through something which may throw an exception. For example, streamed parsing of XML. If there is an exception thrown by this pipeline, we need to catch it and return a 400 error message.

You've also pointed out another downside of pipes: it can't handle buffered sources. This is a major component of conduit. Compare WAI and http-enumerator (based on enumerator) with WAI/conduit and http-conduit, and you'll see the complete change in API. pipes would not allow us to have this more elegant solution.

In other words: pipes may be elegant in the small, but will lead us back to ugliness in the large.

To be clear: based on all the comments I've seen so far, I still believe pipes was designed in a vacuum without taking actual problems into consideration. It's very easy to create elegant solutions under such circumstances.
Apologies, let me clarify: the reason we need different composition operators is because we have different types, nothing to do with data loss. Any differences in behavior would come down to the issue of data loss, which is a universal issue.

As a side point, removing chunking lessens the impact of data loss, which IMO makes pipes and conduit more resilient than iteratee/enumerator. (enumerator works around the issue at the combinator level. For an example, look at the implementation of concatMapM.) But there are still times when it crops up.
I just want to emphasize that this is NOT a competition. Paolo and I strive a lot to emulate the practicality of conduits and I feel we have a lot to learn from it. In fact, we are indebted to conduits because it takes the pressure off of our library to pragmatically deliver results to the Haskell community immediately and we have more freedom to experiment and try to "get it right". After all, Haskell needs a killer framework like Yesod right now to broaden its appeal and I wouldn't want to feel responsible for holding the community back because I was nit-picking on elegance (as I am wont to do).
I agree that there's no competition in the sense that we're personally trying to one-up each other, I feel the same way. And to be clear: I like pipes, and am very glad people are looking into alternate approaches to the problem. I'm truly hopeful you can come up with something that will either simplify conduits, or even replace them.

However, at a technical level, I think there is a competition, in the sense of which package will be used. I'm just worried that pipes will not cover all use cases, and we'll end up with a lot of people spending a lot of time trying to solve problems in pipes that those of us who used enumerator already tried to solve.
I suppose I have to disagree that pipes are more compositional than other libraries. For example, in iteratee Enumerators are just Kleisli arrows, and can be composed via Kleisli composition. You just apply iteratees to enumerators like you would any other function. I don't see what's non-compositional about it, or even what's not associative. As an example, most of iteratee's fancy combinators are simply a combination of plain old function composition (.) and running an iteratee.
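The Kleisli point can be seen in a deliberately toy model (these types are simplifications for illustration, not iteratee's real ones):

import Control.Monad ((>=>))

-- Toy model: an "iteratee" is just accumulated fold state, and an
-- enumerator is a Kleisli arrow that feeds input to it.
type Iter = (Int, Int)              -- (count, sum) so far
type Enumerator m = Iter -> m Iter

feedList :: Monad m => [Int] -> Enumerator m
feedList xs (c, s) = return (c + length xs, s + sum xs)

-- Enumerators compose with ordinary Kleisli composition:
both :: Monad m => Enumerator m
both = feedList [1, 2] >=> feedList [3, 4]

-- both (0, 0) yields (4, 10): four elements seen, summing to 10.

In the real library the state is replaced by an Iteratee, but the composition story is the same: (>=>) for enumerators, function application for running an iteratee.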

In fact, in some ways pipes are less compositional. Consider Kleisli composition of enumerators. I don't see how this is possible with pipes. It would work if your producer has a non-Zero source type, but then what's to stop someone from accidentally writing `pipeFile "b" <-< pipeConsumer`? In iteratee, and I presume conduits as well, mistakes like this are compile-time errors. With pipes, it appears to me that invalid code like this is either accepted at compile time, or you have to restrict the available compositions.

At its heart, iteratee provides only two things: a monadic parser library and functions for feeding data to those parsers. It's really not any different from uu-parsinglib, attoparsec, or many similar libraries, except for providing more enumeration functions.
I have actually eliminated Zero from the next release of the library (scheduled roughly for a week from now) and all pipes are now type safe and you can compose them and run them safely as you described. pipeConsumer will end up with a polymorphic output type and pipeFile will have a polymorphic input type so that would type-check and run even if no information actually flows across their "boundary". runPipe now has the type:

runPipe :: (Monad m) => Pipe (Maybe a) b m r -> m r

... so that it can guarantee providing input since it can't guarantee at compile time that the pipe it runs doesn't call await. You are completely correct that v1.0 is unsafe for this reason and this was absolutely a flaw in the initial library release.

It turns out that the (Maybe a) in the input type of runPipe leads to some very elegant and symmetrical results grounded in category theory, and it led to useful bonus functionality I didn't anticipate; I'm still discussing this with Paolo to see how much of it we want to include in the next official release.

I think you are missing the huge potential of iteratee libraries. It's not just about streaming data for performance. It's about compositionality and modularity that is applicable to ANY programming project. Iteratees make it trivial to write modular code and to mix and match functionality. For example, I can write functions like:

email address = forever $ do
    x <- await
    lift $ sendEmailTo address (show x)

prompt = forever $ do
    x <- lift getLine
    yield x

And now I can just compose them and I have a program that creates an e-mail for everything I type on the command line:

runPipe $ email "" <+< prompt

If I decide I instead want to fire up a GUI program to compose e-mails, I just write a new producer pipe:

composeMessage = do
    x <- lift getMessageFromGUIProgram
    yield x

And if I want to send 10 emails using my GUI program I just write:

runPipe $ email "" <+< replicateM_ 10 composeMessage

But maybe I prefer to have a friend approve my e-mails first, to make sure that I'm not sending out drunk e-mails to ex-girlfriends. I can just write:

verify = forever $ do
    x <- await
    lift $ emailToFriend x
    y <- lift waitForFriendResponse
    when y $ yield x

And now I just compose that in the middle:

runPipe $ email "" <+< verify <+< replicateM_ 10 composeMessage

You might even note that I could try using the "email" pipe I defined before to send the e-mail off to my friend, instead of defining a hard-coded function like "emailToFriend". Right now we are also integrating arrow functionality into pipes, based off of Paolo's blog post, that will let you build up complex pipe flow controls and let you fan out or create recursive pipes.

This is the future of general-purpose programming and it is why Haskell shines, making it incredibly easy to mix different modules with incredibly little integration code required. This is why I consider iteratees a "killer feature" of Haskell right up there with STM and why I strive to develop a really elegant solution to try to convert other people to Haskell.