Package dependency management: more debate is needed

You may have seen Greg's post from a few weeks ago (linked below), advocating a strict adherence to the Package Versioning Policy's guideline of specifying both lower and upper bounds when you're writing or modifying a .cabal file.
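For concreteness, a PVP-compliant build-depends stanza looks something like this (the package names and version numbers here are illustrative, not a recommendation):

```
library
  build-depends:
    base >= 4.5 && < 4.7,
    text >= 0.11 && < 0.12
```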

I'm completely sympathetic to Greg's desire to keep old code working. At the same time, as a maintainer of some core packages that are "upstream of everyone", I feel like Gulliver roped to the ground by the thousand tiny ropes of the Lilliputians: it's becoming increasingly difficult for me to get anything done.

For example, right now, I'm updating both the text library and a new library in lockstep. The new library has a test suite because I'm good like that, and the test suite depends on a chain of third party libraries that somewhere has an upper bound on an older version of text. If I want to make any progress, I have to do a lot of spelunking, local cloning of repos, and hand editing of fake package versions.

It's possible that I'm feeling this particularly keenly because I was, er, lucky enough to write libraries that everyone uses, but the day to day experience of trying to work in this world is a total pain. Not quite enough to make me give up, but I certainly have to expect a couple of hours of unnecessary misery here and there more or less any time I start on anything ambitious.

Perhaps one band-aid for people in similar positions would be to add a command line option to cabal-install to say "violate the dependencies if you have to". Perhaps there's a PVP policy that could be conjured up that better balances the needs of upstream and downstream authors. But the current situation sucks from where I stand, and I'd rather be using my now very limited time to write useful code instead of trying to come up with a more sustainable plan and build consensus around it.

So. All is not well in the ecosystem; and sticking with the status quo continues to be a bad idea; and I don't have the time or the energy to be the one to fix this. Sorry.
28 comments
 
I'm confused by the specific issue you're having. I mean, I thought cabal-dev was supposed to mitigate that sort of issue for development.

That said, my personal preference would be that every package be given a unique identifier based on the versions of its upstream dependencies. Sort of like the way Nix works.

edit: Oh, wait, you mean you need to use the new version of text with the new library you're working on but your upstream dependency needs an earlier version? Oh, yeah, dang that's tough.

edit2: Hmmm... maybe library versions should be something that the module system actually knows about... I've always thought it was kind of weird that packaging isn't something that's built into the core of a language.
 
The major problem I've got with the PVP is that it's supposed to be applied strictly (and the PVP police will come for you if you don't), instead of allowing for the fact that some libraries are very different from others in terms of stability. For example, base, bytestring, text, and the like should not need an upper bound in your dependency list, IMO.

If some of my code breaks because one of those stable libraries changes (which in my experience happens very rarely), I much prefer to release an update quickly, or to put a temporary upper bound in place if I don't have time to fix it straight away. The build "down-time" is usually fairly low (half a day, typically), and especially with the excellent Stackage I get to know when things break, most of the time before users experience it. The upside is that most of the time I don't have to do the extra work of painstakingly bumping versions every so often, and things work with newer GHC releases.
 
I dropped upper bounds and have reduced my maintenance work massively, leaving more time for real work. I doubt I'll go back. I do occasionally have to rush out new versions of several packages, though, e.g. when haskell-src-exts changes.
 
My issue is that you can't set upper bounds appropriately, because you don't know in advance which future versions of a library your code will and will not be compatible with. And forcing a package to bump its version just to update its dependency ranges can compound the problem. I think it should be possible to modify dependency ranges after the fact, without creating a new version.
 
I agree that upper bound version dependencies are problematic. Some people apply them far too strictly, creating all the problems you've laid out. Still, I don't think the solution lies in abandoning upper bounds, but in better educating developers about when they should bump versions and what version ranges they should depend on. What I would advocate is making it official Hackage policy that all package maintainers adhere to semantic versioning; http://semver.org explains this.

Under this system, which in practice is already the norm, you could declare that a package requires at least foo-3.2.* (which provides some new feature 3.1 doesn't) and at most 3.* so that you don't catch a non-backwards-compatible update. No package version should change its dependencies after the fact, contrary to +Leon Smith, because what if someone depends on your package? They're screwed because it introduces ambiguity into the dependency graph. No, bumping dependency versions, which should happen less often under semantic versioning, should require a bump to your PATCH version. Otherwise how are people supposed to know to upgrade?

It is the decision of a few developers to ignore this standard that causes everyone else grief. Everyone must work together for this scheme to succeed, so they must be made aware of those mistakes in order to correct them. Developers also need to make sure their versions mean what we think they mean, otherwise it's chaos all over again. In summary, it should eventually be Hackage policy to disallow upper bound dependency versions more specific than a MAJOR version. This should drastically improve the situation.
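In .cabal syntax, the range I described above would read something like this (foo being a hypothetical package):

```
build-depends: foo >= 3.2 && < 4
```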
 
The dependency issue was discussed a couple of months ago [here](http://www.reddit.com/r/haskell/comments/1ns193/why_pvp_doesnt_work). Some agreed that the concept of soft and hard upper bounds might help. When building a package, a user would have the option of ignoring a soft upper bound, while a hard upper bound would mean that the other package is known to break the package being built. Would that help Bryan's issue?
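A sketch of the distinction, since Cabal has no soft/hard marker today the comments would have to carry the intent (package names illustrative, and this may not be valid .cabal as-is):

```
build-depends:
  -- soft: merely the newest version tested; a builder could opt to ignore it
  text >= 0.11 && < 0.12,
  -- hard: 2.x is known to break this package; never ignore it
  some-dep >= 1 && < 2
```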
 
Upper bounds are useful information, and we should not throw away information.  The problem is that the tools aren't using the information in an effective way.  There need to be times when we ignore the upper bounds, such as when there isn't a solution within the bounds, and there need to be easy ways to modify the bounds, both locally and for everyone.  Basically it's a tooling issue.  But if we don't provide the information in the first place, the tools have nothing to work with, and packages break. 

Even if the package author is quick off the mark with an update, users are forced into an upgrade cycle, which might trigger further updates.  Less work for the package author might mean more work for users.
 
A partial solution would be to add an option to cabal-install to use only the package versions listed in tested-with (if that field contained package versions, that is). Then a continuous integration system could give the package author feedback about known-good package versions, and such versions could somehow be passed back to the user.

And of course there is a second, minor problem here: we assume that all information about a package is known at the moment of upload/release, not after the first automated build... or even later, when it is built with new package versions and thus proven compatible.
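To make that concrete: the tested-with field that exists today only records compiler versions; the dependency-version variant suggested above would be a new, hypothetical field (the commented stanza below does not exist in Cabal):

```
-- Real: records the compilers the package was tested against.
tested-with: GHC == 7.6.3, GHC == 7.8.2

-- Hypothetical: dependency versions a CI system actually built against.
-- tested-with-packages: text == 1.1.1.3, bytestring == 0.10.4.0
```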
 
I tried +Mikhail Glushenkov's new --allow-newer patch tonight and it worked very well for me, allowing me to get past a roadblock where I had several more packages with stale-and-too-restrictive upper bounds in my dependency graph. Good work!
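For anyone else hitting the same wall, usage is straightforward (if I have the per-package form right, the flag takes an optional list of dependencies to relax):

```
# Relax all upper bounds during dependency resolution:
$ cabal install --allow-newer

# Relax upper bounds only on a specific dependency:
$ cabal install --allow-newer=text
```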
 
A little off-topic, but the same kind of override would be very nice for "hidden modules", when you want to play with stuff in ghci or sometimes even hack on something else. Not sure how to resolve this, but for me it has definitely been a real time-killer several times.
 
Hackage could store the versions of all the packages with which each new package was successfully built when the package was uploaded. A flag could be added to cabal to compile with those versions (with a warning when the compiler version is not the same). That could help solve some problems.
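Locally, something in that spirit can already be approximated by pinning exact versions in a cabal.config constraints block, which newer cabal-install releases read automatically (versions illustrative):

```
-- cabal.config
constraints: base == 4.7.0.1,
             bytestring == 0.10.4.0,
             text == 1.1.1.3
```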
 
Konstantine: you can always use `ghc-pkg expose` to reveal those modules for a time. I use it often, although I often have to use Hoogle or (better) Hayoo to find which module to expose.
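For example (the package name is illustrative):

```
# Make a hidden package visible to GHC and ghci:
$ ghc-pkg expose some-hidden-package

# And undo it afterwards:
$ ghc-pkg hide some-hidden-package
```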
 
I'm going to buck the trend here and say that, if you don't use upper bounds on your package and I've ever used your package, you have probably broken my builds several times. I know leaving off the upper bound on text has. Many of my Haskell projects move quite slowly, as they are either volunteer work or in hibernation or both, so they can't follow what is latest on Hackage. For example, they might want the version of a particular package that was released with Debian Squeeze (and yes, that ancient version of GHC), and I just need to change a String and rebuild, not update my application to use a brand new API. Missing (or incorrect) upper bounds tend to make packages that worked 6 months ago stop building suddenly, and various incompatibilities between projects mean that my ghc-pkg user data often gets blown away and has to be recreated from Hackage. I know your pain, but I have to do the same thing plus code archaeology at the same time.

For this/your particular use case, you should be using a sandbox, and the sandbox should have a way to tell it: "I've bumped the version of package X; treat upper bounds that include version OLD as including the current version." As far as I know, no sandbox supports this, but removing upper bounds (for packages uploaded to Hackage) is the wrong way to fix it.

That said, the/my use case of maintaining unmoving code will also get a lot easier once "cabal sandbox" hits Debian Sid. I'll have the particular versions of the particular packages I used before installed and ready in the sandbox, and I won't have to pull old versions from Hackage. Hopefully, I'll be able to save a "snapshot" of the sandbox and restore it on another (pretending to be suitably ancient, but usually virtual) machine. My dev system moves forward, but the limited production environments don't, for good reason. If I were deploying a new production system today, it would be Debian Wheezy, and I'd have to pull all Haskell packages available from wheezy/wheezy-backports from there instead of installing from Hackage. Sometimes "Ops" is even more "conservative".
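For what it's worth, the lockstep-development case does have a partial sandbox workflow already via add-source. It doesn't lift upper bounds, which is exactly the gap described above, but it does let a local checkout stand in for the Hackage version (the path is illustrative):

```
$ cabal sandbox init
$ cabal sandbox add-source ../text
$ cabal install --only-dependencies
$ cabal build
```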
 
Boyd: again, using known-to-work versions tested by automatic build tools would solve your problem better!
 
+Simon Marlow if I add an upper bound, I'm telling you what the current version of the dependency was on the day I uploaded the package. That information is not valuable, and could be trivially computed with a date comparison. In the cases where that isn't true, I do add upper bounds. Having an array of buildbots build/test my package against different versions of its dependencies would give valuable information.
 
The upper bound indicates the versions of a dependency that are known to work. That might coincide with the current latest version, or it might not; if you don't tell cabal, it can't figure that out by itself. It's definitely valuable information. How could it not be? I don't disagree that automatic build reports would be useful too, and maybe one day, when the tooling is better, explicit upper bounds will mostly be unnecessary. But we're not there now.
 
+Neil Mitchell But maybe for some reason or other you might wish to use a previous version (because the latest has a regression you wish to avoid). Then an automated calculation based on a date might mess things up.
 
I think the heart of the problem is that sometimes (in practice, probably never?) I want to say that I know a particular version is an upper bound, whereas normally I would be happy to include a tested-with, which is not an upper bound, merely an "I know this works". In Cabal these are shoehorned into the same construct, and I have to release a new version every time I run a new test, which is decidedly unpleasant.
 
With the new hackage you don't have to release a new version to update the bounds.
 
I have a strong dislike for making manual tweaks that do not go through my version control system, do not get automatically built/tested on my continuous integration server etc. The feature makes me consider upper bounds again, but I'm not convinced enough to take the plunge just yet.
 
Not only this, but updating the bounds on Hackage has implications for security and package signing, and creates extra work for a package's author, who would have to monitor Hackage changes to merge them back into their own VCS.
 
Simon: good to know, thanks. Next time I'll try updating upper bounds through the Hackage interface. Does it affect package signing?

Should we start drafting a spec for tested-with & autobuilds as a replacement for obligatory upper bounds? (Except when the package maintainer explicitly wants to state that a newer version of a dependency is known to be incompatible.)
 
I have always thought you made sense on this (liberal upper bounds) and I think you do here.
 
A new major version means that the API of a package has changed in a way that might break dependent packages. But sometimes (often?) dependent packages work with this new API just fine. Why is that? Because they use a part of the API that hasn't changed at all? If so, shouldn't we track dependencies not on entire packages but on individual declarations? How common (in your experience) is it that a declaration changes? How often can the new declaration be used as a drop-in-replacement?