The background to my questions/thoughts above is that at +UPPNEX, after taking care of 1 PB of NGS data from over 300 projects (co-located with a roughly equal number of projects from other domains), we have learned the value of the Unix philosophy: small, independent tools that do their job well and cooperate easily with all the other tools in the system.
Especially when interacting with an existing, complex cluster infrastructure, with multiple parallel file systems with proprietary drivers, resource managers etc. that cannot easily be changed, the general lesson is that only such general tools tend to be flexible enough to fit nicely into whatever combination of those systems is in place.
Anything that creates its own "universe", imposing its own restrictions on how to handle storage, resource management etc. (such as Galaxy), becomes a big hurdle to integrate: it typically requires weeks of hacking and workarounds, which then easily become a maintenance nightmare.
We have thus been trying to find the tools that solve our problems in the most general way possible, and since we are co-locating bioinformatics and other domains, we don't even want them to be specific to the life sciences (and the basic problems in fact aren't).
So it would be interesting to hear a little about Arvados' philosophy regarding these issues.