I'm partially responsible for the support, operation, and improvement of the foundational services that +HP has built its #Helion CI/CD workflow on. These services (Gerrit, Zuul, Nodepool, Jenkins, etc.) are the very same services used upstream to build #OpenStack itself. As you might expect, things are complicated, and understanding what's going on at any given point in time can be elusive. So, to help ourselves, we create and/or use tools to visualize the data we collect from the system. Most of the time we focus on aggregate data, but sometimes we'd like to drill in and analyze a specific build.
I've created a tool called zuul-build-viz to do this. It parses zuul.log and creates charts visualizing the build of a specific changeset. Because a build in the context of Zuul is really just a series of events, and these events largely follow a progression (the exception being the actual jobs, which run in parallel), the charts show when events occur and how much time passes between them.
Together, these two bits of information can be really insightful. For example, a long gap between the time a job is submitted and the time it starts running could indicate that demand for a given node type exceeds your supply and/or that Jenkins is saturated. What if you saw a job run twice in the same build? That would indicate the job returned a "None" result and may warrant additional investigation (after all, you pay a time penalty when this happens if it affects the overall build time). You get the idea.
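To make the idea concrete, here is a rough sketch of the kind of analysis described above. It is not the code from zuul-build-viz, and the log format is made up for illustration (a real zuul.log looks different); the field names and the helpers `parse_events`, `queue_waits`, and `repeated_jobs` are all hypothetical. It simply orders the events for one changeset, measures how long each job waited between being submitted and starting, and flags jobs that started more than once.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, simplified log lines. The real zuul.log format differs.
SAMPLE_LOG = """\
2015-03-02 10:00:00,000 change=1234,1 event=enqueued
2015-03-02 10:00:05,000 change=1234,1 event=job_submitted job=unit-tests
2015-03-02 10:04:05,000 change=1234,1 event=job_started job=unit-tests
2015-03-02 10:09:05,000 change=1234,1 event=job_completed job=unit-tests result=None
2015-03-02 10:09:10,000 change=1234,1 event=job_submitted job=unit-tests
2015-03-02 10:09:40,000 change=1234,1 event=job_started job=unit-tests
2015-03-02 10:14:40,000 change=1234,1 event=job_completed job=unit-tests result=SUCCESS
2015-03-02 10:14:45,000 change=1234,1 event=reported
2015-03-02 10:01:00,000 change=9999,1 event=enqueued
"""


def parse_events(log_text, change):
    """Return (timestamp, fields) pairs for one changeset, oldest first."""
    events = []
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip lines that don't match the assumed layout
        fields = dict(p.split("=", 1) for p in parts[2:])
        if fields.get("change") != change:
            continue
        stamp = datetime.strptime(" ".join(parts[:2]), "%Y-%m-%d %H:%M:%S,%f")
        events.append((stamp, fields))
    events.sort(key=lambda e: e[0])
    return events


def queue_waits(events):
    """Seconds between each job submission and the matching job start."""
    pending = {}
    waits = []
    for stamp, fields in events:
        job = fields.get("job")
        if fields.get("event") == "job_submitted":
            pending[job] = stamp
        elif fields.get("event") == "job_started" and job in pending:
            waits.append((job, (stamp - pending.pop(job)).total_seconds()))
    return waits


def repeated_jobs(events):
    """Names of jobs that started more than once in the same build."""
    starts = Counter(
        f["job"] for _, f in events if f.get("event") == "job_started"
    )
    return sorted(job for job, count in starts.items() if count > 1)


if __name__ == "__main__":
    build = parse_events(SAMPLE_LOG, "1234,1")
    for job, seconds in queue_waits(build):
        print(f"{job}: waited {seconds:.0f}s for a node")
    print("ran more than once:", repeated_jobs(build))
```

In this made-up sample, the first long wait is the sort of gap that would hint at node starvation, and the repeated start is the sort of rerun worth a closer look.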
I've gone ahead and made my code publicly available on GitHub (though I can't guarantee it'll stay in this repository). Please check it out! https://github.com/timrchavez/zuul-build-viz
Also, if you're curious about the chart this tool produces, I've attached a sample chart which shows a build for our internal config.