In talking to other tech companies about the scope of their SRE/DevOps roles, I've realized that it differs substantially across the industry. Many SRE organizations limit their potential by hiring teams to do only the work that keeps their service(s) running, not the work that substantially improves them. Their teams seem stuck: too overwhelmed with the basics to get out of the rut and do more meaningful work.
What I'm dubbing 'Maslow's hierarchy of SRE needs' categorizes the state of a team into the following buckets:
+ physiological health - is the service functioning at all (e.g. not repeatedly hard-down/bleeding revenue)? is the pager quiet enough to get any other work done? are we learning from outages and resolving postmortem action items to avoid repeating the same outages?
+ maintain homeostasis - is it possible to carry out day to day operations (e.g. push code, tolerate machine failures) without excessive manual work? are people automating away manual work?
+ boundaries & objectives - do we have clear scopes for what we're responsible for (e.g. better to be responsible for one thing solidly than many things diffusely), and an agreed-upon SLA/standard that we aspire to achieve?
+ self-awareness - do we know, based on metrics, when we deviate from the standards so we can take corrective action? conversely, this also means we can ignore noise that isn't tied to these metrics, because our monitoring of the things we care about is solid.
+ self-actualization - freedom in time, trust, and ability dimensions to make substantial design improvements to the service (and measure the improvements!)
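To make the self-awareness stage concrete, here is a minimal sketch of checking a service against an agreed-upon availability target and reporting how much of the allowed failure budget has been used. Everything in it is an assumption for illustration: the `check_slo` function, the `SloReport` fields, the request-count inputs, and the 99.9% default target are hypothetical, not something prescribed above.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    availability: float    # fraction of requests that succeeded
    budget_allowed: float  # failed requests the target tolerates in this window
    budget_used: float     # fraction of that budget consumed (can exceed 1.0)
    within_slo: bool       # True if availability meets the target


def check_slo(total_requests: int, failed_requests: int,
              slo_target: float = 0.999) -> SloReport:
    """Compare observed request outcomes against an availability target.

    slo_target is a hypothetical default; use whatever standard the team
    and product development have actually agreed on.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")

    availability = (total_requests - failed_requests) / total_requests
    budget_allowed = (1.0 - slo_target) * total_requests
    budget_used = failed_requests / budget_allowed
    return SloReport(
        availability=availability,
        budget_allowed=budget_allowed,
        budget_used=budget_used,
        within_slo=availability >= slo_target,
    )


if __name__ == "__main__":
    report = check_slo(total_requests=1_000_000, failed_requests=500)
    print(f"availability={report.availability:.4%} "
          f"budget_used={report.budget_used:.0%} "
          f"within_slo={report.within_slo}")
```

A check like this is only as useful as the boundary it encodes: alerts tied to the agreed target are worth paging on, and signals that do not move these numbers are candidates for the noise the team can ignore.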
You don't get to the later stages of the hierarchy of needs without hiring both systems engineers and software engineers - SRE only works at its best if you have people with both skillsets collaborating. If all you're doing is giving people from pure sysadmin backgrounds a shiny devops title and no other support, you're not going to see results that are meaningfully different from the pure operational model of sysadmin work. If you struggle to name the exceptionally strong coders on your team, you're going to have a lot of trouble with the last step of actually getting core service-level improvements delivered (e.g. improving the service components themselves, instead of just rearranging their relationships). If you don't have a solid product dev-SRE relationship with clear boundaries, it's far too easy to slip into the trap of having all the operational work pushed onto SRE without effort put into reducing the total operational burden.
It's fairly easy to spot a well-functioning organization -- if it's primarily doing work in the self-actualization category, everything lower in the hierarchy is likely to be shipshape. If an organization is stuck earlier in the hierarchy, it needs a great deal of support to reach a fulfilled and functional state. That support takes many forms - upper management backing for principled "no"s and for enforcing good boundaries with product dev, hiring to ensure the team has the right breadth and depth of skillsets, and vision from the team itself to push towards more sophisticated work rather than becoming comfortable just doing operations.
What can you do as the leadership of an engineering organization if you're looking to make sure your SRE team grows to its full potential? First, hire people who are excited about the scaling/performance/reliability challenges that your product development generalists lack expertise in, not just people to do the grungy work you don't want to be doing. Second, make SRE's goal to change the service based on experience running it, rather than just keeping it running. Third, make sure a majority of your SRE team's time is actually spent developing projects and learning new things. Finally, empower your SRE team to take full ownership of the service, including backing their ability to say no to product development.
If you don't do these things, you'll have trouble attracting new talent, and your best site reliability engineers will eventually become bored and leave for somewhere they can enjoy self-actualization.
For a potential external hire who wants to be doing work towards the latter steps in the hierarchy, it's a rather risky proposition to join a team that is currently stuck. The root causes of the stuckness are often opaque from outside the organization, and it is just as hard to assess from outside whether there will be organizational support for making the necessary changes. There's always a great feeling of accomplishment from being empowered to fix a situation and doing so, but it's best to avoid situations where one is set up to fail from the beginning.
It should come as no surprise to any photographer that the interaction between subject and light is important. In fact, I’d say that this interaction is the essence of the whole thing; it’s what photography is all about.
On this evening in the San Joaquin Valley I tried to create an interesting interplay between subject and light by putting a flock of Ross's geese between me and the setting sun, hoping to capture birds flying in front of that beautiful sky. I got more than I could have hoped for, as the entire flock took off just before sunset.
You can see 's video of this event, and get the full story behind this photograph, by following this link:
#landscapephotography #wildlifephotography #birdphotography #sunsunday #wildlifewednesday #birdsinfocus #sunsetsaturday #sunsetphotography #sanjoaquinvalley #birds #wildlife