Profile

Jason Chu
Works at Google
Attended University of Victoria
435 followers | 192,635 views

Stream

Jason Chu

Shared publicly

Jason Chu

Shared publicly
Best latte art yet 
2 comments
Better latte than never.

Jason Chu

Shared publicly
From the unfortunate-domain-name-split department. 
2 comments
No way that was accidental :)

Jason Chu

Shared publicly
Baby Lucas James Chu.  Born May 30th, 2014 at 9:33am.  6lbs 15oz.

This changes everything.
4 comments
Congratulations!

Jason Chu

Shared publicly
This is a pretty good explanation of my job at Google.
 
What is Site Reliability Engineering?

On the 10th anniversary of the Site Reliability Engineering team’s creation, Niall Murphy interviewed me about what SRE is, how and why it works so well, and the factors that differentiate SRE from operations teams in industry.  The interview is being published on G+ in segments; a link to the full interview is included below.
------

Niall:  So what is SRE?

Ben: Fundamentally, it's what happens when you ask a software engineer to design an operations function. When I came to Google, I was fortunate enough to be part of a team that was partially composed of folks who were software engineers, and who were inclined to use software as a way of solving problems that had historically been solved by hand. So when it was time to create a formal team to do this operational work, it was natural to take the “everything can be treated as a software problem” approach and run with it. 
So SRE is fundamentally doing work that has historically been done by an operations team, but using engineers with software expertise, and banking on the fact that these engineers are inherently both predisposed to, and have the ability to, substitute automation for human labor.
On top of that, in Google, we have a bunch of rules of engagement, and principles for how SRE teams interact with their environment -- not only the production environment, but also the development teams, the testing teams, the users, and so on. Those rules and work practices help us to keep doing primarily engineering work and not operations work.

Niall: How is this reflected in the day-to-day work and responsibilities of an SRE team?

Ben:  In general, an SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.
Many operations teams today have a similar role, sometimes without some of the bits that I’ve identified.  But the way that SRE does it is quite different. That’s due to a couple of reasons.
Number one is hiring. We hire engineers with software development ability and proclivity.   To SRE, software engineers are people who know enough about programming languages, data structures and algorithms, and performance to be able to write software that is effective. Crucially, while the software may accomplish a task at launch, it also has to be efficient at accomplishing that task even as the task grows.
During our hiring process, we examine people who are close to passing the Google SWE bar and who also have a complementary set of skills that are useful to us. Network engineering and Unix system administration are two common areas that we look at; there are others. Someone with good software skills but perhaps little professional development experience, who also is an expert in network engineering or system administration -- we hire those people for SRE. Typically, we hire about a 50-50 mix of people who have more of a software background and people who have more of a systems engineering background. It seems to be a really good mix.
We’ve held that hiring bar constant through the years, even at times when it's been very hard to find people, and there’s been a lot of pressure to relax that bar in order to increase hiring volume. We've never changed our standards in this respect. That has, I think, been incredibly important for the group, because what you end up with is a team of people who fundamentally will not accept doing things over and over by hand, but also a team that has a lot of the same academic and intellectual background as the rest of the development organization. This ensures that mutual respect and mutual vocabulary pertain between SRE and SWE.
One of the things you normally see in operations roles as opposed to engineering roles is that there's a chasm not only with respect to duty, but also of background and of vocabulary, and eventually, of respect. This, to me, is a pathology.

Niall: Outside Google, we often observe that there isn't parity of esteem between the SWE and operations teams, which combines poorly with the fact that they often have different incentives. That’s how we end up with the model that exists in the industry today, where SWE teams write something and throw it over a wall to the operations teams, who then try to make it work, and can’t, and throw it back, and so on.

Ben: It’s interesting in this context to also look at the organizational differences that make SRE what it is, not just the individual work habits.
One of the key characteristics that SREs have is that they are free to transfer between SRE teams, and the group is large enough to have plenty of mobility. Additionally,  SWEs are free to transfer out of SRE.  But, in general, they do not. 
The necessary condition for this freedom of movement is parity between SWEs in general, SREs who happen to be SWEs, and compensation parity between those and systems engineers in SRE. They're all groups that are held to the same standards of performance, the same standards of output, the same standards of expertise.  And there's free transfer between the SWE and the SRE SWE team. The key point about free and easy migration for anyone in the SRE group who finds that they are working on a project or a system that is “bad” is that it is an excellent threat, providing an incentive for development teams to not build systems that are horrible to run.
It's a threat I use all the time.  I say, "Look, we're only hiring engineers into SRE.  If you build a system that is an ops disaster, the SREs will leave.  And I will let them." And as they leave and the group drops below critical mass, we will hand operational responsibility back to you, the development team. 
In Google, we have institutionalized this response with things like the Production Readiness Review, which helps us avoid getting into this situation by examining both the system and its characteristics before taking it on.  We also share operational responsibility between the SRE and DEV teams for a time after launches -- shared responsibility is the simplest and most effective way I know to remove any fantasy about what the system is like in the real world.  It also provides a huge incentive for the DEV team to make a system that has low operational load.

Jason Chu

Shared publicly
Another miserable day in rainy Seattle... 
:'( too much white in your blue.
Have him in circles: 435 people, including
Ross Dunn
Evan Purcer
rahul kumar
Dhruv Gupta
WJ Murray
terry wong
Jaro Larnos
Joanna Stephani
Scientists in School

Jason Chu

Shared publicly
Just received a call from Comcast.  This isn't a verbatim transcript, but it definitely covers the gist of it.

Rep: I have a few questions for you.  How much do you use your internet?
Me: All the time, for lots of things.
Rep: Very good sir.  And how much do you use your tv?
Me: Not at all.
Rep: Uh huh, ok.  Let's see here.  I have a deal that might be interesting for you.  How would you like more tv channels and worse internet for more money?  Does that sound like a good deal to you?
Me: Ummm, no.

Jason Chu

Shared publicly
New Razer Chroma

Here are two layouts that show off the individual key colouring.
2 comments
I don't game all that often, so I can have trouble finding less often used keys, like M or F5 on the Thief keyboard. It also helps me separate different keys that are close together, like G is for poppies and H is for health.

I don't have a lot of experience with Cherry switches. I find these ones are fine. I hear they also click and reset sooner.

Jason Chu

Shared publicly
Took too long setting up the camera.  Didn't get any great pictures of lightning.

Jason Chu

Shared publicly
Lucas' first Canada Day
... And he slept right through it.

Jason Chu

Shared publicly
Cousins with Lucas

Sleeping

His first face-palm

Jason Chu

Shared publicly
Google took an album (mostly of pictures taken from a dSLR) and my location history and put together a story from my trip to Cologne.
Damn! These albums! Amazing 
Work
Employment
  • Google
    Site Reliability Engineer, 2013 - present
Basic Information
Gender
Male
Story
Tagline
My tag line is better than your tag line.
Introduction
Programmer by day and ... programmer by night. 20-year martial artist, motorcycle rider, company founder, owner of an arcade machine, and all-around great guy.
Education
  • University of Victoria
    Computer Science, 1999 - 2005
Links
YouTube
I was in the area but could only find a pile of rubble. It burned down.
Food: Poor - Fair
Decor: Poor - Fair
Service: Poor - Fair
Reviewed a year ago