Listening to @KoltonAndrus ask "How many 9s do you need?" & it made me think: the way most ppl consider 9s is flawed bcuz it assumes all time is equal & doesn't consider outage length beyond aggregates. The problem is this is getting codified into SLOs. 🧵
i.e. Most orgs aren't 24/7. All orgs have peak times. Short incidents are clearly less impactful than long ones. All of this is lost when you declare an SLO like "99.9% of requests over past 2 weeks return 200 OK" 🧵
Or e.g. take our friends at Slack. They're still at 99.9% for the quarter, but clearly the Jan 4 incident made headlines... & it likely wouldn't have (or at least would've been small news) if it had happened May 4 at midnight PT. 🧵
So one common perspective is "you don't need that many 9s". If your app is only used during business hours in the USA, then it can go down on nights & weekends... so you only need 23.8% availability (261 work days * 8 hours = 2,088 of the year's 8,760 hours).
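To make the 23.8% arithmetic concrete, here's a minimal sketch. The function name and the 365-day year (no leap days) are my assumptions for illustration, not anything standard.

```python
# Assumption: a 365-day year, ignoring leap years.
HOURS_PER_YEAR = 365 * 24  # 8760


def required_availability(work_days: int, hours_per_day: float) -> float:
    """Fraction of the year the app is actually in use (hypothetical helper)."""
    return work_days * hours_per_day / HOURS_PER_YEAR


# 261 work days * 8 hours = 2088 hours of actual use per year.
business_hours = required_availability(work_days=261, hours_per_day=8)
print(f"{business_hours:.1%}")  # 23.8%
```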

But here's the thing:... 🧵
YOU DO NOT CONTROL WHEN OUTAGES OCCUR! You only have some influence, but can never preclude incidents (e.g. how's the no-deploy Fridays working out for you?).

Whatever 9s you choose, you need to accept that the worst incident can happen at the least opportune time. 🧵
So when you say your goal is 99.9% & you can have 8.76hrs of downtime for the year, do you really think your execs would be ok if you had a perfect year, but were down for 8.76hrs on Black Friday/some other important peak day? 🧵
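Here's a rough sketch of that math: the yearly downtime budget for a given number of 9s, plus what one day looks like if the whole budget lands on it. The helper names are made up for illustration, and it assumes a 365-day year and a single 24-hour peak day.

```python
# Assumption: a 365-day year, ignoring leap years.
HOURS_PER_YEAR = 365 * 24  # 8760


def downtime_budget_hours(slo: float, window_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime an availability target allows over the window."""
    return (1 - slo) * window_hours


def single_day_availability(outage_hours: float) -> float:
    """Availability of one 24-hour day that absorbs the whole outage."""
    return 1 - outage_hours / 24


budget = downtime_budget_hours(0.999)        # about 8.76 hours per year
worst_day = single_day_availability(budget)  # about 0.635

print(f"yearly budget at 99.9%: {budget:.2f} hrs")
print(f"availability on that one day: {worst_day:.1%}")
```

So the year still reads 99.9%, but the one day that mattered most was only about 63.5% available.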
If not, then maybe you need more 9s. Or better, maybe you should start rethinking what SLOs should look like. e.g. consider segmentation/slicing. 🧵
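One way segmentation/slicing could look, as a sketch only: track the SLO per time segment instead of one aggregate. The segment names, targets, and request counts below are all invented for illustration; this isn't a claim about how any particular SLO tool does it.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One slice of traffic with its own target (hypothetical structure)."""
    name: str
    good: int      # requests that succeeded
    total: int     # all requests in the segment
    target: float  # availability target for this segment

    @property
    def availability(self) -> float:
        return self.good / self.total

    @property
    def met(self) -> bool:
        return self.availability >= self.target


# Made-up numbers: every failure lands in the peak segment.
segments = [
    Segment("peak", good=1_994_000, total=2_000_000, target=0.999),
    Segment("off_peak", good=8_000_000, total=8_000_000, target=0.99),
]

# The aggregate view hides where the failures happened.
aggregate = sum(s.good for s in segments) / sum(s.total for s in segments)
print(f"aggregate: {aggregate:.2%} (99.9% target met: {aggregate >= 0.999})")

for s in segments:
    print(f"{s.name}: {s.availability:.2%} (target met: {s.met})")
```

With these numbers the aggregate passes 99.9% while the peak segment misses its own target, which is exactly the kind of thing a single aggregate SLO loses.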
I don't know what the solution is, but there's always going to be a disconnect btwn your pragmatic exec doing worst outage * $ lost calculations vs. you doing statistics of small, occasional incidents over time to determine what is/isn't acceptable.
You can follow @gitbisect.