Thread by @unixdaemon, The 97 Things Every SRE Should Know book (https://www.oreilly.com/library/view/97-things-every/9781492081487) looks interesting and [...]

Dean Wilson

unixdaemon

The 97 Things Every SRE Should Know book (https://www.oreilly.com/library/view/97-things-every/9781492081487) looks interesting and covers a number of short, 1-3 page topics. I'm going to try to read a few a week and make some notes. #sre-97

The random number generator came up with "42 - Why I Hate Our Playbooks" by Frances Rees first and it's a good overview of playbooks.

Quotables include - "Any playbook that can describe the exact steps to resolve an exact circumstance should be an automated script instead."

and

"We escalate to humans for a complex response, not a fast response."

It also has some guidance:

Ideally, a playbook should only contain:
* Why do I care? Severity and qualification of the user-visible impact.
* What can I look at? Consoles, logs, and inspection tools.
* What can I do? Mitigation tooling.
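
As a rough illustration of that three-part shape, here is a minimal Python sketch that flags which sections a playbook is missing. The `Playbook` class, its field names, and the example values are my own hypothetical choices, not something from the book.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Playbook:
    """Hypothetical container mirroring the three questions in the list above."""
    alert: str
    why_do_i_care: str = ""  # severity and user-visible impact
    what_can_i_look_at: List[str] = field(default_factory=list)  # consoles, logs, inspection tools
    what_can_i_do: List[str] = field(default_factory=list)  # mitigation tooling

    def missing_sections(self) -> List[str]:
        """Return the names of any sections that are still empty."""
        missing = []
        if not self.why_do_i_care.strip():
            missing.append("why_do_i_care")
        if not self.what_can_i_look_at:
            missing.append("what_can_i_look_at")
        if not self.what_can_i_do:
            missing.append("what_can_i_do")
        return missing


# Example with made-up values: one section has not been written yet.
pb = Playbook(
    alert="checkout-latency-high",
    why_do_i_care="Customers see slow or failed checkouts.",
    what_can_i_look_at=["checkout latency dashboard"],
)
print(pb.missing_sections())
```

A check like this could run whenever a playbook is added or edited, so gaps show up before the alert fires rather than during it.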

My own view on playbooks is that you get out what you put in, and they are often a last-minute band-aid rather than a full part of the product.

Runbooks should have a lifecycle: a time to be useful and a time to die / be automated away. The trick is knowing which stage one is at.

I like to capture usage and relevance information on documentation like this. A simple thumbs up / thumbs down on each page gets you started, but ideally you'd track a little more.

Has the page ever been read? If so, how long ago? Was it actually used? When was it last reviewed? A simple "helped" / "didn't help" checkbox and a comment box can help you get started.
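
To make those questions concrete, here is a minimal Python sketch of the sort of per-page record I mean. Everything in it is an assumption for illustration: the `PageStats` and `Feedback` names, the 180 day review window, and the "under half of responses helped" threshold are all arbitrary, so tune them to your own docs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Feedback:
    helped: bool  # the "helped" / "didn't help" checkbox
    comment: str = ""  # optional free-text comment


@dataclass
class PageStats:
    """Hypothetical usage record for one runbook page."""
    page: str
    last_read: Optional[datetime] = None  # None means never read
    last_reviewed: Optional[datetime] = None  # None means never reviewed
    feedback: List[Feedback] = field(default_factory=list)

    def record_read(self, when: datetime) -> None:
        self.last_read = when

    def record_feedback(self, helped: bool, comment: str = "") -> None:
        self.feedback.append(Feedback(helped, comment))

    def helped_ratio(self) -> Optional[float]:
        """Fraction of responses marked as helped, or None with no responses."""
        if not self.feedback:
            return None
        return sum(f.helped for f in self.feedback) / len(self.feedback)

    def needs_attention(self, now: datetime,
                        max_review_age: timedelta = timedelta(days=180)) -> bool:
        """Flag pages that were never read, are overdue for review, or mostly didn't help."""
        never_read = self.last_read is None
        stale_review = (self.last_reviewed is None
                        or now - self.last_reviewed > max_review_age)
        ratio = self.helped_ratio()
        mostly_unhelpful = ratio is not None and ratio < 0.5  # assumed threshold
        return never_read or stale_review or mostly_unhelpful
```

Even this much is enough to sort pages into "fine", "review me", and "possibly dead", which feeds straight back into the lifecycle question.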

It's important that capturing feedback is as frictionless as possible and that the feedback is immediately actionable. Don't make people switch tabs to a doc review system, for example.

Anything that adds friction will stop people responding, and IMHO it's better to gather some data than none.
