1/ “AIOps” – Not sure I’ve ever had a stronger reaction to a buzzword than to this union of snake oil (“AI”) and crufty tradition (“Ops”). But this isn’t just an emotional response to being replaced by robots: *it’s simply not going to work as promised.*
Thread

2/ First, false positives are particularly harmful in ops. Ask anyone who’s carried a pager and gotten woken up at 3am for something that didn’t need to be fixed. Not only do false positives create more work, they also erode trust in the tools themselves.
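(A back-of-the-envelope sketch of why this bites, with purely hypothetical numbers: when real incidents are rare, even an accurate detector pages mostly on false alarms.)

```python
# Illustrative base-rate arithmetic -- all numbers are assumptions, not data
# from the thread: with rare incidents, most pages are false alarms.

checks_per_week = 10_000      # assumed: health checks evaluated per week
real_incidents = 5            # assumed: genuine incidents in that window
true_positive_rate = 0.99     # detector catches 99% of real incidents
false_positive_rate = 0.01    # detector misfires on 1% of healthy checks

true_alerts = real_incidents * true_positive_rate
false_alerts = (checks_per_week - real_incidents) * false_positive_rate

precision = true_alerts / (true_alerts + false_alerts)
print(f"pages that are real incidents: {precision:.1%}")  # ~4.7%
```

Under those assumptions, fewer than 1 page in 20 is a real incident – and the 3am wake-ups do the rest.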
3/ Second, the software itself is constantly changing. I’ve seen organizations that make 100s or even 1000s of changes to their application every week! An algorithm trained on _last_ week’s releases is unlikely to perform in any reasonable way on _this_ week's software.
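(A minimal sketch of that failure mode, with hypothetical latencies and a simple z-score detector standing in for whatever “trained algorithm” a vendor ships: a deploy shifts the healthy baseline, and every healthy data point pages.)

```python
import statistics

# Hypothetical latencies (ms). After a deploy, the healthy baseline shifts.
last_week = [100, 102, 98, 101, 99, 103, 100, 97]  # training window
this_week = [130, 132, 128, 131, 129]              # new, healthy baseline

mean = statistics.mean(last_week)    # 100 ms
stdev = statistics.stdev(last_week)  # 2 ms

# A basic z-score check stands in for the model trained on last week's data.
for latency in this_week:
    z = (latency - mean) / stdev
    if abs(z) > 3:
        print(f"{latency} ms -> ALERT (z={z:.1f})")  # every point pages; none is a problem
```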
4/ Third, it’s critical for humans to understand the *reasons* an alert was triggered. Sometimes simple actions can be taken without knowing why, but nearly every mitigation, and every true resolution, requires a deeper understanding of what’s happened.
5/ Black-box approaches don’t help operators understand what’s happening or how to address it. Building response systems that are completely automatic – or worse, auto*magic* – is anathema to building *reliable* systems.
6/ (Don’t get me started on why any sort of automation and analysis needs to be tightly integrated into your ingestion pipeline! And it's not _just_ the economics!) https://twitter.com/lizthegrey/status/1308889038221385732
7/ Finally, whether a change is "good" or "bad" is a value judgement that requires business context – context that only a human has. For example, a 5% increase in latency may be totally acceptable if it’s part of launching a new feature on time… then again, maybe not.
8/ Is AIOps _complete_ trash? Whether you like the term or not, there certainly *is* a lot of data, so it seems like computers can and should help. But it's far from understood (at this point) what sorts of algorithms can be applied, or how they can be used to _support_ engineers.
9/ More next time on how I think AIOps *can* work, but for now, just a reminder: we all know what happens when we take the humans out of the loop (ask Matthew Broderick).