1/ “AIOps” – Not sure I’ve ever had a stronger reaction to a buzzword than to this union of snake-oil (“AI”) and crufty tradition (“Ops”). But it’s not just an emotional response to being replaced by robots. *It’s just that it's not going to work as promised.*

Thread 👇
2/ First, false positives are particularly harmful in ops. Ask anyone who’s carried a pager and gotten woken up at 3am for something that didn’t need to be fixed. And not only do false positives create more work, they erode trust in tools.
3/ Second, the software itself is constantly changing. I’ve seen organizations that make 100s or even 1000s of changes to their application every week! An algorithm trained on _last_ week’s releases is unlikely to perform in any reasonable way on _this_ week's software.
4/ Third, it’s critical for humans to understand the *reasons* why an alert was triggered. Sometimes simple actions can be taken without knowing why, but nearly all mitigations and every true resolution require a deeper understanding of what’s happened.
5/ Black box approaches don’t help operators understand what’s happening or how to address it. Building response systems that are completely automatic – or worse, auto*magic* – is anathema to building *reliable* systems.
6/ (Don’t get me started on why any sort of automation and analysis needs to be tightly integrated into your ingestion pipeline! And it's not _just_ the economics!) https://twitter.com/lizthegrey/status/1308889038221385732
7/ Finally, whether or not a change is "good" or "bad" is a value judgement that requires business context – context that only a human has. For example, a 5% increase in latency may be totally acceptable if it’s part of launching a new feature on time… then again maybe not.
8/ Is AIOps _complete_ trash? Whether you like the term or not, there certainly *is* a lot of data so it seems like computers can/should help. But it's far from understood (at this point) what sorts of algorithms can be applied or how they can be used to _support_ engineers.
9/ So more next time on how I think AIOps *can* work, but for now just a reminder: we all know what happens when we take the humans out of the loop (ask Matthew Broderick) <EOF>
You can follow @save_spoons.