People talk about “embracing failure” in order to innovate. Maybe. But let me tell you, I have no tolerance for ignoring risks. A thread.
In business, you have to think ahead. What can I build that people will need in 12 months? What could go wrong? What will happen in the market? What’s urgent today? Can we weather storms brewing?
But too often in technology, I see people in reactive mode. Technology is change. If you wait to see how things shake out, you are already behind.
I’ve seen this most in running cloud and software services. Praying to the pager gods, fingers crossed. But if you fail to plan, you plan to fail!
If you’re reacting to incidents, you don’t belong in a modern tech company. Or at least not mine. I don’t mean to be harsh, but it’s reality. We can’t just wait for failure.
I’m not saying that all failure is unacceptable. In fact, I expect things will go wrong, and we will have to handle it. But we should make a conscious choice about what risks to accept and which to mitigate.
To me, it all starts with the customer and keeping critical customer experiences working as well as possible. Customers will accept a little bit of flakiness as long as it’s not all the time. You need to have redundancy, set expectations, and avoid massive problems.
The way to do this is to have a plan. A goal for services that’s realistic, based on data, shared and vetted. If everything’s broken, I want you to tell me “told you so.” I’m not happy the service is down, but OK, we had a discussion, we made a choice.
This is what we’re doing at @nobl9inc for customers: planning for failure, even in small amounts. Modeling business risks, understanding the limits of customer expectations, connecting the dots between infrastructure and business.
If you aren’t setting clear reliability guidelines for your services and wrestling over which risks to mitigate, you are not running at optimum. And if you’re saying to your team “I want it to be perfect, no mistakes,” you are not leading; you are putting your head in the sand.
When you tell your team “I want 100% reliability,” what you’re really saying is “you decide.” You have to make a tradeoff, because if you don’t, reality will bite you. At least be clear to your team about what is important so they can engineer and not guess.
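To make that tradeoff concrete, here is a minimal sketch of the error-budget arithmetic behind a reliability target. The function name, the 30-day window, and the example targets are illustrative assumptions, not a description of any particular product.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for a target over a rolling window.

    slo_target is a fraction, e.g. 0.999 for "three nines" (hypothetical input).
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


# Compare a few candidate targets over an assumed 30-day window.
for target in (0.99, 0.999, 1.0):
    budget = round(error_budget_minutes(target), 1)
    print(f"{target:.3%} target -> {budget} minutes of budget")
```

Under these assumptions, a 99.9% target leaves about 43.2 minutes a month for failure, while a 100% target leaves zero, which means every deploy, dependency, and outage is already over budget.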
Failure is inevitable. Business is about anticipating problems and properly managing risks. This is what I expect from everyone, especially my technology team.

#slo #sre #digital #reliability #devops #cloud #startup #ceopov

/end
You can follow @marcinkurc.