Thread by @mipsytipsy, All such numbers are bullshit, of course. But still quite useful, [...]

All such numbers are bullshit, of course.

But still quite useful, so let's play with this a bit.

First, start by tracking the four DORA metrics: time elapsed between code & deploy, deploy frequency, deploy failures, time to recovery. https://twitter.com/tomzalt/status/1344191437458403328
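As a rough sketch of what tracking those four numbers could look like, here is one way to compute them from a list of deploy records. The `Deploy` record, its field names, and the sample data are all hypothetical, made up for illustration; real numbers would come from your own CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deploy:
    # Hypothetical record shape; adapt to whatever your pipeline logs.
    merged_at: datetime
    deployed_at: datetime
    failed: bool = False
    recovered_at: Optional[datetime] = None


def dora_metrics(deploys, window_days):
    """Summarize the four metrics over a window of `window_days` days."""
    lead_times = [(d.deployed_at - d.merged_at).total_seconds() for d in deploys]
    failures = [d for d in deploys if d.failed]
    recoveries = [
        (d.recovered_at - d.deployed_at).total_seconds()
        for d in failures
        if d.recovered_at is not None
    ]
    return {
        "lead_time_minutes": median(lead_times) / 60,
        "deploys_per_day": len(deploys) / window_days,
        "failure_rate": len(failures) / len(deploys),
        "time_to_recovery_minutes": median(recoveries) / 60 if recoveries else None,
    }


# Made-up sample: four deploys in one day, one of which failed.
t0 = datetime(2021, 1, 4, 9, 0)
sample = [
    Deploy(t0, t0 + timedelta(minutes=10)),
    Deploy(
        t0 + timedelta(hours=1),
        t0 + timedelta(hours=1, minutes=20),
        failed=True,
        recovered_at=t0 + timedelta(hours=1, minutes=35),
    ),
    Deploy(t0 + timedelta(hours=2), t0 + timedelta(hours=2, minutes=12)),
    Deploy(t0 + timedelta(hours=3), t0 + timedelta(hours=3, minutes=14)),
]
metrics = dora_metrics(sample, window_days=1)
print(metrics)
```

Medians are used here only because they are less sensitive to one terrible deploy; that is a choice made for this sketch, not something the thread prescribes.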

Next, ask your engineers for an honest estimate of how many hours per week they spend writing code that moves the business forward, not debugging or in meetings or dealing with tests or tool chains, etc.

Ask everyone on the team, and...ask anonymously.

Some variations might be: how much of the week do you spend on engineering problems that energize you, and that don't feel self-inflicted?

Or: how many times per week do you find yourself paused & waiting on someone else?

Or, how often do you have to do a full mental context switch?

The equation I have in mind here is...well shit this is just queueing theory, isn't it? Parallelism, efficiency and performance.

But by asking some of these squishy human questions you can hazard a rough guess, and if you measure you'll get your baseline.

All this shit can totally be measured -- how long do deploys take, how much human babysitting, how much firefighting.

I've been giving a talk lately where I sketch out the optimal failed deploy scenario vs a very ordinary "insidious scenario" -- https://twitter.com/tomzalt/status/1344193124763979777?s=21

In the insidious scenario, a deploy gets kicked off with a couple days' merges from a few devs. It fails; deployer begins frantically reading diffs, git bisecting, and pulling in the devs w/merges, until it's identified and fixed.

Time elapsed: rest of the day, for a few people?

Compare the same exact bug under a virtuous deploy loop. Engineer merges, auto deploys, is notified of the failure a few minutes later, so she goes right back and swiftly commits the fix.

Time elapsed: maybe 15 minutes? with a blast radius of 1.

Virtuous loop (with CI/CD): 15 min * 1 dev
Insidious loop: 1-5 hours * 2-5 devs

That's ~2-25 hours worth of engineer time in the insidious loop, versus 15 minutes in the virtuous one, to ship the exact same amount of business impact.
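The arithmetic behind that comparison, spelled out as a small sketch. The inputs are the rough figures from the two scenarios above; the function name is just for illustration, and the resulting ratios are only as good as those guesses.

```python
def engineer_hours(hours_per_dev, devs):
    """Total engineer time burned on one failed deploy."""
    return hours_per_dev * devs


# Virtuous loop: 15 minutes for the one engineer who merged.
virtuous = engineer_hours(0.25, 1)

# Insidious loop: 1-5 hours each for 2-5 devs.
insidious_low = engineer_hours(1, 2)
insidious_high = engineer_hours(5, 5)

# How many times more expensive the same bug is in the insidious loop.
ratio_low = insidious_low / virtuous
ratio_high = insidious_high / virtuous

print(f"virtuous: {virtuous} h")
print(f"insidious: {insidious_low}-{insidious_high} h")
print(f"cost ratio: {ratio_low:.0f}x-{ratio_high:.0f}x")
```

With these inputs the same bug costs somewhere between 8x and 100x more engineer time in the insidious loop.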

Now multiply this scenario week over week, team by team, year upon year.

But it's not just as simple as measuring deploy metrics and hours freed up for more useful work. It's also team bloat. Growth penalties.

You can see why this team would already be clamoring to hire SRE teams and build specialists -- without CI/CD they can't get shit done!

But how to account for those growth penalties? Here's a rule of thumb.

For time elapsed between writing code and code deployed:

Under 15 min is optimal.
Hours? Double your engineers.
Days? Double again.
Weeks? Double again.
Months? (Too far outside my experience to say)
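The rule of thumb above can be written down as a lookup. This is only a sketch: the exact bucket boundaries are assumptions (the thread says "hours", "days", "weeks" without cutoffs), and treating anything from 15 minutes up to a day as the "hours" bucket is a guess made here for the sake of having runnable code.

```python
from datetime import timedelta
from typing import Optional


def headcount_multiplier(lead_time: timedelta) -> Optional[int]:
    """Rough engineer multiplier for a given code-to-deploy lead time.

    Bucket edges are assumed for illustration; only the doubling per
    bucket comes from the rule of thumb.
    """
    if lead_time < timedelta(minutes=15):
        return 1  # optimal: no penalty
    if lead_time < timedelta(days=1):
        return 2  # "hours": double your engineers
    if lead_time < timedelta(weeks=1):
        return 4  # "days": double again
    if lead_time < timedelta(days=30):
        return 8  # "weeks": double again
    return None  # "months": no estimate offered


for label, lt in [
    ("10 minutes", timedelta(minutes=10)),
    ("3 hours", timedelta(hours=3)),
    ("2 days", timedelta(days=2)),
    ("2 weeks", timedelta(weeks=2)),
    ("2 months", timedelta(days=60)),
]:
    print(label, "->", headcount_multiplier(lt))
```

So a team that would need 10 engineers with a sub-15-minute pipeline would, by this rule, need something like 40 if deploys take days.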

This matches my experiences pretty well. With the caveat that depending on your product and scale it might be mostly engineers or the sum of engineers, product and design (and project managers, etc).

Curious if this matches other people's experience or not.

Feeling oddly relieved that Honeycomb engineering has had CD since the start. We wouldn't have survived as a company with a deploy pipeline on the order of hours, if my rule of thumb holds true...we couldn't have afforded the extra engineering capacity long enough.
