Charity Majors
@mipsytipsy: cofounder/CTO @honeycombio, co-wrote Database Reliability Engineering, loves whiskey, rainbows, and Friday deploys. I test in production and so do you. 🌈 Oct. 01, 2019

Open thread: after you merge to master, how does your code get validated and rolled out to users? How many human-initiated steps, if any; what is the human override, if none; and is there any automated attempt to remediate a failure?

I described our system the other day in three tweets. Is yours better? ☺️🐝

These thoughts brought to you by reading the DORA report.

To recap, you can basically tell how high performing a team is by asking four questions: time to delivery, time to recovery, failed deploys and frequency of deploys.
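If you want to put numbers on those four questions, here is a minimal sketch of how they could be computed from a log of deploys. The `Deploy` record, its field names, and the choice of the median are all assumptions made for illustration; the DORA report has its own definitions and buckets, and this is not them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deploy:
    """One deploy record (hypothetical shape, not taken from the DORA report)."""
    merged_at: datetime                      # when the change landed on master
    deployed_at: datetime                    # when it finished rolling out to users
    failed: bool = False                     # did this deploy cause a failure?
    recovered_at: Optional[datetime] = None  # when service was restored, if it failed


def four_metrics(deploys: List[Deploy], window_days: int) -> dict:
    """Summarize the four questions over a window of deploy records."""
    if not deploys:
        raise ValueError("need at least one deploy record")
    failures = [d for d in deploys if d.failed]
    recoveries = [d.recovered_at - d.deployed_at for d in failures if d.recovered_at]
    return {
        "time_to_delivery": median(d.deployed_at - d.merged_at for d in deploys),
        "time_to_recovery": median(recoveries) if recoveries else None,
        "failed_deploy_rate": len(failures) / len(deploys),
        "deploys_per_day": len(deploys) / window_days,
    }


# Made-up data: four deploys over two days, one of which failed.
t0 = datetime(2019, 10, 1, 9, 0)
history = [
    Deploy(t0, t0 + timedelta(minutes=10)),
    Deploy(t0 + timedelta(hours=1), t0 + timedelta(hours=1, minutes=20)),
    Deploy(
        t0 + timedelta(hours=2),
        t0 + timedelta(hours=2, minutes=30),
        failed=True,
        recovered_at=t0 + timedelta(hours=2, minutes=45),
    ),
    Deploy(t0 + timedelta(hours=3), t0 + timedelta(hours=3, minutes=40)),
]
metrics = four_metrics(history, window_days=2)
print(metrics["time_to_delivery"])    # median of 10, 20, 30, 40 minutes -> 0:25:00
print(metrics["failed_deploy_rate"])  # 1 failure out of 4 deploys -> 0.25
```

The point of writing it down is mostly that all four numbers fall out of the same deploy log, so tracking one of them gets you most of the way to tracking the rest.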

Most of us (I hope) know the feeling of being on a high performing team. It's a giddy, visceral high; you spend most of the week making forward progress, solving puzzles, and collaborating with other smart people.

We even get paid to do it -- what incredible fortune!!

But software development is a finicky, human-riddled sociotechnical enterprise, and when it breaks down, as all systems do, it can be a nightmare on all fronts -- technical, cognitive, social.

Before DORA, most of our attempts at improvement were (at best) cargo culted.

"At Google we didn't have this problem and we did it like this" is responsible for half the inexplicable cultural oddities in tech, lol.

But the DORA report also shows why the solutions are so counterintuitive. When we falter, our instinct is to slow down and regain control;

but the report shows us that speed/frequency of deploys and number/degree of failures are tightly, *inversely* linked.

Slowing down and tightening your grip will not prevent failures or make them less disastrous. It will do precisely the opposite.

This is the Friday deploy fallacy in a nutshell. 

If you anxiously throw up gates and processes around shipping code so it takes longer and happens less often, you are dooming yourself to a life of big bang deploys and high stakes recovery events.

If you want to work on a high performing team, if you want to improve the experience of shipping software and spend more time on the good parts, less time on the frustrating stuff: fix your eyes on those four metrics.

✨Fix your eyes on deploys.✨

Philosophically, your attitude towards deploys should be this:

The *right* thing should be easy, fast, and obvious. Ideally even automatic. Do you want code to roll out to users every time you merge a branch to master? Then it should just magically happen.

This was the animating impulse behind my cron job. We were shipping to master constantly, and why should we keep having to pause our work and manhandle a deploy? Or worse, forget to deploy, and eventually have to wrestle a deploy with MANY merges?? 😱🥶🤬

Every one of you should get an involuntary twitch at the idea of shipping multiple changes in one act of deployment.
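To make "one change per deploy" concrete, here is a rough sketch of the decision a scheduled deploy job could make each time it wakes up: ship the single oldest merge that isn't live yet, and nothing more. This is not the cron job mentioned above. The function names and the list-of-SHAs representation are hypothetical, and the real work of building, rolling out, and verifying is left out entirely.

```python
from typing import List, Optional


def next_sha_to_deploy(master: List[str], deployed: Optional[str]) -> Optional[str]:
    """Pick the single oldest undeployed merge on master.

    `master` lists merge commit SHAs oldest-first. `deployed` is the SHA that
    is currently live, or None if nothing has shipped yet.
    """
    if not master:
        return None
    if deployed is None:
        return master[0]
    position = master.index(deployed)  # raises ValueError if the live SHA is unknown
    remaining = master[position + 1:]
    return remaining[0] if remaining else None


def plan_deploys(master: List[str], deployed: Optional[str]) -> List[str]:
    """List the deploys the job would run, one merge per deploy."""
    plan = []
    while (sha := next_sha_to_deploy(master, deployed)) is not None:
        plan.append(sha)
        deployed = sha  # pretend the deploy succeeded before picking the next one
    return plan


# Two merges landed since the last deploy, so the job plans two separate deploys.
print(plan_deploys(["a1", "b2", "c3", "d4"], deployed="b2"))  # ['c3', 'd4']
```

If something breaks after `c3` ships, you know exactly which diff to look at, which is the whole argument against batching.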


If you have deploy aversion, I'll bet that 95% of it is a completely valid fear based in the trauma of batched up deploys.

Unbundle your damn deploys. Retrain your amygdala. Don't throw the baby out with the bathwater.

Your entire business, team and culture will suffer if you cultivate a fear of shipping code to users. Instead, step by step, you must earn their trust, and I will tell you how.

In addition to making the right thing easy and graceful and automatic, you must make the wrong thing hard and ugly and full of friction.

(This always makes me think of @lacker's terrific, hilarious keynote on great API design. Watch it.)

The right thing must happen automatically and with no effort, otherwise the wrong thing will.

The right way must be the fastest way to ship code, or engineers will reach for a shortcut at the worst possible time -- and it will be untested.

If shipping is the heartbeat of any technical organization, then your deploy code is almost like RNA -- where your culture gets encoded and replicates itself, diff after diff, day after day. 

If you have an org that is scarred and crusty and terrified of failure, ✨you can earn it back.✨

Not with a big bang change -- please! -- but with small, iterative changes that center their current pain and rebuild trust faster than you are asking them to take on new risks.

Don't swoop in with a total rewrite. Code evolves, and this is even more true of deploy code.

Watch while a new hire attempts to navigate the system to ship their code. Fix the traps they fall into. (Docs are code too.) Make it harder and uglier to do the wrong thing.

Think from first principles. What would the perfect deployment process look like for your product, team and users? Think big, like impossibly big. 🌈

It may take years to get there, and that's fine. You definitely won't get there if you don't know where you're trying to go.

Instrument your pipeline. Sorry, let me correct that:


Practice observability-driven development. Instrument *ahead* of your building, not as a trailing job or an afterthought.

Some build teams have been doing killer work using Honeycomb to instrument their CI/CD pipelines, using tracing to find bottlenecks and slow tests.
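For a feel of what instrumenting a pipeline means at its simplest, here is a sketch that wraps each stage in a timed span and then asks which stage was slowest. It uses only the Python standard library; `PipelineTrace`, the stage names, and the sleeps standing in for real work are all made up for illustration, and a real setup would send these spans to a tracing backend rather than keep them in a list.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple


class PipelineTrace:
    """Collects one timed span per pipeline stage (hypothetical helper)."""

    def __init__(self) -> None:
        self.spans: List[Dict[str, object]] = []

    @contextmanager
    def span(self, name: str) -> Iterator[None]:
        """Time the wrapped block and record it, even if the block raises."""
        start = time.perf_counter()
        error = None
        try:
            yield
        except Exception as exc:
            error = repr(exc)
            raise
        finally:
            self.spans.append({
                "name": name,
                "duration_s": time.perf_counter() - start,
                "error": error,
            })

    def slowest(self, n: int = 3) -> List[Tuple[str, float]]:
        """Return the n longest-running stages, slowest first."""
        ranked = sorted(self.spans, key=lambda s: s["duration_s"], reverse=True)
        return [(s["name"], s["duration_s"]) for s in ranked[:n]]


# The sleeps stand in for real pipeline stages.
trace = PipelineTrace()
with trace.span("checkout"):
    time.sleep(0.01)
with trace.span("unit_tests"):
    time.sleep(0.05)
with trace.span("build_artifact"):
    time.sleep(0.02)

for name, seconds in trace.slowest(2):
    print(f"{name}: {seconds:.3f}s")
```

Because the span is recorded in a `finally` block, a stage that blows up still shows up in the trace with its error attached, which is usually the stage you most want to see.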

@IntercomEng has a really cool case study: they used us to get the time elapsed between merge and end of deploy down to *4 min*. 🥳

There are other fun examples on our blog. I'm not going to look them all up for you, but here's one:

Look, I've said a million times that we systematically underinvest in deploy tooling as an industry.

I said that even before DORA showed us that you can basically evaluate how high performing a team is by how well they ship code. ☺️📈

This has so many ripple effects on your ability to hire and retain good people. You *have* to care. You have to spend resources on it.

I do see signs that we are starting to take this seriously. For one thing, the % of teams that are high performing 3x'd in the last year (!).

Anecdotally, there's a real hunger coming from Honeycomb users to better understand and refine and simplify their build process.

Simplicity is democratizing. Simplicity and speed encourage ownership. Deploy tooling is often where culture gets encoded and stored.

Which means it can be a powerful lever for ✨systemic change.✨

If you want to take a large organization by the nose and massively transform how software gets shipped, I suggest first arming yourself with a powerful observability tool, then joining or forming the internal tools team and taking on deploys.

Plus all the other shit I've said a million times about embracing failure, etc. Gotta go now, this rant went much longer than planned. lol.

I'll come back later and read the tweet descriptions of your build pipelines. Can't wait!!
