Charity Majors
@mipsytipsy cofounder/CTO @honeycombio, co-wrote Database Reliability Engineering, loves whiskey, rainbows, and Friday deploys. I test in production and so do you. 🌈
Mar. 30, 2019

I don't know that I drew this out clearly in my talk: my point was that deploying code should never be thought of as a flip of the switch or an atomic event.

Consider a piece of code you have just written. Do you trust it? (No, you do not.) Okay, so how do you gain trust in it?

Two ways: first you test for all the known failures and those you can predict, then you push it out... and see what happens. Uncomfortable truth.

We often talk about deploys as though they are the terminating point of the development process. "It shipped!" 🍾🍾 "I'm done!!" 🥂 Ops problem now, amirite?

Kittens, the deploy is when real engineering work *begins*. Everything leading up to that is child's play.

"Ship it and see what happens" sounds terrifying: it is! Fortunately a deploy is a process, not an atomic event, and there are many ways to add confidence in it.

(You do this every day already, it just stings to hear it put so bluntly. Sit with that feeling a bit.)

If you are a software engineer, your job is delivering value to users. Not to Jenkins or CircleCI. Your job is not done until your SLO is met for your users.

Your development process extends waaaaaayyyyyy into prod. You should be up to your elbows in prod every goddamn day.

Similarly, the workflow and tooling and monitoring and debugging of prod need to extend wayyy back into the development process.

Often this manifests as getting code into prod fast... but easing usage up very slowly, starting with internal users only.
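To make "ship fast, ramp slowly, internal users first" concrete, here is a minimal sketch of a percentage rollout gate. Everything in it is an assumption for illustration: the function names, the feature name, and the idea of hashing user IDs into buckets are one common way to do this, not something the thread prescribes.

```python
import hashlib
from typing import AbstractSet


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, 100).

    Hashing (feature, user) keeps each user on the same side of the gate
    between requests, so nobody flaps between old and new code paths.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(
    user_id: str,
    feature: str,
    percent: int,
    internal_users: AbstractSet[str],
) -> bool:
    """Internal users always get the new code; everyone else ramps by percent."""
    if user_id in internal_users:
        return True
    return rollout_bucket(user_id, feature) < percent
```

With `percent=0` the code is live in prod but only internal users hit it. Raising `percent` only ever adds users, because each user's bucket is fixed, so anyone enabled at 10% stays enabled at 50%.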

The Prime Directive of software is not "nothing should break", it is "users should never notice".

This is not a terrifying principle. This is a liberating principle.

I've often said that your deploy code is the most important code in your entire system. (And usually the most under-invested in.)

It's also where you get the highest leverage for validating your code against real conditions & unknown unknowns, while staying under SLO budget.

How? Ok, off the top of my head:

* canaries
* internal users first
* progressive deploys
* high cardinality tooling (🐝)
* raw event inspection (🐝)
* traffic splitters
* shadow nodes
* just fucking instrument and look at the code you wrote after you ship it (🐝)

Like seriously, all the glorious tooling in the world isn't gonna help you if you never LOOK AT IT.
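As one hedged illustration of the "canaries" item in the list above, a canary check can be as simple as comparing the canary's error rate against the baseline fleet before promoting the build. The `Stats` type, the `canary_verdict` name, and both thresholds are hypothetical, picked only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Treat "no traffic yet" as a zero error rate rather than dividing by zero.
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(
    baseline: Stats,
    canary: Stats,
    min_requests: int = 500,              # assumed: minimum traffic before judging
    max_extra_error_rate: float = 0.005,  # assumed: tolerated regression vs baseline
) -> str:
    """Return "wait", "rollback", or "promote" for a canary build."""
    if canary.requests < min_requests:
        return "wait"  # not enough real traffic to say anything yet
    if canary.error_rate > baseline.error_rate + max_extra_error_rate:
        return "rollback"
    return "promote"
```

A real gate would probably look at latency and more than one signal, but even this much turns "ship it and see what happens" into a bounded experiment.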

Be curious. The overwhelming majority of bugs can and will be spotted IMMEDIATELY after they ship, if the developer practices observability-driven development,

by which I just mean "instrument your code so you will know if it's working, then fucking check it once or twice."

Note that I didn't say "instrument it, and tell ops how to check it." Only the author has the full context, the original intent. Devs, you gotta live in prod too.
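For a rough idea of what "instrument your code so you will know if it's working" can look like, here is a small sketch that emits one structured event per unit of work. The `instrumented` helper and its field names are made up for this example; it writes JSON lines to a stream rather than calling any particular vendor's API.

```python
import json
import sys
import time
from contextlib import contextmanager


@contextmanager
def instrumented(name, sink=None, **fields):
    """Emit one JSON event per unit of work, with whatever context you attach.

    `fields` is where high-cardinality context goes (user IDs, build IDs, etc.).
    The body of the `with` block can add more keys to the yielded dict.
    """
    out = sink if sink is not None else sys.stdout
    event = {"name": name, **fields}
    start = time.monotonic()
    try:
        yield event
        event["ok"] = True
    except Exception as exc:
        event["ok"] = False
        event["error"] = repr(exc)
        raise  # instrumentation must never swallow the failure
    finally:
        event["duration_ms"] = round((time.monotonic() - start) * 1000, 3)
        out.write(json.dumps(event) + "\n")
```

Wrapping the new code path in `with instrumented("resize_image", user_id=..., build_id=...) as ev:` (hypothetical names) gives its author something specific to go look at right after the deploy, which is the whole point.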

You don't just look at prod when something gets escalated to you bc it broke. If you only look at prod when it's broken, you will have no idea what "normal" looks like.

If you reframe your job to extend to user experience, there are a million reasons to be in prod erry day.

✨commercial break, back in a few✨

now where was I

ah yes production

As @lyddonb likes to say, every minute an engineer spends in an environment other than prod is a minute spent learning the wrong lessons. (paraphrase)

To me, the very definition of a senior engineer is an engineer whose gut instincts I trust and yield to. Someone who can smell a problem long before they can articulate why exactly.

And where exactly do you think that instinct is forged and honed? Their laptop? Staging?

Every moment you spend in prod, you are learning valuable lessons about how your code interacts with users and infra.

You are learning the right tools, the right habits, the right instincts for what is fast or slow, dangerous or safe. You are leveling up at real engineering.

And every minute you spend in not-prod environments you are learning the opposite. You are learning *bad* instincts, *dangerous* intuitions.

Like that running mysql -e "drop database blah" is fiiiiinnnne.
