Charity Majors (@mipsytipsy), CTO @honeycombio; co-wrote Database Reliability Engineering; loves whiskey, rainbows. I test in production and so do you. 🌈🖤Black Lives Matter🖤
Mar. 27, 2020

I am convinced there are many, many (most?) software companies out there with 2x, 3x, 4x+ the headcount they really need to build and support their core product.

But they never understood their breaky, flaky systems, so they had to plaster over the problems with people.

Teams can't make forward progress on the product if they have lots of firefighting and bug fixes to do, so you hire more engineers and split up the surface area. People start to specialize. That's normal... inevitable.

But 5 teams of engineers aren't going to make 5x as much progress as 1 team of engineers. Each of them actually slows way down. You've added more friction, more coordination, more overhead, and more externalities.

And then you start hiring specialist teams of DB engineers and operations engineers to run the thing for you, but now you've broken up the core virtuous feedback loop of "you build it, you run it", which takes a lot of careful work and investment to reassemble.

And then there are the layers of managers, directors, and VPs involved, and the support teams to buffer and triage all the user complaints and bug reports, and product managers and project management and HR and other people who need to exist just to coordinate and route information...

If you quickly grow from one eng team to 5-6 engineering teams, you might be burning 10x as much cash every month, and getting ... 3x as much done? optimistically?

I wish there were a lot more cultural value placed on staying as small and nimble as you can, for as long as you can. Instead of humblebragging about how big your company has gotten, I wish we had the words to brag about

✨how much we have achieved✨
✨with so few✨

But without observability, you kinda don't have a choice.

That 15-minute debugging story @evanderkoogh just posted? Without instrumentation and a clean, well-understood system, that could easily have eaten up a day of someone's time. Or worse, it could have metastasized.

The problem could have generated a bunch of bug reports and user confusion, all of which would have needed to be triaged, routed, and reproduced. It could have bounced around and eaten up hours of several different engineers' time.

It could even have been swept under the rug and ignored.

Do this for a few months, a few years, and that's how you get a hairball of a system -- one everyone is afraid to touch and nobody wants to change.

Because it breaks in ways that no one can anticipate or explain, and takes an unknowable amount of time and energy to recover from.

But this is not inevitable. And that is why you should invest in observability -- because systems that are well-understood and understandable and friendly to hands-on experimentation will let you move fast and stay small.

For a startup, this could be a matter of life or death.
