Charity Majors @mipsytipsy CTO @honeycombio, ex-Parse, Facebook, Linden Lab; cowrote Database Reliability Engineering; loves whiskey, rainbows. I test in production and so do you. 🌈🖤 Mar. 11, 2019 2 min read

Yes, you should change how you gather telemetry... For precisely those reasons.

The way you are doing logs and metrics now ✨is the hard way.✨ Why do you keep inflicting it on your poor teams? If they haven't learned it by now, they _never will._

(@JeffGrigg1's first tweet:)

Getting engineers to manage schemas, keep track of verbosity and write amplification and crazy bespoke naming conventions and metrics vs logs... Is a fucking NIGHTMARE.

And it's a castle built on quicksand. Can't be saved.

The answer is not to double down on more and more complex retrofits and abstraction, or scolding and shaming your team.

The answer is a real foundation. Which, if you listen to the experts -- not just me! -- is arbitrarily wide, structured events, one per request per service.

*Why* does this matter? It may seem far-fetched, the idea that data format alone could percolate throughout the stack and make everyone miserable.

(If you aren't a data person, that is. Data people are rolling their eyes at me right now, like /duh/)

But just look at the pain points Jeff named. Devs write logs out excessively, ops raises the log level.

But in my model there is only ever one log event written out per request, per service. No log levels, no writing logs out mid-execution, no downward pressure on capturing detail.

As a reminder, there is loads more detail in my pinned tweet, the honeycomb blog, or here: https://charity.wtf/2019/02/05/logs-vs-structured-events/

You never want to discourage an engineer from capturing a piece of detail that may someday be interesting or needed.

But it doesn't need to be written out as a whole new log line! Just append it to the blob. Adding a value to the blob is approximately free.
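
To make "append it to the blob" concrete, here is a minimal sketch in Python. It is not honeycomb's SDK: `WideEvent`, `handle_request`, the `checkout` service name, and every field name are made up for illustration. The only point is the shape: one dict per request, fields added as you go, a single write at the end.

```python
import json
import time


class WideEvent:
    """Hypothetical helper: one structured event per request, per service."""

    def __init__(self, service, request_id):
        self.fields = {"service": service, "request_id": request_id}
        self._start = time.monotonic()

    def add(self, key, value):
        # Adding detail is just a dict write; nothing is emitted yet.
        self.fields[key] = value

    def finish(self, emit):
        # Emit exactly once, when the request is done.
        elapsed_ms = (time.monotonic() - self._start) * 1000
        self.fields["duration_ms"] = round(elapsed_ms, 3)
        emit(json.dumps(self.fields, sort_keys=True))


def handle_request(request_id, user_id, emit):
    # Illustrative handler: all field names here are assumptions.
    event = WideEvent("checkout", request_id)
    event.add("user_id", user_id)
    event.add("cart_items", 3)
    event.add("payment_provider", "example-pay")
    event.finish(emit)


lines = []
handle_request("req-1", 42, lines.append)
```

After this runs, `lines` holds a single JSON string for the whole request. No log levels anywhere, and adding a tenth or a hundredth field costs one more `add` call.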

Schemas? Indexes? Don't need those either. That's why we use a columnar store. Start writing a dimension whenever you want, stop whenever you want.

(Columnar store == every dimension is effectively an index)
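
If "every dimension is effectively an index" sounds hand-wavy, here is a toy sketch in Python. It is nothing like a production columnar store, and it is not a description of how honeycomb's storage works; `ColumnStore` and the field names are invented for the example. It only shows why there is no schema to manage: each dimension lives in its own sparse column, so a new one can show up or vanish at any time, and a filter only reads the column it cares about.

```python
from collections import defaultdict


class ColumnStore:
    """Toy column-oriented store: one sparse column per dimension."""

    def __init__(self):
        self.columns = defaultdict(dict)  # dimension -> {row_id: value}
        self.row_count = 0

    def append(self, event):
        # Each field goes into its own column; unknown dimensions
        # simply create a new column on first write.
        row_id = self.row_count
        for key, value in event.items():
            self.columns[key][row_id] = value
        self.row_count += 1
        return row_id

    def where(self, dimension, predicate):
        # Reads only the queried column, never whole rows.
        column = self.columns.get(dimension, {})
        return [row_id for row_id, value in column.items() if predicate(value)]


store = ColumnStore()
store.append({"service": "api", "status": 200})
store.append({"service": "api", "status": 500, "build_id": "abc123"})  # new dimension, no migration
store.append({"service": "api", "status": 200})  # build_id stops again, also fine

errors = store.where("status", lambda s: s >= 500)
with_build = store.where("build_id", lambda b: True)
```

Here `errors` and `with_build` both come back as `[1]`, the one row that had a 500 and a `build_id`. Nobody declared `build_id` up front, and nobody had to clean it up afterwards.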

See what I mean? All these "fairly obvious basics" are actually GHASTLY HACKS that only ever existed to work around hardware scarcity.

The reason nobody can learn them is they make no sense in today's world. Don't blame the player, blame the dumb as fuck game.

This happens all the time -- people say "well I don't really NEED observability until I get much bigger and have harder problems, right?" Because they think it's harder than traditional monitoring.

It's partly my fault, because I emphasize the necessity of it at scale.

It's usually good sales strategy to focus on intense pain. But I wonder if this has caused us to bury the equally pressing message:

"observability is better and easier for everyone, and it ✨makes you a better engineer✨ by putting you in constant conversation with your code."

There are two good reasons to listen to me on this topic, btw.

1) you're interested in using honeycomb, and want to understand how/why we are different

2) you're a software engineer with an eye to the future, wanting to know how to unfuck your future self.

I take all comers. What I'm saying is, this advice may sound honeycomb-specific, but it's just a general best practice.

Other vendors have been copying our messaging and technical decisions for 2+ years; they're gonna get to this one too. 😉 It's ok, we're just right.

