Charity Majors @mipsytipsy CTO @honeycombio, ex-Parse, Facebook, Linden Lab; cowrote Database Reliability Engineering; loves whiskey, rainbows. I test in production and so do you. 🌈🖤 Aug. 02, 2019

Holy shit, this has been a banner week for logs. Check this out; it's an interesting take, almost a bridge between old and new.

The Stripe way is essentially the Honeycomb way, except that they emit lines willy-nilly and print the context out *every time*, or enough of it that you can stitch it all back together post hoc.
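To make the "stitch it back together post hoc" idea concrete, here is a minimal Python sketch. It assumes logfmt-style `key=value` lines and uses `request_id` as the join key; the log lines, field names, and the `parse`/`stitch` helpers are all hypothetical, not Stripe's actual format or tooling.

```python
import re
from collections import defaultdict

# Hypothetical logfmt-style lines: every line repeats some context
# (request_id, user_id) so the lines can be stitched back together later.
LINES = [
    'request_id=req_1 user_id=u_42 msg="request started" endpoint=/payments',
    'request_id=req_2 user_id=u_7 msg="request started" endpoint=/login',
    'request_id=req_1 user_id=u_42 msg="charge created" shard=db3',
    'request_id=req_1 user_id=u_42 msg="request finished" status=500',
]

# A key, "=", then either a double-quoted value or a run of non-spaces.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')

def parse(line):
    """Parse one key=value line into a dict, stripping quotes from values."""
    return {k: v.strip('"') for k, v in PAIR.findall(line)}

def stitch(lines, key="request_id"):
    """Merge all lines sharing the same key into one event (later values win)."""
    events = defaultdict(dict)
    for line in lines:
        fields = parse(line)
        fields.pop("msg", None)  # drop per-line messages, keep the dimensions
        events[fields[key]].update(fields)
    return dict(events)

events = stitch(LINES)
print(events["req_1"])
# {'request_id': 'req_1', 'user_id': 'u_42', 'endpoint': '/payments', 'shard': 'db3', 'status': '500'}
```

Note that `request_id` and `user_id` repeat on every line: that redundancy is what makes the merge possible, and also what makes the approach wasteful.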

Things I like:

* conceptually friendly to ~everybody
* doesn't depend on your code exiting or erroring cleanly
* helps you generate traces from ordinary events (yay)
* lets you do the poor man's debugger thing (ehh... but rounding up)
* great for grepping

Things I don't:

* wasteful: you're spewing a ton of redundant data constantly
* this makes it subject to all the same terrific degradation scenarios as Logs Classic
* the oops-did-I-DDoS-myself-again moments
* unfriendly to graphical exploration in aggregate
* unfriendly to the APM model

Let me explain those last two in further detail. People have been asking me if this approach qualifies as delivering observability, and I have been answering, hesitantly, "yeeeeessss..." -- with asterisks.

I want to be encouraging! and inclusive! This hits most of the points!

It certainly hits more of the points than New Relic or Datadog or any other billion-dollar company with "observability" on their website and marketing material, so PROPS.

* structured data
* raw requests
* read-time aggregation
* first-person narrative

🌟🎉🌈

I have some questions about how you're going to store and query this data, though. Are there schemas, or indexes, or anything that places constraints on or forces you to choose up front what to capture or how you query it? And is there any post-processing to reconstitute events?

If there is no post-processing, then you can't actually correlate arbitrary dimensions with each other, which is a HUGE part of debugging and finding unknown unknowns.

For example, maybe you are looking for a problem that (unbeknownst to you) only turns up on iOS 10, using Chrome with a particular extension, if the user account was created last week, and they are hitting the /payments endpoint with a particular header, or on a particular shard. You get the point.
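That kind of question is only answerable if every one of those dimensions lives on the same event. A minimal Python sketch, assuming each request has already been flattened into one dict; the field names, values, and the `where`/`error_rate` helpers are hypothetical, made up for illustration:

```python
# Hypothetical wide events: one dict per request, with every dimension attached.
EVENTS = [
    {"os": "ios10", "browser": "chrome", "extension": "adblock",
     "account_age_days": 3, "endpoint": "/payments", "shard": "db3", "status": 500},
    {"os": "ios10", "browser": "chrome", "extension": "adblock",
     "account_age_days": 30, "endpoint": "/payments", "shard": "db3", "status": 200},
    {"os": "android", "browser": "chrome", "extension": None,
     "account_age_days": 2, "endpoint": "/payments", "shard": "db3", "status": 200},
    {"os": "ios10", "browser": "safari", "extension": None,
     "account_age_days": 5, "endpoint": "/payments", "shard": "db1", "status": 200},
]

def where(events, **preds):
    """Keep events whose named fields all exist and satisfy their predicates."""
    return [e for e in events
            if all(k in e and pred(e[k]) for k, pred in preds.items())]

def error_rate(events):
    """Fraction of events with a 5xx status (0.0 for an empty list)."""
    if not events:
        return 0.0
    return sum(e["status"] >= 500 for e in events) / len(events)

# Combine arbitrary dimensions in one query, decided at read time.
suspects = where(
    EVENTS,
    os=lambda v: v == "ios10",
    browser=lambda v: v == "chrome",
    account_age_days=lambda v: v <= 7,  # "created last week"
    endpoint=lambda v: v == "/payments",
)
print(len(suspects), error_rate(suspects), error_rate(EVENTS))  # 1 1.0 0.25
```

If each log line carried only `request_id` and `user_id`, the same question would need a join, or a manual hop, for every extra dimension.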

Now, Stripe is logging out SOME context each time: request ID, user ID, etc.

But never ALL of the context. Which means you can't connect the dots -- unless you already know what to look for and make that extra hop by hand.

Which, as we've discussed, is the ultimate arbiter of whether or not something is observability. This does not qualify.

However, it's damn close, and all you need to get it over the edge is something like @cribl_io to pull your log spew together into coherent events.

...And then a columnar store and an exploratory UI. Hrm. I hadn't really thought of it this way, but I guess o11y demands a GUI.

Admitting this brings me sadness. There's no doubt you can get a lot of goodness out of my beloved sed and awk, but... it is much harder to explore trends in aggregate and to look for outliers and the unexpected.

Great for crunching known unknowns. Less great for o11y.

Some of the sweetest hours of my life were spent teasing insights and outliers out of terabytes of logs on the command line. God I loved that shit.

... But I could've found those answers in a minute or two, if I'd had Honeycomb. 😔

