Charity Majors @mipsytipsy CTO @honeycombio, ex-Parse, Facebook, Linden Lab; cowrote Database Reliability Engineering; loves whiskey, rainbows. I test in production and so do you. 🌈🖤 Jul. 18, 2019 4 min read

1) impossible
2) if it were possible, you would never pay to collect and store it all
3) even if you could pay, "logging" only solves the easy part. Now you get to solve "finding all the right bits of data, at the right time, in a timely manner". 🌷☺️

Here's an old blog post I wrote on some of the problems with logging and a logging mindset:

 https://www.honeycomb.io/blog/lies-my-parents-told-me-about-logs/ 

You may have noticed that honeycomb doesn't refer to "logs" or "logging" as a thing that we do.

Even though you could totally think of honeycomb events as structured logs, or stream your structured logs into honeycomb just fine. So why not call it logging?

Well, "logs" and "logging" have a lot of baggage. When you hear the term "log" you tend to assume, until proven otherwise:

* unstructured strings
* spewing to local disk
* emitted willy-nilly thru the execution path, each line containing just a few nouns of detail

Honeycomb data is the exact opposite of all of these: highly structured, streamed over the network directly from your code, and *extremely* wide and dense, with hundreds of nouns in a single emittance.
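To make that contrast concrete, here's a tiny hypothetical sketch (field names and values invented for illustration, and trimmed way down) of the same request as a handful of unstructured log lines versus one wide, structured event:

```python
import json

# Three typical unstructured log lines about one request:
print("INFO  starting checkout for user 42")
print("WARN  payment retry #2, provider=stripe")
print("INFO  checkout done in 1840ms")

# The same request as a single wide, structured event
# (field names are made up for illustration):
event = {
    "request.endpoint": "/checkout",
    "user.id": 42,
    "payment.provider": "stripe",
    "payment.retries": 2,
    "duration_ms": 1840,
    "app.version": "4.2.1",
    # ...plus hundreds more fields gathered along the execution path
}
print(json.dumps(event))
```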

And honey libs emit only one event per request per service, containing *all* the context amassed by that request as it executed, emitted right before the request exits or errors.
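Roughly, that pattern looks like this. A minimal sketch assuming a generic per-request middleware; the Event class and field names here are invented for illustration, not the actual honeycomb library API:

```python
import time


class Event:
    """A single wide event: one dict of fields per request per service."""

    def __init__(self):
        self.fields = {}

    def add_field(self, name, value):
        self.fields[name] = value

    def send(self):
        # In practice this would stream over the network to your event store.
        print(self.fields)


def handle_request(request, handler):
    ev = Event()
    ev.add_field("request.path", request["path"])
    ev.add_field("request.user_id", request.get("user_id"))
    start = time.time()
    try:
        response = handler(request, ev)  # handler keeps adding fields as it works
        ev.add_field("response.status", response["status"])
        return response
    except Exception as exc:
        ev.add_field("error", str(exc))
        raise
    finally:
        ev.add_field("duration_ms", (time.time() - start) * 1000)
        ev.send()  # exactly one emittance per request, at exit or on error


if __name__ == "__main__":
    def checkout(req, ev):
        ev.add_field("payment.retries", 2)  # context gathered along the way
        return {"status": 200}

    handle_request({"path": "/checkout", "user_id": 42}, checkout)
```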

It's not unthinkable to call what honeycomb does "logging". But if the reality contradicts every assumption a term carries, maybe it's not the right term.

(Oops, looks like I got interrupted mid thread again. 🙄)

Anyway, I forget where I was going with this, so I'll just say that logs (unstructured strings, flushed to disk) ought to die in a fire with all the rest of the '80s tech innovations. They are wasteful and messy, and leaning on them heavily for your telemetry will make you a worse developer over time.

A better mental model preserves request context (honeycomb, that @lyddonb talk I'm always citing, etc). It also gives everything its proper place: counters and gauges belong to metrics, and stepping thru execution belongs to a debugger.

Logs jumble up these categories into weak, spammy little niblets: a fragment of context, a counter value, a bit of verbose output around some piece of logic.

This trains you to debug by intuition + by searching for a string you already know to exist.

This rewards cargo culting and pattern matching, not debugging. They may have done okay for you in a world of known unknowns, but they will not serve you in a world where most scenarios you encounter are novel.

(Microservices are often a tipping point.)

Debugging complex systems requires a very different motion than debugging a system whose patterns you are familiar with.

Instead of grepping for something that gets you within spitting distance of the problem, you... can't do that, because you have no idea what the problem is.

All you know is that there's a spike in errors or latency to some other service, and your job is to figure out why, and whether it's your team's responsibility or another's, or a user's (or users').

So you have to start at the top, with this rando spike, and trace it down to the raw requests. Are all of the errors (there's a rough sketch of this breakdown after the list):

* tied to a particular user?
* coming from or going to a single IP?
* from a particular client, or client version?
* handled by an older app version?
* in an unhandled language type?
* hitting a shard where a migration hasn't run yet?
* any combination or subset of the above?
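Here's that breakdown motion as a rough sketch, assuming your errored requests are wide, structured events like the ones above (sample data and field names invented for illustration):

```python
from collections import Counter

# Hypothetical sample: the wide events behind the error spike,
# trimmed to a few fields for illustration.
errors = [
    {"user.id": 42, "client.version": "3.1.0", "db.shard": "s7"},
    {"user.id": 7,  "client.version": "3.1.0", "db.shard": "s7"},
    {"user.id": 99, "client.version": "2.9.4", "db.shard": "s7"},
]


def breakdown(events, field):
    """Group errored requests by one (possibly high-cardinality) field."""
    return Counter(e.get(field) for e in events).most_common(5)


for field in ["user.id", "client.version", "db.shard"]:
    print(field, breakdown(errors, field))

# If most of the errors share one value (here: db.shard "s7"), that's your
# bread crumb -- drill into that subset and repeat the breakdown.
```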

(and those are the easy ones. Ask me about the time Parse Push went down for Eastern Europe, thanks to a combination of ASGs, UDP packet size overflow, and this one router in Poland that didn't fail over DNS to TCP correctly.)

In distributed systems, that's called a "Wednesday".

If you just keep grepping for patterns you've seen before, if you just keep flipping through dashboards carefully crafted to find your last problems, you may very well never find the answer before someone else's blind fumbling accidentally restores order.

Starting high level and repeatedly drilling down, following the bread crumbs of clues, isn't that hard. It's far easier than what you're trying to do now. ☺️

But it is a sequence of high-cardinality queries, and your tools haven't supported that. You have been papering over the gap by copy-pasting IDs from monitoring tools into logs into tracing tools. Good for you. Thanks for keeping the world alive. 🙏🌷

Now think how much time you could reclaim to use on better things if you used @honeycombio. ☺️  http://honeycomb.io/signup 

... how is it that I have not yet printed up a sticker that says,

"FUCK LOGS"

??!!? Or better yet..

"Fuck strings. Fuck debugging. Fuck a job. Fuck a career. Fuck an indecipherable amorphous mess of remnants of once-proud thoughts. FUCK LOGS"

