Charity Majors @mipsytipsy: CTO @honeycombio, ex-Parse, Facebook, Linden Lab; cowrote Database Reliability Engineering; loves whiskey, rainbows. I test in production and so do you. 🌈🖤
Jan. 30, 2019

Another day, another article written about the @honeycombio-shaped hole in the world of operational tooling, without -- somehow -- ever mentioning honeycomb. 🤨  https://www.sumologic.com/blog/business-insights/kubernetes-monitoring-kubecon/ 

I am done grinding my teeth (didn't take long, loads of practice) and will instead recap it for funsies.

First off, we marvel at the growth of KubeCon and nod to its intense (lol) complexity. Traditional monitoring won't work, this sounds like a job for OBSERVABILITYMAN 🦸‍♂️!!

Then the article (that doesn't mention us) links to a terrific talk (that doesn't mention us), on why the old school "three pillars" model for o11y is fatally flawed.

It's a great talk, you should read the slides.  https://schd.ws/hosted_files/kccna18/bf/Three%20Pillars%20with%20Zero%20Answers%20-%20A%20New%20Observability%20Scorecard%20%28Kubecon%20Seattle%202018%29.pdf 

Then we mention ballooning costs (yup, esp since you are paying half a dozen vendors to understand the same events slightly differently) and that tracing is a life saver in distributed systems like k8s (yup).

Now brace yourself for the greatest non sequitur leap of the new year:

... Prometheus is king! Winning everywhere!

"But wait," I hear you asking. "Is Prometheus going to help with any of those problems, of complexity and end to end tracing and the request path? Does Prometheus even do observability? Isn't it metrics and preaggregated dashboards?"

You, dear reader, have been paying attention. If you buy the highly technical, control theory-derived definition of observability that I do, then tools based on metrics (the technical definition: a number with appended tags) will never be observability tools.

Why? Because they have torn up and discarded all the connective tissue of the event before they ever write to disk.

That connective tissue is exactly what you needed to reason about the internal workings of your system as a developer. It tracks your user experience too.
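To make the "torn up connective tissue" point concrete, here is a minimal sketch, not anyone's real pipeline. The event fields (request_id, user_id, endpoint, status, duration_ms) and both helper functions are made up for illustration. It rolls a few request events up into tagged counters, then asks a question that only the raw events can answer.

```python
from collections import Counter

# Hypothetical request events; every field name here is an assumption.
events = [
    {"request_id": "r1", "user_id": "alice", "endpoint": "/export", "status": 200, "duration_ms": 40},
    {"request_id": "r2", "user_id": "bob", "endpoint": "/export", "status": 200, "duration_ms": 2300},
    {"request_id": "r3", "user_id": "bob", "endpoint": "/export", "status": 500, "duration_ms": 2500},
    {"request_id": "r4", "user_id": "carol", "endpoint": "/login", "status": 200, "duration_ms": 35},
]


def to_metrics(events):
    """Pre-aggregate events into counters tagged by (endpoint, status).

    Everything not chosen as a tag (user_id, request_id, duration_ms)
    is discarded at this point and cannot be recovered later.
    """
    counts = Counter()
    for event in events:
        counts[(event["endpoint"], event["status"])] += 1
    return counts


def slow_users(events, threshold_ms):
    """Answer a question that needs the whole event: who saw slow requests?"""
    return sorted({e["user_id"] for e in events if e["duration_ms"] > threshold_ms})


metrics = to_metrics(events)
print(metrics[("/export", 200)])  # a number with tags, nothing more
print(slow_users(events, 1000))   # only answerable from the raw events
```

The counters can say how many /export requests returned 200. They cannot say which user was slow, because that field was thrown away at write time.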

I am not saying metrics/aggregate tools have no value! They can have loads of value. Primarily for ops use cases, like provisioning and system health.

Developers don't give a shit about system health. They care about the health of *each individual request*. Events.

This is the most prominent difference between monitoring (ops uses, aggregates) and observability (dev use cases, unique events).

I mean, ops doesn't give a shit about each and every request either, as long as the system is healthy and errors stay below SLOs.
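A rough sketch of how those two views can disagree, with made-up numbers: the 1% error-rate SLO, the user names, and the traffic mix are all assumptions for illustration. The aggregate looks healthy while one user fails on every single request.

```python
from collections import defaultdict

# Hypothetical SLO: at most 1% of requests may fail.
SLO_MAX_ERROR_RATE = 0.01


def make_events():
    """Build 1000 fake requests: 995 successes across 50 users, 5 failures for one user."""
    events = [{"user_id": f"user{i % 50}", "status": 200} for i in range(995)]
    events += [{"user_id": "unlucky", "status": 500} for _ in range(5)]
    return events


def overall_error_rate(events):
    """The ops view: one aggregate number for the whole system."""
    errors = sum(1 for e in events if e["status"] >= 500)
    return errors / len(events)


def error_rate_by_user(events):
    """The dev view: break the same events down by a high-cardinality field."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for e in events:
        totals[e["user_id"]] += 1
        if e["status"] >= 500:
            errors[e["user_id"]] += 1
    return {user: errors[user] / totals[user] for user in totals}


events = make_events()
system_rate = overall_error_rate(events)
per_user = error_rate_by_user(events)

print(system_rate <= SLO_MAX_ERROR_RATE)  # system is within SLO
print(per_user["unlucky"])                # this user's requests all failed
```

Both answers are true at once, which is the whole point: the dashboard is green and one person is having a terrible day.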

Prometheus is a monitoring tool. It is a *great* one. It represents the peak of time series dashboard software in the wild.

My problem with it is that they claim it's more than that. Which leads to a very bad experience for users with more honeycomb-ish shaped problems.

Case in point.
