Charity Majors @mipsytipsy CTO @honeycombio, ex-Parse, Facebook, Linden Lab; cowrote Database Reliability Engineering; loves whiskey, rainbows. I test in production and so do you. 🌈🖤 May 21, 2019

this is a beaut of a post applying observability principles to the nexus of software and physical objects: robotics. two key grafs:

1) "Debugging in robotics is often highly visual."

i would extend this argument: debugging is highly visual for everyone -- _observability_ is highly visual.

i say this as someone who struggles heavily with visualizations; i can whip out a multi-line sed/awk query in a smidge of the time it takes me to construct a graph.

but systems people, this is not a good thing. our reliance on textual processing is a crutch.

the cli is good at helping us find the things we already know, the patterns we are already looking for. it is shit for unknown-unknowns, and that means it's shit for observability.

2) "sensor data tells part of story, but equally impt is the metadata. Observability for robotics must support high-cardinality dimensions (software version, hw version, experiment ID, cust ID, site ID, etc) and first-class support for one of the most important dimensions: time"

as the article goes on to describe, observability can be used equally to debug failures in the system itself, or the application running on the system, or combinations/interactions between the two.
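a rough sketch of what "high-cardinality dimensions plus time" can look like in practice, assuming each robot event is just a dict of metadata. every field name and value below (`sw_version`, `site_id`, `ok`, and so on) and the helper `failure_rate_by` are made up for illustration and do not come from the article:

```python
from collections import defaultdict

# hypothetical robot events; all field names and values are invented for this sketch
events = [
    {"ts": 1.0, "sw_version": "1.4.2", "hw_version": "rev-b", "site_id": "sfo-3", "ok": True},
    {"ts": 2.0, "sw_version": "1.4.3", "hw_version": "rev-b", "site_id": "sfo-3", "ok": False},
    {"ts": 3.0, "sw_version": "1.4.3", "hw_version": "rev-c", "site_id": "osl-1", "ok": True},
    {"ts": 4.0, "sw_version": "1.4.3", "hw_version": "rev-b", "site_id": "osl-1", "ok": False},
]


def failure_rate_by(events, *dims, since=None, until=None):
    """Group events by any combination of dimensions inside an optional
    time window and return the failure rate for each group."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for event in events:
        # time is treated as a first-class filter, not an afterthought
        if since is not None and event["ts"] < since:
            continue
        if until is not None and event["ts"] > until:
            continue
        key = tuple(event[d] for d in dims)
        totals[key] += 1
        if not event["ok"]:
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}


# slice by one dimension, then by a combination nobody planned for in advance
by_sw = failure_rate_by(events, "sw_version")
by_sw_and_hw = failure_rate_by(events, "sw_version", "hw_version")
recent_by_site = failure_rate_by(events, "site_id", since=3.0)
```

the point of the sketch is that the grouping keys are chosen at query time, so the same events can answer a question about software version, hardware revision, site, or any mix of them.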

think on this a sec.

one of the ways you can look at the history of the space is that there's long been a divide between tools we use to debug and understand applications (APM, gdb, IDEs etc) and tools we use to debug and understand the systems they run on (monitoring, metrics).

there are many reasons for this, but the big ones are obviously people and money.

dev and ops were extremely specialized teams, with wildly different priorities and cultures. hardware was fucking expensive, so we were incentivized to optimize for a narrower/cheaper use case.

all of the reasons that led to monitoring/logs/APM category definitions are disintegrating.

now we know that developing code and running it in prod are twinned phases of a single lifecycle; strict roles make bad software. and storage is cheap to the point of disposable.

microservices thrust more operability concerns directly under the noses of developers, quite rudely, and the accelerating level of complexity is making it impossible to understand systems in any way *but* observability.

which, as the article helpfully recaps, consists of pairing "what happened" with "what was the full context of the world when that happened", aka my arbitrarily-wide structured data blobs, or summary events (thanks @noahsussman 😑)
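a minimal sketch of one such arbitrarily-wide event, assuming one event per unit of work that collects context as it goes and is emitted once at the end. the `WideEvent` class, its methods, and the field names are hypothetical, not any particular vendor's API:

```python
import json
import time


class WideEvent:
    """One structured blob per unit of work: "what happened" plus as much
    surrounding context as the code can attach along the way."""

    def __init__(self, name, clock=time.time):
        # clock is injectable so the sketch can be exercised deterministically
        self._clock = clock
        self._start = clock()
        self.fields = {"name": name, "timestamp": self._start}

    def add(self, **context):
        # any key is allowed; the event is as wide as the context you have
        self.fields.update(context)

    def finish(self, error=None):
        # record the outcome and serialize the whole blob as one JSON line
        self.fields["duration_ms"] = round((self._clock() - self._start) * 1000, 3)
        self.fields["error"] = None if error is None else repr(error)
        return json.dumps(self.fields, sort_keys=True)


# usage: a fake clock stands in for real time
ticks = iter([100.0, 100.25])
event = WideEvent("pick_item", clock=lambda: next(ticks))
event.add(robot_id="r-17", sw_version="1.4.3", site_id="sfo-3")
event.add(experiment_id="exp-42")
line = event.finish(error=ValueError("grip slipped"))
```

each call to `add` widens the same event instead of writing a separate log line, so the failure and its full context end up in one record.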

oops, weird, they deleted the tweet, but here it is again:
