Charity Majors @mipsytipsy CTO @honeycombio, ex-Parse, Facebook, Linden Lab; cowrote Database Reliability Engineering; loves whiskey, rainbows. I test in production and so do you. 🌈🖤 May 5, 2019

Free o11y tip of the day:

If you use nginx or similar at your edge, wrap all calls out to other services and dbs with a function that logs duration to a header. Then you can swiftly identify *any* source of latency or errors using only nginx/access.log. https://www.honeycomb.io/blog/tell-me-more-nginx/
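
A minimal sketch of that wrapper, in Python. Everything named here is an assumption for illustration (the `HopTimer` class, the `X-Hop-<Name>-Ms` header scheme, the `db` and `auth` hops); the thread does not prescribe an implementation, and the `time.sleep` calls stand in for real database and service calls.

```python
import time
from contextlib import contextmanager


class HopTimer:
    """Accumulates per-dependency durations for one request (hypothetical helper)."""

    def __init__(self):
        self.durations_ms = {}

    @contextmanager
    def hop(self, name):
        # Time the wrapped call; the finally clause records it even if the call raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.durations_ms[name] = self.durations_ms.get(name, 0.0) + elapsed_ms

    def headers(self):
        # One response header per dependency, e.g. "X-Hop-Db-Ms: 12.3".
        return {
            f"X-Hop-{name.title()}-Ms": f"{ms:.1f}"
            for name, ms in self.durations_ms.items()
        }


def handle_request(timer):
    # Stand-ins for real calls out to a database and another service.
    with timer.hop("db"):
        time.sleep(0.01)
    with timer.hop("auth"):
        time.sleep(0.005)
    return timer.headers()
```

The app attaches `timer.headers()` to its response. When nginx is proxying, upstream response headers are available to `log_format` as `$upstream_http_<name>` variables (lowercased, dashes turned into underscores), so the header above would show up as `$upstream_http_x_hop_db_ms`.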

You emit all the header variables containing service hop times in your log, which serves as a shockingly effective poor man's distributed tracing/service map.
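
To make the "poor man's tracing" idea concrete, here is a rough sketch that turns one access-log line into a per-hop breakdown. The `key=value` log layout and the field names (`db_ms`, `auth_ms`, `cache_ms`) are assumptions; your `log_format` will differ. `request_time` is treated as seconds, which is how nginx logs `$request_time`.

```python
def parse_line(line):
    """Split a 'key=value key=value' log line into a dict of strings."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def hop_breakdown(fields):
    """Return per-hop milliseconds plus the time not covered by any hop."""
    total_ms = float(fields["request_time"]) * 1000.0  # seconds -> milliseconds
    hops = {}
    for key, value in fields.items():
        # nginx writes "-" for a variable that has no value, so skip those.
        if key.endswith("_ms") and value != "-":
            hops[key[:-3]] = float(value)
    hops["unaccounted"] = total_ms - sum(hops.values())
    return hops


# Hypothetical log line for a single request.
line = "status=200 request_time=0.250 db_ms=120.0 auth_ms=30.0 cache_ms=-"
breakdown = hop_breakdown(parse_line(line))
```

The `unaccounted` bucket is the useful part: a large value means time was spent somewhere you haven't wrapped yet.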

You'll want to feed it into a tool that lets you slice & dice; you can set this up on @honeycombio free tier in <15 min.

The hardest part of distributed systems is usually not debugging the code; it's finding where in your system the code you need to debug is.

How many hours have you wasted trying to find where the errors are coming from, or debugging the wrong service?

At Parse, we used the nginx trick with FB Scuba; it dropped our time-to-find-what-to-debug from hours to seconds, reliably.

(At Honeycomb, of course, we don't have this problem 😉)

It's actually better than tracing, in some ways. It finds problems like "service y is 25% slower across the board", or "service u is slower, but only for requests that issue db read queries to one specific replica in shard z (or writes to all shards with primary in region x)".
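
That kind of question is a group-by over parsed log records. A small sketch, assuming each request has already been parsed into a dict and that you log which replica served it; the `db_replica` and `db_ms` field names and the sample values are made up for illustration.

```python
from collections import defaultdict
from statistics import median


def median_by(records, group_key, metric):
    """Median of `metric` for each distinct value of `group_key`."""
    groups = defaultdict(list)
    for rec in records:
        # Skip requests that never touched this dependency.
        if group_key in rec and metric in rec:
            groups[rec[group_key]].append(float(rec[metric]))
    return {group: median(values) for group, values in groups.items()}


# Made-up records: one dict per request, values as strings straight from the log.
records = [
    {"db_replica": "z-1", "db_ms": "12"},
    {"db_replica": "z-1", "db_ms": "14"},
    {"db_replica": "z-2", "db_ms": "95"},
    {"db_replica": "z-2", "db_ms": "110"},
    {"db_replica": "z-3", "db_ms": "11"},
    {"status": "200"},  # no db call on this request
]

by_replica = median_by(records, "db_replica", "db_ms")
slowest = max(by_replica, key=by_replica.get)
```

Swap `db_replica` for region, shard, endpoint, or build ID and the same function answers the other variants. A real tool does this interactively over far more data, but the shape of the query is the same.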

