there is no substitute for *actually looking at production graphs*. none.
you cannot rely on paging alerts to find bugs for you; the overwhelming majority of bugs will never trip a pager threshold, and that is right and proper.
you can find those bugs the hard way (waiting for a quorum of users to complain, for the bugs to be triaged, and for them to slowly wend their way back to you over the course of months)
or the easy way (instrument as you go, look at your graphs often, and follow your nose if something seems off).
the point is not to make your paging alerts so gosh darn sensitive that they are constantly going off and generating false positives.
the point is to align on-call pain with real user pain (ideally via SLOs), and to find your bugs in other, more sustainable ways.
it's like if we expected people to go to the emergency room for every scrape and bump and general fitness exams, instead of proactively investing in their health with a primary care physician, can you imagine how stupid and wasteful that would be ha ha h-ohh wait
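to make "align on-call pain with real user pain (ideally via SLOs)" a bit more concrete, here is a minimal sketch of a burn-rate check: page only when users are burning through the error budget fast, not on every blip. the function names, the 99.9% target, and the 14.4 threshold are illustrative assumptions, not anything prescribed in this thread.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """How fast the error budget is being spent; about 1.0 means exactly on budget."""
    if total_events == 0:
        return 0.0  # no traffic in the window, so nothing was burned
    error_budget = 1.0 - slo_target  # allowed fraction of bad events
    return (bad_events / total_events) / error_budget


def should_page(bad_events: int, total_events: int,
                slo_target: float = 0.999,    # assumed SLO target
                threshold: float = 14.4) -> bool:  # assumed fast-burn threshold
    """Page a human only when the window's burn rate reaches the threshold."""
    return burn_rate(bad_events, total_events, slo_target) >= threshold
```

with these assumed numbers, 1 failed request out of 1,000 does not page, while 20 out of 1,000 does. everything below the threshold is what you go find by looking at your graphs.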
You can follow @mipsytipsy.