I try not to make this newsletter about current events, but it’s hard to avoid some of the lessons learned from this week. The first sentence in the last paragraph says so much (emphasis mine):
We’ve done extensive work hardening our systems to prevent unauthorized access, and it was **interesting** to see how that hardening slowed us down as we tried to recover from an outage caused not by malicious activity, but an error of our own making.
I think the lesson here may be that once your application and organization reach a sufficient size, the only way to properly test failures is through practices like chaos engineering. Unknown unknowns are the things you can’t plan for.