Lorin is blogging

In the last week, Lorin Hochstein has published five new posts on his blog. What makes this particularly exciting is that, as far as I know, Lorin is one of the few full-time resilience practitioners working in software. He is a member of Netflix’s Cloud Operations and Reliability Engineering (CORE) team. Surprisingly, not much has been written about CORE. Perhaps the most insightful text is their Senior Resilience Engineering Advocate job post:

The goal of the education and outreach function of the SRE team is to improve reliability and resilience of Netflix services by focusing on the people within the company, since it’s the normal, everyday work of Netflix employees that creates our availability. To cultivate operational excellence, we reveal risks, identify opportunities, and facilitate the transfer of skills and expertise of our staff by sharing experiences.

Following an operational surprise, we seek to understand what the world looked like from the perspective of the people involved. We facilitate interviews, analyze joint activity, and produce artifacts like written narrative documents. Relationship building is a huge part of this role. Someone advocating for resilience engineering within Netflix will help stakeholders realize when this type of work is most effective.

We Think About

Netflix as a socio-technical system is formed from the interaction of people and software. This system has many components and is constantly undergoing change. Unforeseen interactions are common and operational surprises arise from perfect storms of events.

Surprises over incidents and recovery more than prevention. We encourage highlighting good catches, the things that help make us better, and the capacity we develop to successfully minimize the consequences of encountering inevitable failure. A holistic view of our work involves paying attention to how we are confronted with surprises every day and the actions we take to cope with them.

This is not the sort of language you typically see in the job description for an SRE! Lorin’s colleague Ryan Kitchens also gave a talk at SREcon19 that touches on CORE’s approach.

Back to those blog posts

Many of the topics Lorin has written on in this flurry of posts have been covered before by folks like Allspaw, Cook, Dekker, or Woods. Lorin’s writing is a great counterpoint to these voices and, I think, well suited to those new to resilience. He writes plainly and smoothly, with the perspective of someone who has been in the trenches doing the work for some time.

Lorin’s recent blog posts cover:

I look forward to more posts from Lorin, even if he ends up slowing down a bit. Lorin is also not the only resilience practitioner blogging, nor do they all work at Netflix. I hope to highlight others’ great work in future blog posts.
