Quick thoughts on short papers: A typology of organisational cultures

In this six-page paper from 2004 (based on work dating back to 1988), R. Westrum proposes that organizational cultures approach information flow in one of three ways:

The first is a preoccupation with personal power, needs, and glory. The second is a preoccupation with rules, positions, and departmental turf. The third is a concentration on the mission itself, as opposed to a concentration on persons or positions. I call these, respectively, pathological, bureaucratic, and generative patterns.

Westrum provides a superb table of examples. This table definitely spoke to me the first time I read the paper:

Pathological (power-oriented) | Bureaucratic (rule-oriented) | Generative (performance-oriented)
Low cooperation | Modest cooperation | High cooperation
Messengers “shot” | Messengers neglected | Messengers trained
Responsibilities shirked | Narrow responsibilities | Risks are shared
Bridging discouraged | Bridging tolerated | Bridging encouraged
Failure leads to scapegoating | Failure leads to justice | Failure leads to inquiry
Novelty crushed | Novelty leads to problems | Novelty implemented

This typology has been widely adopted in the DevOps literature. Accelerate reports that generative culture predicts (better) software delivery performance, job satisfaction, and organizational performance (p. 209).

Beyond the typology itself, I appreciate the clear discussion of how leadership’s preferences influence culture:

The underlying idea is that leaders, by their preoccupations, shape a unit’s culture. Through their symbolic actions, as well as rewards and punishments, leaders communicate what they feel is important. These preferences then become the preoccupation of the organisation’s workforce, because rewards, punishments, and resources follow the leader’s preferences. Those who align with the preferences will be rewarded, and those who do not will be set aside. Most long-time organisation members instinctively know how to read the signs of the times and those who do not soon get expensive lessons.

A decade later, Mikey Dickerson suggests that things remain the same in his essay in Seeking SRE:

So, these processes determine the long-term behavior of your company and every system you manage. What do they reward? Ignore what the company says it rewards; instead, look at the list of who was promoted. Behaviors associated with these people will be emulated. Behaviors associated with those left behind will not. This evolutionary pressure will overwhelm any stated intentions of the company leaders.

When all you have is a hammer, everything looks like a nail. Nonetheless, I feel like the most painful organizational tensions I’ve experienced in my career all had cultural misalignment as a strong contributing factor. I find these paragraphs a powerful tool for detecting organizational dysfunction.

Finally, there is discussion of bureaucratic culture being the “default value”. This leads to a line of inquiry too long for a QTSP: what other high-impact defaults exist in (software) companies, and where do they come from? For example, how did the Five Whys make it from the Toyota Production System into seemingly everyone’s postmortem templates?

Thanks to Randall Koutnik for recent discussions on Westrum’s typology!

How Complex Systems Fail Strikes Back

(Poster by Olly Moss)

When I wrote up the QTSP on How Complex Systems Fail two weeks ago, I forgot to include other interesting reviews of the paper.

The first, unsurprisingly, is from John Allspaw in 2009 (before Allspaw coined “blameless postmortem”). Allspaw embraces the paper:

I don’t think I can overstate how right-on this paper is, with respect to the challenges, solutions, observations, and concerns involved with operating a medium to large web infrastructure.

It is interesting to see “early Allspaw’s” view on topics like the 5 Whys:

I believe that even a rudimentary process of “5 Whys” has value. (Update: I did when I first wrote this. Now, I do not.) But at the same time, I also think that there is something in the spirit of this paragraph, which is that there is a danger in standing behind a single underlying cause when there are systemic failures involved.

There are probably many worse ways to spend your time than walking parts of the “Allspaw trail”, even a decade removed.

Six years later, the don of paper blogging, Adrian Colyer of The Morning Paper fame, picks up the mantle:

This is a wonderfully short and easy to read paper looking at how complex systems fail – it’s written by a Doctor (MD) in the context of systems of patient care, but that makes it all the more fun to translate the lessons into complex IT systems, including their human operator components.

I think about Cook’s paper often. Recently I’ve been thinking about #18, failure-free operations require experience with failure. This is seemingly a paradox: we want to reduce failure, yet reducing it requires experience gained from failure. Where does this experience come from once failure becomes rare?

Some interesting answers might be learning-focused postmortems, where we can learn from failure indirectly, and chaos engineering experiments, where we can learn from failure under controlled conditions. The “resilience in software” community’s emphasis on these domains starts to make sense…
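
To make the chaos engineering answer concrete, here is a minimal in-process sketch. It is not taken from Cook’s paper or any particular tool, and names like flaky_dependency and resilient_fetch are hypothetical: the experiment injects failures into a stand-in dependency at a chosen rate and checks whether a retrying client keeps its steady-state metric near baseline.

```python
import random

# Hypothetical sketch of a chaos experiment, run entirely in-process.
# Real experiments target live systems; a fake dependency stands in here.

def flaky_dependency(failure_rate: float) -> str:
    """Stands in for a downstream service; fails at the injected rate."""
    if random.random() < failure_rate:
        raise ConnectionError("injected failure")
    return "fresh"

def resilient_fetch(failure_rate: float, attempts: int = 3) -> str:
    """Behavior under test: retry a few times, then serve a stale fallback."""
    for _ in range(attempts):
        try:
            return flaky_dependency(failure_rate)
        except ConnectionError:
            continue
    return "stale"  # degrade rather than fail outright

def fresh_rate(failure_rate: float, requests: int = 100_000) -> float:
    """Steady-state metric: fraction of requests served with fresh data."""
    hits = sum(resilient_fetch(failure_rate) == "fresh" for _ in range(requests))
    return hits / requests

if __name__ == "__main__":
    baseline = fresh_rate(failure_rate=0.0)
    under_chaos = fresh_rate(failure_rate=0.3)
    print(f"baseline fresh rate:    {baseline:.3f}")    # 1.000
    print(f"fresh rate under chaos: {under_chaos:.3f}")  # ~0.973, i.e. 1 - 0.3**3
    # The experiment's hypothesis: the steady state survives injected failure.
    # If this assertion trips, we just gained the "experience with failure"
    # that #18 says failure-free operations require.
    assert under_chaos > 0.95, "steady state violated under injected failure"
```

The toy retry logic is beside the point; the shape of the exercise is what matters: state a steady-state hypothesis, inject failure deliberately, and treat a tripped assertion as cheap experience with failure.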