Before you go on, please have a look at British comedian John Oliver’s video on infrastructure – https://www.youtube.com/watch?v=Wpzvaqypav8.
OK, if you are reading this and still haven’t watched the video … please go back and try again. You can do it.
If you haven’t watched the video by now, then I concede defeat. In short, the video is a humorous take on the state of US infrastructure. Particularly bridges. Bridges have been collapsing with alarming frequency in recent years. And after much political wrangling there is still no plan to pay for fixing crumbling columns, spans and struts. It is not as if the federal and state governments don’t know how bad things are (again … watch the video).
Oliver proposes a perhaps novel reason for all of this.
Perhaps people are so enamored with heroes who resolve emergencies that no one even cares about why the emergencies are happening. Perhaps this is driven by Hollywood. The last part of Oliver’s video is a satirical skit, showing how a movie about politicians, bureaucrats and maintenance staff doing their jobs is … well, boring. Everyone loves a movie where Dwayne Johnson saves the day, particularly if he is missing a leg and is trying to reconnect with his estranged family at the precise moment of said catastrophe.
And the same can be true of organizations. When a crisis arises, heroes emerge. And these heroes (even in an engineering production department) etch a romantic vision in our minds. They become the ‘go-to guys.’ And once the fatigue sets in after the heroes save the day, we forget about why the crisis occurred in the first place.
I call this phenomenon an ‘organizational endorphin.’ An ‘endorphin’ is a hormone that is released into our bloodstream in certain circumstances. Endorphins block pain and cause euphoria. But there is a catch. The things that happen to a human body to spur the release of endorphins are often not good. Think of things like severe injury (losing an arm in a machine), a heart attack or internal bleeding. All bad.
An organization can seriously ‘injure’ itself when it creates an unreliable product. But when the heroes emerge to resolve the problem, sometimes the only thing we remember is the euphoria of solving the problem in a high-pressure situation. The endorphins have done their job. And because we are human, it is easy to forget (or not even realize) that the organization was, and remains, seriously injured.
And as W. Edwards Deming stated:
Stamping out fires is a lot of fun, but it is only putting things back the way they were.
So what does this mean? Well, if all you do is admire the heroes, then it is only a matter of time until the crisis happens again. And the same heroes emerge. And we get used to feeling good about them solving everything. And we don’t focus on not allowing fires to take hold in the first place.
But … heroes (like firefighters) don’t ‘solve’ problems. They stop them from getting worse. Firefighters may put the blaze out in your house, saving lives and your neighbors’ houses in the process. But you still need to rebuild your house.
Organizations can make people blind to this. For example, a new car is due to be launched in March. But its FMECA (failure mode, effects and criticality analysis) was not done well. And because it was not done well, the possibility of a ‘sneak’ circuit wasn’t seriously considered. New, improved airbags were installed that deploy much faster. But in some circumstances the sneak circuit would arm and trigger the airbags if the sound system was left on. And just before launch, these airbags started to deploy when vehicle testers turned off their test cars.
The electronics team (who, coincidentally, were part of the inadequate FMECA group) is now called in. They work around the clock. And after 10 days, they identify the sneak circuit. They add an additional relay, and the problem is resolved. But the launch as planned had to be cancelled. Journalists and VIP guests were told that the vehicle was not ready. One week later, the same guests were re-invited to the revised launch ceremony. But only a fraction could turn up, as their calendars were understandably full.
But this organization then gives an achievement award to the electronics team for their outstanding work in identifying the sneak circuit. And their work (in this context) was outstanding.
Many organizations wander into this paradigm. Others actively encourage it.
For example, consider a piece of software. It was coded by a small team. And it passed all the tests. But three months later, something happens. Who do we call? The original team. And nine times out of ten, we don’t ask them why they didn’t write more robust code in the first place.
Instead, these software engineers get valuable face time with senior management through the act of solving a problem. CEOs and VPs observe them first hand, working feverishly with skill-sets most mortals can only dream about. And this is the lasting image we all have of them.
Even geniuses can be lazy. For those software engineers out there … how many times has software been written by really smart people, but in a way that makes it almost impossible for another engineer to interrogate or debug? There are no comments. No explanatory notes. Variable names mean something to the original author only. We don’t even know what algorithms they are using. There is no naming convention. And so on.
So the only people who can fix the software are the people who created the issue in the first place.
But, as with everything in life, this is not their fault.
It is management’s fault.
One of management’s key responsibilities is to motivate its people. And in an organization that loves firefighting more than anything else, no one is motivated to prevent fires from starting in the first place.
What? How is this possible? Surely every manager wants their team to be proactive and stop fires from happening. Of course they do. They might even implore their workers never to forget the importance of not cutting corners, so as to prevent ‘fires’ in the future. But it is still possible to do this while motivating them to do the complete opposite.
If the same manager who implores their workforce to be proactive in preventing fires then goes on to assess performance in terms of budget and schedule, they are contradicting themselves in practice. There are many organizations where design budget and schedule are the only things that are focused on.
Preventing fires takes some time and resources. This means that preventing fires will have a negative impact on design budget and schedule. And if these are the only things that your manager focuses on, you are going to get yourself in trouble if you try to prevent fires.
As a rule, the time and resources associated with preventing fires are virtually nothing compared to the cost of fighting a fire. Even when the firefighting hero comes in and saves the day, you have to pay him or her (and their team) for their time. And you will almost certainly introduce a schedule delay, which has follow-on effects. The worst-case scenario involves your firefighting team fixing a problem with the product after the customer has received it. The cost to reputation is enduring.
‘But what about Microsoft?’ some might say. Microsoft is famous for having a heavy reliance on its firefighting crew … or its ongoing software support team. If you think organizations like this give you a green light to continue on your merry way, fighting fires without addressing their root cause, you are sadly mistaken.
Microsoft has never forgotten to ‘hate’ fires. It tests its software products extensively before release. But because its products are so complex and have so many users, there needs to be a level of pragmatism regarding how many fires can be prevented. Microsoft has also made sure its firefighting processes are as efficient as possible. Problems with users’ computers are automatically reported to Microsoft every day. And one fire experienced by a user in Norway will (hopefully) quickly result in a software patch that gets deployed across the world, preventing the same fire from happening to a user in Hawaii.
So if you look at your organization and see these legendary figures who resolved issues in your products after they were released … ask yourself if your organization ‘hates’ fires enough. Have a look at what your lead designers value most. Is it budget and schedule? If yes … there is a chance that there is no ‘hatred of fires.’
So what can you do?
Make the case. With dollar figures.
Most of the time, firefighting costs get absorbed into other lines of funding. Call it field service. Call it customer support. Call it whatever you want, but you (the reliability engineer) should try to work out two things:
- How much did it cost to put out the fire?
- How much did the fire damage cost?
These are different buckets of gold. Successfully fighting a fire might mean you still have to rebuild your house. The same applies to your organization. Suppose you have to quickly deploy a team to your customer’s site because their entire infrastructure has gone down after your generator failed to start. This will have a lingering effect. Your team might arrive and find that a communications cable had fallen out of its socket. Simply reinserting it and securing it more firmly may seem to solve the problem.
But you can’t solve the problem of the ongoing damage to your reputation. Is your customer less likely to go with you next time they upgrade their system? Absolutely. And this is a lot of money to lose.
Even making some basic assumptions about the net loss can paint a picture. If, in the scenario above, you estimate (conservatively) that the issue has made your customer 20 per cent less likely to engage your company to provide their next generator, and your generators cost $1 000 000, then the ‘fire’ cost you 20 per cent of $1 000 000 in net potential revenue. That is $200 000.
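The arithmetic above can be written as a small expected-value calculation, which makes it easy to rerun with your own estimates when building the business case. This is a minimal sketch, not a costing method from any standard: the function names and the `firefighting_cost` parameter (the ‘how much did it cost to put out the fire?’ bucket) are hypothetical, and it assumes the 20 per cent is an absolute drop in the probability of winning the next order (20 percentage points), which is what makes 20 per cent of $1 000 000 equal $200 000.

```python
def expected_lost_revenue(probability_drop: float, contract_value: float) -> float:
    """Damage bucket: expected revenue lost because the customer is less likely to return.

    Assumption: probability_drop is an absolute drop in the chance of winning
    the next order (0.20 means 20 percentage points).
    """
    if not 0.0 <= probability_drop <= 1.0:
        raise ValueError("probability_drop must be between 0 and 1")
    return probability_drop * contract_value


def total_fire_cost(firefighting_cost: float, probability_drop: float,
                    contract_value: float) -> float:
    """Cost of putting out the fire plus the expected damage it leaves behind.

    firefighting_cost is a hypothetical input: labour, travel and parts for
    the field service team, however your organization records them.
    """
    return firefighting_cost + expected_lost_revenue(probability_drop, contract_value)


if __name__ == "__main__":
    # The generator example: a 20 per cent drop on a $1 000 000 generator.
    damage = expected_lost_revenue(0.20, 1_000_000)
    print(f"Expected lost revenue: ${damage:,.0f}")  # Expected lost revenue: $200,000
```

Keeping the two buckets as separate inputs mirrors the two questions above: the field service bill is usually easy to find, while the damage figure is the one that tends to go unreported.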
Create a compelling business case that illuminates the true cost of fires. If your management team is competent … they should listen (assuming you are credible, that is).
And if you are a manager, ask yourself what you value more. Really value more. Preventing fires or fighting fires? And then work out how you practically motivate your workers. Don’t kid yourself that imploring your workforce to do something without rewarding them for that behavior (directly or indirectly) will have a lasting effect.
As they say,
a stitch in time saves nine.
I would suggest that in the domain of consumer products and services, the ratio starts at one hundred and then goes up.