The Day the Cement Ball Mill Went Silent

What It Taught Me About Reliability Engineering

It was a Tuesday morning, the kind that starts quietly and then doesn’t. I was on shift at the cement plant when the alarm hit the control room. One moment everything was running within parameters. The next, the ball mill had tripped, and the entire production line was holding its breath.

No dramatic explosion. No single obvious failure. Just silence. The kind of silence in a plant that costs real money by the minute.

I’ve been in reliability engineering long enough to know that the first question is never “what broke?” The right question is always “why did it break here, why now, and why didn’t we see it coming?”

Reliability Is Not a Maintenance Problem. It’s a Thinking Problem.

During my years at Cargill Ghana, working across mechanical and reliability engineering, I observed the same pattern in almost every significant equipment failure. The technical fault (a bearing, a seal, a sensor) was rarely the true root cause. It was the last domino to fall.

Before it, there was a maintenance plan that hadn’t been updated in 18 months. Or a work order raised but never prioritized. Or a trend in the data that someone saw but didn’t act on because the production schedule was tight.

Reliability engineering, done properly, is a discipline of thinking before things go wrong. It’s asking: what could fail, how would it fail, what would it cost, and what would it take to prevent it? This is not abstract. This is the daily work.
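To make those four questions concrete, here is a minimal sketch of how they could be turned into a ranked list. It is not the method we used at the plant. The `FailureMode` class, its field names, and every number in the example are hypothetical and chosen only for illustration; the idea is simply to multiply how often something fails by how long it stops you and what an hour of downtime costs, then compare that against the cost of preventing it.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    # All fields are illustrative assumptions, not plant data.
    component: str                   # what could fail
    mode: str                        # how it would fail
    failures_per_year: float         # estimated frequency
    downtime_hours: float            # hours lost per failure
    cost_per_hour: float             # cost of one hour of lost production
    prevention_cost_per_year: float  # what it would take to prevent it

    @property
    def expected_annual_loss(self) -> float:
        # Frequency x duration x hourly cost of downtime.
        return self.failures_per_year * self.downtime_hours * self.cost_per_hour

    @property
    def net_benefit_of_prevention(self) -> float:
        # Positive means prevention costs less than the loss it is meant to avoid.
        return self.expected_annual_loss - self.prevention_cost_per_year


def rank_by_risk(modes):
    """Return failure modes sorted from highest to lowest expected annual loss."""
    return sorted(modes, key=lambda m: m.expected_annual_loss, reverse=True)


# Made-up figures, purely to show the ranking.
modes = [
    FailureMode("vibration sensor", "signal drift", 4.0, 1.0, 5000.0, 3000.0),
    FailureMode("main bearing", "overheating", 0.5, 36.0, 5000.0, 15000.0),
    FailureMode("gearbox seal", "oil leak", 2.0, 4.0, 5000.0, 8000.0),
]

for m in rank_by_risk(modes):
    print(m.component, m.mode, m.expected_annual_loss, m.net_benefit_of_prevention)
```

The point of a ranking like this is not precision. Rough estimates are enough to show which failure modes deserve attention first, and to start the conversation before the failure rather than after it.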

What Root Cause Analysis Actually Demands of You

Back to that Tuesday. Once we stabilized the situation and coordinated with the maintenance team, the real work began: root cause analysis. And here is what I’ve learned: RCA is not a paperwork exercise. It is an honest conversation with reality.

You have to be willing to trace the chain of events backwards without ego. Because the answer might reveal a gap in the preventive maintenance schedule, one you helped design. Or a recommendation you made that wasn’t followed. Or a pattern in downtime data that the team normalized because it was “manageable.”

The engineers and plants that build world-class reliability don’t just fix failures. They get uncomfortable enough to understand them fully.

Three Things I Wish Someone Had Told Me Earlier

1. Your CMMS is only as good as the discipline behind it.

SAP and Maximo are powerful tools. But I’ve seen plants where work orders are raised and closed without meaningful action, where data sits unreviewed, where the system becomes a logging exercise rather than a decision-making tool. The technology means nothing without the habits and culture to use it correctly.

2. Predictive maintenance is not a budget line. It’s an investment thesis.

Every time I used non-destructive testing techniques or analyzed data from RtDuet to catch a potential failure before it cascaded, the business saved multiples of what the intervention cost. The challenge in many African industrial operations is convincing decision-makers of this before the failure happens, not after. That’s a communication and data challenge as much as a technical one.

3. OEE tells the truth, but only if you let it.

Overall Equipment Effectiveness is one of the most honest metrics in operations. It doesn’t care about excuses. It reflects the combined impact of availability, performance, and quality. When you track it rigorously, it tells you exactly where your reliability strategy is working and where it isn’t. Don’t cherry-pick the numbers. Let OEE hold you accountable.
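For readers who have not calculated it by hand, here is a minimal sketch of the standard OEE product: availability times performance times quality. The `ShiftRecord` class, its field names, and the shift figures in the example are assumptions made up for illustration, not numbers from the plant.

```python
from dataclasses import dataclass


@dataclass
class ShiftRecord:
    # Hypothetical fields for one shift; names and units are assumptions.
    planned_minutes: float   # scheduled production time
    downtime_minutes: float  # unplanned stops within that time
    ideal_rate_tph: float    # design throughput, tonnes per hour
    total_tonnes: float      # everything produced during the shift
    good_tonnes: float       # product that met specification


def oee(rec: ShiftRecord) -> dict:
    """Return availability, performance, quality, and their product (OEE)."""
    run_minutes = rec.planned_minutes - rec.downtime_minutes
    # Guard against shifts with no run time, no output, or no design rate.
    if (rec.planned_minutes <= 0 or run_minutes <= 0
            or rec.ideal_rate_tph <= 0 or rec.total_tonnes <= 0):
        return {"availability": 0.0, "performance": 0.0, "quality": 0.0, "oee": 0.0}

    availability = run_minutes / rec.planned_minutes
    ideal_output = rec.ideal_rate_tph * run_minutes / 60.0
    performance = rec.total_tonnes / ideal_output
    quality = rec.good_tonnes / rec.total_tonnes
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


# Made-up shift: 8 hours planned, 1 hour lost, 100 t/h design rate.
shift = ShiftRecord(planned_minutes=480, downtime_minutes=60,
                    ideal_rate_tph=100, total_tonnes=630, good_tonnes=598.5)
result = oee(shift)
print({name: round(value, 3) for name, value in result.items()})
```

In this made-up shift, availability is 0.875, performance is 0.9, and quality is 0.95, which multiplies out to an OEE of roughly 0.75. Each factor looks respectable on its own, which is exactly why reporting only one of them counts as cherry-picking.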

[Figure: Reliability engineering process flow from proactive to preventive maintenance]

The Bigger Picture: What This Means for Engineering in Africa

One thing I’ve observed across my career, from interning on the plant floor to running shifts as a control room operator, is that Africa’s industrial operations have enormous potential that is regularly undermined by avoidable failures. Failures that, with the right reliability culture in place, simply wouldn’t happen.

We have the talent. We have the engineers. What we sometimes lack is the institutional discipline: the systems, the data hygiene, the feedback loops that turn reactive maintenance cultures into proactive ones.

Building world-class operations on this continent isn’t about importing frameworks wholesale. It’s about developing engineers who think in first principles, who take reliability personally, and who understand that every hour of unplanned downtime is a decision someone made, or didn’t make, weeks before the alarm went off.

Closing Thought

That Tuesday, after the ball mill came back online, I sat down and wrote three pages of notes. Not a formal RCA report. Just honest reflections on what I had seen, what I had missed, and what I would do differently. I still do this after significant events.

Reliability engineering is ultimately a practice of continuous learning. The plants that win are not the ones with the most sophisticated equipment. They’re the ones staffed by people who refuse to stop asking why.

What’s the most important reliability lesson your career has taught you? I’d love to hear it in the comments.