Solving Human-Caused Failure Problems
Guest post by Charles J. Latino
At the root of most mechanical and system failures lurks a human cause. Insights into what to look for when solving human-caused failures are essential. Human error is generally described as behavior that goes beyond the norm. A proper definition in the context of this article is “an action planned but not carried out according to the plan.” To find a means of minimizing human error, one must first understand its characteristics:
- Low-stress error
- High-stress error
- Error/change phenomena
When someone goes to a drawer, opens it and forgets why they came to the drawer, we have a classic example of a low-stress error: an action planned but not carried out according to the plan. Another example is the person who dresses for golf on Saturday, grabs his clubs and throws them in the back of the station wagon, and then drives to work. Still another is someone who was interrupted by phone calls while disrobing for a shower and later finds himself in the shower partially clothed. We do these silly things because we are human; however, in process operations these kinds of human errors can bring about serious consequences.
Take the example of an actual process control board with an in-line series of start/stop buttons about 0.5 in. (13 mm) apart. The pushbuttons on this board were designed to control a number of operating pumps. It is not known how dangerous it is to operate the wrong pump or stop the flow of a particular fluid in the installation, but obviously it was not designed for random pump starts and stops. It is calculated, however, that there is one chance in 150 that the wrong pushbutton will be pushed or released. When the wrong pushbutton is activated or deactivated, it is an action planned but not carried out according to the plan. Therefore, by definition, this design is a breeding ground for human error.
The likelihood of error in this design can be reduced by increasing the spacing between pushbutton switches, by offsetting adjacent switches, by color-coding alternate pushbuttons, or by changing the feel of adjacent switches. In fact, these start/stop buttons can be designed so that there is only one chance in 5,000 of a human error. Human factors engineering, a science seldom used in the process industry, can reduce human error potential.
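To make the difference between these two figures concrete, the short sketch below compares them. It is only an illustration: the article gives the two odds (one in 150 and one in 5,000), while the number of button operations (`ACTUATIONS`) and the treatment of each operation as independent are assumptions made here.

```python
from fractions import Fraction

# Odds quoted in the article.
P_ORIGINAL = Fraction(1, 150)     # closely spaced in-line pushbuttons
P_REDESIGNED = Fraction(1, 5000)  # layout improved with human factors engineering

# Hypothetical workload; the article does not say how often the buttons are used.
ACTUATIONS = 3000


def expected_errors(p_error, actuations):
    """Average number of wrong-button events over the given number of operations."""
    return p_error * actuations


def prob_at_least_one_error(p_error, actuations):
    """Chance of one or more wrong-button events, assuming independent operations."""
    return 1 - (1 - p_error) ** actuations


if __name__ == "__main__":
    for label, p in (("original", P_ORIGINAL), ("redesigned", P_REDESIGNED)):
        print(
            f"{label}: expected errors = {float(expected_errors(p, ACTUATIONS)):.1f}, "
            f"P(at least one) = {float(prob_at_least_one_error(p, ACTUATIONS)):.3f}"
        )
    print(f"improvement factor = {float(P_ORIGINAL / P_REDESIGNED):.1f}")
```

Under these assumptions, the original layout averages 20 wrong-button events per 3,000 operations against 0.6 for the redesigned layout, an improvement of roughly 33 times.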
When Three Mile Island went out of control, over 100 alarms and whistles went off and the people operating the reactor did the wrong things. For example, they actually shut off the main cooling system that was so desperately needed in the emergency. Unfortunately, this type of high-stress error is not uncommon. According to a study on the error potential of people who unexpectedly are faced with imminent danger (1), if they have only one minute to react to an out-of-control situation there is a 99.9% chance of doing the wrong thing. There is a 90% chance if they have five minutes to react, 10% with a half hour to react, and 1% (still too much) with 2 hours to react.
This type of error can be avoided or minimized by:
- Hazard risk analysis for advance warning of potential hazards.
- Automated designs for situations in which the time available for decision making is too short for humans to react.
- Design of clear information displays and systems that do not confuse and disorient people when upset conditions occur.
- Practice and rehearsal training in how to cope with system upsets.
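The reaction-time figures quoted earlier from reference (1) can be kept as a small lookup table when screening tasks during a hazard analysis. The sketch below is an illustration under stated assumptions, not part of the cited study: the function name is hypothetical, and for times between the quoted points it simply returns the figure for the nearest shorter quoted time, treating it as a cautious upper bound on the assumption that extra time never makes matters worse.

```python
from bisect import bisect_right

# (minutes available to react, chance of doing the wrong thing), as quoted in the article.
QUOTED_FIGURES = [(1, 0.999), (5, 0.90), (30, 0.10), (120, 0.01)]


def quoted_error_chance(minutes_available):
    """Return the quoted error chance for the longest quoted time not exceeding the input.

    Assumption made here: between quoted points the shorter-time figure is used
    as an upper bound. No interpolation is attempted.
    """
    times = [minutes for minutes, _ in QUOTED_FIGURES]
    index = bisect_right(times, minutes_available) - 1
    if index < 0:
        raise ValueError("the article quotes no figure for less than one minute")
    return QUOTED_FIGURES[index][1]


if __name__ == "__main__":
    for minutes in (1, 3, 5, 30, 120):
        print(f"{minutes:>3} min to react -> up to {quoted_error_chance(minutes):.1%} chance of error")
```

A task that leaves operators only a few minutes to respond falls in the 90% band, which is the kind of situation the automation and display-design items in the list above are meant to address.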
It must be recognized that a catastrophe seldom, if ever, is caused by one human error. The premise that error is part of the human condition prompts us to accept that we are surrounded by human error – it permeates our environment. So why don’t we blow ourselves away? The answer lies in the error/change phenomena.
Every error results in a change in our environment. The error/change phenomena may occur unmistakably, like the severe temperature rise one sees when a compressor is operated without a load, or they may occur subtly, like the vibration one feels when a reactor is upset.
A prerequisite for disaster is to have a number of these error/changes queue up in a particular pattern.
Researchers tell us that it takes a chain of about 14 of these error/change components to produce a bona fide disaster. We survive this awesome potential because we are continually noticing these changes and taking action to break the chains.
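One way to see why complete chains are rare is a toy calculation. Assume, purely for illustration, that each error/change in a chain is noticed and acted on with the same probability, independently of the others. Neither that assumption nor the detection rates below come from the article, which supplies only the figure of about 14 links.

```python
from fractions import Fraction

CHAIN_LENGTH = 14  # approximate figure given in the article


def prob_chain_unbroken(p_detect, links=CHAIN_LENGTH):
    """Chance that every link goes unnoticed, so the chain runs to completion.

    Assumes each link is detected independently with the same probability.
    """
    return (1 - p_detect) ** links


if __name__ == "__main__":
    # Hypothetical detection rates, chosen only to show the trend.
    for p_detect in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
        print(
            f"detect {float(p_detect):.0%} of changes -> "
            f"unbroken chain probability {float(prob_chain_unbroken(p_detect)):.2e}"
        )
```

Even if only half of the changes were noticed, this toy model puts the chance of a complete 14-link chain at 1 in 16,384, which illustrates why people who stay alert to changes matter so much.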
As pointed out earlier, to a large degree one can design out the potential for human error by making greater use of the science of human factors engineering. Hazard/risk assessments and failure mode and effects analysis will help forewarn us of the potential for disaster. Finally, employees have to be trained to be proactive and sensitive to the changes in their environment.
One must be keenly aware of human error as a likely cause of mechanical or process failure and find out the cause of the error. When a human cause is detected, remember that a single such factor does not cause a failure on its own, so the search has to continue.
One of the greatest causes of human unreliability is alienation. If people are relegated to work that does not challenge them, they will surely become alienated and withdraw their innate ability to add value to their work. If they are made subordinate to the machines they operate, they will generally withdraw their creativity. In a plant that had 50 employees working in a very hot environment, over 100°F (38°C), three robots were purchased, displacing eight of those people. The room temperature was then reduced to protect the robots’ electronics. What kind of signal did this communicate to the remaining 42 people? Could a failure analyst at this plant ignore this condition as a possible issue in an investigation of machine or process failure?
If the work must be carried out in an environment that is too hot, too cold or too unsafe, in the perception of the people doing the work, they will likely feel alienated and just follow orders, not adding anything of themselves to the job. Similarly, if people must work standing on concrete floors or maintain equipment that requires a contortionist to service, they will perceive that management places a low value on their worth.
If part deliveries repeatedly idle people, if safety clearances are not approached seriously, or if people’s need to interact with other humans is ignored or unjustifiably discouraged, workers will withdraw the very attributes that separate them from a robot, such as their creativity and their ability to add value to their work.
If the supervision is always task-oriented without due concern for the well-being of employees, people will stop or at least slow down their human contributions to the enterprise. If people are denied recognition for their accomplishments and if they are constantly struggling to find out what standards they are supposed to meet to gain recognition, eventually they will struggle no more and go with the flow, doing only what they are told, no more, no less.
When a person lacks the knowledge or skill to perform his job, we term this a human deficiency. It is psychologically difficult to reveal deficiencies in our society, particularly in the industrial society. People generally compensate for the deficiency by forming mental models that they attempt to follow. If the behavior of someone else or their own trial appears successful, people may try to adopt this behavior as a model for that situation or one that is similar. In most situations, these people get by or even become comfortable while functioning without understanding the operations they are trying to control. In some situations, however, the results of human deficiencies can be less than desirable, sometimes even disastrous. The problem is compounded when these deficient humans are asked to train others.
Characteristics of deficiencies are:
- When jobs are new or changed, a high potential for deficiencies exists.
- If the tasks are very complex, the potential for deficiencies will also be high.
- High sensory loads, where the individual must respond to a variety of signals, will test competence.
- If a task must be performed accurately but occurs only rarely, such as certain disaster sequences, severe deficiencies may exist when expertise is most needed.
- In some cases, people selected to perform specific work may not have the ability or aptitude to perform it.
People at work who are alienated but know how to perform the task at hand will usually do the job correctly if they know they are being observed. On the other hand, people who suffer from deficiencies will not be able to perform the job when being observed, as they simply are not trained to perform the work.
Human problems can be suspected if an operation has an inordinate amount of downtime that can either be observed from records or found by questioning personnel. A human problem is even more likely to exist if the downtime is recurring for the same root reasons.
High turnover is often a sign of an organizational malady. Turnover can be observed from personnel files or arrived at by questioning. Poor morale is another indicator of poor organizational health. Questioning can test morale, but it requires the sensitivity to look beyond the answers to the tone and inflections used and the hidden meaning of the words spoken by the employees.
One of the ways of gauging the health of a facility is by analyzing “protective reports”. If there are a great many letters and reports with a lot of copyholders, it can mean that people are afraid to make decisions without spreading the responsibility, just in case something goes sour.
When interviews are conducted to gather failure information, there are some simple but very important guidelines to consider. The interviewer’s inherent prejudices must be controlled during the interview and the analysis of the data. The goals of the interview should be clearly and succinctly presented. Always start the interview with broad questions and then narrow the field of inquiry, giving the respondent enough time to answer. It is important to listen and to respect the silence created while the respondent is thinking. The presence of electronic recording devices often reduces the candor of the interview. Unless electronic recording is essential in a formal inquiry, note taking that does not interrupt the flow of the respondent’s answers is recommended.
Human Problem Solving
The definition of the problem is essential, but often this step is overlooked. If the plant were operating efficiently, if its equipment had not failed, and if the cash flow were positive, the company possibly would not have a problem. The problem definition should start with the comparison of two states: the ideal state that is to be achieved and the state the company is in. Careful attention to the description of both states is required as the first step in human problem solving. Problem statements are not cause statements. The problem-free or ideal state can be attained only by eliminating the causes for departing from that state.
Four main causes of human unreliability are:
- Human error: low-stress error and high-stress error
- Human deficiency or lack of knowledge or skill
- Alienation: withdrawal of the ability to add value to work and suppression of the human asset
- Lack of motivation or incentives to do the work
Available information on each category is evaluated to see whether a cause can be hypothesized. Each hypothesis must then be tested before a cause can be established. In dealing with human performance problems, multiple causes are to be expected for some problems. Start with signals or indicators. What was presented in each category should serve to point you in the right direction.
For human error look for:
- An action planned but not carried out according to the plan
- An event that must be controlled but occurs infrequently
- An event that should have been avoided but occurs infrequently
- An unlikely chain of omissions and/or wrong actions that would have had to occur to cause the problem
For human deficiencies look if:
- One or more persons are observed not performing the intended task
- New technology has recently been introduced
- New people are involved
- The task to be accomplished occurs infrequently
- Training is not competency-based, is not tailored or relevant to the job, or is only knowledge-based without proper regard for skill
- The task calls for application of a principle that requires decision skills
For alienation look if:
- The task lacks imagination or challenge
- Logistics are poor
- Layouts and human factors engineering are not considered in the facility design
- Supervision is largely work- or task-centered
- Machines, not people, are the focus of conversations respecting productivity
- Working conditions are at extremes respecting housekeeping, temperature, and hazards
- Management lacks consistency and predictability
- No mechanism exists for exchanging views on productivity issues
For motivation or incentive causes look if:
- Tasks are socially unacceptable
- Feedback mechanisms are lacking
- No means exists for recognizing unusual contributions
- People considered stars are always given the challenging work
- Informational sessions on company products, progress and problems are not routinely held
Undoubtedly, the United States has the technology to produce the most goods, of the best quality, at the right price. Why then is it losing ground? Perhaps the US has become so captivated with technology that it has forgotten the essential element that makes it all work, the human being. Technology must be designed and introduced to serve us, not just designed for us to serve it. Most of all, we must honor human dignity by providing meaningful reward systems at the workplace.
About the Author
Charles J. Latino (1929-2007), founder of Reliability Center, Inc., was a chemical engineer with a background in psychology and human factors engineering. He was a leader in the development of an integrated approach to achieving greater reliability in manufacturing and industrial systems and processes. He served as a consultant to many companies in the United States and abroad. He is the author of Strive for Excellence…The Reliability Approach. He left his Reliability legacy to his wife and five children, who continue to spread his visionary Reliability Approach to companies throughout the world.
(1) Swain, A.D., and H.E. Guttmann, “Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications,” Sandia Labs, Albuquerque, NM, for U.S. Nuclear Regulatory Commission (Oct. 1980).