Guest Post by Bill Pomfret (first posted on CERM ® RISK INSIGHTS – reposted here with permission)
In any given PSM audit, the auditors will usually face one or more situations that represent a dilemma because the situation has not happened before, or no thought has been given on how to resolve it. These dilemmas usually require the individual auditor and/or the audit team leader to resolve the situation in the field. These “on the fly” resolutions require both astute judgment and practical solutions that fit not only the governing regulations or company/facility standards for PSM, but also how those mandatory requirements should be interpreted and applied to the specific design, operations, and PSM program of the facility being audited.
The PSM audit dilemmas described in this paper are based on the experiences of seasoned auditors during actual PSM audits. Several of them are PSM adaptations of those published by Safety Projects International Inc., et al in 5 Star Health & Safety Management System™ Health and Safety Audits. In general, there are no absolute right or wrong answers to solving these dilemmas, although in some cases a resolution might seem obvious.
Typical dilemmas faced by PSM auditors are described below by way of examples with proposed resolutions or conclusions.
- The De-Minimis Sampling Issue
- The Voluntary Protection Program (VPP) Defense
- Practices Not Institutionalized
- The Boundaries of PSM Programs
- When Facility/Company Requirements Exceed Regulations
- Repeat Findings
- Just-in-Time Compliance
- Stretched Definition of “Annual”
- “Timely” and how is it measured
- Recognized and generally accepted good engineering practices (RAGAGEPs) and what they require
- What Does Replacement-In-Kind Mean?
The De-Minimis Sampling Issue
Dilemma: As a PSM audit team member, you have been given the responsibility to audit the Process Knowledge Management element of a specialty batch chemical facility that uses or produces a large number of chemicals, including a dozen toxic/reactive materials and several dozen flammable materials or mixtures. You have just completed your review of the MSDS file, which the PSM Coordinator has told you suffices for the process safety knowledge regarding the chemicals.
MSDSs were available for all of these chemicals at the facility, and they were generally up to date. However, there was one exception: the MSDS for a flammable mixture that is created at the facility as part of the manufacturing process for a particular product was not available. This mixture is only onsite when a campaign to make the product occurs. What is the nature of the audit finding that should be documented in the report? The PSM Coordinator argues that one missing MSDS out of dozens required should not represent a finding. Is there a de-minimis level of sampled records below which missing or incomplete records should not be considered a finding?
Possible Resolution or Conclusion: While even one missing, incomplete, or improperly completed record can constitute a finding, many auditors apply some amount of discretion in creating a finding when the potential sample size is large. In these situations, a pattern of missing, incomplete, or mistaken records is what the auditor is looking for before creating a finding.
However, the application of this discretion has to be tempered by the importance of the issue in question. In this particular case, a missing MSDS, even for a material that is onsite only temporarily, would be an important record omission because MSDSs support so many other important PSM and occupational safety and health activities, in addition to the Hazard Communication regulations that require them.
Therefore, this should probably be recorded as a finding, but if the auditor is confident, based on his/her sampling and testing, that it is an isolated situation, the finding can be described as representing a unique situation and the recommendation would probably not include a provision to check for all other applicable MSDSs.
Practices Not Institutionalized
Dilemma #1: You are conducting the Contractor Management portion of a PSM audit at a chemical plant. The PSM Coordinator tells you that a purchasing supervisor screens and approves the safety performance of prospective contractors. He has developed and implemented a pre-qualification questionnaire that includes detailed information about the contractor’s safety program and performance. The PSM Coordinator is able to show you an example of one of the completed questionnaires for a new prospective inspection contractor. When you ask to interview the purchasing supervisor so that you can learn more about the contractor pre-hire screening process and sample additional records, you are informed that the supervisor won the state lottery two months ago and is on an extended vacation.
To verify whether the contractor pre-screenings are being conducted properly, you request to review the computer records of them and also to review the procedure which governs contractor management. Unfortunately, facility staff cannot gain access to the records because they don’t have the supervisor’s password, and a contractor management procedure reflecting how the purchasing department manages contractors has not been developed yet. Calls to the supervisor are not successful; he has turned off his cell phone. How do you handle this situation as an auditor?
Possible Resolution or Conclusion #1: A finding should be created stating that the records confirming contractor pre-qualification reviews were not available and could not be reviewed, because the sampling and testing for this aspect of the contractor management program could not be completed.
Although the governing regulation does not explicitly require a management system procedure for this PSM element, the absence of a procedure, and the fact that the purchasing supervisor’s good practices are not institutionalized, can also be written as a finding. Alternatively, the auditor can attach a recommendation to the first finding (if recommendations are within the scope and objectives of the audit) to develop such a management system to help correct the systemic problem.
Dilemma #2: You are conducting an audit of the Process Knowledge Management portion of a specialty batch chemical plant that manufactures a diverse set of products and functions as a toller for several different industries.
New products involving new chemicals and conditions that the facility has not experienced before are not unusual. The facility has a reputation for quickly incorporating new chemical products into its processes with very high quality. During a review of the relief device design and design basis records, you notice that many of them are not complete and do not reflect the properties and conditions imposed on the reactor relief devices (rupture disks) by the current products being manufactured.
The Engineering Manager states that because of the quick turnaround on product incorporations required by their customers, a full quantitative analysis of the relief device design and design basis is not possible. The pressure vessels, piping, and relief devices are all over-designed and can handle the full range of pressures and temperatures that their processes impose, and there has never been any process leak that has been traced back to an overpressure or overtemperature transient.
Technical interviews with the Engineering Manager and several engineering and operations personnel that are involved in the introduction of new products into the facility indicate that relief device issues are studied qualitatively during MOC safety reviews and HIRAs. This includes a review of the properties of the materials and some simple lab tests that are performed onsite. Interviews with facility engineering and operations personnel involved indicate that they seem to understand the ramifications of the prospective new chemicals on reactor pressure. Reviews of operational records confirm that there have been no pressure transients that resulted in any leakage or releases. How do you handle this situation as an auditor?
Possible Resolution or Conclusion #2: A finding that relief device design and design basis process safety knowledge is missing for the currently installed reactor relief rupture disks should be created. Clearly, from the interviews and records review, the process for evaluating new and modified products against the relief design of the reactors is flawed. However, the governing regulations do not explicitly require a management system and internal controls for this, just that it is done properly. Therefore, the auditor can include a recommendation (if recommendations are within the scope and objectives of the audit) to include such a management system to help correct the systemic problem, or alternatively, a finding can be created to address the lack of a management system for evaluating reactor relief capability when products are changed.
Dilemma #3: While auditing the Asset Integrity element of a chemical plant PSM program, the Maintenance Manager states that the requirement for written maintenance procedures is satisfied by the original equipment manufacturer (OEM) manuals. You notice that the office bookshelves of many of the maintenance supervisors and engineers contain some of these manuals. Several locations in the various maintenance shops also seem to have them available. A brief review of several of them reveals that some were published decades ago, and, despite the ISO 9001 certification of the facility, the manuals are not formally issued and approved documents. Is this a finding? Why or why not?
Possible Resolution or Conclusion #3: A finding that the maintenance procedures are not up to date should be created if the manuals are actually out of date with respect to the equipment and its maintenance. Maintenance personnel should be interviewed to determine if this is true. The finding should not include the fact that the ISO 9001 document control system does not include these procedures unless the ISO document control procedure specifies that the PSM-related AI procedures are within its scope. ISO 9001 is a voluntary method of maintaining the documents, and an ISO 9001 certified facility would likely elect to use it, but it is not a mandatory PSM requirement, even when the ISO certification exists, unless the facility or company has specified that it is to be used for PSM-related documents. Therefore, a finding is appropriate for the out-of-date manuals, but not for their absence from the ISO document control system.
The Boundaries of PSM Programs
Dilemma: You are conducting a PSM audit of a petrochemical plant with large inventories of flammable materials. All of the processes using, storing, or manufacturing these materials are included in the PSM program as defined by the facility PSM manual. The PSM manual also states that the fixed and mobile fire protection system is included in the PSM program. When reviewing the HIRAs, you notice that a HIRA of the fire protection system has not been performed, nor have any of the process HIRAs included an analysis of the fire protection system, except to list it as a safeguard. Also, the Asset Integrity program procedures do not include any ITPM information for the fire protection system.
The Safety Manager states in an interview that they test the fire protection system, but a review of the test records reveals that the fire pumps have not been tested in over two years and the fire monitors are lubricated but not flow tested. The only records you can find are monthly external inspections of the sprinkler systems, some testing records by a contractor for the fire alarm system, and a lengthy list of fire extinguishers which shows the dates and the technicians who inspected them.
You prepare two findings that state that 1) the HIRAs do not include analysis of the fire protection system failures, and 2) that the AI program does not include all of the ITPM tasks specified in NFPA 25 for water-based fire protection equipment. Are these appropriate findings? If yes, what are the appropriate recommendations? If not, why not?
Possible Resolution or Conclusion: The two findings regarding the lack of HIRAs for the fire protection system and missing ITPM tasks required by the governing RAGAGEP for water-based fire protection systems are correct. Fire protection systems are not explicitly required to be included in a PSM program by the governing regulations, and water is not a highly hazardous chemical per those regulations.
Therefore, the inclusion of these systems and equipment in the PSM program of the facility in question is voluntary. However, because the facility has defined the PSM program to include them, the other elements of PSM become compliance requirements. Also, taking credit for the fire protection system in a HIRA means that it has to be functional. If not, then a finding for having an incorrect safeguard in the HIRA could also be written.
When Facility/Company Requirements Exceed Regulations
Dilemma #1: While auditing the AI program at a chemical plant that manufactures toxic materials that are covered by OSHA’s PSM Standard, you determine from document reviews and interviews that the area toxic gas detectors are not included in the AI program. The fixed detectors are used to detect the same highly hazardous chemicals that are covered by the PSM Standard (in this case chlorine), the detectors are located inside the battery limits of the PSM-covered process, the detectors provide indications of chlorine concentration levels and alarms if the concentrations reach pre-set limits. Therefore, they fit the definition of a control, indication, and alarm in the PSM Standard. Do you have a finding? Why or why not?
Possible Resolution or Conclusion #1: Since the fixed chlorine detectors are within the process area covered by the PSM Standard, and are specifically intended to indicate and alarm when a highly hazardous chemical is released, a finding should be written.
Dilemma #2: During the same audit in Dilemma #1, you determine from document reviews and interviews that the portable chlorine gas detectors worn by facility personnel, contractors, and visitors are not included in the AI program. Do you have a finding? Why or why not?
Possible Resolution or Conclusion #2: Although portable toxic gas detectors provide the same type of warning as a fixed detector, these devices are not considered process equipment but PPE, and are worn for industrial hygiene or emergency action plan purposes. Therefore, a finding should be written only if the manufacturer specifies some sort of ITPM-related activity for them.
Repeat Findings
Dilemma: You are auditing the HIRA element of a PSM audit. You have observed that the facility has not yet resolved ten recommendations from a HIRA performed on the Reactor #1 process two years prior to the audit. The audit report from three years ago includes the following finding: “Fifteen recommendations from the most recent HIRA on the Reactor #3 process are overdue for resolution.” You have determined that none of the ten current unresolved recommendations were overdue three years ago; all of the fifteen recommendations that were noted as overdue during the previous audit were resolved in the intervening three years. Is this a repeat finding? Why or why not?
Possible Resolution or Conclusion: The question in this situation is whether overdue HIRA recommendations in general, appearing in successive PSM audit cycles, represent a repeated finding, or whether only the same particular HIRA recommendation(s) would. The OSHA Field Operations Manual (OSHA, 2009) states that an employer may be cited for a repeated violation if that employer has been cited previously for the same or substantially similar conditions or hazards. Also, if an originally cited violation has at one point been abated but subsequently recurs, a citation for a repeated violation may be appropriate. Therefore, if overdue HIRA recommendations are findings in audits, even if they are not consecutive audits, and even if the finding is not being written for the same exact overdue recommendations, it should be treated as a repeat finding. Of course, the definition of “timely” is also relevant when determining if a recommendation is overdue. The management system for the HIRA recommendations should also be reviewed to determine if they were included there.
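One way to make the “same or substantially similar” test concrete in an audit tracking tool is to compare findings by the condition they describe rather than by the individual recommendations cited. The sketch below is illustrative only: the `Finding` record, its fields, the years, and the `is_repeat` helper are hypothetical, and treating a match on PSM element plus condition as “substantially similar” is a simplifying assumption, not a rule taken from the OSHA manual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical audit finding record (field names are assumptions)."""
    audit_year: int
    element: str        # PSM element, e.g. "HIRA"
    condition: str      # the deficient condition, described generically
    items: frozenset    # the specific recommendations or records cited


def is_repeat(current: Finding, history: list[Finding]) -> bool:
    """Return True if any earlier audit cited the same condition.

    The specific items are deliberately ignored: different overdue
    recommendations still describe the same underlying condition.
    """
    return any(
        past.audit_year < current.audit_year
        and past.element == current.element
        and past.condition == current.condition
        for past in history
    )


# Illustrative data: different recommendations, same condition.
previous = Finding(2018, "HIRA", "overdue recommendations",
                   frozenset({"R3-01", "R3-02"}))
current = Finding(2021, "HIRA", "overdue recommendations",
                  frozenset({"R1-07", "R1-09"}))

print(is_repeat(current, [previous]))  # True
```

Keying on the condition rather than the item list is what lets the Reactor #1 recommendations count as a repeat of the earlier Reactor #3 finding.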
Just-in-Time Compliance
Dilemma: Another auditor during the same PSM audit who is reviewing the MOC program observes that the forms for twelve active and recently completed MOCs are incomplete. Signatures are missing and various data fields on the form that require information to be entered are blank. The auditor prepares a draft finding that the facility MOC procedure is not being followed, citing the specific MOCs that are deficient. The PSM Coordinator acknowledges the problems and proceeds to correct them by re-routing the forms to get the incomplete information and signatures inserted. Before the closing meeting the PSM Coordinator returns to the MOC auditor with copies of the twelve corrected MOC forms and requests that the finding be deleted. Are these corrected MOCs still findings? If so, why? If not, why not?
Possible Resolution or Conclusion: A fairly common situation in PSM audits is that the facility attempts to correct findings while the onsite portion of the audit is still in progress. There are two schools of thought regarding this practice:
1) A simple finding that records are incomplete in a particular PSM element only requires that the missing data be inserted for the subject records to be complete and satisfy the requirement. Such findings should be correctable at any time after they are discovered, and if they are corrected before the closing meeting or the issuance of the final audit report, they should not be mentioned in these forums because they are moot.
2) An audit finding is a report of the conditions as the auditor found them, and the findings should be described and published that way because that was the status of that aspect of the PSM program on the date found.
The dilemma facing auditors in this situation is that a facility that strives to correct findings as they occur may be more concerned with the existence of the findings (or the number of them in a specific audit) than with what the findings are telling them about their PSM procedures and practices. The other issue associated with this dilemma is that facilities that work hard to close findings during the audit usually feel that they are being unfairly “punished” if the audit team advocates the second school of thought. Incomplete records can be a simple oversight, or they may represent a systemic problem with the procedure or practice that is generating the records in question.
Strictly speaking, if the procedure is inferred rather than explicitly written and the governing regulations (if any) do not require a procedure or specify a documentation method for the activity, then the incomplete records would not represent a finding. Therefore, auditors should attempt as much as possible to determine if there is a systemic problem or not so that the finding can be written to fully describe not just the evidence discovered but any possible problems with the way that the PSM procedure or activity related to the evidence in question is being practiced.
It is not likely that a facility will be able to correct systemic problems in the few days of the onsite portion of an audit, given that these corrections will generally require changes to procedures, additional training, revised recordkeeping procedures, additional administrative steps, or other activities before the finding can be permanently corrected. This dilemma can also be partly resolved if all parties fully understand and agree in advance to audit ground rules that either allow or do not allow the correction of findings during the audit.
Stretched Definition of “Annual”
Dilemma: It’s November 2021 and you are conducting the Asset Integrity portion of a PSM audit at an oil refinery, which has 350 employees and 50 maintenance personnel. In assessing the SWP training for the maintenance department, you note that an excellent needs assessment matrix has been developed for all applicable training modules and for all job classes of maintenance personnel; computerized and pretty impressive at first glance. You check on ten employee records and note that three of them have missed their required annual training for hot work permits or line breaking/process opening for 2021.
The last recorded training on these topics for these three employees was in January-February 2020. The EHS training coordinator, who didn’t seem aware of these deficiencies, tells you not to worry: this was probably due to summer vacations and a short turnaround in early 2020. The employees will make up the training in December, which means that some employees will have an interval of almost two years between sessions. The EHS training coordinator argues that this meets the annual requirement for this training. Is this a finding? Why or why not?
Possible Resolution or Conclusion: The general meaning of annual in this context is a rolling 365-day period, not one occurrence of an activity in each successive calendar year. Therefore, completing SWP training anytime in 2020 and anytime in 2021 for a given person does not meet this definition. However, some facility PSM personnel argue that “annual” means once per calendar year. The most common practice is to observe the rolling 365-day definition of annual. Therefore, findings which describe exceeding this limit typically survive because the rolling definition makes common sense to most people. Note that this issue also applies to any requirement in the PSM Standard with a time limit, e.g., HIRA revalidations, audits, certification of SOPs, etc.
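The difference between the two readings of “annual” is easy to check against training records. The following is a minimal sketch, not a prescribed method: the function names and the exact session dates are assumptions chosen to resemble the dilemma (training in February 2020, make-up session in December 2021).

```python
from datetime import date

MAX_INTERVAL_DAYS = 365  # rolling definition of "annual" described above


def meets_calendar_year_reading(sessions: list[date]) -> bool:
    """Lenient reading: at least one session in every calendar year spanned."""
    years = {d.year for d in sessions}
    return all(y in years for y in range(min(years), max(years) + 1))


def overdue_intervals(sessions: list[date]) -> list[tuple[date, date, int]]:
    """Rolling reading: consecutive sessions more than 365 days apart."""
    ordered = sorted(sessions)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        days = (later - earlier).days
        if days > MAX_INTERVAL_DAYS:
            gaps.append((earlier, later, days))
    return gaps


# Hypothetical dates for one employee.
sessions = [date(2020, 2, 15), date(2021, 12, 10)]

print(meets_calendar_year_reading(sessions))  # True
print(overdue_intervals(sessions))            # one gap, 664 days long
```

The same record passes the calendar-year reading and fails the rolling reading, which is exactly the disagreement between the training coordinator and the auditor.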
“Timely” and how is it measured
Dilemma: While performing a PSM audit at an oil refinery, the auditors for the HIRA, audit, and incident investigation elements notice that none of the recommendations for these elements are overdue. However, many of them have very long due dates, even for the simple recommendations (e.g., modify a procedure). Also, it appears that the due dates for many of the recommendations have been extended multiple times, and that many of the new dates were changed very shortly before the audit you are conducting. Is there a finding here? Why or why not? What does “timely” mean with respect to resolving and implementing recommendations?
Possible Resolution or Conclusion: Very long due dates for administrative changes such as changing the wording of a procedure fail the test of “timely.” These would be findings because the facility’s schedule was not being followed. However, each situation must be examined on its own merits before the finding can be written.
For example, a wholesale change to a procedure that implements new or modified documentation methods requiring new software, substantial training, etc. may reasonably take multiple months to accomplish, whereas changing the wording of one warning statement in an SOP should reasonably be completed within a few months, or even less. Also, the multiple extensions indicate a breakdown in the application of “timely” in practice at the facility.
Since the term “timely” has no single uniform definition that covers every situation, auditors should apply a reasonable definition on a case-by-case basis, seeking consensus from within the audit team for each situation. The last-minute extensions just before the audit are probably attempts to avoid findings without addressing the underlying issues.
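Where recommendation tracking data can be exported, an auditor might screen it for the two patterns noted above before reading individual records. This is only a sketch under stated assumptions: the `Recommendation` record and its fields are hypothetical, and the thresholds (more than one extension, a change within 30 days before the audit) are arbitrary illustrations, not definitions of “timely.”

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Recommendation:
    """Hypothetical tracking record; field names are assumptions."""
    rec_id: str
    # Original due date first, then each revised due date in order.
    due_dates: list[date] = field(default_factory=list)
    # Dates on which each extension was entered in the tracking system.
    changed_on: list[date] = field(default_factory=list)


def screening_flags(rec: Recommendation, audit_start: date,
                    max_extensions: int = 1,
                    pre_audit_window: timedelta = timedelta(days=30)) -> list[str]:
    """Return reasons a recommendation deserves a closer look."""
    flags = []
    extensions = len(rec.due_dates) - 1
    if extensions > max_extensions:
        flags.append(f"extended {extensions} times")
    if any(timedelta(0) <= audit_start - changed <= pre_audit_window
           for changed in rec.changed_on):
        flags.append("due date changed shortly before the audit")
    return flags


rec = Recommendation(
    "HIRA-12",
    due_dates=[date(2020, 6, 30), date(2020, 12, 31), date(2021, 12, 31)],
    changed_on=[date(2020, 6, 15), date(2021, 10, 20)],
)
print(screening_flags(rec, audit_start=date(2021, 11, 8)))
```

A flag here is a prompt for follow-up questions, not a finding in itself; each flagged item still has to be judged on its own merits by the audit team.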
Recognized and generally accepted good engineering practices (RAGAGEPs) and what they require
Dilemma: You are auditing the Asset Integrity element of a chemical facility PSM program. The facility has not implemented ANSI/ISA S84.01 for safety instrumented systems (SIS). Despite the clear indication of interlocks, trips, and other automatic controls on the P&IDs and in other PSK, the facility lead control systems engineer states that the facility has no SISs or ESDs. Is this a finding? Why or why not? Note that this dilemma could also be faced by the auditor of the Process Knowledge Management element.
Possible Resolution or Conclusion: ANSI/ISA S84.01 is the governing RAGAGEP for safety instrumented systems (SIS). But like any RAGAGEP, it can be replaced with an equivalent written practice, even a company-designed practice, if that practice accomplishes the same goals and objectives. There are several large companies in the chemical/processing sector that have long-standing engineering standards in place that specify how control systems are to be defined, specified, designed, installed, and tested.
These home-grown procedures offer an alternative approach to that provided in ANSI/ISA S84.01 and have been found to be acceptable. This acceptance, however, has not been made in writing, but has been obtained by its long and successful practice. Therefore, not using ANSI/ISA S84.01 is not a finding if an equivalent practice is in place.
However, simply declaring that there are no SISs/ESDs at the facility without a documented risk-based analytical process in place that confirms it would be a finding.
What Does Replacement-In-Kind Mean?
Dilemma #1: You are auditing the MOC element of an oil refinery PSM program. The refinery manages some changes in PSM activities that would not fit the definition of replacement-in-kind (RIK) but are not controlled and managed using the refinery MOC procedure. For example, you discover that the refinery controls the removal or bypass of safety features using a separate procedure from MOC. A review of the bypass procedure reveals that it requires that a form be completed that includes the technical justification for the bypass or removal, a time limit, and an approval (by the Maintenance Manager). Is there a finding here? Why or why not?
Possible Resolution or Conclusion #1: The use of alternative change control procedures for different types of change situations is an acceptable practice, as long as the alternative procedures accomplish the basic requirements of MOC. In this case, the safety feature bypass procedure does not address the impact of the proposed bypass on safety and health, which is an important requirement of the MOC process. A finding to modify the bypass procedure to include this analysis and a documentation of it on the permit form should be generated.
Dilemma #2: During the same audit you discover that the facility definition of replacement-in-kind would include changing an isolation valve from a ball valve to a gate valve, so the change would not require an MOC. The Engineering Manager states that this change would be considered merely a drawing symbol change and would be managed using the document control procedure for engineering drawings. Is this a finding? Why or why not?
Possible Resolution or Conclusion #2: Any physical change to a process that alters the hydraulic characteristics, in this case the pressure drop across a different type of valve, constitutes a change subject to MOC. However, if an engineering specification allows the substitution of a ball valve with a gate valve, then this type of change is pre-approved and would not require a MOC. Valve changes are relatively minor changes, but they are not replacement-in-kind, and they are not simply P&ID/drawing changes. Therefore, a finding should be generated for not applying the MOC procedure to the change from a ball to a gate valve unless it is allowed via specification.
CONCLUSION
In general, there are no absolute right or wrong answers to solving these dilemmas, although in some cases a resolution might seem obvious. Often, there is a resolution that is better than others, or appears to be, but many times that would require adoption of a policy or practice that is not a compliance requirement, and that should be carefully considered before a final decision is made. Even when the dilemma seems to represent a compliance issue, and its resolution clear-cut, there may be substantial flexibility to craft the resolution and correct the problem.
Bio:
Dr. Bill Pomfret of Safety Projects International Inc who has a training platform, said, “It’s important to clarify that deskless workers aren’t after any old training. Summoning teams to a white-walled room to digest endless slides no longer cuts it. Mobile learning is quickly becoming the most accessible way to get training out to those in the field or working remotely. For training to be a successful retention and recruitment tool, it needs to be an experience learner will enjoy and be in sync with today’s digital habits.”
Every relationship is a social contract between one or more people. Each person is responsible for the functioning of the team. In our society, the onus is on the leader. It is time that employees learnt to be responsible for their actions or inaction, as well. And this takes a leader to encourage them to work and behave at a higher level. Helping employees understand that they also need to be accountable, visible and communicate what’s going on