Guest post by Dr. Amir Segal & Yizhak Bot of BQR
Reliability engineers are equipped with an arsenal of techniques (FTA, RBD, Markov, FMEA / FMECA, SIL) for reliability, availability, safety and maintainability analysis. However, it is not always clear when to use each technique.
In order to design a safe and reliable product, reliability engineering techniques should be integrated with the system design process. This fact is well known, and today many system engineering conferences include discussions regarding reliability and safety [1,2].
Reliable and safe design is especially important in the aerospace and automotive industries. For example, a review of the Boeing KC-46 refueling aircraft revealed that some redundant wire bundles were too close to each other, thereby increasing the probability of common-cause failures. The result was a massive rewiring project that cost Boeing millions of dollars and delayed the aircraft's qualification.
The fact that the problem was found before deployment is a testament to the rigorous review process for manned aircraft. However, there is no well-defined methodology for the reliability, maintainability and safety of drones or self-driving cars. The failure rate of drones was found to be two orders of magnitude higher than that of manned aircraft [4, 5].
In this paper we discuss how reliability engineering techniques can be combined effectively with the product design process in order to create reliable and safe products. As a specific example, drones are considered.
One of the most famous system engineering models is the “V” model, shown in Fig. 1.
The first steps in the “V” model are conceptual, i.e., defining the system's required functionality and verifying the feasibility of the various sub-systems.
Feasibility is strongly connected to reliability and availability. Even if an existing technology can perform the required sub-system function, it might not be sufficiently mature and reliable.
At this initial stage, FTA and RBD can be used for rough estimates of the concept's safety, reliability and availability. System-level FTA and RBD results are usually calculated bottom-up from MTBF and MTTR estimates for the basic events and low-level components. However, at this early stage it is useful to apply RBD and FTA the other way around: define the required system-level safety, reliability and availability, and allocate the resulting requirements to the sub-systems.
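As a minimal sketch of the usual bottom-up calculation, the following Python snippet combines MTBF and MTTR estimates into rough system figures for a series RBD. It assumes constant failure rates and independent blocks. The sub-system names, MTBF / MTTR values and mission duration are hypothetical and are not taken from the drone design.

```python
import math

def reliability(mtbf_hours, mission_hours):
    """Mission reliability of one block, assuming a constant failure rate."""
    return math.exp(-mission_hours / mtbf_hours)

def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of one repairable block."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def series(values):
    """All blocks are required: multiply the block values."""
    return math.prod(values)

def parallel(values):
    """Any single block is sufficient: 1 minus the product of the complements."""
    return 1.0 - math.prod(1.0 - v for v in values)

# Hypothetical sub-systems: name -> (MTBF in hours, MTTR in hours)
blocks = {
    "propulsion": (5000.0, 4.0),
    "avionics": (8000.0, 2.0),
    "comms": (10000.0, 1.0),
}
mission_hours = 2.0  # hypothetical mission duration

system_reliability = series(
    reliability(mtbf, mission_hours) for mtbf, _ in blocks.values()
)
system_availability = series(
    availability(mtbf, mttr) for mtbf, mttr in blocks.values()
)
print(f"Mission reliability: {system_reliability:.6f}")
print(f"Steady-state availability: {system_availability:.6f}")
```

The `parallel` helper is included because redundant blocks are combined that way in an RBD; it is only valid when the redundant blocks fail independently.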
One of the first design reviews is the System Functional Review (SFR), whose goal is to ensure that the system’s lower-level performance requirements are fully defined and consistent with the system concept, and that lower-level system requirements trace to top-level system performance requirements.
Fig. 2 presents a hierarchical tree of the initial drone design. Component failure rates were allocated such that a system mission reliability of 99.9% is achieved.
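A top-down allocation of this kind can be illustrated with a simple weighted apportionment. The sketch below assumes a series structure, constant failure rates and a one-hour mission. The component names and weights are hypothetical, and this is not necessarily the allocation method used for Fig. 2.

```python
import math

def allocate_failure_rates(target_reliability, mission_hours, weights):
    """Split the allowed system failure rate across series components by weight."""
    system_rate = -math.log(target_reliability) / mission_hours
    total_weight = sum(weights.values())
    return {
        name: system_rate * weight / total_weight
        for name, weight in weights.items()
    }

# Hypothetical weights: a larger weight grants a larger share of the failure budget
weights = {"flap_motor": 3.0, "cpu": 1.0, "gps": 1.0, "gyro": 1.0}
mission_hours = 1.0  # hypothetical mission duration

allocated = allocate_failure_rates(0.999, mission_hours, weights)

# Check: the allocated rates reproduce the system-level target
achieved = math.exp(-sum(allocated.values()) * mission_hours)

for name, rate in allocated.items():
    print(f"{name}: {rate:.3e} failures/hour")
print(f"Achieved mission reliability: {achieved:.6f}")
```

Weights are typically chosen according to complexity or technology maturity, so that items that are harder to make reliable receive a larger share of the budget.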
The next step in the “V” model is a drill-down of each sub-system to the component level. At this step FMECA is very useful for identifying the component failure modes, their effects and their criticality. The criticality calculation requires an estimate of the components’ failure rates. If field data exists, field data analysis can be used to extract the component failure distribution. If field data does not exist, one can use OEM MTBF data, MTBF prediction methods, Mechanical Reliability Simulation (MRS) and/or databases.
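To show how failure rates enter the criticality calculation, the sketch below uses one common quantitative definition, the MIL-STD-1629A style failure mode criticality number Cm = beta * alpha * lambda * t. The text does not state which criticality method was used, so this choice is an assumption, and the failure modes and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    alpha: float  # failure mode ratio: share of the item's failures in this mode
    beta: float   # conditional probability that the mode causes the end effect

def mode_criticality(mode, failure_rate_per_hour, operating_hours):
    """Failure mode criticality number: beta * alpha * lambda * t."""
    return mode.beta * mode.alpha * failure_rate_per_hour * operating_hours

# Hypothetical failure modes of a single item
modes = [
    FailureMode("motor stalls", alpha=0.6, beta=1.0),
    FailureMode("motor overheats", alpha=0.4, beta=0.1),
]
failure_rate = 2e-5      # hypothetical item failure rate, failures/hour
operating_hours = 100.0  # hypothetical operating time

per_mode = {
    m.name: mode_criticality(m, failure_rate, operating_hours) for m in modes
}
# Item criticality is the sum over the item's failure modes
item_criticality = sum(per_mode.values())

for name, value in per_mode.items():
    print(f"{name}: {value:.2e}")
print(f"Item criticality: {item_criticality:.2e}")
```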
Using the component failure rates and FMECA results, detailed FTA and RBD models can be constructed and compared to the sub-system requirements.
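The way component failure rates feed a fault tree can be sketched with two gate functions. The example assumes independent basic events and constant failure rates. The tree structure, event names and rates are hypothetical.

```python
import math

def failure_probability(rate_per_hour, hours):
    """Probability that a component fails within the given time."""
    return 1.0 - math.exp(-rate_per_hour * hours)

def and_gate(probabilities):
    """Output event occurs only if all input events occur."""
    return math.prod(probabilities)

def or_gate(probabilities):
    """Output event occurs if at least one input event occurs."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)

mission_hours = 1.0  # hypothetical mission duration

# Hypothetical basic events
power_loss = failure_probability(1e-5, mission_hours)
gps_loss = failure_probability(2e-4, mission_hours)
gyro_loss = failure_probability(1e-4, mission_hours)

# Hypothetical tree: navigation is lost only if both GPS and gyro fail;
# the top event occurs on power loss OR navigation loss.
navigation_loss = and_gate([gps_loss, gyro_loss])
top_event = or_gate([power_loss, navigation_loss])

print(f"Top event probability: {top_event:.3e}")
```

The independence assumption is exactly what a common-cause failure violates, so gates that combine redundant items deserve a separate common-cause review.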
Sometimes the sub-system is supplied by a sub-contractor. In such cases the sub-contractor prepares FMECA reports that are reviewed by the system engineers.
The FMECA, FTA and RBD analyses are important inputs to the Preliminary Design Review (PDR), which is the second review stage in the “V” model.
In some cases, standard RBD cannot model the actual system reliability and availability. A standard example is the case of load sharing where failure of one item affects the failure rate of another item. Markov modeling is used for solving such cases.
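A load-sharing pair can be sketched as a three-state Markov chain: both units working, one unit working alone at a higher failure rate, and the pair failed. The snippet below integrates the state equations with a fixed-step Runge-Kutta scheme and assumes no repair during the mission. The failure rates and mission time are hypothetical, and this is not the model of Fig. 5.

```python
import math

def derivatives(p, lam_shared, lam_alone):
    """Time derivatives of the state probabilities (both, one, failed)."""
    p_both, p_one, _ = p
    return (
        -2.0 * lam_shared * p_both,
        2.0 * lam_shared * p_both - lam_alone * p_one,
        lam_alone * p_one,
    )

def rk4_step(p, h, lam_shared, lam_alone):
    """Advance the state probabilities by one Runge-Kutta step of size h."""
    def shifted(k, scale):
        return tuple(x + scale * dk for x, dk in zip(p, k))
    k1 = derivatives(p, lam_shared, lam_alone)
    k2 = derivatives(shifted(k1, 0.5 * h), lam_shared, lam_alone)
    k3 = derivatives(shifted(k2, 0.5 * h), lam_shared, lam_alone)
    k4 = derivatives(shifted(k3, h), lam_shared, lam_alone)
    return tuple(
        x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for x, a, b, c, d in zip(p, k1, k2, k3, k4)
    )

def state_probabilities(lam_shared, lam_alone, hours, steps=1000):
    """Return (P_both, P_one, P_failed) at the end of the mission."""
    p = (1.0, 0.0, 0.0)  # both units work at t = 0
    h = hours / steps
    for _ in range(steps):
        p = rk4_step(p, h, lam_shared, lam_alone)
    return p

lam_shared = 1e-4  # hypothetical failure rate per unit while sharing the load
lam_alone = 3e-4   # hypothetical failure rate of the survivor carrying the full load
hours = 10.0       # hypothetical mission duration

p_both, p_one, p_failed = state_probabilities(lam_shared, lam_alone, hours)

# A standard parallel RBD would ignore the increased rate of the surviving unit
naive_unreliability = (1.0 - math.exp(-lam_shared * hours)) ** 2

print(f"Markov unreliability:       {p_failed:.3e}")
print(f"Parallel RBD unreliability: {naive_unreliability:.3e}")
```

With these numbers the standard parallel RBD underestimates the probability of losing the pair, which is the reason the Markov model is needed.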
In the drone case, the failure rates of the actual components were found to be too high to meet the allocated requirements. Therefore, redundancy was added to the design.
Fig. 3 presents the revised design, which includes redundant flap motors and position encoders, CPUs, GPS units and gyros. Fig. 4 presents the Reliability Block Diagram (RBD) that corresponds to Fig. 3.
Two motors and position encoders control each flap. This redundancy and load sharing were modeled using a Markov model (see Fig. 5).
As shown in Fig. 4, the revised design is expected to provide the required mission reliability.
Maintenance and field operation
Another important KPI is the field availability of the drones. In order to guarantee high availability, an efficient maintenance and logistics policy has to be defined. The optimal policy should provide the required system availability at the lowest possible Life-Cycle Cost (LCC).
In order to optimize the maintenance and logistics policy, a model of the product field behavior is required.
BQR’s apmOptimizer uses analytic and numerical calculations in order to predict the system behavior over the Life-Cycle. The apmOptimizer takes as input the previously collected reliability data as well as logistic and maintenance data:
- Operation profile
- Spare parts in the various warehouses
- Transportation and procurement times
- Scheduled inspections
- Scheduled preventive maintenance
- Maintenance tasks and resources (optional)
The output includes predictions for:
- Component failures
- Repair times
- Number of inspections, corrective maintenance tasks and preventive maintenance tasks
- Spare parts consumption
- Availability and downtime
When financial data is added (cost of spare parts, maintenance, inspections and downtime), the apmOptimizer can predict the LCC.
This allows the designer to compare results of various scenarios (“What If” analysis).
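A "What If" comparison of this kind can be illustrated with a deliberately simple cost roll-up. The sketch below is not apmOptimizer's model: it assumes constant yearly figures and no discounting, and all scenario names and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    failures_per_year: float
    cost_per_failure: float        # spare part plus corrective maintenance
    inspections_per_year: float
    cost_per_inspection: float
    downtime_hours_per_year: float
    downtime_cost_per_hour: float

def life_cycle_cost(scenario, years):
    """Undiscounted life-cycle cost with constant yearly figures."""
    annual = (
        scenario.failures_per_year * scenario.cost_per_failure
        + scenario.inspections_per_year * scenario.cost_per_inspection
        + scenario.downtime_hours_per_year * scenario.downtime_cost_per_hour
    )
    return years * annual

# Hypothetical scenarios: more frequent inspections reduce failures and downtime
baseline = Scenario(12.0, 1500.0, 4.0, 500.0, 200.0, 100.0)
more_inspections = Scenario(8.0, 1500.0, 12.0, 500.0, 120.0, 100.0)

years = 10
lcc_baseline = life_cycle_cost(baseline, years)
lcc_what_if = life_cycle_cost(more_inspections, years)

print(f"Baseline LCC: {lcc_baseline:,.0f}")
print(f"What-if LCC:  {lcc_what_if:,.0f}")
```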
Next, the apmOptimizer optimizes the maintenance and logistic policy in order to minimize LCC while keeping a high product availability. apmOptimizer’s analytic approach (as opposed to Monte-Carlo) allows for very fast and accurate comparison between many maintenance and logistics policies, yielding the following recommendations:
- Component repair / discard policy
- Spare part provisioning
- Scheduled inspections
- Scheduled preventive maintenance
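As an illustration of the spare part provisioning recommendation above, a textbook starting point is to size the stock so that demand during the resupply lead time is covered with a target probability, assuming Poisson-distributed failures. This is a minimal sketch and not apmOptimizer's algorithm. The failure rate, lead time and confidence level are hypothetical.

```python
import math

def poisson_cdf(k, mean):
    """Probability of at most k events for a Poisson distribution."""
    return sum(
        math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1)
    )

def spares_needed(units, failure_rate_per_hour, lead_time_hours, confidence):
    """Smallest stock that covers lead-time demand with the given probability."""
    mean_demand = units * failure_rate_per_hour * lead_time_hours
    stock = 0
    while poisson_cdf(stock, mean_demand) < confidence:
        stock += 1
    return stock

# Hypothetical case: 7 units in continuous operation at one site
stock = spares_needed(
    units=7,
    failure_rate_per_hour=1e-3,
    lead_time_hours=500.0,
    confidence=0.95,
)
print(f"Spares to stock: {stock}")
```

A full optimization would also weigh the cost of each spare against the cost of downtime, which this sketch ignores.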
In the drone case, 17 drones are to be stationed at three sites (5 drones at site A, 5 drones at site B, and 7 drones at site C). Fig. 6 shows a screenshot from apmOptimizer regarding the field operation of the drones.
An initial policy was defined based on prior experience. Then, apmOptimizer was used in order to produce a revised maintenance and logistics policy.
The expected LCC was reduced by almost 20% compared to the initial policy, while the expected field availability increased.
Note that the logistics and maintenance policies are reviewed during the Critical Design Review (CDR).
The product design flow that we presented includes essential RAMS activities that ensure the product reliability, availability, maintainability and safety.
The order of activities is designed such that:
- Each activity can use gathered data from previous activities (for example, FTA uses FMECA results).
- RAMS analyses are in sync with the various reviews (SFR, PDR, CDR).
RAMS calculation activities are not a substitute for common sense and good engineering practices. Rather, they are tools for focusing the engineers on potential design issues. Furthermore, RAMS activities are important for improving the organizational RAMS culture. For example, FMECA brainstorming sessions put the engineers in a new position where they have to play “the devil’s advocate”.
References
- The 9th Israel International Conference on Systems Engineering, 2017, http://www.iltam.org/incose_il2017/incose_il2017_61#posincose_il2017_61
- The 2nd International Conference on Reliability Systems Engineering
- Development of Unmanned Aerial Vehicles Maintenance Strategy under an Asset Management Framework, P. Gonçalves, J. Sobral, L. Ferreira, ESREL 2016
- Bone, E. & Bolkcom, C. 2003. Unmanned Aerial Vehicles: Background and Issues for Congress. Report to Congress, Congressional Research Service, Library of Congress, pg. 2
- System Engineering Fundamentals, 2001, DOD, Systems Management College
- NPRD and OREDA
- The Technology Management Handbook, 1999, CRC Press in cooperation with IEEE Press, Ch. 20, p. 27