Overview of System Reliability Models

Building and Using a System Reliability Model

Last Verified July 29, 2024

From the simplest to the most complex system, building and using a reliability model permits the entire team to make better decisions.

Understanding and monitoring system reliability involves knowing both:

the reliability of the elements within the system, and
how those elements relate to each other in terms of reliability.

We use system reliability models to identify weak links and focus resources so that we meet our reliability goals.

Building the model that best meets your team’s needs is one of your roles as a reliability professional.

Reliability Block Diagrams (RBD)

RBD models depict each element of a system as a block in a diagram. They provide a graphical and mathematical model of the system’s reliability, given the reliability of the elements and the relationships among them.

The diagram may not match the functional diagram of the system, because it focuses on the reliability relationships between components or subsystems. For example, within a series system, the RBD will show a string of blocks such that any one block failing results in the system failing.

It looks like a chain, hence the common analogy of “weakest link.”
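As a minimal sketch of the series case, assuming the blocks fail independently, the system reliability is the product of the block reliabilities. The block values below are hypothetical and only illustrate the calculation.

```python
from math import prod


def series_reliability(block_reliabilities):
    """Reliability of a series system: every block must work.

    Assumes the blocks fail independently of one another.
    """
    return prod(block_reliabilities)


# Hypothetical block reliabilities over the same mission time.
blocks = [0.99, 0.95, 0.90]
system = series_reliability(blocks)

print(f"Series system reliability: {system:.4f}")
print(f"Weakest block reliability: {min(blocks):.4f}")
```

Note that the series system is always less reliable than its weakest block, which is why the chain analogy fits.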

Another example has two elements in parallel, such that if one fails, the other keeps the system operating. Parallel structures can get complicated. Variations include:

Standby Redundancy with Equal Failure Rates and Perfect Switching
Standby Redundancy, Equal Failure Rates, Imperfect Switching

Of course, there are unequal failure rate situations, as well as many other situations.
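The sketch below compares a two-element active parallel structure with the two standby cases listed above. It assumes constant (exponential) failure rates, a standby unit that does not fail while dormant, and a single switching probability applied at the moment of switchover. The function names and numbers are hypothetical.

```python
from math import exp


def active_parallel(r1, r2):
    """Two independent elements, both operating; the system fails only if both fail."""
    return 1 - (1 - r1) * (1 - r2)


def standby_two_units(failure_rate, t, switch_reliability=1.0):
    """Two identical units, one operating and one in cold standby.

    R(t) = exp(-lambda * t) * (1 + p_switch * lambda * t)
    With switch_reliability = 1.0 this is the perfect switching case.
    """
    lt = failure_rate * t
    return exp(-lt) * (1 + switch_reliability * lt)


failure_rate = 0.001  # failures per hour (hypothetical)
t = 1000.0            # mission time in hours (hypothetical)
unit = exp(-failure_rate * t)

print(f"Single unit:                  {unit:.4f}")
print(f"Active parallel:              {active_parallel(unit, unit):.4f}")
print(f"Standby, perfect switching:   {standby_two_units(failure_rate, t):.4f}")
print(f"Standby, 90% reliable switch: {standby_two_units(failure_rate, t, 0.9):.4f}")
```

Under these assumptions, an imperfect switch erodes part of the benefit of standby redundancy, which is one reason the switching assumption deserves attention in the model.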

Parallel structures may also include more than two elements. The k out of n structure means the system continues to operate if at least k of the n parallel elements remain functional, thus permitting n – k elements to fail without system failure.
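For identical, independent elements, the k out of n reliability is a binomial sum. The sketch below assumes every element has the same reliability r; the example values are hypothetical.

```python
from math import comb


def k_out_of_n(k, n, r):
    """Probability that at least k of n identical, independent elements work."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


r = 0.9  # hypothetical element reliability
print(f"2 out of 3: {k_out_of_n(2, 3, r):.4f}")
print(f"1 out of 3 (pure parallel): {k_out_of_n(1, 3, r):.4f}")
print(f"3 out of 3 (pure series):   {k_out_of_n(3, 3, r):.4f}")
```

Setting k = 1 gives the pure parallel case and k = n gives the series case, so this one function spans both ends of the range.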

Fault Tree Analysis (FTA)

A fault tree analysis (FTA) is a logical, graphical diagram that starts with an unwanted, undesirable, or anomalous state of a system.

The diagram then lays out the many possible faults and combinations of faults within the subsystems, components, assemblies, software, and parts comprising the system, which may lead to the top-level unwanted fault condition.

The key to these models being effective is to select the important top-level failures or faults to model. For systems with more than a few top-level faults of concern, an RBD may be a better starting point.

FTA models use a set of symbols to relate system elements, events, etc. The creation of a useful FTA is not difficult, yet may take some time to fully depict all the paths that may lead to failure.

FTA provides the design team a way to organize the relationships between elements and events that may lead to (or prevent or mitigate) failures.
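To illustrate how the gates in a fault tree combine event probabilities, here is a minimal sketch using only AND and OR gates. It assumes the basic events are independent; the tree, event names, and probabilities are hypothetical.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if all input events occur (independent events)."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one input event occurs (independent events)."""
    return 1 - prod(1 - p for p in probabilities)


# Hypothetical tree: the top event "no coolant flow" occurs if power is lost
# OR if both redundant pumps fail.
p_power_loss = 0.001
p_pump_a_fails = 0.05
p_pump_b_fails = 0.05

p_both_pumps_fail = and_gate([p_pump_a_fails, p_pump_b_fails])
p_top_event = or_gate([p_power_loss, p_both_pumps_fail])

print(f"Both pumps fail: {p_both_pumps_fail:.6f}")
print(f"Top event:       {p_top_event:.6f}")
```

Real fault trees often include shared causes and repeated events, in which case the independence assumption no longer holds and the simple gate arithmetic above needs refinement.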

FTA is a useful tool for your reliability program.

Success Tree Analysis (STA)

A success tree analysis is very similar to an FTA, except the top event is a success state rather than a failure/fault state.

Instead of focusing on how the system can fail, the model focuses on how the elements of a system relate, including events, such that the system functions as expected.

Markov models

Let’s assume that the future reliability performance of a system relies on the current state of the system, not on its history. This memoryless property is called a Markovian property.

Markov models work well with complex repairable systems when we’re interested in long-term average reliability and availability values.

Kevin Brown offers a nice description of Markov models in an early version of the book “Markov Models and Reliability.”

One of the notable strengths of Markov models for reliability analysis is that they can account for repairs and failures. This makes the technique particularly useful for assessing the long-term average reliability of one or more devices with established maintenance and repair strategies.
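The simplest repairable case is a single device with two states, up and down. The sketch below assumes a constant failure rate and a constant repair rate, and it compares the closed-form long-term availability with a simple numerical integration of the state equations. The rates are hypothetical.

```python
def steady_state_availability(failure_rate, repair_rate):
    """Long-term fraction of time in the up state for a two-state Markov model."""
    return repair_rate / (failure_rate + repair_rate)


def availability_over_time(failure_rate, repair_rate, t_end, dt=0.01):
    """Probability of being up at t_end, starting in the up state.

    Uses simple Euler steps on the two-state equations:
    dP_up/dt = -failure_rate * P_up + repair_rate * P_down
    """
    p_up = 1.0
    for _ in range(round(t_end / dt)):
        p_down = 1.0 - p_up
        p_up += (-failure_rate * p_up + repair_rate * p_down) * dt
    return p_up


failure_rate = 0.01  # failures per hour (hypothetical)
repair_rate = 0.5    # repairs per hour (hypothetical)

print(f"Availability after 1 hour:    {availability_over_time(failure_rate, repair_rate, 1.0):.5f}")
print(f"Availability after 100 hours: {availability_over_time(failure_rate, repair_rate, 100.0):.5f}")
print(f"Steady-state availability:    {steady_state_availability(failure_rate, repair_rate):.5f}")
```

The next state depends only on the current state probabilities, which is the memoryless property at work. Larger models add more states and transition rates but follow the same pattern.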

Petri net models

A Petri net graph is a depiction of a system using symbolic language. The modeling permits the analysis of complex systems or networks of systems.

It is possible to include elements of the system that are neither functional nor failed. In other words, it permits modeling a system when one or more elements are in a degraded state or under repair.

Petri net modeling is useful when the repair/restore times are long compared to operating times, as reliability block diagrams and fault tree analysis approaches assume short or insignificant repair times in most cases.
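As a rough illustration of the idea, the sketch below plays the "token game" on a small, untimed Petri net in which two units share one repair crew. The places, transitions, and firing sequence are hypothetical, and a real analysis would typically use a timed or stochastic Petri net tool rather than this bare-bones version.

```python
# Each transition consumes tokens from its input places and
# produces tokens in its output places.
TRANSITIONS = {
    "degrade": {"inputs": {"up": 1}, "outputs": {"degraded": 1}},
    "fail": {"inputs": {"degraded": 1}, "outputs": {"failed": 1}},
    "start_repair": {"inputs": {"failed": 1, "crew_free": 1},
                     "outputs": {"under_repair": 1}},
    "finish_repair": {"inputs": {"under_repair": 1},
                      "outputs": {"up": 1, "crew_free": 1}},
}


def enabled(marking, name):
    """A transition is enabled when every input place holds enough tokens."""
    inputs = TRANSITIONS[name]["inputs"]
    return all(marking[place] >= tokens for place, tokens in inputs.items())


def fire(marking, name):
    """Return the new marking after firing an enabled transition."""
    if not enabled(marking, name):
        raise ValueError(f"transition {name!r} is not enabled")
    new_marking = dict(marking)
    for place, tokens in TRANSITIONS[name]["inputs"].items():
        new_marking[place] -= tokens
    for place, tokens in TRANSITIONS[name]["outputs"].items():
        new_marking[place] += tokens
    return new_marking


# Two units up, one repair crew available.
marking = {"up": 2, "degraded": 0, "failed": 0, "under_repair": 0, "crew_free": 1}
for step in ["degrade", "fail", "start_repair", "degrade"]:
    marking = fire(marking, step)

print(marking)
print("Can another repair start?", enabled(marking, "start_repair"))
```

After this sequence one unit is under repair and the other is degraded, states that a simple working-or-failed block cannot represent. The shared crew token also shows how a limited repair resource can be modeled.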

Failure mechanism models or Physics of Failure (PoF) models

Elements, and specific components in particular, may fail in one or more ways. Sometimes there are known and dominant failure mechanisms.

Modeling these mechanisms permits us to evaluate design or use changes, differences in use conditions, etc.

Models may be derived from empirical data for specific failure mechanisms and use conditions, or they may be analytically derived and experimentally verified.

PoF models permit you to model specific failure mechanisms in detail. If you know your customers may use the product in different ways or environments, then the PoF model allows you to estimate failure rates or distributions for each customer group.
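As one illustration, the sketch below uses the Arrhenius relationship, a commonly used model for temperature-driven failure mechanisms, to scale a reference life to different customer environments. The choice of Arrhenius is an assumption made for this example, and the activation energy, reference life, and customer groups are hypothetical.

```python
from math import exp

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_life(ref_life, ref_temp_c, use_temp_c, activation_energy_ev):
    """Scale a life estimate from a reference temperature to a use temperature.

    life(T_use) = life(T_ref) * exp((Ea / k) * (1 / T_use - 1 / T_ref))
    Temperatures are converted from Celsius to kelvin.
    """
    t_ref = ref_temp_c + 273.15
    t_use = use_temp_c + 273.15
    exponent = activation_energy_ev / BOLTZMANN_EV_PER_K * (1 / t_use - 1 / t_ref)
    return ref_life * exp(exponent)


# Hypothetical customer groups and their typical operating temperatures (Celsius).
customer_groups = {"office": 25.0, "warehouse": 40.0, "outdoor enclosure": 55.0}

for group, temp_c in customer_groups.items():
    life = arrhenius_life(ref_life=10.0, ref_temp_c=40.0,
                          use_temp_c=temp_c, activation_energy_ev=0.7)
    print(f"{group}: estimated characteristic life {life:.1f} years")
```

The same structure applies to other mechanisms and stresses; only the model inside the function changes.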

Summary

You have options when modeling the reliability of your system.

Simple systems will do fine with basic RBD models supplemented by PoF models. Complex systems, or systems with very high availability requirements, often require Markov or Petri net models and may require specialized resources to create and maintain the system reliability models.

A model has value only if it supports decision-making across the team. Creating a model should support the team’s ability to focus resources, make design decisions, and evaluate risks.

Which model do you typically use, and how well is it working for you? Please feel free to leave a comment below.

Comments

Sathish Rajendran says
March 28, 2019 at 5:34 AM
Hi Fred, can I please get your contact number to talk to you directly? I have some questions about building a reliability model for fall protection products. I would appreciate it if you could spare a few minutes from your busy schedule.
Thanks and Regards,
Sathish Rajendran
- Fred Schenkelberg says
  March 28, 2019 at 10:23 AM
  Sure, sent my contact information directly and in case you miss it, I’m at
  fms@fmsreliability.com and (408) 710-8248
  Let me know when to expect a call, as I receive so many spam calls that I don’t answer unless I know the number or expect the call.
  Cheers,
  Fred
Catherine Chandioux says
June 17, 2021 at 11:28 AM
Hi Fred,
Which software would you recommend for system reliability?
What is the best method/software for forecasting warranty failures and costs?
Thank you and
B.Regards
Catherine C.
- Fred Schenkelberg says
  June 18, 2021 at 5:48 AM
  Hi Catherine,
  I recommend the software that you are familiar with and that does a good job of both modeling and understanding the data.
  The best method is using field data and determining the pattern of failures for each salient failure mechanism. Then use a reliability block diagram to assemble the data into a system model.
  The second-best method is using internally generated time-to-failure data supplemented with time-to-failure data from vendors.
  Another consideration to supplement the data is physics of failure approaches.
  cheers,
  Fred
  PS: Never use parts-count or similar approaches.