In light of the International Day of Failure, Oct 13th, let’s consider failure from a reliability engineer’s point of view. We work to understand and avoid product failures. When a product fails to deliver the desired performance attribute, it is tossed away, returned, replaced, repaired, or tolerated. This may occur before or after the product’s value has been achieved. [Read more…]
Dependability

Ran across an interesting graphic in a new book recently. It single-handedly placed dependability in its proper context. It is an umbrella term that includes most of what we commonly think of as reliability and the other ‘ilities.’ It encompasses the various connotations of dependable and reliable that are conveyed during common use. And, the term dependability permits the overarching context for defining very clearly the various [Read more…]
Root Cause Knowledge and Models

Two short questions to evaluate your knowledge of failure mechanisms (root causes) and common reliability models. The answers will be posted in a comment, later.
Which of the following failure root causes is most likely NOT due to power line variation (electronic-based product)?
A. Circuit design margin exceeded
B. Power dissipation
C. In-rush current response
D. Mechanical fatigue [Read more…]
Two Pumps Problem

Let’s say we have two identical pumps sharing a load in parallel. The failure rate for a pump in this mode of operation is 0.0002 failures per hour. If one pump has to carry the full load alone, that pump’s failure rate increases to 0.0009 failures per hour.
What is the reliability of the two-pump system over a 168-hour week of operation? [Read more…]
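One way to work the numbers is sketched below. It is not necessarily the approach taken in the full post, and it rests on assumptions the excerpt does not state: pump lifetimes are exponential (constant failure rates), the system counts as working while at least one pump runs, and the surviving pump picks up the full load instantly. The function and variable names are illustrative only.

```python
import math


def load_sharing_reliability(t, lam_shared, lam_alone):
    """Reliability at time t of a two-unit load-sharing system.

    Assumes exponential lifetimes: each unit fails at rate lam_shared while
    both run, and the survivor fails at rate lam_alone once it is alone.
    """
    first_failure_rate = 2.0 * lam_shared  # either of the two pumps may fail first

    # Case 1: neither pump fails before t.
    r_both = math.exp(-first_failure_rate * t)

    # Case 2: one pump fails at some time x < t and the survivor lasts until t.
    # This is the integral of 2*lam_shared*exp(-2*lam_shared*x)*exp(-lam_alone*(t-x)) dx
    # over x from 0 to t.
    if math.isclose(first_failure_rate, lam_alone):
        # Limiting form of the integral when the two rates coincide.
        r_one = first_failure_rate * t * math.exp(-lam_alone * t)
    else:
        r_one = (first_failure_rate / (first_failure_rate - lam_alone)) * (
            math.exp(-lam_alone * t) - math.exp(-first_failure_rate * t)
        )

    return r_both + r_one


r_week = load_sharing_reliability(168.0, 0.0002, 0.0009)
print(f"R(168 h) = {r_week:.4f}")
```

Under these assumptions the system reliability over the week comes out at roughly 0.995, noticeably higher than the roughly 0.935 chance that both pumps run the whole week without a failure.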
AQL decision
Recently I received a question related to setting an Acceptable Quality Level (AQL) for a sampling of fielded electricity meters. The question was on how to select the right AQL for use with the sampling plan. I was not sure from the question if the sample would determine if the population would be replaced or not (expensive), or simply an experiment to determine how the meters are doing after 15 years of service (information only). [Read more…]
Permutations and Combinations

A foundational element of probability and statistics is counting. How many ways could something occur? A simple example is a pass/fail criterion: when evaluating a product, there are two possible outcomes. [Read more…]
Hypergeometric Distribution

In those situations where we sample without replacement, meaning the odds change after each sample is drawn, we can use the hypergeometric distribution for modeling. Great, sounds like statistician talk. So, let’s consider a real situation. [Read more…]
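To make "the odds change after each sample is drawn" concrete, here is a minimal sketch of the hypergeometric probability mass function. The lot size, number of defectives, and sample size are hypothetical values chosen only for illustration; they do not come from the post.

```python
from math import comb


def hypergeom_pmf(k, lot_size, defectives, sample_size):
    """P(exactly k defectives in a sample drawn without replacement)."""
    return (
        comb(defectives, k)
        * comb(lot_size - defectives, sample_size - k)
        / comb(lot_size, sample_size)
    )


# Hypothetical lot: 20 units, 3 of them defective, 5 drawn without replacement.
p_none = hypergeom_pmf(0, lot_size=20, defectives=3, sample_size=5)
print(f"P(no defectives in the sample) = {p_none:.4f}")
```

Each draw removes a unit from the lot, so the chance of picking a defective on the next draw depends on what has already been drawn. That dependence is what separates this model from the binomial.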
Poisson Distribution Calculation

Let’s say the results of software testing averaged three defects per 10,000 lines of code. The criterion for release is a 90% probability of 5 or fewer defects per 10k lines.
Is this product ready for release?
The Poisson distribution is appropriate here as it is useful for modeling defects per unit, count per area, or arrivals per hour. For the data (in this case, the defect count per lines of code) to be modeled by the Poisson distribution, the probability of an occurrence (a defect in this case) has to be proportional to the interval (lines of code in this case). Also, the number of occurrences (defects) per interval must be independent (more on statistical independence in another post). [Read more…]
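The release question above can be checked with a few lines of code. This is a minimal sketch that assumes the Poisson model holds and treats the observed average of three defects per 10k lines as the true rate; the helper name `poisson_cdf` is illustrative.

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson random variable with the given mean."""
    return sum(
        math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1)
    )


mean_defects = 3.0      # average defects per 10k lines, from testing
max_defects = 5         # release criterion: 5 or fewer defects
required_prob = 0.90    # release criterion: at least 90% probability

p_ok = poisson_cdf(max_defects, mean_defects)
print(f"P(X <= {max_defects}) = {p_ok:.4f}")
print("Meets release criterion:", p_ok >= required_prob)
```

With a mean of three defects, the probability of five or fewer is about 0.916, which clears the 90% threshold under these assumptions.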
Calculating Lognormal Distribution Parameters

The lognormal distribution has two parameters, μ and σ. These are not the same as the mean and standard deviation (the subject of another post), yet they do describe the distribution, including the reliability function. [Read more…]
Hypothesis Testing

This week in our CRE Test Prep class/webinar we covered the Advanced Statistics section of the CRE Primer, and I felt great that we stayed an hour more going over these tough topics.
One of the topics was Hypothesis Testing.
Let me share some of the questions that arose during that section. [Read more…]
Common Cause Failures

A guest post by Andrew O’Connor, of Relken Engineering Pty Ltd
Common Cause Failures (CCF) are one of the reasons why a classical reliability model of your system may dangerously underestimate the risk of failure. They directly attack the benefits of providing redundancy by creating a single point of failure. In fact, studies have shown that CCF events may contribute between 20% and 80% of the unavailability of safety systems within nuclear reactors [Werner 1994]. This post will “Describe this type of failure (also known as common cause mode failure) and how it affects design for reliability. (Understand)” [CRE BOK III.A.4] [Read more…]
Confidence Limits

Last week during our CRE Test Prep class, we were covering the Basic Statistics section in the CRE Primer and had several questions regarding Confidence Limits for Reliability.
All of them were fair questions, and when students are asking these types of questions, the class gets better… [Read more…]
ALT and HALT

The other day I got a question about the difference between ALT and HALT. There was some confusion probably because of the similar words in the acronym. ALT is Accelerated Life Test, and HALT is Highly Accelerated Life Test. [Read more…]
Effective FMEA Principles

Failure mode and effects analysis, or FMEA, is a tool for the identification and prioritization of possible ways a product or process can fail. The intent is to use that information to make improvements to the product or process.
I think of FMEA (and related processes like FMECA, dFMEA, etc.) as structured brainstorms that provide a means to focus on what’s important. [Read more…]
Reliability Goal

The reliability goal is a key element across the entire product lifecycle. From product definition to determining warranty to judging performance, knowing the goal in clear terms sets the stage for a successful product.
Reliability in engineering terms is the probability of satisfactory product performance within a defined environment over a stated duration. [Read more…]