Coauthored with Mark Fiedeldey
The geometric distribution is a discrete distribution often encountered in reliability work, and it has some valuable properties worth understanding. Let’s take a look at some of these characteristics.
This model is useful when trying to answer the question, “How many trials are needed to get a particular result?” So, the number of trials is not set initially; we need to figure that out.
The geometric distribution is similar to the binomial in that the trials are independent. That is, the result of one trial does not affect any of the other results. Also similar to the binomial, there are only two possible outcomes, such as “pass” and “fail”. And finally, the outcome probability for each trial is the same. In other words, this distribution isn’t used when testing various designs of a part that would likely have different probabilities of failure. But unlike the binomial, the number of trials in the geometric distribution is not fixed in advance.
The probability mass function (PMF) for the geometric distribution, referred to below as the PDF, is written as follows:
$$ \displaystyle\large f\left(x\right)=p\left(1-p\right)^{x-1} $$
where f(x) is the probability that the first failure occurs on the xth trial, and p is the probability of failure for each test sample.
For example, on a fair die there are six possible outcomes, each with a probability of 1/6. Suppose we want to know the probability of rolling a “6” for the first time on the fourth trial, and NOT on any roll prior to the fourth.
Let’s consider the assumptions to determine whether this can be solved using the geometric distribution.
First, given that the outcome of one roll has no impact on the outcome of any other roll, the rolls are independent.
Second, there are only two possible outcomes: “6” or not “6”. Of course, we could roll any of the six numbers on a die, but it’s only the “6” we’re concerned with. The other values are irrelevant in our analysis.
Third, the probability of rolling a “6” is the same for each roll.
And lastly, we have no idea how many times we’re going to have to roll the die to get a 6. It might come up on the first trial. It might be the fifth trial, or it might be the tenth trial. We don’t know. We’re just going to have to roll the die again and again until we see a 6.
Given these four conditions, applying the geometric distribution to this problem is appropriate.
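To make the "roll until we see a 6" idea concrete, here is a minimal simulation sketch in Python. The function names, the number of trials, and the random seed are our own choices for illustration and are not part of the original analysis. It repeats the experiment many times and estimates how often the first 6 lands on a given roll.

```python
import random


def rolls_until_six(rng: random.Random) -> int:
    """Roll a fair six-sided die until a 6 appears; return how many rolls it took."""
    rolls = 0
    while True:
        rolls += 1
        if rng.randint(1, 6) == 6:  # randint includes both endpoints
            return rolls


def estimate_first_six_on(roll_number: int, trials: int = 200_000, seed: int = 1) -> float:
    """Estimate the probability that the first 6 occurs exactly on `roll_number`.

    `trials` and `seed` are illustrative defaults, not values from the article.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rolls_until_six(rng) == roll_number)
    return hits / trials


if __name__ == "__main__":
    # Fraction of simulated experiments whose first 6 came on the fourth roll.
    print(estimate_first_six_on(4))
```

The estimate will wander a little from run to run if the seed is changed, which is a useful reminder that a simulation only approximates the exact probability.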
Let’s plug in the numbers.
We’re interested in the probability of rolling a “6” for the first time on the 4th roll, so x = 4, and the probability of rolling a 6 (or any other number) on a single roll is p = 1/6. Therefore,
$$ \displaystyle\large f\left(4\right)=\frac{1}{6}\left(1-\frac{1}{6}\right)^{4-1} $$
$$ \displaystyle\large f\left(4\right)=0.0965 $$
The probability of rolling a 6 for the first time on the 4th roll is therefore just under 10%.
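The same arithmetic can be checked with a few lines of Python. This is only a sketch; the function name `geometric_pmf` is our own and simply mirrors the formula for f(x) above.

```python
def geometric_pmf(x: int, p: float) -> float:
    """Probability that the first occurrence of the event is on trial x.

    x is the trial number (1, 2, 3, ...) and p is the per-trial probability.
    """
    if x < 1:
        raise ValueError("x must be a positive integer")
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return p * (1 - p) ** (x - 1)


if __name__ == "__main__":
    # First 6 on the fourth roll of a fair die.
    print(round(geometric_pmf(4, 1 / 6), 4))  # 0.0965
```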
Continuing to the cumulative distribution function (CDF), which is the sum of all the individual probabilities up to and including the point that we’re evaluating, we can use the following formula:
$$ \displaystyle\large F\left(x\right)=1-\left(1-p\right)^{x} $$
In this equation, an uppercase “F” is used as a means of distinguishing it from the PDF.
Returning to our example of rolling a die, what’s the probability of rolling a “6” by the third roll? In other words, what’s the probability that the first “6” appears on the first, second, or third roll?
As before, we don’t know in advance how many rolls it will take to get a 6. It could be 1 attempt, 10 attempts, or some other number of rolls.
Plugging our numbers into the CDF, we get:
$$ \displaystyle\large F\left(3\right)=1-\left(1-\frac{1}{6}\right)^{3} $$
$$ \displaystyle\large F\left(3\right)=0.4213 $$
So we have about a 42% chance of rolling a six within the first three rolls of the die.
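As a quick check, the closed-form CDF should agree with adding up the individual probabilities for rolls one, two, and three. The sketch below does both; the function name `geometric_cdf` is ours, chosen for illustration.

```python
def geometric_cdf(x: int, p: float) -> float:
    """Probability that the first occurrence of the event is on or before trial x."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    return 1 - (1 - p) ** x


if __name__ == "__main__":
    p = 1 / 6
    closed_form = geometric_cdf(3, p)
    # Sum of f(1) + f(2) + f(3), using f(k) = p * (1 - p)^(k - 1).
    summed = sum(p * (1 - p) ** (k - 1) for k in range(1, 4))
    print(round(closed_form, 4))  # 0.4213
    print(abs(closed_form - summed) < 1e-12)  # True
```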
Moving to the world of reliability, imagine we’re planning a reliability test on a prototype part that’s expensive to manufacture. We don’t want to have to make any more of them than absolutely necessary. We also have only one test fixture in this scenario so we’re going to have to test all our parts sequentially.
The reliability requirement on our part is 90% on this test. How many parts do we need to include in the test to be reasonably assured that our prototype meets this requirement?
This is a case where using the geometric distribution will help.
Since the reliability of each part is 90%, then the probability of failure, p, is equal to 10%. Using our CDF formula, we can easily create a table of cumulative probabilities in Excel with the following formula:
Filling out our table up to 15 samples, we obtain the following results:
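For readers who would rather not use a spreadsheet, the cumulative probabilities can be regenerated with a short Python sketch. This is not the Excel formula from the article; it simply evaluates the CDF above with p = 0.10 for 1 through 15 samples, and the function name is our own.

```python
def cumulative_failure_table(p: float, max_samples: int) -> list[tuple[int, float]]:
    """Return (sample number, cumulative probability) pairs from the geometric CDF."""
    return [(n, 1 - (1 - p) ** n) for n in range(1, max_samples + 1)]


if __name__ == "__main__":
    # p = 0.10 corresponds to the 90% reliability requirement in the text.
    print("Samples  Cumulative probability")
    for n, prob in cumulative_failure_table(0.10, 15):
        print(f"{n:>7}  {prob:.1%}")
```

The rows for 2, 7, and 15 samples are the ones discussed in the interpretation that follows.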
Interpreting the results, we find for instance that if the second sample fails, we have only a 19% probability that the prototypes are 90% reliable. Chances are, if we have a failure by the second sample, our parts are not good enough and we may want to perform some failure analysis and redesign the parts accordingly.
But if we have no failures, we have to test at least seven samples before having at least a 50% chance that the parts are 90% reliable. Even at 15 samples with no failures, we still have only about an 80% chance (79.4%) of meeting our reliability goal.
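The same CDF can be turned around to ask how many failure-free samples are needed to reach a chosen cumulative probability. The sketch below searches for the smallest such count; the function name and the target values are our own illustrative choices.

```python
def samples_needed(p: float, target: float) -> int:
    """Smallest n for which 1 - (1 - p)**n reaches the target probability.

    Equivalent to rounding up ln(1 - target) / ln(1 - p); a simple loop is
    used here to sidestep floating-point edge cases in the logarithms.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    n = 1
    while 1 - (1 - p) ** n < target:
        n += 1
    return n


if __name__ == "__main__":
    for target in (0.50, 0.80, 0.90):
        print(f"{target:.0%}: {samples_needed(0.10, target)} samples")
```

With p = 0.10 this gives 7 samples for 50%, 16 for 80%, and 22 for 90%, which shows how quickly the sample count grows as the target rises.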
Therefore, if the parts are bad, we’ll find out early. But if the parts are good, it will be expensive to demonstrate with this test that our prototypes meet the reliability requirement. We may want to consider a means of accelerating the test or running the test longer to evaluate whether we actually have a 90% reliable part.
Mark Fiedeldey is a reliability engineer living near Cincinnati, Ohio.