# What is Process Capability Analysis? Really.

I recently read the results of a AAA survey where Americans were asked to rate their own driving abilities. 73% of those surveyed considered themselves “better-than-average” drivers. Obviously only 50% of the drivers can actually be better than average. So it follows that at least 23% of those surveyed are mis-estimating their own skills.

This over-confidence in one’s abilities seems to find its way into all sorts of areas … including process capability analysis. Everyone who’s been around manufacturing for any length of time has certainly heard of Cp and Cpk. Most of them know that “higher is better” when it comes to these indices. And many will nod their heads and smile when you suggest that Cpk and Ppk account for centeredness, whereas Cp and Pp do not. But only a small percentage of manufacturing professionals can cogently answer the question, “What is Process Capability analysis?”

Without referencing any of the mathematics, process capability analysis is simply comparing the specification range for a characteristic to its process range. The specification range is fixed, and in manufacturing, it typically comes from a part’s blueprint. Diameters, lengths and thicknesses specified on working prints are generally assigned tolerances that limit the range of a given part’s acceptability.

The process range is derived by measuring a random sampling of parts, then with the help of a few statistical formulas, generating the expected max and min value for the entire population of parts. The process range is the difference between those max and min values.

A given manufactured part could have dozens of dimensioned features on its blueprint and undergo a variety of processes before it’s completed. Therefore, a capability index (such as Cp, Cpk, Pp, or Ppk) applies only to a specific feature like diameter or length, not to the entire part.

So in fact, higher is better when it comes to process capability. Consider the following generalized formula,

**Process Capability = Specification Range / Process Range**

As the process range gets smaller relative to the fixed specification range, the capability index gets larger. A larger capability index equates to a lower probability that the process will create parts outside the specification limits. Voila! You now understand the essence of process capability.
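
To make the ratio concrete, here is a minimal Python sketch. The function name `capability_ratio` and the numbers fed into it are illustrative assumptions, not values from a real study.

```python
def capability_ratio(spec_range: float, process_range: float) -> float:
    """Generalized capability: specification range divided by process range."""
    if process_range <= 0:
        raise ValueError("process range must be positive")
    return spec_range / process_range


# Hypothetical numbers: the specification range stays fixed at 1.0
# while the process range shrinks from 2.0 to 0.5.
spec_range = 1.0
for process_range in (2.0, 1.0, 0.5):
    print(process_range, capability_ratio(spec_range, process_range))
```

As the loop shows, shrinking the process range against a fixed specification range pushes the ratio up.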

In the next article in this series, we’ll examine the key differences between Cp and Pp, and their corollaries, Cpk and Ppk. But since these indices have far more in common than not, let’s back-burner those differences for the moment and consider their formulas:

$$ \displaystyle\large C_p=\frac{\text{USL}-\text{LSL}}{6 \hat{\sigma }} \hspace{12 mm} P_p=\frac{\text{USL}-\text{LSL}}{6 \sigma } $$

In these formulas, sigma and sigma-hat are both measures of dispersion and estimates of the standard deviation that, for this conversation, we can treat as equivalents^{1}. Decades ago, a super smart mathematician figured out that for a *normal* probability distribution of data (the kind of distribution most typically encountered in manufacturing, nature, social sciences, and just about everywhere else), a span of six standard deviations as calculated from a sample, three on each side of the mean, covers about 99.7% of the values expected within the population from which that sample was drawn. And this idea that, “by a small sample we may judge of the whole piece”, as Miguel de Cervantes so succinctly stated, undergirds the entire field of inferential statistics.
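
The 99.7% figure can be checked with a few lines of Python. This is a small sketch that assumes an exactly normal distribution; the helper name `coverage_within` is made up for illustration.

```python
import math


def coverage_within(k: float) -> float:
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"+/-{k} sigma: {coverage_within(k):.4f}")
# The last line printed is "+/-3 sigma: 0.9973"
```

Three standard deviations on each side of the mean, six in total, is where the "about 99.7%" comes from.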

Where I live in Northeast Ohio, sugar maple trees are remarkably common. In a study I read about maple syrup production in the region, the heights of 112 mature sugar maple trees were measured and the resulting data analyzed. The heights of the trees measured in the study were normally distributed with an arithmetic mean of 87.5 feet and a standard deviation of 6.25 feet. Using this data and what we now know about the normal distribution, we can infer that the range of heights among the population of mature sugar maple trees is 37.5 feet (6 * 6.25 feet). Since we know the mean tree height is 87.5 feet, we can equally distribute this range above and below it to predict the heights between which 99.7% of the mature sugar maple trees in this region will measure:

87.5 + 3 * 6.25 = 106.25 feet

87.5 – 3 * 6.25 = 68.75 feet
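
The same arithmetic as a short Python sketch, using the mean and standard deviation quoted above. The function name `three_sigma_interval` is an illustrative choice.

```python
def three_sigma_interval(mean: float, std_dev: float) -> tuple[float, float]:
    """Return (low, high) limits placed three standard deviations from the mean."""
    half_range = 3 * std_dev
    return mean - half_range, mean + half_range


mean_height_ft = 87.5  # sample mean from the maple tree example
std_dev_ft = 6.25      # sample standard deviation from the same example

low, high = three_sigma_interval(mean_height_ft, std_dev_ft)
print(low, high, high - low)  # 68.75 106.25 37.5
```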

This is quite similar to the procedure used in manufacturing to conduct a process capability study:

- Draw a sample of parts from the population of parts under study.
- Measure the feature in question on each sample.
- Verify the data is normally distributed.
- Calculate the average and standard deviation.
- Calculate predictions about max and min values in the population from which the sample came.
- Compare those predictions to the specification limits.
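
The steps above can be sketched in Python using only the standard library. Everything specific here is an assumption for illustration: the measurements are simulated rather than real, the specification limits are invented, and the function name `capability_study` is hypothetical. The normality check is left out because the standard library has no such test; in practice a dedicated test (for example Shapiro-Wilk, available as `scipy.stats.shapiro`) or a probability plot would fill that step.

```python
import random
import statistics


def capability_study(measurements, lsl, usl):
    """Summarize a sample, predict population limits, and compare them to the spec."""
    mean = statistics.mean(measurements)
    std_dev = statistics.stdev(measurements)  # sample standard deviation
    predicted_min = mean - 3 * std_dev
    predicted_max = mean + 3 * std_dev
    return {
        "mean": mean,
        "std_dev": std_dev,
        "predicted_min": predicted_min,
        "predicted_max": predicted_max,
        "within_spec": lsl <= predicted_min and predicted_max <= usl,
    }


# Stand-in for drawing and measuring parts: simulated values, NOT real data.
rng = random.Random(42)
sample = [rng.gauss(10.0, 0.05) for _ in range(50)]

# The normality check would go here; it is skipped in this sketch.
result = capability_study(sample, lsl=9.7, usl=10.3)
print(result)
```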

The obvious difference between manufactured components and maple trees is the specification limits. Trees have no specs, but of course manufactured parts do.

Imagine we used the above procedure to examine the diameter of a part cut in a lathe operation. In our study, we find a mean of 12.140 mm and a standard deviation of .110 mm among the samples we drew. Assuming our measured values were normally distributed, we can then estimate the process range at .660 mm (6 x .110 mm). If the print specification is 12.150 +/- .400 mm, our specification range is .800 mm. Using our capability formulas,

**Cp (or Pp) = .800 mm / .660 mm = 1.21**

Notice that Cp and Pp take no account of the centeredness of the process within the specification limits. In this example, our average diameter may be the length of a tennis court with all of the parts out of specification, but as long as the standard deviation is .110 mm and the specification range is .800 mm, our Cp will be 1.21.

So why bother? Because we can draw some important conclusions about our process using Cp or Pp. The most important conclusion we can draw in our example is that the specification range is 1.21 times bigger than our process range. Therefore, *if* the process were centered, a very high percentage of the total parts in the population would remain within our specification limits. That’s an important detail to consider.
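
To put a rough number on "a very high percentage", the sketch below computes Cp for the lathe example and then the fraction of parts expected in specification under the hypothetical assumption that the process mean sits exactly at mid-specification. It relies on `statistics.NormalDist` from the Python standard library and assumes the diameters are normally distributed.

```python
from statistics import NormalDist

usl, lsl = 12.550, 11.750  # specification limits from the print, in mm
std_dev = 0.110            # standard deviation from the study, in mm

cp = (usl - lsl) / (6 * std_dev)

# Hypothetical scenario: the process mean is exactly halfway between the limits.
centered_process = NormalDist(mu=(usl + lsl) / 2, sigma=std_dev)
fraction_in_spec = centered_process.cdf(usl) - centered_process.cdf(lsl)

print(f"Cp = {cp:.2f}")
print(f"fraction in spec if centered = {fraction_in_spec:.5f}")
```

Note that the measured mean of 12.140 mm never enters the Cp calculation, which is exactly why Cp alone cannot tell us whether the parts are actually in specification.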

Referring back to the formula,

$$ \displaystyle\large C_p=\frac{\text{USL}-\text{LSL}}{6 \hat{\sigma }} $$

we can see that a Cp of 1.0 requires identically sized process and specification ranges, leaving no room for off-centeredness before parts begin to transgress either the upper or lower specification boundary. From this we can conclude that a higher Cp “buffers” a certain amount of off-centeredness, lowering the probability of out-of-specification parts.

It is this concern over the centeredness of a process within its specification limits that Cpk and Ppk address. Like Cp and Pp, these indices have far more in common than not, and the only mathematical difference between the two formulas is the measure of dispersion. So let’s again set aside their differences and focus on what these indices can tell us about our process.

The formulas are as follows:

$$ \displaystyle\large C_{\text{pk}}=\min \left(\frac{\text{USL}-\bar{x}}{3 \hat{\sigma }},\frac{\bar{x}-\text{LSL}}{3 \hat{\sigma }}\right) $$

$$ \displaystyle\large P_{\text{pk}}=\min \left(\frac{\text{USL}-\bar{x}}{3 \sigma },\frac{\bar{x}-\text{LSL}}{3 \sigma }\right) $$

The first thing we notice is that Cpk and Ppk are the minimum value of two different equations. So when calculating these indices, only half your effort is utilized since one of the equations is not used in the final value.

The second thing people generally notice is that the two equations have the same denominator, 3*sigma (or 3*sigma-hat), what we already know as half the process range. So the equation with the smaller numerator generates the capability index while the equation with the larger numerator falls away. Considering then just the numerators, (USL – mean) and (mean – LSL), the smaller of the two is always the one with the shorter distance between the process mean and the nearest specification limit.

Qualitatively speaking, given that the process range is equally distributed on both sides of the process mean, if the process mean is closer to the upper specification limit than the lower, then it’s more probable that parts will be out of specification on the upper side instead of the lower side of the range. Naturally, the opposite is also true if the process mean is closer to the lower specification limit. This idea that parts are more likely to transgress one specification limit rather than the other is what justifies Cpk and Ppk’s sole focus on the smaller numerator.
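
This one-sidedness can be illustrated with the lathe numbers from earlier (mean 12.140 mm, standard deviation .110 mm, specification 12.150 +/- .400 mm). The sketch assumes a normal distribution and uses `statistics.NormalDist` to estimate the share of parts expected beyond each limit.

```python
from statistics import NormalDist

process = NormalDist(mu=12.140, sigma=0.110)  # lathe example, in mm
usl, lsl = 12.550, 11.750

p_above_usl = 1 - process.cdf(usl)  # expected share of parts over the upper limit
p_below_lsl = process.cdf(lsl)      # expected share of parts under the lower limit

print(f"above USL: {p_above_usl:.6f}")
print(f"below LSL: {p_below_lsl:.6f}")
```

Because the mean sits slightly closer to the lower limit, the lower tail is the larger of the two, and that is the side the "min" in the formulas picks out.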

Cp and Pp are a ratio of the entire specification range for a given characteristic to its entire process range, whereas Cpk and Ppk are a ratio of the distance between the process mean and closest specification limit to half the process range.

Let’s return to our lathe example:

- Average = 12.140 mm
- Standard deviation = .110 mm
- Print specification = 12.150 +/- .400 mm
- USL = 12.550 mm
- LSL = 11.750 mm

Feeding these numbers into our Cpk equation, we obtain:

- Cpk = min[(12.550 – 12.140) / 3(.110), (12.140 – 11.750) / 3(.110)]
- Cpk = min[(1.24), (1.18)]
- Cpk = 1.18
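
The same calculation as a small Python function. The name `cpk_terms` is an illustrative choice; it returns both one-sided ratios alongside the minimum so the discarded term stays visible.

```python
def cpk_terms(mean: float, std_dev: float, lsl: float, usl: float):
    """Return (cpk, upper_ratio, lower_ratio) for a two-sided specification."""
    upper_ratio = (usl - mean) / (3 * std_dev)
    lower_ratio = (mean - lsl) / (3 * std_dev)
    return min(upper_ratio, lower_ratio), upper_ratio, lower_ratio


cpk, upper_ratio, lower_ratio = cpk_terms(
    mean=12.140, std_dev=0.110, lsl=11.750, usl=12.550
)
print(round(upper_ratio, 2), round(lower_ratio, 2), round(cpk, 2))  # 1.24 1.18 1.18
```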

What does this tell us? That the distance from the process mean to the closest specification limit is 1.18 times as large as half the process range. Logically we can conclude that as Cpk or Ppk get larger, the probability of the far-reaching ends of the process range transgressing a specification limit gets smaller.

Here are a few other logical conclusions you can draw from these capability formulas:

- If a process is perfectly centered within specification limits, Cp = Cpk and Pp = Ppk.
- In a given study, Cpk can never be larger than Cp, and Ppk can never be larger than Pp.
- No one capability index is better than the others. They work together to describe a process.
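
The first two conclusions can be checked numerically. The sketch below reuses the lathe specification and standard deviation and tries three process means: the exact midpoint, the measured 12.140 mm, and a hypothetical 12.400 mm chosen only to show a badly off-center process.

```python
import math


def cp(std_dev: float, lsl: float, usl: float) -> float:
    return (usl - lsl) / (6 * std_dev)


def cpk(mean: float, std_dev: float, lsl: float, usl: float) -> float:
    return min(usl - mean, mean - lsl) / (3 * std_dev)


lsl, usl, std_dev = 11.750, 12.550, 0.110
midpoint = (lsl + usl) / 2

cp_value = cp(std_dev, lsl, usl)
for mean in (midpoint, 12.140, 12.400):  # 12.400 is a made-up off-center mean
    cpk_value = cpk(mean, std_dev, lsl, usl)
    print(f"mean={mean:.3f}  Cp={cp_value:.2f}  Cpk={cpk_value:.2f}")

# Centered: the two indices agree (up to floating-point rounding).
print(math.isclose(cp_value, cpk(midpoint, std_dev, lsl, usl)))
```

Cp stays the same on every row because the mean never enters its formula, while Cpk drops as the mean drifts away from the midpoint.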

Now, the next time someone asks you how much you know about process capability analysis, you can confidently respond, “more than the average person.” In the next article in this series, we’ll continue to move from familiarity to mastery of capability analysis by examining the similarities and differences between Cpk and Ppk.

**Footnote**:

- This article uses the conventions sigma and sigma-hat to represent estimations of the standard deviation and the sample standard deviation, respectively. Other texts use sigma to represent the population standard deviation and ‘s’ to represent the sample standard deviation. Regardless of the convention used, the method is effective in measuring process capability.

*Also published on Medium.*
