# Determining the Subgroup Size and Frequency on a Control Chart

Anyone who knows me knows I love hearing from the students who take my online classes. One reason is that they ask the most challenging questions. Here’s a recent question from a student taking my **“Process Capability Analysis”** class:

“In case of time-based sampling, on what basis we decide the sample size and also what is the criteria behind deciding the sampling interval?

“I am a Process Development Scientist for a Pharmaceutical manufacturing Company. Compression of 0.5 million tablets from powder takes around 10 hours to complete. So, to ensure that all my tablets weight is with the specification limits and to study the process capability, I perform stratified sampling. What factors are to be considered while selecting the sample size (number of tablets to be checked for weight) and the interval (total number of sampling points throughout the manufacturing)?”

## Answer:

Excellent questions. Let’s start with the subgroup size. If the subgroup is too large, your chart may be “hypersensitive,” flagging conditions as out of control when in fact they are not. This is called a Type I error (aka “alpha”), analogous to convicting an innocent person. In process control, an excessively large subgroup size could result in over-adjusting an in-control process.

If the subgroup is too small, your chart may lack the sensitivity to identify significant process shifts, increasing the probability of a Type II (aka “Beta”) error. This is analogous to acquitting a criminal, and in the context of process control, may lead to validating an out-of-control process.

And these errors are joined at the hip … for a given subgroup size, as alpha goes up, beta goes down, and vice versa. The trick, therefore, is to find a happy medium.

For most applications, engineers go with standard subgroup sizes like 3, 5 or 7. In his classic reference titled “Quality Engineering Handbook”, Thomas Pyzdek recommends a maximum subgroup size of 9 for X-bar and R charts.
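To get a feel for how subgroup size affects sensitivity, here is a minimal Python sketch. It assumes an X-bar chart with control limits at ±3 standard errors, normally distributed data, and a known process standard deviation. The function name and the one-sigma shift are illustrative choices, not part of any standard.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def detection_probability(n, shift_in_sigma, limit_z=3.0):
    """Chance that a single subgroup mean of size n lands outside the
    control limits after the process mean shifts by `shift_in_sigma`
    process standard deviations (assumes normal data, known sigma)."""
    k = shift_in_sigma * sqrt(n)  # shift expressed in standard errors
    above_ucl = 1.0 - STD_NORMAL.cdf(limit_z - k)
    below_lcl = STD_NORMAL.cdf(-limit_z - k)
    return above_ucl + below_lcl


# Compare the common subgroup sizes for a hypothetical one-sigma shift.
for n in (3, 5, 7, 9):
    p = detection_probability(n, shift_in_sigma=1.0)
    print(f"n={n}: P(detect on next subgroup)={p:.3f}, beta={1 - p:.3f}")
```

Under these assumptions, beta for a given shift falls as the subgroup grows, which is the sensitivity trade-off described above.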

For most applications, the standard subgroup sizes provide an adequate snapshot of your process. In pharmaceuticals, though, rules of thumb may not be good enough. I would expect that an oversight body in your industry has worked through these issues and published guidance for everyone to use. Nonetheless, in order to calculate the optimum subgroup size, you need to know the acceptable limits on your alpha and beta errors, and have a sense of the magnitude of shift you’re trying to detect.

The formula for subgroup size, n is:

$$ \displaystyle n=\frac{\left(Z_{\alpha/2}+Z_{\beta}\right)^{2}\sigma^{2}}{D^{2}}$$

where D is the size of the process shift that bears significance in your application, and σ is the process standard deviation. Maybe D is 1/5 of your process range. Maybe 1/10. It has to be larger than the smallest difference your measurement system can discriminate, but not so small that it’s insignificant. Z_{alpha/2} and Z_{beta} are found using the Z-score tables once you determine your maximum acceptable limits of the Type I and II errors.
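As a rough illustration of the formula, the sketch below uses Python’s standard-library `statistics.NormalDist` in place of a printed Z-score table. The function name is my own, and the input values (a 2 mg standard deviation, a 3 mg shift, and a beta of 0.10) are hypothetical numbers chosen only to show the arithmetic.

```python
from math import ceil
from statistics import NormalDist


def subgroup_size(alpha, beta, sigma, shift):
    """Subgroup size n = (Z_{alpha/2} + Z_{beta})^2 * sigma^2 / D^2,
    rounded up to the next whole unit."""
    z = NormalDist()                       # standard normal
    z_alpha_2 = z.inv_cdf(1 - alpha / 2)   # two-sided Type I risk
    z_beta = z.inv_cdf(1 - beta)           # one-sided Type II risk
    n = (z_alpha_2 + z_beta) ** 2 * sigma ** 2 / shift ** 2
    return ceil(n)


# Hypothetical inputs: sigma = 2 mg, shift D = 3 mg, alpha = 0.0027, beta = 0.10
n = subgroup_size(alpha=0.0027, beta=0.10, sigma=2.0, shift=3.0)
print(n)
```

Because D is squared in the denominator, halving the shift you want to detect roughly quadruples the required subgroup size, so it pays to choose D deliberately.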

In applications where the control limits are set at ±3 standard deviations from the mean, the alpha error is usually set at the corresponding 0.0027, resulting in an alpha/2 of 0.00135. You can see this connection by consulting a Z-score table.
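If you don’t have a Z-score table handy, the same connection can be checked numerically. This is a small sketch that assumes a normal distribution and uses only Python’s standard library:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

# Probability of falling outside +/-3 standard deviations (both tails).
alpha = 2 * (1 - z.cdf(3.0))
alpha_over_2 = alpha / 2

# Going the other way: the Z-score that leaves 0.00135 in the upper tail.
z_alpha_2 = z.inv_cdf(1 - 0.00135)

print(f"alpha={alpha:.4f}, alpha/2={alpha_over_2:.5f}, Z={z_alpha_2:.2f}")
```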

Regarding the sampling interval, I think you’ll find the concept of “rational subgroups” helpful in deciding how to proceed.

Manufacturing pills, like most other processes, involves a variety of raw material, process and tooling variables that change over time and with each stroke of your machine. Maybe you know that the die used to press the pill is expected to last about 10,000 shots, after which you need to shut down and change the die. If your sampling interval is too long, you’re more likely to miss the point at which the die is worn out, and end up with a bunch of defective pills.

If your sampling interval is too short, then the cost of sampling increases with little benefit to show for it. An excessively short sampling interval matters most in destructive testing or with a measurement system that requires hands-on intervention by an operator. With an automated measurement system in a nondestructive test, there may be no cost implications to over-sampling.

Once you determine the sampling interval required to “sneak up” on your wearing die tool and other process variables, you set your sampling interval accordingly. You may decide that weighing a subgroup of pills after every 1,000 pills produced gives the machine operator an adequate opportunity to make adjustments and tool changes with little risk of unknowingly running an out-of-control process.
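To see what such an interval implies for a whole batch, here is a back-of-the-envelope sketch using the figures from the question (500,000 tablets in about 10 hours) and the 1,000-pill interval above. The subgroup size of 5 and the function name are assumptions for illustration, and the 10,000-shot die life is the "maybe" figure from earlier, not a real specification.

```python
def sampling_plan(batch_size, run_hours, interval_tablets, subgroup_size):
    """Summarise a fixed-interval sampling plan for one batch."""
    subgroups = batch_size // interval_tablets
    return {
        "subgroups": subgroups,
        "tablets_weighed": subgroups * subgroup_size,
        "seconds_between_subgroups": run_hours * 3600 / subgroups,
    }


# Figures from the question, plus a hypothetical subgroup size of 5.
plan = sampling_plan(batch_size=500_000, run_hours=10,
                     interval_tablets=1_000, subgroup_size=5)

# Illustrative die life of 10,000 shots: how many subgroups per die?
subgroups_per_die_life = 10_000 // 1_000

print(plan, subgroups_per_die_life)
```

With these numbers a subgroup comes due roughly every 72 seconds, so a plan like this may only be practical if the weighing is quick or automated.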

In high-speed operations with complex measurements, the time required to perform the measurements sometimes drives the sampling interval. If it takes 10 minutes to perform the inspection, then a minimum number of parts will have been produced prior to drawing the next subgroup.
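A quick way to estimate that floor is to multiply the production rate by the inspection time. The sketch below assumes a constant production rate and a single inspection running at a time. It borrows the batch figures from the question and the 10-minute inspection above, and the function name is hypothetical.

```python
from math import ceil


def min_interval_tablets(batch_size, run_minutes, inspection_minutes):
    """Tablets produced while one inspection is under way, i.e. the
    shortest interval at which back-to-back subgroups can be drawn."""
    return ceil(batch_size * inspection_minutes / run_minutes)


# 500,000 tablets over 10 hours (600 minutes), 10-minute inspection.
floor_interval = min_interval_tablets(500_000, 600, 10)
print(floor_interval)
```

If the interval you want is shorter than this floor, the options are a faster measurement method or running inspections in parallel.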

Before setting your interval though, you need to understand how at least your major process variables move over time. Ambient conditions (day to night), temperature cycles, material cycles, etc. all cause the output characteristics (like weight in your case) to vary. Understanding these cycles in your process will help you set the appropriate sampling interval. And as with anything new, you can err on the side of caution by starting with a shorter interval and opening it up later.
