A common tool for testing whether two populations are the same (more precisely, whether their averages differ) is Student's t-test. It is often used in reliability engineering, and in science generally, when we want to investigate whether a factor has caused a change in a response.
A population was assembled in location “A” and another population was assembled in location “B.” Population “A” has an average defect rate of 4%; population “B” has an average defect rate of 5.5%. Does the location of assembly affect the defect rate? That's just a big argument unless we can estimate the statistical likelihood that the difference we measured is more than an overlap of noise.
What are we really asking? We are asking whether the difference between the two failure rates is larger than the natural variability we would see if we measured the same location several times. Could the failure rate at location “A” be 5% next month instead of 4%? That change could flip some people from saying the populations “were different” to saying “they are not different.” But all that happened was that we observed the natural month-to-month variability at the same location. In two months we may measure 4% again at location “A.”
We are effectively looking for a signal-to-noise ratio. Is the difference between the populations just noise, given a small sample size and the expected variability?
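To get a feel for that noise, here is a minimal Python sketch that simulates location “A” month after month with a fixed true defect rate of 4%. The 200 units per month and the function name `simulate_monthly_defect_rates` are hypothetical choices for illustration, not data from this article. Every month comes from the same process, so any spread in the printed rates is pure noise.

```python
import random

def simulate_monthly_defect_rates(true_rate, units_per_month, months, seed=1):
    """Return the observed defect rate for each simulated month.

    Every month draws a fresh batch of units from the same process, so any
    month-to-month change is sampling noise, not a real change in quality.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rates = []
    for _ in range(months):
        # Each unit independently fails with probability true_rate.
        defects = sum(1 for _ in range(units_per_month) if rng.random() < true_rate)
        rates.append(defects / units_per_month)
    return rates

# Hypothetical: location "A" with a true 4% rate and 200 units built per month.
rates_a = simulate_monthly_defect_rates(0.04, 200, months=12)
print([f"{r:.1%}" for r in rates_a])
```

The underlying rate never changes, yet individual months can easily land well above or below 4%.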
A descriptive statistic vs. an inferential statistic
A descriptive statistic only describes the sample we have. Here, that simply means comparing the failure rates: location “B” is 1.5 percentage points higher than location “A.” This doesn't tell us whether the result is likely to happen again.
An inferential statistic doesn't just describe our sample; it tells us what to expect in new samples we do not have. This lets us make inferences about the population beyond our data. The t-test is an inferential statistic, and that makes it a powerful tool!
The t-value is the difference between the group averages divided by the variability within the groups (specifically, the standard error of that difference). (Remember: signal to noise.)
- A big t-value = different groups
- A small t-value = similar groups
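As a sketch of that ratio, the function below computes the classic two-sample (independent) Student's t-value, which pools the variance of both groups. The monthly defect rates are made-up numbers chosen so the averages match the 4% and 5.5% in the opening example, and the function name `students_t` is our own, not from any library.

```python
from math import sqrt
from statistics import mean, variance

def students_t(a, b):
    """Two-sample (independent) Student's t-value with pooled variance.

    Returns (t, degrees_of_freedom).
    """
    na, nb = len(a), len(b)
    # Signal: how far apart the two group averages are.
    signal = mean(a) - mean(b)
    # Noise: pooled sample variance, turned into the standard error of the difference.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    std_err = sqrt(pooled_var * (1 / na + 1 / nb))
    return signal / std_err, na + nb - 2

# Hypothetical monthly defect rates (%) for six months at each location.
loc_a = [4.1, 3.6, 4.5, 3.9, 4.2, 3.7]  # averages 4.0
loc_b = [5.2, 5.9, 5.4, 5.8, 5.1, 5.6]  # averages 5.5
t, df = students_t(loc_a, loc_b)
print(f"t = {t:.2f}, df = {df}")  # → t = -7.91, df = 10
```

The sign only says which group is higher; the size of |t| is what carries the signal-to-noise message.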
Each t-value has a corresponding p-value. The p-value is the probability of getting a t-value at least this large if there were no real difference between the populations, that is, if the pattern in our sample had been produced by random noise alone. It measures how easily we could have been tricked.
- A p-value of .1 means there is a 10% chance we would get results like these from random data. That's too high for my comfort level, so we cannot conclude that there is a real difference between the populations. (Note that this is not proof that they are the same; we just can't rule out noise.)
- A p-value of .01 means there is only a 1% chance we would get results like these from random data. That is unlikely enough that we conclude there is a true difference between the populations.
- A p-value of .05 means a 5% chance we would get results like these from random data. This is the common rule-of-thumb threshold in the statistics world between the two conclusions. So if p = .05, pop some popcorn, sit back, and watch the show as the arguing commences.
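Excel or a statistics package turns the t-value into a p-value for you. To show what that number means, here is a standard-library-only Python sketch that integrates the Student's t density numerically (Simpson's rule) to get the two-sided p-value: the chance that noise alone produces a t-value at least this far from zero. The helper names and the 2,000-step integration are assumptions for illustration; real analyses should rely on a vetted statistics library.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Probability density of Student's t-distribution with df degrees of freedom."""
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=2000):
    """P(|T| >= |t|) when there is no real difference (the null hypothesis).

    Simpson's rule gives the area between 0 and |t|; the two tails are what's left.
    Keep `steps` even, as Simpson's rule requires.
    """
    upper = abs(t)
    h = upper / steps
    area = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = area * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * central)

ALPHA = 0.05  # the common rule-of-thumb threshold
p = two_sided_p_value(-7.906, df=10)
print("real difference" if p < ALPHA else "can't rule out noise")
```

With 10 degrees of freedom, a t-value of about ±2.23 lands right at p = .05; a t-value of about -7.9 is far past it.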
The bigger the samples, the better the signal-to-noise ratio, though there are diminishing returns. A good rule of thumb is to have 25 samples, and no fewer than 10.
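The diminishing returns follow from the arithmetic: with equal group sizes, the noise term shrinks like 1/√n, so the t-value only grows like √n. This sketch assumes two groups of the same size n sharing the same standard deviation; the 1.5-point difference, the 2-point spread, and the name `t_for_equal_groups` are hypothetical.

```python
from math import sqrt

def t_for_equal_groups(diff, sd, n):
    """t-value for two groups of n samples each that share standard deviation sd."""
    std_err = sd * sqrt(2 / n)  # noise shrinks like 1/sqrt(n)
    return diff / std_err

# Hypothetical: a 1.5-point difference in defect rate, 2-point spread within each group.
for n in (5, 10, 25, 50, 100):
    print(n, round(t_for_equal_groups(1.5, 2.0, n), 2))
```

Quadrupling the sample from 25 to 100 per group only doubles the t-value.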
Three common types of data sets for a t-test
- Independent samples: The two groups are unrelated; each unit belongs to only one group. Product from location “A” and product from location “B.”
- Paired samples: Each measurement in one group is matched to a measurement in the other, usually the same subject measured under two different settings. Measure the group at location “A” at 10 volts, then measure the same units in group “A” again at 13 volts.
- One-sample test: Measure a single group and compare it to a known population value. For example, a group's average IQ vs. the known average IQ of the country.
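The other two designs use the same signal-to-noise idea. Here is a minimal Python sketch assuming the usual textbook formulas: the one-sample t compares a group's mean to a known value, and the paired t is simply a one-sample t on the per-unit differences, tested against zero. The function names and the voltage readings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, known_mean):
    """t-value for one group vs. a known population mean. Returns (t, df)."""
    n = len(sample)
    return (mean(sample) - known_mean) / (stdev(sample) / sqrt(n)), n - 1

def paired_t(before, after):
    """Paired t-value: a one-sample test on per-unit differences (0 = no effect)."""
    if len(before) != len(after):
        raise ValueError("paired samples need one 'after' value per 'before' value")
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t(diffs, 0.0)

# Hypothetical readings for the same five units at 10 V and then at 13 V.
at_10v = [0.42, 0.39, 0.45, 0.41, 0.44]
at_13v = [0.47, 0.43, 0.49, 0.46, 0.48]
print(paired_t(at_10v, at_13v))
```

Pairing removes the unit-to-unit variation from the noise term, which is why a paired design can detect smaller effects than two independent groups of the same size.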
The t-test is a very helpful tool to be familiar with. With a little practice you can do the calculations in Excel in two minutes (literally). When that argument is happening live in a meeting, you can stand up, hold your hands out like you are stopping traffic, and say conclusively, “The answer is …” Then just sit back down smugly and wait for that promotion and raise to roll right in.
TLDR (too long didn’t read)
- The t-value is the signal-to-noise ratio: how large the difference we observe is compared with the noise we would expect.
- The p-value is the probability that noise alone would produce a t-value this large, i.e., how easily our data could have tricked us (“P” stands for “Probably should have stayed home today”).
John Kreucher says
Very nice article, Adam, on the t-test!! It was very clear and concise. I plan to share it with my team.