t-test Hypothesis Testing for Means with Unknown Variance

If you have a sample and want to know whether the population it represents has a mean different from some specified value, this is the test for you. In this case, you do not know the actual variance of the population; you only have a sample.

This test is often the second one in a textbook that describes hypothesis testing. It is a useful hypothesis test and applies in many situations as we rarely know the population variance.

Assumptions

A good practice when applying any statistical method is to consider the related assumptions. For this test there are two assumptions involved:

The sample is randomly selected from the population under investigation.
The population distribution is a normal distribution. As the sample size goes up, this becomes less of a concern due to the central limit theorem. In addition, when n > 25 or so, the t-distribution is close enough to the normal distribution that the difference between the z and t-tests is very small.

If either assumption is not true the results of the t-test statistic may not be informative.

Test Setup

The null hypothesis for the t-test is

$$ H_{0}:\mu=\mu_{0} $$

where $-\mu_{0}-$ is the specified value.

Next, specify the alternative hypothesis. There are three choices: a two-sided alternative if you want to detect a change in the mean in either direction, or one of two one-sided alternatives if the test is to detect a shift in one specific direction, either higher or lower.

The alternative hypothesis for a t-test may be:

$$ H_{a}:\:\mu\neq\mu_{0} $$

$$ H_{a}:\:\mu>\mu_{0} $$

$$ H_{a}:\:\mu<\mu_{0} $$

The test statistic expresses how far the sample mean, $-\bar{y}-$, lies from the hypothesized mean, $-\mu_{0}-$, measured in units of the estimated standard error of the mean. The test statistic is

$$ t=\frac{\bar{y}-\mu_{0}}{{s}/{\sqrt{n}}} $$

Here, s is the sample standard deviation and n is the sample size (the number of observations).
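
The formula above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `t_statistic` and the readings are hypothetical and are not taken from the example later in this article. Note that `statistics.stdev` uses the n – 1 denominator, which is the sample standard deviation the formula calls for.

```python
import math
import statistics


def t_statistic(sample, mu0):
    """Return the one-sample t statistic for testing H0: mu = mu0."""
    n = len(sample)
    y_bar = statistics.mean(sample)   # sample mean
    s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
    return (y_bar - mu0) / (s / math.sqrt(n))


# Hypothetical readings, for illustration only.
readings = [10.2, 9.8, 10.5, 10.1, 9.9]
t_value = t_statistic(readings, mu0=10.0)
print(round(t_value, 4))
```

A positive value means the sample mean sits above $-\mu_{0}-$ and a negative value means it sits below.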

The last step for the setup is to determine the rejection region. This is the range of t values for which the sample provides sufficient evidence to reject the null hypothesis in favor of the alternative, suggesting the population mean has indeed changed.

To do this we need to specify the value $-\alpha-$, which corresponds to the risk we are willing to take that the sample indicates a shift of the mean when in fact it has not changed. We call this a type 1 error. If we are willing to accept the risk that 1 in 40 random samples will result in a sample mean falling in the rejection region, we would establish $-\alpha=\dfrac{1}{40}=0.025-$, for example.

We also need to determine the degrees of freedom, df, which determines the specific t-distribution used for calculating the rejection region value. The df value is equal to the sample size minus one, df = n – 1.

We determine the critical value, $-r_{\alpha}-$, that bounds the rejection region using the area under the tail of the t-distribution with the appropriate df. The critical value is the t value beyond which the tail area equals $-\alpha-$, the probability of a type 1 error.

For a two-tailed test, we divide the probability of a type 1 error by two, as the rejection region is divided equally between the two tails of the distribution.
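
Critical values are normally read from a t table or obtained from statistical software. As a rough sketch of what such a lookup does, the code below finds the critical value numerically using only Python's standard library: it integrates the t-distribution density with Simpson's rule and inverts the result by bisection. The function names (`t_pdf`, `t_cdf`, `t_quantile`, `rejects_null`) and the `alternative` labels are hypothetical choices made for this illustration, and the numerical approach is an assumption for teaching purposes, not a replacement for a tested statistics library.

```python
import math


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    # The distribution is symmetric about zero.
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p, df):
    """Value q with P(T <= q) = p, for 0 < p < 1, found by bisection."""
    if p < 0.5:
        return -t_quantile(1 - p, df)
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains q
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def rejects_null(t_stat, alpha, df, alternative="two-sided"):
    """Return True if t_stat falls in the rejection region."""
    if alternative == "two-sided":
        # alpha is split equally between the two tails.
        return abs(t_stat) > t_quantile(1 - alpha / 2, df)
    if alternative == "less":
        return t_stat < t_quantile(alpha, df)
    if alternative == "greater":
        return t_stat > t_quantile(1 - alpha, df)
    raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")
```

For instance, `t_quantile(0.975, 4)` returns approximately 2.776, the two-tailed critical value for $-\alpha=0.05-$ with df = 4, and `t_quantile(0.05, 4)` returns approximately -2.132, the lower-tail critical value for a one-sided test at the same $-\alpha-$.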

Sometimes we approach setting the probability of a type 1 error by setting the confidence, C. This is the probability, stated as a percentage, that the sample will not lead to a type 1 error when the null hypothesis is true. The relationship is

$$ C=\left(1-\alpha\right)100\% $$

Example

Let’s say we have a process creating the top tube for a bicycle and we want to know if a change in the process has indeed lowered the mean weight. We do not know the population standard deviation, thus we use the sample standard deviation. We start by setting up the hypothesis test.

The null hypothesis is that the new process results in the same mean value, $-\mu-$, as the former process, $-\mu_{0}=3.127-$.

$$H_{0}:\mu=\mu_{0}=3.127 $$

The alternative hypothesis is the new process results in a lower mean value.

$$H_{a}:\:\mu<\mu_{0} $$

$$H_{a}:\:\mu<3.127 $$

The test statistic is calculated based on the data collected in a sample of readings. The data is in the following table.

Sample New Process
1 3.110
2 3.095
3 3.115
4 3.120
5 3.125
The mean of the new process sample values, $-\bar{y}-$, is 3.113 and the sample standard deviation, s, is 0.0115 (calculated with the n – 1 denominator). Since n = 5, df = 5 – 1 = 4. Therefore, we can calculate the test statistic, t.

$$t=\frac{\bar{y}-\mu_{0}}{{s}/{\sqrt{n}}} =\frac{3.113-3.127}{\frac{0.0115}{\sqrt{5}}}=-2.72 $$

Given the desire for a confidence value of 95%, or $-\alpha=0.05-$, we can determine the critical value that places a probability of 0.05 in the lower tail of a t-distribution with df = 4. Because the test is one-sided, $-\alpha-$ is not split between the two tails. In this case, $-r_{0.05, 4}=-2.132-$, keeping in mind the t-distribution is symmetrical and we are interested in the lower tail region.

Since the sample mean results in a t-value of -2.72, which is below the critical t value of -2.132, we reject the null hypothesis and conclude the test provides evidence, with 95% confidence, that the top tube process change results in a lower mean weight. The margin is small, however: at the stricter $-\alpha=0.025-$ the critical value is -2.776, and the same data would not lead us to reject the null hypothesis.
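
The arithmetic for the top tube example can be checked with a short script. This is a sketch using only Python's standard library, with some assumptions stated up front: the function name `one_sample_t` is hypothetical, `statistics.stdev` computes the sample standard deviation with the n – 1 denominator, and the constant -2.132 is the tabulated lower-tail critical value for a one-sided test at $-\alpha=0.05-$ with df = 4.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (mean, sample std dev, t statistic, degrees of freedom)."""
    n = len(sample)
    y_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # n - 1 denominator
    t = (y_bar - mu0) / (s / math.sqrt(n))
    return y_bar, s, t, n - 1


# Readings from the new process, as listed in the table above.
weights = [3.110, 3.095, 3.115, 3.120, 3.125]
y_bar, s, t, df = one_sample_t(weights, mu0=3.127)

# Assumed: lower-tail critical value from a t table for alpha = 0.05, df = 4.
T_CRITICAL = -2.132

# One-sided (lower-tail) test: reject H0 when t falls below the critical value.
reject = t < T_CRITICAL

print(f"mean = {y_bar:.3f}, s = {s:.4f}, t = {t:.2f}, df = {df}")
print("reject H0" if reject else "fail to reject H0")
```

Keeping the critical value as a named constant makes it easy to rerun the comparison at a different significance level by substituting the corresponding table value.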

Comments

Larry George says
May 11, 2021 at 2:35 PM
Your article provoked me into dealing with a two-sample test where the sample(s) don’t resemble the population, such as the COVID-19 vaccine phase III trials. Pfizer and Moderna took convenience samples and randomized. The case rate in the placebo group is much lower than in the US population. Type III error is testing against the wrong or restricted hypotheses. Type IV error is wrong model or distributions. Is Type V error testing where sample doesn’t resemble population? What is the cost of non-representative sample??? E.g., sample variance in t-test differs from population variance?
