In the last article, we learned how to determine which effects are statistically significant. This is an important step in developing the predictive model(s), because only the statistically significant factors and interactions belong in the model. If we include insignificant terms, the model will appear to predict better than it really does, and we will overstate its ability to predict the response(s).
In this article, we focus on the development of first-order models. These models are easy to develop for screening experiments, where (for efficiency) each factor is set at only 2 levels. First-order models can contain terms for main effects and interaction effects.
A simple first-order model (with a single main effect only) has the well-known form shown below.
First-Order Models
A simple linear model has the form:
$$ \displaystyle Y=b_{1}X_{1}+b_{0} $$
where b1 is the slope and b0 is the y-intercept.
The slope is the change in Y for a 1-unit change in X1. An effect is the change in Y when X1 moves from its low level to its high level (a change of 2 coded units).
This example has no interaction term, and with only one factor the model is the equation of a straight line. How can we estimate the unknown parameters b1 and b0 (the slope and the intercept)? In two-level designs this is extremely easy, because we use a simple coding system for the low and high levels of each factor. The more involved regression calculations normally used to find model coefficients are not necessary here: with this coding in a balanced design, the least-squares coefficients reduce to simple functions of the effects.
Recall from 7th grade algebra that the slope is the change in the response (y) for a one-unit increase in the factor level (x). Hopefully, this reminds you of a main effect, which is the change in the response (y) as the factor level (x) moves from low to high. If we define the low level as “-1” and the high level as “+1” (and the middle as “0”), then the distance between low and high is two units. So, to derive the slope, we simply divide the effect by two! This is illustrated in the graphic below.
$$ \displaystyle b_{1}=\frac{E_{1}}{2} $$
Each coefficient (for all of the significant main effects and interaction effects) is calculated the same way.
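To make the divide-by-two rule concrete, here is a minimal Python sketch for a single factor. The data are hypothetical (two replicate runs at each coded level), and the helper names `effect` and `least_squares_slope` are illustrative, not from any library. The sketch computes the effect as the average high response minus the average low response, halves it, and compares the result with the ordinary least-squares slope to show that the full regression calculation gives the same answer here.

```python
# Minimal sketch: slope of a one-factor, two-level model from coded data.
# The responses below are hypothetical, chosen only for illustration.

def mean(values):
    return sum(values) / len(values)

def effect(x, y):
    """Average response at the high level (+1) minus average at the low level (-1)."""
    high = [yi for xi, yi in zip(x, y) if xi == 1]
    low = [yi for xi, yi in zip(x, y) if xi == -1]
    return mean(high) - mean(low)

def least_squares_slope(x, y):
    """Ordinary least-squares slope (Sxy / Sxx), used only for comparison."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

x1 = [-1, -1, 1, 1]           # coded factor settings (two runs per level)
y = [24.0, 26.0, 74.0, 76.0]  # hypothetical responses

e1 = effect(x1, y)  # effect of X1
b1 = e1 / 2         # slope = effect / 2
print(e1, b1, least_squares_slope(x1, y))  # prints 50.0 25.0 25.0
```

Both routes give a slope of 25, which is why the shortcut is safe for coded, balanced two-level data.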
What about the intercept term? The y-intercept in the simple linear model is the value of y when x = 0. In our coded system, this is the predicted value of y when x is halfway between the low and high levels. Because we are enforcing a linear relationship and the design has the same number of runs at each level, this is simply the average response! In the example below, the average of 75 and 25 is 50.
The y-intercept is the value of Y when X1 = 0.
It is also the average response (for two-level designs).
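The intercept rule can be checked the same way. This sketch assumes, for illustration only, that the response is 25 at the low level and 75 at the high level; `predict` is a hypothetical helper. Taking the intercept as the plain average relies on the design being balanced (the same number of runs at each level).

```python
# Hypothetical data: one run at each coded level of X1
x1 = [-1, 1]
y = [25.0, 75.0]

low = [yi for xi, yi in zip(x1, y) if xi == -1]
high = [yi for xi, yi in zip(x1, y) if xi == 1]

e1 = sum(high) / len(high) - sum(low) / len(low)  # effect: 75 - 25 = 50
b1 = e1 / 2                                        # slope: 25
b0 = sum(y) / len(y)                               # intercept = average response: 50

def predict(x):
    """Predicted response at coded setting x (hypothetical helper)."""
    return b0 + b1 * x

print(b0, predict(-1), predict(0), predict(1))  # prints 50.0 25.0 50.0 75.0
```

The prediction at x = 0 equals the intercept, and the predictions at -1 and +1 land back on the observed responses.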
How many terms might our model have? It depends on the number of factors and interactions that could be significant. Here are some examples. Note that the number of terms shown is the maximum possible (as only statistically significant terms will be included).
The model form for a two-factor study is:
$$ \displaystyle \hat{Y}=b_{0}+b_{1}X_{1}+b_{2}X_{2}+b_{12}X_{1}X_{2} $$
The 2² model has 4 terms.
The model form for a four-factor study is:
$$ \displaystyle \begin{alignedat}{1}\hat{Y}=\: & b_{0}+b_{1}X_{1}+b_{2}X_{2}+b_{3}X_{3}+b_{4}X_{4}+\\
& b_{12}X_{1}X_{2}+b_{13}X_{1}X_{3}+b_{14}X_{1}X_{4}+\\
& b_{23}X_{2}X_{3}+b_{24}X_{2}X_{4}+b_{34}X_{3}X_{4}+\\
& b_{123}X_{1}X_{2}X_{3}+b_{124}X_{1}X_{2}X_{4}+\\
& b_{134}X_{1}X_{3}X_{4}+b_{234}X_{2}X_{3}X_{4}+\\
& b_{1234}X_{1}X_{2}X_{3}X_{4}
\end{alignedat} $$
The 2⁴ model has 16 terms.
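The term count doubles with each added factor: a two-level model for k factors has at most 2^k terms, because every subset of the factors (including the empty subset, which gives b0) contributes one term. A short sketch that lists the terms with `itertools.combinations`; `model_terms` is an illustrative name, and its labels simply concatenate factor indices, so it assumes fewer than 10 factors.

```python
from itertools import combinations

def model_terms(k):
    """List every possible term of the two-level model for k factors (b0 first)."""
    terms = ["b0"]
    for order in range(1, k + 1):                       # main effects, 2FIs, 3FIs, ...
        for combo in combinations(range(1, k + 1), order):
            label = "".join(str(i) for i in combo)      # e.g. "12" (assumes k < 10)
            variables = "*".join(f"X{i}" for i in combo)
            terms.append(f"b{label}*{variables}")
    return terms

for k in (2, 4):
    print(k, len(model_terms(k)), 2 ** k)  # prints "2 4 4", then "4 16 16"

print(model_terms(2))  # ['b0', 'b1*X1', 'b2*X2', 'b12*X1*X2']
```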
Model Coefficients
To summarize, the model coefficients are calculated as follows:
Model coefficients are determined from the (significant) effects. If factor X1 is significant, the “slope” coefficient for that term is:
$$ \displaystyle b_{1}=\frac{E_{1}}{2} $$
Coefficients on interaction terms are computed the same way. In general, for the XiXj interaction, the term coefficient is:
$$ \displaystyle b_{ij}=\frac{E_{ij}}{2} $$
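The sketch below applies the same rule to an interaction on a hypothetical 2² design (the responses are invented for illustration). The interaction column is the run-by-run product of the coded X1 and X2 columns, its effect is computed exactly like a main effect, and each coefficient is half its effect. For this check only, all four terms are kept; in that saturated case the model should reproduce the observed responses exactly, which is a quick way to catch sign or coding mistakes. In practice, only the significant terms would be kept.

```python
from itertools import product

def mean(values):
    return sum(values) / len(values)

def effect(signs, y):
    """Average response where the column is +1 minus average where it is -1."""
    high = [yi for s, yi in zip(signs, y) if s == 1]
    low = [yi for s, yi in zip(signs, y) if s == -1]
    return mean(high) - mean(low)

# Hypothetical 2^2 full factorial in standard order (X1 changes fastest).
runs = [(x1, x2) for x2, x1 in product((-1, 1), repeat=2)]
y = [20.0, 30.0, 40.0, 70.0]  # hypothetical responses, one per run

# Coded columns; the interaction column is the row-by-row product X1*X2.
columns = {
    "1": [x1 for x1, _ in runs],
    "2": [x2 for _, x2 in runs],
    "12": [x1 * x2 for x1, x2 in runs],
}

b0 = mean(y)  # intercept = average response
coefficients = {name: effect(col, y) / 2 for name, col in columns.items()}
print(b0, coefficients)  # prints 40.0 {'1': 10.0, '2': 15.0, '12': 5.0}

def predict(x1, x2):
    """Full 2^2 model: b0 + b1*X1 + b2*X2 + b12*X1*X2."""
    return (b0 + coefficients["1"] * x1 + coefficients["2"] * x2
            + coefficients["12"] * x1 * x2)

# With all four terms kept, the model reproduces every observed response.
print([predict(x1, x2) for x1, x2 in runs])  # prints [20.0, 30.0, 40.0, 70.0]
```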
We finish this article with an example of writing out the complete model, given the significant effects. Suppose an experiment is designed to understand camera battery life and the factors that may affect it. The response is a quantitative measure of battery life. The factors are:
- X1 – Wall thickness
- X2 – Cover strength
- X3 – Material Type
- X4 – Ambient Temperature
The effects for each of the effect columns in our matrix are shown below. The statistically significant effect columns are highlighted in green. The average response is shown at the bottom of the response column (the last column). Note that each effect is simply the average response for the highs (+) minus the average response for the lows (-) in that column.
Since each coefficient is half of its effect, the predictive model is:
$$ \displaystyle \hat{Y}=46.5-2.5X_{1}-3.5X_{4}-3.5X_{3}X_{4} $$
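As a quick usage check, the battery-life model can be evaluated at any coded settings. The function name below is a hypothetical helper; the coefficients come directly from the model above. X2 (cover strength) does not appear because no term involving it is in the model. Doubling each coefficient recovers the effects implied by the model.

```python
# Coefficients taken from the battery-life model above (coded units, -1 to +1).
B0, B1, B4, B34 = 46.5, -2.5, -3.5, -3.5

def predicted_battery_life(x1, x3, x4):
    """Y-hat = 46.5 - 2.5*X1 - 3.5*X4 - 3.5*X3*X4 (hypothetical helper name)."""
    return B0 + B1 * x1 + B4 * x4 + B34 * x3 * x4

# Each effect is twice its coefficient.
effects = {"X1": 2 * B1, "X4": 2 * B4, "X3X4": 2 * B34}
print(effects)  # prints {'X1': -5.0, 'X4': -7.0, 'X3X4': -7.0}

print(predicted_battery_life(0, 0, 0))     # center point: the average, 46.5
print(predicted_battery_life(-1, -1, -1))  # X1, X3, X4 all low: 49.0
print(predicted_battery_life(-1, 1, -1))   # X3 high instead: 56.0
```

Comparing the last two calls shows the X3X4 interaction at work: with X4 held low, switching X3 from low to high changes the prediction from 49.0 to 56.0, even though X3 has no main-effect term of its own.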
In summary, we have learned how to write out the predictive model for 2-level studies (once the significant effects have been determined).
In the next article, we will focus on using our model to find solutions to our problem.