This website is out of date. The new site is at http://learnche.mcmaster.ca/4C3

# Design and analysis of experiments (2012)

Class date(s): 8, 15, 22, 29 March 2012
Video material (part 1)

Video material(part 2)

Video material (part 3)

Video material (part 4)

Video material (part 5)

Video material (part 6)

Video material (part 7)

Video material (part 8)

Video material (part 9)

Video material (part 10)

## Materials

 Class date: 15, 22 March 2012 I want my notes with: 1x1 (landscape) 2x1 (portrait) 3x1 (portrait) 3x1 (but with space for notes) 2x2 (landscape) 3x2 (portrait) pages per physical page Use page frames?

##  Software code (8 March 2012)

You can use R to find the DoE model:

T <- c(-1, +1, -1, +1)  # centered and scaled temperature
S <- c(-1, -1, +1, +1)  # centered and scaled substrate concentration
y <- c(69, 60, 64, 53)  # conversion is the response, y
mod <- lm(y ~ T + S + T*S)
summary(mod)

Call:
lm(formula = y ~ T + S + T * S)

Residuals:
ALL 4 residuals are 0: no residual degrees of freedom!

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)     61.5         NA      NA       NA
T               -5.0         NA      NA       NA
S               -3.0         NA      NA       NA
T:S             -0.5         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:     1,	Adjusted R-squared:   NaN
F-statistic:   NaN on 3 and 0 DF,  p-value: NA

You could also use MATLAB if you prefer:

% Set up the X matrix:

n = 4;
temperature = [-1, +1, -1, +1];
substrate = [-1, -1, +1, +1];

X = [ ones(n, 1), temperature', substrate', (temperature .* substrate)'];

% or you can type it in directly (but this is error prone)
X = [+1 -1 -1 +1; ...
+1 +1 -1 -1; ...
+1 -1 +1 -1; ...
+1 +1 +1 +1];

y = [69, 60, 64, 53]';

% Regression coefficients = b = [Intercept, b_T, b_S, b_{TS}]
b = inv(X'*X)*X'*y

##  3-factor example (8 March 2012)

Could not pull and update the repository; please report this problem to site administrator.

The data are from a plastics molding factory which must treat its waste before discharge. The $$y$$ variable represents the average amount of pollutant discharged [lb per day], while the 3 factors that were varied were:

• $$C$$: the chemical compound added [A or B]
• $$T$$: the treatment temperature [72°F or 100°F]
• $$S$$: the stirring speed [200 rpm or 400 rpm]
• $$y$$: the amount of pollutant discharged [lb per day]
Experiment Order $$C$$ $$T$$ [°F] $$S$$ [rpm] $$y$$ [lb]
1 5 A 72 200 5
2 6 B 72 200 30
3 1 A 100 200 6
4 4 B 100 200 33
5 2 A 72 400 4
6 7 B 72 400 3
7 3 A 100 400 5
8 8 B 100 400 4

We showed the cube plot for this system in the class on 10 March. From the cube plot we could already see the main factors, and even the CS interaction was noticeable.

• C effect: There are 4 estimates of $$C = \displaystyle \frac{(+25) + (+27) + (-1) + (-1)}{4} = \frac{50}{4} = \bf{12.5}$$

• T effect: There are 4 estimates of $$T = \displaystyle \frac{(+1) + (+3) + (+1) + (+1)}{4} = \frac{6}{4} = \bf{1.5}$$

• S effect: There are 4 estimates of $$S = \displaystyle \frac{(-27) + (-1) + (-29) + (-1)}{4} = \frac{-58}{4} = \bf{-14.5}$$

• CT interaction: There are 2 estimates of $$CT$$. Recall that interactions are calculated as the half difference going from high to low. Consider the change in $$C$$ when

• $$T_\text{high}$$ (at $$S$$ high) = 4 - 5 = -1
• $$T_\text{low}$$ (at $$S$$ high) = 3 - 4 = -1
• First estimate = [(-1) - (-1)]/2 = 0
• $$T_\text{high}$$ (at $$S$$ low) = 33 - 6 = +27
• $$T_\text{low}$$ (at $$S$$ low) = 30 - 5 = +25
• Second estimate = [(+27) - (+25)]/2 = +1
• Average CT interaction = (0 + 1)/2 = 0.5
• You can interchange $$C$$ and $$T$$ and still get the same result.
• CS interaction: There are 2 estimates of $$CS$$. Consider the change in $$C$$ when

• $$S_\text{high}$$ (at $$T$$ high) = 4 - 5 = -1
• $$S_\text{low}$$ (at $$T$$ high) = 33 - 6 = +27
• First estimate = [(-1) - (+27)]/2 = -14
• $$S_\text{high}$$ (at $$T$$ low) = 3 - 4 = -1
• $$S_\text{low}$$ (at $$T$$ low) = 30 - 5 = +25
• Second estimate = [(-1) - (+25)]/2 = -13
• Average CS interaction = (-13 - 14)/2 = -13.5
• You can interchange $$C$$ and $$S$$ and still get the same result.
• ST interaction: There are 2 estimates of $$ST$$: (-1 + 0)/2 = -0.5, calculate in the same way as above.

• CTS interaction: There is only a single estimate of $$CTS$$:

• $$CT$$ effect at high $$S$$ = 0
• $$CT$$ effect at low $$S$$ = +1
• $$CTS$$ interaction = [(0) - (+1)] / 2 = -0.5
• You can calculate this also by considering the $$CS$$ effect at the two levels of $$T$$
• Or, you can calculate this by considering the $$ST$$ effect at the two levels of $$C$$.
• All 3 approaches give the same result.

Next, use computer software to verify that

$y = 11.25 + 6.25x_C + 0.75x_T -7.25x_S + 0.25 x_C x_T -6.75 x_C x_S -0.25 x_T x_S - 0.25 x_C x_T x_S$

The $$\mathbf{X}$$ matrix and $$\mathbf{y}$$ vector used to calculate the least squares model:

$\begin{split}\begin{bmatrix} 5\\30\\6\\33\\4\\3\\5\\4 \end{bmatrix} &=\begin{bmatrix} +1 & -1 & -1 & -1 & +1 & +1 & +1 & -1\\ +1 & +1 & -1 & -1 & -1 & -1 & +1 & +1\\ +1 & -1 & +1 & -1 & -1 & +1 & -1 & +1\\ +1 & +1 & +1 & -1 & +1 & -1 & -1 & -1\\ +1 & -1 & -1 & +1 & +1 & -1 & -1 & +1\\ +1 & +1 & -1 & +1 & -1 & +1 & -1 & -1\\ +1 & -1 & +1 & +1 & -1 & -1 & +1 & -1\\ +1 & +1 & +1 & +1 & +1 & +1 & +1 & +1\\\end{bmatrix}\begin{bmatrix} b_0 \\ b_C \\ b_T \\ b_{S} \\ b_{CT} \\ b_{CS} \\ b_{TS} \\ b_{CTS} \end{bmatrix} \\\mathbf{y} &= \mathbf{X} \mathbf{b}\end{split}$

# Create the design matrix in a quick way in R
C <- T <- S <- c(-1, +1)
design <- expand.grid(C=C, T=T, S=S)
design
C  T  S
1 -1 -1 -1
2  1 -1 -1
3 -1  1 -1
4  1  1 -1
5 -1 -1  1
6  1 -1  1
7 -1  1  1
8  1  1  1

C <- design$C T <- design$T
S <- design$S y <- c(5, 30, 6, 33, 4, 3, 5, 4) # Full factorial model (error prone) mod.full <- lm(y ~ C + T + S + C*T + C*S + S*T + C*T*S) # or the much more powerful notation will expand all terms up to the 3rd order interactions mod.full <- lm( y ~ (C + T + S)^3 ) summary(mod.full) Call: lm(formula = y ~ C + T + S + C * T + C * S + S * T + C * T * S) Residuals: ALL 8 residuals are 0: no residual degrees of freedom! Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 11.25 NA NA NA C 6.25 NA NA NA T 0.75 NA NA NA S -7.25 NA NA NA C:T 0.25 NA NA NA C:S -6.75 NA NA NA T:S -0.25 NA NA NA C:T:S -0.25 NA NA NA # Guide to estimating significant coefficients, ignoring the intercept (1st coefficient) coeff.full <- coef(mod.full)[2:length(coef(mod.full))] # Pareto plot of the absolute coefficients library(lattice) barchart(sort(abs(coeff.full)), xlab="Magnitude of effect", groups=coeff.full>0, col=c("lightblue", "orange")) # Another alternative: qqPlot identifying the 4 largest coefficients) library(car) qqPlot(coeff.full, id.n=length(coeff.full), envelope=FALSE) # Build a partial model with only the significant effects mod.partial <- lm(y ~ C + S + C*S) summary(mod.partial) # notice the standard errors are all the same; why? # and check confidence intervals confint(mod.partial) # Try to add back coefficients that were possibly significant and recheck confidence intervals # The temperature, T, was the next highest coefficient: mod.revised <- update(mod.partial, . ~ . + T) summary(mod.revised) # we see it is now significant at the 90 to 95% level ##  Example from class (15 March 2012) Could not pull and update the repository; please report this problem to site administrator. Your group is developing a new product, but have been struggling to get the product's stability, measured in days, to the level required. You are aiming for a stability value of 50 days or more. Four factors have been considered: • A = monomer concentration: 30% or 50% • B = acid concentration: low or high • C = catalyst level: 2% or 3% • D = temperature: 393K or 423K These eight experiments have been run so far: Experiment Order A B C D Stability 1 5 $$-$$ $$-$$ $$-$$ $$-$$ 40 2 6 $$+$$ $$-$$ $$-$$ $$+$$ 27 3 1 $$-$$ $$+$$ $$-$$ $$+$$ 35 4 4 $$+$$ $$+$$ $$-$$ $$-$$ 21 5 2 $$-$$ $$-$$ $$+$$ $$+$$ 39 6 7 $$+$$ $$-$$ $$+$$ $$-$$ 27 7 3 $$-$$ $$+$$ $$+$$ $$-$$ 27 8 8 $$+$$ $$+$$ $$+$$ $$+$$ 20 1. How was the experimented generated? 2. What is the defining relationship? 3. What will be aliased with A; with D and with BC? 4. Describe the aliasing structure (resolution)? 5. What is the model's intercept; main effect for A; and for the AD interaction? 6. If the least squares model is: $$y = 29.5 -5.75x_A -3.75 x_B -1.25 x_C + 0.75 x_D + 0.50 x_A x_B + 1.0 x_A x_C - 1.0 x_A x_D$$ what is the predicted stability when operating at: • monomer concentration of 25% • low acid concentration • 1.5% catalyst level • a temperature of 408 K ##  Other class examples (15 March 2012) A <- B <- C<- D <- c(-1, 1) d <- expand.grid(A=A, B=B, C=C, D=D) y <- c(45,71,48,65,68,60,80,65,43,100,45,104,75,86,70,96) A <- d$A
B <- d$B C <- d$C
D <- d$D mod.full <- lm(y ~ (A + B + C + D)^4 ) coeff.full <- coef(mod.full)[2:length(coef(mod.full))] # Pareto plot of the absolute coefficients library(lattice) barchart(sort(abs(coeff.full)), xlab="Magnitude of effect", groups=coeff.full>0, col=c("lightblue", "orange")) # Another alternative: qqPlot identifying the 4 largest coefficients) library(car) qqPlot(coeff.full, id.n=length(coeff.full), envelope=FALSE) # Build a partial model with only the significant effects mod.partial <- lm(y ~ A + C + D + A*C + A*D) summary(mod.partial) # and check confidence intervals confint(mod.partial) # Try to add back coefficients that were possibly significant and recheck confidence intervals # The temperature, T, was the next highest coefficient: mod.revised <- update(mod.partial, . ~ . + B + A*B*D) summary(mod.revised) confint(mod.revised) #--------------- A <- B <- C<- D <- c(-1, 1) g <- expand.grid(A=A, B=B, C=C) y = c(40,27, 35, 21, 39, 27, 27, 20) A <- g$A
B <- g$B C <- g$C
D <- A*B*C
D
# [1] -1  1  1 -1  1 -1 -1  1

mod.doe <- lm(y ~ A +B+ C+ D+A*B+A*C+A*D)
summary(mod.doe)

##  General contour plots (22 March 2012)

Try running your own set of response surface optimizations for these processes:

• A simple optimum: $$y = 83 +9.4x_A +7.1x_B -6 x_A x_B - 7.4 x_A^2 -3.7 x_B^2$$
• A stationary ridge: $$y = 83 + 10.0x_A + 5.6x_B - 7.6 x_A x_B - 6.9 x_A^2 - 2.0 x_B^2$$
• A rising ridge: $$y = 83 + 8.8x_A + 8.2x_B - 7.6 x_A x_B - 7.0 x_A^2 - 2.4 x_B^2$$
• A saddle point: $$y = 83 + 11.1x_A + 4.1x_B - 9.4 x_A x_B - 6.5 x_A^2 - 0.4 x_B^2$$

Binary variables should be treated exactly like continuous variables in all respects during the optimization. The only difference is that it makes sense to visualize and use them at the two levels.

bound <- 4.2

N <- 150  # resolution of surface (higher values give smoother plots)

# The lower and upper bounds, in coded units, over which the surface is visualized
A <- seq(-bound, bound ,length=N)
B <- seq(-bound, bound, length=N)
grd <- expand.grid(A=A, B=B)

# A simple optimum:
grd$y <- 83 + 9.4*grd$A + 7.1*grd$B - 6*grd$A*grd$B - 7.4*grd$A**2 - 3.7*grd$B**2 # A stationary ridge: grd$y = 83 + 10.0*grd$A + 5.6*grd$B - 7.6*grd$A*grd$B - 6.9*grd$A**2 - 2.0*grd$B**2

# A rising ridge:
grd$y = 83 + 8.8*grd$A + 8.2*grd$B - 7.6*grd$A*grd$B - 7.0*grd$A**2 - 2.4*grd$B**2 # A saddle point: grd$y = 83 + 11.1*grd$A + 4.1*grd$B + 9.4*grd$A*grd$B - 6.5*grd$A**2 - 0.4*grd$B**2

library(lattice)
contourplot(y ~ A * B,
data = grd,
cuts = 30,
region = TRUE,
col.regions = terrain.colors,
xlab = "A",
ylab = "B",
main = "Predicted response to factors A and B, in coded units"
)

##  Contour plots for response surface methods (22 March 2012)

# Initial set of 4 runs
T <- c(-1,    +1,   -1,   +1,    0)
S <- c(-1,    -1,   +1,   +1,    0)
y <- c(193,  310,  468,  571,  407)

# After first step
T <- c(-1,    +1,   -1,   +1,    0,    1)
S <- c(-1,    -1,   +1,   +1,    0, 2.43)
y <- c(193,  310,  468,  571,  407,  669)

# Second factorial around point 6 as baseline, with points 8, 9, 10, 11
T <- c(-1,    +1,   -1,   +1,    0)
S <- c(-1,    -1,   +1,   +1,    0)
y <- c(694,  725,  620,  642,  688)

# The rest of the code is common to all factorial models
# ------------------------------------------------------
mod <- lm(y ~ T + S + T*S)
summary(mod)

N <- 50  # resolution of surface (higher values give smoother plots)

# The lower and upper bounds, in coded units, over which we want
# to visualize the surface
bound <- 4
T_plot <- seq(-bound, bound ,length=N)
S_plot <- seq(-bound, bound, length=N)
grd <- expand.grid(T=T_plot, S=S_plot)

# Predict directly from least squares model
grd$y <- predict(mod, grd) library(lattice) contourplot(y ~ T * S, data = grd, cuts = 10, region = TRUE, col.regions = terrain.colors, xlab = "Temperature", ylab = "Substrate", main = "Predicted response to factors T and S, in coded units" ) trellis.focus("panel", 1, 1, highlight=FALSE) lpoints(T, S, pch="O", col="red", cex=2) trellis.unfocus() # Add point 13 and 14 (you don't need to do all the CCD experiments) T <- c(-1, +1, -1, +1, 0, 0, +sqrt(2)) S <- c(-1, -1, +1, +1, 0, -sqrt(2), 0) y <- c(694, 725, 620, 642, 688, 720, 699) # Add all CCD points: then add 15, 16 T <- c(-1, +1, -1, +1, 0, 0, +sqrt(2), 0, -sqrt(2)) S <- c(-1, -1, +1, +1, 0, -sqrt(2), 0, sqrt(2), 0) y <- c(694, 725, 620, 642, 688, 720, 699, 610, 663) # Add quadratic terms to the model mod.quad <- lm(y ~ T + S + T*S + I(S^2) + I(T^2)) summary(mod.quad) # Predict directly from least squares model grd$y <- predict(mod.quad, grd)
library(lattice)
contourplot(y ~ T * S,
data = grd,
cuts = 20,
region = TRUE,
col.regions = terrain.colors,
xlab = "Temperature",
ylab = "Substrate",
main = "Predicted response to factors T and S, in coded units"
)
trellis.focus("panel", 1, 1, highlight=FALSE)
lpoints(T, S, pch="O", col="red", cex=2)
trellis.unfocus()