This website is out of date. The new site is at http://learnche.mcmaster.ca/4C3

Design and analysis of experiments (2012)

From Statistics for Engineering

Class date(s): 8, 15, 22, 29 March 2012
Video material (all parts play in Google Chrome):

  • Part 1: download video [227 MB]
  • Part 2: download video [161 MB]
  • Part 3: download video [261 MB]
  • Part 4: download video [166 MB]
  • Part 5: download video [317 MB]
  • Part 6: download video [206 MB]
  • Part 7: download video [306 MB]
  • Part 8: download video [249 MB]
  • Part 9: download video [241 MB]
  • Part 10: download video [143 MB]

Materials

Class date: 15, 22 March 2012

Software code (8 March 2012)

You can use R to fit the DoE model:

T <- c(-1, +1, -1, +1)  # centered and scaled temperature
S <- c(-1, -1, +1, +1)  # centered and scaled substrate concentration
y <- c(69, 60, 64, 53)  # conversion is the response, y
mod <- lm(y ~ T + S + T*S)
summary(mod)
 
Call:
lm(formula = y ~ T + S + T * S)
 
Residuals:
ALL 4 residuals are 0: no residual degrees of freedom!
 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)     61.5         NA      NA       NA
T               -5.0         NA      NA       NA
S               -3.0         NA      NA       NA
T:S             -0.5         NA      NA       NA
 
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:     1,	Adjusted R-squared:   NaN 
F-statistic:   NaN on 3 and 0 DF,  p-value: NA
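Because the coded factor columns are orthogonal, the coefficients reported by `lm()` above can be reproduced by hand: \(\mathbf{X}^T\mathbf{X} = 4\mathbf{I}\), so each coefficient is just the dot product of the corresponding column of \(\mathbf{X}\) with \(\mathbf{y}\), divided by 4. A quick check in R:

```r
# Coded 2^2 factorial: the columns of X are orthogonal, so X'X = 4*I
# and b = (X'X)^{-1} X'y reduces to t(X) %*% y / 4
T <- c(-1, +1, -1, +1)
S <- c(-1, -1, +1, +1)
y <- c(69, 60, 64, 53)
X <- cbind(Intercept = 1, T = T, S = S, TS = T * S)
b <- drop(t(X) %*% y / 4)
b   # 61.5, -5.0, -3.0, -0.5: matching the lm() output above
```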


You could also use MATLAB if you prefer:

% Set up the X matrix:
 
n = 4;
temperature = [-1, +1, -1, +1];
substrate = [-1, -1, +1, +1];
 
X = [ ones(n, 1), temperature', substrate', (temperature .* substrate)'];
 
% or you can type it in directly (but this is error prone)
X = [+1 -1 -1 +1; ...
     +1 +1 -1 -1; ...
     +1 -1 +1 -1; ...
     +1 +1 +1 +1];
 
% Conversion is your y-variable
y = [69, 60, 64, 53]';
 
% Regression coefficients = b = [Intercept, b_T, b_S, b_{TS}]
b = inv(X'*X)*X'*y
 
% (numerically it is better to solve the system than to form the inverse)
% b = X \ y

3-factor example (8 March 2012)


The data are from a plastics molding factory which must treat its waste before discharge. The response, \(y\), is the average amount of pollutant discharged [lb per day], and the 3 factors that were varied were:

  • \(C\): the chemical compound added [A or B]
  • \(T\): the treatment temperature [72°F or 100°F]
  • \(S\): the stirring speed [200 rpm or 400 rpm]
Experiment   Order   C   T [°F]   S [rpm]   y [lb]
     1         5     A     72       200        5
     2         6     B     72       200       30
     3         1     A    100       200        6
     4         4     B    100       200       33
     5         2     A     72       400        4
     6         7     B     72       400        3
     7         3     A    100       400        5
     8         8     B    100       400        4

We showed the cube plot for this system in the class on 10 March. From the cube plot the main effects were already apparent, and even the CS interaction was noticeable.

Next, use computer software to verify that

\[y = 11.25 + 6.25x_C + 0.75x_T -7.25x_S + 0.25 x_C x_T -6.75 x_C x_S -0.25 x_T x_S - 0.25 x_C x_T x_S\]

The \(\mathbf{X}\) matrix and \(\mathbf{y}\) vector used to calculate the least squares model:

\[\begin{split}\begin{bmatrix} 5\\30\\6\\33\\4\\3\\5\\4 \end{bmatrix} &=\begin{bmatrix} +1 & -1 & -1 & -1 & +1 & +1 & +1 & -1\\ +1 & +1 & -1 & -1 & -1 & -1 & +1 & +1\\ +1 & -1 & +1 & -1 & -1 & +1 & -1 & +1\\ +1 & +1 & +1 & -1 & +1 & -1 & -1 & -1\\ +1 & -1 & -1 & +1 & +1 & -1 & -1 & +1\\ +1 & +1 & -1 & +1 & -1 & +1 & -1 & -1\\ +1 & -1 & +1 & +1 & -1 & -1 & +1 & -1\\ +1 & +1 & +1 & +1 & +1 & +1 & +1 & +1\\\end{bmatrix}\begin{bmatrix} b_0 \\ b_C \\ b_T \\ b_{S} \\ b_{CT} \\ b_{CS} \\ b_{TS} \\ b_{CTS} \end{bmatrix} \\\mathbf{y} &= \mathbf{X} \mathbf{b}\end{split}\]
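As a check of the matrix equation above, the least squares solution \(\mathbf{b} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\) can also be computed directly. A sketch that rebuilds the same \(\mathbf{X}\), in the same row order as the `expand.grid()` code below:

```r
# Rebuild X column by column (C varies fastest, then T, then S)
C <- rep(c(-1, 1), times = 4)
T <- rep(rep(c(-1, 1), each = 2), times = 2)
S <- rep(c(-1, 1), each = 4)
X <- cbind(1, C, T, S, C*T, C*S, T*S, C*T*S)
y <- c(5, 30, 6, 33, 4, 3, 5, 4)
 
# Solve the normal equations X'X b = X'y
b <- solve(t(X) %*% X, t(X) %*% y)
drop(b)   # 11.25, 6.25, 0.75, -7.25, 0.25, -6.75, -0.25, -0.25
```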

# Create the design matrix in a quick way in R
C <- T <- S <- c(-1, +1)
design <- expand.grid(C=C, T=T, S=S)
design
   C  T  S
1 -1 -1 -1
2  1 -1 -1
3 -1  1 -1
4  1  1 -1
5 -1 -1  1
6  1 -1  1
7 -1  1  1
8  1  1  1
 
C <- design$C
T <- design$T
S <- design$S
y <- c(5, 30, 6, 33, 4, 3, 5, 4)
 
# Full factorial model (tedious to type, and error prone)
mod.full <- lm(y ~ C + T + S + C*T + C*S + S*T + C*T*S)
 
# or, more compactly, this notation expands all terms up to the 3rd-order interaction
mod.full <- lm( y ~ (C + T + S)^3 ) 
 
summary(mod.full)
Call:
lm(formula = y ~ C + T + S + C * T + C * S + S * T + C * T * S)
 
Residuals:
ALL 8 residuals are 0: no residual degrees of freedom!
 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    11.25         NA      NA       NA
C               6.25         NA      NA       NA
T               0.75         NA      NA       NA
S              -7.25         NA      NA       NA
C:T             0.25         NA      NA       NA
C:S            -6.75         NA      NA       NA
T:S            -0.25         NA      NA       NA
C:T:S          -0.25         NA      NA       NA
 
# Guide to estimating significant coefficients, ignoring the intercept (1st coefficient)
coeff.full <- coef(mod.full)[2:length(coef(mod.full))]
 
# Pareto plot of the absolute coefficients
library(lattice)
barchart(sort(abs(coeff.full)), xlab="Magnitude of effect", groups=coeff.full>0, col=c("lightblue", "orange"))
 
# Another alternative: a normal Q-Q plot, labelling all the coefficients
library(car)
qqPlot(coeff.full, id.n=length(coeff.full), envelope=FALSE)
 
# Build a partial model with only the significant effects
mod.partial <- lm(y ~ C + S + C*S)
summary(mod.partial) # notice the standard errors are all the same; why?
 
# and check confidence intervals
confint(mod.partial)
 
# Try to add back coefficients that were possibly significant and recheck confidence intervals
# The temperature, T, was the next highest coefficient:
mod.revised <- update(mod.partial, . ~ . + T)
summary(mod.revised)
 
# we see it is now significant at the 90 to 95% level


Example from class (15 March 2012)


Your group is developing a new product, but has been struggling to get the product's stability, measured in days, to the level required. You are aiming for a stability value of 50 days or more. Four factors have been considered:

These eight experiments have been run so far:

Experiment   Order   A   B   C   D   Stability
     1         5     -   -   -   -      40
     2         6     +   -   -   +      27
     3         1     -   +   -   +      35
     4         4     +   +   -   -      21
     5         2     -   -   +   +      39
     6         7     +   -   +   -      27
     7         3     -   +   +   -      27
     8         8     +   +   +   +      20

  1. How was this experiment generated?
  2. What is the defining relationship?
  3. What will be aliased with A, with D, and with BC?
  4. What is the aliasing structure (i.e. the design's resolution)?
  5. What is the model's intercept, the main effect for A, and the AD interaction?
  6. If the least squares model is: \(y = 29.5 -5.75x_A -3.75 x_B -1.25 x_C + 0.75 x_D + 0.50 x_A x_B + 1.0 x_A x_C - 1.0 x_A x_D\) what is the predicted stability when operating at:
    • monomer concentration of 25%
    • low acid concentration
    • 1.5% catalyst level
    • a temperature of 408 K
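A hedged sketch of how questions 1 to 3 can be checked in R: the table above is consistent with a half fraction generated by D = ABC, which gives the defining relationship I = ABCD (a resolution IV design), so A is aliased with BCD, D with ABC, and BC with AD.

```r
# Columns copied from the table of eight runs above
A <- c(-1, +1, -1, +1, -1, +1, -1, +1)
B <- c(-1, -1, +1, +1, -1, -1, +1, +1)
C <- c(-1, -1, -1, -1, +1, +1, +1, +1)
D <- c(-1, +1, +1, -1, +1, -1, -1, +1)
 
all(D == A * B * C)        # TRUE: the generator is D = ABC
all(A * B * C * D == +1)   # TRUE: defining relationship I = ABCD
all(B * C == A * D)        # TRUE: BC is aliased with AD
```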

Other class examples (15 March 2012)

A <- B <- C <- D <- c(-1, 1)
d <- expand.grid(A=A, B=B, C=C, D=D)
y <- c(45,71,48,65,68,60,80,65,43,100,45,104,75,86,70,96)
A <- d$A
B <- d$B
C <- d$C
D <- d$D
 
mod.full <- lm(y ~ (A + B + C + D)^4 )
coeff.full <- coef(mod.full)[2:length(coef(mod.full))]
 
# Pareto plot of the absolute coefficients
library(lattice)
barchart(sort(abs(coeff.full)), xlab="Magnitude of effect", groups=coeff.full>0, col=c("lightblue", "orange"))
 
# Another alternative: a normal Q-Q plot, labelling all the coefficients
library(car)
qqPlot(coeff.full, id.n=length(coeff.full), envelope=FALSE)
 
# Build a partial model with only the significant effects
mod.partial <- lm(y ~ A + C + D + A*C + A*D)
summary(mod.partial)
 
# and check confidence intervals
confint(mod.partial)
 
# Try to add back coefficients that were possibly significant and recheck confidence intervals
# B and the ABD interaction were the next largest coefficients:
mod.revised <- update(mod.partial, . ~ . + B + A*B*D)
summary(mod.revised)
confint(mod.revised)
 
 
#---------------
A <- B <- C <- c(-1, 1)
g <- expand.grid(A=A, B=B, C=C)
y <- c(40, 27, 35, 21, 39, 27, 27, 20)
 
A <- g$A
B <- g$B
C <- g$C
D <- A*B*C   # the fourth factor is generated from the other three
D
# [1] -1  1  1 -1  1 -1 -1  1
 
 
mod.doe <- lm(y ~ A + B + C + D + A*B + A*C + A*D)
summary(mod.doe)

General contour plots (22 March 2012)

Try running your own set of response surface optimizations for these processes:

Binary variables are treated exactly like continuous variables when the model is built and optimized. The only difference is that it only makes sense to visualize and operate them at their two levels.

bound <- 4.2
 
N <- 150  # resolution of surface (higher values give smoother plots)
 
# The lower and upper bounds, in coded units, over which the surface is visualized
A <- seq(-bound, bound ,length=N)
B <- seq(-bound, bound, length=N)
grd <- expand.grid(A=A, B=B)
 
# Evaluate ONE of the following surfaces (each assignment overwrites grd$y):
 
# A simple optimum:
grd$y <- 83 + 9.4*grd$A + 7.1*grd$B - 6*grd$A*grd$B - 7.4*grd$A^2 - 3.7*grd$B^2
 
# A stationary ridge:
grd$y <- 83 + 10.0*grd$A + 5.6*grd$B - 7.6*grd$A*grd$B - 6.9*grd$A^2 - 2.0*grd$B^2
 
# A rising ridge:
grd$y <- 83 + 8.8*grd$A + 8.2*grd$B - 7.6*grd$A*grd$B - 7.0*grd$A^2 - 2.4*grd$B^2
 
# A saddle point:
grd$y <- 83 + 11.1*grd$A + 4.1*grd$B + 9.4*grd$A*grd$B - 6.5*grd$A^2 - 0.4*grd$B^2
 
library(lattice)
contourplot(y ~ A * B, 
            data = grd,
            cuts = 30, 
            region = TRUE,
            col.regions = terrain.colors,
            xlab = "A",
            ylab = "B",
            main = "Predicted response to factors A and B, in coded units"
           )
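Rather than reading the optimum off the contour plot, the stationary point of any of these quadratic surfaces can be located analytically: setting the gradient of \(y = b_0 + b_A x_A + b_B x_B + b_{AB}x_Ax_B + b_{AA}x_A^2 + b_{BB}x_B^2\) to zero gives a 2-by-2 linear system. A sketch using the coefficients of the "simple optimum" surface above:

```r
# Stationary point: solve  [2*bAA  bAB; bAB  2*bBB] x = -[bA; bB]
bA <- 9.4; bB <- 7.1; bAB <- -6; bAA <- -7.4; bBB <- -3.7
H  <- matrix(c(2*bAA, bAB, bAB, 2*bBB), nrow = 2)
xs <- solve(H, -c(bA, bB))
xs                # stationary point in coded (A, B) units
eigen(H)$values   # both negative here, so the point is a maximum
```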

Contour plots for response surface methods (22 March 2012)

# Initial factorial (runs 1 to 4) plus a center point (run 5)
T <- c(-1,    +1,   -1,   +1,    0)
S <- c(-1,    -1,   +1,   +1,    0)
y <- c(193,  310,  468,  571,  407)
 
# After the first step in the direction of steepest ascent (run 6)
T <- c(-1,    +1,   -1,   +1,    0,    1)
S <- c(-1,    -1,   +1,   +1,    0, 2.43)
y <- c(193,  310,  468,  571,  407,  669)
 
 
# Second factorial around point 6 as baseline, with points 8, 9, 10, 11
T <- c(-1,    +1,   -1,   +1,    0)
S <- c(-1,    -1,   +1,   +1,    0)
y <- c(694,  725,  620,  642,  688)
 
 
# The rest of the code is common to all factorial models
# ------------------------------------------------------
mod <- lm(y ~ T + S + T*S)
summary(mod)
 
N <- 50  # resolution of surface (higher values give smoother plots)
 
# The lower and upper bounds, in coded units, over which we want
# to visualize the surface
bound <- 4 
T_plot <- seq(-bound, bound ,length=N)
S_plot <- seq(-bound, bound, length=N)
grd <- expand.grid(T=T_plot, S=S_plot)
 
# Predict directly from least squares model
grd$y <- predict(mod, grd)
 
library(lattice)
contourplot(y ~ T * S, 
            data = grd,
            cuts = 10, 
            region = TRUE,
            col.regions = terrain.colors,
            xlab = "Temperature",
            ylab = "Substrate",
            main = "Predicted response to factors T and S, in coded units"
           )
 
trellis.focus("panel", 1, 1, highlight=FALSE)
lpoints(T, S, pch="O", col="red", cex=2)
trellis.unfocus()
 
# Add points 13 and 14 (you don't need to run all the CCD experiments)
T <- c(-1,    +1,   -1,   +1,    0,        0, +sqrt(2))
S <- c(-1,    -1,   +1,   +1,    0, -sqrt(2),        0)
y <- c(694,  725,  620,  642,  688,      720,      699)
 
# Complete the CCD by adding the remaining axial points (15 and 16)
T <- c(-1,    +1,   -1,   +1,    0,        0, +sqrt(2),       0, -sqrt(2))
S <- c(-1,    -1,   +1,   +1,    0, -sqrt(2),        0, sqrt(2),        0)
y <- c(694,  725,  620,  642,  688,      720,      699,     610,      663)
 
# Add quadratic terms to the model
mod.quad <- lm(y ~ T + S + T*S + I(S^2) + I(T^2))
summary(mod.quad)
 
# Predict directly from least squares model
grd$y <- predict(mod.quad, grd)
library(lattice)
contourplot(y ~ T * S, 
            data = grd,
            cuts = 20, 
            region = TRUE,
            col.regions = terrain.colors,
            xlab = "Temperature",
            ylab = "Substrate",
            main = "Predicted response to factors T and S, in coded units"
           )
trellis.focus("panel", 1, 1, highlight=FALSE)
lpoints(T, S, pch="O", col="red", cex=2)
trellis.unfocus()
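The stationary point of the fitted quadratic surface can also be located analytically by setting the gradient to zero, rather than reading it off the contour plot. A self-contained sketch using the nine CCD runs above (the coefficient names are those produced by R's `lm()`):

```r
# Fit the quadratic model to the nine CCD runs
T <- c(-1,  +1,  -1,  +1,   0,        0, +sqrt(2),       0, -sqrt(2))
S <- c(-1,  -1,  +1,  +1,   0, -sqrt(2),        0, sqrt(2),        0)
y <- c(694, 725, 620, 642, 688,     720,      699,     610,      663)
mod.quad <- lm(y ~ T + S + T*S + I(T^2) + I(S^2))
b <- coef(mod.quad)
 
# Gradient = 0:  [2*bTT  bTS; bTS  2*bSS] x = -[bT; bS]
H  <- matrix(c(2*b["I(T^2)"], b["T:S"], b["T:S"], 2*b["I(S^2)"]), nrow = 2)
xs <- solve(H, -c(b["T"], b["S"]))
xs   # stationary point in coded (T, S) units
```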