This website is out of date. The new site is at http://learnche.mcmaster.ca/4C3
Univariate data analysis (2011)
From Statistics for Engineering
 
Course notes
 (PDF) Course notes
 Please print pages from Chapter 2.
 The full PDF is provided so that hyperlinks between sections work as expected.
Projector overheads
Audio recordings of 2011 classes
Date | Material covered (approximate: may differ somewhat) | Audio file
10 January 2011 | About variability, histograms and frequency distributions | No audio available
12 January 2011 | Samples, population, robust methods, central limit theorem, independence | Class 4
13 January 2011 | The normal distribution; testing for normality with the q-q plot | Class 5
17 January 2011 | The \(t\)-distribution and confidence interval for the mean with given variance | Class 6
19 January 2011 | Confidence interval with unknown variance; tests for differences/similarity with a reference set | Class 7
20 January 2011 | Tests for differences without a reference set | Class 8
24 January 2011 | Continued with tests without a reference set; paired tests | Class 9
Thanks to the various students responsible for recording these files and making them available.
Code used in class
Code used to illustrate how the q-q plot is constructed:
    N <- 10

    # What are the quantiles from the theoretical normal distribution?
    index <- seq(1, N)
    P <- (index - 0.5) / N
    theoretical.quantity <- qnorm(P)

    # Our sampled data:
    yields <- c(86.2, 85.7, 71.9, 95.3, 77.1, 71.4, 68.9, 78.9, 86.9, 78.4)
    mean.yield <- mean(yields)   # 80.0
    sd.yield <- sd(yields)       # 8.35

    # What are the quantiles for the sampled data?
    yields.z <- (yields - mean.yield)/sd.yield
    yields.z

    yields.z.sorted <- sort(yields.z)

    # Compare the values in text:
    yields.z.sorted
    theoretical.quantity

    # Compare them graphically:
    plot(theoretical.quantity, yields.z.sorted, asp=1)
    abline(a=0, b=1)

    # Built-in R function to do all the above for you:
    qqnorm(yields)
    qqline(yields)

    # A better function: see http://connectmv.com/tutorials/rtutorial/extendingrwithpackages/
    library(car)
    qqPlot(yields)
Code used to illustrate the central limit theorem's reduction in variance:
    # Show the 3 plots side by side
    layout(matrix(c(1,2,3), 1, 3))

    # Sample the population:
    N <- 100
    x <- rnorm(N, mean=80, sd=5)
    mean(x)
    sd(x)

    # Plot the raw data
    x.range <- range(x)
    plot(x, ylim=x.range, main='Raw data')

    # Subgroups of 2
    subsize <- 2
    x.2 <- numeric(N/subsize)
    for (i in 1:(N/subsize))
    {
        x.2[i] <- mean(x[((i-1)*subsize+1):(i*subsize)])
    }
    plot(x.2, ylim=x.range, main='Subgroups of 2')

    # Subgroups of 4
    subsize <- 4
    x.4 <- numeric(N/subsize)
    for (i in 1:(N/subsize))
    {
        x.4[i] <- mean(x[((i-1)*subsize+1):(i*subsize)])
    }
    plot(x.4, ylim=x.range, main='Subgroups of 4')
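The plots from the class code show the spread shrinking visually as the subgroup size grows. As a rough numerical companion, the sketch below compares the standard deviation of the subgroup means against \(\sigma/\sqrt{n}\), the value expected for averages of \(n\) independent observations. The fixed seed, the larger sample size, and the helper `subgroup.means` are assumptions added for illustration; they are not part of the class code.

```r
# Assumption: same population as the class example (mean 80, sd 5), but with
# a larger N and a fixed seed so the comparison is stable and reproducible.
set.seed(42)
N <- 10000
sigma <- 5
x <- rnorm(N, mean = 80, sd = sigma)

# Hypothetical helper (not from the class code): average consecutive,
# non-overlapping subgroups of size `subsize`.
subgroup.means <- function(x, subsize) {
  n.groups <- floor(length(x) / subsize)
  # matrix() fills column by column, so each column holds one subgroup
  colMeans(matrix(x[1:(n.groups * subsize)], nrow = subsize))
}

# Compare the observed spread of the averages with sigma / sqrt(subsize)
for (subsize in c(1, 2, 4)) {
  observed <- sd(subgroup.means(x, subsize))
  expected <- sigma / sqrt(subsize)
  cat(sprintf("subgroup size %d: observed sd = %.2f, sigma/sqrt(n) = %.2f\n",
              subsize, observed, expected))
}
```

With subgroups of 4 the spread should come out close to half that of the raw data, which matches what the third plot shows.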
Code used to illustrate unpaired and paired tests:
    #d.data <- c(11,18,16,20,12,8,26,12,17,14)
    #m.data <- c(25,27,30,33,16,28,27,12,32,16)
    d.data <- c(11,26,18,16,20,12,8,26,12,17,14)
    m.data <- c(25,3,27,30,33,16,28,27,12,32,16)
    d.n <- length(d.data)
    m.n <- length(m.data)
    d.mean <- mean(d.data)
    m.mean <- mean(m.data)
    d.sd <- sd(d.data)
    m.sd <- sd(m.data)

    # Unpaired difference
    # -------------------
    DOF <- m.n - 1 + d.n - 1
    var.pooled <- ((m.n-1)*(m.sd)^2 + (d.n-1)*(d.sd)^2) / DOF
    sample.diff <- m.mean - d.mean
    denom.sd <- sqrt(var.pooled * (1/m.n + 1/d.n))
    z <- sample.diff / denom.sd
    pt(z, df=DOF)
    ct <- qt(0.975, df=DOF)
    CI.LB <- sample.diff - ct * denom.sd
    CI.UB <- sample.diff + ct * denom.sd
    c(CI.LB, CI.UB)

    # Paired difference
    # -----------------
    diffs <- m.data - d.data
    diffs.mean <- mean(diffs)
    diffs.sd <- sd(diffs)
    c(diffs.mean, diffs.sd)
    diffs.N <- length(diffs)
    t.crit <- qt(0.975, df=diffs.N-1)
    t.crit
    LB <- diffs.mean - t.crit * diffs.sd / sqrt(diffs.N)
    UB <- diffs.mean + t.crit * diffs.sd / sqrt(diffs.N)
    c(LB, UB)
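The manual calculations above can be cross-checked against R's built-in `t.test` function. The sketch below is not part of the class code; it assumes the same two data vectors as printed on this page, and uses `var.equal = TRUE` so that the unpaired test pools the variances the same way the manual calculation does.

```r
# Data copied as printed in the class code above
d.data <- c(11,26,18,16,20,12,8,26,12,17,14)
m.data <- c(25,3,27,30,33,16,28,27,12,32,16)

# Unpaired test with a pooled variance estimate
unpaired <- t.test(m.data, d.data, var.equal = TRUE, conf.level = 0.95)
unpaired$parameter   # degrees of freedom: (m.n - 1) + (d.n - 1)
unpaired$conf.int    # compare with c(CI.LB, CI.UB) from the manual code

# Paired test: works on the element-by-element differences m.data - d.data
paired <- t.test(m.data, d.data, paired = TRUE, conf.level = 0.95)
paired$parameter     # degrees of freedom: diffs.N - 1
paired$conf.int      # compare with c(LB, UB) from the manual code
```

If the two approaches disagree, the most likely causes are a different confidence level or an unpooled (Welch) variance, which is what `t.test` uses when `var.equal` is left at its default.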