Factors
Factor = every aspect of the experimental conditions which may influence the response, as determined in the experiment
- 2 factors are additive when there is no interaction between them.
- Factors can be labelled with a +1 or -1 when there are 2 levels
- Level = 20 °C and 80 °C, for example
effect = ȳ₊ − ȳ₋
Higher-order interactions, for example TPF (temperature × pressure × flow rate), are usually assumed negligible, since they are rarely encountered in practice.
HOW TO CALCULATE MAIN EFFECT
Take the difference between the average of all values with a + and the average of all values with a –
(tutorial 1 question 2a)
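The main-effect recipe above can be sketched in Python; the sign column and response values below are made up for illustration:

```python
# Main effect: average of all responses at + minus the average at -.
def main_effect(signs, y):
    """signs: list of +1/-1 codes for one factor; y: measured responses."""
    plus = [yi for s, yi in zip(signs, y) if s > 0]
    minus = [yi for s, yi in zip(signs, y) if s < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

# Hypothetical 2^2 design: temperature column and four responses.
temperature = [-1, +1, -1, +1]
y = [20.0, 30.0, 24.0, 34.0]
print(main_effect(temperature, y))  # (30+34)/2 - (20+24)/2 = 10.0
```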
HOW TO CALCULATE INTERACTION EFFECT
1. Write down the product of the signs of the factors for each experiment:
+ and + -> +
+ and - -> -
- and + -> -
- and - -> +
2. Take the difference between the average of all values with a + and the average of all values
with a –
(tutorial 1 question 2a)
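The interaction recipe above, as a self-contained Python sketch (sign columns and responses are made up):

```python
# Interaction effect: multiply the two sign columns element-wise, then take
# the difference between the averages of the + and - products.
def effect(signs, y):
    plus = [yi for s, yi in zip(signs, y) if s > 0]
    minus = [yi for s, yi in zip(signs, y) if s < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

# Hypothetical 2^2 design: temperature (T) and pressure (P) columns.
T = [-1, +1, -1, +1]
P = [-1, -1, +1, +1]
TP = [t * p for t, p in zip(T, P)]   # product column: [+1, -1, -1, +1]
y = [20.0, 30.0, 24.0, 38.0]
print(effect(TP, y))  # (20+38)/2 - (30+24)/2 = 2.0
```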
Standard deviation
An effect is usually significant if it is 2-3 times larger than the standard error.
Degrees of freedom = number of measurements - 1
HOW TO CALCULATE THE STANDARD ERROR IF THERE ARE 2 REPLICATES
1. Calculate the difference d between the two replicates (per experiment)
2. Calculate the variance s² = d²/2 (per experiment)
3. Pool the variances: s² = Σd²/(2k), with k the number of experiments
4. Take the square root to get the pooled standard deviation s
5. Standard error of an effect: SE = 2s/√N, with N the total number of runs (2.1) (tutorial 1 question 2b)
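A Python sketch of the duplicate-based standard error, assuming the usual pooling of one degree of freedom per duplicate pair (the measurement pairs are made up):

```python
import math

# Hypothetical duplicate measurements for four experiments.
replicates = [(20.1, 19.9), (30.4, 29.6), (24.2, 23.8), (34.1, 33.9)]

# 1.-2. difference and variance d^2/2 per experiment,
# 3.    pooled variance s^2 = sum(d^2) / (2k),
# 4.-5. standard error of an effect: SE = 2*s/sqrt(N), N = total runs.
d = [a - b for a, b in replicates]
s2 = sum(di ** 2 for di in d) / (2 * len(d))   # pooled variance
s = math.sqrt(s2)
N = 2 * len(replicates)
se_effect = 2 * s / math.sqrt(N)
print(round(se_effect, 4))
```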
Design matrix:
Number of experiments = levels^factors
HOW TO MAKE A REGRESSION MODEL
x are the factors (coded −1/+1)
1. Fill in a, the intercept; it is represented by a column of +, so a is the average of all values
2. Fill in half the main and interaction effects for b, c, d… (an effect spans the two-unit step from x = −1 to x = +1)
The regression model is the line with the values for a, b, c… filled in (tutorial 1 question 3a)
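The recipe above can be checked numerically: for a full factorial the fitted model reproduces the data exactly. A sketch with made-up 2² data, assuming the coded-units convention in which each coefficient is half the corresponding effect:

```python
# Regression model from a 2^2 factorial: y = a + b*x1 + c*x2 + d*x1*x2,
# with a the grand average and b, c, d the effects divided by two.
x1 = [-1, +1, -1, +1]
x2 = [-1, -1, +1, +1]
y = [20.0, 30.0, 24.0, 38.0]

def effect(signs, ys):
    plus = [yi for s, yi in zip(signs, ys) if s > 0]
    minus = [yi for s, yi in zip(signs, ys) if s < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

a = sum(y) / len(y)                                   # intercept = average
b = effect(x1, y) / 2
c = effect(x2, y) / 2
d = effect([i * j for i, j in zip(x1, x2)], y) / 2

model = [a + b * i + c * j + d * i * j for i, j in zip(x1, x2)]
print(model)  # reproduces y exactly for a full factorial
```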
In a fractional factorial design only the most important factors are selected by screening
Screening = selecting the most important factors
In a full factorial design all factors are used
HOW TO MAKE AN EXPERIMENTAL DESIGN
1. Look at the number of experiments, then decide if you can use a full factorial design or a
fractional design
2. Make the design matrix, vary the combinations of + and -
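Step 2 (varying the combinations of + and −) is exactly the Cartesian product of the levels; a minimal sketch:

```python
from itertools import product

# Full factorial design matrix: levels**factors rows of -1/+1 codes.
def full_factorial(n_factors):
    return [list(row) for row in product((-1, +1), repeat=n_factors)]

for row in full_factorial(3):   # 2^3 = 8 experiments
    print(row)
```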
Variable scales
1. Nominal: value is described with words, no ranking
2. Ordinal: also described with words, values are ranked
3. Interval: numbers, equal spacing but no true zero point
4. Ratio: numbers, has a true zero point
Mean and errors
Mean: x̄ = (1/n) Σ xᵢ (4.1)
Mean = average
Mode = most occurring value
Median = middle value
Variance: s² = Σ(xᵢ − x̄)² / (n − 1) (4.3)
Standard deviation: s = √(Σ(xᵢ − x̄)² / (n − 1)) (4.4)
Standard error of the mean: s_x̄ = s/√n (4.6)
Range (R) = difference between highest and lowest value
When the number of measurements is limited, s ≈ R/d₂;
d₂ is a tabulated value, approximated by √n for small n
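The summary statistics above map directly onto Python's standard library (the measurement values are made up):

```python
import statistics

# Made-up measurements to illustrate the summary statistics.
x = [4.0, 5.0, 5.0, 6.0, 10.0]

mean = statistics.mean(x)       # average
median = statistics.median(x)   # middle value
mode = statistics.mode(x)       # most occurring value
s2 = statistics.variance(x)     # sample variance, n-1 in the denominator
s = statistics.stdev(x)         # standard deviation
r = max(x) - min(x)             # range R
print(mean, median, mode, s2, round(s, 3), r)
```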
Repeatability: precision under the same circumstances
Reproducibility: precision under different circumstances
Blunders = personal errors that cause large deviations
Random errors = inevitable errors, related to precision; they increase the spread around the central value
Systematic errors = influence the result in a specific direction, e.g. instrumental errors, impure solutions, reading off values
Distributions
Of discrete variables:
- Uniform: every possible outcome of an experiment is equally likely
- Binomial:
P(X = k) = (n over k) pᵏ (1 − p)ⁿ⁻ᵏ (5.2)
μ = np, σ² = np(1 − p) (5.3)
P(X = k) = probability to find exactly k blue balls (successes) in n attempts
- Poisson distribution, for a low probability of success:
P(X = k) = λᵏ e⁻λ / k!, the probability for k events to occur (observe a given number of events within a fixed interval of time), with λ = np the mean number of events
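Both discrete distributions can be evaluated with nothing but the math module; the numbers (10 draws, p = 0.3, λ = np = 3) are made up:

```python
import math

# Binomial: probability of exactly k successes in n trials with success
# probability p; Poisson: probability of k events when the mean count is lam.
def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)

print(round(binomial_pmf(3, 10, 0.3), 4))   # exact binomial probability
print(round(poisson_pmf(3, 3.0), 4))        # Poisson approximation, lam = n*p
```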
Of continuous variables (real valued numbers):
- Continuous uniform: the probability density is equal for each possible value of the variable
f(x) = 1/(b − a) for a ≤ x ≤ b (5.5)
- Exponential (wait times between events in a Poisson process)
Pdf: f(x) = λe⁻λˣ (5.6)
Cdf: F(x) = 1 − e⁻λˣ (5.7)
- Normal
Pdf: f(x) = 1/(σ√(2π)) · e^(−(x − μ)²/(2σ²)) (5.9)
- Lognormal: measurements are not symmetrically spread around the mean (ln x is normally distributed)
Pdf: f(x) = 1/(xσ√(2π)) · e^(−(ln x − μ)²/(2σ²)) (5.10)
- Student's t: for n → ∞, the Student's t-distribution is exactly equal to the standardized normal distribution. The smaller the number of degrees of freedom, the larger the difference between the normal distribution and the Student's t-distribution
- χ²-distribution: describes how the square of a standard normal variable is distributed
- F-distribution: ratio of two χ²-distributed variables
A Q-Q plot is used to check if data is normally distributed: the sample quantiles are on the y-axis and a theoretical distribution on the x-axis. If the data is normally distributed, the points lie on a straight line.
The binomial distribution is a generalisation of the Bernoulli distribution for the probability of observing a specific number of 'successes' in multiple 'trials'
Confidence interval
HOW TO CALCULATE A CONFIDENCE INTERVAL
When the number of measurements is less than 30, use t instead of u:
95% confidence interval: x̄ ± t · s/√n (with n − 1 degrees of freedom) (6.6)
Look for u in table B2, α = 0.05 unless stated otherwise
(as a rule of thumb:
68% -> u ≈ 1
95% -> u ≈ 2
99% -> u ≈ 3)
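A small-sample sketch of the interval, with made-up measurements; the t-value is a hypothetical lookup for α = 0.05 and 4 degrees of freedom:

```python
import math
import statistics

# 95% confidence interval for the mean of a small sample (n < 30):
# x_bar +/- t * s / sqrt(n), with t read from a t-table for n-1 dof.
x = [10.1, 9.8, 10.3, 10.0, 9.9]
n = len(x)
mean = statistics.mean(x)
s = statistics.stdev(x)

t_tab = 2.776                      # t(0.05, 4), two-sided, table lookup
half_width = t_tab * s / math.sqrt(n)
print(round(mean - half_width, 3), round(mean + half_width, 3))
```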
Hypothesis tests
H0 = no significant difference
H1 = significant difference
Two sided = can be both smaller and higher
One sided = can be either smaller or higher -> look at question, not data
One-sample = compare one sample with reference
Two-sample = compare two samples
Significance level α is mostly 0.05 and gives the probability of a type 1 error
Type 1 error: H0 is incorrectly rejected, probability given by α
Type 2 error: H0 is incorrectly accepted, probability given by β (mostly 0.2)
HOW TO DO A ONE-SAMPLE T-TEST
u_cal = |x̄ − μ| / (s/√n) (7.1)
Look for u_tab in table B1 or B2 (two-sided test) or B3 or B4 (one-sided test)
If u_cal > u_tab, the null hypothesis is rejected
When the sample is small (n less than 30), u is replaced by t (with n − 1 degrees of freedom)
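A one-sample test statistic in Python, with made-up data and reference value; the table lookup for the critical value is left to the reader:

```python
import math
import statistics

# One-sample t-test: t_cal = |mean - mu| / (s / sqrt(n)); compare with the
# tabulated value for n-1 degrees of freedom.
x = [5.1, 4.9, 5.3, 5.2, 5.0]
mu = 5.0                           # hypothetical reference value
n = len(x)
t_cal = abs(statistics.mean(x) - mu) / (statistics.stdev(x) / math.sqrt(n))
print(round(t_cal, 3))
```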
HOW TO DO AN UNPAIRED TWO-SAMPLE T-TEST
t_cal = |x̄₁ − x̄₂| / (s_p · √(1/n₁ + 1/n₂)) (7.2)
s_p² = ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2) (pooled variance) (7.3)
t_cal = |x̄₁ − x̄₂| / √(s₁²/n₁ + s₂²/n₂) (7.4) Welch's test
The null hypothesis is rejected if t_cal > t'
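A sketch of the pooled-variance statistic, assuming the variances were already shown comparable by an F-test (the two samples are made up):

```python
import math
import statistics

# Unpaired two-sample t-test with pooled variance.
x1 = [10.2, 10.4, 10.0, 10.6]
x2 = [9.6, 9.9, 9.7, 10.0]
n1, n2 = len(x1), len(x2)
v1, v2 = statistics.variance(x1), statistics.variance(x2)

# Pooled variance, n1 + n2 - 2 degrees of freedom.
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_cal = (abs(statistics.mean(x1) - statistics.mean(x2))
         / math.sqrt(sp2 * (1 / n1 + 1 / n2)))
print(round(t_cal, 3))
```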
HOW TO TAKE A PAIRED TWO-SAMPLE T-TEST
Paired measurements are taken from the same object or person.
d̄ = (1/n) Σ dᵢ, with dᵢ the difference within each pair (7.6)
t_cal = |d̄| / (s_d/√n) (7.7)
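The paired statistic works on the per-pair differences; a sketch with made-up before/after measurements:

```python
import math
import statistics

# Paired t-test: compute the per-pair differences d_i, then
# t_cal = |mean(d)| / (s_d / sqrt(n)).
before = [7.1, 6.8, 7.4, 7.0]
after = [6.8, 6.6, 7.1, 6.6]
d = [a - b for a, b in zip(before, after)]
n = len(d)
t_cal = abs(statistics.mean(d)) / (statistics.stdev(d) / math.sqrt(n))
print(round(t_cal, 2))
```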
HOW TO DO AN F-TEST
While a t-test compares averages between two groups, an F-test is used to compare the standard deviations (variances) between two groups
1. F_cal = s₁²/s₂² (7.8)
The largest variance is the numerator, the smallest variance is the denominator
2. Look at table B7 or B8 for F_tab; if F_cal < F_tab, H0 is accepted (the variances are not different) and you can pool the variances. If F_cal > F_tab, use Welch's test
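The F statistic itself is a one-liner (samples are made up); the comparison with F_tab from table B7/B8 is left to the reader:

```python
import statistics

# F-test: ratio of the two variances with the largest on top, so F_cal >= 1
# and the test is one-sided.
x1 = [10.2, 10.4, 10.0, 10.6]
x2 = [9.6, 9.9, 9.7, 10.0]
v1, v2 = statistics.variance(x1), statistics.variance(x2)
f_cal = max(v1, v2) / min(v1, v2)
print(round(f_cal, 2))
```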
Always first do an F-test, then a t-test
When to use an F-test:
- When samples are not dependent on each other
- When n < 30
When not to use an F-test:
- Simple comparison of means (you can assume the variances are the same)
- Non-normal data
- One-sample t-test
Because you put the largest variance in the numerator, and the smallest in the denominator, the F-test is one-sided
Non-parametric tests:
Wilcoxon signed-rank test (reader page 81)
1. Calculate the differences between the observations and the reference value
2. Order the differences, ignoring the minus sign
3. Restore the original minus sign in the ordered differences
4. Assign ranks (1, 2, 3, 4…) to the ordered differences, keeping the sign (when 2 values are equal, both get the average rank, e.g. _.5)
5. Calculate the sum of the absolute values of the positive ranks and of the negative ranks
6. The smallest of the 2 sums is T_cal
7. Table B9 lists T_tab; if T_cal ≤ T_tab the null hypothesis is rejected, which is opposite from the usual rule
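The ranking steps can be sketched as a small function; the observations and reference value are made up, and ties get the average rank:

```python
# Wilcoxon signed-rank test against a reference value.
def wilcoxon_t(x, ref):
    d = [xi - ref for xi in x if xi != ref]        # step 1: differences
    ordered = sorted(d, key=abs)                   # step 2: order by |d|
    # Assign ranks 1, 2, 3, ... to |d|; ties get the average rank.
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg = (i + 1 + j) / 2                      # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks.setdefault(abs(ordered[k]), avg)
        i = j
    # Steps 3-6: signed ranks, the two rank sums, smallest sum is T_cal.
    t_plus = sum(ranks[abs(di)] for di in d if di > 0)
    t_minus = sum(ranks[abs(di)] for di in d if di < 0)
    return min(t_plus, t_minus)

print(wilcoxon_t([5.3, 4.8, 5.6, 5.9, 4.9, 5.4], 5.0))
```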
Correlation and regression
The correlation coefficient determines how well the points lie on a straight line; the closer |r| is to 1, the better the correlation between x and y:
r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²) (8.1)
The regression line is y = a + bx, with a the intercept and b the slope
Is the linear relation significant? -> 2 ways:
1. t_cal = |r| · √(n − 2) / √(1 − r²) (8.2)
T_tab in table B5; the degrees of freedom are n − 2 because both the slope and the intercept are estimated
2. Calculate the confidence interval of the slope and intercept; if 1 lies inside the confidence interval of the slope and 0 inside the one for the intercept, the methods give the same results
HOW TO MAKE A REGRESSION LINE
b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² (8.7) (8.5) same equation
a = ȳ − b · x̄
HOW TO CALCULATE THE CONFIDENCE INTERVALS OF A REGRESSION LINE
residuals: eᵢ = yᵢ − ŷᵢ, with ŷᵢ = a + bxᵢ
s_y/x = √(Σeᵢ² / (n − 2)) (8.8)
s_b = s_y/x / √(Σ(xᵢ − x̄)²) (8.9)
s_a = s_y/x · √(Σxᵢ² / (n · Σ(xᵢ − x̄)²)) (8.10)
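A least-squares sketch tying the regression line and its error estimates together; the (x, y) points are made up:

```python
import math

# Least-squares line y = a + b*x, residual standard deviation s_yx, and
# standard errors of slope (s_b) and intercept (s_a).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.1, 8.0, 9.9]
n = len(x)
xm, ym = sum(x) / n, sum(y) / n

sxx = sum((xi - xm) ** 2 for xi in x)
sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
b = sxy / sxx                      # slope
a = ym - b * xm                    # intercept

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
s_yx = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
s_b = s_yx / math.sqrt(sxx)
s_a = s_yx * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))
print(round(b, 3), round(a, 3))
```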
ANOVA
Compares more than 2 series of measurements
H0: all means are equal
H1: at least one mean is significantly different
MS_between = SS_between/(k − 1), with SS_between = Σⱼ nⱼ(x̄ⱼ − x̄)² (9.2)
MS_within = SS_within/(N − k), with SS_within = Σⱼ Σᵢ (xᵢⱼ − x̄ⱼ)² (9.3)
F_cal = MS_between / MS_within (9.4)
- ANOVA doesn’t test whether all group means are different
- ANOVA doesn’t require that the group sizes are equal
- ANOVA doesn’t test whether the variances of different groups are equal
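A one-way ANOVA sketch computing F = MS_between / MS_within for k groups; the three measurement series are made up:

```python
import statistics

# One-way ANOVA: between-group and within-group mean squares.
groups = [
    [10.1, 10.3, 10.2],
    [10.6, 10.8, 10.7],
    [10.2, 10.1, 10.3],
]
k = len(groups)
N = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups) / N

ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
ms_between = ss_between / (k - 1)     # k - 1 degrees of freedom
ms_within = ss_within / (N - k)       # N - k degrees of freedom
f_cal = ms_between / ms_within
print(round(f_cal, 1))
```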