STA215 Test2 F 2014 A Solutions
TA (circle one): Eman, Thomas, Wael, Sudipta, Narges
Tutorial (circle one): Wed 11-12, Wed 12-1, Wed 1-2, Wed 5-6, Mon 4-5, Tues 12-1, Tues 1-2, Wed 4-5, Tues 9-10, Tues 10-11, Mon 4-5, Mon 5-6, Wed 9-10, Wed 10-11
Check that you have all the consecutively numbered pages of this test. Please give all probabilities
and proportions to four decimal places unless the trailing digits are unnecessary zeroes.
Best marks go to best answers, as a general rule, particularly where some explanation is requested, so
try to be complete but also clear and concise; a lot of nonsense can decrease your grade.
Show your work and answer in the space provided (or indicate clearly where to look), and in
ink. Pencil may be used, but then requests for remarking will not be accepted.
Marks are shown in brackets at the end of the question parts, and are distributed as follows:

Question   Max   Grade
1            5
2           15
3           10
4           10
Total       40
Good luck!
1) Below is some R Commander output from a regression analysis performed on some human
subjects. Each person's heat output during a particular exercise was plotted against their mass (in
kg), as shown below.
[5]
[Scatterplot: Heat Output (y-axis) vs. Mass in kg (x-axis)]

Call:
lm(formula = Heat ~ Mass, data = Dataset)

Residuals:
    Min      1Q  Median      3Q     Max
-19.964 -10.898   1.169   7.530  29.036

Coefficients:
            Estimate Std. Error t value
(Intercept) 129.8182     7.3561   17.65
Mass          3.8574     0.1924   20.05

Residual standard error: 14.81 on 21 DF
R-squared: 0.9503
F-statistic: 401.8 on 1 and 21 DF
p-value: 3.587e-15
a) Give the equation of best fit obtained from the software. (1)
$\widehat{\text{Heat}} = 129.8182 + 3.8574\,\text{Mass}$
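b) Something was minimized to obtain this equation. Explain what was minimized in nonstatistical language, as though you are speaking to a first-year science student.
The line was chosen to make the total of the squared vertical gaps as small as possible: for each person, take the difference between their actual heat output and the heat output the line predicts for their mass, square it, and add these up over everyone. (1)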
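The least-squares idea in b) can be illustrated with a short sketch. The test does not list the 23 observations, so the (mass, heat) pairs below are hypothetical. The code computes the usual closed-form least-squares intercept and slope and then checks that moving the line (tilting the slope or shifting the intercept) makes the sum of squared vertical distances larger.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared vertical distances."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_sq_resid(xs, ys, intercept, slope):
    """Sum of squared vertical distances from each point to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical (mass in kg, heat output) pairs, for illustration only.
mass = [22.0, 28.0, 33.0, 39.0, 45.0, 51.0]
heat = [215.0, 240.0, 258.0, 281.0, 303.0, 325.0]

b0, b1 = least_squares(mass, heat)
best = sum_sq_resid(mass, heat, b0, b1)
# Any other line does worse, e.g. tilting the slope slightly:
worse = sum_sq_resid(mass, heat, b0, b1 + 0.1)
print(f"intercept={b0:.4f}, slope={b1:.4f}, SSE={best:.4f}, SSE(tilted)={worse:.4f}")
```

With these made-up numbers the printed coefficients will of course differ from the R output; the point is only that no other line achieves a smaller sum of squared residuals.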
Using the fitted model from the R output,
c) Predict the heat output of a 35 kg person. (1)
$\widehat{\text{Heat}} = 129.8182 + 3.8574(35) = 264.8272$
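e) My mass is roughly 90 kg. What heat output would you expect me to have? (1)
Plugging in gives 129.8182 + 3.8574(90) = 476.9842, but 90 kg is far outside the range of masses in the data (roughly 20 to 50 kg on the plot), so this is an extrapolation and the prediction should not be trusted.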
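Parts c) and e) can be sketched in a few lines of Python: a point prediction from the fitted line plus a check for extrapolation. The coefficients are copied from the R output; the 20 to 50 kg range is an assumption read off the scatterplot's axis ticks, since the test does not state the exact minimum and maximum mass.

```python
# Coefficients copied from the R output above.
INTERCEPT = 129.8182
SLOPE = 3.8574

# Assumption: observed masses span roughly 20-50 kg (read off the plot's axis
# ticks); the exact minimum and maximum are not given in the test.
MASS_RANGE = (20.0, 50.0)


def predict_heat(mass_kg):
    """Point prediction from the fitted line Heat = 129.8182 + 3.8574 * Mass."""
    return INTERCEPT + SLOPE * mass_kg


def is_extrapolation(mass_kg, data_range=MASS_RANGE):
    """True when the mass lies outside the (assumed) range of observed masses."""
    low, high = data_range
    return not (low <= mass_kg <= high)


for m in (35, 90):
    print(f"{m} kg: predicted heat = {predict_heat(m):.4f}, "
          f"extrapolation = {is_extrapolation(m)}")
```

The formula happily returns a number for 90 kg; the extrapolation flag is what signals that the number is not supported by the data.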
2) Below is some output from a regression analysis performed on a dataset containing the age and
systolic blood pressure measurements for 30 patients. These patients were a random sample from
all of the patients at a medical clinic in Toronto.
[15]
[Scatterplot: blood_pressure (y-axis) vs. age (x-axis)]

The regression equation is
blood_pressure = 98.71 + 0.971 age

Predictor    Coef  SE Coef     T      P
Constant    98.71    10.00  9.87  0.000
age         0.971   0.2102  4.62  0.000

S = 17.3137   R-Sq = 43.2%

Analysis of Variance
Source          DF       SS      MS      F
Regression       1   6394.0  6394.0  21.33
Residual Error  28   8393.4   299.8
Total           29  14787.5

Unusual Observations
Obs   age  blood_pressure
  2  47.0          220.00
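The entries of this output are tied together by standard formulas (T = Coef / SE Coef, MS = SS / DF, F = MS(Regression) / MS(Residual Error), S = the square root of MS(Residual Error), R-Sq = SS(Regression) / SS(Total)), which is how blanked-out entries can be recovered. A minimal Python sketch that recomputes them from numbers copied out of the output:

```python
import math

# Values transcribed from the regression output above: (Coef, SE Coef).
coef = {"Constant": (98.71, 10.00), "age": (0.971, 0.2102)}
ss_reg, ss_res, ss_tot = 6394.0, 8393.4, 14787.5
df_reg, df_res = 1, 28

t_values = {name: c / se for name, (c, se) in coef.items()}  # T = Coef / SE Coef
ms_res = ss_res / df_res            # MS for Residual Error
f_stat = (ss_reg / df_reg) / ms_res  # F = MS(Regression) / MS(Residual Error)
s = math.sqrt(ms_res)               # S, the residual standard error
r_sq = ss_reg / ss_tot              # R-Sq

print({k: round(v, 2) for k, v in t_values.items()})
print(round(ms_res, 1), round(f_stat, 2), round(s, 4), f"{r_sq:.1%}")
```

Each recomputed value matches the printed output to the displayed precision.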
a) Two items have been replaced with letters. Fill in what they should be.
(A) 98.71 (1)
(B) age (1)
b) What does the estimated slope of 0.971 tell us about the relationship between blood
pressure and age? Be specific; a qualitative answer will not suffice here.
For each additional year of age (1), the predicted (average) systolic blood pressure increases by 0.971 (1).
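The slope interpretation can be checked numerically with the fitted equation. In this sketch the ages 30, 50 and 60 are arbitrary choices for illustration; whatever starting age is used, a one-year increase changes the predicted value by the slope, 0.971.

```python
B0, B1 = 98.71, 0.971  # Coef values for Constant and age from the output


def predicted_bp(age):
    """Fitted systolic blood pressure from blood_pressure = 98.71 + 0.971 age."""
    return B0 + B1 * age


# The slope is the change in predicted blood pressure per one-year increase in age.
for age in (30, 50):
    print(age, round(predicted_bp(age + 1) - predicted_bp(age), 3))
# A 10-year age difference corresponds to 10 * 0.971 = 9.71 in predicted pressure.
print(round(predicted_bp(60) - predicted_bp(50), 2))
```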
c) __43.2%__ (1) of the variation in __blood pressure__ (1) can be explained by the relationship
with __age__ (1). Fill in the blanks and be specific.
d) R has identified one unusual observation. This observation has: (circle one each) (1)
i) High leverage
True or False
True or False
True or False
e) Above is a plot of the residuals, with the outlier removed (it turns out this was a data entry
error). Do you see any problems? If so, list the problems in order of importance. If not, state
why you came to this conclusion.
. (1) (1).
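f) A researcher wants to use this new model to predict the blood pressure of all Toronto
residents aged 18 to 70. Is this an appropriate use of regression? Explain why or
why not using terminology from class. (2)
No. The data are a random sample of patients at one Toronto clinic, so the population they represent is that clinic's patients, not all Toronto residents. Clinic patients need not be representative of the city as a whole, so generalizing the model to all residents is not justified.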
3) Choose one phrase from List A and as many phrases from List B as needed to best describe each
situation. You may use some phrases more than once or not at all, and you may combine them.
Indeed, some combinations don't even make sense.
[10]
List A
Survey/Sample
Observational Study
Experiment
List B
Cross-sectional Study
Retrospective
Prospective
Cohort Study
Case-Control Study
a) A UTM employee wants to see whether the program a student chooses has an effect on whether
they stay in school until graduation. She looks through the last ten years of graduation
records, noting which program each student enrolled in and whether they graduated.
This is a(n)
c) A researcher is given a dataset containing physician reports for all patients in the U.K. Using
this dataset, he randomly samples 200 patients with Crohn's disease and 200 patients
without Crohn's to determine whether exercise levels can predict onset of the disease.
This is a(n)
_____Experiment_________ (List A)
_____Survey/Sample_____ (List A)
4) A researcher wants to investigate whether different forms of exercise can be used to help
hyperactive children. A group of 90 children is divided into two groups according to age: those
aged 9-12 and those aged 5-9. Within each age group the children are randomly assigned to one
of three groups. The first group will just do their normal physical activities. The second group
will be given an additional moderately demanding exercise routine. The third group will be
given an additional exercise routine that is very strenuous in nature. At the end of a four-month
period, parents will be asked to rate their children's progress as None, Low, or High.
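The randomization described here (random assignment to the three activity groups carried out separately within each age group) can be sketched as follows. The child IDs and the 48/42 split between the age groups are hypothetical, since the test gives only the total of 90; dealing treatments out in turn after shuffling is one simple way to keep the groups balanced within each block, not necessarily the researcher's procedure.

```python
import random
from collections import Counter

TREATMENTS = ["Normal", "Moderate", "Strenuous"]


def assign_within_blocks(children, seed=None):
    """Randomly assign the three treatments separately within each age block.

    children: list of (child_id, age_block) pairs.
    Returns {child_id: treatment}. Within a block the treatments are as
    balanced as the block size allows.
    """
    rng = random.Random(seed)
    blocks = {}
    for child_id, block in children:
        blocks.setdefault(block, []).append(child_id)
    assignment = {}
    for ids in blocks.values():
        rng.shuffle(ids)  # random order within this block only
        for i, child_id in enumerate(ids):
            # Deal treatments out in turn: Normal, Moderate, Strenuous, Normal, ...
            assignment[child_id] = TREATMENTS[i % len(TREATMENTS)]
    return assignment


# Hypothetical block sizes: the test does not say how the 90 children split by
# age, so 48 in the younger group and 42 in the older group are assumed here.
kids = [(f"C{i:02d}", "5-9" if i < 48 else "9-12") for i in range(90)]
plan = assign_within_blocks(kids, seed=215)
for block in ("5-9", "9-12"):
    print(block, Counter(plan[c] for c, b in kids if b == block))
```

Because the shuffle happens inside each block, every age group receives each treatment equally often (16 per treatment in the younger group and 14 in the older group under the assumed split).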
Identify all the key design elements, such as:
[10]
a) the factors, levels, and treatments
Factor: Physical Activity (1)
Levels: Normal, Moderate, Strenuous (1)
Treatments: same as the factor levels (1), since this is a one-factor design
b) any blocking variables (if present)
Age group (aged 5-9 vs. aged 9-12)
c) response variable(s)
Progress (1)
d) use of blinding
None (1). The children surely know which exercise routine they are doing, and the parents, who
rate the progress, would presumably know this as well.
e) possible improvements to the design (2; 1 mark for any of the following)
Blind the parents to the treatment by having the children do the exercise at school
Control for caloric intake
Block by sex as well as age
Evaluate results using a numerical measure instead of categories
Use more than 90 children / replicate the experiment (1 mark max for both)
<anything else that demonstrates the student's understanding of the principles of
control, blinding, blocking, or randomization>
f) We could use side-by-side boxplots to compare the progress of children across the different
exercise groups.
True or False? False: the response is categorical.
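Because progress is categorical, the groups would be compared with a table of conditional proportions (or a segmented bar chart) rather than boxplots. A minimal sketch, using hypothetical parent ratings that are not data from the study, and rounding to four decimal places as the test instructions ask:

```python
from collections import Counter

LEVELS = ["None", "Low", "High"]


def conditional_proportions(records):
    """Proportion of each progress level within each exercise group.

    records: list of (group, progress) pairs with progress in LEVELS.
    """
    by_group = {}
    for group, progress in records:
        by_group.setdefault(group, Counter())[progress] += 1
    table = {}
    for group, counts in by_group.items():
        total = sum(counts.values())
        table[group] = {lvl: round(counts[lvl] / total, 4) for lvl in LEVELS}
    return table


# Hypothetical parent ratings, for illustration only (not data from the study).
ratings = (
    [("Normal", "None")] * 14 + [("Normal", "Low")] * 10 + [("Normal", "High")] * 6
    + [("Moderate", "None")] * 8 + [("Moderate", "Low")] * 12 + [("Moderate", "High")] * 10
    + [("Strenuous", "None")] * 6 + [("Strenuous", "Low")] * 10 + [("Strenuous", "High")] * 14
)
for group, row in conditional_proportions(ratings).items():
    print(group, row)
```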
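g) If we notice a significant difference between the proportions of High progress in the
groups, we can infer a cause-effect relationship. True or False? False: there is no blinding, so the
parents' ratings could be influenced by knowing which routine their child did.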