Tutorial 4
Download the t4e1, t4e2, t4e3, t4e4, t4e5a and t4e5b Excel data files from the subject
website and save them to your computer or USB flash drive. Read this handout and try to
complete the tutorial exercises before your tutorial class, so that you can ask your tutor
for help during the Zoom session if necessary.
After you have completed the tutorial exercises, attempt the “Exercises for assessment”.
You must submit your answers to these exercises in the Tutorial 4 Homework Canvas
Assignment Quiz by the next tutorial in order to get the tutorial mark. For each assessment
exercise type your answer in the relevant box available in the Quiz or upload your typed
answer separately in PDF format as an attachment. In either case, if the exercise requires
you to use R, save the relevant R/RStudio script and printout and upload it together with
your written answer in PDF format.
In certain situations, we might be interested to know whether a certain treatment has a
significant effect on the central location (measured by the mean or the median) of a
population, while in other situations we might wish to compare the central locations of
two distinct populations. We label the first scenario as paired-sample design (or matched
pairs experiment) and the second as independent measures design. In both cases the focus
is on the difference between two population central locations. In the paired-sample design
we are interested in the difference between the before-treatment and after-treatment
central locations, while in the independent measures design we are interested in the
difference between the central locations of two distinct populations.
To illustrate the paired-sample design, suppose that in order to find out whether some newly
designed golf clubs improve golfers’ performance, we ask a group of golfers to play a round
on a familiar golf course with their own clubs and then another round with the new clubs. Or,
suppose we want to find out whether a particular real estate agency tends to overvalue the
properties of potential vendors in order to secure more business, and we compare a sample
of evaluations by this agency to the evaluations of the same properties by some independent
property valuer.
In both of these examples, there is just one set of experimental units (golfers; properties),
one variable of interest (golfers’ scores on the given course; appraised values of properties),
and a single random sample of pairs of observations (pairs of scores with the old and new
clubs, respectively; pairs of appraised values provided by the real estate agency and the
independent property valuer, respectively). Most importantly, the sample elements (golfers;
properties) are supposed to be selected randomly but the observations in any particular pair
of observations are related to each other.
To illustrate the independent measures design, suppose that we are interested in the
customer satisfaction levels of two competing paid television channels ‘A’ and ‘B’, and ask
L. Kónya, 2020, Semester 2 ECON20003 - Tutorial 4
a sample of viewers who usually watch channel ‘A’ and another sample of viewers who
usually watch ‘B’ to answer a few questions about their level of satisfaction. Or, suppose we
are interested in the relationship between job tenure and qualification at a company, and
compare the length of time employees with a bachelor’s degree or higher have been working
at the company with that of employees who do not have such a degree.
In these examples, there are two different sets of experimental units (viewers of the two
television channels; employees with a bachelor’s degree or higher and employees without
such degree), one variable of interest (customer satisfaction; job tenure), but two random
samples (samples of the viewers of the two channels; samples of the two types of
employees). Crucially, these random samples are supposed to be independent of each
other.
Paired-Sample Design
Once the differences between the corresponding observations are calculated, we can apply
the same inferential procedures to the central location of the population of differences as
to the central location of any quantitative population.
These procedures, i.e. the matched-pairs Z / t tests and the corresponding confidence
interval for the difference (μD) between the before and after population means, are based on
the following assumptions:
i. The data is a random sample of pairs of observations (i.e. the before and after samples
are not independent of each other).
ii. The variable of interest is quantitative and continuous.
iii. The measurement scale is interval or ratio.
iv. Either (Z test) the population standard deviation of the differences, σD, is known and
the sample mean of the differences is at least approximately normally distributed, or
(t-test) σD is unknown but the population of the differences is normally distributed (at
least approximately).
A pupilometer is a device used to observe changes in pupil dilation as the eye is exposed to
different visual stimuli. Since there is a direct correlation between the amount an individual’s
pupil dilates and his or her interest in the stimuli, marketing organizations sometimes use
pupilometers to help them evaluate potential consumer interest in new products, alternative
package designs, and other factors (Optical Engineering, Mar. 1995). The Design and
Market Research Laboratories of the Container Corporation of America used a pupilometer
to evaluate consumer reaction to different silverware patterns for a client. Suppose 15
consumers were chosen at random, and each was shown the same two silverware patterns.
Their pupilometer readings (in millimetres) are saved in the t4e1 Excel file.
a) What are the appropriate null and alternative hypotheses to test whether the mean
amount of pupil dilation differs for the two patterns?
Suppose the researcher shows two silverware patterns one after the other to a client
and after each experiment measures his/her pupil dilation. Denote these pupilometer
readings as X1 and X2, respectively. These measurements form a pair of matching
observations, and the experiment itself is based on a paired-sample design.
If Di denotes the difference between the two measurements for client i, i.e. Di = X1i - X2i,
and μD the mean of population D, then the question implies the following null and
alternative hypotheses:
$$H_0: \mu_D = 0, \quad H_A: \mu_D \neq 0$$
b) Conduct the test in part (a) using α = 0.05, assuming that the population of D is normally
distributed. Interpret the results.
Launch RStudio, create a new RStudio project and script, import the data from the Excel
file to RStudio and load it into your current project. The pupilometer measurements are
named Pattern1 and Pattern2. Calculate the differences between the corresponding
measurements:
D = Pattern1 - Pattern2
Since the standard deviation of the population of D is unknown, but the population of D
is assumed to be normally distributed, you can now perform a t-test. Do not worry about
doing the calculations manually during the tutorial class, you can do so later.1 Right now
use R like in Exercise 3 of Tutorial 3 and execute the following command:2
t.test(D)
1
To save time, we are going to perform the test with R. Note, however, that you are expected to be able to do
the required calculations with your hand calculator as well. Since this is a simple t-test on the population mean
of the differences, this should not be a problem, granted that you know how to use your calculator efficiently.
2
Recall that by default t.test performs a two-tail test with zero hypothesized population mean at the 5%
significance level. Hence, this time we need to specify only the name of the variable.
The observed test statistic is tobs = 5.7637 and the p-value is practically zero, so H0 can
be rejected at any significance level. Therefore we conclude that the mean amount of
pupil dilation differs for the two patterns.
In order to highlight the equivalence between the t-test for a single population mean and
the paired-sample t-test, we first calculated D and then ran a t-test on it. Using R,
however, there is no need for this two-step procedure. We can perform the paired-sample
t-test directly on the original variables Pattern1 and Pattern2 by adding the
paired = TRUE argument to the t.test command:
t.test(Pattern1, Pattern2, paired = TRUE)
The new printout is on the next page. As you can see, the two printouts are formatted
slightly differently, but otherwise they are indeed alike.
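The equivalence between the two routes can also be checked numerically. The following is a minimal sketch rather than the tutorial's own script: the vectors Pattern1 and Pattern2 below hold made-up illustrative readings (the actual t4e1 values are not reproduced here), so the numbers it prints will differ from the printouts discussed in this handout.

```r
# Hypothetical pupilometer readings (mm) for 15 consumers; NOT the t4e1 data.
Pattern1 <- c(1.00, 0.97, 1.45, 1.21, 0.77, 1.32, 1.81, 0.91,
              0.98, 1.46, 1.85, 0.33, 1.77, 0.85, 0.15)
Pattern2 <- c(0.80, 0.66, 1.22, 0.95, 0.48, 0.97, 1.57, 0.84,
              0.71, 1.13, 1.60, 0.23, 1.55, 0.47, 0.18)

# Two-step route: compute the differences, then run a one-sample t-test.
D <- Pattern1 - Pattern2
one_sample <- t.test(D)

# One-step route: paired-sample t-test on the original variables.
paired <- t.test(Pattern1, Pattern2, paired = TRUE)

# Manual calculation of the same test statistic: mean(D) / standard error.
t_manual <- mean(D) / (sd(D) / sqrt(length(D)))

c(one_sample = unname(one_sample$statistic),
  paired     = unname(paired$statistic),
  manual     = t_manual)
```

All three entries of the returned vector should coincide, and the two t.test objects should report the same confidence interval.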
c) Interpret the 95% confidence interval for the difference between the two pupil dilation
population measurements, i.e. for population D.
With 95% confidence, the difference in the mean pupil dilation between pattern 1 and
pattern 2 is somewhere between 0.1503 and 0.3283 millimetres.
d) Is the paired-sample design used for this study preferable to the independent measures
design? For independent samples we could select 30 consumers, divide them into two
groups of 15, and show each group a different pattern. Explain your preference.
e) In part (b) it was assumed that the population of differences is normally distributed. Since
the sample size is only 15, this assumption is fairly crucial. However, given this small
sample size, the usual diagnostics for normality can be unreliable and misleading. For
this reason perform the appropriate non-parametric test(s) for the median of the
differences between the two pupil dilation measurements. Do you arrive at the same
conclusion as in part (b)?
In last week's tutorial you used two non-parametric alternatives to the t-test for a
population mean, the one sample sign test and the one sample Wilcoxon signed ranks
test for the population median. The same tests can be performed on D, or on Pattern1
and Pattern2.
Let’s start with the sign test. When it is performed on two samples, it has the following
requirements:
i. The data is a random sample of independent pairs of observations (i.e. the before
and after samples are not independent of each other but the selected pairs are).
ii. The variable of interest is qualitative or quantitative.
iii. The measurement scale is at least ordinal.
Since the consumers were selected randomly and each was shown the same two
silverware patterns, and pupilometer reading is a quantitative variable measured on a
ratio scale, all requirements are satisfied.
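The hypotheses, with η denoting the median of the population of differences, are
$$H_0: \eta = 0, \quad H_A: \eta \neq 0$$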
Execute
library(DescTools)
SignTest(D)
Alternatively, you can run the test on the original variables instead of D, just like before.
SignTest(Pattern1, Pattern2)
This time the test is labelled Dependent-samples Sign-test, but otherwise the two
printouts are equivalent. The test statistic is S = 14 and the p-value is less than 0.001,
so H0 can be rejected at any reasonable significance level. Therefore we conclude that
the median amount of pupil dilation differs for the two patterns.
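The reported p-value can be reproduced from the binomial distribution, because under H0 the number of positive differences follows a binomial distribution with success probability 0.5. The sketch below uses base R's binom.test instead of SignTest, and it assumes that none of the 15 differences is exactly zero, so that the effective sample size remains 15.

```r
# Sign test as an exact binomial test (base R only).
S <- 14   # number of positive differences, as reported on the printout
n <- 15   # number of non-zero differences (assumption: no zero differences)

# Two-tail p-value from binom.test.
p_binom <- binom.test(S, n, p = 0.5, alternative = "two.sided")$p.value

# The same p-value by hand: twice the upper-tail probability P(X >= 14).
p_manual <- 2 * sum(dbinom(S:n, size = n, prob = 0.5))

c(binom.test = p_binom, manual = p_manual)
```

Both values equal 32/32768, i.e. just under 0.001, in line with the printout.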
Let’s move on to the two-sample Wilcoxon signed ranks test. It assumes that
i. The data is a random sample of pairs of observations (i.e. the before and after
samples are not independent of each other but the selected pairs are).
ii. The variable of interest is quantitative and continuous.
iii. The measurement scale is interval or ratio.
iv. The distribution of the differences is symmetric.
As we already saw, the first three requirements are satisfied in this example, so we
need to consider only the fourth requirement.
library(pastecs)
round(stat.desc(D, basic = FALSE, desc = TRUE, norm = TRUE),3)
The first two commands return a histogram with a normal curve superimposed on it and
the next two commands return a Q-Q plot (see them on the next page). Finally, the last
two commands call the pastecs library and display the following statistics:
Try to evaluate these outputs the way we did last week. You should conclude that they
do not cast any doubt on the assumption of symmetry. It is important to remember
though that this conclusion is not very sound this time due to the small sample size. We
do not need to worry about this uncertainty at this stage because if the sign test and the
Wilcoxon signed ranks test lead to the same conclusion, then it does not really matter
whether the population of D is symmetric or not.
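One way to produce diagnostics of this kind in base R is sketched below. It is an illustration under stated assumptions, not the set of commands used last week: the plotting calls are one common choice, the skewness is a simple moment-based version that may differ slightly from the one reported by stat.desc, and the vector D holds made-up differences rather than those computed from t4e1.

```r
# Hypothetical differences (mm); in practice use D = Pattern1 - Pattern2.
D <- c(0.20, 0.31, 0.23, 0.26, 0.29, 0.35, 0.24, 0.07,
       0.27, 0.33, 0.25, 0.10, 0.22, 0.38, -0.03)

# Histogram on the density scale with a fitted normal curve on top.
hist(D, freq = FALSE, main = "Histogram of D")
curve(dnorm(x, mean = mean(D), sd = sd(D)), add = TRUE)

# Normal Q-Q plot with a reference line.
qqnorm(D)
qqline(D)

# Numerical checks of symmetry: mean vs. median and moment-based skewness.
skew <- mean((D - mean(D))^3) / (mean((D - mean(D))^2))^1.5
c(mean = mean(D), median = median(D), skewness = skew)
```

A mean close to the median and a skewness close to zero are consistent with symmetry, although with 15 observations such checks carry little weight.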
[Figure: Histogram of D (vertical axis: Density) with a normal curve superimposed, and normal Q-Q plot of D (horizontal axis: Theoretical Quantiles).]
Similarly to the sign test, for the sake of illustration, perform the Wilcoxon signed ranks
test twice by executing the following commands:
library(exactRankTests)
wilcox.exact(D)
wilcox.exact(Pattern1, Pattern2, paired = TRUE)
They return:
and
Again, these two printouts are equivalent. They show that the p-value is less than
0.00015, so H0 can be rejected at any reasonable significance level implying that the
median amount of pupil dilation differs for the two patterns. This is the same conclusion
as the one we arrived at before on the basis of the sign test. Note also that apart from
the fact that in part (b) we tested the population mean while this time the population
median, the conclusions are the same. Therefore, it is not really crucial this time
whether the sampled population is normally distributed (required by the t-test) or is at
least symmetric (required by the Wilcoxon signed ranks test).
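Suppose we have two independent random samples of size n1 and n2 drawn from two
quantitative populations that have means μ1 and μ2 and variances σ1² and σ2². The
parameter of interest is the difference between the population means, μ1 - μ2, which can be
estimated with the difference of the sample means and, depending on σ1² and σ2², we
distinguish three possible scenarios.
(1) The population variances are known.
In this case the confidence interval estimator of the difference between the
population means is
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2} \quad \text{where} \quad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
and hypotheses about μ1 - μ2 can be tested with z-tests based on the following test
statistic, where μD,0 denotes the hypothesized value of μ1 - μ2:
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_{D,0}}{\sigma_{\bar{x}_1 - \bar{x}_2}} \sim N(0,1)$$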
(2) The population variances are unknown but equal.
In this case the common population variance can be estimated with the sample
variance of the combined or pooled sample,
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
and the standard error of the difference between the two sample means is
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Assuming that the sampled populations are not extremely non-normal, the
confidence interval estimator of the difference between the population means is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{df,\alpha/2}\, s_{\bar{x}_1 - \bar{x}_2}$$
where the degrees of freedom is df = n1 + n2 - 2,
and hypotheses about μ1 - μ2 can be tested with t-tests based on the following test
statistic:
$$T = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_{D,0}}{s_{\bar{x}_1 - \bar{x}_2}} \sim t_{df}$$
(3) The population variances are unknown and different.
In this case the two population variances have to be estimated separately with the
corresponding sample variances, s1² and s2². Similarly, the variances of the two
sample means have to be estimated separately in the usual way, i.e.
$$s_{\bar{x}_1}^2 = \frac{s_1^2}{n_1}, \quad s_{\bar{x}_2}^2 = \frac{s_2^2}{n_2}$$
Given these variances, the standard error of the difference between the two sample
means can be estimated with
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_{\bar{x}_1}^2 + s_{\bar{x}_2}^2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Assuming again that the sampled populations are not extremely non-normal, the
confidence interval estimator of the difference between the population means is like
in scenario (2),
$$(\bar{x}_1 - \bar{x}_2) \pm t_{df,\alpha/2}\, s_{\bar{x}_1 - \bar{x}_2}$$
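and hypotheses about μ1 - μ2 can be tested with t-tests based on the same test
statistic as in scenario (2), i.e.
$$T = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_{D,0}}{s_{\bar{x}_1 - \bar{x}_2}} \sim t_{df}$$
but this time the degrees of freedom is
$$df = \frac{\left(s_{\bar{x}_1 - \bar{x}_2}^2\right)^2}{\dfrac{\left(s_{\bar{x}_1}^2\right)^2}{n_1 - 1} + \dfrac{\left(s_{\bar{x}_2}^2\right)^2}{n_2 - 1}}$$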
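To make scenarios (2) and (3) concrete, the following sketch computes the standard error and the degrees of freedom from summary statistics. The function name two_sample_se and its argument names are hypothetical helpers introduced here for illustration only; the inputs in the usage line are the sample standard deviations and sample sizes reported in the exercise that follows.

```r
# Standard error and degrees of freedom for the difference of two sample means.
# s1, s2: sample standard deviations; n1, n2: sample sizes.
two_sample_se <- function(s1, s2, n1, n2) {
  # Scenario (2): unknown but equal population variances (pooled estimate).
  sp2       <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
  se_pooled <- sqrt(sp2 * (1 / n1 + 1 / n2))
  df_pooled <- n1 + n2 - 2

  # Scenario (3): unknown and different population variances.
  v1       <- s1^2 / n1   # estimated variance of the first sample mean
  v2       <- s2^2 / n2   # estimated variance of the second sample mean
  se_welch <- sqrt(v1 + v2)
  df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

  list(se_pooled = se_pooled, df_pooled = df_pooled,
       se_welch = se_welch, df_welch = df_welch)
}

# Usage with the summary statistics of the toothpaste exercise.
res <- two_sample_se(s1 = 13.621, s2 = 10.040, n1 = 20, n2 = 20)
round(unlist(res), 3)
```

Because the two samples have the same size, the two standard errors coincide in this example and only the degrees of freedom differ.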
Marketing strategists would like to predict consumers’ response to new products and their
accompanying promotional schemes. Consequently, studies that examine the differences
between buyers and non-buyers of a product are of interest. One classic study conducted
by Shuchman and Riesz (Journal of Marketing Research, Feb. 1975) was aimed at
characterizing the purchasers and non-purchasers of Crest toothpaste. The researchers
demonstrated that both the mean household size (number of persons) and mean household
income were significantly larger for purchasers than for non-purchasers. A similar study
utilized independent random samples of size 20 on the age of the householder primarily
responsible for buying toothpaste. Householders were categorized as non-purchaser or
purchaser of a particular brand of toothpaste coded as N and P, respectively. The data are
saved in the t4e2 file.
a) Obtain and interpret a 90% confidence interval for the difference between the mean
ages of purchasers and non-purchasers.
Let’s denote the age of non-purchasers as X1 and the age of purchasers as X2. We have
two independent random samples of the same size n1 = n2 = 20 on X1 and X2 and the
experiment is based on an independent measures design. We need to develop a
confidence interval for the difference between the population means, μ1 - μ2. As we have
just discussed, there are three possible scenarios, but since the population variances
are unknown, we can discard the first one. In order to decide whether scenario (2) or (3)
is the more appropriate, we need to compare the sample variances.
For the sake of illustration, we develop the required confidence interval first manually,
but to save time we are going to obtain the sample means and sample standard
deviations with R.
Launch RStudio, create a new RStudio project and script, import the data from the Excel
file to RStudio and load it into your current project.
Last week you learnt that for a single variable these statistics are provided by the mean
and sd commands. This time, however, we need them separately for the two different
types of householders. This can be achieved by the
by(data, byvar, fun)
function, where data is the variable or data frame to be analysed, byvar is the variable that
specifies the groups (i.e. grouping variable), and fun is a function to be applied to the subsets
of data.
Hence, execute
by(Age, Householder, mean)
by(Age, Householder, sd)
to obtain the group means and standard deviations, partly reproduced below:
Householder: N
[1] 47.2
Householder: P
[1] 39.8
and
Householder: P
[1] 10.03992
The two sample variances are s1² = (13.621)² = 185.532 and s2² = (10.040)² = 100.802.
Their ratio, s1² / s2² = 1.84, seems to be too big to assume that the corresponding
population variances are equal, so let’s follow the third scenario.3 Assume, moreover,
that the sampled populations are not extremely non-normal. Then we can develop the
required confidence interval as follows.
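The point estimate of μ1 - μ2 is 47.2 - 39.8 = 7.4,
and the estimate of the standard error of the difference between the two sample means
is
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{185.532}{20} + \frac{100.802}{20}} = \sqrt{9.277 + 5.040} = 3.784$$
The degrees of freedom is
$$df = \frac{\left(s_{\bar{x}_1 - \bar{x}_2}^2\right)^2}{\dfrac{\left(s_{\bar{x}_1}^2\right)^2}{n_1 - 1} + \dfrac{\left(s_{\bar{x}_2}^2\right)^2}{n_2 - 1}} = \frac{(3.784^2)^2}{\dfrac{9.277^2}{19} + \dfrac{5.040^2}{19}} = 34.9 \approx 35$$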
Putting all these together, the 90% confidence interval estimate of the difference
between the mean ages of non-purchasers and purchasers is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{df,\alpha/2}\, s_{\bar{x}_1 - \bar{x}_2} = (47.2 - 39.8) \pm 1.690 \times 3.784 = (1.005;\ 13.795)$$
It means that with 90% confidence the mean age of non-purchasers exceeds the mean age
of purchasers by somewhere between 1.0 and 13.8 years.4
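The manual calculation can be cross-checked in R. The sketch below is an illustration under stated assumptions: instead of importing t4e2, it types in the 40 ages listed in the table in part (e) and assumes that they are the complete data set. The object names age_P, age_N and so on are chosen here for convenience.

```r
# Ages copied from the table in part (e): purchasers (P) and non-purchasers (N).
age_P <- c(34, 35, 23, 44, 52, 46, 28, 48, 28, 34,
           33, 52, 41, 32, 34, 49, 50, 45, 29, 59)
age_N <- c(28, 22, 44, 33, 55, 63, 45, 31, 60, 54,
           53, 58, 52, 52, 66, 35, 25, 48, 59, 61)

# Scenario (3): separate variance estimates for the two sample means.
v1 <- var(age_N) / length(age_N)
v2 <- var(age_P) / length(age_P)
se <- sqrt(v1 + v2)
df <- (v1 + v2)^2 / (v1^2 / (length(age_N) - 1) + v2^2 / (length(age_P) - 1))

# 90% confidence interval: point estimate -/+ reliability factor * std. error.
diff_means <- mean(age_N) - mean(age_P)
ci_manual  <- diff_means + c(-1, 1) * qt(0.95, df) * se

# The same interval from t.test (unequal variances is the default).
ci_ttest <- t.test(age_N, age_P, conf.level = 0.90)$conf.int

round(rbind(manual = ci_manual, t.test = as.numeric(ci_ttest)), 3)
```

Because the sketch keeps the fractional degrees of freedom instead of rounding them to 35, its limits agree with the t.test interval and differ from the hand-calculated limits only in the third decimal place.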
b) What assumptions did you make in part (a)? Are they likely satisfied?
The confidence interval in part (a) was based on the assumptions that the data consists
of two independent random samples, the variable of interest is quantitative and
continuous, the measurement scale is interval or ratio, the population standard
deviations are unknown but the sampled populations are normally distributed (at least
approximately).
3
The ratio of the two sample variances is s1² / s2² = 1.84, while the ratio of the two sample standard deviations
is s1 / s2 = 1.36, i.e. much smaller. In order to decide whether to follow scenario (2) or (3), we compared the
sample variances, not the sample standard deviations, because the question is whether the unknown population
variances are equal or not.
4
We shall get this confidence interval with R as well a bit later.
We can take the first assumption for granted as it was explicitly mentioned that the study
was based on “independent random samples”. The variable of interest is the age of the
householder primarily responsible for buying toothpaste. It is a quantitative and
continuous variable. The actual observations are rounded to the nearest year,
so they are discrete values, but since there is a large number of possible values, for the
purpose of hypothesis testing we can still treat this variable as being continuous.
As for normality, although the sample sizes are a bit small, let’s apply stat.desc on the
two samples separately. We can select certain observations with the
subset(x, cond)
function, where x is the object to be subsetted and cond is a logical expression that indicates
which elements of x to keep.
In this case x is Age and cond is based on the Householder grouping variable. Hence,
library(pastecs)
stat.desc(subset(Age, Householder == "N"),
basic = FALSE, desc = TRUE, norm = TRUE)
returns the descriptive statistics and the Shapiro-Wilk test result for the Age of the
non-purchasers (Householder = “N”),
and
stat.desc(subset(Age, Householder == "P"),
basic = FALSE, desc = TRUE, norm = TRUE)
returns the same for the Age of the purchasers (Householder = “P”).
As you can see, for both groups, the mean and the median are close to each other5, the
skewness and kurtosis statistics are smaller in absolute value than twice their standard
5
For the Householder = “N” group the difference between the mean and the median is 4.8, which is about
10% of the mean (47.2) and smaller than 2 standard errors of the mean (6.092).
errors6, and the p-values of the Shapiro-Wilk tests are larger than 0.1. All in all, these
statistics do not cast any doubt on the normality assumption.
Recall that we also assumed that population variances are unequal based on a simple
comparison of the sample variances. Next week you will be asked to perform a formal
hypothesis test to see whether there is indeed a significant difference between the two
population variances.
c) Do the data present sufficient evidence to conclude that there is a difference in the mean
age of purchasers and non-purchasers? Assume that the populations are normally
distributed and use α = 0.10. Perform the test first manually and then with R.
$$H_0: \mu_1 - \mu_2 = 0, \quad H_A: \mu_1 - \mu_2 \neq 0$$
Recall that confidence interval estimation with a 90% level of confidence is equivalent
to two-tail hypothesis testing at the 10% level. Since the confidence interval in part (a)
does not include zero, we can conclude at the 10% level of significance that there is a
significant difference between the mean ages of purchasers and non-purchasers.
Still, for the sake of illustration, let’s now perform a formal hypothesis test as well. Under
scenario (3) the test statistic is
$$T = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_{D,0}}{s_{\bar{x}_1 - \bar{x}_2}} \sim t_{df}$$
The degrees of freedom is the same as in part (a), i.e. 35, and thus the upper 5%
critical value is the same as the reliability factor in part (a), i.e. 1.690, and the lower
5% critical value is -1.690. The null hypothesis is to be rejected if the observed test statistic
value is smaller than the lower critical value or larger than the upper critical value, or in
brief, if the absolute value of the observed test statistic is larger than the upper critical
value.
The observed test statistic is
$$t_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - \mu_{D,0}}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{(47.2 - 39.8) - 0}{3.784} = 1.956$$
Since this observed test statistic value is above the upper critical value, we reject H0 and
conclude at the 10% significance level that the mean ages of purchasers and non-
purchasers differ from each other.
6
Recall (Tutorial 3, p. 13) that on the printout skew.2SE is skewness divided by 2 times its standard error,
kurtosis is actually excess kurtosis, and kurt.2SE is excess kurtosis divided by 2 times its standard error.
To perform the test with R, the t.test command is applied to the model formula
Age ~ Householder, with the confidence level set to 90%. Here ~ (called tilde) is an
operator. In general, it means “as a function of” and it
separates the left-hand side (dependent variable) and right-hand side (independent
variable) in a model formula. In this case, Age ~ Householder implies that the t.test
command is to be executed on the difference between the two means of Age specified
by the two categories of Householder.7
By default, R assumes that the population variances are different (3rd scenario) and
performs an unequal variances t-test originally developed by Welch. The test statistic is
tobs = 1.9557 and the p-value is 0.05853, so H0 can be rejected even at the 6% level.
This printout also shows the 90% confidence interval for the difference between the
mean ages of non-purchasers and purchasers: (1.006765 ; 13.793235). It is practically
the same as in part (a).
What if there is reason to believe that the population variances are equal? In this case
we need to add the var.equal = TRUE argument to the t.test command and run the
augmented command.
If you compare the two printouts, you can see that the test statistics are the same, but
the numbers of degrees of freedom are different (34.94 vs. 38) and hence the 90%
confidence interval estimates and the p-values are not the same either. However,
the difference between the p-values is very small, so as far as the t-test is concerned,
7
Note that the data in the t4e2.xlsx file is arranged in long format while in t4e1.xlsx it is in wide format. That’s
why in Exercise 1 in the t.test command we specified the variables as Pattern1, Pattern2, while this time we
specified them as Age ~ Householder. You will learn about these two data formats in Tutorial 6.
this time it does not really matter whether we assume equal or different population
variances.
d) Although the original question implies a two-tail test, for the sake of illustration, perform
a left-tail and a right-tail t-test as well and compare the printouts to each other.
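By default the t.test command assumes that the t-test is a two-tail test, so a one-tail
test has to be requested with the alternative argument. When the hypotheses are
$$H_0: \mu_1 - \mu_2 = 0, \quad H_A: \mu_1 - \mu_2 < 0$$
the test is a left-tail test, and when they are
$$H_0: \mu_1 - \mu_2 = 0, \quad H_A: \mu_1 - \mu_2 > 0$$
it is a right-tail test.
First, the test statistic does not depend on the nature of the test (two-tail, left-tail or
right-tail), but the p-value does.
Second, the left-tail t-test fails to reject the null hypothesis (p-value = 0.9707), while the
right-tail t-test rejects it even at the 3% significance level (p-value = 0.02927). This is
because R considers Householder = “N” as population 1 and Householder = “P” as
population 2, and hence the difference between the two sample means is positive (47.2
– 39.8 = 7.4), providing no support for the left-sided alternative hypothesis.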
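The one-tail tests can be sketched in R as follows. This is an illustration under stated assumptions, not a copy of the original script: the data are typed in from the table in part (e) and assumed to be the complete t4e2 data set, and the object names two_tail, left_tail and right_tail are chosen here. The alternative argument is a standard argument of t.test.

```r
# Ages copied from the table in part (e): 20 purchasers, then 20 non-purchasers.
Age <- c(34, 35, 23, 44, 52, 46, 28, 48, 28, 34,
         33, 52, 41, 32, 34, 49, 50, 45, 29, 59,
         28, 22, 44, 33, 55, 63, 45, 31, 60, 54,
         53, 58, 52, 52, 66, 35, 25, 48, 59, 61)
Householder <- rep(c("P", "N"), each = 20)

# R orders the groups alphabetically, so "N" is population 1 and "P" is 2.
two_tail   <- t.test(Age ~ Householder)
left_tail  <- t.test(Age ~ Householder, alternative = "less")
right_tail <- t.test(Age ~ Householder, alternative = "greater")

# Same test statistic in all three cases, different p-values.
round(c(t       = unname(two_tail$statistic),
        p_two   = two_tail$p.value,
        p_left  = left_tail$p.value,
        p_right = right_tail$p.value), 4)
```

The right-tail p-value is half of the two-tail p-value, and the left-tail and right-tail p-values add up to one.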
What if we intend to compare the central locations of two populations which are very non-
normal, or do not have means because they are measured on an ordinal scale, or do have
means but we prefer to use the medians to measure their central locations? In these cases,
we should use some nonparametric test for the difference between the population medians.
The simplest option is the Wilcoxon rank-sum test.
The Wilcoxon rank-sum test is similar to the Wilcoxon signed ranks test, but instead of
classifying the observations based on their relative positions to the hypothesized median, it
classifies the observations according to some characteristic of the experimental units (in the
current example, according to whether the householder is a purchaser or a non-purchaser).
Let sample 1 be the smaller (not bigger) sample, so that n1 ≤ n2. To perform the Wilcoxon
rank-sum test, we need to rank all available observations in the combined (pooled) sample
from the smallest to the largest, averaging the ranks of tied observations, and calculate the
rank sums of the two samples (T1 and T2).8 The test statistic is T = T1.
The exact small sample (n1 ≤ 10, n2 ≤ 10) lower and upper critical values, TL and TU, are in
Table 8, Appendix B (p. 1088) of the Selvanathan book. In a two-tail test we can reject H0 if
T ≤ TL or T ≥ TU.
For larger sample sizes the sampling distribution of T has a normal approximation with
parameters
8
To perform the test, we need only T1, the rank sum of the smaller (not bigger) sample, but it is recommended
to check whether T1 + T2 is equal to n(n+1)/2.
$$\mu_T = \frac{n_1(n_1 + n_2 + 1)}{2}, \quad \sigma_T^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$$
e) Assume this time that the populations are not normally distributed and perform the
Wilcoxon rank-sum test to see whether there is a difference in the median age of
purchasers and non-purchasers (use α = 0.10). Perform the test first manually and then
with R.
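With η1 and η2 denoting the two population medians, the hypotheses are
$$H_0: \eta_1 - \eta_2 = 0, \quad H_A: \eta_1 - \eta_2 \neq 0$$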
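The rank sums are T1 = 338 and T2 = 482. Their sum is 820, equal to
$$\frac{n(n+1)}{2} = \frac{40 \times 41}{2} = 820$$
Since both sample sizes are larger than 10, we rely on the normal approximation. At the
10% significance level the critical value is
$$z_{0.05} = 1.645$$
and we reject H0 if the calculated test statistic is smaller than -1.645 or greater than
1.645.
The expected value and variance of the test statistic under the null hypothesis are
$$\mu_T = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{20 \times 41}{2} = 410, \quad \sigma_T^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12} = \frac{20 \times 20 \times 41}{12} = 1366.7$$
so the observed test statistic is
$$z_{obs} = \frac{T - \mu_T}{\sigma_T} = \frac{T_1 - \mu_T}{\sigma_T} = \frac{338 - 410}{\sqrt{1366.7}} = -1.948$$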
Since the absolute value of the observed test statistic is larger than 1.645, we can reject
H0 and conclude that at the 10% significance level there is a difference in the median
age of purchasers and non-purchasers.
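The manual steps above can be sketched in base R as follows. The ages are typed in from the table below and assumed to be the complete t4e2 data set, and the object names are chosen here for illustration. Like the hand calculation, the sketch uses the variance formula without any correction for ties.

```r
# Ages copied from the table: purchasers (sample 1) and non-purchasers (sample 2).
age_P <- c(34, 35, 23, 44, 52, 46, 28, 48, 28, 34,
           33, 52, 41, 32, 34, 49, 50, 45, 29, 59)
age_N <- c(28, 22, 44, 33, 55, 63, 45, 31, 60, 54,
           53, 58, 52, 52, 66, 35, 25, 48, 59, 61)
n1 <- length(age_P)
n2 <- length(age_N)

# Rank the pooled sample; rank() averages the ranks of tied observations.
ranks <- rank(c(age_P, age_N))
T1 <- sum(ranks[1:n1])                # rank sum of the purchasers
T2 <- sum(ranks[(n1 + 1):(n1 + n2)])  # rank sum of the non-purchasers

# Normal approximation of the sampling distribution of T = T1 under H0.
mu_T    <- n1 * (n1 + n2 + 1) / 2
sigma_T <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z_obs   <- (T1 - mu_T) / sigma_T

# Two-tail p-value based on the standard normal distribution.
p_two_tail <- 2 * pnorm(-abs(z_obs))

c(T1 = T1, T2 = T2, check = (n1 + n2) * (n1 + n2 + 1) / 2, z_obs = z_obs)
```

The check entry is the value n(n+1)/2 that T1 + T2 must equal.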
Householder Age Rank T
P 34 13.00
P 35 15.50
P 23 2.00
P 44 18.50
P 52 28.50
P 46 22.00
P 28 5.00
P 48 23.50
P 28 5.00
P 34 13.00
P 33 10.50
P 52 28.50
P 41 17.00
P 32 9.00
P 34 13.00
P 49 25.00
P 50 26.00
P 45 20.50
P 29 7.00
P 59 35.50 338.00
N 28 5.00
N 22 1.00
N 44 18.50
N 33 10.50
N 55 33.00
N 63 39.00
N 45 20.50
N 31 8.00
N 60 37.00
N 54 32.00
N 53 31.00
N 58 34.00
N 52 28.50
N 52 28.50
N 66 40.00
N 35 15.50
N 25 3.00
N 48 23.50
N 59 35.50
N 61 38.00 482.00
Using R, this test can be performed with the wilcox.exact command of the
exactRankTests package. Execute the following commands:9
library(exactRankTests)
wilcox.exact(Age ~ Householder)
9
By default, wilcox.exact assumes that the samples are independent, i.e. paired = FALSE, so unlike earlier
in Exercise 1 (page 7), it is not necessary to use the paired argument.
You should get the following printout:
The reported test statistic is W = 272 and the p-value is 0.05129, so H0 can be rejected
even at the 5.2% significance level.
But, where did this test statistic come from? When we performed the test manually, we
used T = T1 = 338 as the test statistic because the sample sizes are equal and the first
sample in the data file is Householder = “P”. R, however, considers the Householder =
“N” sample the first sample10, and reports an adjusted version of the Wilcoxon test
statistic11,
$$W = T_2 - \frac{n_2(n_2 + 1)}{2} = 482 - \frac{20 \times 21}{2} = 272$$
f) Like in part (d), perform a left-tail and a right-tail Wilcoxon rank-sum test as well and
compare the printouts to each other.
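Execute wilcox.exact once with the left-tail and once with the right-tail alternative
hypothesis to obtain the two printouts.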
10
You can see this on the previous t.test printouts.
11
The adjustment term is the smallest possible value of T, i.e. the sum of the first n positive integers where n
is the number of observations in the ‘first’ sample.
Like in the case of the t.test command, the test statistic reported by wilcox.exact does
not depend on the nature of the test (i.e. two-tail, left-tail or right-tail), but the p-value
does. For the left-tail test it is far too big (0.9752) to reject H0, while for the right-tail test
it is small enough (0.02565) to reject H0 even at the 3% significance level.
Exercise 3
The owner of a computer store is concerned about the one-year parts and labour warranty
on its top two bestselling brands of laptop computers, brand A and brand B. In particular, he
would like to know whether there is a difference between these two brands in terms of the
time between the sale of a laptop and its return for repair under warranty. In the last month
there were 6 claims for warranty repairs of brand A laptops and 9 claims for warranty repairs
of brand B laptops. The numbers of days these laptops had been owned prior to coming in
for repair are saved in the t4e3 Excel file. Perform the Wilcoxon rank sum test at the 5%
significance level, both manually and with R, to assist the owner in his quest.¹²
Since there are 6 returned brand A laptops but 9 brand B laptops, we consider brand A as #1
and brand B as #2. The hypotheses are
$$H_0: \eta_1 - \eta_2 = 0, \quad H_A: \eta_1 - \eta_2 \neq 0$$
The 5% critical values from Table 8, Appendix B of Selvanathan are TL = 31 and TU = 65,
and H0 is rejected if T ≤ TL or T ≥ TU, where T = T1.
Like in Exercise 2, we need to rank all 15 observations from the smallest to the largest and
calculate the rank sums. The details are shown in the table below.¹³
      Brand_A              Brand_B
  Days      Rank       Days      Rank
   225      12.5         83       7.0
    79       6.0         52       3.5
   225      12.5        113       9.0
    52       3.5         67       5.0
    29       1.0        165      11.0
    98       8.0        132      10.0
                         48       2.0
                        230      14.0
                        255      15.0
      T1 =  43.5         T2 =    76.5
The rank sum is T1 = 43.5 for Brand A and T2 = 76.5 for Brand B. Their sum is 120, equal to
$$\frac{n(n+1)}{2} = \frac{15 \times 16}{2} = 120$$
¹² Note that this time the sample sizes are far too small to assess normality in any reasonable way.
¹³ Try to reproduce this table for the sake of practice.
The observed test statistic is T = T1 = 43.5. It is between the lower and upper critical values,
so at the 5% significance level there is not enough evidence to reject H0.
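The manual calculation can be reproduced in R as a check. The sketch below types in the 15 observations from the table instead of importing t4e3, and the object names (ranks, TL, TU, reject_H0) are chosen here for illustration.

```r
# Days owned before the warranty repair, copied from the table above
Brand_A <- c(225, 79, 225, 52, 29, 98)
Brand_B <- c(83, 52, 113, 67, 165, 132, 48, 230, 255)

n1 <- length(Brand_A)
n2 <- length(Brand_B)
n  <- n1 + n2

# Rank the pooled sample; tied values get the average of their ranks
ranks <- rank(c(Brand_A, Brand_B))

T1 <- sum(ranks[1:n1])        # rank sum of Brand A
T2 <- sum(ranks[(n1 + 1):n])  # rank sum of Brand B

# The two rank sums must add up to n(n + 1)/2
print(c(T1 = T1, T2 = T2, total = T1 + T2, check = n * (n + 1) / 2))

# Two-tail decision rule with the 5% critical values quoted in the text
TL <- 31
TU <- 65
reject_H0 <- (T1 <= TL) || (T1 >= TU)
print(reject_H0)
```

Printing ranks[1:n1] and ranks[(n1 + 1):n] gives the two Rank columns of the table.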
To repeat this test with R, launch RStudio, create a new RStudio project and script, import
the data from the Excel file to RStudio and load it into your current project. Then, execute
the following commands
library(exactRankTests)
wilcox.exact(Brand_A, Brand_B)
to obtain the printout. The reported test statistic is
$$W = T_1 - \frac{n_1(n_1+1)}{2} = 43.5 - \frac{6 \times 7}{2} = 22.5$$
and the p-value is far too big to reject H0 at any reasonable significance level. Hence, we
cannot conclude that there is a difference between the two brands of laptops in terms of the
time between sale and return for repair under warranty.
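If the exactRankTests package is not at hand, the value of W can be cross-checked with base R. This is a sketch under one assumption worth stressing: wilcox.test cannot compute an exact p-value when there are ties, so exact = FALSE is set and its p-value comes from a normal approximation. It will therefore differ somewhat from the one reported by wilcox.exact.

```r
# Data typed in from the Exercise 3 table
Brand_A <- c(225, 79, 225, 52, 29, 98)
Brand_B <- c(83, 52, 113, 67, 165, 132, 48, 230, 255)

# W by hand: rank sum of the first sample minus its smallest possible value
n1 <- length(Brand_A)
T1 <- sum(rank(c(Brand_A, Brand_B))[1:n1])
W_manual <- T1 - n1 * (n1 + 1) / 2

# W from base R (normal approximation for the p-value because of the ties)
res <- wilcox.test(Brand_A, Brand_B, exact = FALSE)

print(c(manual = W_manual, base_R = unname(res$statistic)))
print(res$p.value)
```

Both approaches give the same W, and the approximate p-value leads to the same decision as the exact test.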
Suppose now that the test is a one-tail test and the significance level is still 5%. The test
statistic does not change, but the critical values and the decision rule do. The new critical
values are TL = 33 and TU = 63.
For the left-tail test, the hypotheses are
$$H_0: \eta_1 - \eta_2 = 0, \quad H_A: \eta_1 - \eta_2 < 0$$
and H0 can be rejected if T ≤ TL. In this case T = 43.5 is larger than TL = 33, so H0 cannot be
rejected.
With R, the left-tail test returns a p-value of 0.3131, so H0 is maintained.
For the right-tail test, the hypotheses are
$$H_0: \eta_1 - \eta_2 = 0, \quad H_A: \eta_1 - \eta_2 > 0$$
and H0 can be rejected if T ≥ TU. Since T = 43.5 is smaller than TU = 63, again we fail to
reject H0.
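Both one-tail decisions can be sketched in R as follows. The data are typed in from the Exercise 3 table, the object names are illustrative, and the wilcox.exact calls are an assumption about how the one-tail tests would be requested (through the alternative argument). They run only if the exactRankTests package is installed.

```r
# Data typed in from the Exercise 3 table
Brand_A <- c(225, 79, 225, 52, 29, 98)
Brand_B <- c(83, 52, 113, 67, 165, 132, 48, 230, 255)

# Observed test statistic: rank sum of Brand A in the pooled sample
T1 <- sum(rank(c(Brand_A, Brand_B))[seq_along(Brand_A)])

# One-tail 5% critical values quoted in the text
TL <- 33
TU <- 63

reject_left  <- T1 <= TL   # left-tail rule:  reject H0 if T <= TL
reject_right <- T1 >= TU   # right-tail rule: reject H0 if T >= TU
print(c(left_tail = reject_left, right_tail = reject_right))

# One-tail tests with wilcox.exact, skipped if the package is missing
if (requireNamespace("exactRankTests", quietly = TRUE)) {
  print(exactRankTests::wilcox.exact(Brand_A, Brand_B, alternative = "less"))
  print(exactRankTests::wilcox.exact(Brand_A, Brand_B, alternative = "greater"))
}
```

Neither rule rejects H0, in line with the manual decisions above.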
Exercises for Assessment
In recent years, insurance companies offering medical coverage have given discounts to
companies that are committed to improving the health of their employees. To help determine
whether this policy is reasonable, the general manager of one large insurance company in
the US organised a study of a random sample of 30 workers who regularly participate in
their company’s lunchtime exercise program and 30 workers who do not. Over a two-year
period, he observed the total dollar amount of medical expenses for each individual. The
data are stored in the t4e4 (column 1: Expenses; column 2: Exercise, 1 for yes, 0 for no)
Excel file. Do all calculations with R.
a) Can the manager conclude at the 5% significance level that companies that provide
exercise programs should be given discounts? Perform an independent-samples t-test
to answer the question. Do not forget to specify the null and alternative hypotheses.
b) What assumptions must hold to ensure the validity of the hypothesis test in part (a)
above? Does it appear that these conditions are satisfied?
c) Assuming that some of the assumption(s) mentioned above is (are) not satisfied, which
nonparametric hypothesis-testing procedure could be used? Conduct this test and give
the appropriate conclusion in the context of the problem.
In a taste test of a new beer, 25 people rated the new beer and another 25 rated the leading
brand on the market. The possible ratings were Poor, Fair, Good, Very Good, and Excellent.
a) Suppose the responses for the new beer and the leading beer were stored using a
1-2-3-4-5 coding system (1 = Poor, …, 5 = Excellent). Based on the data saved in the t4e5a
file, can we infer that the new beer is rated less highly than the leading brand?
b) Suppose the responses were recoded so that 3 = Poor, 8 = Fair, 22 = Good, 37 = Very
Good, and 55 = Excellent. Based on the recoded data, saved in the t4e5b file, can we
infer that the new beer is rated less highly than the leading brand?