Chapter 1

The document discusses methods for summarizing data, including graphical representations like dot plots and histograms, and numerical descriptive measures such as the mean, median, and measures of dispersion. It emphasizes the importance of understanding both the central tendency and variability of data through examples, including Canadian return data and comparisons with other datasets. Additionally, it highlights the significance of identifying outliers and the use of time series plots to analyze data over time.

Uploaded by

Promachos IV


1. Summarizing Data
1.1 Graphical summaries of the data
Dot plot and histogram
The time series plot
1.2 Numerical descriptive measures
1.3 Measures of central tendency
The sample mean
The median
Mean versus median
1.4 Measures of dispersion
The sample variance
The sample standard deviation
1.5 The empirical rule
1.6 How to relate two things
1.7 Linearly related variables
Linear functions
Mean and variance of a linear function
Linear combinations
Mean and variance of a linear combination
1.1 Graphical Summaries of the Data

Two key ideas

• Exploratory (descriptive) issues: Look at the data (sample). Understand its structure without generalizing

• Inference issues: Use data (sample) to generalize results to a larger population of interest

© Imperial College Business School

Example:
Problem: How many of 100,000 voters (population) prefer A over B?
We can’t ask them all!
Solution: Ask a sample of 500 voters.

Summarize, describe the data: 300 voters for A (A = 1), 200 for B (B = 0).

[Figure: bar chart of C1 with Frequency on the vertical axis: 200 at C1 = 0 and 300 at C1 = 1]

We will learn how to generalize to the population. For now, we just learn how to analyze (describe) the data.
• Data is the statistician’s raw material, the numbers that we use to interpret reality

• All statistical problems involve either the collection, description and analysis of data, or thinking about the collection, description and analysis of data

• There are many aspects of data, e.g. data may be univariate (one variable per case) or multivariate (more than one variable per case). Let us look at some data.

The Canadian Return Data
Here is a specific data set (or sample). We have 107 monthly
returns on a broad based portfolio of Canadian assets (more on
portfolios later).
Canada
0.07 0.05 0.02 -0.04 0.08 -0.02 -0.05 0.02 0.03
0.00 0.03 0.08 -0.03 0.01 0.03 0.01 0.02 0.08
0.02 -0.02 0.00 0.01 0.02 -0.09 0.00 0.01 -0.07
0.07 0.00 0.02 -0.05 -0.04 -0.03 0.03 0.04 0.00
0.07 0.00 0.01 0.04 -0.02 0.02 0.01 -0.03 0.05
-0.02 0.00 0.01 -0.01 -0.05 -0.01 0.01 0.00 0.02
-0.02 -0.07 0.03 -0.04 0.03 -0.02 0.06 0.03 0.04
0.01 -0.01 -0.01 0.01 -0.05 0.09 -0.02 0.05 0.06
-0.05 -0.04 -0.01 0.01 -0.06 0.05 0.06 0.02 -0.01
-0.06 0.02 -0.05 0.06 0.04 0.02 0.04 0.02 0.02
0.00 0.00 -0.01 0.04 0.01 0.05 -0.01 0.02 0.04
0.02 -0.03 -0.03 0.05 0.04 0.08 0.07 -0.03

Interpret: Each number corresponds to a month. They are given in time order (read across each row, left to right, then move down to the next row).
Our first observation is .07. In the first month, the return was .07, in the 11th .03.
A little finance: what are returns?

• The return on an asset is the percentage increase in wealth invested in the asset over a given time period

• If you invest B at the beginning of the time period you get E = (1+r)B at the end of the time period, where r is the return

• (1+r) is the factor by which your wealth increases

Example:

Given E and B we can calculate r (the return): r = (E-B)/B

E = 110, B = 100, r = .1 or 10%

E = (1+.1)B = (1.1)B

For an investment in a stock, E consists of the amount you would get from selling the stock plus any dividends paid.

B is the price you pay at the beginning of the time period to acquire the stock.
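The return calculation above can be sketched in MATLAB. This is only an illustration: the names B, E and r follow the slide, `growth` is an added helper name, and the numbers are the ones from the example.

```matlab
% Return over one period, using the example values from the slide
B = 100;           % wealth invested at the beginning of the period
E = 110;           % wealth at the end (sale proceeds plus dividends)
r = (E - B) / B;   % return as a fraction; multiply by 100 for percent
growth = 1 + r;    % factor by which wealth increases, so E = growth*B

fprintf('r = %.2f, growth factor = %.2f\n', r, growth);
```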

Histograms

We are interested in ways to summarize or “see” the data.

The previous table was very unclear.
To display the returns we can use a simple graphical tool: the histogram (made by the histc command in MATLAB).
At each point on the number line we draw a bar as high as the number of observations with that value.

Interpret:
The returns are centered or located at about .01.
The spread or variation in the returns is huge compared with that center.
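As a rough sketch of the bar-height rule described above, the following tallies how often each value occurs and draws one bar per value. It uses only the first nine returns from the table so that it stays short. The names r, vals and counts are illustrative, and unique/accumarray is just one possible way to do the tally (the slides themselves mention histc on the full data).

```matlab
% First nine Canadian returns from the table (illustration only)
r = [0.07 0.05 0.02 -0.04 0.08 -0.02 -0.05 0.02 0.03];

% Tally how many observations take each distinct value
[vals, ~, idx] = unique(r);       % vals: sorted distinct values
counts = accumarray(idx(:), 1);   % counts(k): number of times vals(k) occurs

% One bar per distinct value, as high as its count
bar(vals, counts);
xlabel('return');
ylabel('Frequency');
```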
[Figure: dot plot of canada, horizontal axis from about -0.05 to 0.05, with the center or location of the data and the variation or spread about the center marked]

Notice that the data has a nice mound or bell shape.
There is a central peak and right and left “tails” that
die off roughly symmetrically.
[Figure: dot plot of Volume, horizontal axis from 0 to 6000]

Some data does not have the mound shape. It is skewed to the left.
We also have data on countries other than Canada.
Let us compare Canada with Japan.
It really helps to get things on the same scale.
How is Japan different from Canada?
Mutual fund data

• Let us use histograms to compare returns on some other kinds of assets

• We will look at returns on different mutual funds such as the equally weighted market and treasury bills (T-bills)

• The equally weighted market represents returns on a portfolio where you spread your money out equally over a wide variety of stocks
Data on 4 different kinds of returns:

• Dreyfus growth fund
• Putnam income fund
• Equally weighted market
• T-bills
The beer data:
nbeerm: the number of beers male MBA students claim they can drink without getting drunk
nbeerf: same for females

[Figure: plots of nbeerm and nbeerf, with one extreme point marked]

We call a point like this an outlier.

Generally the males claim they can drink more: their numbers are centered or located at larger values.
The number of bars you use affects how “smooth” the picture looks.

The time series plot:

We just looked at two kinds of data:

1. the return data
2. the number of beers

• For the return data, each number corresponds to a month

• For the beer data, each number corresponds to a person

• The return data has an important feature that the beer data does not
have

• It has an order!

• There is a first one, a second one, and ....


• A sequence of observations taken over time is often called a time series

• We could have daily data (temperature), annual data (inflation), quarterly data (inflation, GDP) and so on

• For time series data, the time series plot is an important way to look at the data

Time series plot of the Canadian returns:

[Figure: time series plot of the Canadian returns. On the vertical axis we have returns. On the horizontal axis we have “time”.]

Do you see a pattern?

Now do you see a pattern?

1.2 Numerical Descriptive Measures

• We have looked at graphs. Suppose we are now interested in having numerical summaries of the data rather than graphical representations

• We have seen that two important features of any data set are how spread out the data is, and the central or typical value of the data set

• In this part of the notes we will describe methods to summarize a data set numerically

• First, we will introduce measures of central tendency to determine the “center” of a distribution of data values, or possibly the “most typical” data value

• Measures of central tendency include: the mean and the median

• Second, we will discuss measures of dispersion, such as the sample standard deviation and the sample variance
1.3 Measures of Central Tendency
The sample mean

Suppose we collect n pieces of data. We need some way of describing the data. We write:

$x_1, x_2, x_3, \ldots, x_n$

Here $x_1$ is the first number and $x_n$ is the last number. n is the number of numbers, or the “number of observations.” You may also hear it referred to as the “sample size.”

They are the values that we observe.
Here, x is just a name for the set of numbers, we could just as easily use y (or Buddy).

Example: the numbers 5, 2, 8, 6, 2 give n = 5, with $x_1 = 5$ and $x_3 = 8$.

Sometimes the order of the observations means something. In our return data the first observation corresponds to the first time period.
Sometimes it does not. In our beer data we just have a list of numbers, each of which corresponds to a student.
The sample mean is just the average of the numbers “x”:

$\bar{x} = \frac{\text{sum}}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$

We often use the $\bar{x}$ symbol to denote the mean of the numbers x.

We call it “x bar”
Here is a more compact way to write the same thing.

Consider $x_1 + x_2 + \cdots + x_n$
We use a shorthand for it (it is just notation):

$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$

This is summation notation

Using summation notation we have:

The sample mean

$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Graphical interpretation of the sample mean

Let us go back to our standard histogram.

In some sense, the men claim to drink more.

To summarize this we can compute the average value for both men and women (I deleted the outlier, I do not believe him!).

In the picture, I think of the mean as the “center” of the data.

There is a bit of fuss because there are NaN (Not-a-Number) entries; I explain on the next page. The commands below deal with the NaN entries. Note that indexing with 1:Tm only works if the NaN entries all sit at the end of the column.

>> bm = isnan(nbeerm);              % 1 where an entry is NaN
>> Tm = size(nbeerm,1) - sum(bm);   % number of valid male observations
>> bf = isnan(nbeerf);
>> Tf = size(nbeerf,1) - sum(bf);   % number of valid female observations

>> mean(nbeerm(1:Tm))
ans = 7.862500000000000

>> mean(nbeerf(1:Tf))
ans = 4.222222222222222

“On average women claim they can drink 4.2 beers. Men claim they can drink 7.9 beers”
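A slightly more defensive variant of the NaN handling, as a sketch only: logical indexing keeps every non-NaN entry wherever it sits in the vector. The data vector below is made up for illustration and is not the class survey; the names nbeer_demo, valid, T and xbar are likewise illustrative.

```matlab
% Hypothetical answers with missing values coded as NaN (not the real data)
nbeer_demo = [5; NaN; 7; 9; NaN];

valid = ~isnan(nbeer_demo);            % true where an answer was recorded
T = sum(valid);                        % number of valid observations
xbar = sum(nbeer_demo(valid)) / T;     % sample mean over valid entries only

fprintf('T = %d, mean = %.4f\n', T, xbar);
```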
• Let us compare the means of the Canadian and Japanese returns

>> mean(canada)

ans = 0.009065420560748

>> mean(japan)

ans = 0.002336448598131

• This is a big difference: the Canadian mean is almost four times the Japanese mean

• It was hard to see this difference in the dot plots because the difference is small compared to the variation
More on summation notation
(take this as an aside)

Let us look at summation in more detail.

$\sum_{i=1}^{n} x_i$

means that for each value of i, from 1 to n, we add to the sum the value indicated, in this case $x_i$ (add in this value for each i).

To understand how it works let us consider some examples:

Think of each row as an observation on both x and y.
To make things concrete, think of each row as corresponding to a year and let x and y be annual returns on two different assets.

x     y     year
0.07  0.11  1
0.06  0.05  2
0.04  0.09  3
0.03  0.03  4

In year 1 asset “x” had return 7%

In year 4 asset “y” had return 3%

Compute x bar.

Compute y bar.

(Here, we do not sum over all observations: we sum only the second and third.)
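As a worked illustration using the four-row x, y table (a sketch only; the partial sum over x is an assumed example, since the quantity being summed over the second and third observations is not stated):

```latex
\bar{x} = \frac{1}{4}\sum_{i=1}^{4} x_i = \frac{0.07 + 0.06 + 0.04 + 0.03}{4} = \frac{0.20}{4} = 0.05

\bar{y} = \frac{1}{4}\sum_{i=1}^{4} y_i = \frac{0.11 + 0.05 + 0.09 + 0.03}{4} = \frac{0.28}{4} = 0.07

\sum_{i=2}^{3} x_i = x_2 + x_3 = 0.06 + 0.04 = 0.10
```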

For each value of i, we can add in anything we want:


The median

• After ordering the data, the median is the middle value of the data

• If there is an even number of data points, the median is the average of the two middle values

Example

1,2,3,4,5 Median = 3
1,1,2,3,4,5 Median = (2+3)/2 = 2.5

Mean versus median

• Although both the mean and the median are good measures of the center of a distribution of measurements, the median is less sensitive to extreme values

• The median is not affected by extreme values since only the middle of the ordered data enters its computation; how large or small the extreme measurements are does not matter

Example:
1,2,3,4,5 Mean: 3 Median: 3
1,2,3,4,100 Mean: 22 Median: 3
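The two small data sets above can be checked directly in MATLAB. This is a minimal sketch; the names a and b are illustrative.

```matlab
a = [1 2 3 4 5];
b = [1 2 3 4 100];   % same data with one extreme value in place of 5

% The extreme value moves the mean a lot but leaves the median alone
fprintf('a: mean = %g, median = %g\n', mean(a), median(a));
fprintf('b: mean = %g, median = %g\n', mean(b), median(b));
```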

We call extreme values in a data set “outliers”. We used to call them funny points but outliers sounds more scientific. Outliers are sometimes the most interesting aspect of a data set, and sometimes they are just coding errors.

The sex survey: how many partners?

“The median number of sex partners over a lifetime is 6 for males and 2 for females. One quarter of men reported only one lifetime partner, but the range varied markedly. One man reported 1,016 and one woman reported 1,009” (Likely outliers, am I wrong?)
1.4 Measures of Dispersion

The mean and the median give us information about the central tendency of a set of observations, but they shed no light on the dispersion, or spread of the data.
Example: Which data set is more variable?
5,5,5,5,5 Mean: 5
1,3,5,8,8 Mean: 5
Do you only care about the average return on a mutual fund, or do you need a measure of risk, too? Here is one...

The Sample Variance
[Figure: dot plots of the x and y data on a common axis from 0.030 to 0.105]

The y numbers are more spread out than the x numbers.

We want a numerical measure of variation or spread.

The basic idea is to view variability in terms of distance between each measurement and the mean, $x_i - \bar{x}$.

[Figure: the same dot plots of x and y, with the distances from each mean marked]

Overall, the distances for the x data are smaller than the distances for the y data.

• We cannot just look at the distance between each measurement and the mean. We need an overall measure of how big the differences are (i.e., just one number like in the case of the mean)
• Also, we cannot just sum the individual distances because the negative distances cancel out with the positive ones giving zero always (Why?)
• We average the squared distances and define

$\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$

So, the sample variance of the x data is defined to be:
Sample variance:

$s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

• We use n-1 instead of n for technical reasons that will be discussed later (the intuition does not change, though)
• Think of it as the average squared distance of the observations from the mean
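A minimal sketch of these definitions in MATLAB, using the four x values from the small x, y example (0.07, 0.06, 0.04, 0.03). The variable names are illustrative. It also shows numerically that the raw deviations sum to zero, which is why they are squared before averaging.

```matlab
x = [0.07 0.06 0.04 0.03];     % x data from the small example
n = numel(x);
xbar = sum(x) / n;             % sample mean
dev = x - xbar;                % signed distances from the mean

sum_dev = sum(dev);            % zero, up to rounding error
s2 = sum(dev.^2) / (n - 1);    % sample variance with the n-1 divisor

fprintf('sum of deviations = %.1e\n', sum_dev);
fprintf('s2 by hand = %.6f, var(x) = %.6f\n', s2, var(x));
```

The built-in var also divides by n-1 by default, so the two numbers printed on the last line should agree.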

Questions

1. What is the smallest value a variance can be?

2. What are the units of the variance?

It is helpful to have a measure of spread which is in the original units. The sample variance is not in the original units. We now introduce a measure of dispersion that solves this problem: the sample standard deviation

The sample standard deviation

It is defined as the square root of the sample variance (easy)

The sample standard deviation:

$s_x = \sqrt{s_x^2}$

The units of the standard deviation are the same as those of the original data

Example 1 (numerical)

Assume as before:

$y_i - \bar{y}$ = .04, -.02, .02, -.04

$x_i - \bar{x}$ = .02, .01, -.01, -.02

The sample standard deviation for the y data is bigger than that for the x data.

This numerically captures the fact that y has “more variation” about its mean than x.
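As a worked check of this claim (a sketch using the small x, y example, where $\bar{x} = .05$ and $\bar{y} = .07$, so the x deviations are .02, .01, -.01, -.02 and the y deviations are .04, -.02, .02, -.04, with the n-1 divisor from the definition of the sample variance):

```latex
s_x^2 = \frac{1}{3}\left(.02^2 + .01^2 + (-.01)^2 + (-.02)^2\right) = \frac{.0010}{3} \approx .000333,
\qquad s_x = \sqrt{s_x^2} \approx .0183

s_y^2 = \frac{1}{3}\left(.04^2 + (-.02)^2 + .02^2 + (-.04)^2\right) = \frac{.0040}{3} \approx .001333,
\qquad s_y = \sqrt{s_y^2} \approx .0365
```

Here the y deviations are about twice as large as the x deviations, and $s_y$ comes out twice $s_x$.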
Example 2 (graphical)

The standard deviations measure the fact that there is more spread in the Japanese returns

Variable  N    Mean     StDev
Canada    107  0.00907  0.03833
japan     107  0.00234  0.07368

1.5 The Empirical Rule
We now have two numerical summaries for the data:

$\bar{x}$: where the data is
$s_x$: how spread out, how variable the data is

• The mean is pretty easy to interpret (some sort of “center” of the data)
• We know that the bigger $s_x$ is, the more variable the data is, but how do we really interpret this number?
• What is a big $s_x$? What is a small one?

Empirical Rule

For “mound shaped data”:

Approximately 68% of the data is in the interval

$(\bar{x} - s_x,\ \bar{x} + s_x) = \bar{x} \pm s_x$

Approximately 95% of the data is in the interval

$(\bar{x} - 2s_x,\ \bar{x} + 2s_x) = \bar{x} \pm 2s_x$

The empirical rule will help us understand $s_x$ and relate the summaries back to the histogram

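Let us see this with the Canadian returns

$\bar{x} = .00907$

$s_x = .03833$

[Figure: histogram of canada (Density on the vertical axis, horizontal axis from -0.1 to 0.1), with dashed lines at $\bar{x} - 2s_x$ and $\bar{x} + 2s_x$ and dotted lines at $\bar{x} - s_x$ and $\bar{x} + s_x$]

The empirical rule says that roughly 95% of the observations are between the dashed lines and roughly 68% between the dotted lines.

Looks reasonable.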
Same thing viewed from the perspective of the time series plot.

[Figure: time series plot of canada with horizontal lines at $\bar{x}$, $\bar{x} + 2s_x$ and $\bar{x} - 2s_x$]

5% outside would be about 5 points.

There are 4 points outside, which is pretty close.
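The interval endpoints for the Canadian returns can be sketched from the two summary numbers alone. The values of the mean, standard deviation and sample size are taken from the slides; the variable names are illustrative.

```matlab
xbar = 0.00907;   % sample mean of the Canadian returns (from the slides)
sx   = 0.03833;   % sample standard deviation (from the slides)
n    = 107;       % number of monthly returns

one_sd = [xbar - sx,   xbar + sx];     % should hold roughly 68% of the data
two_sd = [xbar - 2*sx, xbar + 2*sx];   % should hold roughly 95% of the data
expected_outside = 0.05 * n;           % points expected outside two_sd

fprintf('x-bar +/- s : (%.4f, %.4f)\n', one_sd);
fprintf('x-bar +/- 2s: (%.4f, %.4f)\n', two_sd);
fprintf('expected outside 2s: about %.1f of %d points\n', expected_outside, n);
```

With these numbers the two-standard-deviation interval runs from about -0.068 to 0.086, so for returns recorded to two decimals only values of -0.07 or below, or 0.09 or above, fall outside it.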
A little finance: comparing mutual funds
Let us use the means and standard deviations to compare mutual funds.
For 9 different assets we compute the means and standard deviations.
Then, we plot the means versus the standard deviations.
The assets are:
Variable N Mean StDev
drefus 180 0.00677 0.04724
fidel 180 0.00470 0.05659
keystne 180 0.00654 0.08424
Putnminc 180 0.00552 0.03008
scudinc 180 0.00443 0.03597
windsor 180 0.01002 0.04864
eqmrkt 180 0.01082 0.06856
valmrkt 180 0.00681 0.04800
tbill 180 0.00598 0.00252


It is considered good to have a large mean return and a small standard deviation.

[Figure: scatter plot of Mean (vertical axis, 0.004 to 0.011) against StDev (horizontal axis, 0.00 to 0.09), one labeled point per asset: eqmrkt, windsor, valmrkt, drefus, keystne, tbill, Putnminc, fidel, scudinc]
Let us compare some countries

[Figure: scatter plot of Mean (vertical axis, 0.00 to 0.02) against StDev (horizontal axis, 0.03 to 0.08), one labeled point per country: honkong, usa, singapor, france, belgium, germany, australia, finland, canada, italy, japan. Based on monthly returns from ‘88 to ‘96.]

1.6 How to Relate Two Things

• The mean and standard deviation help us summarize a bunch of numbers which are measurements of just one thing (one variable)

• A fundamental and totally different question is how one thing relates to another

• In this section of the notes we look at scatter plots and how covariance and correlation can be used to summarize them

• When examining two things (variables) at a time, the scatter plot will be our main graphical tool whereas covariance and correlation will be our main numerical summaries

Example: Is the number of beers you can drink related to your weight?

nbeer  weight  i
12.0   192     1
12.0   160     2
5.0    155     3
5.0    120     4
7.0    150     5
13.0   175     6
4.0    100     7
12.0   165     8
12.0   165     9
12.0   150     10
.      .       .
.      .       .
.      .       .

[Figure: scatter plot of nbeer (vertical axis, 0 to 20) against weight (horizontal axis, 100 to 200)]

Now we think of each pair of numbers as an observation.
Each pair corresponds to a person.
Each person has two numbers associated with him/her, # beers and weight.
Each pair corresponds to a point on the plot.
Example:

Are returns on a mutual fund related to market returns?

[Figure: scatter plot of the fund returns (vertical axis, -0.1 to 0.2) against valmrkt (horizontal axis, -0.1 to 0.2). Each point corresponds to a month.]
In general we have observations and each point on the plot corresponds to an observation.

Our data looks like:

x     y    i
12.0  192  1
12.0  160  2
5.0   155  3
5.0   120  4
7.0   150  5
13.0  175  6
4.0   100  7
12.0  165  8

The ith observation $(x_i, y_i)$ is a pair of numbers.

The plot enables us to see the relationship between x and y

• In both examples it does look like there is a relationship
• Even more, the relationship looks linear in that it looks
like we could draw a line through the plot to capture the
pattern
• Covariance and correlation summarize how strong a
linear relationship there is between two variables
• In our first example weight and # beers were two
variables. In our second example our two variables
were two kinds of returns
• In general, we think of the two variables as x and y


The sample covariance between x and y:

$s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

The sample correlation between x and y:

$r_{xy} = \frac{s_{xy}}{s_x s_y}$

So, the correlation is just the covariance divided by the two standard deviations.
We will get some intuition about these formulae, but first let us see them in action. How do they summarize data for us? Let us start with the correlation.

Correlation, the fact of life:

$-1 \le r_{xy} \le 1$

The closer r is to 1 the stronger the linear relationship is with a positive slope.
When one goes up, the other tends to go up.

The closer r is to -1 the stronger the linear relationship is with a negative slope.
When one goes up, the other tends to go down.

The correlations corresponding to the two scatter plots
we looked at are:

Correlation of valmrkt and windsor = 0.923

Correlation of nbeer and weight = 0.692

The larger correlation between valmrkt and windsor

indicates that the linear relationship is stronger.

Let us look at some more examples.


[Figure: scatter plot of y1 against x1]
Correlation of y1 and x1 = 0.019

[Figure: scatter plot of y2 against x2]
Correlation of y2 and x2 = 0.995

[Figure: scatter plot of y3 against x3]
Correlation of y3 and x3 = 0.586

[Figure: scatter plot of y4 against x4]
Correlation of y4 and x4 = -0.982

[Figure: scatter plot of y5 (vertical axis, 0 to 9) against x5 (horizontal axis, -3 to 3)]
Correlation of y5 and x5 = 0.210

The correlation only measures linear relationships (here the value is small but there is a strong nonlinear relationship between y5 and x5).
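The same point can be sketched with made-up data (not the y5, x5 data from the slide): below, y is an exact function of x, yet the correlation is zero because the relationship is not linear. The names x, y, R and r are illustrative.

```matlab
x = -3:3;              % symmetric grid of x values (hypothetical data)
y = x.^2;              % y is completely determined by x, but not linearly

R = corrcoef(x, y);    % 2-by-2 correlation matrix
r = R(1, 2);           % correlation between x and y

fprintf('correlation of x and y = %.3f\n', r);
```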
Example: The country data

Which countries go up and down together?

I have data on 23 countries. That would be a lot of plots!

>> scatter(usa, canada)

[Figure: scatter plot of canada (vertical axis, -0.1 to 0.1) against usa (horizontal axis, -0.1 to 0.1)]

To summarize, we can compute all pair-wise correlations:

>> list = [australia belgium ... singapore]

>> corrcoef(list)

          australia  belgium  canada  finland  france  germany  hongkong  italy
belgium   0.189
canada    0.507      0.357
finland   0.387      0.183    0.386
france    0.275      0.734    0.342   0.176
germany   0.226      0.691    0.302   0.304    0.709
honkong   0.334      0.301    0.558   0.355    0.359   0.339
italy     0.159      0.367    0.334   0.389    0.352   0.465    0.261
japan     0.251      0.418    0.271   0.307    0.421   0.318    0.219     0.426
usa       0.360      0.429    0.651   0.264    0.501   0.372    0.429     0.240
singapor  0.409      0.355    0.478   0.391    0.408   0.467    0.647     0.416

          japan  usa
usa       0.246
singapor  0.407  0.473

Why is the upper part of the table blank?

Understanding the covariance and correlation formulae

• How do these weird looking formulae for covariance and correlation capture the relationship?

• To get a feeling for this, let us go back to the simple example and compute covariance and correlation

x     y
0.07  0.11
0.06  0.05
0.04  0.09
0.03  0.03
First, let us compute the covariance (which is a necessary ingredient to compute the correlation):

$\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

$= \frac{1}{3}\left((.07 - .05)(.11 - .07) + (.06 - .05)(.05 - .07) + (.04 - .05)(.09 - .07) + (.03 - .05)(.03 - .07)\right)$

$= \frac{1}{3}\left(.02 \cdot .04 + .01 \cdot (-.02) + (-.01) \cdot .02 + (-.02) \cdot (-.04)\right)$

$= \frac{1}{3}(.0008 - .0002 - .0002 + .0008) = \frac{1}{3}(.0012) = .0004$

Each of the 4 points makes a contribution to the sum.

Let us see which point does what.

$(x_1 - \bar{x})(y_1 - \bar{y}) = .02 \cdot .04 = .0008$

$(x_2 - \bar{x})(y_2 - \bar{y}) = .01 \cdot (-.02) = -.0002$

$(x_3 - \bar{x})(y_3 - \bar{y}) = (-.01) \cdot .02 = -.0002$

$(x_4 - \bar{x})(y_4 - \bar{y}) = (-.02) \cdot (-.04) = .0008$

[Figure: scatter plot of y (vertical axis, 0.03 to 0.11) against x (horizontal axis, 0.03 to 0.07), split at $\bar{x}$ and $\bar{y}$ into four regions: (III) upper left, (I) upper right, (II) lower left, (IV) lower right]

Points in (I) have both x and y bigger than their means so we get a positive contribution to the covariance.
Points in (II) have both x and y less than their means so we get a positive contribution to the covariance.
In (III) and (IV) one of x and y is less than its mean and the other is greater so we get a negative contribution. The further out the point is, the bigger the contribution.
[Figure: two scatter plots, each annotated “Lots of positive contributions” and “just a few relatively small contributions”]

So,
• A positive covariance means that when one variable is above its average the other one tends to be above its average as well. They move up and down together

• A negative covariance means that when one is up the other tends to be down. They move in opposite directions

• A small covariance means that their movements are almost (linearly) unrelated

Let us now compute the correlation.

We just finish the example:

$r_{xy} = \frac{.0004}{(.0365)(.0183)} = .6$

The division by the standard deviations standardizes the covariance so that the correlation is always between -1 and +1
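The whole covariance and correlation calculation for the four-point example can be sketched in MATLAB and compared with the built-in cov and corrcoef. The variable names (dx, dy, contrib, sxy, rxy) are illustrative.

```matlab
x = [0.07 0.06 0.04 0.03];       % annual returns on asset x
y = [0.11 0.05 0.09 0.03];       % annual returns on asset y
n = numel(x);

dx = x - mean(x);                % deviations of x from its mean
dy = y - mean(y);                % deviations of y from its mean
contrib = dx .* dy;              % contribution of each point to the sum
sxy = sum(contrib) / (n - 1);    % sample covariance
rxy = sxy / (std(x) * std(y));   % sample correlation

C = cov(x, y);                   % built-in check: C(1,2) is the covariance
R = corrcoef(x, y);              % built-in check: R(1,2) is the correlation
fprintf('sxy = %.4f (cov: %.4f)\n', sxy, C(1, 2));
fprintf('rxy = %.2f (corrcoef: %.2f)\n', rxy, R(1, 2));
```

The vector contrib holds the four point-by-point contributions, two positive and two negative, so it is easy to see which points drive the sign of the covariance.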

The sign of the correlation contains the same information as the sign of the covariance (in fact, they have the same sign, since the standard deviations are always positive)

Positive sign: positive relationship

Negative sign: negative relationship

The correlation is more informative, though, because it is unit-less and, by construction, always between -1 and 1.
Hence, it is a better measure of the strength of the relationship.

Close to 1: strong positive relationship

Close to -1: strong negative relationship
1.7 Linearly Related Variables
• We have studied data sets that display some kind of relation with each other (the mutual fund returns and the market returns, for instance)
• Sometimes there is an exact linear relation between variables:

$y = c_0 + c_1 x$

• Can we say something about the sample mean of y if all we know is the sample mean of x (and vice versa)?
• Can we say something about the sample standard deviation of y if all we know is the sample standard deviation of x (and vice versa)?
• We will answer these questions in the sequel
Example:

Suppose we have these temps in Celsius and Fahrenheit.

cel  fahr
10   50
15   59
20   68
25   77
40   104
30   86
50   122
70   158

How are the F values related to the C values?

F = 32 + (9/5)C
Note: if we plot F versus C, what do we see?

Correlation of cel and fahr = 1.000

The variable y is a linear function of the variable x if:

$y = c_0 + c_1 x$

$c_0$: the intercept
$c_1$: the slope

We think of the c’s as constants (fixed numbers) while x and y vary.

Example:

• Suppose you are a movie star and you have a deal which gives you a $10 million fee per movie + 10% of the gross

• How is your income related to the gross?

Mean and variance of a linear
function

Suppose y (the data y) is a linear function of x.

How are the mean and variance (standard deviation) of y related to those of x?

Let us look at our temperature example. Suppose we first multiply by (9/5) and then add 32.

>> cel = [ -10 0 10 15 20 25 30 35 ]';
>> mul = (9/5)*cel;
>> fahr = 32+mul;
>> mean([ cel mul fahr])
ans =
15.625000000000000
28.125000000000000
60.125000000000000
>> std([ cel mul fahr])
ans =
15.221577729375776
27.398839912876394
27.398839912876394

[Figure: dot plots of cel, mul and fahr on a common axis labeled from 0 to 150]

Interpret

• When we multiply cel by 9/5 we affect (increase) both the mean and the standard deviation proportionally

• If we add a constant (32 in our case) we simply increase the mean (by the value of the constant) but leave the overall dispersion unaffected

Suppose $y = c_0 + c_1 x$

Then:

$\bar{y} = c_0 + c_1 \bar{x}$

$s_y = |c_1| s_x$

$s_y^2 = c_1^2 s_x^2$
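These rules can be checked numerically on the temperature data. The sketch below uses the cel values from the slides; the names c0, c1 and the *_direct / *_formula variables are illustrative.

```matlab
cel = [-10 0 10 15 20 25 30 35]';   % Celsius data from the slides
c0 = 32;  c1 = 9/5;                 % intercept and slope of the conversion
fahr = c0 + c1*cel;                 % linear function of cel

mean_direct  = mean(fahr);          % mean computed from the converted data
mean_formula = c0 + c1*mean(cel);   % mean predicted by the rule
std_direct   = std(fahr);           % standard deviation from the data
std_formula  = abs(c1)*std(cel);    % standard deviation predicted by the rule

fprintf('mean: %.3f vs %.3f\n', mean_direct, mean_formula);
fprintf('std : %.4f vs %.4f\n', std_direct, std_formula);
```

Both pairs should agree up to rounding, with the mean at 60.125 and the standard deviation at about 27.3988, matching the MATLAB output shown for fahr.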

Example:

• Suppose our movie star makes 10 pictures and the mean and standard deviation of the gross on the films are 100 and 30 million.

• What are the mean and standard deviation of the star’s income?
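One way to work this out, as a sketch: measuring income and gross in millions of dollars, the deal described earlier (a 10 million fee plus 10% of the gross) is the linear function income = 10 + 0.1 gross, so the rules for a linear function apply with $c_0 = 10$ and $c_1 = 0.1$.

```latex
\overline{\text{income}} = c_0 + c_1\,\overline{\text{gross}} = 10 + 0.1 \times 100 = 20

s_{\text{income}} = |c_1|\, s_{\text{gross}} = 0.1 \times 30 = 3
```

Under these assumptions the mean income per picture is 20 million with a standard deviation of 3 million.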

Example:

• Suppose x has mean 100 and standard deviation 10

• What are the mean, standard deviation and variance of:
y = -2x?
y = 5+x?
y = 5-2x?

Linear combinations

We may want a variable to be related to several others instead of just one. We will assume that Y is a function of X, Z, ... rather than just a function of X.

Example:
Suppose our movie star also gets 5 percent of all sales of the CD released with the movie.
How is the star’s income related to the film’s gross and CD sales (in millions of dollars)?

When a variable y is linearly related to several others, we call it a linear combination.

$y = c_0 + c_1 x_1 + c_2 x_2 + \cdots + c_k x_k$

y is a linear combination of the x’s.
$c_i$ is the coefficient of $x_i$.

Important example: Portfolios

• Suppose you have $100 to invest

• Let x1 be the return on asset 1. If x1 = .1, and you put all
your money into asset 1, then you will have $110 at the
end of the period
• Let x2 be the return on asset 2. If x2 = .15, and you put
all your money into asset 2, then you will have $115 at
the end of the period
• Suppose you put 1/2 of your money into 1 and 1/2 into 2
• What will happen?
At the end of the period you will have:

(100)*.5*(1+.1) + (100)*.5*(1+.15)
= 100*(1 + .5*.1 + .5*.15)

So the return is .5*.1 + .5*.15 = .125.

To generalize, let $w_1$ be the fraction of your wealth you invest in asset 1.
Let $w_2$ be the fraction of your wealth you invest in asset 2.
Let M be your wealth.
The w’s are called the portfolio weights.

Then, at the end of the period, you have:

$w_1 M (1 + x_1) + w_2 M (1 + x_2) = M(w_1 + w_2 + w_1 x_1 + w_2 x_2)$

$= M(1 + w_1 x_1 + w_2 x_2)$ (using $w_1 + w_2 = 1$)

Hence the return is,

$R_p = w_1 x_1 + w_2 x_2$

This is beautiful (some people get a kick out of weird stuff!)

The return on the portfolio is just a linear combination of the asset returns where the coefficients are the portfolio weights.
• Suppose we have m assets

• The return on the ith asset is $x_i$

• Put $w_i$ fraction of your wealth into asset i

• Your portfolio is determined by the portfolio weights $w_i$

• Then, the return on the portfolio is:

$R_p = w_1 x_1 + w_2 x_2 + \cdots + w_m x_m = \sum_{i=1}^{m} w_i x_i$

Notice that the portfolio weights always sum up to one.
(With two assets, if I invest 30% of my wealth in asset 1, then I have to invest 70% of my wealth in asset 2)
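The two-asset example with $100, returns of .1 and .15 and equal weights can be sketched as follows. The variable names are illustrative; the code computes the end-of-period wealth both asset by asset and through the portfolio return.

```matlab
M = 100;            % initial wealth in dollars
x = [0.10 0.15];    % returns on asset 1 and asset 2
w = [0.5 0.5];      % portfolio weights (they sum to one)

Rp = sum(w .* x);                             % portfolio return as a linear combination
end_wealth_direct = sum(w .* M .* (1 + x));   % add up what each asset pays
end_wealth_via_Rp = M * (1 + Rp);             % same thing through the portfolio return

fprintf('Rp = %.3f\n', Rp);
fprintf('end wealth: %.2f (direct), %.2f (via Rp)\n', ...
        end_wealth_direct, end_wealth_via_Rp);
```

Changing w to any other weights that sum to one (including negative ones for a short position) leaves the two wealth figures equal.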

Questions:

1. Can the portfolio weights sum up to one while some of them are negative? (What does it mean to invest -30% of your wealth in asset 1 and 130% in asset 2?)

2. What is the equally weighted portfolio?

3. What is the value weighted portfolio?

Example (the country data again)

Let us use our country data and suppose that we had put .5 into USA and .5 into Hong Kong.
What would our returns have been?
In MATLAB:
>> port = .5*honkong + .5*usa

honkong  usa    port
0.02     0.04   0.030
0.06     -0.03  0.015
0.02     0.01   0.015
-0.03    0.01   -0.010
0.08     0.05   0.065

For each month, we get the portfolio return as ½*honkong + ½*usa.

How do the returns on this portfolio compare with those of Hong Kong and USA?

[Figure: scatter plot of Mean (vertical axis, 0.013 to 0.021) against StDev (horizontal axis, 0.03 to 0.07) with labeled points honkong, port and usa]

It looks like the mean for my portfolio is right in between the means of USA and Hong Kong.

What about the standard deviation?

Let us try a portfolio with three stocks.
Let us go short on Canada (i.e. we borrow Canada to invest in the other stocks)

>> port = -.5*canada + usa + .5*honkong

[Figure: scatter plot of Mean (vertical axis, 0.010 to 0.020) against StDev (horizontal axis, 0.03 to 0.07) with labeled points honkong, port, usa and canada]

Clearly, forming portfolios is an interesting thing to do!
• Basic question: why would we form portfolios?
• Maybe the portfolio has a nice mean and variance
(i.e., nice “average return” and nice “risk”)
• There are some basic formulae that relate the
mean and standard deviation of a linear
combination to the means, variances and
covariances of the input variables
• We can apply these formulae to understand how
the mean and variance of a portfolio depend on the
input assets. These formulae constitute the basic
part of the tool-kit of those who really understand
finance

Mean and variance of a linear combination

First, we consider the case where we have only two inputs

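Suppose

$$y = c_0 + c_1 x_1 + c_2 x_2$$

Then,

$$\bar{y} = c_0 + c_1 \bar{x}_1 + c_2 \bar{x}_2$$

$$s_y^2 = c_1^2 s_{x_1}^2 + c_2^2 s_{x_2}^2 + 2 c_1 c_2 s_{x_1 x_2}$$

Here $s_{x_1 x_2}$ is the sample covariance between $x_1$ and $x_2$.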
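
Both formulas can be checked numerically. The Python sketch below computes the mean and variance of y directly and compares them with the formulas. It assumes the sample variance and covariance use the n - 1 divisor, and the constant c0 = 0.01 is a made-up value. The inputs are the five months of Hong Kong and USA returns displayed in the country example.

```python
def mean(xs):
    return sum(xs) / len(xs)

def sample_cov(xs, ys):
    """Sample covariance with the n - 1 divisor; sample_cov(x, x) is the sample variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

x1 = [0.02, 0.06, 0.02, -0.03, 0.08]   # hongkong, five displayed months
x2 = [0.04, -0.03, 0.01, 0.01, 0.05]   # usa, five displayed months
c0, c1, c2 = 0.01, 0.5, 0.5            # c0 is an illustrative assumption

y = [c0 + c1 * a + c2 * b for a, b in zip(x1, x2)]

mean_direct = mean(y)
mean_formula = c0 + c1 * mean(x1) + c2 * mean(x2)

var_direct = sample_cov(y, y)
var_formula = (c1**2 * sample_cov(x1, x1) + c2**2 * sample_cov(x2, x2)
               + 2 * c1 * c2 * sample_cov(x1, x2))

print(abs(mean_direct - mean_formula) < 1e-12)
print(abs(var_direct - var_formula) < 1e-12)
```

Note that the constant c0 shifts the mean but drops out of the variance.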

Example:

Going back to our movie star, suppose the average sales of CDs is
5 million and the standard deviation is 1 million.
Assume the correlation between gross and CD sales is .8.

1. What is the mean and standard deviation of the

>> port = .5*hongkong + .5*usa

hongkong   usa     port
 0.02      0.04    0.030
 0.06     -0.03    0.015
 0.02      0.01    0.015
-0.03      0.01   -0.010
 0.08      0.05    0.065
........

For each month, we get the portfolio return as ½*hongkong + ½*usa.

The mean returns on USA and Hong Kong are .01346 and .02103.
Knowing what the portfolio returns are, we can easily compute the
mean return for the portfolio (i.e., it is the sample mean of the
portfolio returns): .01724.
We can now confirm the validity of our formula:
.01724 = .5*.01346+.5*.02103
Let us do the same exercise for the variance:

>> cov([hongkong usa port])

Covariances

            hongkong      usa           port
hongkong    0.00521497
usa         0.00103037    0.00110774
port        0.00312267    0.00106906    0.00209586

Diagonal elements are variances, off-diagonal elements are covariances
(this is a variance-covariance matrix).

As before, we can check the formula:

.0021
= (.5)*(.5)*.00521 + (.5)*(.5)*.00111 + 2*(.5)*(.5)*.00103
= .25*.00521 + .25*.00111 + .5*.00103
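
The same check can be run with the unrounded matrix entries. This is a small Python sketch; the variable names are illustrative:

```python
# Entries copied from the covariance matrix above.
var_hk = 0.00521497
var_usa = 0.00110774
cov_hk_usa = 0.00103037
w_hk, w_usa = 0.5, 0.5

var_port = w_hk**2 * var_hk + w_usa**2 * var_usa + 2 * w_hk * w_usa * cov_hk_usa
print(round(var_port, 8))
```

The result agrees with the port entry on the diagonal of the matrix to the eight decimals shown.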

Let us do it one more time:

>> port = .25*usa + .75*hongkong

>> cov([hongkong usa port])

Covariances

            hongkong      usa           port
hongkong    0.00521497
usa         0.00103037    0.00110774
port        0.00416882    0.00104972    0.00338905

.0034 =
(.25)*(.25)*.00111 + (.75)*(.75)*.00521 + 2*(.25)*(.75)*.00103
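
The off-diagonal entries in the port row can be checked in the same spirit. The Python sketch below relies on covariance being linear in each argument, so that cov(port, hongkong) = w_hk * var(hongkong) + w_usa * cov(hongkong, usa). That identity is not stated on the slide and is used here as a supporting fact.

```python
var_hk, var_usa, cov_hk_usa = 0.00521497, 0.00110774, 0.00103037
w_usa, w_hk = 0.25, 0.75

# Covariance of the portfolio with each input asset.
cov_port_hk = w_hk * var_hk + w_usa * cov_hk_usa
cov_port_usa = w_usa * var_usa + w_hk * cov_hk_usa

# The portfolio variance is the weighted sum of those covariances.
var_port = w_hk * cov_port_hk + w_usa * cov_port_usa

print(cov_port_hk, cov_port_usa, var_port)
```

Small differences in the last printed digit relative to the matrix are possible, because the matrix entries used as inputs are themselves rounded.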

Example:

y = .5x1 + .5x2

[Figure: scatter plot of x2 against x1. At each point we plot the value of y. The dashed lines are drawn at the mean of x1 and x2.]

The variances and covariance are:

       x1           x2
x1     1.334636
x2    -1.208679     1.106238

Then, the variance of y is

0.0058105 = .5*.5*1.3346 + .5*.5*1.106 + 2*.5*.5*(-1.208679)

Why is the variance of y so much smaller than those of the x's?

Example:

y = .5x1 + .5x2

[Figure: scatter plot of x2 against x1. At each point we plot the value of y. The dashed lines are drawn at the mean of x1 and x2.]

The variances and covariance are:

       x1           x2
x1     1.158167
x2     1.046490     0.9609463

Then, the variance of y is

1.053 = .5*.5*1.158 + .5*.5*.961 + 2*.5*.5*1.0465

Why is the variance of y not so much smaller than those of the x's?
Example:

y = .5x1 + .5x2

[Figure: scatter plot of x2 against x1. At each point we plot the value of y. The dashed lines are drawn at the mean of x1 and x2.]

The variances and covariance are:

       x1           x2
x1     1.3870537
x2     0.1976187    0.8247886

Then, the variance of y is

0.65175 = .5*.5*1.387 + .5*.5*.8248 + 2*.5*.5*.1976

Why is the variance of y so much smaller than those of the x's?
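
To compare the three scatter-plot examples side by side, the Python sketch below recomputes the variance of y from the unrounded table entries. It also reports the correlation implied by each covariance, using the standard definition (covariance divided by the product of the standard deviations). The helper name is an illustrative assumption.

```python
import math

def var_of_half_sum(var1, var2, cov12):
    """Variance of y = 0.5*x1 + 0.5*x2 from the input variances and covariance."""
    return 0.25 * var1 + 0.25 * var2 + 0.5 * cov12

# (var x1, var x2, cov(x1, x2)) for the three examples, in order of appearance.
cases = [
    (1.334636, 1.106238, -1.208679),
    (1.158167, 0.9609463, 1.046490),
    (1.3870537, 0.8247886, 0.1976187),
]

for var1, var2, cov12 in cases:
    corr = cov12 / math.sqrt(var1 * var2)
    print(f"corr = {corr:6.3f}, var(y) = {var_of_half_sum(var1, var2, cov12):.4f}")
```

The covariance term is what separates the cases: a strongly negative covariance cancels most of the variance coming from the inputs, while a strongly positive one leaves the variance of y close to those of the x's. In the first case the unrounded inputs give a value slightly different from the 0.0058105 shown, which was computed from rounded variances.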

K inputs:

Suppose,

$$y = c_0 + c_1 x_1 + c_2 x_2 + c_3 x_3 + \cdots + c_k x_k$$

Then,

$$\bar{y} = c_0 + c_1 \bar{x}_1 + c_2 \bar{x}_2 + c_3 \bar{x}_3 + \cdots + c_k \bar{x}_k$$

$$s_y^2 = \sum_{i=1}^{k} c_i^2 s_{x_i}^2 + 2 \sum_{i<j} c_i c_j s_{x_i x_j}$$

Example:

$$y = c_0 + c_1 x_1 + c_2 x_2 + c_3 x_3$$

$$s_y^2 = c_1^2 s_{x_1}^2 + c_2^2 s_{x_2}^2 + c_3^2 s_{x_3}^2 + 2 \left( c_1 c_2 s_{x_1 x_2} + c_1 c_3 s_{x_1 x_3} + c_3 c_2 s_{x_3 x_2} \right)$$

Example:

>> port = .1*fidel + .4*eqmrkt + .5*windsor

>> cov([ port fidel eqmrkt windsor])

Covariances

port fidel eqmrkt windsor

port 0.00306760
fidel 0.00280224 0.00320210
eqmrkt 0.00369384 0.00319150 0.00470021
windsor 0.00261967 0.00241087 0.00298922 0.00236580

.0030676 = (.1)*(.1)*.003202 + (.4)*(.4)*.0047 + (.5)*(.5)*.0023658
+ 2*((.1)*(.4)*.00319 + (.1)*(.5)*.00241 + (.4)*(.5)*.00299)
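
For any number of inputs, the calculation can be organised as a double sum over the covariance matrix. The Python sketch below is one possible arrangement, not the slides' own code. The function name is hypothetical, and the matrix is the fidel, eqmrkt and windsor block of the table above, filled in symmetrically.

```python
def linear_combination_variance(coeffs, cov_matrix):
    """Variance of y = c0 + sum_i c_i * x_i given the inputs' covariance matrix.

    Sums c_i * c_j * s_ij over all pairs (i, j). Each off-diagonal pair is
    visited twice, which gives the factor 2; the constant c0 plays no role.
    """
    k = len(coeffs)
    return sum(coeffs[i] * coeffs[j] * cov_matrix[i][j]
               for i in range(k) for j in range(k))

# Covariances of fidel, eqmrkt and windsor, copied from the table.
cov_matrix = [
    [0.00320210, 0.00319150, 0.00241087],
    [0.00319150, 0.00470021, 0.00298922],
    [0.00241087, 0.00298922, 0.00236580],
]
weights = [0.1, 0.4, 0.5]

var_port = linear_combination_variance(weights, cov_matrix)
print(round(var_port, 7))
```

The result agrees with the port variance in the table up to rounding of the inputs.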

Example:

Cut from a Finance Textbook:
