Data Analysis and Presentation
BIDA330
Correlation
Chapter 12 Part 1
Correlation
• Correlation is a measure of the degree of relatedness of
variables
• It can help a business researcher determine, for example, whether
the stocks of two airlines rise and fall in any related manner
• For a sample of pairs of data, correlation analysis can yield a
numerical value that represents the degree of relatedness of the
two stock prices over time
• In the transportation industry, is there a correlation between the
price of transportation and the weight of the object being
shipped? If so, how strong is it?
Correlation Cont.
• Several measures of correlation are available, the selection of which
depends mostly on the level of data being analyzed
• Ideally, researchers would like to solve for ρ (rho), the population coefficient
of correlation. However, because researchers virtually always deal with
sample data, this section introduces a widely used sample coefficient
of correlation, r
• This measure is applicable only if both variables being analyzed are
measured at the interval or ratio level
• The statistic r is the Pearson product-moment correlation
coefficient, named after Karl Pearson (1857–1936), an English
statistician who developed several coefficients of correlation along
with other significant statistical concepts
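For reference, the sample coefficient r is usually defined as below, shown in two algebraically equivalent forms: the deviation form and the computational shortcut often used for hand calculation. Here n is the number of (x, y) pairs and x̄, ȳ are the sample means.

```latex
r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}
  = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}
         {\sqrt{\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]\left[\sum y^2 - \frac{(\sum y)^2}{n}\right]}}
```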
Pearson's correlation
• Pearson's correlation is the parametric test for correlation
between two continuous (interval/ratio) variables
• The assumptions to apply the test are as follows:
• Normal distribution
• Independence of observations
• Linear relationship
• If the normality assumption is not met, or if one variable is
ordinal, the nonparametric alternative, Spearman's correlation,
is applied
Spearman's correlation
• Spearman's correlation can be applied to curvilinear relationships
and to ranked (ordinal) data
• However, the relationship must still be monotonic: as the value of
one variable increases, the value of the other variable consistently
increases, or consistently decreases
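Spearman's coefficient is Pearson's r computed on the ranks of the data, so a curvilinear but monotonic relationship still produces a perfect rank correlation. The pure-Python sketch below illustrates this under a few assumptions: the helper names `pearson_r`, `ranks`, and `spearman_rho` are hypothetical, the data are made up for illustration, and tied values receive the average of their ranks (a common convention).

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # i and j are 0-based positions
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]            # curvilinear but monotonic (always increasing)
print(round(pearson_r(x, y), 3))   # 0.938: not a straight line, so r < 1
print(spearman_rho(x, y))          # 1.0: the ranks agree perfectly
```

Because only the ordering matters, any strictly increasing transformation of y would leave Spearman's value unchanged, while Pearson's r would change.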
Pearson product-moment correlation coefficient
• The term r is a measure of the linear correlation of two variables
• It is a number ranging from -1 through 0 to +1 that represents both the strength
and the direction of the relationship between the variables: -1 ≤ r ≤ 1
• An r value of +1 denotes a perfect positive relationship between two sets of
numbers
• An r value of -1 denotes a perfect negative correlation, which indicates an
inverse relationship between two variables: as one variable gets larger, the
other gets smaller
• An r value of 0 means no linear relationship is present between the two
variables
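The deviation form of the formula translates directly into code. This is a minimal pure-Python sketch, assuming a hypothetical helper name `pearson_r` and small made-up data sets chosen so that the two perfect cases (+1 and -1) come out exactly.

```python
import math

def pearson_r(x, y):
    """Sample Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # numerator: sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # denominator: square root of the product of the sums of squared deviations
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v + 1 for v in x]))   # 1.0: perfect positive (y rises with x)
print(pearson_r(x, [10 - 3 * v for v in x]))  # -1.0: perfect negative (y falls as x rises)
```

The sign of r comes entirely from the numerator: when large x values pair with large y values, most cross-products are positive; when they pair with small y values, most are negative.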
Scatterplot/Diagram
A scatterplot is a graph that is used to represent the
relationship between two variables. (Also referred to
as a scatter diagram.)
In a scatterplot, the X values are placed on the
horizontal axis and the Y values are placed on the
vertical axis.
The value of the scatterplot is that it lets you see the
nature of the relationship.
[Figure: scatterplot examples. Panels: Strong Negative Correlation (r = -.933); Moderate Negative Correlation (r = -.674); Virtually No Correlation (r = -.004); Strong Positive Correlation (r = .909); Moderate Positive Correlation (r = .518)]
Characteristics of the Relationship
• A correlation measures three characteristics of
the relationship between X and Y:
• 1) The Direction of the Relationship
• 2) The Form of the Relationship
• 3) The Degree of the Relationship
The Direction of the Relationship
• In a positive correlation, the two variables tend to move in the
same direction (correlation is +).
• When X increases, Y increases.
• When X decreases, Y decreases.
• See scatterplot examples (a) and (d).
• In a negative correlation, the two variables move in opposite
directions (correlation is -).
• When X increases, Y decreases.
• When X decreases, Y increases.
• See (b) and (c) of scatterplot examples.
The Form of the Relationship
• There are many forms that plots can take. The one we will
consider is linear. In a linear form, the points in the plot tend to
form a straight line. See scatterplot examples (a) and (b) for linear
forms. The remaining examples are not linear.
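Because r measures only the linear form, a relationship can be perfectly predictable and still have r = 0. The sketch below assumes a hypothetical helper `pearson_r` (pure Python) and made-up, U-shaped data in which y is exactly x squared.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]   # a perfect, but curved (U-shaped), relationship
print(pearson_r(x, y))    # 0.0: no *linear* relationship
```

The positive cross-products on the right half of the U cancel the negative ones on the left half, which is why a scatterplot should be checked before relying on r alone.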
Pearson Product-moment Correlation Coefficient
Example 1: Economics: What is the correlation
between the federal funds interest rate and
the commodities futures index?
Interpreting the Pearson Correlation
• Correlation describes a relationship between two
variables, not why the variables are related (not proof of
cause-and-effect).
• The value of a correlation can be greatly affected by the
range of scores in the data.
• The value of a correlation can be greatly affected by one
or two extreme points (outliers).
• A correlation should not be interpreted as a “proportion”.
For example, a correlation of .815 does not mean that
one could predict with 81.5% accuracy. To describe how
accurately one variable predicts the other, you must
square the correlation: r = .815 gives r² ≈ .664, meaning
that about 66% of the variability in one variable is
accounted for by its linear relationship with the other.
r² is the coefficient of determination.
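Two of the points above can be checked with a short Python sketch. It assumes a hypothetical helper `pearson_r` and small made-up illustration data (not from the course): squaring r = .815 gives the coefficient of determination, and adding one extreme pair to five points flips r from strongly negative to strongly positive.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# coefficient of determination: square the correlation
r = 0.815
print(round(r ** 2, 3))                      # 0.664

# sensitivity to a single extreme point (hypothetical data)
x = [1, 2, 3, 4, 5]
y = [5, 3, 4, 2, 1]
print(round(pearson_r(x, y), 3))             # -0.9: strong negative
x_out, y_out = x + [20], y + [20]            # add one outlier at (20, 20)
print(round(pearson_r(x_out, y_out), 3))     # 0.924: now strong positive
```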
Lab: Solve Example 1 Economics using Excel