Basic Statistics Overview - Part I

Uploaded by lcseguraf
Copyright © All Rights Reserved

Descriptive Statistics (1/2)
• Population
  • A population is the entire group that you want to draw conclusions about.
• Sample
  • A sample is the specific group that you will collect data from.
• In research, a population doesn’t always refer to people. It can be a group of elements of anything you want to study, such as objects, events, organizations, countries, species, or organisms.
Descriptive Statistics (2/2)
• Central Tendency
  • The central tendency is the extent to which all the data values group around a typical or central value.
• Variation
  • The variation is the amount of dispersion or scattering of values.
• Shape
  • The shape is the pattern of the distribution of values from the lowest value to the highest value.
Measures of Central Tendency: The Mean
• The arithmetic mean (often just called the “mean”) is the most common measure of central tendency
• Mean = sum of values divided by the number of values
• For a sample of size n:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
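The formula above can be sketched in Python as follows. This is a minimal illustration assuming the data are a plain list of numbers; the helper name `sample_mean` is hypothetical, not something from the slides. The standard library's `statistics.mean` computes the same quantity.

```python
def sample_mean(values):
    """Arithmetic mean: sum of the values divided by their count n."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


print(sample_mean([11, 12, 13, 14, 15]))  # 13.0
```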
Measures of Central Tendency: The Mean
• The mean is affected by extreme values (outliers):

[Figure: the two data sets plotted on number lines from 11 to 20]

$$\text{Mean} = \frac{11 + 12 + 13 + 14 + 15}{5} = \frac{65}{5} = 13 \qquad \text{Mean} = \frac{11 + 12 + 13 + 14 + 20}{5} = \frac{70}{5} = 14$$
• How can the effect of outliers be mitigated?
  • Increase the sample size
  • Winsorize or remove outliers
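Winsorizing replaces the most extreme values with less extreme ones instead of deleting them. The sketch below is one simple count-based variant, assuming you choose how many values `k` to clip at each tail; the helper `winsorize` is hypothetical. SciPy offers a percentile-based version as `scipy.stats.mstats.winsorize`.

```python
def winsorize(values, k=1):
    """Clip the k smallest values up to the (k+1)-th smallest and the
    k largest values down to the (k+1)-th largest (count-based winsorizing)."""
    if not 0 <= 2 * k < len(values):
        raise ValueError("k must leave at least one value unclipped")
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    # Each value is clamped into [low, high]; original order is preserved.
    return [min(max(x, low), high) for x in values]


data = [11, 12, 13, 14, 20]          # data set with the outlier 20
clipped = winsorize(data, k=1)
print(clipped)                       # [12, 12, 13, 14, 14]
print(sum(clipped) / len(clipped))   # 13.0
```

Here the outlier 20 is pulled down to 14, and the mean returns to 13 instead of 14.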
Measures of Central Tendency: The Median

• In an ordered array, the median is the “middle” number (50% above, 50% below)
• The location of the median when the values are in numerical order (smallest to largest):
$$\text{Median position} = \frac{n+1}{2} \text{ position in the ordered data}$$
• If the number of values is odd, the median is the middle number

• If the number of values is even, the median is the average of the two
middle numbers
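The (n + 1) / 2 rule translates directly into code. This is a minimal sketch with a hypothetical helper `median`. Note that Python indexes from 0, while the slide's positions count from 1. The standard library's `statistics.median` follows the same odd/even convention.

```python
def median(values):
    """Median using the (n + 1) / 2 position rule on the sorted data."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 (1-based) is index n // 2 (0-based).
        return ordered[mid]
    # Even n: (n + 1) / 2 falls between positions n/2 and n/2 + 1,
    # so average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([11, 12, 13, 14, 15]))  # 13
print(median([11, 12, 13, 14]))      # 12.5
```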
Measures of Central Tendency: The Median

• The median is not affected by extreme values:

[Figure: the two data sets plotted on number lines from 11 to 20]

Median = 13 Median = 13

• Comparing the median with the mean is another way to detect outliers
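One way to act on this comparison is to compute the gap between the mean and the median. The sketch below uses the two data sets from the mean example. The helper `mean_median_gap` is hypothetical, and what counts as a "large" gap depends on the data, so no threshold is assumed.

```python
from statistics import mean, median


def mean_median_gap(values):
    """Mean minus median; a gap far from 0 hints that extreme values pull the mean."""
    return mean(values) - median(values)


print(mean_median_gap([11, 12, 13, 14, 15]))  # 0  (mean 13, median 13)
print(mean_median_gap([11, 12, 13, 14, 20]))  # 1  (mean 14, median 13)
```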


Measures of Central Tendency: The Mode
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical (nominal) data
• There may be no mode
• There may be several modes
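The sketch below covers all of these cases. It returns every most-frequent value, so it handles several modes and categorical data. The convention that "no value repeats" means "no mode" is an assumption; the standard library's `statistics.multimode` would instead return all values in that case. The helper name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often; [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: treated here as "no mode"
    return [v for v, c in counts.items() if c == top]


print(modes([1, 3, 3, 5, 7]))          # [3]
print(modes([1, 2, 3, 4]))             # []
print(modes([1, 1, 2, 2, 3]))          # [1, 2]
print(modes(["red", "blue", "red"]))   # ['red']
```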
Measures of Central Tendency: Review Example

House Prices:
$2,000,000
$500,000
$300,000
$100,000
$100,000
Sum: $3,000,000

▪ Mean: ($3,000,000 / 5) = $600,000
▪ Median: middle value of ranked data = $300,000
▪ Mode: most frequent value = $100,000
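The same three results can be checked with Python's standard `statistics` module. This is a sketch that assumes the five house prices above are the complete data set.

```python
from statistics import mean, median, multimode

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

print(mean(house_prices))       # 600000
print(median(house_prices))     # 300000
print(multimode(house_prices))  # [100000]
```

The single $2,000,000 house pulls the mean to twice the median.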
Which Measure to Choose?
• The mean is generally used, unless extreme values (outliers) exist.
• The median is often used when outliers are present, since it is not sensitive to extreme values. For example, median home prices may be reported for a region, because a few very expensive homes would pull the mean upward.
• In some situations, it makes sense to report both the mean and the
median.
Measures of Variation
Variation
• Range
• Variance
• Standard Deviation
• Coefficient of Variation

Measures of variation give information on the spread, variability, or dispersion of the data values.
Measures of Variation: The Range
• Simplest measure of variation
• Difference between the largest and the smallest values:

$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$

• Sensitive to outliers
• Compare:
  • Sample 1: 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5 (range = 5 - 1 = 4)
  • Sample 2: 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120 (range = 120 - 1 = 119)
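A short sketch makes the comparison concrete. It builds the two samples above and computes their ranges. The helper is named `value_range` (a hypothetical name) so it does not shadow Python's built-in `range`.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    if not values:
        raise ValueError("range requires at least one value")
    return max(values) - min(values)


# The two 25-value samples listed above; they differ only in the last value.
sample_1 = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
sample_2 = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

print(value_range(sample_1))  # 4
print(value_range(sample_2))  # 119
```

One changed value out of 25 multiplies the range by almost 30.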
Measures of Variation: The Sample Variance

• Low variation: more points close to the mean
• High variation: more points far from the mean
• So, measure each value's distance from the mean


Measures of Variation: The Sample Variance

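• Average (approximately) of squared deviations of values from the mean; "approximately" because the sum is divided by n - 1 rather than n
• Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
where
$\bar{X}$ = arithmetic mean
$n$ = sample size
$X_i$ = $i$th value of the variable $X$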
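The formula for S² can be sketched directly. This minimal version assumes the data are a list of numbers and reuses the data set 11, 12, 13, 14, 15 from the mean example. Its deviations are -2, -1, 0, 1, 2, so the squared deviations sum to 10, and 10 / 4 = 2.5. The helper `sample_variance` is hypothetical; `statistics.variance` gives the same result.

```python
def sample_variance(values):
    """S^2 = sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


print(sample_variance([11, 12, 13, 14, 15]))  # 2.5
```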
Measures of Variation: The Sample Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the variance
• Has the same units as the original data

• Sample standard deviation:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$
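Because S is the square root of S², a sketch only needs one extra step after the variance calculation. The example uses the same data as the mean example. The helper `sample_std_dev` is hypothetical; `statistics.stdev` computes the same quantity.

```python
import math


def sample_std_dev(values):
    """S = square root of the sample variance (divisor n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two values")
    x_bar = sum(values) / n
    sum_sq = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(sum_sq / (n - 1))


print(round(sample_std_dev([11, 12, 13, 14, 15]), 4))  # 1.5811
```

Unlike the variance (2.5 in squared units), the result is in the same units as the data.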
Measures of Variation: Comparing Standard Deviations

[Figure: two distributions, labeled "Smaller standard deviation" and "Larger standard deviation"]

Measures of Variation: Coefficient of Variation

• The coefficient of variation (CV) is a standardized measure of the dispersion of a dataset relative to its mean.
• It is commonly used in finance to assess the relative risk or volatility
of an investment compared to its expected return.
• The formula for CV is:
$$CV = \frac{\text{Standard Deviation}}{\text{Mean}}$$
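As a sketch of how CV compares relative risk, the example below uses two made-up series of returns. The data and the helper `coefficient_of_variation` are hypothetical and for illustration only. The sample standard deviation comes from `statistics.stdev`. Investment B has four times A's standard deviation but only twice its mean return, so its CV is twice as large.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean (undefined when the mean is 0)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


# Hypothetical returns (%) for two investments.
investment_a = [4.0, 5.0, 6.0]    # mean 5.0, standard deviation 1.0
investment_b = [6.0, 10.0, 14.0]  # mean 10.0, standard deviation 4.0

print(coefficient_of_variation(investment_a))  # 0.2
print(coefficient_of_variation(investment_b))  # 0.4
```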
