
Descriptive Statistics

The larger the n-count, the less influential an extreme value will be on x̄. As we
will learn in chapter “Statistical Inference”, sample size is fundamental to our ability
to achieve precise estimates of population parameters based on sample statistics.
While the focus of this section is central tendency, it is important to recognize that
outlying values are often the more actionable data points in an analysis since these
cases may represent those with significantly different experiences relative to the
average employee. Understanding the distribution of data is critical, and the spread
of data around measures of central tendency will receive considerable attention
throughout this book.
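The dampening effect of a larger n on an extreme value can be illustrated with a quick sketch (the vectors below are hypothetical, not drawn from the employees data):

```r
# Two hypothetical samples sharing one extreme value (1000)
small_n <- c(10, 12, 11, 1000)              # n = 4
large_n <- c(rep(c(10, 12, 11), 33), 1000)  # n = 100

# The outlier pulls the small-sample mean far more than the large-sample mean
mean(small_n)   # 258.25
mean(large_n)   # 20.89

# The median is robust to the outlier in both cases
median(small_n) # 11.5
median(large_n) # 11
```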

Mode

The mode is the most frequent number in a set of values.


While mean() and median() are standard functions in R, mode() returns the
internal storage mode of the object rather than the statistical mode of the data. We
can easily create a function to return the statistical mode(s):

# Fill vector x3 with integers
x3 <- c(1, 2, 3, 3, 100, 200, 300, 300)

# Create function to calculate statistical mode(s)
stat.mode <- function(x) {
ux <- unique(x)
tab <- tabulate(match(x, ux))
ux[tab == max(tab)]
}

# Return mode(s) of vector x3
stat.mode(x3)

## [1] 3 300
In this case, we have a bimodal distribution since both 3 and 300 occur most
frequently.

Range

The range is the difference between the maximum and minimum values in a set of
numbers.
The range() function in R returns the minimum and maximum numbers:

# Return lowest and highest values of vector x3
range(x3)

## [1] 1 300

We can leverage the max() and min() functions to calculate the difference
between these values:

# Calculate range of vector x3
max(x3, na.rm = TRUE) - min(x3, na.rm = TRUE)

## [1] 299
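Equivalently, the range as a single number can be computed by applying diff() to the output of range():

```r
# Vector from the mode example above
x3 <- c(1, 2, 3, 3, 100, 200, 300, 300)

# diff() subtracts the minimum from the maximum returned by range()
diff(range(x3, na.rm = TRUE))

## [1] 299
```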

In people analytics, there are many conventional descriptive metrics—largely counts, percentages, and averages cut by various time (e.g., day, month, quarter, year) and categorical (e.g., department, job, location, tenure band) dimensions. Here is a sample of common measures:
• Time to Fill: average days between job requisition posting and offer accep-
tance
• Offer Acceptance Rate: percent of offers extended to candidates that are
accepted
• Pass-Through Rate: percent of candidates in a particular stage of the recruiting
process who passed through to the next stage
• Progress to Goal: percent of approved positions that have been filled
• cNPS/eNPS: candidate and employee NPS (−100 to 100)
• Headcount: counts and percent of workforce across worker types (employee,
intern, contingent)
• Diversity: counts and percent of workforce across gender, ethnicity, and
generational cohorts
• Positions: count and percent of open, committed, and filled seats
• Hires: counts and rates
• Career Moves: counts and rates
• Turnover: counts and rates (usually terms/average headcount over the period)
• Workforce Growth: net changes over time, accounting for hires, internal
transfers, and exits
• Span of Control: ratio of people leaders to individual contributors
• Layers/Tiers: average and median number of layers removed from CEO
• Engagement: average score or top-box favorability score
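As a sketch of how a couple of these metrics might be computed, consider a small hypothetical requisition-level data set (the data frame and column names below are illustrative; they are not part of the peopleanalytics package):

```r
# Hypothetical requisition-level data
reqs <- data.frame(
  posted_date    = as.Date(c("2024-01-02", "2024-01-10", "2024-02-01")),
  accept_date    = as.Date(c("2024-02-15", "2024-03-01", NA)),
  offer_extended = c(TRUE, TRUE, TRUE),
  offer_accepted = c(TRUE, TRUE, FALSE)
)

# Time to Fill: average days between posting and offer acceptance
mean(as.numeric(reqs$accept_date - reqs$posted_date), na.rm = TRUE)  # 47.5

# Offer Acceptance Rate: percent of extended offers accepted
sum(reqs$offer_accepted) / sum(reqs$offer_extended) * 100  # ~66.7
```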

Measures of Spread
Variance

Variance is a measure of variability in the data. Variance is calculated using the average of squared differences—or deviations—from the mean.
Variance of a population is defined by:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Variance of a sample is defined by:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

It is important to note that since differences are squared, the variance is always non-negative. In addition, we cannot compare these squared differences to the arithmetic mean since the units are different. For example, if we calculate the variance of annual compensation measured in USD, variance is expressed in USD² while the mean exists in the original USD unit of measurement.
In R, the sample variance can be calculated using the var() function:

# Load library
library(peopleanalytics)

# Load data
data("employees")

# Calculate sample variance for annual compensation
var(employees$annual_comp)

## [1] 1788038934

Sample statistics are the default in R. Since the population variance differs from the sample variance by a factor of (n − 1)/n, it is simple to convert output from var() to the population variance:

# Store number of observations
n = length(employees$annual_comp)

# Calculate population variance for annual compensation
var(employees$annual_comp) * (n - 1) / n

## [1] 1786822581

Standard Deviation

The standard deviation is simply the square root of the variance.
The standard deviation of a population is defined by:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

The standard deviation of a sample is defined by:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Since a squared value can be converted back to its original units by taking its
square root, the standard deviation expresses variability around the mean in the
variable’s original units.
In R, the sample standard deviation can be calculated using the sd() function:

# Calculate sample standard deviation for annual compensation
sd(employees$annual_comp)

## [1] 42285.21
Since the population standard deviation differs from the sample standard deviation by a factor of √((n − 1)/n), it is simple to convert output from sd() to the population standard deviation:

# Calculate population standard deviation for annual compensation
sd(employees$annual_comp) * sqrt((n - 1) / n)

## [1] 42270.82

Quartiles

A quartile is a type of quantile that partitions data into four equally sized parts after
ordering the data. Each quartile is equally sized with respect to the number of data
points—not the range of values in each. Quartiles are also related to percentiles.
For example, Q1 is the 25th percentile—the value at or below which 25% of values
lie. Percentiles are likely more familiar than quartiles, as percentiles show up in the
height and weight measurements of babies, performance on standardized tests like
the SAT and GRE, among other things.
The Interquartile Range (IQR) represents the difference between Q3 and Q1 cut point values (the middle two quartiles). The IQR is sometimes used to detect extreme values in a distribution; values less than Q1 − 1.5 × IQR or greater than Q3 + 1.5 × IQR are generally considered outliers.
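These fences are simple to compute in R. The vector below is hypothetical; in practice you would substitute a field such as employees$annual_comp:

```r
# Hypothetical vector containing one extreme value
v <- c(2, 4, 4, 5, 6, 7, 8, 9, 50)

q1 <- quantile(v, probs = 0.25)  # 25th percentile
q3 <- quantile(v, probs = 0.75)  # 75th percentile
iqr <- q3 - q1                   # equivalently, IQR(v)

# Values outside Q1 - 1.5*IQR or Q3 + 1.5*IQR are flagged as outliers
v[v < q1 - 1.5 * iqr | v > q3 + 1.5 * iqr]

## [1] 50
```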

In R, the quantile() function returns the values that bookend each quartile:

# Return quartiles for annual compensation
quantile(employees$annual_comp)

##     0%    25%    50%    75%   100%
##  62400  99840 137280 174200 208000

Based on this output, we know that 25% of people in our data earn annual compensation of 99,840 USD or less, 137,280 USD is the median annual compensation, and 75% of people earn annual compensation of 174,200 USD or less.
We can also return a specific percentile value using the probs argument in the
quantile() function. For example, if we want to know the 80th percentile annual
compensation value, we can execute the following:

# Return 80th percentile annual compensation value
quantile(employees$annual_comp, probs = .8)

## 80%
## 180960
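The probs argument also accepts a vector of probabilities, so multiple percentiles can be returned in a single call; for example, deciles of a simple illustrative sequence:

```r
# Return deciles (10th through 90th percentiles) of a 1-100 sequence
quantile(1:100, probs = seq(0.1, 0.9, by = 0.1))
```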

In addition, the summary() function returns several common descriptive statistics for an object:

# Return common descriptives
summary(employees$annual_comp)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   62400   99840  137280  137054  174200  208000

Box plots are a common way to visualize the distribution of data. Box plots are not
usually found in presentations to stakeholders, since they are a bit more technical
and often require explanation, but these are very useful to analysts for understanding
data distributions during the EDA phase.
Let us visualize the spread of annual compensation by education level and gender
using the geom_boxplot() function from the ggplot2 library:

# Load library
library(ggplot2)

# Produce box plots to visualize compensation distribution by education level and gender
ggplot2::ggplot(employees, aes(x = as.factor(ed_lvl), y = annual_comp, color = gender)) +
  ggplot2::geom_boxplot() +
  ggplot2::labs(x = "Education Level", y = "Annual Compensation") +
  ggplot2::guides(col = guide_legend("Gender")) +
  ggplot2::theme_bw()

(Figure: box plots of Annual Compensation by Education Level (1–5), colored by Gender)

Box plots can be interpreted as follows:
• Horizontal lines represent median compensation values.
• The box in the middle of each distribution represents the IQR.
• The end of the line above the IQR represents the threshold for outliers in the upper range: Q3 + 1.5 × IQR.
• The end of the line below the IQR represents the threshold for outliers in the lower range: Q1 − 1.5 × IQR.
• Data points represent outliers: x > Q3 + 1.5 × IQR or x < Q1 − 1.5 × IQR.
While box plots are pervasive in statistically oriented disciplines, they can be misleading. Figure 1 illustrates how information about the shape of a distribution can be lost on a box plot. The range with the highest frequency (0–9) is not as obvious in the box plot relative to the bar chart.

Fig. 1 The number range with the highest frequency (0–9) is not as apparent with a box plot (left) relative to the bar chart (right)
Box plot alternatives such as violin plots, jittered strip plots, and raincloud plots
are often more helpful in understanding data distributions. Figure 2 shows the
juxtaposition of a raincloud plot against a box plot. While it may seem like an
oxymoron, in this case the spread of data is clearer in the rain.
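As a sketch of one such alternative, the compensation distributions above can be redrawn as violin plots by swapping geom_boxplot() for geom_violin(), reusing the same aesthetics as the earlier box plot code. The block assumes the employees data loaded earlier; a hypothetical stand-in data frame is generated if it is unavailable:

```r
# Load library
library(ggplot2)

# Hypothetical stand-in if the peopleanalytics employees data is not loaded
if (!exists("employees")) {
  employees <- data.frame(
    ed_lvl      = rep(1:5, each = 20),
    annual_comp = runif(100, 62400, 208000),
    gender      = rep(c("Female", "Male"), 50)
  )
}

# Violin plots show the full density of compensation at each education level
ggplot2::ggplot(employees, aes(x = as.factor(ed_lvl), y = annual_comp,
                               color = gender)) +
  ggplot2::geom_violin() +
  ggplot2::labs(x = "Education Level", y = "Annual Compensation") +
  ggplot2::guides(col = guide_legend("Gender")) +
  ggplot2::theme_bw()
```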

Skewness

Skewness is a measure of the horizontal distance between the mode and mean—
a representation of symmetric distortion. In most practical settings, data are not
normally distributed. That is, the data are skewed either positively (right-tailed
distribution) or negatively (left-tailed distribution). The coefficient of skewness is
one of many ways in which we can ascertain the degree of skew in the data. The
skewness of sample data is defined as:

$$Sk = \frac{1}{n} \cdot \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{s^3}$$
Fig. 2 Raincloud plot superimposed on a box plot to illustrate the data distribution

A positive skewness coefficient indicates positive skew, while a negative coefficient indicates negative skew. The order of descriptive statistics can also be leveraged to ascertain the direction of skew in the data:
• Positive skewness: mode < median < mean
• Negative skewness: mode > median > mean
• Symmetrical distribution: mode = median = mean
Figure 3 illustrates the placement of these descriptive statistics in each of the
three types of distributions. The magnitude of skewness can be determined by
measuring the distance between the mode and mean relative to the variable’s scale.
Alternatively, we can simply evaluate this using the coefficient of skewness:
• If skewness is between −0.5 and 0.5, the data are considered symmetrical.
• If skewness is between −0.5 and −1 or 0.5 and 1, the data are moderately skewed.
• If skewness is < −1 or > 1, the data are highly skewed.
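These rules of thumb can be wrapped in a small helper function (the function name and inclusive boundary handling below are illustrative choices, not from the text):

```r
# Classify a skewness coefficient using the rules of thumb above
skew_label <- function(sk) {
  if (abs(sk) <= 0.5) {
    "approximately symmetrical"
  } else if (abs(sk) <= 1) {
    "moderately skewed"
  } else {
    "highly skewed"
  }
}

skew_label(0.3)   # "approximately symmetrical"
skew_label(-0.8)  # "moderately skewed"
skew_label(2.27)  # "highly skewed"
```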
Since there is not a base R function for skewness, we can leverage the moments
library to calculate skewness:

Fig. 3 Skewness

# Load library
library(moments)

# Calculate skewness for org tenure, rounded to two decimal places via the round() function
round(moments::skewness(employees$org_tenure), 2)

## [1] 2.27

Statistical moments, after which this library was named, play an important role in
specifying the appropriate probability distribution for a set of data. Moments are a
set of statistical parameters used to describe the characteristics of a distribution.
Skewness is the third statistical moment in the set; hence the sum of cubed
differences and cubic polynomial in the denominator of the formula above. The
first four moments are: (1) expected value or mean, (2) variance and
standard deviation, (3) skewness, and (4) kurtosis.
We can verify that the skewness() function from the moments library returns
the expected value (per the aforementioned formula) by validating against a manual
calculation:

# Store components of skewness calculation
n = length(employees$org_tenure)
x = employees$org_tenure
x_bar = mean(employees$org_tenure)
s = sd(employees$org_tenure)

# Calculate skewness manually, rounded to two decimal places via the round() function
round(1/n * (sum((x - x_bar)^3) / s^3), 2)

## [1] 2.27

A skewness coefficient of 2.27 indicates that organization tenure is positively
skewed. We can visualize the data to confirm the expected right-tailed distribution
(Fig. 4):

# Produce histogram to visualize sample distribution
ggplot2::ggplot() +
  ggplot2::aes(employees$org_tenure) +
  ggplot2::labs(x = "Organization Tenure", y = "Density") +
  ggplot2::geom_histogram(aes(y = ..density..), fill = "#414141") +
  ggplot2::geom_density(fill = "#ADD8E6", alpha = 0.6) +
  ggplot2::theme_bw()

Kurtosis

While skewness provides information on the symmetry of a distribution, kurtosis provides information on the heaviness of a distribution's tails ("tailedness"). Kurtosis is the fourth statistical moment, defined by:

$$K = \frac{1}{n} \cdot \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{s^4}$$
Note that the quartic functions characteristic of the fourth statistical moment are
the only differences from the skewness formula we reviewed in the prior section
(which featured cubic functions).
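Mirroring the manual skewness calculation above, kurtosis can be computed by hand; the moments library also provides a kurtosis() function for this purpose. The vector below is hypothetical; in practice you would use a field such as employees$org_tenure:

```r
# Hypothetical sample
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n <- length(x)
x_bar <- mean(x)
s <- sd(x)

# Manual kurtosis per the fourth-moment formula above
1/n * sum((x - x_bar)^4) / s^4  # ~2.13
```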
The terms leptokurtic and platykurtic are often used to describe distributions
with heavy and light tails, respectively. "Platy-" in platykurtic is the same root as
“platypus,” and I have found it helpful to recall the characteristics of the flat platypus
when characterizing frequency distributions as platykurtic (wide and flat) vs. its
antithesis, leptokurtic (tall and skinny). The normal (or Gaussian) distribution is
referred to as a mesokurtic distribution in the context of kurtosis.
Figure 5 illustrates the three kurtosis categorizations.
