Quantitative Data Analysis
Dr Iwona Wilkowska
Learning Outcomes
• Importance of analytical framework
• Recognise different types of data and understand implications for
subsequent analysis
• Understand what preparing quantitative data for analysis involves
• Use descriptive statistics:
• Frequency distributions
• Measures of central tendency
• Measures of dispersion
• Cross-tabulation
• Overview of inferential statistical tests
• Test of relationship - correlation
• Correlation and Causality
• Significance testing
Concepts
• ‘A concept is an abstraction or idea formed by the perception of
phenomena’
• A combination of a number of similar characteristics/variables
which collectively define the concept and make its measurement
possible
• Examples: job satisfaction, job commitment, brand awareness,
service quality
• Some concepts are simple and easy to measure; others are more complex
and can only be measured indirectly
• Importance of definition
• Variables – to measure a concept we need to specify the variables,
direct or indirect, that serve as proxies for the concept
• Independent and dependent variables
Importance of a conceptual framework
[Diagram: independent variables and a dependent variable, labelled X, Y and Z]
• Logical integration of these variables provides a theoretical
framework or a conceptual model
Example – Model of Employee Engagement (adapted from Saks, 2006)
[Diagram: five antecedents, each linked to Employee Engagement by one hypothesis]
• H1: Job characteristics
• H2: Rewards and recognition
• H3: Perceived organizational support
• H4: Perceived supervisory support
• H5: Procedural justice
• H1₀: There will be no significant effect of job characteristics on
employee engagement
• H1₁: Job characteristics will be positively related to employee
engagement
• H2₀: There will be no significant effect of rewards and recognition on
employee engagement
• H2₁: Rewards and recognition will be positively related to employee
engagement
Activity
• Alan M. Saks, (2006) "Antecedents and consequences of
employee engagement", Journal of Managerial Psychology,
Vol. 21 Issue: 7, pp.600-619
• What is the main concept being researched?
• What is your interpretation of this concept? How does it compare
to the interpretation of your colleague?
• Is it easy to define? (by the author, yourself, your colleague?)
• What are the proposed relationships?
• How was the concept measured?
Numerical coding – 5-point Likert scale
Numerical coding – 10-point Likert scale
Data types
• Categorical – a category labelled by means of a word, not a
number
• Descriptive (or nominal) – gender: male, female
• Ranked (or ordinal) – e.g. NVQ levels (1,2,3,4,5,6,7) or
rating/scale questions (‘how strongly do you agree with the
statement…’)
• No ‘true’ zero, rankings not necessarily evenly spaced (NVQ4 is NOT
twice as hard as NVQ2, nor does NVQ1 + NVQ5 = NVQ6)
• Therefore, numbers here do not have any arithmetic relationship
Data types
• Numerical – values are measured or counted numerically as
quantities, can assign a value position on a numerical scale
• Continuous – can take any value, e.g. length of service of 20¼ years
• Discrete – can take only certain values from a scale, e.g. number of
mobile phones manufactured or customers served
• Numerical data are also classified as interval or ratio:
• Interval – e.g. Celsius temperature scale
• Can state the difference (‘interval’) between any 2 values, no ‘true’ zero
• Can be added and subtracted but not multiplied and divided
• So the difference between 20 °C and 30 °C is 10 °C, but this does not
mean that 30 °C is one and a half times as warm
• Likert scale?
• Ratio – can calculate the relative difference (‘ratio’) between any two
data values for a variable
• Profit – £300 000 in one year, £600 000 the following year: we can say
profit doubled
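The interval/ratio distinction can be checked with a little arithmetic. The sketch below is illustrative only: it reuses the temperatures and profit figures from the bullets above, and the conversion to Kelvin (adding 273.15) is brought in here as an assumption to show what a scale with a true zero would give.

```python
# Interval scale: Celsius has no 'true' zero, so ratios of Celsius values mislead.
low_c, high_c = 20.0, 30.0
difference_c = high_c - low_c      # differences are meaningful: 10 degrees
naive_ratio = high_c / low_c       # 1.5, but NOT "one and a half times as warm"

# Kelvin has a true zero, so the ratio of Kelvin values is meaningful.
low_k, high_k = low_c + 273.15, high_c + 273.15
kelvin_ratio = high_k / low_k      # only about 3% warmer

# Ratio scale: profit has a true zero, so "doubled" is a valid statement.
profit_year1, profit_year2 = 300_000, 600_000
profit_ratio = profit_year2 / profit_year1

print(difference_c, naive_ratio, round(kelvin_ratio, 3), profit_ratio)
```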
Figure 12.1 Defining the data type
Video 1
• https://www.youtube.com/watch?v=Mjif8PTgzUs&list=PLkIselvEzpM6pZ76FD3NoCvvgkj_p-dE8
Implications
• Extremely important for quantitative data analysis
• Not all statistics work with all types of data (e.g. calculating a
mean for categorical data is meaningless)
• Analysis software will generate inappropriate statistics if you
allow it
• The more precise the scale of measurement the greater the range
of analytical techniques available
Activity
• Using Civil Service People Survey 2016 (Technical Guide)
• https://www.gov.uk/government/publications/civil-service-people-survey-2016-results
• Answer the following questions:
• How does Civil Service People Survey define engagement?
• What is their analytical framework? How does it compare to Saks’
(2006) framework?
• How is the analytical framework related to the questionnaire
questions?
• How has each measure been compiled? (e.g. ‘engagement’, ‘performance’)
• You can also address the additional questions:
• How was the sample chosen?
• What was the sample size?
• What was the response rate? (any patterns in relation to responses and
non-responses?)
Preparing data for quantitative
computer analysis
• Data are entered for computer analysis as a data matrix
• Each column represents a variable
• Each row represents a case
• All data need to be recorded using numerical codes
• If possible, existing coding schemes should be used (e.g. 1
‘Male’, 2 ‘Female’)
• Actual numbers usually used for numerical data (then may
need to be grouped or combined, e.g. employees’ salary could
be coded to the nearest £ as 43543, and/or placed in a group
from £40000 to £49999)
• The data matrix must be checked for errors
Example
row  gender  born  marital  educate  profmemb
1    2       67    1        5        3
2    1       19    .        7        3
3    2       24    2        7        3
• Thus, in the table above, row 1 represents a person who has
gender code 2 (female), was born in 1967, has marital status
code 1 (single), was educated up to code 5 (O-Levels / GCSE
grade C or above), and has professional membership code 3
(none). The data then continue to the right for further
variables. The symbol "." is the IBM SPSS Statistics symbol for
missing data.
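As a rough illustration of a data matrix and of checking it for errors, the Python sketch below holds the three example rows as a list of lists. `None` stands in for the SPSS "." symbol, and the helper name `find_problems` is hypothetical. Only the gender codes (1 ‘Male’, 2 ‘Female’) come from the slides, so only that variable is checked against a coding scheme.

```python
# Data matrix from the example: each row is a case, each column a variable.
# None stands in for the SPSS missing-data symbol ".".
columns = ["gender", "born", "marital", "educate", "profmemb"]
data_matrix = [
    [2, 67, 1, 5, 3],
    [1, 19, None, 7, 3],
    [2, 24, 2, 7, 3],
]

# Only the gender codes are given in the slides (1 'Male', 2 'Female');
# other variables are left unchecked in this sketch.
valid_codes = {"gender": {1, 2}}

def find_problems(matrix, names, codes):
    """Return (row number, variable, issue) for missing or invalid entries."""
    problems = []
    for row_number, row in enumerate(matrix, start=1):
        for name, value in zip(names, row):
            if value is None:
                problems.append((row_number, name, "missing"))
            elif name in codes and value not in codes[name]:
                problems.append((row_number, name, value))
    return problems

problems = find_problems(data_matrix, columns, valid_codes)
print(problems)
```

A check like this only flags codes that are impossible under the coding scheme; a wrong but valid code (for example 1 entered instead of 2) still needs checking against the original questionnaires.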
SPSS activity
• Explore variable view in SPSS – Exercise 4
Descriptive statistics
• Used to organise and summarise data
• Meaningful when we have data for the whole population we
are aiming to describe e.g. all employees in an organisation
• Can be used to describe the findings of a sample but results
apply to the sample only
• To say something about the population from which the sample is
drawn, we would need to use inferential statistics
Frequency distribution
• Summarises the number of cases (frequency) for each
category (of a variable)
• Analyse the output
Frequency distributions (1 of 2)
Figure X.1 A bar chart of qualification levels
Frequency distributions (2 of 2)
Table X.1 Frequency table of qualification levels
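A frequency table of this kind can be sketched in a few lines of Python. The qualification levels below are made-up data for ten respondents, used purely for illustration; they are not the figures behind Table X.1.

```python
from collections import Counter

# Hypothetical qualification levels for 10 respondents (illustrative data only).
qualifications = [
    "GCSE", "A level", "Degree", "GCSE", "Degree",
    "Degree", "A level", "GCSE", "Degree", "Postgraduate",
]

counts = Counter(qualifications)   # number of cases in each category
total = len(qualifications)

# Print each category with its frequency and percentage, most frequent first.
for category, frequency in counts.most_common():
    percent = 100 * frequency / total
    print(f"{category:<14}{frequency:>3}{percent:>7.1f}%")
```

The first row printed is the most numerous category, which is also the mode.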
SPSS activity
• Frequency distribution – Exercise 5
• Bar chart – Exercise 8
• Analyse the output
Measures of central tendency
• Mean – for continuous measures (e.g. salary, age, years of
post-qualification experience)
• Arithmetic average – add up all salaries, divide by the number of
people receiving a salary
• Median – mid-point in the ranking, equal number of cases
above and below
• For even numbers of cases take the middle two measures and
take the mid-point of these
• Mode – the most frequently occurring or most numerous
category – particularly useful for categorical measures
• Identify the mode in qualification levels frequency bar chart
Example: salaries at 2 organisations
Entrepreneur Co              Cooperative Co
£80 000 (owner manager)      £22 850
£29 000                      £22 850
£17 000                      £22 600
£15 000                      £22 600
£14 000                      £22 100
£14 000                      £22 100
£13 000                      £21 600
£13 000                      £21 600
£13 000                      £21 600
£13 000                      £21 000
Mean: £22 100                Mean: £22 100 (rounded from £22 090)
Median: £14 000              Median: £22 100
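The figures in the table can be reproduced with Python's standard `statistics` module. This is a minimal sketch using the salaries exactly as listed; note that the Cooperative Co salaries average to £22 090, which the table reports as £22 100 after rounding.

```python
from statistics import mean, median, mode

# Salaries (in £) copied from the table above.
entrepreneur_co = [80000, 29000, 17000, 15000, 14000,
                   14000, 13000, 13000, 13000, 13000]
cooperative_co = [22850, 22850, 22600, 22600, 22100,
                  22100, 21600, 21600, 21600, 21000]

# For each organisation print the mean, median and mode salary.
for name, salaries in [("Entrepreneur Co", entrepreneur_co),
                       ("Cooperative Co", cooperative_co)]:
    print(name, mean(salaries), median(salaries), mode(salaries))
```

With ten cases, `median` takes the mid-point of the fifth and sixth values in the ranking, as described for even numbers of cases.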
Implications
• The mean is affected by ‘outliers’ – measures at the extreme (£80 000
in the previous example)
• Mean for Entrepreneur Co (on its own) does not indicate that
8 employees earned less than the mean, only two were above
• Median still does not tell us anything about the range of
salaries. They are tightly bunched at Cooperative Co but in
another company could be spread more widely and have the
same median
Measures of dispersion
• Explains how the measures are spread out
• Can be used in addition to mean and median to give more
information
• Range – determined from the highest and the lowest point
Entrepreneur Co              Cooperative Co
£80 000 − £13 000            £22 850 − £21 000
Range: £67 000               Range: £1 850
• Affected by ‘outliers’
• The interquartile range – to counteract the effect of outliers –
divide the ranked results into quarters and take the spread of the
middle 50% of cases (upper quartile minus lower quartile)
Figure 12.8 Annotated box plot
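The range and interquartile range for the two salary lists can be sketched as follows. The helper names are hypothetical. Quartiles are computed with `statistics.quantiles` in its default ‘exclusive’ method; other software may use a different quartile convention and report slightly different values.

```python
from statistics import quantiles

# Salaries (in £) from the two-organisation example.
entrepreneur_co = [80000, 29000, 17000, 15000, 14000,
                   14000, 13000, 13000, 13000, 13000]
cooperative_co = [22850, 22850, 22600, 22600, 22100,
                  22100, 21600, 21600, 21600, 21000]

def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)

def interquartile_range(values):
    """Upper quartile minus lower quartile (the spread of the middle 50%)."""
    q1, _median, q3 = quantiles(values, n=4)
    return q3 - q1

for name, salaries in [("Entrepreneur Co", entrepreneur_co),
                       ("Cooperative Co", cooperative_co)]:
    print(name, value_range(salaries), interquartile_range(salaries))
```

For Entrepreneur Co the interquartile range is far smaller than the range, because the owner manager's £80 000 salary falls outside the middle 50% of cases.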
SPSS activity
• Measures of central tendency – Exercise 6 & 7
• Analyse the output
Video 2
• Data visualisation and descriptive statistics
• https://www.youtube.com/watch?v=Xm0PPtci3JE&list=PLkIselvEzpM6pZ76FD3NoCvvgkj_p-dE8
Comparing Variables – cross-tabulation
SPSS activity
• Create a table (crosstab) – Exercise 9
• Analyse the output
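Outside SPSS, a cross-tabulation is simply a count of cases for every combination of two categorical variables. The sketch below uses a small set of made-up (gender, job grade) records; the data and variable names are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical survey records: (gender, job grade). Illustrative data only.
records = [
    ("Female", "Junior"), ("Female", "Junior"), ("Female", "Senior"),
    ("Male", "Junior"), ("Male", "Senior"), ("Male", "Senior"),
    ("Female", "Junior"), ("Male", "Senior"),
]

cell_counts = Counter(records)                       # count per (row, column) cell
genders = sorted({gender for gender, _ in records})  # row categories
grades = sorted({grade for _, grade in records})     # column categories

# Print the table with one row per gender and a row total.
print(f"{'':<8}" + "".join(f"{grade:>8}" for grade in grades) + f"{'Total':>8}")
for gender in genders:
    row = [cell_counts[(gender, grade)] for grade in grades]
    print(f"{gender:<8}" + "".join(f"{n:>8}" for n in row) + f"{sum(row):>8}")
```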
Which statistics to use?
• Categorical variables – data analysis restricted to frequencies,
percentages for a particular question, mode, chi-square statistic
• E.g. Are women underrepresented in senior roles? (relationship between
gender and job grade)
• Ordinal variable – Spearman rank-order correlation
• E.g. Is there a relationship between education level and recognition of the
need for positive action
• Independent categorical binary variable, dependent
numerical/continuous – t test to compare averages
• E.g. Is there a difference between men and women in average salary? Is
there a difference in stress levels before and after well-being programme?
• Numerical variables – Pearson correlation
• E.g. Is there a relationship between income and job satisfaction?
• Pay attention to the requirements of each test!
Significance testing
• Data collected from a sample
• What is the probability of the correlation coefficient having
occurred by chance alone?
• If the probability of the correlation coefficient having occurred by
chance alone is very low, usually p < 0.05 or lower, then it is a
statistically significant relationship
• Higher than 0.05 is not statistically significant
• That does not mean that there is no correlation but we cannot
make the conclusion with any certainty
• Sample size effect
• With a small sample it is very difficult to obtain a significant test statistic
• More on this in OpenIntro Statistics Chapter 4
• 4.X. Why do we use 0.05 as a significance level?
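One way to make "occurred by chance alone" concrete is a permutation test: shuffle one variable many times and see how often a correlation at least as strong as the observed one appears. This is a swapped-in illustration, not the procedure SPSS uses, and the data, function names and number of trials below are all assumptions.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def permutation_p_value(xs, ys, trials=5000, seed=1):
    """Share of random re-pairings whose |r| is at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials

# Hypothetical sample of 8 people: income (in £000) and a satisfaction score.
income = [18, 22, 25, 27, 31, 36, 40, 45]
satisfaction = [3, 4, 4, 5, 6, 6, 8, 7]
p_value = permutation_p_value(income, satisfaction)

# Sample size effect: 3 perfectly correlated cases still cannot reach p < 0.05.
p_small = permutation_p_value([1, 2, 3], [1, 2, 3])

print("8 cases:", "significant" if p_value < 0.05 else "not significant")
print("3 cases:", "significant" if p_small < 0.05 else "not significant")
```

With only three cases there are just six possible pairings, two of which give a perfect correlation, so even r = 1 has roughly a one-in-three probability of arising by chance.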
P-value and level of significance
P-value                         Result                          Interpretation
p ≤ 0.05                        Strong evidence against the     Reject the null hypothesis and accept
                                null hypothesis                 the alternative hypothesis
p > 0.05                        Insufficient evidence against   Do not reject the null hypothesis
                                the null hypothesis
p less than but close to 0.05   Marginal                        An argument needs to be developed for
                                                                why you suggest that the null
                                                                hypothesis should be rejected
SPSS activity
• Chi square – Exercise 10
• Analyse the output
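To see what lies behind a chi-square output, the statistic can be computed by hand from a cross-tabulation. The sketch below uses made-up counts for gender by job grade and the uncorrected Pearson chi-square formula; the function name is hypothetical. The critical value 3.841 applies to 1 degree of freedom (a 2 × 2 table) at the 0.05 level.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a table given as rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic

# Hypothetical counts: rows = women, men; columns = junior, senior grade.
observed = [[60, 20],
            [40, 40]]

statistic = chi_square(observed)
CRITICAL_VALUE_DF1 = 3.841   # chi-square critical value, df = 1, 0.05 level

print(round(statistic, 2), statistic > CRITICAL_VALUE_DF1)
```

A statistic above the critical value corresponds to p < 0.05, so for these illustrative counts the null hypothesis of no relationship between gender and job grade would be rejected.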
Correlation
• Correlation examines the relationships between
variables, e.g. between the price of a product and the
demand for it.
• Correlation:
• Positive – as the values for one variable increase so do
those for the other
• Negative – as the values for one variable decrease, those
for the other variable increase
• Correlation does not imply causality
Positive correlation
e.g. income and food expenditure
Negative correlation
e.g. demand and price
Zero (absence of) correlation
e.g. sales of cameras and the price of fish
The correlation coefficient
• Measures the strength of association between two variables,
X and Y.
• −1 ≤ r ≤ + 1
• Positive correlation: r > 0
• Negative correlation: r < 0
• Zero correlation: r ≈ 0
• The closer r is to +1 (or −1), the closer the points lie to a
straight line with positive (negative) slope.
• Slide ‘positive correlation’ : r = 0.8
• Slide ‘negative correlation’ : r = −0.7
• Slide ‘absence of correlation’ : r = 0
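The coefficient itself is straightforward to compute. The sketch below implements the standard Pearson formula (the covariance of X and Y divided by the product of their standard deviations) and applies it to three small made-up data sets that loosely follow the slide examples; all figures are hypothetical.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data, loosely following the slide examples.
income = [20, 25, 30, 35, 40]
food_spend = [4, 5, 5, 7, 8]        # rises with income
price = [1, 2, 3, 4, 5]
demand = [50, 42, 37, 30, 21]       # falls as price rises
camera_sales = [1, 2, 3, 4, 5]
fish_price = [2, 1, 3, 1, 2]        # no linear pattern

r_positive = pearson_r(income, food_spend)
r_negative = pearson_r(price, demand)
r_zero = pearson_r(camera_sales, fish_price)

print(round(r_positive, 2), round(r_negative, 2), round(r_zero, 2))
```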
Figure 12.15 Values of the correlation coefficient
Sources: Developed from earlier editions; Hair et al. (2006)
Video 3
• Line fitting, residuals, and correlation
• https://www.youtube.com/watch?list=PLkIselvEzpM63ikRfN41DNIhSgzboELOM&v=mPvtZhdPBhQ
Causality or Association?
In Germany, storks were believed
to "bring fertility and prosperity".
Causal or what?
• The number of storks and the number of newborns both reflect the size of
a village: a larger village has more families producing more newborns
and more roofs enabling more storks to nest
• The association of storks and babies was highly propitious for the bird:
people encouraged them to nest on their roofs, in the belief that
they would bring fertility and prosperity to the house.