Factor Analysis
• Factor Analysis is an interdependence technique. In
interdependence techniques the variables are not
classified as independent or dependent variables;
instead, their interrelationships are studied.
Factor analysis is the general name for
•Principal Component Analysis
•Common Factor Analysis.
Ø Factor analysis is done principally for two reasons:
Ø To identify a new, smaller set of uncorrelated variables to
be used in subsequent multiple regression analysis. In this
situation Principal Component Analysis (PCA) is performed
on the data. PCA considers the total variance in the data
while finding principal components from a given set of
variables.
Ø To identify underlying dimensions / factors that are
unobservable but explain correlations among a set of
variables. In this situation Common Factor Analysis is
performed on the data. FA considers only the common
variance while finding common factors from a given set of
variables. Common factor analysis is also termed
Principal Axis Factoring.
SW388R7
Data Analysis &
Computers II
Slide 3 Ø The essential purpose of factor analysis is to
describe, if possible, the covariance relationships
among many variables in terms of a few underlying,
but unobservable, random quantities called factors.
Basically, the factor model is motivated by the
following argument.
Ø Suppose variables can be grouped by their
correlations. That is, all variables, within a
particular group are highly correlated among
themselves but have relatively small correlations
with variables in a different group. In that case, it is
conceivable that each group of variables represents
a single underlying construct, or factor, that is
responsible for the correlations.
Principal components factor analysis
Ø Obtaining a factor solution through principal
components analysis is an iterative process that
usually requires repeating the SPSS factor analysis
procedure a number of times to reach a satisfactory
solution.
Ø We begin by identifying a group of variables whose
variance we believe can be represented more
parsimoniously by a smaller set of factors, or
components. The end result of the principal
components analysis will tell us which variables can
be represented by which components, and which
variables should be retained as individual variables
because the factor solution does not adequately
represent their information.
Principal components factor analysis
Ø A technique for forming a set of new variables that are
linear combinations of the original set of variables
and are uncorrelated. The new variables are called
Principal Components.
Ø These variables are fewer in number than
the original variables, but they extract most of the
information provided by the original variables.
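As a minimal sketch of what "uncorrelated linear combinations" means, the two-variable case can be worked out with the standard library alone. For two standardized variables with correlation r, the principal components have a closed form: the scaled sum and the scaled difference of the z-scores, with eigenvalues 1 + r and 1 - r. The data values and the function names below are hypothetical and chosen only for illustration; with more than two variables an eigen-decomposition routine would be needed instead.

```python
import math

def standardize(values):
    """Return z-scores (mean 0, sample standard deviation 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

def two_variable_pca(x, y):
    """Closed-form PCA of two standardized variables.

    The correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r
    with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
    Returns (first component scores, second component scores, eigenvalues).
    """
    zx, zy = standardize(x), standardize(y)
    r = correlation(x, y)
    root2 = math.sqrt(2)
    sum_scores = [(a + b) / root2 for a, b in zip(zx, zy)]   # variance 1 + r
    diff_scores = [(a - b) / root2 for a, b in zip(zx, zy)]  # variance 1 - r
    if r >= 0:
        return sum_scores, diff_scores, (1 + r, 1 - r)
    return diff_scores, sum_scores, (1 - r, 1 + r)

# Hypothetical data: two correlated measurements on five cases
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

pc1, pc2, eigenvalues = two_variable_pca(x, y)
print("eigenvalues:", eigenvalues)
print("share of variance in first component:", eigenvalues[0] / sum(eigenvalues))
print("correlation between components:", round(correlation(pc1, pc2), 6))
```

The first component carries most of the variance of the two original variables, and the two components are uncorrelated with each other, which is the property the bullets above describe.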
Applications
Ø One could identify several financial parameters and
ratios exceeding ten for determining the financial
health of a company. Obviously, it would be
extremely taxing to interpret all such pieces of
information for assessing the financial health of a
company. However, the task could be much simpler
if these parameters and ratios could be reduced to a
few indices, say two or three, which are linear
combinations of the original parameters and ratios.
Applications
Ø A multiple regression model may be derived to
forecast a parameter like sales, profit, price, etc.
However, the variables under consideration could be
correlated among themselves indicating
multicollinearity in the data. This could lead to
misleading interpretation of the regression coefficients
and to inflated standard errors of the
parameter estimates. It would be very useful if
new uncorrelated variables could be formed
as linear combinations of the original
variables. These new variables could then be used
to develop the regression model, giving a sounder
interpretation and a better forecast.
Common Factor Analysis (CFA)
Ø It is a statistical approach that is used to analyse
interrelationships among a large number of
variables (indicators) and to explain these
variables (indicators) in terms of a few unobservable
constructs (factors). In fact, these factors impact the
variables, and the variables are reflective indicators of the
factors. The statistical approach involves finding a
way of condensing the information contained in a
number of original variables into a smaller set of
constructs (factors), mostly one or two, with a
minimum loss of information.
Common Factor Analysis (CFA)
Ø Identifies the smallest number of common factors
that best explain or account for most of the
correlation among the indicators. For example, the
intelligence quotient of a student might explain most
of the marks obtained in Mathematics, Physics,
Statistics, etc. As another example, when two
variables x and y are highly correlated, only one of
them could be used to represent the entire data.
Common Factor Analysis (CFA)
CFA helps in assessing
Ø the image of a company/enterprise
Ø attitudes of sales personnel and customers
Ø preference or priority for the characteristics of
- a product like a television, mobile phone, etc.
- a service like a TV program, air travel, etc.
Key Terms
Ø Exploratory Factor Analysis (EFA)
This technique is used when a researcher has no prior
knowledge about the number of factors the variables
will indicate. In such cases computer-based
techniques are used to suggest an appropriate number of
factors.
Ø Confirmatory Factor Analysis (CFA)
This technique is used when the researcher has prior
knowledge (on the basis of some pre-established
theory) about the number of factors the variables will
indicate. This makes the analysis easier, as no decision
has to be taken about the number of factors; the
number is specified in the computer-based tool while
conducting the analysis.
Key Terms
Ø Correlation Matrix
Ø This is the matrix showing the simple correlations
between all possible pairs of variables. Every diagonal
element of this matrix is 1, and the matrix is
symmetric, since the correlation between two variables x
and y is the same as that between y and x.
Ø Communality
Ø The amount of variance an original variable shares
with all other variables included in the analysis. A
relatively high communality indicates that a variable
has much in common with the other variables taken
as a group.
Key Terms
Ø Eigenvalue
Ø The eigenvalue of a factor is the total variance
explained by that factor.
Ø Factor
Ø A linear combination of the original variables. A factor
also represents the underlying dimensions
(constructs) that summarise or account for the
original set of observed variables.
Key Terms
Ø Factor Loadings
Ø The factor loadings, or component loadings in PCA,
are the correlation coefficients between the
variables (given in the output as rows) and the factors
(given in the output as columns). These loadings are
analogous to Pearson’s correlation coefficient r: the
squared factor loading is the proportion of
variance in the respective variable explained by the
factor.
Ø Factor Matrix
Ø This contains the factor loadings of all the variables on
all the factors extracted.
Key Terms
Ø Factor Plot or Rotated Factor Space
Ø This is a plot in which the factors are the axes and the
variables are plotted against these axes. The plot can be interpreted
only if the number of factors is 3 or fewer.
Ø Factor Scores
Ø Each individual observation has a score, or value, associated
with each of the original variables. Factor analysis procedures
derive factor scores that represent each observation’s
calculated value, or score, on each of the factors. The factor
score represents an individual’s combined response to the
several variables representing the factor.
Ø In PCA, the component scores may be used in subsequent analysis.
When the factors are to serve as a new set of variables
that predict, or depend on, some phenomenon,
the factor scores may be used as the new input.
Key Terms
Ø Goodness of a Factor
Ø How well can a factor account for the correlations
among the indicators? One could examine the
correlations among the indicators after the effect of
the factor is removed. For a good factor solution, the
resulting partial correlations should be near zero,
because once the effect of the common factor is
removed, there is nothing to link the indicators.
Ø Bartlett’s Test of Sphericity
Ø This is the test statistic used to test the null hypothesis that
the variables are uncorrelated in the population, i.e. that the
correlation matrix is an identity matrix.
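A rough sketch of how the Bartlett statistic can be computed from a correlation matrix is given below. It assumes the usual chi-square approximation, statistic = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, where n is the number of cases and p the number of variables. The 3 x 3 matrix is hypothetical, the function names are illustrative, and the p-value lookup against the chi-square distribution is left out.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def bartlett_sphericity(corr, n_cases):
    """Return (chi-square statistic, degrees of freedom).

    Assumes corr is a non-singular correlation matrix (positive determinant).
    """
    p = len(corr)
    statistic = -(n_cases - 1 - (2 * p + 5) / 6) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return statistic, df

# Hypothetical correlation matrix: three variables, all pairs correlated 0.5
corr = [[1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0]]
statistic, df = bartlett_sphericity(corr, n_cases=67)
print("chi-square:", round(statistic, 3), "df:", df)
```

An identity matrix has determinant 1, so its statistic is zero; the further the correlation matrix is from identity, the larger the statistic and the stronger the evidence against the null hypothesis.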
Key Terms
Ø Kaiser-Meyer-Olkin (KMO) Measure of Sampling
Adequacy
Ø This is an index used to test the appropriateness of
factor analysis. High values of this index, generally
more than 0.5, indicate that factor
analysis is appropriate, whereas
lower values (less than 0.5) indicate that factor
analysis may not be appropriate.
Key Terms
Ø Scree Plot
Ø A plot of the eigenvalues against the factors
in the order of their extraction.
Ø Trace
Ø The sum of the values on the diagonal of
the correlation matrix used in the factor analysis. It
represents the total amount of variance on which the
factor solution is based; for a correlation matrix it
equals the number of variables.
PCA
Ø Suppose, in a particular situation, k variables are
required to explain the entire system under study.
Through PCA, the original variables are transformed
into a new set of variables called principal
components, numbering far fewer than k. These are
formed in such a manner that they extract almost
all the information provided by the original
variables. Thus, the original data of n observations
on each of the k variables is reduced to new data
of n observations on each of the principal
components. This is why PCA is referred to as one
of the data reduction and interpretation techniques.
Some indicative applications are given below.
PCA
Ø There are a number of financial parameters/ratios
for predicting health of a company. It would be
useful if only a couple of indicators could be formed
as linear combinations of the original
parameters/ratios in such a way that the few
indicators extract most of the information contained
in the data on original variables.
PCA
Ø In a regression model, if the independent variables are
correlated, implying multicollinearity, then
new variables could be formed as linear
combinations of the original variables such that they
are uncorrelated. The regression equation can then
be derived with these new uncorrelated independent
variables, and used for interpreting the regression
coefficients and for predicting the dependent
variable with the help of these new independent
variables. This is highly useful in marketing and
financial applications that involve forecasting sales,
profit, price, etc. with the help of regression
equations.
PCA
Ø Analysis of principal components often reveals
relationships that were not previously suspected and
thereby allows interpretations that would not be
ordinarily understood. A good example of this is
provided by stock market indices.
Ø PCA is a means to an end and not the end in itself.
PCA can be used for inputting principal
components as variables for further analysing the
data using other techniques such as cluster
analysis, regression and discriminant analysis.
Common Factor Analysis
Ø Common factor analysis is a data reduction and
summarization technique. It is a
statistical approach used to analyse the
interrelationships among a large number of variables
(e.g., test scores, test items, questionnaire
responses) and then to explain these variables in
terms of their common underlying dimensions
(factors). For example, a hypothetical survey
questionnaire may consist of 20 or even more
questions, but since not all of the questions are
identical, they do not all measure the basic
underlying dimensions to the same extent. By using
factor analysis, we can identify the separate
dimensions being measured by the survey and
determine a factor loading for each variable (test
item) on each factor.
Common Factor Analysis
Ø Common factor analysis is an interdependence
technique in which all variables are simultaneously
considered. In a sense, each of the observed
(original) variables is considered as a dependent
variable that is a function of some underlying,
latent, and hypothetical/unobserved set of factors
(dimensions). One could also consider the original
variables as reflective indicators of the factors. For
example, marks (variable) in an examination reflect
intelligence (factor).
Common Factor Analysis
Ø The statistical approach followed in factor analysis involves
finding a way of condensing the information contained in a
number of original variables into a smaller set of dimensions
(factors) with a minimum loss of information.
Ø Common Factor Analysis was originally developed to explain
students’ performance in various subjects and to understand
the link between grades and intelligence. Thus, the marks
obtained in an examination reflect the student’s intelligence
quotient. A salesman’s performance in terms of sales might
reflect his attitude towards the job and the effort made by him.
Ø One of the studies relating to marks obtained by students in
various subjects, led to the conclusion that students’ marks are
a function of two common factors viz. Quantitative and Verbal
abilities. The quantitative ability factor explains marks in
subjects like Mathematics, Physics and Chemistry, and verbal
ability explains marks in subjects like Languages and History.
Common Factor Analysis
Ø In general, factor analysis performs the
following functions:
Ø Identifies the smallest number of common factors
that best explain or account for the correlation
among the indicators
Ø Identifies a set of dimensions that are latent (not
easily observed) in a large number of variables
Ø Devises a method of combining or condensing a large
number of consumers with varying preferences into
a number of distinctly different groups.
Strategy for solving problems - 1
Ø A principal component factor analysis requires:
Ø The variables included must be metric level or dichotomous
(dummy-coded) nominal level
Ø The sample size must be greater than 50 (preferably 100)
Ø The ratio of cases to variables must be 5 to 1 or larger
Ø The correlation matrix for the variables must contain 2 or
more correlations of 0.30 or greater
Ø Variables with measures of sampling adequacy less than 0.50
must be removed
Ø The overall measure of sampling adequacy is 0.50 or higher
Ø The Bartlett test of sphericity is statistically significant.
Ø The first phase of a principal component analysis is
devoted to verifying that we meet these
requirements. If we do not meet these
requirements, factor analysis is not appropriate.
Strategy for solving problems - 2
Ø The second phase of a principal component factor
analysis focuses on deriving a factor model, or
pattern of relationships between variables and
components, that satisfies the following
requirements:
Ø The derived components explain 50% or more of the
variance in each of the variables, i.e. have a communality
greater than 0.50
Ø None of the variables have loadings, or correlations, of 0.40
or higher for more than one component, i.e. do not have
complex structure
Ø None of the components has only one variable in it
Ø To meet these requirements, we remove problematic
variables from the analysis and repeat the principal
component analysis.
Strategy for solving problems - 3
Ø If, at the conclusion of this process, we have
components that have more than one variable
loading on them, have components that explain at
least 50% of the variance in the included variables,
and have components that collectively explain more
than 60% of the variance in the set of variables, we
can substitute the components for the variables in
further analyses.
Ø Variables that were removed in the analysis should
be included individually in further analyses.
Ø Substitution of components for individual variables is
accomplished by using only the highest loading
variable, or by combining the variables loading on
each component to create a new variable.
Notes - 1
Ø When evaluating measures of sampling adequacy,
communalities, or factor loadings, we ignore the sign
of the numeric value and base our decision on the
size or magnitude of the value.
Ø The sign of the number indicates the direction of the
relationship.
Ø A loading of -0.732 is just as strong as a loading of
0.732. The minus sign indicates an inverse or
negative relationship; the absence of a sign is meant
to imply a plus sign indicating a direct or positive
relationship.
Notes - 2
Ø If there are two or more components in the
component matrix, the pattern of loadings is based
on the SPSS Rotated Component Matrix. If there is
only one component in the solution, the Rotated
Component Matrix is not computed, and the pattern
of loadings is based on the Component Matrix.
Ø It is possible that the analysis will break down and
we will have too few variables in the analysis to
support the use of principal component analysis.
Sample size requirement: minimum number of cases
Descriptive Statistics
                                                 Mean     Std. Deviation   Analysis N
ENVIRONMENTAL THREATS EXAGGERATED                3.31     .988             67
HOW DANGEROUS MODIFYING GENES IN CROPS           3.09     .949             67
AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT     2.40     .605             67
INTL AGREEMENTS FOR ENVIRONMENT PROBLEMS         1.97     .674             67
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT    3.73     .898             67
ECONOMIC PROGRESS DEPENDENT ON ENVIRONMENT       2.69     .839             67
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS        2.49     .894             67
RESPONDENT'S SOCIOECONOMIC INDEX                 49.757   18.9651          67
The number of valid cases for this set of variables is 67.
While principal component analysis can be conducted on a sample that
has fewer than 100 cases, but more than 50 cases, we should be
cautious about its interpretation.
Sample size requirement: ratio of cases to variables
The ratio of cases to variables in a principal
component analysis should be at least 5 to 1.
With 67 cases and 8 variables,
the ratio of cases to variables is 8.4 to 1, which
exceeds the requirement
for the ratio of cases to variables.
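The two sample-size rules (more than 50 cases, preferably 100, and at least 5 cases per variable) amount to simple arithmetic. The sketch below applies them to the 67 cases and 8 variables of this example; the function name and the dictionary keys are hypothetical.

```python
def check_sample_size(n_cases, n_variables,
                      min_cases=50, preferred_cases=100, min_ratio=5.0):
    """Apply the sample-size rules of thumb stated in the slides."""
    ratio = n_cases / n_variables
    return {
        "ratio": ratio,
        "meets_minimum_cases": n_cases > min_cases,
        "meets_preferred_cases": n_cases >= preferred_cases,
        "meets_ratio": ratio >= min_ratio,
    }

# Values from the example: 67 valid cases, 8 variables
result = check_sample_size(n_cases=67, n_variables=8)
print("cases per variable:", round(result["ratio"], 1))
for rule, outcome in result.items():
    if rule != "ratio":
        print(rule, outcome)
```

The minimum-cases and ratio rules are met, while the preferred size of 100 is not, which is why the interpretation calls for caution.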
Appropriateness of factor analysis:
Presence of substantial correlations
Correlation Matrix
                                                     (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)
(1) ENVIRONMENTAL THREATS EXAGGERATED              1.000  -.240   .394  -.305   .301  -.117  -.126   .069
(2) HOW DANGEROUS MODIFYING GENES IN CROPS         -.240  1.000  -.301   .146  -.149   .340   .394   .208
(3) AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT    .394  -.301  1.000  -.305   .258  -.136  -.401   .054
(4) INTL AGREEMENTS FOR ENVIRONMENT PROBLEMS       -.305   .146  -.305  1.000  -.289   .117   .151   .094
(5) POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT   .301  -.149   .258  -.289  1.000  -.194  -.003   .114
(6) ECONOMIC PROGRESS DEPENDENT ON ENVIRONMENT     -.117   .340  -.136   .117  -.194  1.000   .310  -.018
(7) LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS      -.126   .394  -.401   .151  -.003   .310  1.000   .341
(8) RESPONDENT'S SOCIOECONOMIC INDEX                .069   .208   .054   .094   .114  -.018   .341  1.000
Principal components analysis requires that there be some
correlations greater than 0.30 in magnitude between the variables
included in the analysis.
For this set of variables, there are 10 correlations in the
matrix greater than 0.30 in magnitude, satisfying this requirement.
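The count of substantial correlations can be reproduced from the lower triangle of the correlation matrix. This is a small sketch, assuming that "greater than 0.30" is judged on the magnitude of the correlation (the sign is ignored); the variable names `lower_triangle` and `substantial` are illustrative.

```python
# Lower triangle of the 8 x 8 correlation matrix (diagonal omitted).
# Row k holds the correlations of variable k + 2 with variables 1 .. k + 1.
lower_triangle = [
    [-0.240],
    [0.394, -0.301],
    [-0.305, 0.146, -0.305],
    [0.301, -0.149, 0.258, -0.289],
    [-0.117, 0.340, -0.136, 0.117, -0.194],
    [-0.126, 0.394, -0.401, 0.151, -0.003, 0.310],
    [0.069, 0.208, 0.054, 0.094, 0.114, -0.018, 0.341],
]

threshold = 0.30

# Collect (row variable, column variable, r) for every pair above the threshold
substantial = [
    (row_index + 2, col_index + 1, r)
    for row_index, row in enumerate(lower_triangle)
    for col_index, r in enumerate(row)
    if abs(r) > threshold
]

count = len(substantial)
print("correlations above", threshold, "in magnitude:", count)
for row_var, col_var, r in substantial:
    print(f"  variable {row_var} with variable {col_var}: {r:+.3f}")
```

Each pair is counted once, so the result refers to distinct pairs of variables rather than to cells of the full symmetric matrix.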
Appropriateness of factor analysis:
Sampling adequacy of individual variables
Principal component analysis requires
that the Kaiser-Meyer-Olkin Measure of
Sampling Adequacy be greater than 0.50
for each individual variable as well as the
set of variables.
The Measure of Sampling Adequacy
(MSA) is described as marvelous if it is
0.90 or greater, meritorious if it is in the
0.80's, middling if in the 0.70's, mediocre
if in the 0.60's, miserable if in the
0.50's, and unacceptable if below 0.50.
There are two anti-image
matrices: the anti-image
covariance matrix and the
anti-image correlation
matrix. We are interested in
the anti-image correlation
matrix.
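As an illustration of where these numbers come from, the sketch below computes the overall KMO and the per-variable MSA from a correlation matrix. It assumes the usual definition: the sum of squared correlations divided by the sum of squared correlations plus squared partial correlations, with the partial correlations taken from the inverse of the correlation matrix (the anti-image correlations are these partial correlations with the sign reversed). The 3 x 3 matrix is hypothetical, and the helper names are not from SPSS.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def sampling_adequacy(corr):
    """Return (overall KMO, list of per-variable MSA values)."""
    n = len(corr)
    inv = invert(corr)
    # Partial correlation of i and j, controlling for all other variables
    partial = [[-inv[i][j] / (inv[i][i] * inv[j][j]) ** 0.5 for j in range(n)]
               for i in range(n)]
    r2_total = p2_total = 0.0
    per_variable = []
    for i in range(n):
        r2 = sum(corr[i][j] ** 2 for j in range(n) if j != i)
        p2 = sum(partial[i][j] ** 2 for j in range(n) if j != i)
        per_variable.append(r2 / (r2 + p2))
        r2_total += r2
        p2_total += p2
    return r2_total / (r2_total + p2_total), per_variable

def describe_msa(value):
    """Verbal label for an MSA value, using the bands quoted in the slide."""
    bands = [(0.90, "marvelous"), (0.80, "meritorious"), (0.70, "middling"),
             (0.60, "mediocre"), (0.50, "miserable")]
    for lower_bound, label in bands:
        if value >= lower_bound:
            return label
    return "unacceptable"

# Hypothetical correlation matrix: three variables, all pairs correlated 0.5
corr = [[1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0]]
overall, per_variable = sampling_adequacy(corr)
print("overall KMO:", round(overall, 3), describe_msa(overall))
print("per-variable MSA:", [round(v, 3) for v in per_variable])
```

A variable whose correlations with the others are mostly explained away by the remaining variables has large partial correlations relative to its simple correlations, and therefore a low MSA.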
Appropriateness of factor analysis:
Sampling adequacy of individual variables
SPSS places the Measures
of Sampling Adequacy
on the diagonal of the anti-image
correlation matrix.
On iteration 1, the MSA for
the variable "respondent's
socioeconomic index" [sei]
was 0.410 which was less
than 0.50, so it was
removed from the analysis.
Excluding a variable from the factor analysis
To remove the variable
"respondent's socioeconomic
index" [sei] from the
analysis, click on the Dialog
Recall tool button to access
the drop down menu.
Repeating the factor analysis
In the drop down menu,
select Factor Analysis to
reopen the factor analysis
dialog box.
Removing the variable from the list of variables
First, highlight
the sei variable.
Second, click on the left
arrow button to remove
the variable from the
Variables list box.
Replicating the factor analysis
The dialog recall command opens
the dialog box with all of the
settings that we had selected the
last time we used factor analysis.
To replicate the analysis without
the variable that we just removed,
click on the OK button.
Appropriateness of factor analysis:
Sample adequacy for revised factor analysis
In the factor analysis with the sei
variable removed, we see that the
measures of sampling adequacy
for the remaining variables are all
greater than 0.50.
On iteration 2, the MSA for all of
the individual variables still
included in the analysis was
greater than 0.5, supporting their
retention in the analysis.
Appropriateness of factor analysis:
Sample adequacy for set of variables
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy          .734
Bartlett's Test of Sphericity   Approx. Chi-Square     69.350
                                df                         21
                                Sig.                     .000
In addition, the overall
MSA for the set of variables
included in the analysis
was 0.734, which exceeds
the minimum requirement
of 0.50 for overall MSA.
The seven variables remaining
in the analysis satisfy the
criteria for appropriateness of
factor analysis.
Appropriateness of factor analysis:
Bartlett test of sphericity
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy          .642
Bartlett's Test of Sphericity   Approx. Chi-Square     84.006
                                df                         28
                                Sig.                     .000
Principal component analysis requires
that the probability associated with
Bartlett's Test of Sphericity be less
than the level of significance.
The probability associated with the
Bartlett test is <0.001, which satisfies
this requirement.
The next step is to determine the
number of factors that should be
included in the factor solution.
Number of factors to extract:
Latent root criterion
Total Variance Explained
            Initial Eigenvalues                     Extraction Sums of Squared Loadings     Rotation
Component   Total   % of Variance   Cumulative %    Total   % of Variance   Cumulative %    Total
1           2.461   35.164          35.164          2.461   35.164          35.164          1.849
2           1.229   17.557          52.721          1.229   17.557          52.721          1.842
3           .889    12.695          65.415
4           .722    10.313          75.728
5           .621    8.878           84.606
6           .572    8.174           92.780
7           .505    7.220           100.000
Extraction Method: Principal Component Analysis.
Using the output from iteration 2,
there were 2 eigenvalues greater
than 1.0.
The latent root criterion for
number of factors to derive would
indicate that there were 2
components to be extracted for
these variables.
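The latent root criterion is a one-line count once the eigenvalues are known. The sketch below uses the initial eigenvalues from the Total Variance Explained table; the variable names are illustrative.

```python
# Initial eigenvalues from the Total Variance Explained table (iteration 2)
eigenvalues = [2.461, 1.229, 0.889, 0.722, 0.621, 0.572, 0.505]

# Latent root criterion: keep the components whose eigenvalue exceeds 1.0,
# i.e. components that explain more variance than a single standardized variable
retained = [value for value in eigenvalues if value > 1.0]
n_components = len(retained)

print("components retained by the latent root criterion:", n_components)
```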
Number of factors to extract:
Percentage of variance criterion
In contrast, the cumulative proportion of variance criterion
would require 3 components to satisfy the criterion of explaining
60% or more of the total variance. A 3-component solution would
explain 65.415% of the total variance.
Since the SPSS default is to extract
the number of components indicated
by the latent root criterion, our
initial factor solution was based on
the extraction of 2 components.
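The percentage of variance criterion can be sketched the same way: accumulate eigenvalues until the target share of total variance is reached. The function name and the `target` parameter are hypothetical. Because the tabled eigenvalues are rounded to three decimals, the share computed here differs slightly from the 65.415% reported by SPSS.

```python
def components_for_variance(eigenvalues, target=0.60):
    """Smallest number of components whose cumulative share of variance
    reaches the target, together with that cumulative share."""
    total = sum(eigenvalues)
    cumulative = 0.0
    ordered = sorted(eigenvalues, reverse=True)
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k, cumulative / total
    return len(ordered), cumulative / total

# Initial eigenvalues from the Total Variance Explained table (iteration 2)
eigenvalues = [2.461, 1.229, 0.889, 0.722, 0.621, 0.572, 0.505]

n_needed, share = components_for_variance(eigenvalues, target=0.60)
print("components needed for 60% of variance:", n_needed)
print("cumulative share explained:", round(100 * share, 1), "%")
```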
Evaluating communalities
Communalities
                                                 Initial   Extraction
ENVIRONMENTAL THREATS EXAGGERATED                1.000     .526
HOW DANGEROUS MODIFYING GENES IN CROPS           1.000     .582
AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT     1.000     .521
INTL AGREEMENTS FOR ENVIRONMENT PROBLEMS         1.000     .491
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT    1.000     .494
ECONOMIC PROGRESS DEPENDENT ON ENVIRONMENT       1.000     .437
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS        1.000     .640
Extraction Method: Principal Component Analysis.
Communalities represent the proportion of the variance in
the original variables that is accounted for by the factor
solution.
The factor solution should explain at least half of each
original variable's variance, so the communality value for
each variable should be 0.50 or higher.
Communality requiring variable removal
On iteration 2, the communality for the variable "economic
progress in America will slow down without more
concern for environment" [econgrn] was 0.437,
which was less than 0.50.
The variable was removed and the principal
component analysis was computed again.
In this iteration, there are actually three
variables that have communalities less
than 0.50. The variable with the
smallest communality is selected for
removal.
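The removal rule on this slide (drop only the variable with the smallest communality below 0.50, then rerun) can be sketched as follows, using the extraction communalities reported for iteration 2. The function name is hypothetical.

```python
def variable_to_remove(communalities, minimum=0.50):
    """Return the variable with the smallest communality below the minimum,
    or None when every variable meets the requirement."""
    low = {name: value for name, value in communalities.items() if value < minimum}
    if not low:
        return None
    return min(low, key=low.get)

# Extraction communalities from iteration 2
communalities = {
    "ENVIRONMENTAL THREATS EXAGGERATED": 0.526,
    "HOW DANGEROUS MODIFYING GENES IN CROPS": 0.582,
    "AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT": 0.521,
    "INTL AGREEMENTS FOR ENVIRONMENT PROBLEMS": 0.491,
    "POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT": 0.494,
    "ECONOMIC PROGRESS DEPENDENT ON ENVIRONMENT": 0.437,
    "LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS": 0.640,
}

below_minimum = [name for name, value in communalities.items() if value < 0.50]
candidate = variable_to_remove(communalities)
print("variables below 0.50:", len(below_minimum))
print("remove next:", candidate)
```

Removing one variable at a time matters because the communalities of the remaining variables change when the analysis is recomputed.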
Repeating the factor analysis
In the drop down menu,
select Factor Analysis to
reopen the factor analysis
dialog box.
Removing the variable from the list of variables
First, highlight
the econgrn
variable.
Second, click on the left
arrow button to remove
the variable from the
Variables list box.
Replicating the factor analysis
The dialog recall command opens
the dialog box with all of the
settings that we had selected the
last time we used factor analysis.
To replicate the analysis without
the variable that we just removed,
click on the OK button.
Communality requiring variable removal
Communalities
                                                 Initial   Extraction
ENVIRONMENTAL THREATS EXAGGERATED                1.000     .517
HOW DANGEROUS MODIFYING GENES IN CROPS           1.000     .573
AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT     1.000     .576
INTL AGREEMENTS FOR ENVIRONMENT PROBLEMS         1.000     .486
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT    1.000     .573
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS        1.000     .716
Extraction Method: Principal Component Analysis.
On iteration 3, the communality for the
variable "should be international agreements
for environment problems" [grnintl] was
0.486, which was less than 0.50.
The variable was removed and the principal
component analysis was computed again.
Repeating the factor analysis
In the drop down menu,
select Factor Analysis to
reopen the factor analysis
dialog box.
Removing the variable from the list of variables
First, highlight
the grnintl
variable.
Second, click on the left
arrow button to remove
the variable from the
Variables list box.
Replicating the factor analysis
The dialog recall command opens
the dialog box with all of the
settings that we had selected the
last time we used factor analysis.
To replicate the analysis without
the variable that we just removed,
click on the OK button.
Communality satisfactory for all variables
Communalities
                                                 Initial   Extraction
ENVIRONMENTAL THREATS EXAGGERATED                1.000     .591
HOW DANGEROUS MODIFYING GENES IN CROPS           1.000     .568
AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT     1.000     .582
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT    1.000     .679
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS        1.000     .725
Extraction Method: Principal Component Analysis.
Once any variables with communalities less than
0.50 have been removed from the analysis, the
pattern of factor loadings should be examined to
identify variables that have complex structure.
Complex structure occurs when one variable has high loadings or
correlations (0.40 or greater) on more than one component. If a
variable has complex structure, it should be removed from the
analysis.
Variables are only checked for
complex structure if there is more
than one component in the
solution. Variables that load on
only one component are described
as having simple structure.
Identifying complex structure
If only one component has been
extracted, each variable can only
load on that one factor, so
complex structure is not an issue.
On iteration 4, the variable "America doing enough to
protect environment" [amprogrn] was found to
have complex structure.
Specifically, the variable had a loading of -0.581 on
component 1 and a loading of 0.495 on component 2.
The variable should be removed and the principal
component analysis should be repeated.
Rotated Component Matrix (a)
                                                 Component
                                                 1        2
ENVIRONMENTAL THREATS EXAGGERATED                -.265    .722
HOW DANGEROUS MODIFYING GENES IN CROPS           .732     -.182
AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT     -.581    .495
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT    .092     .819
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS        .847     .087
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
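The complex-structure check can be sketched directly on the rotated loadings from iteration 4: a variable is flagged when its loading is 0.40 or greater in magnitude on more than one component. As a side check, the sum of squared loadings for each variable reproduces its reported communality to within rounding, since an orthogonal rotation such as varimax leaves communalities unchanged. The function and variable names are illustrative.

```python
# Rotated loadings from iteration 4: (component 1, component 2)
rotated_loadings = {
    "ENVIRONMENTAL THREATS EXAGGERATED": (-0.265, 0.722),
    "HOW DANGEROUS MODIFYING GENES IN CROPS": (0.732, -0.182),
    "AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT": (-0.581, 0.495),
    "POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT": (0.092, 0.819),
    "LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS": (0.847, 0.087),
}

def complex_variables(loadings, cutoff=0.40):
    """Variables whose loading magnitude reaches the cutoff on 2+ components."""
    return [name for name, row in loadings.items()
            if sum(abs(value) >= cutoff for value in row) > 1]

# Communality = sum of squared loadings across the extracted components
computed_communalities = {name: sum(value ** 2 for value in row)
                          for name, row in rotated_loadings.items()}

flagged = complex_variables(rotated_loadings)
print("complex structure:", flagged)
for name, value in computed_communalities.items():
    print(f"{name}: communality {value:.3f}")
```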
Repeating the factor analysis
In the drop down menu,
select Factor Analysis to
reopen the factor analysis
dialog box.
Removing the variable from the list of variables

First, highlight the amprogrn variable.

Second, click on the left arrow button to remove the variable from the Variables list box.
Replicating the factor analysis

The dialog recall command opens the dialog box with all of the settings that we had selected the last time we used factor analysis.

To replicate the analysis without the variable that we just removed, click on the OK button.
Checking for single-variable components

On iteration 5, none of the variables demonstrated complex structure. It is not necessary to remove any additional variables because of complex structure.

Rotated Component Matrix (a)
                                                 Component
                                                    1        2
ENVIRONMENTAL THREATS EXAGGERATED                -.207     .756
HOW DANGEROUS MODIFYING GENES IN CROPS            .801    -.229
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT     .051     .830
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS         .861     .059
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

After variables have been removed for low communalities and complex structure, the factor solution is examined to remove any components that have only a single variable loading on them.

If a component has only a single variable loading on it, the variable should be removed from the next iteration of the principal component analysis.
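The single-variable check can be sketched the same way. The example below groups variables by the components on which they load at 0.40 or higher in absolute value, then lists any component with exactly one variable. The loadings come from the iteration 5 Rotated Component Matrix; `variables_per_component` is a hypothetical helper name and the 0.40 cutoff is reused from the complex structure rule as an assumption.

```python
# Rotated loadings (component 1, component 2) from iteration 5.
loadings = {
    "grnexagg": (-0.207, 0.756),
    "genegen": (0.801, -0.229),
    "ldcgrn": (0.051, 0.830),
    "nukeacc": (0.861, 0.059),
}

LOADING_CUTOFF = 0.40  # assumed: same "high loading" threshold as before


def variables_per_component(loadings, cutoff=LOADING_CUTOFF):
    """Map each component number to the variables loading highly on it."""
    n_components = len(next(iter(loadings.values())))
    groups = {k: [] for k in range(1, n_components + 1)}
    for name, row in loadings.items():
        for k, value in enumerate(row, start=1):
            if abs(value) >= cutoff:
                groups[k].append(name)
    return groups


groups = variables_per_component(loadings)
# Components carried by a single variable would be candidates for removal.
single = [k for k, names in groups.items() if len(names) == 1]

print(groups)  # prints {1: ['genegen', 'nukeacc'], 2: ['grnexagg', 'ldcgrn']}
print(single)  # prints []
```

Both components have two variables loading on them, so no further removal is indicated by this rule.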
Variable loadings on components

On iteration 5, the 2 components in the analysis had more than one variable loading on each of them.

Rotated Component Matrix (a)
                                                 Component
                                                    1        2
ENVIRONMENTAL THREATS EXAGGERATED                -.207     .756
HOW DANGEROUS MODIFYING GENES IN CROPS            .801    -.229
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT     .051     .830
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS         .861     .059
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

Communalities
                                                 Initial   Extraction
ENVIRONMENTAL THREATS EXAGGERATED                1.000     .615
HOW DANGEROUS MODIFYING GENES IN CROPS           1.000     .694
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT    1.000     .691
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS        1.000     .744
Extraction Method: Principal Component Analysis.

The communalities for all of the variables included on the components were greater than 0.50 and all variables had simple structure.

The principal component analysis has been completed.
Interpreting the principal components

The information in 4 of the variables can be represented by 2 components.

Component 1 includes the variables "danger to the environment from modifying genes in crops" [genegen] and "likelihood of nuclear power station damaging environment in next 5 years" [nukeacc]. We can substitute one component variable for this combination of variables in further analyses.

Component 2 includes the variables "claims about environmental threats are exaggerated" [grnexagg] and "poorer countries should be expected to do less for the environment" [ldcgrn]. We can substitute one component variable for this combination of variables in further analyses.

Rotated Component Matrix (a)
                                                 Component
                                                    1        2
ENVIRONMENTAL THREATS EXAGGERATED                -.207     .756
HOW DANGEROUS MODIFYING GENES IN CROPS            .801    -.229
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT     .051     .830
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS         .861     .059
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
Variance explained in individual variables

The components explain at least 50% of the variance in each of the variables included in the final analysis.

Communalities
                                                 Initial   Extraction
ENVIRONMENTAL THREATS EXAGGERATED                1.000     .615
HOW DANGEROUS MODIFYING GENES IN CROPS           1.000     .694
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT    1.000     .691
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS        1.000     .744
Extraction Method: Principal Component Analysis.
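To see where these communalities come from, recall that with an orthogonal rotation such as varimax a variable's communality equals the sum of its squared loadings on the retained components. The sketch below recomputes the communalities from the iteration 5 rotated loadings and compares them with the reported values. Small differences in the third decimal are expected because the printed loadings are already rounded.

```python
# Rotated loadings (component 1, component 2) from iteration 5.
loadings = {
    "grnexagg": (-0.207, 0.756),
    "genegen": (0.801, -0.229),
    "ldcgrn": (0.051, 0.830),
    "nukeacc": (0.861, 0.059),
}

# Extraction communalities reported in the Communalities table.
reported = {"grnexagg": 0.615, "genegen": 0.694, "ldcgrn": 0.691, "nukeacc": 0.744}

# Communality = sum of squared loadings across the retained components.
computed = {name: sum(value ** 2 for value in row) for name, row in loadings.items()}

for name in loadings:
    print(f"{name}: computed {computed[name]:.3f}, reported {reported[name]:.3f}")

# Every variable has at least 50% of its variance explained by the 2 components.
all_above_half = all(value >= 0.50 for value in computed.values())
print(all_above_half)  # prints True
```

The recomputed values agree with the SPSS output to within rounding, which is a quick sanity check when transcribing loadings by hand.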
Total variance explained
Total Variance Explained
Initial Eigenvalues Extraction Sums of Squared Loadings R
Component Total % of Variance Cumulative % Total % of Variance Cumulative % Tot
1 1.626 40.651 40.651 1.626 40.651 40.651 1
2 1.119 27.968 68.619 1.119 27.968 68.619 1
3 .694 17.341 85.960
4 .562 14.040 100.000
Extraction Method: Principal Component Analysis.
The 2 components explain 68.619% of the total variance in the variables which are included on the components.
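The percentages in the Total Variance Explained table can be reproduced from the eigenvalues. The sketch below assumes the analysis was run on the correlation matrix, so each of the 4 standardized variables contributes a variance of 1 and the total variance is 4 (the initial communalities of 1.000 are consistent with this). Results differ from the table in the second or third decimal because the printed eigenvalues are rounded.

```python
# Initial eigenvalues from the Total Variance Explained table.
eigenvalues = [1.626, 1.119, 0.694, 0.562]

# Assumption: correlation-matrix PCA, so total variance = number of variables.
n_variables = len(eigenvalues)

# Percentage of total variance carried by each component.
percent = [100 * ev / n_variables for ev in eigenvalues]

# Running total of the percentages.
cumulative = []
running = 0.0
for p in percent:
    running += p
    cumulative.append(running)

for k, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"component {k}: {p:.2f}% of variance, {c:.2f}% cumulative")
```

The cumulative value for the first two components comes out at roughly 68.6%, in line with the 68.619% reported for the 2 extracted components.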