Factor analysis is a statistical approach that can be used to
analyze interrelationships among a large
number of variables and to explain these
variables in terms of their common
underlying dimensions (factors).
It is a data reduction technique designed to
represent a wide range of attributes on a
smaller number of dimensions.
For example,
suppose that a bank asked a large number of
questions about a given branch.
Consider how the following characteristics might
be represented by just a few constructs (factors).
Characteristic                            Construct (factor)
Friendliness of the staff                 Service
Time spent in the line-up                 Service
Assistance via telephone                  Service
Distance of bank from home                Convenience
Hours of operation                        Convenience
Availability of parking                   Convenience
Monthly account fee                       Cost
Charges for withdrawals and deposits      Cost
Loan interest rate                        Cost
Multivariate techniques are statistical techniques that simultaneously analyze more
than two variables.
Classification of Multivariate Techniques
1. Multiple Regression
2. Factor Analysis
Factor analysis is a technique applicable when there is a systematic
interdependence among a set of observed variables; it seeks to resolve
these variables into a few categories known as factors.
E.g.: individual income, education and occupation may be used to infer
social class (the factor).
The Centroid Method
This method of factor analysis was developed by L. L. Thurstone.
The centroid method tends to maximize the sum of loadings, disregarding signs:
it extracts the largest sum of absolute loadings for each factor in turn.
Illustration 1
Given is the following correlation matrix, R, relating to eight variables, with unities in
the diagonal spaces:
Variables
        1      2      3      4      5      6      7      8
1   1.000   .709   .204   .081   .626   .113   .155   .774
2    .709  1.000   .051   .089   .581   .098   .083   .652
3    .204   .051  1.000   .671   .123   .689   .582   .072
4    .081   .089   .671  1.000   .022   .798   .613   .111
5    .626   .581   .123   .022  1.000   .047   .201   .724
6    .113   .098   .689   .798   .047  1.000   .801   .120
7    .155   .083   .582   .613   .201   .801  1.000   .152
8    .774   .652   .072   .111   .724   .120   .152  1.000
Using the centroid method of factor analysis, work out the first and second centroid
factors from the above information.
        1      2      3      4      5      6      7      8
1   1.000   .709   .204   .081   .626   .113   .155   .774
2    .709  1.000   .051   .089   .581   .098   .083   .652
3    .204   .051  1.000   .671   .123   .689   .582   .072
4    .081   .089   .671  1.000   .022   .798   .613   .111
5    .626   .581   .123   .022  1.000   .047   .201   .724
6    .113   .098   .689   .798   .047  1.000   .801   .120
7    .155   .083   .582   .613   .201   .801  1.000   .152
8    .774   .652   .072   .111   .724   .120   .152  1.000
Column
sums 3.662  3.263  3.392  3.385  3.324  3.666  3.587  3.605
Sum of the column sums (T) = 27.884; √T = 5.281
First centroid factor A = each column sum divided by √T:
  3.662/5.281, 3.263/5.281, 3.392/5.281, 3.385/5.281, 3.324/5.281, 3.666/5.281, 3.587/5.281, 3.605/5.281
  = .693, .618, .642, .641, .629, .694, .679, .683
Variables   Factor loadings concerning first centroid factor A
1 .693
2 .618
3 .642
4 .641
5 .629
6 .694
7 .679
8 .683
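The first-factor arithmetic can be checked with a short script. What follows is a minimal Python sketch and not part of the original worked solution. The matrix is entered in symmetric form (r15 = .626, r17 = .155, r18 = .774), which is the form that agrees with the stated column sums, and the function name `first_centroid_loadings` is purely illustrative.

```python
import math

# Correlation matrix R of Illustration 1, entered in symmetric form (assumption:
# the off-diagonal entries are taken to be the ones that match the column sums).
R = [
    [1.000, .709, .204, .081, .626, .113, .155, .774],
    [.709, 1.000, .051, .089, .581, .098, .083, .652],
    [.204, .051, 1.000, .671, .123, .689, .582, .072],
    [.081, .089, .671, 1.000, .022, .798, .613, .111],
    [.626, .581, .123, .022, 1.000, .047, .201, .724],
    [.113, .098, .689, .798, .047, 1.000, .801, .120],
    [.155, .083, .582, .613, .201, .801, 1.000, .152],
    [.774, .652, .072, .111, .724, .120, .152, 1.000],
]


def first_centroid_loadings(matrix):
    """Return (column sums, T, loadings) for the first centroid factor."""
    n = len(matrix)
    column_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    total = sum(column_sums)                    # T
    root = math.sqrt(total)                     # square root of T
    loadings = [s / root for s in column_sums]  # column sum / sqrt(T)
    return column_sums, total, loadings


column_sums, total, loadings_a = first_centroid_loadings(R)
print("Column sums:", [round(s, 3) for s in column_sums])
print("T =", round(total, 3), " sqrt(T) =", round(math.sqrt(total), 3))
print("Factor A   :", [round(a, 3) for a in loadings_a])
```

Rounded to three decimals, the printed loadings should agree with the hand-worked column for factor A.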
Cross products of the factor A loadings (each entry is the product of the row and column loadings):
         .693   .618   .642   .641   .629   .694   .679   .683
.693     .480   .428   .445   .444   .436   .481   .471   .473
.618     .428   .382   .397   .396   .389   .429   .420   .422
.642     .445   .397   .412   .412   .404   .446   .436   .438
.641     .444   .396   .411   .411   .403   .445   .435   .438
.629     .436   .389   .403   .403   .396   .437   .427   .430
.694     .481   .429   .445   .445   .437   .482   .471   .474
.679     .471   .420   .435   .435   .427   .471   .461   .464
.683     .473   .422   .438   .438   .430   .474   .464   .466
First matrix of residual coefficients (R minus the cross products of the factor A loadings):
        1      2      3      4      5      6      7      8
1    .520   .281  -.241  -.363   .190  -.368  -.316   .301
2    .281   .618  -.346  -.307   .192  -.331  -.337   .230
3   -.241  -.346   .588   .259  -.281   .243   .146  -.366
4   -.363  -.307   .259   .589  -.381   .353   .178  -.327
5    .190   .192  -.281  -.381   .604  -.390  -.217   .294
6   -.368  -.331   .243   .353  -.390   .518   .330  -.354
7   -.316  -.337   .146   .178  -.226   .330   .539  -.312
8    .301   .230  -.366  -.327   .294  -.354  -.312   .534
Reflected matrix of residual coefficients (signs of variables 3, 4, 6 and 7 reversed):
        1      2      3      4      5      6      7      8
1    .520   .281   .241   .363   .190   .368   .316   .301
2    .281   .618   .346   .307   .192   .331   .337   .230
3    .241   .346   .588   .259   .281   .243   .146   .366
4    .363   .307   .259   .589   .381   .353   .178   .327
5    .190   .192   .281   .381   .604   .390   .217   .294
6    .368   .331   .243   .353   .390   .518   .330   .354
7    .316   .337   .146   .178   .226   .330   .539   .312
8    .301   .230   .366   .327   .294   .354   .312   .534
Column
sums 2.580  2.642  2.470  2.757  2.558  2.887  2.375  2.718
Sum of the column sums (T) = 20.987; √T = 4.581
Second centroid factor B = each column sum divided by √T, with the negative signs
restored for the reflected variables 3, 4, 6 and 7:
  .563  .577  -.539  -.602  .558  -.630  -.518  .593
Variables Factor Loadings
Centroid Factor A Centroid Factor B
1 .693 .563
2 .618 .577
3 .642 -.539
4 .641 -.602
5 .629 .558
6 .694 -.630
7 .679 -.518
8 .683 .593
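The second-factor steps (cross products, residuals, reflection, new column sums) can be sketched in Python as follows. This is an illustrative check under stated assumptions and not the original worked solution. R is entered in symmetric form, the set of reflected variables (3, 4, 6, 7) is read off the negative signs of factor B, and all variable names are hypothetical. Because the script carries full precision instead of three-decimal rounding at each step, its results can differ from the hand-worked figures in the third decimal place.

```python
import math

# Correlation matrix R of Illustration 1, entered in symmetric form (assumption).
R = [
    [1.000, .709, .204, .081, .626, .113, .155, .774],
    [.709, 1.000, .051, .089, .581, .098, .083, .652],
    [.204, .051, 1.000, .671, .123, .689, .582, .072],
    [.081, .089, .671, 1.000, .022, .798, .613, .111],
    [.626, .581, .123, .022, 1.000, .047, .201, .724],
    [.113, .098, .689, .798, .047, 1.000, .801, .120],
    [.155, .083, .582, .613, .201, .801, 1.000, .152],
    [.774, .652, .072, .111, .724, .120, .152, 1.000],
]
n = len(R)

# Step 1: first centroid factor A (column sum / sqrt of the grand total).
sums_a = [sum(R[i][j] for i in range(n)) for j in range(n)]
root_a = math.sqrt(sum(sums_a))
a = [s / root_a for s in sums_a]

# Step 2: residual matrix = R minus the cross products of the A loadings.
residual = [[R[i][j] - a[i] * a[j] for j in range(n)] for i in range(n)]

# Step 3: reflect variables 3, 4, 6, 7 (0-based indices 2, 3, 5, 6).
reflected_vars = {2, 3, 5, 6}
sign = [-1.0 if i in reflected_vars else 1.0 for i in range(n)]
reflected = [[sign[i] * sign[j] * residual[i][j] for j in range(n)]
             for i in range(n)]

# Step 4: second centroid factor B from the reflected matrix, signs restored.
sums_b = [sum(reflected[i][j] for i in range(n)) for j in range(n)]
root_b = math.sqrt(sum(sums_b))
b = [sign[j] * sums_b[j] / root_b for j in range(n)]

print("Reflected column sums:", [round(s, 3) for s in sums_b])
print("Factor B loadings    :", [round(x, 3) for x in b])
```

Reflection is applied as a sign change to both the row and the column of each reflected variable, so the diagonal entries stay positive.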
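Variables   Centroid factor A   Centroid factor B   Communality (h²)
1                .693                .563           (.693)² + (.563)² = .797
2                .618                .577           (.618)² + (.577)² = .715
3                .642               -.539           (.642)² + (-.539)² = .703
4                .641               -.602           (.641)² + (-.602)² = .773
5                .629                .558           (.629)² + (.558)² = .707
6                .694               -.630           (.694)² + (-.630)² = .879
7                .679               -.518           (.679)² + (-.518)² = .729
8                .683                .593           (.683)² + (.593)² = .818
Eigenvalue       3.490               2.631          6.121
(sum of squared loadings)
Proportion of    3.490/8 = 44%       2.631/8 = 33%  6.121/8 = 77%
total variance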
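The communalities, eigenvalues and variance proportions follow directly from the two columns of loadings. A minimal Python sketch, using the loadings exactly as tabulated (the variable names are illustrative only):

```python
# Loadings on centroid factors A and B, as tabulated for the eight variables.
loadings_a = [.693, .618, .642, .641, .629, .694, .679, .683]
loadings_b = [.563, .577, -.539, -.602, .558, -.630, -.518, .593]
n_variables = len(loadings_a)

# Communality of a variable = sum of its squared loadings across the factors.
communalities = [a * a + b * b for a, b in zip(loadings_a, loadings_b)]

# Eigenvalue of a factor = sum of its squared loadings across the variables.
eigen_a = sum(a * a for a in loadings_a)
eigen_b = sum(b * b for b in loadings_b)

# Proportion of total variance = eigenvalue / number of variables,
# because each standardized variable contributes one unit of variance.
prop_a = eigen_a / n_variables
prop_b = eigen_b / n_variables

for k, h2 in enumerate(communalities, start=1):
    print(f"variable {k}: h2 = {h2:.3f}")
print(f"eigenvalues: A = {eigen_a:.3f}, B = {eigen_b:.3f}, "
      f"total = {eigen_a + eigen_b:.3f}")
print(f"proportion of total variance: A = {prop_a:.0%}, B = {prop_b:.0%}, "
      f"both = {prop_a + prop_b:.0%}")
```

The sum of the communalities equals the sum of the two eigenvalues, which is a quick consistency check on the table.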
The principal components (PC) method of factor analysis was developed by
H. Hotelling.
This method seeks to maximize the sum of squared
loadings of each factor extracted in turn.
Illustration 2
Take the correlation matrix, R, relating to the eight variables of Illustration 1
(reproduced below with its column sums).
Using the principal components method of factor analysis, work out the first and second
principal component factors from this information.
        1      2      3      4      5      6      7      8
1   1.000   .709   .204   .081   .626   .113   .155   .774
2    .709  1.000   .051   .089   .581   .098   .083   .652
3    .204   .051  1.000   .671   .123   .689   .582   .072
4    .081   .089   .671  1.000   .022   .798   .613   .111
5    .626   .581   .123   .022  1.000   .047   .201   .724
6    .113   .098   .689   .798   .047  1.000   .801   .120
7    .155   .083   .582   .613   .201   .801  1.000   .152
8    .774   .652   .072   .111   .724   .120   .152  1.000
Column
sums 3.662  3.263  3.392  3.385  3.324  3.666  3.587  3.605
Sum of squares of the column sums
  = (3.662)² + (3.263)² + (3.392)² + (3.385)² + (3.324)² + (3.666)² + (3.587)² + (3.605)² = 97.372
Normalizing factor = √97.372 = 9.868
Dividing each column sum by 9.868 gives the normalized first trial vector.
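The normalization of the column sums is the first step of an iterative procedure. One way to carry it through, sketched here in Python, is to multiply the normalized vector by R, renormalize, and repeat until the vector stops changing (power iteration). This is an assumption about how the remaining steps would run, not a transcription of the original solution. R is entered in symmetric form, and the function names are hypothetical.

```python
import math

# Correlation matrix R of Illustration 1, entered in symmetric form (assumption).
R = [
    [1.000, .709, .204, .081, .626, .113, .155, .774],
    [.709, 1.000, .051, .089, .581, .098, .083, .652],
    [.204, .051, 1.000, .671, .123, .689, .582, .072],
    [.081, .089, .671, 1.000, .022, .798, .613, .111],
    [.626, .581, .123, .022, 1.000, .047, .201, .724],
    [.113, .098, .689, .798, .047, 1.000, .801, .120],
    [.155, .083, .582, .613, .201, .801, 1.000, .152],
    [.774, .652, .072, .111, .724, .120, .152, 1.000],
]


def normalize(vector):
    """Scale a vector to unit length; return (unit vector, original length)."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector], length


def first_principal_factor(matrix, tol=1e-10, max_iter=1000):
    """Power iteration starting from the column sums of the matrix."""
    n = len(matrix)
    trial = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    v, first_norm = normalize(trial)        # first normalizing factor
    eigenvalue = first_norm
    for _ in range(max_iter):
        product = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_v, eigenvalue = normalize(product)
        change = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if change < tol:
            break
    # Loadings = unit characteristic vector scaled by sqrt(eigenvalue).
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    return first_norm, eigenvalue, v, loadings


first_norm, eigenvalue, vector, loadings = first_principal_factor(R)
print("First normalizing factor:", round(first_norm, 3))
print("Largest eigenvalue      :", round(eigenvalue, 3))
print("First PC loadings       :", [round(x, 3) for x in loadings])
```

The squared loadings of the resulting factor sum to its eigenvalue, which is the quantity this method maximizes.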
The maximum likelihood (ML) method seeks to extrapolate what is known from Rs
in the best possible way to estimate Rp (the PC method only
maximizes the variance explained in Rs).
Thus the ML method is a statistical approach in which one
maximizes some relationship between the sample of data and
the population from which the sample was drawn.
Rs = the correlation matrix actually obtained from the data in a sample
Rp = the correlation matrix that would be obtained if the entire
population were measured
Through the discriminant analysis technique, the researcher may classify
individuals or objects into one of two or more mutually
exclusive and exhaustive groups on the basis of a set of independent variables.
E.g.: brand preference (say, Brand X or Brand Y) is the
dependent variable, and the researcher studies its relationship to an individual's
income, age, education, etc.
When there are more than two groups, the technique is called multiple discriminant analysis.
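As an illustration of the two-group case, the following Python sketch fits Fisher's linear discriminant to a tiny, entirely hypothetical data set of (income in thousands, age) for buyers of Brand X and Brand Y. The figures are invented for the example and do not come from any study. The pooled-covariance form used here is one common way to compute the discriminant weights.

```python
# Hypothetical toy data (invented for illustration): (income in thousands, age).
brand_x = [(30, 25), (35, 30), (40, 28), (45, 35)]
brand_y = [(60, 45), (65, 50), (70, 48), (75, 55)]


def mean_vector(rows):
    """Mean of each of the two predictor variables."""
    return [sum(r[k] for r in rows) / len(rows) for k in range(2)]


def scatter_matrix(rows, mean):
    """2x2 matrix of sums of squares and cross products about the mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


mean_x, mean_y = mean_vector(brand_x), mean_vector(brand_y)
sx, sy = scatter_matrix(brand_x, mean_x), scatter_matrix(brand_y, mean_y)

# Pooled within-group covariance matrix.
dof = len(brand_x) + len(brand_y) - 2
pooled = [[(sx[i][j] + sy[i][j]) / dof for j in range(2)] for i in range(2)]

# Discriminant weights = inverse(pooled covariance) times difference of means.
det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
inverse = [[pooled[1][1] / det, -pooled[0][1] / det],
           [-pooled[1][0] / det, pooled[0][0] / det]]
diff = [mean_x[k] - mean_y[k] for k in range(2)]
weights = [inverse[i][0] * diff[0] + inverse[i][1] * diff[1] for i in range(2)]
midpoint = [(mean_x[k] + mean_y[k]) / 2 for k in range(2)]


def classify(person):
    """Assign a (income, age) pair to the group on its side of the midpoint."""
    score = sum(weights[k] * (person[k] - midpoint[k]) for k in range(2))
    return "Brand X" if score > 0 else "Brand Y"


print(classify((38, 29)))
print(classify((68, 49)))
```

With equal group sizes and no prior information, the cut-off is placed midway between the two group means, which is the choice made in `classify`.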
The term path analysis was first introduced by the biologist
Sewall Wright in 1934 in connection with decomposing
the total correlation between any two variables.
This technique of path analysis is based on a series of
multiple regression analyses with the added assumption of a
causal relationship between independent and dependent
variables.
In path analysis, a simple set of equations can be built up
showing how each variable depends on preceding variables.
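A small example may make the set of equations concrete. Assume, purely for illustration, three standardized variables in which X1 influences X2, and both X1 and X2 influence X3:

    X2 = p21 X1 + e2
    X3 = p31 X1 + p32 X2 + e3

The Python sketch below solves for the path coefficients from three hypothetical correlations (the values .5, .4 and .6 are invented) and decomposes the total correlation between X1 and X3 into a direct and an indirect part.

```python
# Hypothetical correlations among three standardized variables (invented values).
r12, r13, r23 = 0.5, 0.4, 0.6

# X2 depends only on X1, so its path coefficient is the simple correlation.
p21 = r12

# X3 depends on X1 and X2: solve the two normal equations
#   r13 = p31 + p32 * r12
#   r23 = p31 * r12 + p32
denominator = 1 - r12 ** 2
p31 = (r13 - r12 * r23) / denominator
p32 = (r23 - r12 * r13) / denominator

# Decompose the total correlation between X1 and X3.
direct_effect = p31            # path X1 -> X3
indirect_effect = p21 * p32    # path X1 -> X2 -> X3

print(f"p21 = {p21:.3f}, p31 = {p31:.3f}, p32 = {p32:.3f}")
print(f"direct = {direct_effect:.3f}, indirect = {indirect_effect:.3f}, "
      f"total = {direct_effect + indirect_effect:.3f}")
```

The direct and indirect parts add back to r13, which is the sense in which path analysis decomposes a total correlation.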
Cluster analysis consists of methods of
classifying variables into clusters
Technically a cluster consists of variables
that correlate highly with one another and
have comparatively low correlations with the
variables in other clusters
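This definition can be tried on the eight-variable correlation matrix of Illustration 1. The Python sketch below uses a deliberately simple rule: two variables are linked when their correlation is at least a chosen threshold, and a cluster is any set of variables connected through such links. Both the rule and the threshold of 0.5 are assumptions made for illustration, as cluster analysis covers many other methods. The matrix is entered in symmetric form.

```python
# Correlation matrix R of Illustration 1, entered in symmetric form (assumption).
R = [
    [1.000, .709, .204, .081, .626, .113, .155, .774],
    [.709, 1.000, .051, .089, .581, .098, .083, .652],
    [.204, .051, 1.000, .671, .123, .689, .582, .072],
    [.081, .089, .671, 1.000, .022, .798, .613, .111],
    [.626, .581, .123, .022, 1.000, .047, .201, .724],
    [.113, .098, .689, .798, .047, 1.000, .801, .120],
    [.155, .083, .582, .613, .201, .801, 1.000, .152],
    [.774, .652, .072, .111, .724, .120, .152, 1.000],
]


def cluster_variables(corr, threshold=0.5):
    """Group variables connected by correlations of at least `threshold`."""
    unassigned = set(range(len(corr)))
    clusters = []
    while unassigned:
        seed = min(unassigned)
        unassigned.remove(seed)
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            linked = [j for j in sorted(unassigned)
                      if abs(corr[i][j]) >= threshold]
            for j in linked:
                unassigned.remove(j)
                cluster.append(j)
                frontier.append(j)
        # Report variables with 1-based numbers, as in the tables.
        clusters.append(sorted(v + 1 for v in cluster))
    return clusters


clusters = cluster_variables(R)
print(clusters)  # [[1, 2, 5, 8], [3, 4, 6, 7]]
```

The two groups found are the same two sets of variables that take opposite signs on the second centroid factor.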