BRM Unit 4 Extra


UNIT-4

Normality test:
Normality is a property of a random variable that is distributed according to the normal distribution. A normality test is used to determine whether sample data have been drawn from a normally distributed population; in other words, normality tests check whether a set of data is distributed in a way that is consistent with a normal distribution.

Testing for normality is often a first step in analyzing your data. Many statistical tools that you might use have normality as an underlying assumption. If your data fail that assumption, you may need to use a different statistical tool or approach.

How to identify whether a curve is normally distributed:

In order to be considered a normal distribution,

• A data set (when graphed) must follow a bell-shaped, symmetrical curve centered around the mean.
• It must also adhere to the empirical rule that indicates the percentage of the data set that falls within (plus or minus) 1, 2 and 3 standard deviations of the mean.

Normality of data and testing:


The normal distribution is the most important continuous probability distribution. It has a bell-shaped density curve described by its mean and standard deviation (SD), and extreme values in the data set have no significant impact on the mean value. If continuous data follow a normal distribution, then 68.2%, 95.4%, and 99.7% of the observations lie between mean ± 1 SD, mean ± 2 SD, and mean ± 3 SD, respectively.
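
As a quick illustration (a minimal sketch using NumPy on simulated data, not taken from the text above), the empirical rule can be verified by counting the share of observations that fall within 1, 2 and 3 SDs of the mean:

```python
import numpy as np

# Simulated, approximately normal data (illustrative only)
rng = np.random.default_rng(42)
x = rng.normal(loc=50, scale=10, size=10_000)

mean, sd = x.mean(), x.std()
for k in (1, 2, 3):
    within = np.mean((x >= mean - k * sd) & (x <= mean + k * sd))
    print(f"Within mean ± {k} SD: {within:.1%}")  # expect ~68.2%, 95.4%, 99.7%
```
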
Why test the normality of data
Various statistical methods used for data analysis make assumptions about normality, including correlation, regression, t-tests, and analysis of variance. The central limit theorem states that when the sample size has 100 or more observations, violation of normality is not a major issue; nevertheless, for meaningful conclusions the normality assumption should be checked irrespective of the sample size. If continuous data follow a normal distribution, then we present the data as a mean value.

Further, this mean value is used to compare between/among the groups and to calculate the significance level (P value). If our data are not normally distributed, the resulting mean is not a representative value of the data. A wrong choice of representative value, and a significance level calculated from it, can lead to a wrong interpretation.

That is why we first test the normality of the data and then decide whether the mean is applicable as the representative value. If it is, the means are compared using a parametric test; otherwise the medians are compared using nonparametric methods.
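
A minimal sketch of this decision rule in Python, assuming SciPy's Shapiro-Wilk test and the conventional 0.05 cut-off (the data and group names are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(100, 15, size=40)  # illustrative samples
group_b = rng.normal(105, 15, size=40)

# Shapiro-Wilk: H0 = the sample was drawn from a normally distributed population
normal_a = stats.shapiro(group_a).pvalue > 0.05
normal_b = stats.shapiro(group_b).pvalue > 0.05

if normal_a and normal_b:
    # parametric test: compare means
    stat, p = stats.ttest_ind(group_a, group_b)
else:
    # nonparametric test: compare medians/distributions
    stat, p = stats.mannwhitneyu(group_a, group_b)
print(f"test statistic = {stat:.3f}, P value = {p:.4f}")
```
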
Test of Uni-dimensionality:

Uni-dimensionality is used to describe a specific type of measurement scale. A uni-dimensional measurement scale has only one (“uni”) dimension. In other words, it can be represented by a single number line. Some examples of simple, uni-dimensional scales:
• Height of people.
• Weight of cars.
• IQ.

Other concepts can be forced into a uni-dimensional status by narrowing the idea into a single, measurable construct. For example, self-worth is a psychological concept that has many layers of complexity and can differ across situations (at home, at a party, at work, at your wedding). However, you can narrow the concept by making a simple line that has “low self worth” on the left and “high self worth” on the right.
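
One common heuristic for checking uni-dimensionality (added here as an illustration; it is not discussed in the text above) is to inspect the eigenvalues of the item correlation matrix: if the first eigenvalue dominates the rest, the items can be treated as measuring a single dimension. A minimal NumPy sketch on simulated items:

```python
import numpy as np

# Illustrative item responses: rows = respondents, columns = scale items
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 1))                 # one underlying trait
items = latent + 0.5 * rng.normal(size=(200, 4))   # four items driven by it

corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

# If the first eigenvalue explains most of the variance, the scale is
# approximately uni-dimensional.
print("eigenvalues:", np.round(eigenvalues, 2))
print("share of variance, first factor:", round(eigenvalues[0] / eigenvalues.sum(), 2))
```
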
What is Multicollinearity?
Multicollinearity occurs when two or more independent variables are highly
correlated with one another in a regression model.

This means that an independent variable can be predicted from another independent variable in a regression model. For example, height and weight, household income and water consumption, mileage and price of a car, study time and leisure time, etc.
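
A common way to detect multicollinearity (a sketch, not part of the original text) is the variance inflation factor (VIF); values above roughly 5-10 are usually taken as a warning sign. The variable names and data below are invented:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative predictors: income and appliance count are deliberately correlated
rng = np.random.default_rng(2)
income = rng.normal(50_000, 10_000, size=200)
appliances = income / 5_000 + rng.normal(0, 1, size=200)
family_size = rng.integers(1, 7, size=200).astype(float)

X = pd.DataFrame({"income": income,
                  "appliances": appliances,
                  "family_size": family_size})
X["const"] = 1.0  # VIF is computed on the design matrix including an intercept

for i, col in enumerate(X.columns[:-1]):
    print(col, round(variance_inflation_factor(X.values, i), 2))
```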

What causes Multicollinearity?


Multicollinearity could occur due to the following problems:

• Multicollinearity could exist because of the problems in the dataset at the time of creation. These problems could be because of poorly designed experiments, highly observational data, or the inability to manipulate the data:

For example, determining the electricity consumption of a household from the household income and the number of electrical appliances. Here, we know that the number of electrical appliances in a household will increase with household income. However, this cannot be removed from the dataset.

• Multicollinearity could also occur when new variables are created which are dependent on other variables:

For example, creating a variable for BMI from the height and weight variables would include redundant information in the model.

• Including identical variables in the dataset:

For example, including variables for temperature in Fahrenheit and temperature in Celsius.
• Inaccurate use of dummy variables can also cause a multicollinearity problem. This is called the dummy variable trap (see the sketch after this list):

For example, consider a dataset containing a marital-status variable with two unique values: ‘married’ and ’single’. Creating dummy variables for both of them would include redundant information; we can make do with only one variable containing 0/1 for ‘married’/’single’ status.

• Insufficient data can, in some cases, also cause a multicollinearity problem.
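
As referenced in the dummy variable trap example above, a minimal pandas sketch of keeping only one dummy column (the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"marital_status": ["married", "single", "single", "married"]})

# A dummy column for every category duplicates information: "married" is
# simply 1 - "single". drop_first=True keeps a single 0/1 column instead.
dummies = pd.get_dummies(df["marital_status"], prefix="status", drop_first=True)
print(dummies)  # one column, e.g. status_single
```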

FACTOR ANALYSIS:

Factor analysis is a statistical technique used to reduce a large number of variables into a smaller number of factors. This technique extracts the maximum common variance from all variables and puts it into a common score.
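
As an illustration (a minimal sketch using scikit-learn's FactorAnalysis on simulated data; the loadings and sample size are invented), six observed variables are reduced to two common factors:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Six observed variables assumed to be driven by two latent factors
rng = np.random.default_rng(3)
factors = rng.normal(size=(300, 2))
loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
                     [0.1, 0.9], [0.0, 0.8], [0.2, 0.7]])
X = factors @ loadings.T + 0.3 * rng.normal(size=(300, 6))

fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(X)        # common-factor scores, one row per case
print(np.round(fa.components_, 2))  # estimated loadings of each variable
```
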
Types of Factor Analysis

Confirmatory factor analysis (CFA) is a multivariate statistical procedure that is used to test how well the measured variables represent the number of constructs. Confirmatory factor analysis (CFA) and exploratory factor analysis (EFA) are similar techniques, but in exploratory factor analysis (EFA) the data are simply explored, which provides information about the number of factors required to represent the data.

In exploratory factor analysis, all measured variables are related to every latent
variable. But in confirmatory factor analysis (CFA), researchers can specify the
number of factors required in the data and which measured variable is related to
which latent variable. Confirmatory factor analysis (CFA) is a tool that is used to
confirm or reject the Measurement theory.

1. Confirmatory Factor Analysis


• Confirmatory Factor Analysis (CFA) lets one determine whether a relationship between factors or a set of observed variables and their underlying components exists.
• It helps one confirm whether there is a connection between two components of variables in a given dataset. Usually, the purpose of CFA is to test whether certain data fit the requirements of a particular hypothesis.
• The process begins with a researcher formulating a hypothesis that is made to fit along the lines of a certain theory. If the constraints imposed on a model do not fit well with the data, then the model is rejected, and it is confirmed that no relationship exists between a factor and its underlying construct.

2. Exploratory Factor Analysis


• In the case of Exploratory Factor Analysis, the purpose is to determine/explore the underlying latent structure of a large set of variables. EFA, unlike CFA, tends to uncover the relationships, if any, between measured variables of an entity (for example, height, weight, etc. in a human).
• Conducting Exploratory Factor Analysis involves determining the total number of factors involved in a dataset (a common heuristic is sketched below).
• EFA is generally considered to be more of a theory-generating procedure than a theory-testing procedure.
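
The heuristic referred to above is often the Kaiser criterion (an illustration, not part of the original text): retain factors whose eigenvalues of the correlation matrix exceed 1. A minimal sketch on simulated data:

```python
import numpy as np

# Illustrative data: eight observed variables, 250 respondents
rng = np.random.default_rng(4)
X = rng.normal(size=(250, 8))
X[:, :4] += rng.normal(size=(250, 1))  # the first four share a common factor

corr = np.corrcoef(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

n_factors = int(np.sum(eigenvalues > 1))  # Kaiser criterion
print("eigenvalues:", np.round(eigenvalues, 2))
print("suggested number of factors:", n_factors)
```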

Structural Equation Modelling:


Structural equation modelling is a multivariate statistical analysis technique that is used
to analyse structural relationships. This technique is the combination of factor analysis and
multiple regression analysis, and it is used to analyse the structural relationship between
measured variables and latent constructs.

This method is preferred by researchers because it estimates multiple and interrelated dependence relationships in a single analysis. In this analysis, two types of variables are used: endogenous variables and exogenous variables. Endogenous variables are equivalent to dependent variables, and exogenous variables are equivalent to independent variables.
Theory: This can be thought of as a set of relationships providing consistent and comprehensive explanations of the actual phenomena. There are two types of models:

Measurement model: The measurement model represents the theory that specifies how measured variables come together to represent constructs.

Structural model: Represents the theory that shows how constructs are related to other constructs. Structural equation modelling is also called causal modelling because it tests the proposed causal relationships. The following assumptions are made:

Multivariate normal distribution: The maximum likelihood method is used and assumed for
multivariate normal distribution. Small changes in multivariate normality can lead to a large
difference in the chi-square test.

Linearity: A linear relationship is assumed between endogenous and exogenous variables.

Outlier: Data should be free of outliers. Outliers affect the model significance.

Sequence: There should be a cause and effect relationship between endogenous and exogenous
variables, and a cause has to occur before the event.

Non-spurious relationship: Observed covariance must be true.

Model identification: The number of equations must be at least as large as the number of estimated parameters; models should be over-identified or exactly identified. Under-identified models are not considered.

Sample size: Most of the researchers prefer a 200 to 400 sample size with 10 to 15 indicators.
As a rule of thumb, that is 10 to 20 times as many cases as variables.
Uncorrelated error terms: Error terms are assumed uncorrelated with other variable error
terms.

Data: Interval data is used.
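
A minimal sketch of how a measurement model and a structural model might be specified in Python, assuming the third-party semopy package; the lavaan-style model syntax, construct names and simulated data are illustrative assumptions, not part of the original text:

```python
import numpy as np
import pandas as pd
import semopy  # third-party SEM package (assumed available)

# Illustrative data: two latent constructs, each measured by three indicators
rng = np.random.default_rng(5)
n = 300
quality = rng.normal(size=n)
satisfaction = 0.6 * quality + rng.normal(size=n)
data = pd.DataFrame({
    "q1": quality + 0.4 * rng.normal(size=n),
    "q2": quality + 0.4 * rng.normal(size=n),
    "q3": quality + 0.4 * rng.normal(size=n),
    "s1": satisfaction + 0.4 * rng.normal(size=n),
    "s2": satisfaction + 0.4 * rng.normal(size=n),
    "s3": satisfaction + 0.4 * rng.normal(size=n),
})

# Measurement model: each construct (=~) is represented by its indicators.
# Structural model: Satisfaction is regressed (~) on Quality.
model_desc = """
Quality =~ q1 + q2 + q3
Satisfaction =~ s1 + s2 + s3
Satisfaction ~ Quality
"""

model = semopy.Model(model_desc)
model.fit(data)         # maximum likelihood estimation
print(model.inspect())  # parameter estimates
```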

CONJOINT ANALYSIS

Conjoint analysis is a form of statistical analysis that firms use in market research
to understand how customers value different components or features of their products or
services. It’s based on the principle that any product can be broken down into a set of
attributes that ultimately impact users’ perceived value of an item or service.

Conjoint analysis is typically conducted via a specialized survey that asks consumers to rank the importance of the specific features in question. Analyzing the results allows the firm to then assign a value to each one.

The insights a company gleans from conjoint analysis of its product features can be leveraged in several ways. Most often, conjoint analysis impacts pricing strategy, sales and marketing efforts, and research and development plans.
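
As an illustration of how part-worth values can be estimated from ratings-based conjoint data (a sketch using dummy-coded attributes and ordinary least squares; the attributes, levels and ratings are invented):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Each row is a product profile with the rating a respondent gave it
profiles = pd.DataFrame({
    "brand":  ["A", "A", "B", "B", "A", "B"],
    "price":  ["low", "high", "low", "high", "high", "low"],
    "rating": [9, 6, 7, 3, 5, 8],
})

# Dummy-code the attribute levels (drop_first avoids redundant columns)
X = pd.get_dummies(profiles[["brand", "price"]], drop_first=True)
y = profiles["rating"]

model = LinearRegression().fit(X, y)
# Coefficients are the estimated part-worths relative to the dropped levels
for name, coef in zip(X.columns, model.coef_):
    print(f"{name}: {coef:+.2f}")
```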

Types of Conjoint Analysis

• Choice-Based Conjoint (CBC) Analysis: This is one of the most common forms of conjoint analysis and is used to identify how a respondent values combinations of features.
• Adaptive Conjoint Analysis (ACA): This form of analysis customizes each respondent's survey experience based on their answers to early questions. It’s often leveraged in studies where several features or attributes are being evaluated, to streamline the process and extract the most valuable insights from each respondent.
• Full-Profile Conjoint Analysis: This form of analysis presents the respondent with a series of full product descriptions and asks them to select the one they’d be most inclined to buy.
• MaxDiff Conjoint Analysis: This form of analysis presents multiple options to the respondent, which they’re asked to organize on a scale of “best” to “worst” (or “most likely to buy” to “least likely to buy”).
Application of Statistical Software for Data Analysis:

1) INTRODUCTION TO MICROSOFT EXCEL 2010


Microsoft Excel is an example of a program called a “spreadsheet.” Spreadsheets are
used to organize real world data, such as a check register or a rolodex. Data can be numerical or
alphanumeric (involving letters or numbers).

The key benefit to using a spreadsheet program is that you can make changes easily,
including correcting spelling or values, adding, deleting, formatting, and relocating data. You
can also program the spreadsheet to perform certain functions automatically (such as addition
and subtraction), and a spreadsheet can hold almost limitless amounts of data—a whole filing
cabinet’s worth of information can be included in a single spreadsheet.

2) SPSS
• SPSS means “Statistical Package for the Social Sciences” and was first launched in 1968. Since SPSS was acquired by IBM in 2009, it's officially known as IBM SPSS Statistics, but most users still just refer to it as “SPSS”.
• SPSS is software for editing and analyzing all sorts of data. These data may come from basically any source: scientific research, a customer database, Google Analytics or even the server log files of a website.

3) AMOS
• AMOS is statistical software; its name stands for Analysis of Moment Structures. AMOS is an added SPSS module, and is specially used for Structural Equation Modeling, path analysis, and confirmatory factor analysis.
• It is also known as analysis of covariance structures or causal modeling software. AMOS is a visual program for structural equation modeling (SEM). In AMOS, we can draw models graphically using simple drawing tools. AMOS quickly performs the computations for SEM and displays the results.

4) R-analysis
• R analytics is data analytics using the R programming language, an open-source language used for statistical computing and graphics. This programming language is often used in statistical analysis and data mining.
• It can be used for analytics to identify patterns and build practical models. R not only can help analyze organizations’ data, but can also be used to help in the creation and development of software applications that perform statistical analysis.
