Building a Robust Geodemographic Segmentation Model
Transforming Independent Variables
• Transformations re-express data on a different scale; the form of the
transformation determines how the scale of the untransformed variable is
affected.
• In modeling and statistical applications, transformations are often
used to improve the compatibility of the data with the assumptions
underlying a modeling process, to linearize a non-linear relationship
between two variables, or to modify the range of values of a variable.
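As a minimal sketch of the idea (assuming Python with numpy, and an invented income variable), a log transform compresses a long right tail and can linearize a relationship that is multiplicative on the original scale:

    import numpy as np

    # Hypothetical right-skewed variable, e.g. household income
    income = np.array([12_000.0, 25_000.0, 40_000.0, 90_000.0, 250_000.0])

    # The log transform compresses the upper tail, pulling extreme values
    # closer to the rest of the distribution
    log_income = np.log(income)

    print(log_income)  # values now span a much narrower range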
What is Multicollinearity?
• Multicollinearity occurs when independent variables in a regression
model are correlated.
• This correlation is a problem because independent variables should
be independent. If the degree of correlation between variables is
high enough, it can cause problems when you fit the model and
interpret the results.
• In other words, one independent variable can be predicted from another
independent variable in the regression model.
The Problem with having Multicollinearity
• Multicollinearity is a problem in a regression model because it
prevents us from distinguishing the individual effects of the
independent variables on the dependent variable.
• For example, consider the following linear equation:
Y = W0 + W1*X1 + W2*X2
• Coefficient W1 is the increase in Y for a unit increase in X1 while
keeping X2 constant. But if X1 and X2 are highly correlated, changes in
X1 would also be accompanied by changes in X2, and we would not be able
to isolate their individual effects on Y.
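A small simulation makes this concrete. The sketch below (Python with numpy; the data and coefficient values are invented for illustration) refits the regression on bootstrap resamples and shows that when X1 and X2 are nearly collinear, the individual coefficient estimates swing while their sum stays stable:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.05, size=n)  # x2 is nearly identical to x1
    y = 3 * x1 + 2 * x2 + rng.normal(size=n)  # true effects: W1 = 3, W2 = 2

    # Refit on bootstrap resamples: the individual coefficients vary widely,
    # but their sum (the combined effect) is estimated precisely
    for _ in range(3):
        idx = rng.integers(0, n, size=n)
        X = np.column_stack([np.ones(n), x1[idx], x2[idx]])
        w, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
        print(f"W1={w[1]:+.2f}  W2={w[2]:+.2f}  W1+W2={w[1]+w[2]:+.2f}")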
What causes Multicollinearity?
• Multicollinearity can be built into a dataset at the time of its
creation. This can result from poorly designed experiments, purely
observational data, or an inability to manipulate the data:
• For example, consider predicting the electricity consumption of a household
from the household income and the number of electrical appliances. Here, we
know that the number of electrical appliances in a household tends to increase
with household income; this correlation cannot be removed from the dataset
• Multicollinearity can also occur when new variables are created that
depend on other variables:
• For example, creating a BMI variable from the height and weight variables
would add redundant information to the model
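As an illustrative sketch (Python with pandas; the column names and values are hypothetical), deriving BMI from height and weight makes the redundancy visible directly in the correlation matrix:

    import pandas as pd

    # Hypothetical data; column names are illustrative
    df = pd.DataFrame({"height_m": [1.6, 1.7, 1.8, 1.9],
                       "weight_kg": [55.0, 70.0, 80.0, 95.0]})

    # BMI is a deterministic function of height and weight, so using all
    # three columns as regressors introduces redundant information
    df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

    print(df.corr().round(2))  # bmi correlates strongly with its inputs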
What causes Multicollinearity?
• Including effectively identical variables in the dataset:
• For example, including both temperature in Fahrenheit and
temperature in Celsius
• Improper use of dummy variables can also cause a multicollinearity
problem. This is called the dummy variable trap (see the sketch after
this list):
• For example, consider a dataset with a marital-status variable that has two
unique values: ‘married’ and ‘single’. Creating dummy variables for both of
them would include redundant information; a single variable containing 0/1
for ‘married’/’single’ status is enough.
• Insufficient data can, in some cases, also cause multicollinearity
problems
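A minimal sketch of the dummy variable trap and its fix, assuming Python with pandas (the data are invented for illustration):

    import pandas as pd

    status = pd.DataFrame(
        {"marital_status": ["married", "single", "single", "married"]})

    # Dummies for BOTH categories always sum to 1, so together they are
    # perfectly collinear with the intercept: the dummy variable trap
    both = pd.get_dummies(status["marital_status"])

    # drop_first=True keeps a single 0/1 column, which carries all the
    # information without the redundancy
    one = pd.get_dummies(status["marital_status"], drop_first=True)

    print(one)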
Detecting Multicollinearity using VIF
(Variance Inflation Factor)
• VIF measures the strength of the correlation between the independent
variables. It is calculated by taking each independent variable in turn
and regressing it against all of the other independent variables.
• The VIF score of an independent variable represents how well that
variable is explained by the other independent variables.
• The R^2 value of this auxiliary regression tells us how well an
independent variable is described by the other independent variables. A
high value of R^2 means that the variable is highly correlated with the
other variables. This is captured by the VIF, which is given by the
formula below:
• VIF = 1 / (1 - R^2)
• VIF starts at 1 and has no upper limit
• VIF = 1: no correlation between the independent variable and the
other variables
• VIF exceeding 5 or 10 indicates high multicollinearity between this
independent variable and the others
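As a sketch of the calculation (Python with pandas and statsmodels; the data and column names are invented for illustration), statsmodels provides a variance_inflation_factor helper that runs the auxiliary regressions described above:

    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    from statsmodels.tools.tools import add_constant

    # Hypothetical feature matrix
    X = pd.DataFrame({"income": [30, 45, 60, 80, 100, 120],
                      "appliances": [3, 5, 6, 8, 10, 11],
                      "household_size": [2, 3, 1, 4, 5, 3]})

    # Add an intercept so each auxiliary regression includes a constant,
    # then compute VIF for each feature (index 0 is the constant itself)
    X_const = add_constant(X)
    vif = pd.Series([variance_inflation_factor(X_const.values, i)
                     for i in range(1, X_const.shape[1])],
                    index=X.columns)

    print(vif.round(2))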
Multicollinearity
• What does it mean?
• A high degree of correlation amongst the explanatory variables
• What are its consequences?
• It may be difficult to separate out the effects of the individual regressors.
Standard errors may be inflated and t-values depressed.
• Note: a symptom may be high R^2 but low t-values
• How can you detect the problem?
• Examine the correlation matrix of the regressors; also carry out auxiliary
regressions amongst the regressors.
What is a Correlation Matrix?
• A correlation matrix is a table showing correlation coefficients
between variables. Each cell in the table shows the correlation
between two variables.
• A correlation matrix is used to summarize data, as an input into a
more advanced analysis, and as a diagnostic for advanced analyses.
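A minimal sketch of inspecting a correlation matrix in Python with pandas (the data are simulated for illustration):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    df = pd.DataFrame({"x1": rng.normal(size=100)})
    df["x2"] = 0.9 * df["x1"] + rng.normal(scale=0.3, size=100)  # correlated with x1
    df["x3"] = rng.normal(size=100)                              # independent

    # Pairwise Pearson correlations; entries near +/-1 flag potential
    # multicollinearity worth investigating further
    print(df.corr().round(2))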
What Problems Do Multicollinearity Cause?
• The coefficient estimates can swing wildly based on which other
independent variables are in the model. The coefficients become very
sensitive to small changes in the model.
• Multicollinearity reduces the precision of the estimated coefficients,
which weakens the statistical power of your regression model. You
might not be able to trust the p-values to identify independent
variables that are statistically significant.
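As a final illustration (Python with numpy and statsmodels; simulated data), comparing a model that includes a nearly duplicate regressor against one that omits it shows both effects: the standard error of the coefficient inflates, and its p-value may no longer indicate significance:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 200
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.05, size=n)  # nearly duplicates x1
    y = 2 * x1 + rng.normal(size=n)           # only x1 truly affects y

    # Fit with and without the collinear regressor
    full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    reduced = sm.OLS(y, sm.add_constant(x1)).fit()

    # With x2 included, x1's standard error balloons and its p-value can
    # suggest (wrongly) that x1 is not significant
    print("x1 in full model   : se =", round(full.bse[1], 3),
          " p =", round(full.pvalues[1], 4))
    print("x1 in reduced model: se =", round(reduced.bse[1], 3),
          " p =", round(reduced.pvalues[1], 4))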