Acknowledgments
We would like to express our deepest gratitude to Mr. Nguyen Phuc Son, Lecturer of the
Department of Data Analysis in Economics, for his enthusiastic guidance and support
throughout the process of conducting this data analysis report. As economics students
with limited experience and knowledge in data analysis, we were able to approach and
complete this report to the best of our ability, thanks to his dedicated mentorship and
insightful feedback.
This report has provided us with an opportunity to explore data analysis in a more
multidimensional way, expanding our understanding. Although the results and arguments
presented here may contain subjective interpretations or inaccuracies, we hope you will
recognize the effort and dedication our group has put into working with the data and
compiling this report.
Finally, we sincerely thank you once again for providing us with the essential data and
detailed instructions. These valuable resources have offered us a practical perspective,
allowing us to apply analytical models in meaningful ways. As this is our first time
conducting such a report, we understand there may be mistakes, and we would welcome your
constructive feedback to help us build a solid foundation for future reports.
Abstract
This study analyzes the factors affecting Adidas' Operating Profit in the US market by
applying simple and multiple regression models. The main objective is to identify the key
factors that affect profitability in order to support strategic decisions. The study first
uses simple linear regression to evaluate the individual impact of each variable, then
applies a multiple regression model to examine the combined impact of these factors on
Operating Profit. The data are filtered to focus on the three most profitable retailers,
whose performance is analyzed in detail to support appropriate strategic recommendations.
The results indicate which factors, such as product pricing, number of units sold, region,
or sales method, have the greatest impact on Operating Profit. Based on these results, the
study provides strategic recommendations to optimize pricing, marketing efforts, and
regional or method-specific sales strategies, thereby improving operating profits for
Adidas' retail business in the US.
I. Introduction
1.1. Research objectives
The main objective of this research is to analyze the Adidas US sales dataset for the three
retailers with the highest Operating Profit among the six retailers in the dataset, between
2020 and 2021. The research also investigates the factors that influence the Operating
Profit of these three retailers and identifies which variable has the strongest impact.
Based on these findings, we make recommendations to help the company make strategic
decisions, such as adjusting prices, focusing on high-profit regions, or optimizing sales
methods, to boost overall revenue and profits.
1.2. Research questions
a. Are there any differences in Operating Profits of 3 chosen retailers?
b. Do the Sales Method, Region, and number of units sold by the 3 chosen retailers have a
significant impact on their Operating Profit?
c. Does the effect of selling price on Operating Profit differ across regions? Some regions
may be more price-sensitive, and this difference may be a factor in adjusting pricing
strategies for each specific region.
d. Do different sales methods lead to different profit outcomes for each product category?
Some products may be more profitable when sold online, while others may be more
profitable when sold offline.
e. Which factors have a significant impact on profitability?
f. Can High Profit (an above-average Operating Profit) be predicted from Sales Method and
Units Sold, and what are the corresponding odds ratios?
1.3. Significance of research
This study shows how Stata can be used to run different scenarios that simulate potential
impacts on Operating Profit and how to interpret Stata output to make informed decisions,
thereby translating the analysis into actionable recommendations for the business.
II. Data analysis and Results
1) Analyze given data
To achieve the research objective, we filtered the available data and selected the
appropriate variables: three Retailers (Foot Locker, Sports Direct and West Gear), along
with Region, Product, Units Sold, Total Sales, Operating Profit and Sales Method. We chose
these three Retailers because, after using Excel to calculate the total Operating Profit for
each product across all Retailers, we found that they have the highest Operating Profit.
Focusing on them makes it easier to identify the factors behind profitability and to make
recommendations that can help Adidas increase it.
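Total Operating Profit by product and retailer
Product                   |     Amazon | Foot Locker |     Kohl's | Sports Direct |    Walmart |  West Gear | Grand Total
--------------------------+------------+-------------+------------+---------------+------------+------------+------------
Men's Apparel             |  3,331,444 |   9,942,405 |  5,945,043 |     8,723,915 |  3,166,591 | 13,653,633 |  44,763,030
Men's Athletic Footwear   |  4,518,030 |  12,409,221 |  5,725,763 |    11,935,673 |  4,029,258 | 13,228,944 |  51,846,888
Men's Street Footwear     |  8,707,658 |  23,060,809 |  9,219,820 |    15,837,750 |  5,438,666 | 20,537,557 |  82,802,261
Women's Apparel           |  6,280,072 |  17,192,901 |  5,596,173 |    17,832,961 |  6,348,451 | 15,400,413 |  68,650,971
Women's Athletic Footwear |  2,701,608 |   8,477,314 |  4,570,693 |     9,688,746 |  3,239,052 | 10,298,372 |  38,975,785
Women's Street Footwear   |  3,279,692 |   9,639,474 |  5,753,761 |    10,313,910 |  3,560,034 | 12,548,955 |  45,095,827
--------------------------+------------+-------------+------------+---------------+------------+------------+------------
Grand Total               | 28,818,503 |  80,722,125 | 36,811,253 |    74,332,955 | 25,782,053 | 85,667,873 | 332,134,761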
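To make the selection rule explicit, the short Python sketch below ranks the six retailers by the Grand Total row of the table and keeps the top three. It is an illustration only (the report did this step in Excel), and the variable names are our own. It also computes the share of all Operating Profit that the three selected retailers account for.

```python
# Grand totals of Operating Profit per retailer, copied from the table above
grand_totals = {
    "Amazon": 28_818_503,
    "Foot Locker": 80_722_125,
    "Kohl's": 36_811_253,
    "Sports Direct": 74_332_955,
    "Walmart": 25_782_053,
    "West Gear": 85_667_873,
}

# Sort retailers from highest to lowest total Operating Profit and keep the top three
top_three = sorted(grand_totals, key=grand_totals.get, reverse=True)[:3]
print(top_three)  # ['West Gear', 'Foot Locker', 'Sports Direct']

# Share of all Operating Profit captured by the three selected retailers
share = sum(grand_totals[r] for r in top_three) / sum(grand_totals.values())
print(f"{share:.1%}")
```

The three retailers together account for roughly 72% of total Operating Profit. Their combined total (about 240.7 million) also matches the mean Operating Profit multiplied by the 7,043 observations in the filtered data (34,179.036 × 7,043), which is a useful consistency check on the filtering.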
In addition, we created a new column, High Profit, based on the Operating Profit column. If
an observation's Operating Profit is greater than the mean Operating Profit, we classify it
as High Profit and assign a value of 1; otherwise, we assign a value of 0.
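A minimal sketch of the High Profit rule, written in Python for illustration. The report does not show the command it used; in Stata, one possible approach is `summarize OperatingProfit` followed by `generate HighProfit = OperatingProfit > r(mean)`. The function name `high_profit_flags` and the sample values below are hypothetical.

```python
from statistics import mean


def high_profit_flags(profits):
    """Return 1 for values strictly above the mean Operating Profit, else 0."""
    threshold = mean(profits)
    return [1 if p > threshold else 0 for p in profits]


# Hypothetical Operating Profit values (mean = 40,000), for illustration only
sample = [10_000, 20_000, 30_000, 60_000, 80_000]
print(high_profit_flags(sample))  # [0, 0, 0, 1, 1]
```

In the final dataset the mean of HighProfit is .302 (see the `sum` output), so about 30% of transactions lie above the mean Operating Profit, which is consistent with a right-skewed distribution where a few large transactions pull the mean up.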
This is our final dataset: Adidas data
a. Are there any differences in Operating Profits of 3 chosen retailers?
. import excel "C:\Users\DELL\Downloads\Adidas-US-Sales-Datasets-1.xlsx", sheet("Lọc") firstrow
(14 vars, 7,043 obs)
. oneway OperatingProfit Retailer, tabulate
| Summary of Operating Profit
Retailer | Mean Std. dev. Freq.
-------------+------------------------------------
Foot Locker | 30611.348 51194.485 2,637
Sports Dir.. | 36581.179 58018.483 2,032
West Gear | 36085.877 56359.735 2,374
-------------+------------------------------------
Total | 34179.036 55044.884 7,043
Analysis of variance
Source SS df MS F Prob > F
------------------------------------------------------------------------
Between groups 5.3922e+10 2 2.6961e+10 8.92 0.0001
Within groups 2.1283e+13 7040 3.0231e+09
------------------------------------------------------------------------
Total 2.1337e+13 7042 3.0299e+09
Bartlett's equal-variances test: chi2(2) = 40.7412 Prob>chi2 = 0.000
Ho: μ1 = μ2 = μ3
Hα: Not all means are equal
where: μ1 = mean Operating Profit of Foot Locker
μ2 = mean Operating Profit of Sports Direct
μ3 = mean Operating Profit of West Gear
p-value = 0.0001 < 0.05
We reject Ho.
We conclude that the mean Operating Profit is not the same across the 3 chosen retailers.
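The F statistic in the ANOVA table can be reproduced from the group means and frequencies in the `oneway` output. The Python sketch below is for illustration only (the variable names are ours): it computes the between-group sum of squares as the sum of n_i × (group mean − grand mean)², takes the within-group sum of squares from the ANOVA table, and divides the two mean squares.

```python
# Group summaries from the oneway output: (mean Operating Profit, frequency)
groups = {
    "Foot Locker": (30611.348, 2637),
    "Sports Direct": (36581.179, 2032),
    "West Gear": (36085.877, 2374),
}
ss_within = 2.1283e13  # "Within groups" SS from the ANOVA table

n = sum(freq for _, freq in groups.values())
k = len(groups)

# Grand mean as the frequency-weighted average of the group means
grand_mean = sum(m * f for m, f in groups.values()) / n

# Between-group sum of squares: sum of n_i * (mean_i - grand_mean)^2
ss_between = sum(f * (m - grand_mean) ** 2 for m, f in groups.values())

ms_between = ss_between / (k - 1)  # df = k - 1 = 2
ms_within = ss_within / (n - k)    # df = n - k = 7040
f_stat = ms_between / ms_within
print(f"SS between = {ss_between:.4e}, F({k - 1}, {n - k}) = {f_stat:.2f}")
```

This gives a between-group SS of about 5.3922e+10 and F(2, 7040) ≈ 8.92, matching the Stata table.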
Next, we check the assumptions of the Analysis of Variance.
. histogram OperatingProfit, discrete
As the histogram shows, the response variable (Operating Profit) is not normally
distributed. In addition, Bartlett's test in the output above (chi2(2) = 40.7412,
Prob>chi2 = 0.000) rejects the assumption of equal variances across the three retailers.
Therefore, we use the Kruskal–Wallis test instead, which does not assume normally
distributed data.
. kwallis OperatingProfit, by(Retailer)
Kruskal–Wallis equality-of-populations rank test
+----------------------------------+
| Retailer | Obs | Rank sum |
|---------------+-------+----------|
| Foot Locker | 2,637 | 8.84e+06 |
| Sports Direct | 2,032 | 7.34e+06 |
| West Gear | 2,374 | 8.63e+06 |
+----------------------------------+
chi2(2) = 29.504
Prob = 0.0001
chi2(2) with ties = 29.504
Prob = 0.0001
Ho: All populations are identical
Hα: Not all populations are identical
p-value = 0.0001 < 0.05
We reject Ho.
We conclude that the distribution of Operating Profit is not the same across the 3 chosen
retailers.
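The Kruskal–Wallis test replaces each observation with its rank in the pooled sample and compares the groups' rank sums. Because the rank sums in the output above are rounded to three significant figures, the Python sketch below does not try to reproduce chi2(2) = 29.504. Instead it implements the textbook formula H = 12 / (N(N+1)) × Σ R_i² / n_i − 3(N+1), with average ranks for ties and the usual tie correction, and checks it on small hypothetical samples. The function name is our own.

```python
def kruskal_wallis_h(*samples):
    """Kruskal–Wallis H statistic, using average ranks for ties and the standard tie correction."""
    # Pool all observations, remembering which sample each one came from
    pooled = sorted((value, g) for g, sample in enumerate(samples) for value in sample)
    n = len(pooled)
    rank_sums = [0.0] * len(samples)
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values pooled[i..j]
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for _, g in pooled[i:j + 1]:
            rank_sums[g] += avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(s) for r, s in zip(rank_sums, samples)) - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N)
    return h / (1 - tie_term / (n ** 3 - n))


print(round(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]), 4))  # 7.2
```

Under Ho, H approximately follows a chi-squared distribution with (number of groups − 1) degrees of freedom. In the Stata output above, the statistics with and without the tie correction are identical (29.504), so ties have no visible effect on the result here.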
b. Do the Sales Method, Region, and number of units sold by the 3 chosen retailers have a
significant impact on their Operating Profit?
. describe
Contains data
Observations: 7,043
Variables: 14
----------------------------------------------------------------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
----------------------------------------------------------------------------------------------------------------------------------------
Retailer str13 %13s Retailer
RetailerID long %10.0g Retailer ID
InvoiceDate int %td.. Invoice Date
Region str9 %9s Region
State str14 %14s State
City str14 %14s City
Product str25 %25s Product
PriceperUnit double %14.2f Price per Unit
UnitsSold int %10.0gc Units Sold
TotalSales double %10.0g Total Sales
OperatingProfit double %10.0g Operating Profit
OperatingMargin double %4.2f Operating Margin
SalesMethod str8 %9s Sales Method
HighProfit byte %10.0g High Profit
----------------------------------------------------------------------------------------------------------------------------------------
Sorted by:
Note: Dataset has changed since last saved.
. sum
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
Retailer | 0
RetailerID | 7,043 1171788 27355.06 1128299 1197831
InvoiceDate | 7,043 22403.05 175.3224 21915 22645
Region | 0
State | 0
-------------+---------------------------------------------------------
City | 0
Product | 0
PriceperUnit | 7,043 44.65327 15.00465 7 110
UnitsSold | 7,043 253.7656 215.3128 0 1275
TotalSales | 7,043 91655.55 142157.9 0 825000
-------------+---------------------------------------------------------
OperatingP~t | 7,043 34179.04 55044.88 0 382500
OperatingM~n | 7,043 .4256851 .0982308 .1 .8
SalesMethod | 0
HighProfit | 7,043 .302144 .4592199 0 1
We convert the two categorical variables, Sales Method and Region, into dummy variables.
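`tabulate SalesMethod, generate(SalesMethod)` creates one 0/1 indicator per category, numbered in the sorted order of the categories (In-store, Online, Outlet), which is why the dummies are renamed below. A Python sketch of the same idea, using a hypothetical helper `make_dummies` and made-up sample values:

```python
def make_dummies(values):
    """One 0/1 indicator per category, in sorted category order."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


methods = ["Online", "In-store", "Outlet", "Online"]  # hypothetical rows
dummies = make_dummies(methods)
print(dummies)
# {'In-store': [0, 1, 0, 0], 'Online': [1, 0, 0, 1], 'Outlet': [0, 0, 1, 0]}
```

Because the indicators for one variable always sum to 1 in each row, they cannot all be included alongside a constant in the same regression (perfect collinearity, the "dummy variable trap"); one category has to be left out as the reference group. The simple regressions below use one dummy at a time, so each coefficient compares that category with all other categories pooled.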
. tabulate SalesMethod, generate (SalesMethod)
Sales |
Method | Freq. Percent Cum.
------------+-----------------------------------
In-store | 1,441 20.46 20.46
Online | 3,485 49.48 69.94
Outlet | 2,117 30.06 100.00
------------+-----------------------------------
Total | 7,043 100.00
. tabulate Region, generate (Region)
Region | Freq. Percent Cum.
------------+-----------------------------------
Midwest | 1,492 21.18 21.18
Northeast | 1,599 22.70 43.89
South | 1,284 18.23 62.12
Southeast | 940 13.35 75.47
West | 1,728 24.53 100.00
------------+-----------------------------------
Total | 7,043 100.00
. rename SalesMethod1 Instore
. rename SalesMethod2 Online
. rename SalesMethod3 Outlet
. rename Region1 Midwest
. rename Region2 Northeast
. rename Region3 South
. rename Region4 Southeast
. rename Region5 West
. regress OperatingProfit UnitsSold
Source | SS df MS Number of obs = 7,043
-------------+---------------------------------- F(1, 7041) = 26537.10
Model | 1.6863e+13 1 1.6863e+13 Prob > F = 0.0000
Residual | 4.4741e+12 7,041 635438912 R-squared = 0.7903
-------------+---------------------------------- Adj R-squared = 0.7903
Total | 2.1337e+13 7,042 3.0299e+09 Root MSE = 25208
------------------------------------------------------------------------------
OperatingP~t | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
UnitsSold | 227.272 1.395144 162.90 0.000 224.5371 230.0069
_cons | -23494.77 464.2917 -50.60 0.000 -24404.92 -22584.62
------------------------------------------------------------------------------
Simple linear equation: OperatingProfit = -23,494.77 + 227.272 × UnitsSold
Ho: β1 = 0
Hα: β1 not equal to 0
Based on the information above, F(1, 7041) = 26537.10 and Prob > F = 0.0000. Because the
p-value < 0.05, we reject Ho and conclude that there is a significant relationship between
UnitsSold and OperatingProfit.
R-squared = 79.03% and adjusted R-squared = 79.03%: Units Sold alone explains about 79% of
the variation in Operating Profit, so the model fits the data well.
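Several numbers in this output can be checked by hand. With a single regressor, t = coefficient / standard error and F(1, n − 2) = t². The OLS line also always passes through the point of means, so the prediction at the mean of UnitsSold (253.7656, from the `sum` output) should equal the mean Operating Profit (34,179.04). A short Python check using the reported estimates (variable names are ours):

```python
b0 = -23494.77                 # constant
b1, se_b1 = 227.272, 1.395144  # UnitsSold coefficient and its standard error

t_stat = b1 / se_b1            # t = coefficient / standard error
f_stat = t_stat ** 2           # with one regressor, F(1, n - 2) = t^2

mean_units = 253.7656          # sample mean of UnitsSold
predicted_at_mean = b0 + b1 * mean_units
print(f"t = {t_stat:.2f}, F = {f_stat:.1f}, prediction at mean units = {predicted_at_mean:,.2f}")
```

The slope means that each additional unit sold is associated with about 227.27 more Operating Profit on average. The negative intercept (-23,494.77) is an extrapolation to zero units and has no direct business meaning, since observed Operating Profit is never below 0.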
. predict residual, residuals
. ttest residual ==0
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
residual | 7,043 -4.86e-06 300.3497 25206.12 -588.7758 588.7758
------------------------------------------------------------------------------
mean = mean(residual) t = -0.0000
H0: mean = 0 Degrees of freedom = 7042
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
Pr(T < t) = 0.5000 Pr(|T| > |t|) = 1.0000 Pr(T > t) = 0.5000
Ho: Mean = 0
Hα: Mean not equal to 0
Based on the information above, the residual mean (-4.86e-06) is essentially 0 and
Pr(|T| > |t|) = 1.0000, so we cannot reject Ho. This is expected: OLS residuals from a
model with an intercept always have a mean of 0, so this check confirms the zero-mean
condition but does not by itself show that the model is suitable.
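Why the t-test on the residual mean always returns p ≈ 1: ordinary least squares with an intercept forces the residuals to sum to zero, whatever the true shape of the data. The Python sketch below fits a straight line to hypothetical, clearly non-linear data (`ols_fit` is our own helper) and shows that the residual mean is still zero up to rounding error.

```python
from statistics import mean


def ols_fit(x, y):
    """Simple OLS with an intercept; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data with a strongly curved, skewed pattern
x = [1, 2, 3, 4, 5, 6]
y = [1, 1, 2, 4, 9, 30]
b0, b1 = ols_fit(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(abs(mean(residuals)) < 1e-9)  # True: residuals average zero by construction
```

The straight line is a poor description of these data, yet the residual mean is still zero, so normality and constant-variance checks are the informative diagnostics.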
. sktest residual
Skewness and kurtosis tests for normality
----- Joint test -----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
-------------+-----------------------------------------------------------------
residual | 7,043 0.0000 0.0000 . .
Ho: Residuals follow normal distribution
Hα: Residuals do not follow normal distribution
Based on the information above, Pr(skewness) and Pr(kurtosis) both have p-values < 0.05
(Stata does not report the joint statistic here). We reject Ho: the residuals do not
follow a normal distribution.
. swilk residual
Shapiro–Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+------------------------------------------------------
residual | 7,043 0.88172 433.948 16.099 0.00000
Note: The normal approximation to the sampling distribution of W'
is valid for 4<=n<=2000.
Ho: Residuals follow normal distribution
Hα: Residuals do not follow normal distribution
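Based on the information above, p-value < 0.05, so we reject Ho: the residuals do not
follow a normal distribution. Stata notes that the normal approximation is valid only for
4 <= n <= 2000; with n = 7,043 this p-value is approximate, but it agrees with the sktest
result.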
. estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of OperatingProfit
H0: Constant variance
chi2(1) = 7650.05
Prob > chi2 = 0.0000
Based on the information above, we can see that p-value < 0.05. We reject Ho. The
variance is not homogeneous.
. imtest
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 1816.11 2 0.0000
Skewness | 376.41 1 0.0000
Kurtosis |-3152147.49 1 1.0000
---------------------+----------------------------
Total |-3149954.97 4 1.0000
--------------------------------------------------
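Based on the information above, p-value (Heteroskedasticity) < 0.05, so the variance of
the residuals is not constant. Because the normality and constant-variance assumptions are
violated, the usual OLS standard errors and p-values are not reliable, and we cannot firmly
conclude from this model that Units Sold has a significant impact on Operating Profit.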
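Heteroskedasticity does not bias the OLS coefficients, but it makes the classical standard errors unreliable. A common remedy is heteroskedasticity-robust standard errors; in Stata these are requested with the `vce(robust)` option of `regress`, which applies the HC1 small-sample scaling n/(n − k). The Python sketch below is for illustration only (the helper name and the data are hypothetical): it computes both the classical and the HC1 standard error of the slope in a simple regression.

```python
from math import sqrt
from statistics import mean


def slope_with_standard_errors(x, y):
    """OLS slope with classical and HC1 (heteroskedasticity-robust) standard errors."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical SE assumes a single common error variance s^2
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    classical_se = sqrt(s2 / sxx)
    # HC0 sandwich: each squared residual weighted by its own (x - x_bar)^2
    hc0 = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
    # HC1 rescales HC0 by n / (n - k), with k = 2 estimated coefficients
    robust_se = sqrt(hc0 * n / (n - 2))
    return slope, classical_se, robust_se


slope, classical_se, robust_se = slope_with_standard_errors([1, 2, 3, 4], [2, 4, 5, 9])
print(round(slope, 3), round(classical_se, 3), round(robust_se, 3))  # 2.2 0.424 0.359
```

The coefficient is the same under both approaches; only the standard error (and therefore the t statistic and p-value) changes, and the robust value can be larger or smaller than the classical one.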
. regress OperatingProfit Instore
Source | SS df MS Number of obs = 7,043
-------------+---------------------------------- F(1, 7041) = 971.46
Model | 2.5870e+12 1 2.5870e+12 Prob > F = 0.0000
Residual | 1.8750e+13 7,041 2.6630e+09 R-squared = 0.1212
-------------+---------------------------------- Adj R-squared = 0.1211
Total | 2.1337e+13 7,042 3.0299e+09 Root MSE = 51604
------------------------------------------------------------------------------
OperatingP~t | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
Instore | 47508.42 1524.254 31.17 0.000 44520.42 50496.42
_cons | 24458.8 689.4621 35.48 0.000 23107.25 25810.35
------------------------------------------------------------------------------
Linear regression equation: OperatingProfit = 24,458.8 + 47,508.42 × Instore
Ho: β1 = 0
Hα: β1 not equal to 0
Based on the information above, F(1, 7041) = 971.46 and Prob > F = 0.0000. Because the
p-value < 0.05, we reject Ho: there is a significant relationship between Instore and
OperatingProfit.
R-squared = 12.12% and adjusted R-squared = 12.11%, so the model does not provide a good
fit to the data.
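When the only regressor is a 0/1 dummy, OLS reproduces the two group means: the constant is the mean of the group coded 0, and the coefficient is the difference between the two group means. The Python sketch below verifies this on hypothetical numbers and then applies it to the Instore estimates above (variable names are ours).

```python
from statistics import mean

# Hypothetical profits for transactions with Instore = 1 and Instore = 0
instore = [70_000, 80_000, 66_000]
other = [20_000, 25_000, 30_000, 21_000]

x = [1] * len(instore) + [0] * len(other)
y = instore + other
x_bar, y_bar = mean(x), mean(y)
slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
intercept = y_bar - slope * x_bar
print(round(intercept, 6), mean(other))                # constant = mean of the 0 group
print(round(slope, 6), mean(instore) - mean(other))    # coefficient = difference in means

# Applied to the reported Instore regression
b0_doc, b1_doc = 24458.8, 47508.42
print(f"Implied mean profit: in-store {b0_doc + b1_doc:,.2f}, other methods {b0_doc:,.2f}")
```

So in-store transactions average about 71,967.22 in Operating Profit, compared with 24,458.80 for Online and Outlet transactions combined.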
. predict residual1, residuals
. ttest residual1 ==0
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
residu~1 | 7,043 .0001841 614.8543 51600.16 -1205.299 1205.3
------------------------------------------------------------------------------
mean = mean(residual1) t = 0.0000
H0: mean = 0 Degrees of freedom = 7042
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
Pr(T < t) = 0.5000 Pr(|T| > |t|) = 1.0000 Pr(T > t) = 0.5000
Ho: Mean = 0
Hα: Mean not equal to 0
Based on the information above, the residual mean (0.0001841) is essentially 0 and
Pr(|T| > |t|) = 1.0000, so we cannot reject Ho. This is expected: OLS residuals from a
model with an intercept always have a mean of 0, so this check confirms the zero-mean
condition but does not by itself show that the model is suitable.
. sktest residual1
Skewness and kurtosis tests for normality
----- Joint test -----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
-------------+-----------------------------------------------------------------
residual1 | 7,043 0.0000 0.0000 . .
Ho: Residuals follow normal distribution
Hα: Residuals do not follow normal distribution
Based on the information above, Pr(skewness) and Pr(kurtosis) both have p-values < 0.05
(Stata does not report the joint statistic here). We reject Ho: the residuals do not
follow a normal distribution.
. swilk residual1
Shapiro–Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+------------------------------------------------------
residual1 | 7,043 0.65223 1275.929 18.958 0.00000
Note: The normal approximation to the sampling distribution of W'
is valid for 4<=n<=2000.
Ho: Residuals follow normal distribution
Hα: Residuals do not follow normal distribution
Based on the information above, we can see that p-value < 0.05. We reject Ho.
Residuals do not follow normal distribution.
. estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of OperatingProfit
H0: Constant variance
chi2(1) = 45.42
Prob > chi2 = 0.0000
Based on the information above, we can see that p-value < 0.05. We reject Ho. The
variance is not homogeneous.
. imtest
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 9.11 1 0.0025
Skewness | 247.92 1 0.0000
Kurtosis | -4.79e+07 1 1.0000
---------------------+----------------------------
Total | -4.79e+07 3 1.0000
--------------------------------------------------
Based on the information above, we can see that p-value (Heteroskedasticity) <
0.05. The variance is not homogeneous.
Because these assumptions are violated, we cannot firmly conclude from this model that
the In-store sales method has a significant impact on Operating Profit.
. regress OperatingProfit Online
Source | SS df MS Number of obs = 7,043
-------------+---------------------------------- F(1, 7041) = 530.40
Model | 1.4947e+12 1 1.4947e+12 Prob > F = 0.0000
Residual | 1.9842e+13 7,041 2.8181e+09 R-squared = 0.0701
-------------+---------------------------------- Adj R-squared = 0.0699
Total | 2.1337e+13 7,042 3.0299e+09 Root MSE = 53086
------------------------------------------------------------------------------
OperatingP~t | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
Online | -29137.56 1265.177 -23.03 0.000 -31617.68 -26657.43
_cons | 48596.81 889.967 54.61 0.000 46852.21 50341.41
------------------------------------------------------------------------------
Simple linear regression equation: OperatingProfit = 48,596.81 - 29,137.56 × Online
Ho: β1 = 0
Hα: β1 not equal to 0
Based on the information above, F(1, 7041) = 530.40 and Prob > F = 0.0000. Because the
p-value < 0.05, we reject Ho: there is a significant relationship between Online and
OperatingProfit.
R-squared = 7.01% and adjusted R-squared = 6.99%. These results do not provide a good fit
to the data.
. predict residual2, residuals
. ttest residual2 ==0
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
residu~2 | 7,043 9.51e-06 632.5097 53081.85 -1239.909 1239.909
------------------------------------------------------------------------------
mean = mean(residual2) t = 0.0000
H0: mean = 0 Degrees of freedom = 7042
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
Pr(T < t) = 0.5000 Pr(|T| > |t|) = 1.0000 Pr(T > t) = 0.5000
Ho: Mean = 0
Hα: Mean not equal to 0
Based on the information above, the residual mean (9.51e-06) is essentially 0 and
Pr(|T| > |t|) = 1.0000, so we cannot reject Ho. This is expected: OLS residuals from a
model with an intercept always have a mean of 0, so this check confirms the zero-mean
condition but does not by itself show that the model is suitable.
. sktest residual2
Skewness and kurtosis tests for normality
----- Joint test -----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
-------------+-----------------------------------------------------------------
residual2 | 7,043 0.0000 0.0000 . .
Ho: Residuals follow normal distribution
Hα: Residuals do not follow normal distribution
Based on the information above, Pr(skewness) and Pr(kurtosis) both have p-values < 0.05
(Stata does not report the joint statistic here). We reject Ho: the residuals do not
follow a normal distribution.
. swilk residual2
Shapiro–Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+------------------------------------------------------
residual2 | 7,043 0.73269 980.737 18.261 0.00000
Note: The normal approximation to the sampling distribution of W'
is valid for 4<=n<=2000.
Ho: Residuals follow normal distribution
Hα: Residuals do not follow normal distribution
Based on the information above, we can see that p-value < 0.05. We reject Ho. Residuals
do not follow a normal distribution.
. estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of OperatingProfit
H0: Constant variance
chi2(1) = 270.07
Prob > chi2 = 0.0000
Based on the information above, we can see that p-value < 0.05. We reject Ho. The
variance is not homogeneous.
. imtest
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 61.35 1 0.0000
Skewness | 255.98 1 0.0000
Kurtosis | -7.46e+07 1 1.0000
---------------------+----------------------------
Total | -7.46e+07 3 1.0000
--------------------------------------------------
Based on the information above, we can see that p-value (Heteroskedasticity) <
0.05. The variance is not homogeneous.
This linear regression model cannot be used to analyze or make predictions for the
existing data because it does not satisfy these assumptions.
. regress OperatingProfit Outlet
Source | SS df MS Number of obs = 7,043
-------------+---------------------------------- F(1, 7041) = 2.22
Model | 6.7198e+09 1 6.7198e+09 Prob > F = 0.1364
Residual | 2.1330e+13 7,041 3.0294e+09 R-squared = 0.0003
-------------+---------------------------------- Adj R-squared = 0.0002
Total | 2.1337e+13 7,042 3.0299e+09 Root MSE = 55040
------------------------------------------------------------------------------
OperatingP~t | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
Outlet | -2130.344 1430.377 -1.49 0.136 -4934.314 673.6262
_cons | 34819.38 784.2097 44.40 0.000 33282.09 36356.67
------------------------------------------------------------------------------
Simple linear regression: OperatingProfit = 34,819.38 - 2,130.344 × Outlet
Ho: β1 = 0
Hα: β1 not equal to 0
Based on the information above, F(1, 7041) = 2.22 and Prob > F = 0.1364. Because the
p-value > 0.05, we cannot reject Ho: there is no significant relationship between Outlet
and OperatingProfit.
R-squared = 0.03% and adjusted R-squared = 0.02%. These results do not provide a good fit
to the data.
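Since each of the three sales-method regressions compares one method with the other two pooled, their coefficients are not directly comparable. Converting each fit into the implied mean for its own method (constant + coefficient) makes the comparison easier, and the frequency-weighted average of the three means should recover the overall mean Operating Profit (34,179.04). A Python check using the reported estimates and the frequencies from `tabulate SalesMethod` (variable names are ours):

```python
# (constant, dummy coefficient, frequency) from the three single-dummy regressions
fits = {
    "In-store": (24458.8, 47508.42, 1441),
    "Online": (48596.81, -29137.56, 3485),
    "Outlet": (34819.38, -2130.344, 2117),
}

# Mean Operating Profit of each method = constant + coefficient
method_means = {m: b0 + b1 for m, (b0, b1, _) in fits.items()}

# Frequency-weighted average of the method means should equal the overall mean
n = sum(freq for _, _, freq in fits.values())
overall = sum(method_means[m] * fits[m][2] for m in fits) / n

print({m: round(v, 2) for m, v in method_means.items()})
print(round(overall, 2))
```

The implied means are about 71,967 (In-store), 19,459 (Online), and 32,689 (Outlet), and their weighted average is within rounding of 34,179.04. Note that the Outlet difference is not statistically significant (p = 0.1364).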
. predict residual3, residuals
. ttest residual3 ==0
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
residu~3 | 7,043 .0000917 655.7974 55036.22 -1285.56 1285.56
------------------------------------------------------------------------------
mean = mean(residual3) t = 0.0000
H0: mean = 0 Degrees of freedom = 7042
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
Pr(T < t) = 0.5000 Pr(|T| > |t|) = 1.0000 Pr(T > t) = 0.5000
. sktest residual3
Skewness and kurtosis tests for normality
----- Joint test -----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
-------------+-----------------------------------------------------------------
residual3 | 7,043 0.0000 0.0000 . .
. swilk residual3
Shapiro–Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+------------------------------------------------------
residual3 | 7,043 0.66544 1227.444 18.856 0.00000
Note: The normal approximation to the sampling distribution of W'
is valid for 4<=n<=2000.
. estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of OperatingProfit
H0: Constant variance
chi2(1) = 4.30
Prob > chi2 = 0.0381
. imtest
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 1.07 1 0.3019
Skewness | 293.50 1 0.0000
Kurtosis | -2.60e+07 1 1.0000
---------------------+----------------------------
Total | -2.60e+07 3 1.0000
--------------------------------------------------
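This linear regression model cannot be used for analysis or prediction: Outlet is not
significant (p = 0.1364), the residuals are not normally distributed, and the
Breusch–Pagan test indicates heteroskedasticity at the 5% level (p = 0.0381), although the
heteroskedasticity component of the IM-test does not (p = 0.3019).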
. regress OperatingProfit Midwest
Source | SS df MS Number of obs = 7,043
-------------+---------------------------------- F(1, 7041) = 52.06
Model | 1.5661e+11 1 1.5661e+11 Prob > F = 0.0000
Residual | 2.1180e+13 7,041 3.0081e+09 R-squared = 0.0073
-------------+---------------------------------- Adj R-squared = 0.0072
Total | 2.1337e+13 7,042 3.0299e+09 Root MSE = 54846
------------------------------------------------------------------------------
OperatingP~t | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
Midwest | -11540.24 1599.399 -7.22 0.000 -14675.55 -8404.937
_cons | 36623.74 736.1435 49.75 0.000 35180.68 38066.8
------------------------------------------------------------------------------
Simple linear equation: OperatingProfit = 36,623.74 - 11,540.24 × Midwest
Ho: β1 = 0
Hα: β1 not equal to 0
Based on the information above, F(1, 7041) = 52.06 and Prob > F = 0.0000. Because the
p-value < 0.05, we reject Ho: there is a significant relationship between Midwest and
OperatingProfit.
R-squared = 0.73% and adjusted R-squared = 0.72%. These results do not provide a good fit
to the data.
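R-squared and the F statistic are both built from the sums of squares in the regression table: R² = SS_model / SS_total and F = (SS_model / df_model) / (SS_residual / df_residual). The Python check below reproduces the Midwest values (variable names are ours). It also illustrates why a very small R² can still be highly significant: with 7,041 residual degrees of freedom, even a weak relationship produces a large F.

```python
# Sums of squares and degrees of freedom from the Midwest regression table
ss_model, ss_resid, ss_total = 1.5661e11, 2.1180e13, 2.1337e13
df_model, df_resid = 1, 7041

r_squared = ss_model / ss_total                           # share of variation explained
f_stat = (ss_model / df_model) / (ss_resid / df_resid)    # ratio of mean squares
print(f"R-squared = {r_squared:.4f}, F({df_model}, {df_resid}) = {f_stat:.2f}")
```

Statistical significance here says that the Midwest difference is unlikely to be zero, not that Region explains much of the variation in Operating Profit.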
. predict residual4, residuals
. ttest residual4 ==0
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
residu~4 | 7,043 -.0000306 653.4892 54842.5 -1281.036 1281.035
------------------------------------------------------------------------------
mean = mean(residual4) t = -0.0000
H0: mean = 0 Degrees of freedom = 7042
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
Pr(T < t) = 0.5000 Pr(|T| > |t|) = 1.0000 Pr(T > t) = 0.5000
Ho: Mean = 0
Hα: Mean not equal to 0
Based on the information above, the residual mean (-0.0000306) is essentially 0 and
Pr(|T| > |t|) = 1.0000, so we cannot reject Ho. This is expected: OLS residuals from a
model with an intercept always have a mean of 0, so this check confirms the zero-mean
condition but does not by itself show that the model is suitable.
. sktest residual4
Skewness and kurtosis tests for normality
----- Joint test -----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
-------------+-----------------------------------------------------------------
residual4 | 7,043 0.0000 0.0000 . .
Ho: Residuals follow normal distribution
Hα: Residuals do not follow normal distribution
Based on the information above, Pr(skewness) and Pr(kurtosis) both have p-values < 0.05
(Stata does not report the joint statistic here). We reject Ho: the residuals do not
follow a normal distribution.
. swilk residual4
Shapiro–Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+------------------------------------------------------
residual4 | 7,043 0.69111 1133.287 18.644 0.00000
Note: The normal approximation to the sampling distribution of W'
is valid for 4<=n<=2000.
Ho: Residuals follow normal distribution
Hα: Residuals do not follow normal distribution
Based on the information above, we can see that p-value < 0.05. We reject Ho.
Residuals do not follow normal distribution.
. estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of OperatingProfit
H0: Constant variance
chi2(1) = 291.43
Prob > chi2 = 0.0000
Based on the information above, we can see that p-value < 0.05. We reject Ho. The
variance is not homogeneous.
. imtest
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 73.82 1 0.0000
Skewness | 322.62 1 0.0000
Kurtosis | -4.11e+07 1 1.0000
---------------------+----------------------------
Total | -4.11e+07 3 1.0000
--------------------------------------------------
Based on the information above, we can see that p-value (Heteroskedasticity) <
0.05. The variance is not homogeneous.
Because these assumptions are violated, we cannot firmly conclude from this model that
there is a significant relationship between Midwest and Operating Profit.
. regress OperatingProfit Northeast
Source | SS df MS Number of obs = 7,043
-------------+---------------------------------- F(1, 7041) = 25.84
Model | 7.8028e+10 1 7.8028e+10 Prob > F = 0.0000
Residual | 2.1259e+13 7,041 3.0193e+09 R-squared = 0.0037
-------------+---------------------------------- Adj R-squared = 0.0035
Total | 2.1337e+13 7,042 3.0299e+09 Root MSE = 54948
------------------------------------------------------------------------------
OperatingP~t | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
Northeast | -7945.479 1562.96 -5.08 0.000 -11009.35 -4881.608
_cons | 35982.93 744.7203 48.32 0.000 34523.05 37442.81
------------------------------------------------------------------------------
Simple linear equation: OperatingProfit = 35,982.93 - 7,945.479 × Northeast
Ho: β1 = 0
Hα: β1 not equal to 0
Based on the information above, F(1, 7041) = 25.84 and Prob > F = 0.0000. Because the
p-value < 0.05, we reject Ho: there is a significant relationship between Northeast and
OperatingProfit.
R-squared = 0.37% and adjusted R-squared = 0.35%. These results do not provide a good fit
to the data.
. predict residual5, residuals
. ttest residual5 ==0
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
residu~5 | 7,043 -.0002708 654.7003 54944.14 -1283.41 1283.409
------------------------------------------------------------------------------
mean = mean(residual5) t = -0.0000
H0: mean = 0 Degrees of freedom = 7042
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
Pr(T < t) = 0.5000 Pr(|T| > |t|) = 1.0000 Pr(T > t) = 0.5000
Ho: Mean = 0
Hα: Mean not equal to 0
Based on the information above, the residual mean (-0.0002708) is essentially 0 and
Pr(|T| > |t|) = 1.0000, so we cannot reject Ho. This is expected: OLS residuals from a
model with an intercept always have a mean of 0, so this check confirms the zero-mean
condition but does not by itself show that the model is suitable.
. sktest residual5
Skewness and kurtosis tests for normality
----- Joint test -----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
-------------+-----------------------------------------------------------------
residual5 | 7,043 0.0000 0.0000 . .
Ho: Residuals follow normal distribution
Hα: Residuals do not follow normal distribution
Based on the information above, both Pr(skewness) and Pr(kurtosis) are below 0.05 (the
joint statistic is not reported). We reject Ho: the residuals do not follow a normal
distribution.
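sktest is built on the sample skewness and kurtosis of the residuals; a normal distribution has skewness 0 and kurtosis 3. The Python sketch below computes only these raw moments for two hypothetical sets of residuals; it does not reproduce Stata's test statistics or p-values.

```python
def skew_kurt(values):
    """Sample skewness and (non-excess) kurtosis based on central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Hypothetical residuals: one symmetric set, one with a long right tail
symmetric = [-2, -1, 0, 1, 2]
right_skewed = [-1, -1, -1, -1, 4]

skew_sym, kurt_sym = skew_kurt(symmetric)
skew_right, kurt_right = skew_kurt(right_skewed)
print(skew_sym, kurt_sym, skew_right, kurt_right)
```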
. swilk residual5
Shapiro–Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+------------------------------------------------------
residual5 | 7,043 0.67976 1174.904 18.740 0.00000
Note: The normal approximation to the sampling distribution of W'
is valid for 4<=n<=2000.
Ho: Residuals follow normal distribution
Hα: Residuals do not follow normal distribution
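Based on the information above, the p-value < 0.05, so we reject Ho: the residuals do not
follow a normal distribution. As Stata's note points out, the normal approximation for W
is only valid for n up to 2,000, so with 7,043 observations this result supports, rather
than replaces, the sktest result.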
. estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of OperatingProfit
H0: Constant variance
chi2(1) = 47.57
Prob > chi2 = 0.0000
Based on the information above, the p-value < 0.05, so we reject Ho: the variance of the
residuals is not constant (heteroskedasticity).
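With its default options, estat hettest regresses the scaled squared residuals on the fitted values and compares half of the explained sum of squares with a chi-squared(1) distribution (5% critical value 3.84). A minimal Python sketch of that textbook Breusch-Pagan/Cook-Weisberg statistic, assuming normal errors and using hypothetical fitted values and residuals:

```python
def ols_simple(x, y):
    """Intercept and slope of y = a + b*x fitted by ordinary least squares."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def bp_cook_weisberg(fitted, resid):
    """Breusch-Pagan/Cook-Weisberg LM statistic using the fitted values."""
    n = len(resid)
    sigma2 = sum(e * e for e in resid) / n        # ML estimate of the error variance
    g = [e * e / sigma2 for e in resid]           # scaled squared residuals
    a, b = ols_simple(fitted, g)                  # auxiliary regression on fitted values
    gbar = sum(g) / n
    ess = sum((a + b * f - gbar) ** 2 for f in fitted)  # explained sum of squares
    return ess / 2                                # approx. chi2(1) under H0

# Hypothetical inputs: same fitted values, two residual patterns
fitted = list(range(1, 21))
constant_spread = [(-1) ** i for i in range(20)]                  # equal spread
growing_spread = [(-1) ** i * f for i, f in enumerate(fitted)]    # spread grows with fit

stat_const = bp_cook_weisberg(fitted, constant_spread)
stat_grow = bp_cook_weisberg(fitted, growing_spread)
print(round(stat_const, 4), round(stat_grow, 2))
```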
. imtest
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 11.79 1 0.0006
Skewness | 286.32 1 0.0000
Kurtosis | -4.24e+07 1 1.0000
---------------------+----------------------------
Total | -4.24e+07 3 1.0000
--------------------------------------------------
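Based on the information above, the p-value for Heteroskedasticity (0.0006) is below
0.05, which confirms that the variance is not constant.
Because the residuals are neither normally distributed nor homoskedastic, we cannot
reliably conclude that Northeast has a significant impact on Operating Profit.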
. regress OperatingProfit South
Source | SS df MS Number of obs = 7,043
-------------+---------------------------------- F(1, 7041) = 2.34
Model | 7.0836e+09 1 7.0836e+09 Prob > F = 0.1263
Residual | 2.1330e+13 7,041 3.0294e+09 R-squared = 0.0003
-------------+---------------------------------- Adj R-squared = 0.0002
Total | 2.1337e+13 7,042 3.0299e+09 Root MSE = 55040
------------------------------------------------------------------------------
OperatingP~t | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
South | 2597.458 1698.629 1.53 0.126 -732.3659 5927.282
_cons | 33705.5 725.2741 46.47 0.000 32283.74 35127.25
------------------------------------------------------------------------------
Linear regression equation: OperatingProfit = 33705.5 + 2597.458*South
Ho: β1 = 0
Hα: β1 not equal to 0
Based on the information above, F(1, 7041) = 2.34 and Prob > F = 0.1263. Because the
p-value > 0.05, we cannot reject Ho: there is no statistically significant relationship
between South and OperatingProfit.
R-squared = 0.03% and adjusted R-squared = 0.02%, so the model fits the data poorly.
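A coefficient is significant at the 5% level exactly when its 95% confidence interval excludes zero. The Python sketch below rebuilds the South interval from the printed coefficient and standard error; the critical value 1.9603 is an assumed approximation of the t quantile for 7,041 degrees of freedom.

```python
coef, se = 2597.458, 1698.629  # South coefficient and standard error
t_crit = 1.9603                # approx. 97.5th percentile of t with 7,041 df (assumed)

lower = coef - t_crit * se
upper = coef + t_crit * se
contains_zero = lower < 0 < upper  # True means: cannot reject Ho at 5%
print(f"95% CI: [{lower:.2f}, {upper:.2f}], contains zero: {contains_zero}")
```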
. predict residual6, residuals
. ttest residual6 ==0
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
residu~6 | 7,043 -.0001517 655.7918 55035.75 -1285.55 1285.549
------------------------------------------------------------------------------
mean = mean(residual6) t = -0.0000
H0: mean = 0 Degrees of freedom = 7042
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
Pr(T < t) = 0.5000 Pr(|T| > |t|) = 1.0000 Pr(T > t) = 0.5000
. sktest residual6
Skewness and kurtosis tests for normality
----- Joint test -----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
-------------+-----------------------------------------------------------------
residual6 | 7,043 0.0000 0.0000 . .
. estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of OperatingProfit
H0: Constant variance
chi2(1) = 45.65
Prob > chi2 = 0.0000
. imtest
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 11.36 1 0.0008
Skewness | 297.45 1 0.0000
Kurtosis | -2.58e+07 1 1.0000
---------------------+----------------------------
Total | -2.58e+07 3 1.0000
--------------------------------------------------
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
South | 1.00 1.000000
-------------+----------------------
Mean VIF | 1.00
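A VIF of 1.00 is expected with a single regressor and says nothing about fit. Because South
is not significant (p = 0.1263), explains only 0.03% of the variation, and the residuals
are non-normal and heteroskedastic, this model is not suitable for analyzing or predicting
Operating Profit.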
. regress OperatingProfit Southeast
Source | SS df MS Number of obs = 7,043
-------------+---------------------------------- F(1, 7041) = 129.03
Model | 3.8398e+11 1 3.8398e+11 Prob > F = 0.0000
Residual | 2.0953e+13 7,041 2.9758e+09 R-squared = 0.0180
-------------+---------------------------------- Adj R-squared = 0.0179
Total | 2.1337e+13 7,042 3.0299e+09 Root MSE = 54551
------------------------------------------------------------------------------
OperatingP~t | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
Southeast | 21712 1911.382 11.36 0.000 17965.11 25458.88
_cons | 31281.23 698.2849 44.80 0.000 29912.38 32650.07
------------------------------------------------------------------------------
Simple linear equation: OperatingProfit = 31281.23 + 21712*Southeast
Ho: β1 = 0
Hα: β1 not equal to 0
Based on the information above, F(1, 7041) = 129.03 and Prob > F = 0.0000. Because the
p-value < 0.05, we reject Ho: there is a significant relationship between Southeast and
OperatingProfit.
R-squared = 1.80% and adjusted R-squared = 1.79%, so the model still fits the data
poorly.
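With a single 0/1 regressor, the fitted equation simply reproduces two group means: the intercept is the average Operating Profit of transactions outside the Southeast, and intercept plus slope is the Southeast average. A short Python sketch of that arithmetic using the printed coefficients:

```python
intercept, slope = 31281.23, 21712  # _cons and Southeast coefficients

pred_other = intercept              # Southeast = 0: all other regions
pred_southeast = intercept + slope  # Southeast = 1
print(f"other regions: {pred_other:.2f}, Southeast: {pred_southeast:.2f}")
```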
. predict residual7, residuals
. ttest residual7 ==0
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
residu~7 | 7,043 .0000305 649.9721 54547.33 -1274.141 1274.141
------------------------------------------------------------------------------
mean = mean(residual7) t = 0.0000
H0: mean = 0 Degrees of freedom = 7042
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
Pr(T < t) = 0.5000 Pr(|T| > |t|) = 1.0000 Pr(T > t) = 0.5000
Ho: Mean = 0
Hα: Mean not equal to 0
Based on the information above, the mean of the residuals is 0.0000305, practically 0,
and Pr(|T| > |t|) = 1.0000, so we cannot reject Ho. The assumption that the residuals
have a mean of 0 is satisfied.
. sktest residual7
Skewness and kurtosis tests for normality
----- Joint test -----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
-------------+-----------------------------------------------------------------
residual7 | 7,043 0.0000 0.0000 . .
Ho: Residuals follow normal distribution
Hα: Residuals do not follow normal distribution
Based on the information above, both Pr(skewness) and Pr(kurtosis) are below 0.05 (the
joint statistic is not reported). We reject Ho: the residuals do not follow a normal
distribution.
. swilk residual7
Shapiro–Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+------------------------------------------------------
residual7 | 7,043 0.71740 1036.828 18.408 0.00000
Note: The normal approximation to the sampling distribution of W'
is valid for 4<=n<=2000.
Ho: Residuals follow normal distribution
Hα: Residuals do not follow normal distribution
Based on the information above, the p-value < 0.05, so we reject Ho: the residuals do not
follow a normal distribution.
. estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of OperatingProfit
H0: Constant variance
chi2(1) = 422.98
Prob > chi2 = 0.0000
Based on the information above, the p-value < 0.05, so we reject Ho: the variance of the
residuals is not constant (heteroskedasticity).
. imtest
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 109.01 1 0.0000
Skewness | 327.17 1 0.0000
Kurtosis | -2.15e+07 1 1.0000
---------------------+----------------------------
Total | -2.15e+07 3 1.0000
--------------------------------------------------
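Based on the information above, the p-value for Heteroskedasticity (0.0000) is below
0.05, which confirms that the variance is not constant.
Because the residuals are neither normally distributed nor homoskedastic, we cannot
reliably conclude that Southeast has a significant impact on Operating Profit.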
. regress OperatingProfit West
Source | SS df MS Number of obs = 7,043
-------------+---------------------------------- F(1, 7041) = 2.25
Model | 6.8030e+09 1 6.8030e+09 Prob > F = 0.1340
Residual | 2.1330e+13 7,041 3.0294e+09 R-squared = 0.0003
-------------+---------------------------------- Adj R-squared = 0.0002
Total | 2.1337e+13 7,042 3.0299e+09 Root MSE = 55040
------------------------------------------------------------------------------
OperatingP~t | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
West | 2284.057 1524.172 1.50 0.134 -703.7785 5271.892
_cons | 33618.64 754.9652 44.53 0.000 32138.68 35098.6
------------------------------------------------------------------------------
Simple linear equation: OperatingProfit = 33618.64 + 2284.057*West
Ho: β1 = 0
Hα: β1 not equal to 0
Based on the information above, F(1, 7041) = 2.25 and Prob > F = 0.1340. Because the
p-value > 0.05, we cannot reject Ho: there is no statistically significant relationship
between West and OperatingProfit.
R-squared = 0.03% and adjusted R-squared = 0.02%, so the model fits the data poorly.
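The West p-value can be checked from the printed coefficient and standard error. The Python sketch below uses a normal approximation to the t distribution (adequate with 7,041 degrees of freedom, but still an approximation) and also confirms that t squared matches the reported F = 2.25.

```python
import math

coef, se = 2284.057, 1524.172  # West coefficient and standard error
t_stat = coef / se
# Two-sided p-value with the standard normal: 2*(1 - Phi(|t|)) = erfc(|t| / sqrt(2))
p_value = math.erfc(abs(t_stat) / math.sqrt(2))
print(f"t = {t_stat:.2f}, t^2 = {t_stat ** 2:.2f}, approx. p = {p_value:.4f}")
```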
. predict residual8, residuals
. ttest residual8 ==0
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
residu~8 | 7,043 .000035 655.7962 55036.11 -1285.558 1285.558
------------------------------------------------------------------------------
mean = mean(residual8) t = 0.0000
H0: mean = 0 Degrees of freedom = 7042
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
Pr(T < t) = 0.5000 Pr(|T| > |t|) = 1.0000 Pr(T > t) = 0.5000
. sktest residual8
Skewness and kurtosis tests for normality
----- Joint test -----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
-------------+-----------------------------------------------------------------
residual8 | 7,043 0.0000 0.0000 . .
. swilk residual8
Shapiro–Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+------------------------------------------------------
residual8 | 7,043 0.66628 1224.385 18.849 0.00000
Note: The normal approximation to the sampling distribution of W'
is valid for 4<=n<=2000.
. estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of OperatingProfit
H0: Constant variance
chi2(1) = 3.88
Prob > chi2 = 0.0488
. imtest
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 0.96 1 0.3272
Skewness | 299.39 1 0.0000
Kurtosis | -2.63e+07 1 1.0000
---------------------+----------------------------
Total | -2.63e+07 3 1.0000
--------------------------------------------------
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
West | 1.00 1.000000
-------------+----------------------
Mean VIF | 1.00
CONCLUSION: Three variables, Outlet, South and West, do not have a significant impact on
Operating Profit. We therefore exclude them from the multiple linear regression to avoid
adding noise to the model, increasing its complexity and raising the risk of overfitting.
b)
. regress OperatingProfit UnitsSold Instore Online Midwest Northeast South Southeast
Source | SS df MS Number of obs = 7,043
-------------+---------------------------------- F(7, 7035) = 3984.57
Model | 1.7039e+13 7 2.4342e+12 Prob > F = 0.0000
Residual | 4.2977e+12 7,035 610898703 R-squared = 0.7986
-------------+---------------------------------- Adj R-squared = 0.7984
Total | 2.1337e+13 7,042 3.0299e+09 Root MSE = 24716
------------------------------------------------------------------------------
OperatingP~t | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
UnitsSold | 228.9376 1.557794 146.96 0.000 225.8839 231.9914
Instore | 8039.491 923.506 8.71 0.000 6229.141 9849.841
Online | 3626.105 710.689 5.10 0.000 2232.941 5019.27
Midwest | 6998.846 897.8194 7.80 0.000 5238.85 8758.843
Northeast | 6305.189 871.6857 7.23 0.000 4596.422 8013.956
South | 142.4665 925.8822 0.15 0.878 -1672.542 1957.475
Southeast | -2396.44 1021.206 -2.35 0.019 -4398.31 -394.569
_cons | -29976.86 852.9794 -35.14 0.000 -31648.96 -28304.77
------------------------------------------------------------------------------
. predict residual9, residuals
. ttest residual9 ==0
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
residu~9 | 7,043 4.60e-06 294.3675 24704.08 -577.0488 577.0488
------------------------------------------------------------------------------
mean = mean(residual9) t = 0.0000
H0: mean = 0 Degrees of freedom = 7042
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
Pr(T < t) = 0.5000 Pr(|T| > |t|) = 1.0000 Pr(T > t) = 0.5000
. sktest residual9
Skewness and kurtosis tests for normality
----- Joint test -----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
-------------+-----------------------------------------------------------------
residual9 | 7,043 0.0000 0.0000 . .
. swilk residual9
Shapiro–Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+------------------------------------------------------
residual9 | 7,043 0.85947 515.587 16.556 0.00000
Note: The normal approximation to the sampling distribution of W'
is valid for 4<=n<=2000.
. estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of OperatingProfit
H0: Constant variance
chi2(1) = 8237.45
Prob > chi2 = 0.0000
. imtest
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 1994.69 22 0.0000
Skewness | 426.65 7 0.0000
Kurtosis |-4100332.57 1 1.0000
---------------------+----------------------------
Total |-4097911.23 30 1.0000
--------------------------------------------------
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
Instore | 1.60 0.624943
Midwest | 1.55 0.644479
Northeast | 1.54 0.650490
South | 1.47 0.678740
Online | 1.46 0.687004
Southeast | 1.39 0.719166
UnitsSold | 1.30 0.771105
-------------+----------------------
Mean VIF | 1.47
Although the VIFs (all below 10, mean 1.47) show no multicollinearity, the p-value of the
t-test for South (0.878) is above 0.05, so South is not significant in this regression.
We recommend removing it.
. regress OperatingProfit UnitsSold Instore Online Midwest Northeast Southeast
Source | SS df MS Number of obs = 7,043
-------------+---------------------------------- F(6, 7036) = 4649.30
Model | 1.7039e+13 6 2.8399e+12 Prob > F = 0.0000
Residual | 4.2977e+12 7,036 610813934 R-squared = 0.7986
-------------+---------------------------------- Adj R-squared = 0.7984
Total | 2.1337e+13 7,042 3.0299e+09 Root MSE = 24715
------------------------------------------------------------------------------
OperatingP~t | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
UnitsSold | 228.9529 1.554529 147.28 0.000 225.9055 232.0002
Instore | 8013.929 908.3785 8.82 0.000 6233.234 9794.625
Online | 3618.228 708.7934 5.10 0.000 2228.779 5007.677
Midwest | 6944.887 826.444 8.40 0.000 5324.808 8564.966
Northeast | 6249.552 793.1002 7.88 0.000 4694.837 7804.267
Southeast | -2455.721 945.6773 -2.60 0.009 -4309.533 -601.9087
_cons | -29913.66 747.509 -40.02 0.000 -31379 -28448.32
------------------------------------------------------------------------------
. predict residual10, residuals
. ttest residual10 ==0
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
resid~10 | 7,043 2.07e-06 294.368 24704.12 -577.0498 577.0498
------------------------------------------------------------------------------
mean = mean(residual10) t = 0.0000
H0: mean = 0 Degrees of freedom = 7042
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
Pr(T < t) = 0.5000 Pr(|T| > |t|) = 1.0000 Pr(T > t) = 0.5000
. sktest residual10
Skewness and kurtosis tests for normality
----- Joint test -----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
-------------+-----------------------------------------------------------------
residual10 | 7,043 0.0000 0.0000 . .
. swilk residual10
Shapiro–Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+------------------------------------------------------
residual10 | 7,043 0.85940 515.846 16.558 0.00000
Note: The normal approximation to the sampling distribution of W'
is valid for 4<=n<=2000.
. estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of OperatingProfit
H0: Constant variance
chi2(1) = 8239.34
Prob > chi2 = 0.0000
. imtest
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 1904.24 18 0.0000
Skewness | 411.84 6 0.0000
Kurtosis |-4178219.02 1 1.0000
---------------------+----------------------------
Total |-4175902.94 25 1.0000
--------------------------------------------------
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
Instore | 1.55 0.645841
Online | 1.45 0.690587
Midwest | 1.31 0.760501
UnitsSold | 1.29 0.774240
Northeast | 1.27 0.785677
Southeast | 1.19 0.838512
-------------+----------------------
Mean VIF | 1.34
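All remaining coefficients are significant at the 5% level, R-squared is 79.86% and the
VIFs (mean 1.34) show no multicollinearity, so this model can be used to analyze and
predict Operating Profit within the existing data, bearing in mind that the residuals are
still non-normal and heteroskedastic.
The two sales methods (In-store and Online), the three regions (Midwest, Northeast,
Southeast) and the number of units sold by the three chosen retailers all have a
significant impact on their Operating Profit.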
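With the final model, a prediction is the constant plus each coefficient times its variable; the categories left out act as the baseline, i.e. an Outlet sale in the South or West region. The Python sketch below scores one hypothetical transaction (200 units sold in-store in the Midwest); the input values are invented for illustration.

```python
# Coefficients of: regress OperatingProfit UnitsSold Instore Online Midwest Northeast Southeast
coefs = {
    "UnitsSold": 228.9529,
    "Instore": 8013.929,
    "Online": 3618.228,
    "Midwest": 6944.887,
    "Northeast": 6249.552,
    "Southeast": -2455.721,
}
intercept = -29913.66

def predict(row):
    """Fitted Operating Profit; variables missing from `row` are treated as 0."""
    return intercept + sum(coefs[name] * row.get(name, 0) for name in coefs)

# Hypothetical transaction: 200 units sold in-store in the Midwest
example = {"UnitsSold": 200, "Instore": 1, "Midwest": 1}
print(round(predict(example), 3))
```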
c. Does the effect of selling price on Operating Profit differ across regions? This matters
because some regions may be more price-sensitive than others, and such differences could
justify adjusting the pricing strategy for each specific region.
. gen Price_Region_Interaction1 = PriceperUnit*Northeast
. gen Price_Region_Interaction2 = PriceperUnit* Midwest
. gen Price_Region_Interaction3 = PriceperUnit* Southeast
. regress OperatingProfit UnitsSold Instore Online Midwest Northeast Southeast
Price_Region_Interaction1
Source | SS df MS Number of obs = 7,043
-------------+---------------------------------- F(7, 7035) = 4025.96
Model | 1.7075e+13 7 2.4392e+12 Prob > F = 0.0000
Residual | 4.2623e+12 7,035 605872362 R-squared = 0.8002
-------------+---------------------------------- Adj R-squared = 0.8000
Total | 2.1337e+13 7,042 3.0299e+09 Root MSE = 24614
-------------------------------------------------------------------------------------------
OperatingProfit | Coefficient Std. err. t P>|t| [95% conf. interval]
--------------------------+----------------------------------------------------------------
UnitsSold | 227.3143 1.563009 145.43 0.000 224.2503 230.3782
Instore | 7690.098 905.6887 8.49 0.000 5914.676 9465.521
Online | 3126.803 708.8441 4.41 0.000 1737.255 4516.351
Midwest | 6916.19 823.1028 8.40 0.000 5302.661 8529.72
Northeast | -12993.68 2639.35 -4.92 0.000 -18167.6 -7819.755
Southeast | -2218.65 942.3551 -2.35 0.019 -4065.95 -371.3504
Price_Region_Interaction1 | 418.7923 54.80781 7.64 0.000 311.3525 526.2322
_cons | -29194.18 750.41 -38.90 0.000 -30665.21 -27723.15
-------------------------------------------------------------------------------------------
. predict re1, residuals
. ttest re1 ==0
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
re1 | 7,043 9.11e-06 293.154 24602.24 -574.67 574.67
------------------------------------------------------------------------------
mean = mean(re1) t = 0.0000
H0: mean = 0 Degrees of freedom = 7042
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
Pr(T < t) = 0.5000 Pr(|T| > |t|) = 1.0000 Pr(T > t) = 0.5000
. sktest re1
Skewness and kurtosis tests for normality
----- Joint test -----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
-------------+-----------------------------------------------------------------
re1 | 7,043 0.0000 0.0000 . .
. swilk re1
Shapiro–Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+------------------------------------------------------
re1 | 7,043 0.86029 512.560 16.541 0.00000
Note: The normal approximation to the sampling distribution of W'
is valid for 4<=n<=2000.
. estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of OperatingProfit
H0: Constant variance
chi2(1) = 8089.92
Prob > chi2 = 0.0000
. imtest
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 1913.99 23 0.0000
Skewness | 417.23 7 0.0000
Kurtosis |-4806898.07 1 1.0000
---------------------+----------------------------
Total |-4804566.84 31 1.0000
--------------------------------------------------
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
Northeast | 14.21 0.070369
Price_Regi~1 | 13.85 0.072208
Instore | 1.55 0.644427
Online | 1.46 0.684902
UnitsSold | 1.32 0.759666
Midwest | 1.31 0.760485
Southeast | 1.19 0.837603
-------------+----------------------
Mean VIF | 4.99
Because the VIFs of Northeast (14.21) and Price_Region_Interaction1 (13.85) exceed 10,
this model suffers from multicollinearity and cannot be reliably used to analyze or
predict the existing data.
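In this model the estimated gap between the Northeast and the omitted regions depends on the price per unit: it equals the Northeast coefficient plus the interaction coefficient times PriceperUnit (note that PriceperUnit does not enter this specification on its own). The Python sketch below evaluates that gap at a few illustrative prices and finds the break-even price at which it turns positive.

```python
ne_coef = -12993.68     # Northeast
inter_coef = 418.7923   # Price_Region_Interaction1 = PriceperUnit*Northeast

def northeast_gap(price):
    """Estimated Operating Profit gap of Northeast vs. the omitted regions at a given price."""
    return ne_coef + inter_coef * price

break_even = -ne_coef / inter_coef  # price at which the gap is zero
print(f"break-even price ~ {break_even:.2f}")
for price in (20, 40, 60):  # illustrative prices
    print(price, round(northeast_gap(price), 2))
```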
. regress OperatingProfit UnitsSold Instore Online Midwest Northeast Southeast
Price_Region_Interaction2
Source | SS df MS Number of obs = 7,043
-------------+---------------------------------- F(7, 7035) = 4079.43
Model | 1.7119e+13 7 2.4456e+12 Prob > F = 0.0000
Residual | 4.2175e+12 7,035 599500965 R-squared = 0.8023
-------------+---------------------------------- Adj R-squared = 0.8021
Total | 2.1337e+13 7,042 3.0299e+09 Root MSE = 24485
-------------------------------------------------------------------------------------------
OperatingProfit | Coefficient Std. err. t P>|t| [95% conf. interval]
--------------------------+----------------------------------------------------------------
UnitsSold | 227.6294 1.544311 147.40 0.000 224.6021 230.6568
Instore | 8549.143 901.1161 9.49 0.000 6782.684 10315.6
Online | 3795.679 702.3665 5.40 0.000 2418.829 5172.529
Midwest | -17923.9 2300.763 -7.79 0.000 -22434.09 -13413.71
Northeast | 6075.868 785.8648 7.73 0.000 4535.336 7616.4
Southeast | -2408.037 936.8879 -2.57 0.010 -4244.619 -571.4542
Price_Region_Interaction2 | 616.767 53.32557 11.57 0.000 512.2328 721.3012
_cons | -29693.35 740.7993 -40.08 0.000 -31145.54 -28241.16
-------------------------------------------------------------------------------------------
. predict re2, residuals
. ttest re2 ==0
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
re2 | 7,043 -.0000105 291.6085 24472.54 -571.6404 571.6403
------------------------------------------------------------------------------
mean = mean(re2) t = -0.0000
H0: mean = 0 Degrees of freedom = 7042
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
Pr(T < t) = 0.5000 Pr(|T| > |t|) = 1.0000 Pr(T > t) = 0.5000
. sktest re2
Skewness and kurtosis tests for normality
----- Joint test -----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
-------------+-----------------------------------------------------------------
re2 | 7,043 0.0000 0.0000 . .
. swilk re2
Shapiro–Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+------------------------------------------------------
re2 | 7,043 0.85854 518.980 16.574 0.00000
Note: The normal approximation to the sampling distribution of W'
is valid for 4<=n<=2000.
. estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of OperatingProfit
H0: Constant variance
chi2(1) = 8031.42
Prob > chi2 = 0.0000
. imtest
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 1906.77 23 0.0000
Skewness | 428.80 7 0.0000
Kurtosis |-4635946.66 1 1.0000
---------------------+----------------------------
Total |-4633611.08 31 1.0000
--------------------------------------------------
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
Midwest | 10.38 0.096308
Price_Regi~2 | 9.91 0.100924
Instore | 1.55 0.644138
Online | 1.45 0.690258
UnitsSold | 1.30 0.769989
Northeast | 1.27 0.785390
Southeast | 1.19 0.838496
-------------+----------------------
Mean VIF | 3.87
Because the VIF of Midwest (10.38) exceeds 10 and that of Price_Region_Interaction2 (9.91)
is just below it, this model suffers from multicollinearity and cannot be reliably used
to analyze or predict the existing data.
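The VIF column is the reciprocal of the 1/VIF (tolerance) column, and 1 minus the tolerance is the R-squared from regressing that variable on the other regressors. A short Python sketch using three tolerances printed above:

```python
# 1/VIF (tolerance) values printed by estat vif for this model
tolerance = {
    "Midwest": 0.096308,
    "Price_Region_Interaction2": 0.100924,
    "Instore": 0.644138,
}
vif = {name: 1 / tol for name, tol in tolerance.items()}
aux_r2 = {name: 1 - tol for name, tol in tolerance.items()}  # R2 on the other regressors

for name in tolerance:
    flag = "above 10" if vif[name] > 10 else "below 10"
    print(f"{name}: VIF={vif[name]:.2f} ({flag}), auxiliary R2={aux_r2[name]:.3f}")
```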
. regress OperatingProfit UnitsSold Instore Online Midwest Northeast Southeast
Price_Region_Interaction3
Source | SS df MS Number of obs = 7,043
-------------+---------------------------------- F(7, 7035) = 4244.46
Model | 1.7252e+13 7 2.4646e+12 Prob > F = 0.0000
Residual | 4.0849e+12 7,035 580654037 R-squared = 0.8086
-------------+---------------------------------- Adj R-squared = 0.8084
Total | 2.1337e+13 7,042 3.0299e+09 Root MSE = 24097
-------------------------------------------------------------------------------------------
OperatingProfit | Coefficient Std. err. t P>|t| [95% conf. interval]
--------------------------+----------------------------------------------------------------
UnitsSold | 226.7268 1.520119 149.15 0.000 223.7469 229.7067
Instore | 6797.627 887.9445 7.66 0.000 5056.988 8538.266
Online | 2697.756 692.7438 3.89 0.000 1339.77 4055.743
Midwest | 7073.14 805.8101 8.78 0.000 5493.509 8652.771
Northeast | 6253.562 773.2721 8.09 0.000 4737.716 7769.408
Southeast | -54031.66 2847.634 -18.97 0.000 -59613.88 -48449.44
Price_Region_Interaction3 | 1065.639 55.66699 19.14 0.000 956.5153 1174.763
_cons | -28726.67 731.4536 -39.27 0.000 -30160.54 -27292.8
-------------------------------------------------------------------------------------------
. predict re3, residuals
. ttest re3 ==0
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
re3 | 7,043 -.0000242 286.9881 24084.78 -562.5831 562.5831
------------------------------------------------------------------------------
mean = mean(re3) t = -0.0000
H0: mean = 0 Degrees of freedom = 7042
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
Pr(T < t) = 0.5000 Pr(|T| > |t|) = 1.0000 Pr(T > t) = 0.5000
. sktest re3
Skewness and kurtosis tests for normality
----- Joint test -----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
-------------+-----------------------------------------------------------------
re3 | 7,043 0.0000 0.0000 . .
. swilk re3
Shapiro–Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+------------------------------------------------------
re3 | 7,043 0.87010 476.592 16.348 0.00000
Note: The normal approximation to the sampling distribution of W'
is valid for 4<=n<=2000.
. estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of OperatingProfit
H0: Constant variance
chi2(1) = 7332.46
Prob > chi2 = 0.0000
. imtest
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 1996.72 23 0.0000
Skewness | 412.04 7 0.0000
Kurtosis |-3889444.66 1 1.0000
---------------------+----------------------------
Total |-3887035.90 31 1.0000
--------------------------------------------------
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
Southeast | 11.38 0.087909
Price_Regi~3 | 11.36 0.088037
Instore | 1.56 0.642534
Online | 1.46 0.687260
Midwest | 1.32 0.760448
UnitsSold | 1.30 0.769709
Northeast | 1.27 0.785677
-------------+----------------------
Mean VIF | 4.23
Because the VIFs of Southeast (11.38) and Price_Region_Interaction3 (11.36) exceed 10,
this model suffers from multicollinearity and cannot be reliably used to analyze or
predict the existing data.
d. Do different sales methods lead to different profit outcomes for each product category?
Some products may be more profitable when sold online, while others may be more
profitable when sold offline.
. tabulate Product, generate (Product)
Product | Freq. Percent Cum.
--------------------------+-----------------------------------
Men's Apparel | 1,165 16.54 16.54
Men's Athletic Footwear | 1,175 16.68 33.22
Men's Street Footwear | 1,178 16.73 49.95
Women's Apparel | 1,170 16.61 66.56
Women's Athletic Footwear | 1,175 16.68 83.25
Women's Street Footwear | 1,180 16.75 100.00
--------------------------+-----------------------------------
Total | 7,043 100.00
. rename Product1 MenApparel
. rename Product2 MenAthleticFootwear
. rename Product3 MenStreetFootwear
. rename Product4 WomenApparel
. rename Product5 WomenAthleticFootwear
. rename Product6 WomenStreetFootwear
. gen Product_SalesMethod_1 = MenApparel*Online
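tabulate with generate() creates one 0/1 column per category, numbered in the order shown in the table, and the interaction term is the product of two dummies. A minimal Python equivalent on a few hypothetical rows (the product and channel values below are made up):

```python
# Hypothetical rows (not taken from the dataset)
products = ["Men's Apparel", "Women's Street Footwear", "Men's Apparel", "Women's Apparel"]
methods = ["Online", "Online", "In-store", "Outlet"]

# One 0/1 column per product category, in sorted order
categories = sorted(set(products))
dummies = {c: [1 if p == c else 0 for p in products] for c in categories}

# 0/1 indicator for online sales
online = [1 if m == "Online" else 0 for m in methods]

# Interaction: equals 1 only for Men's Apparel sold online
product_salesmethod_1 = [a * b for a, b in zip(dummies["Men's Apparel"], online)]
print(product_salesmethod_1)
```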
. regress OperatingProfit UnitsSold Online Instore Midwest Northeast Southeast
Product_SalesMethod_1
Source | SS df MS Number of obs = 7,043
-------------+---------------------------------- F(7, 7035) = 4027.22
Model | 1.7076e+13 7 2.4394e+12 Prob > F = 0.0000
Residual | 4.2612e+12 7,035 605720515 R-squared = 0.8003
-------------+---------------------------------- Adj R-squared = 0.8001
Total | 2.1337e+13 7,042 3.0299e+09 Root MSE = 24611
---------------------------------------------------------------------------------------
OperatingProfit | Coefficient Std. err. t P>|t| [95% conf. interval]
----------------------+----------------------------------------------------------------
UnitsSold | 229.9077 1.552921 148.05 0.000 226.8635 232.9519
Online | 2246.327 727.6548 3.09 0.002 819.9048 3672.75
Instore | 7862.856 904.7929 8.69 0.000 6089.19 9636.523
Midwest | 7043.176 823.0886 8.56 0.000 5429.674 8656.678
Northeast | 6353.395 789.9 8.04 0.000 4804.953 7901.837
Southeast | -2553.287 941.8102 -2.71 0.007 -4399.519 -707.0555
Product_SalesMethod_1 | 8733.656 1125.965 7.76 0.000 6526.424 10940.89
_cons | -30191.86 745.2494 -40.51 0.000 -31652.77 -28730.94
---------------------------------------------------------------------------------------
. predict rez1, residuals
. ttest rez1 ==0
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
rez1 | 7,043 5.74e-06 293.1172 24599.15 -574.598 574.598
------------------------------------------------------------------------------
mean = mean(rez1) t = 0.0000
H0: mean = 0 Degrees of freedom = 7042
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
Pr(T < t) = 0.5000 Pr(|T| > |t|) = 1.0000 Pr(T > t) = 0.5000
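The one-sample t statistic is the residual mean divided by its standard error, sd/sqrt(n). Here it is recomputed in Python from the summary row above (`one_sample_t` is an illustrative helper):

```python
import math


def one_sample_t(mean: float, sd: float, n: int, mu0: float = 0.0):
    """Return (standard error, t statistic) for H0: population mean == mu0."""
    se = sd / math.sqrt(n)
    return se, (mean - mu0) / se


se, t = one_sample_t(mean=5.74e-06, sd=24599.15, n=7043)
print(f"se={se:.4f} t={t:.4f}")
```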
. sktest rez1
Skewness and kurtosis tests for normality
----- Joint test -----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
-------------+-----------------------------------------------------------------
rez1 | 7,043 0.0000 0.0000 . .
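`sktest` checks whether the sample skewness and kurtosis of the residuals are consistent with a normal distribution, which has skewness 0 and kurtosis 3. The Python sketch below computes only the moment-based statistics those tests start from, not Stata's test p-values. The input lists are made up.

```python
from typing import Sequence, Tuple


def skew_kurt(xs: Sequence[float]) -> Tuple[float, float]:
    """Moment-based sample skewness m3/m2**1.5 and (non-excess) kurtosis m4/m2**2."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]   # skewness 0
right_skewed = [0.0, 0.0, 0.0, 10.0]      # long right tail, skewness > 0
print(skew_kurt(symmetric), skew_kurt(right_skewed))
```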
. swilk rez1
Shapiro–Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+------------------------------------------------------
rez1 | 7,043 0.85612 527.884 16.619 0.00000
Note: The normal approximation to the sampling distribution of W'
is valid for 4<=n<=2000.
. estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of OperatingProfit
H0: Constant variance
chi2(1) = 8373.46
Prob > chi2 = 0.0000
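The "Assumption: Normal error terms" line indicates that the normal-errors version of the Breusch–Pagan/Cook–Weisberg test was run. In the textbook form of that test, the squared residuals are scaled by their mean and regressed on the fitted values. The statistic is half the explained sum of squares, compared with a chi2(1) distribution.

Below is a minimal Python sketch of that computation. It assumes a single test variable and uses made-up inputs. It is an illustration, not Stata's own code, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Intercept and slope of y on x by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def breusch_pagan(fitted: Sequence[float], resid: Sequence[float]) -> float:
    """Normal-errors Breusch-Pagan LM statistic with fitted values as the test variable."""
    n = len(resid)
    sigma2 = sum(e * e for e in resid) / n        # ML estimate of the error variance
    g = [e * e / sigma2 for e in resid]            # scaled squared residuals
    b0, b1 = simple_ols(fitted, g)
    g_bar = sum(g) / n
    ess = sum((b0 + b1 * f - g_bar) ** 2 for f in fitted)
    return ess / 2                                 # compared with chi2(1) under H0


fitted = [1.0, 2.0, 3.0, 4.0]
constant_spread = [1.0, -1.0, 1.0, -1.0]   # same size at every fitted value
growing_spread = [1.0, -2.0, 3.0, -4.0]    # spread rises with the fitted value
print(breusch_pagan(fitted, constant_spread), breusch_pagan(fitted, growing_spread))
```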
. imtest
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 1919.92 23 0.0000
Skewness | 410.90 7 0.0000
Kurtosis |-4174239.43 1 1.0000
---------------------+----------------------------
Total |-4171908.60 31 1.0000
--------------------------------------------------
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
Instore | 1.55 0.645542
Online | 1.54 0.649786
Midwest | 1.32 0.760321
UnitsSold | 1.30 0.769375
Northeast | 1.27 0.785451
Southeast | 1.19 0.838363
Product_Sa~1 | 1.11 0.903347
-------------+----------------------
Mean VIF | 1.33
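Each VIF equals 1/(1 - R2_j), where R2_j comes from regressing regressor j on the other regressors. The 1/VIF column is therefore 1 - R2_j (the tolerance). The Python check below rebuilds the VIF column and the mean VIF from the tolerances printed above.

```python
# 1/VIF (tolerance) values copied from the estat vif output above
TOLERANCE = {
    "Instore": 0.645542,
    "Online": 0.649786,
    "Midwest": 0.760321,
    "UnitsSold": 0.769375,
    "Northeast": 0.785451,
    "Southeast": 0.838363,
    "Product_SalesMethod_1": 0.903347,
}

vif = {name: 1 / tol for name, tol in TOLERANCE.items()}
aux_r2 = {name: 1 - tol for name, tol in TOLERANCE.items()}  # R2 of each auxiliary regression
mean_vif = sum(vif.values()) / len(vif)

for name in vif:
    print(f"{name:<22} VIF={vif[name]:.2f} auxiliary R2={aux_r2[name]:.3f}")
print(f"Mean VIF = {mean_vif:.2f}")
```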
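The residuals have a mean of zero and all VIFs are close to 1 (mean VIF = 1.33), so multicollinearity is not a concern. However, sktest and swilk both reject normality of the residuals (p < 0.001). The Shapiro–Wilk p-value is only approximate here because n exceeds 2,000. In addition, estat hettest (chi2(1) = 8373.46) and the heteroskedasticity component of imtest both reject constant variance. The model can therefore be used to describe and predict within the existing data, but its conventional standard errors and p-values should be interpreted with caution.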
. gen Product_SalesMethod_2 = MenApparel*Instore
. regress OperatingProfit UnitsSold Online Instore Midwest Northeast Southeast Product_SalesMethod_2
Source | SS df MS Number of obs = 7,043
-------------+---------------------------------- F(7, 7035) = 4042.42
Model | 1.7088e+13 7 2.4412e+12 Prob > F = 0.0000
Residual | 4.2484e+12 7,035 603896562 R-squared = 0.8009
-------------+---------------------------------- Adj R-squared = 0.8007
Total | 2.1337e+13 7,042 3.0299e+09 Root MSE = 24574
---------------------------------------------------------------------------------------
OperatingProfit | Coefficient Std. err. t P>|t| [95% conf. interval]
----------------------+----------------------------------------------------------------
UnitsSold | 230.5176 1.555378 148.21 0.000 227.4686 233.5666
Online | 3732.473 704.882 5.30 0.000 2350.691 5114.254
Instore | 5129.654 957.9995 5.35 0.000 3251.686 7007.622
Midwest | 7105.255 821.9428 8.64 0.000 5494 8716.511
Northeast | 6410.761 788.7984 8.13 0.000 4864.478 7957.043
Southeast | -2586.056 940.4179 -2.75 0.006 -4429.558 -742.5534
Product_SalesMethod_2 | 15767.2 1745.519 9.03 0.000 12345.46 19188.94
_cons | -30369.84 744.978 -40.77 0.000 -31830.22 -28909.46
---------------------------------------------------------------------------------------
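Each t value is the coefficient divided by its standard error. The 95% interval is coefficient ± t_crit × SE, with 7,035 residual degrees of freedom. The Python sketch below reproduces the Product_SalesMethod_2 row. It approximates t_crit with a first-order expansion around the normal quantile, an approximation chosen here that is accurate enough at this sample size.

```python
from statistics import NormalDist


def t_crit_975(df: int) -> float:
    """Approximate 97.5% quantile of Student's t: z + (z**3 + z) / (4 * df)."""
    z = NormalDist().inv_cdf(0.975)
    return z + (z ** 3 + z) / (4 * df)


def coef_summary(coef: float, se: float, df: int):
    """Return the t statistic and the 95% confidence interval for one coefficient."""
    half_width = t_crit_975(df) * se
    return coef / se, coef - half_width, coef + half_width


t, lower, upper = coef_summary(15767.2, 1745.519, df=7035)
print(f"t={t:.2f}  95% CI=[{lower:.2f}, {upper:.2f}]")
```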
. predict rez2, residuals
. ttest rez2 ==0
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
rez2 | 7,043 2.51e-06 292.6756 24562.09 -573.7322 573.7322
------------------------------------------------------------------------------
mean = mean(rez2) t = 0.0000
H0: mean = 0 Degrees of freedom = 7042
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
Pr(T < t) = 0.5000 Pr(|T| > |t|) = 1.0000 Pr(T > t) = 0.5000
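The t statistic is exactly 0 (p = 1.0000) in every model for a structural reason. Whenever the regression includes a constant, the OLS normal equations force the residuals to sum to zero. This check therefore holds by construction and cannot reveal misspecification. Below is a short Python illustration with made-up data and a single regressor.

```python
from typing import List, Sequence


def ols_residuals(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Residuals from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


x = [10.0, 20.0, 30.0, 40.0, 50.0]   # hypothetical units sold
y = [3.0, 9.0, 8.0, 15.0, 30.0]      # hypothetical profits
resid = ols_residuals(x, y)
mean_resid = sum(resid) / len(resid)  # zero up to floating-point error
print(resid, mean_resid)
```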
. sktest rez2
Skewness and kurtosis tests for normality
----- Joint test -----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
-------------+-----------------------------------------------------------------
rez2 | 7,043 0.0000 0.0000 . .
. swilk rez2
Shapiro–Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+------------------------------------------------------
rez2 | 7,043 0.86137 508.601 16.520 0.00000
Note: The normal approximation to the sampling distribution of W'
is valid for 4<=n<=2000.
. estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of OperatingProfit
H0: Constant variance
chi2(1) = 8378.26
Prob > chi2 = 0.0000
. imtest
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 1996.54 23 0.0000
Skewness | 414.09 7 0.0000
Kurtosis |-4490236.14 1 1.0000
---------------------+----------------------------
Total |-4487825.51 31 1.0000
--------------------------------------------------
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
Instore | 1.74 0.574093
Online | 1.45 0.690365
Midwest | 1.32 0.760146
UnitsSold | 1.31 0.764637
Northeast | 1.27 0.785275
Southeast | 1.19 0.838315
Product_Sa~2 | 1.17 0.851564
-------------+----------------------
Mean VIF | 1.35
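The residuals have a mean of zero and all VIFs are close to 1 (mean VIF = 1.35), so multicollinearity is not a concern. However, sktest and swilk both reject normality of the residuals (p < 0.001). The Shapiro–Wilk p-value is only approximate here because n exceeds 2,000. In addition, estat hettest (chi2(1) = 8378.26) and the heteroskedasticity component of imtest both reject constant variance. The model can therefore be used to describe and predict within the existing data, but its conventional standard errors and p-values should be interpreted with caution.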
. gen Product_SalesMethod_3 = MenStreetFootwear*Online
. regress OperatingProfit UnitsSold Online Instore Midwest Northeast Southeast Product_SalesMethod_3
Source | SS df MS Number of obs = 7,043
-------------+---------------------------------- F(7, 7035) = 4082.22
Model | 1.7122e+13 7 2.4460e+12 Prob > F = 0.0000
Residual | 4.2152e+12 7,035 599171666 R-squared = 0.8024
-------------+---------------------------------- Adj R-squared = 0.8022
Total | 2.1337e+13 7,042 3.0299e+09 Root MSE = 24478
---------------------------------------------------------------------------------------
OperatingProfit | Coefficient Std. err. t P>|t| [95% conf. interval]
----------------------+----------------------------------------------------------------
UnitsSold | 231.3752 1.553418 148.95 0.000 228.33 234.4203
Online | 5998.613 730.724 8.21 0.000 4566.174 7431.052
Instore | 7632.739 900.2661 8.48 0.000 5867.947 9397.532
Midwest | 7186.737 818.7894 8.78 0.000 5581.663 8791.811
Northeast | 6502.148 785.8003 8.27 0.000 4961.742 8042.553
Southeast | -2677.931 936.8129 -2.86 0.004 -4514.366 -841.4953
Product_SalesMethod_3 | -13153.84 1120.892 -11.74 0.000 -15351.12 -10956.55
_cons | -30618.31 742.7819 -41.22 0.000 -32074.38 -29162.23
---------------------------------------------------------------------------------------
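Product_SalesMethod_3 enters the model without a separate MenStreetFootwear main effect. Its coefficient is therefore the extra shift in predicted profit for Men's Street Footwear sold Online. The total Online effect for that product, relative to the omitted sales method, is 5,998.613 - 13,153.84 = -7,155.227.

The Python sketch below plugs the printed coefficients into the fitted equation. The 500-unit Midwest row is a made-up example.

```python
# Coefficients copied from the Product_SalesMethod_3 regression output above
COEF = {
    "UnitsSold": 231.3752,
    "Online": 5998.613,
    "Instore": 7632.739,
    "Midwest": 7186.737,
    "Northeast": 6502.148,
    "Southeast": -2677.931,
    "Product_SalesMethod_3": -13153.84,
}
CONS = -30618.31


def predict(row: dict) -> float:
    """Fitted Operating Profit; regressors missing from `row` are treated as 0."""
    return CONS + sum(COEF[name] * value for name, value in row.items())


# Hypothetical row: 500 units sold online in the Midwest
other_product = {"UnitsSold": 500, "Online": 1, "Midwest": 1}
men_street_footwear = dict(other_product, Product_SalesMethod_3=1)

gap = predict(men_street_footwear) - predict(other_product)         # equals the interaction coefficient
online_effect_msf = COEF["Online"] + COEF["Product_SalesMethod_3"]  # Online vs omitted method, MSF rows
print(round(predict(men_street_footwear), 2), round(gap, 2), round(online_effect_msf, 3))
```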
. predict rez3, residuals
. ttest rez3 ==0
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
rez3 | 7,043 .0000164 291.5284 24465.81 -571.4833 571.4834
------------------------------------------------------------------------------
mean = mean(rez3) t = 0.0000
H0: mean = 0 Degrees of freedom = 7042
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
Pr(T < t) = 0.5000 Pr(|T| > |t|) = 1.0000 Pr(T > t) = 0.5000
. sktest rez3
Skewness and kurtosis tests for normality
----- Joint test -----
Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2
-------------+-----------------------------------------------------------------
rez3 | 7,043 0.0000 0.0000 . .
. swilk rez3
Shapiro–Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+------------------------------------------------------
rez3 | 7,043 0.85215 542.433 16.691 0.00000
Note: The normal approximation to the sampling distribution of W'
is valid for 4<=n<=2000.
. estat hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of OperatingProfit
H0: Constant variance
chi2(1) = 8640.94
Prob > chi2 = 0.0000
. imtest
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 1971.88 23 0.0000
Skewness | 398.59 7 0.0000
Kurtosis |-4012565.94 1 1.0000
---------------------+----------------------------
Total |-4010195.47 31 1.0000
--------------------------------------------------
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
Online | 1.57 0.637373
Instore | 1.55 0.645000
Midwest | 1.32 0.760019
UnitsSold | 1.31 0.760569
Northeast | 1.27 0.785087
Southeast | 1.19 0.838170
Product_Sa~3 | 1.12 0.891826
-------------+----------------------
Mean VIF | 1.33
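The residuals have a mean of zero and all VIFs are close to 1 (mean VIF = 1.33), so multicollinearity is not a concern. However, sktest and swilk both reject normality of the residuals (p < 0.001). The Shapiro–Wilk p-value is only approximate here because n exceeds 2,000. In addition, estat hettest (chi2(1) = 8640.94) and the heteroskedasticity component of imtest both reject constant variance. The model can therefore be used to describe and predict within the existing data, but its conventional standard errors and p-values should be interpreted with caution.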
e. Which key factors significantly impact profitability?
f. Predict high Operating Profit based on Sales Method and Units Sold, and calculate the odds ratio
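An odds ratio compares the odds of a "high" Operating Profit between two groups. With a single binary predictor (for example an Online dummy), it equals exp(b) from a logit of the high-profit indicator on that dummy. Below is a minimal Python sketch on a hypothetical 2×2 table of counts, not the report's data.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """2x2 odds ratio (a*d)/(b*c).

    a: high profit, group 1   b: not high, group 1
    c: high profit, group 0   d: not high, group 0
    """
    return (a * d) / (b * c)


# Hypothetical counts: group 1 = Online rows, group 0 = other sales methods
a, b, c, d = 30, 70, 20, 80
or_online = odds_ratio(a, b, c, d)
# The log odds ratio is what a logit coefficient on the group dummy would estimate
log_or = math.log(a / b) - math.log(c / d)
print(round(or_online, 3), round(math.exp(log_or), 3))
```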
III. Discussion and Recommendations
IV. Conclusion
The three most profitable retailers were Sports Direct, West Gear, and Foot Locker. Sports Direct led with the highest average Operating Profit (36,581.18), followed by West Gear (36,085.88) and Foot Locker (30,611.35).
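Ranking retailers by average Operating Profit amounts to three steps: group the rows by retailer, average within each group, and sort the averages. Below is a Python sketch with a handful of made-up rows; the retailer names and values are placeholders, not the report's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def top_by_mean(rows: Iterable[Tuple[str, float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k groups with the highest mean value, highest first."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for retailer, profit in rows:
        totals[retailer] += profit
        counts[retailer] += 1
    means = {r: totals[r] / counts[r] for r in totals}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:k]


# Made-up (retailer, operating profit) rows for illustration only
rows = [
    ("Retailer A", 40000.0), ("Retailer A", 30000.0),
    ("Retailer B", 20000.0), ("Retailer B", 26000.0),
    ("Retailer C", 50000.0),
    ("Retailer D", 10000.0),
]
print(top_by_mean(rows))
```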
The regression results show that Units Sold has a positive and significant influence on Operating Profit. In the interaction models, each additional unit sold is associated with roughly 230 more in Operating Profit, holding the other factors constant. This suggests that increasing sales volume can be an effective strategy for improving profits. Sales Method also has a clear impact, with In-store sales associated with higher profits than the other methods. In contrast, Online sales are associated with lower profits than In-store sales, suggesting that online sales may have higher costs or lower profit margins.
In addition, Region significantly affects profits. Midwest and Northeast have positive and statistically significant coefficients (roughly 6,350 to 7,190 in the interaction models). This indicates that these regions are more profitable than the reference regions omitted from the model. Meanwhile, Southeast has a negative and significant coefficient (about -2,550 to -2,680). This reflects lower business performance and suggests that a strategic adjustment may be needed in this region.
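Each region coefficient is measured against the reference regions left out of the model, so two included regions are compared by differencing their coefficients. With the Product_SalesMethod_1 estimates, Midwest versus Southeast is 7,043.176 - (-2,553.287) = 9,596.463, holding the other regressors fixed. In Stata, `lincom Midwest - Southeast` after that `regress` would report the same difference together with its standard error. Below is a Python sketch of the arithmetic, where `region_gap` is a hypothetical helper.

```python
# Region coefficients copied from the Product_SalesMethod_1 regression output;
# the omitted (reference) regions are implicitly 0.
REGION_COEF = {
    "Midwest": 7043.176,
    "Northeast": 6353.395,
    "Southeast": -2553.287,
    "Reference": 0.0,
}


def region_gap(first: str, second: str) -> float:
    """Predicted profit difference between two regions, other regressors held fixed."""
    return REGION_COEF[first] - REGION_COEF[second]


print(round(region_gap("Midwest", "Southeast"), 3))    # Midwest vs Southeast
print(round(region_gap("Northeast", "Reference"), 3))  # Northeast vs omitted regions
```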