1. What is bigger? Population or sample?
Population is bigger than sample.
2. Categorical variables are also called qualitative variables, and numerical variables are also called quantitative variables.
3. The 50th percentile is also known as
a. Q2
b. The second quartile.
c. =QUARTILE.INC(data,2)
d. A data point such that 50% of all data are above this value.
4. For each of the charts listed below, indicate how many variables from your
sample data (i.e., how many columns from your Excel spreadsheet) you need
to construct this particular chart.
Question                        Correct Match
Relative frequency histogram    D. 1
Scatterplot                     E. 2
Bubble chart                    A. 3
Time series plot                E. 2
Pie chart                       D. 1
5. Left-skewed data means that, compared to the main bulk of
the data, there are very few data points that are low in
magnitude.
Right-skewed data means that, compared to the main bulk of
the data, there are very few data points that are high in
magnitude.
7. The charts listed below are useful to plot the
relationship between which types of variables
and how many?
Question             Correct Match
Bubble chart         3 quantitative
Scatter plot         2 quantitative
Stacked histogram    1 categorical and 1 numerical
Stacked box plot     1 categorical and 1 numerical
Stacked bar chart    1 categorical and 1 numerical
Stacked pie chart    2 categorical
8. Categorical variables are classified into nominal and ordinal.
9. To create a bar graph in Excel, we do the following:
Highlight the sample data column that we want to chart
On the top, we go to INSERT
Click on the column/bar chart icon
False
No, we first need to create a summary table.
10. A relative frequency histogram is used to illustrate the distribution of a quantitative variable. To
create a relative frequency histogram, you need data from 1 variable. On the vertical axis of a relative
frequency histogram we have %/proportion/relative frequency/fraction of observations.
11. A time series plot can be created only for a quantitative variable. To create a time series plot, you need
data from 2 variables.
12. For left-skewed data, nearly always mean < median < mode.
13. Quantitative variables can be turned into qualitative variables using the technique called
binning. Qualitative variables can be turned into quantitative variables, but only for ordinal
qualitative variables.
14. Categorical variables can be turned into numerical variables, but only when such categorical
variables are ordinal variables.
15. Pivot chart can be a time series plot.
True
16. Pivot chart can be a pie chart.
True
17. Independence implies no linear association
TRUE
18. Zero correlation implies independence.
False
19. Independence implies zero correlation.
TRUE
20. No linear association implies independence.
False
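A small sketch of why zero correlation does not imply independence. The data are made up for illustration: Y is completely determined by X (Y = X squared), yet the Pearson correlation is zero because the relationship is not linear. The helper name pearson_r is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data: X is symmetric around 0 and Y depends entirely on X.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # zero linear association, although Y is a function of X
```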
21. In regression analysis, interaction is…
a way to capture how the effect of an explanatory variable on the dependent
variable varies depending on the values of another explanatory variable.
22. To model this type of relationship between X and Y, an appropriate regression equation is:
Predicted Y = a + b1 * X + b2 * X^2
23. The two most popular graphical methods to illustrate the distribution of a sample from a categorical
variable are bar and pie charts.
24. For the absolute z-score, if the value is between 0 and 2, the data point is not an outlier.
If the absolute z-score is > 2 but <= 3, it is a possible outlier.
If it is > 3, it is an outlier.
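The z-score rule can be sketched in a few lines. This assumes the |Z| > 3 convention for outliers that these notes use elsewhere; the function names z_score and classify_z are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

def classify_z(z):
    """Classify a data point by the absolute value of its z-score."""
    a = abs(z)
    if a <= 2:
        return "not an outlier"
    if a <= 3:
        return "possible outlier"
    return "outlier"

print(classify_z(-4.56))
print(classify_z(z_score(x=25, mean=10, std_dev=6)))  # z = 2.5
```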
1. Strongly agree/Agree/Neutral/Disagree/Strongly Disagree is an example of
Ordinal Categorical
2. Housing prices is an example of what kind of data?
Numerical Continuous
3. Dummy variable is:
Dichotomous variable
0/1 variable
Categorical variable that takes two values
4. Mean>Median:
Right Skewed
5. Median>Mean:
Left Skewed
6. Sample mean is ________ to outliers
Sensitive
7. Sample median is ________ to outliers
Not sensitive
8. Which command in Excel computes this?
Average()
Average.S()
9. Which command computes the 75th percentile of the data?
Percentile.INC(data,0.75)
Quartile(data,3)
10. COUNTIF does what?
Counts the number of data points that match 1 condition.
11. COUNTIFS does what?
Counts the number of data points that match multiple conditions (1, 2, 3, ...)
12. To create a pie chart what we need to do?
Make the summary table then insert-> pie chart
13. A scatter plot illustrates the relationship between
2 quantitative variables
14. A common method to detect outliers formally is:
Compute the Z-score and see if |Z| > 3
See if any data points are outside the upper fence (UF) or lower fence (LF)
15. An outlier is a data point that is unusual
16. Sample std dev is ________ to outliers
Sensitive
17. Z-score = -4.56
The data point is an outlier.
18. Pivot tables can be used for categorical data and quantitative variables
True
It depends
19. A pivot chart can only be a bar graph
False
20. Pick up the method that will allow you to filter the data
21. What is the Excel filter tool used for?
Filter data, detect data entry errors, sort
22. Which Excel command is used to merge two data sets?
VLOOKUP, XLOOKUP
23. The Data Validation tool can be used in Excel to
Detect data entry errors
Make sure that data is in the correct format.
24. Correct way to use the DATE command in Excel
DATE(YYYY,MM,DD)
25. Independence means zero correlation
True
26. Correlation = 0 means independence
False
27. When correlation = 0 that means X and Y are not related to each other.
False
28. When correlation = 0 that means there is no linear relationship between X and Y
True
Right or left Skewed
# Positive and negative skewed
# Types of Qualitative and Quantitative
# Uni and bimodal
# Time series and cross sectional data
29. Regression value of the following graph
Correlation between X and Y = 0
30. The correlation between X and Y -> 0.4
31. The correlation between X and Y is 0.6
32. The correlation between X and Y is 0.3
33. The correlation between X and Y is 0
34. The correlation between X and Y is -0.4
35. r = -0.85. What is the correct interpretation?
The linear relationship between X and Y is negative but strong.
36. Which of the following is not true about correlation?
Correlation implies causation
37. Cheese consumption is positively correlated with the number of deaths from being tangled in bedsheets.
It is a spurious relationship
38. A spurious relationship occurs when
2 variables are wrongly assumed to be related.
39. Dependent variable = Sales ($000), predictor = Advertising ($00). Yhat = 1.02 + 2.73x
A regression equation
A predictive model
40. Dependent variable = Sales ($000). Predictor = Advertising ($00). Yhat = 1.02 + 2.73x
Interpret the intercept
When the advertising expense = $0, sales are predicted to be $1,020
41. Interpret the slope for $100
Sales are predicted to go up by $2,730
42. Yhat = 1.02 + 2.73x, Y = sales ($000), X = advertising ($00). We invest $800 in advertising.
Projected sales = $22,860
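The projection can be checked with a short sketch. It assumes sales are measured in $000 and advertising in $00, as in question 39; the function name and its parameters are hypothetical.

```python
def predicted_sales_thousands(advertising_dollars, intercept=1.02, slope=2.73):
    """Predicted sales in $000 for a given advertising spend in dollars."""
    x = advertising_dollars / 100  # the model expects advertising in $00
    return intercept + slope * x

sales_thousands = predicted_sales_thousands(800)  # x = 8
sales_dollars = round(sales_thousands * 1000)
print(sales_dollars)
```

With x = 8 the equation gives 1.02 + 2.73 * 8 = 22.86 thousand dollars.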
43. Yhat = 1.02 + 2.73X. For this, R-square = 0.96
This is a good linear predictive model.
We can use advertising expenditure to predict sales well.
44. Y = beer sales ($000), X = outdoor temperature (F). Yhat = 15 + 10x
When temperature is 0F, sales are predicted to be $15,000
45. Y = beer sales ($000), X = outdoor temperature (F). Yhat = 15 + 10ln(x)
When temperature increases by 1%, sales go up by $100
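A quick numerical check of the log-model interpretation, assuming sales are in $000. The starting temperature of 70F is an arbitrary illustrative choice; the change is the same from any positive starting point.

```python
import math

def beer_sales_thousands(temp_f):
    """Yhat = 15 + 10 * ln(x), with sales in $000 and x in degrees F."""
    return 15 + 10 * math.log(temp_f)

base_temp = 70.0  # illustrative value only
change_thousands = beer_sales_thousands(base_temp * 1.01) - beer_sales_thousands(base_temp)
change_dollars = change_thousands * 1000
print(round(change_dollars, 2))  # close to the $100 rule-of-thumb answer
```

The exact change is 10 * ln(1.01) thousand dollars, which the "slope / 100" shortcut rounds to $100.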
46. In multiple linear regression there are several linear regression equations
False
47. In multiple linear regression, we evaluate the predictive power of the linear model
by looking at R-square adjusted and R-square
48. In multiple regression, R-square is always
> R-square adjusted
49. In multiple linear regression, when we add a new variable, R-square
always goes up? True
50. In multiple linear regression, when we add a new variable, R-square
adjusted always goes up?
False
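A sketch of why adjusted R-square can fall when a variable is added, using the standard adjustment formula 1 - (1 - R2)(n - 1)/(n - k - 1). The sample size and R-square values are made up for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical numbers: adding a third variable barely raises R-square.
before = adjusted_r2(0.800, n=30, k=2)
after = adjusted_r2(0.801, n=30, k=3)

print(round(before, 4), round(after, 4))
```

R-square rose from 0.800 to 0.801, but the penalty for the extra variable outweighs that gain, so the adjusted value drops.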
51. Yhat = a + b*Dummy, where Dummy is: 0 = Female and 1 = Male
Intercept a = ybar(female)
52. Yhat = a + b*Dummy, where Dummy is: 0 = Female and 1 = Male. What is the
value of the dummy coefficient b?
ybar(male) - ybar(female)
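A minimal sketch, with made-up data, of what a regression on a single 0/1 dummy returns: the intercept equals the mean of the group coded 0, and the dummy coefficient equals the difference between the two group means. The helper fit_line is a hypothetical least-squares routine.

```python
def fit_line(xs, ys):
    """Simple least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Illustrative data: 0 = Female, 1 = Male.
female = [50, 54, 58]  # mean 54
male = [60, 62, 70]    # mean 64
dummy = [0] * len(female) + [1] * len(male)
y = female + male

a, b = fit_line(dummy, y)
print(a, b)  # a matches the female mean, b the male-minus-female gap
```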
53. Dummy shows the difference in slope, holding all else constant
False
54. Dummy shows the difference in intercept, holding all else constant
True
55. Interaction variable captures the difference in ________, holding all else constant.
Slopes
56. The relationship between X and Y shown in scatterplot is
Quadratic, Second degree polynomial
57. X variable: India, China, USA, Russia, Korea. How many dummies?
4,3,2,1
58. Y = quarterly sales
X variables = Price, Country of origin. You want to see sales per quarter.
Create 4 quarterly dummies and include any 3 in the regression
59. Predicted sales ($000) = 12 + 3*Mon - 2*Tue + 4*Wed + 6*Thurs + 10*Fri - 5*Sat + 10.75*Temp
On Friday, we predict the sales to go up by 4k compared with Thursday
On Tuesday, we predict the sales to be 2k lower than on Sunday
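The day-of-week equation can be sketched as a lookup of dummy coefficients. Sunday is the omitted baseline day, so its coefficient is 0; the temperature of 20 is an arbitrary illustrative value and the names below are hypothetical.

```python
# Dummy coefficients in $000; Sunday is the baseline (no dummy in the equation).
DAY_COEF = {"Sun": 0, "Mon": 3, "Tue": -2, "Wed": 4, "Thu": 6, "Fri": 10, "Sat": -5}

def predicted_sales(day, temp):
    """Predicted sales in $000 for a given day and temperature."""
    return 12 + DAY_COEF[day] + 10.75 * temp

temp = 20  # illustrative; differences between days do not depend on it
fri_vs_thu = predicted_sales("Fri", temp) - predicted_sales("Thu", temp)
tue_vs_sun = predicted_sales("Tue", temp) - predicted_sales("Sun", temp)
print(fri_vs_thu, tue_vs_sun)
```

Each dummy coefficient is read relative to Sunday, and the gap between any two days is the difference of their coefficients.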
60. Which of the following cannot be modeled using logistic regression?
Y = starting salary
Y = ln(starting salary)
61. Mark any one example where logistic regression can be appropriate
62. Y (1 = Like, 0 = Dislike) is linearly dependent on explanatory variables
False
63. Logit is linearly dependent on explanatory variables.
True
64. We can tell X increases the probability of Y = 1 when
the coefficient of X is positive
Types of Relationship
DADM cheat sheet
Tuesday, March 7, 2023
1:08 PM
Slope and intercept
1. Dependent is Y and the other is X
For slope: =SLOPE(known y column, known x column)
For intercept: =INTERCEPT(known y column, known x column)
2. Y = MX + C
For interpreting the intercept, consider X = 0, so C is the intercept value. For example:
When a borrower has zero years of education, FICO score is predicted to be 631.17.
For interpreting the slope, consider X increasing by 1, so M is the slope value. For example:
For every additional year of education, FICO score is predicted to decrease by 0.5153.
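For reference, a rough Python equivalent of what Excel's SLOPE and INTERCEPT compute (a least-squares line). The function names mirror Excel's argument order, y first and then x, but are otherwise hypothetical, and the data are made up.

```python
def slope(known_ys, known_xs):
    """Least-squares slope M in Y = M*X + C."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(known_xs, known_ys))
    sxx = sum((x - mean_x) ** 2 for x in known_xs)
    return sxy / sxx

def intercept(known_ys, known_xs):
    """Least-squares intercept C: predicted Y when X = 0."""
    n = len(known_xs)
    return sum(known_ys) / n - slope(known_ys, known_xs) * sum(known_xs) / n

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # lies exactly on Y = 2X + 1
print(slope(ys, xs), intercept(ys, xs))
```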
MATCH AND INDEX
1. Use INDEX to find the data in a column or row
1. Use an appropriate Excel command to extract the value located in cell C15.
= INDEX ( C:C , 15 , 1 )
2. Use MATCH to find the location (row or column) of the data
1. Use an appropriate Excel command to figure out which row number contains Item No. 9966.
= MATCH ( 9966 , C:C , 0 )
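As a rough analogy only, the two lookups behave like the list operations below. The names excel_index and excel_match and the sample item numbers are hypothetical; positions are 1-based to mirror Excel rows, and only exact matching (match type 0) is imitated.

```python
def excel_index(column, row_num):
    """Like INDEX(column, row_num): return the value at a 1-based position."""
    return column[row_num - 1]

def excel_match(lookup_value, column):
    """Like MATCH(value, column, 0): return the 1-based position of an exact match."""
    return column.index(lookup_value) + 1  # raises ValueError if not found

item_numbers = [1204, 9966, 3310]  # made-up column of item numbers

row = excel_match(9966, item_numbers)
value = excel_index(item_numbers, row)
print(row, value)
```

INDEX goes from a position to a value, and MATCH goes from a value to a position, so the two are often combined.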
+ 4.64 * CUSTOMER_Occasional
+ 2.81 * CUSTOMER_Frequent
− 0.70 * Num. Lipsticks
She is currently in a relationship,
She is a frequent customer of Sephora,
She has purchased 6 lipsticks from Sephora in the past three years.
Solution: Y = -0.59 + 2.05*1 + 4.64*0 + 2.81*1 - 0.70*6 = 0.07
Here Y is the logit. Probability = EXP(Y)/(1+EXP(Y))
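The solution above can be reproduced in a short sketch. The linear score Y is the logit, and EXP(Y)/(1+EXP(Y)) converts it into a probability. The coefficients are copied from the solution line; the variable names are hypothetical.

```python
import math

# Linear score (logit), term by term as in the solution line.
logit = (
    -0.59        # intercept
    + 2.05 * 1   # first indicator in the solution (its name is cut off in the notes)
    + 4.64 * 0   # CUSTOMER_Occasional = 0
    + 2.81 * 1   # CUSTOMER_Frequent = 1
    - 0.70 * 6   # Num. Lipsticks = 6
)

probability = math.exp(logit) / (1 + math.exp(logit))
print(round(logit, 2), round(probability, 4))
```

A logit near 0 corresponds to a probability near 0.5, so this customer is roughly a coin flip.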
Time series and other types
# 1 entry for each year is time series data
# multiple entries for each year is time series cross-sectional data
Right and left Skewed
Types of Qualitative and Quantitative
1. X can take values 1, 2, and 3. Respective probabilities are: 0.5, 0.3, and 0.1.
Impossible because the probabilities do not add up to 1
2. X can take values 1, 2, and 3. Respective probabilities are 0.455, 0.311, and 0.234.
This X variable is discrete
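The check behind the first two questions can be sketched as follows: a discrete distribution is valid only if every probability lies between 0 and 1 and they sum to 1. The function name is hypothetical.

```python
import math

def is_valid_distribution(probs):
    """True if all probabilities are in [0, 1] and they sum to 1."""
    in_range = all(0 <= p <= 1 for p in probs)
    return in_range and math.isclose(sum(probs), 1.0)

print(is_valid_distribution([0.5, 0.3, 0.1]))        # sums to 0.9
print(is_valid_distribution([0.455, 0.311, 0.234]))  # sums to 1
```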
3. You are a risk-averse investor. (You avoid risk.) You prefer to pick two stocks that are...
negatively correlated
4. You are a risk-averse investor. You invest in 2 stocks. Your portfolio risk will ...
go up if the two stocks are positively correlated, go down if the two stocks are negatively
correlated
5. You are a risk-averse investor. You invest in 2 stocks. Your portfolio expected return will ...
not change even if the stocks are correlated
6. X = quarterly sales of pizza. Y = annual sales of pizza. Express Y in terms of X.
Y=X+X+X+X
7. Normal distribution is continuous.
True
8. X follows Normal distribution with mu=10 and sigma=2.
P(X ≥ 15) is equal to P(X > 15)
9. X follows Normal distribution with mu=10 and sigma=2. MEDIAN = __10_____
10. X follows Normal distribution with mu=10 and sigma=2. (Hint: Use Empirical Rule)
P(X > 14) ≈ 0.025
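These normal-distribution answers can be checked with Python's standard library. The Empirical Rule value of 0.025 is an approximation; the exact tail probability is slightly smaller.

```python
from statistics import NormalDist

x = NormalDist(mu=10, sigma=2)

p_above_14 = 1 - x.cdf(14)  # exact tail area beyond 2 standard deviations
p_above_15 = 1 - x.cdf(15)  # same for ">" and ">=", since X is continuous
median = x.median           # equals the mean for a normal distribution

print(round(p_above_14, 4), round(p_above_15, 4), median)
```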
11. Parameter = a characteristic of the entire population; parameter is a constant. Variable
= a characteristic that changes from one sample to the next.
12. Population mean is a .....................
Parameter
13. Sample mean is a .....................
Variable
14. In Central Limit Theorem, sample size is considered large when n ≥ 30.
True
15. The distribution of XBAR is ________ for larger samples (large n).
Narrower
16. Prob(MU-1 < XBAR < MU+1) is ________ for larger samples.
Higher
17. The probability that XBAR is within 10 points of MU depends on the value of MU.
FALSE
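A sketch of questions 15 to 17, assuming XBAR is normally distributed with standard deviation sigma / sqrt(n). The population standard deviation of 10 and the sample sizes are made-up values, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def prob_xbar_within(mu, sigma, n, distance):
    """P(mu - distance < XBAR < mu + distance) for a sample of size n."""
    standard_error = sigma / math.sqrt(n)
    xbar = NormalDist(mu, standard_error)
    return xbar.cdf(mu + distance) - xbar.cdf(mu - distance)

p_small = prob_xbar_within(mu=50, sigma=10, n=25, distance=1)
p_large = prob_xbar_within(mu=50, sigma=10, n=100, distance=1)
p_other_mu = prob_xbar_within(mu=0, sigma=10, n=100, distance=1)

print(round(p_small, 4), round(p_large, 4), round(p_other_mu, 4))
```

The probability rises with n because the distribution of XBAR gets narrower, and it is the same for any MU because only the distance from MU matters.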
18. Confidence interval that we learnt today is an interval for ____________ (xbar / mu).
population mean
19. We estimate population mean ________ (more / less) accurately if we use data from a
larger sample.
More
20. Confidence interval for MU is wider if it's based on a larger sample size. (true / false)
FALSE
Look at the formula for the margin of error: Zα/2 × σ/√n. Sample size n is in the denominator,
so, if n is smaller then the margin of error is higher. So, the confidence interval is wider if it's
based on a smaller sample.
21. Confidence interval for MU is wider if confidence level is higher. (true / false)
TRUE
Conf. level determines the Zα/2 value in the formula. Higher conf. level ⇨ higher Z, so the
C.I. for MU is wider if confidence level is higher. (Recall from lecture: you are 99.999% confident that
the first person you see will be 4 to 9 feet tall.)
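Both effects on interval width can be seen from the margin-of-error formula (Z times the standard deviation divided by the square root of n). This sketch assumes a known population standard deviation; the value of 10, the sample sizes, and the function name are illustrative.

```python
import math
from statistics import NormalDist

def margin_of_error(conf_level, sigma, n):
    """Z(alpha/2) * sigma / sqrt(n) for a two-sided confidence interval."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return z * sigma / math.sqrt(n)

me_95 = margin_of_error(0.95, sigma=10, n=100)
me_99 = margin_of_error(0.99, sigma=10, n=100)       # higher confidence
me_95_big = margin_of_error(0.95, sigma=10, n=400)   # larger sample

print(round(me_95, 3), round(me_99, 3), round(me_95_big, 3))
```

The interval is XBAR plus or minus this margin, so a higher confidence level widens it and a larger sample narrows it.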
22. Confidence interval for MU is: [$3,000 to $5,000]. Margin of error = ______________ .
$1,000
23. Confidence interval for MU is: [$3,000 to $5,000]. Sample mean (XBAR) =
______________ .
$4,000
Recall: XBAR is the point estimate of MU and lies exactly in the center of the interval. So, it's
4,000.
24. We can estimate the population mean (μ) more accurately if... (pick any one that is
correct)
we have collected a large sample of data, the confidence interval is narrow
25. 99% C.I. for mu is: [ 3 , 10 ]. Interpret this interval.
I'm 99% confident that population mean (mu) is between 3 and 10., The probability that
population mean (mu) is between 3 and 10 is 0.99
26. The confidence interval in the previous question was an interval for ...
population mean
Confidence interval is ALWAYS for population something (e.g., population mean, difference
in population means).
27. The interval [-13.5 , 28.7] is the confidence interval for... (2 correct answers)
Population mean, Difference between population means
28. 95% confidence interval for mu1-mu2 is: [0.53, 4.79]. What's the conclusion?
Population mean is higher for group 1 than for group 2
29. 95% confidence interval for mu1-mu2 is: [-1.79, -0.05]. What's the conclusion?
Population mean is higher for group 2 than for group 1
30. 95% confidence interval for mu1-mu2 is: [-0.05, 10.84]. What's the conclusion?
Inconclusive
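The rule used in questions 28 to 30 can be sketched as a check on whether the interval for mu1-mu2 contains 0. The function name is hypothetical.

```python
def conclude(lower, upper):
    """Conclusion from a confidence interval for mu1 - mu2."""
    if lower > 0:
        return "Population mean is higher for group 1 than for group 2"
    if upper < 0:
        return "Population mean is higher for group 2 than for group 1"
    return "Inconclusive"  # the interval contains 0

print(conclude(0.53, 4.79))
print(conclude(-1.79, -0.05))
print(conclude(-0.05, 10.84))
```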
31. 95% confid. interval for mu1-mu2 is: [-0.05, 10.84]. What can we do to REVERSE the
conclusion?
collect larger samples
32. 95% confid. interval for mu1-mu2 is: [-0.05, 10.84]. What can we do to REVERSE the
conclusion?
decrease confidence level
33. We have 2 INDEPENDENT samples. n1=10, n2=15. What degrees of freedom should we
use to construct C.I. for μ1-μ2? (number)
23
34. We have 2 MATCHED samples. n1=10, n2=10. What degrees of freedom should we use
to construct C.I. for μ1-μ2? (number)
9
We have 2 matched samples. Each pair of data is an observation. We have a total of 10
observations (pairs that can't be broken). d.f.= n-1 = 10-1 = 9. If you're not convinced,
review how we solved the Sales Presentations problem and took differences.
35. Which of these hypotheses is the RESEARCH HYPOTHESIS (i.e., the hypothesis that we
want to test)?
Alternative hypothesis
36. "Alpha" is a probability that is typically...
very low
37. When p-value < alpha, ...
We reject the null hypothesis
38. When p-value < alpha, there is
sufficient evidence to support alternative hypothesis
39. In Excel, to compute p-value, which command do we use?
norm.dist, t.dist
40. p-value for a regression coefficient is associated with a _________ test.
Two-tailed
41. p-values in a regression show
how well each individual X variable predicts Y linearly
42. In regression output, when p-value is close to 0, we say that this explanatory variable
this variable's coefficient is statistically significant, is a good linear predictor of the
dependent variable
43. In regression output, when p-value is high, we say that this explanatory variable
is not statistically significant, is a poor linear predictor of the dependent variable
44. STEPWISE REGRESSION. We add X2 to our regression. p-value of the coefficient is
0.0269. α=5%.
Keep this variable X2
45. BACKWARD ELIMINATION. A variable X5 is in our regression. p-value of the coefficient is
0.3274. α=5%.
Drop this variable X5
46. To create a cubic trend model, the explanatory variables in time-series regression must
include:
Trend t, t^2, t^3
47. Pick ALL MODELS that have a non-linear trend. (TO GET FULL CREDIT, YOU NEED TO
CLICK ON ALL CORRECT ANSWERS.)
Cubic trend model, Quadratic trend model, Exponential trend model
48. How do we capture seasonal effects in time-series regression models?
include dummies
49. To forecast this monthly sales data < picture >, regression should include ________ .
11 dummies
50. To forecast quarterly Amazon sales, regression should include ____________ .
3 dummies
51. To forecast this quarterly Amazon sales, regression should include _____________
trend and 3 dummies
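A sketch of how the explanatory variables for a quarterly model with trend and 3 dummies might be laid out. Using Q4 as the omitted baseline quarter is an assumption made here (any one quarter can be left out), and the function names are hypothetical.

```python
def quarter_dummies(quarter):
    """Dummies for Q1, Q2, Q3; Q4 is the baseline and gets all zeros."""
    return [1 if quarter == q else 0 for q in (1, 2, 3)]

def design_row(t, quarter):
    """One row of explanatory variables: trend t followed by 3 dummies."""
    return [t] + quarter_dummies(quarter)

# Two years of quarterly periods: t = 1..8, quarters cycle 1, 2, 3, 4.
rows = [design_row(t, (t - 1) % 4 + 1) for t in range(1, 9)]
for row in rows:
    print(row)
```

Monthly data follow the same pattern with 11 dummies instead of 3.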
52. To capture the evolution of this stock price data, we need to include ________ trend.
53. For the HOUSE CONSTRUCTION regression model from today's lecture, the interaction
variable was:
trend * before 2005/after 2005 dummy
54. Autoregressive model AR(5) means that the model includes __________ lags. (type your
numerical answer)
5
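To illustrate what "5 lags" means, this sketch builds the explanatory variables for an AR(5) model: each observation is paired with the 5 values that precede it. The series is made up and the function name is hypothetical.

```python
def make_lagged_rows(series, p):
    """Pair each value with its p previous values (lag 1 first)."""
    rows = []
    for t in range(p, len(series)):
        lags = [series[t - k] for k in range(1, p + 1)]
        rows.append((lags, series[t]))
    return rows

series = [10, 12, 13, 15, 14, 16, 18, 17]  # illustrative data
rows = make_lagged_rows(series, p=5)
for lags, target in rows:
    print(lags, "->", target)
```

The first 5 observations have no complete set of lags, so they cannot be used as targets.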