Quiz 2

CORRELATION & REGRESSION
MULTIPLE CHOICE QUESTIONS
In the following multiple-choice questions, select the best answer.
1. The correlation coefficient is used to determine:

a. A specific value of the y-variable given a specific value of the x-variable
b. A specific value of the x-variable given a specific value of the y-variable
c. The strength of the relationship between the x and y variables
d. None of these
2. If there is a very strong correlation between two variables then the correlation coefficient must be
a. any value larger than 1
b. much smaller than 0, if the correlation is negative
c. much larger than 0, regardless of whether the correlation is negative or positive
d. None of these alternatives is correct.
3. In regression, the equation that describes how the response variable (y) is related to the
explanatory variable (x) is:
a. the correlation model
b. the regression model
c. used to compute the correlation coefficient
4. The relationship between number of beers consumed (x) and blood alcohol content (y) was studied
in 16 male college students by using least squares regression. The following regression equation
was obtained from this study:
ŷ = -0.0127 + 0.0180x
The above equation implies that:

a. each beer consumed increases blood alcohol by 1.27%
b. on average it takes 1.8 beers to increase blood alcohol content by 1%
c. each beer consumed increases blood alcohol by an average amount of 1.8%
d. each beer consumed increases blood alcohol by exactly 0.018
5. SSE can never be

a. larger than SST
b. smaller than SST
c. equal to 1
d. equal to zero
6. Regression modeling is a statistical framework for developing a mathematical equation that
describes how
a. one explanatory and one or more response variables are related
b. several explanatory and several response variables are related
c. one response and one or more explanatory variables are related
d. All of these are correct.
7. In regression analysis, the variable that is being predicted is the

a. response, or dependent, variable
b. independent variable
c. intervening variable
d. is usually x
8. Regression analysis was applied to return rates of sparrowhawk colonies. Regression analysis was
used to study the relationship between return rate (x: % of birds that return to the colony in a given
year) and immigration rate (y: % of new adults that join the colony per year). The following
regression equation was obtained.
ŷ = 31.9 – 0.34x
Based on the above estimated regression equation, if the return rate were to decrease by 10% the
rate of immigration to the colony would:
a. increase by 34%
b. increase by 3.4%
c. decrease by 0.34%
d. decrease by 3.4%
9. In least squares regression, which of the following is not a required assumption about the error
term ε?
a. The expected value of the error term is one.
b. The variance of the error term is the same for all values of x.
c. The values of the error term are independent.
d. The error term is normally distributed.
10. Larger values of r2 (R2) imply that the observations are more closely grouped about the
a. average value of the independent variables
b. average value of the dependent variable
c. least squares line
d. origin
11. In a regression analysis if r2 = 1, then

a. SSE must also be equal to one
b. SSE must be equal to zero
c. SSE can be any positive value
d. SSE must be negative
12. The coefficient of correlation
a. is the square of the coefficient of determination
b. is the square root of the coefficient of determination
c. is the same as r-square
d. can never be negative
13. In regression analysis, the variable that is used to explain the change in the outcome of an
experiment, or some natural process, is called
a. the x-variable
b. the independent variable
c. the predictor variable
d. the explanatory variable
e. all of the above (a-d) are correct
f. none are correct
14. In the case of an algebraic model for a straight line, if a value for the x variable is specified, then
a. the exact value of the response variable can be computed
b. the computed response to the independent value will always give a minimal residual
c. the computed value of y will always be the best estimate of the mean response
d. none of these alternatives is correct.
15. A regression analysis between sales (in $1000) and price (in dollars) resulted in the following
equation:
ŷ = 50,000 - 8X
The above equation implies that an

a. increase of $1 in price is associated with a decrease of $8 in sales
b. increase of $8 in price is associated with an increase of $8,000 in sales
c. increase of $1 in price is associated with a decrease of $42,000 in sales
d. increase of $1 in price is associated with a decrease of $8000 in sales
16. In a regression and correlation analysis if r2 = 1, then

a. SSE = SST
b. SSE = 1
c. SSR = SSE
d. SSR = SST
17. If the coefficient of determination is a positive value, then the regression equation
a. must have a positive slope
b. must have a negative slope
c. could have either a positive or a negative slope
d. must have a positive y intercept
18. If two variables, x and y, have a very strong linear relationship, then
a. there is evidence that x causes a change in y
b. there is evidence that y causes a change in x
c. there might not be any causal relationship between x and y
19. If the coefficient of determination is equal to 1, then the correlation coefficient

a. must also be equal to 1
b. can be either -1 or +1
c. can be any value between -1 to +1
d. must be -1
20. In regression analysis, if the independent variable is measured in kilograms, the dependent
variable
a. must also be in kilograms
b. must be in some unit of weight
c. cannot be in kilograms
d. can be any units
21. The data are the same as for question 4 above. The relationship between number of beers
consumed (x) and blood alcohol content (y) was studied in 16 male college students by using least
squares regression. The following regression equation was obtained from this study:
ŷ = -0.0127 + 0.0180x
Suppose that the legal limit to drive is a blood alcohol content of 0.08. If Ricky consumed 5 beers
the model would predict that he would be:
a. 0.09 above the legal limit
b. 0.0027 below the legal limit
c. 0.0027 above the legal limit
d. 0.0733 above the legal limit
22. In a regression analysis if SSE = 200 and SSR = 300, then the coefficient of determination is
a. 0.6667
b. 0.6000
c. 0.4000
d. 1.5000
23. If the correlation coefficient is 0.8, the percentage of variation in the response variable explained
by the variation in the explanatory variable is
a. 0.80%
b. 80%
c. 0.64%
d. 64%
24. If the correlation coefficient is a positive value, then the slope of the regression line
a. must also be positive
b. can be either negative or positive
c. can be zero
d. can not be zero
25. If the coefficient of determination is 0.81, the correlation coefficient

a. is 0.6561
b. could be either + 0.9 or - 0.9
c. must be positive
d. must be negative
26. A fitted least squares regression line

a. may be used to predict a value of y if the corresponding x value is given
b. is evidence for a cause-effect relationship between x and y
c. can only be computed if a strong linear relationship exists between x and y
27. Regression analysis was applied between $ sales (y) and $ advertising (x) across all the branches
of a major international corporation. The following regression function was obtained.
ŷ = 5000 + 7.25x
If the advertising budgets of two branches of the corporation differ by $30,000, then what will be
the predicted difference in their sales?
a. $217,500
b. $222,500
c. $5000
d. $7.25
28. Suppose the correlation coefficient between height (as measured in feet) versus weight (as
measured in pounds) is 0.40. What is the correlation coefficient of height measured in inches
versus weight measured in ounces? [12 inches = one foot; 16 ounces = one pound]
a. 0.40
b. 0.30
c. 0.533
d. cannot be determined from information given
e. none of these
29. Assume the same variables as in question 28 above; height is measured in feet and weight is
measured in pounds. Now, suppose that the units of both variables are converted to metric (meters
and kilograms). The impact on the slope is:
a. the sign of the slope will change
b. the magnitude of the slope will change
c. both a and b are correct
d. neither a nor b are correct
30. Suppose that you have carried out a regression analysis where the total variance in the response is
133452 and the correlation coefficient was 0.85. The residual sum of squares is:
a. 37032.92
b. 20017.8
c. 113434.2
d. 96419.07
e. 15%
f. 0.15
31. This question is related to questions 4 and 21 above. The relationship between number of beers
consumed (x) and blood alcohol content (y) was studied in 16 male college students by using least
squares regression. The following regression equation was obtained from this study:
ŷ = -0.0127 + 0.0180x
Another guy, named Dudley, has the regression equation written on a scrap of paper in his
pocket. Dudley goes out drinking and has 4 beers. He calculates that he is under the legal limit
(0.08) so he decides to drive to another bar. Unfortunately Dudley gets pulled over and
confidently submits to a road-side blood alcohol test. He scores a blood alcohol of 0.085 and gets
himself arrested. Obviously, Dudley skipped the lecture about residual variation. Dudley’s
residual is:
a. +0.005
b. -0.005
c. +0.0257
d. -0.0257
32. You have carried out a regression analysis; but, after thinking about the relationship between
variables, you have decided you must swap the explanatory and the response variables. After
refitting the regression model to the data you expect that:
a. the value of the correlation coefficient will change
b. the value of SSE will change
c. the value of the coefficient of determination will change
d. the sign of the slope will change
e. nothing changes
33. Suppose you use regression to predict the height of a woman’s current boyfriend by using her own
height as the explanatory variable. Height was measured in feet from a sample of 100 women
undergraduates, and their boyfriends, at Dalhousie University. Now, suppose that the height of
both the women and the men are converted to centimeters. The impact of this conversion on the
slope is:
a. the sign of the slope will change
b. the magnitude of the slope will change
c. both a and b are correct
d. neither a nor b are correct
34. A residual plot:
a. displays residuals of the explanatory variable versus residuals of the response variable.
b. displays residuals of the explanatory variable versus the response variable.
c. displays explanatory variable versus residuals of the response variable.
d. displays the explanatory variable versus the response variable.
e. displays the explanatory variable on the x axis versus the response variable on the y axis.
35. When the error terms have a constant variance, a plot of the residuals versus the independent
variable x has a pattern that
a. fans out
b. funnels in
c. fans out, but then funnels in
d. forms a horizontal band pattern
e. forms a linear pattern that can be positive or negative
36. You studied the impact of the dose of a new drug treatment for high blood pressure. You think
that the drug might be more effective in people with very high blood pressure. Because you
expect a bigger change in those patients who start the treatment with high blood pressure, you use
regression to analyze the relationship between the initial blood pressure of a patient (x) and the
change in blood pressure after treatment with the new drug (y). If you find a very strong positive
association between these variables, then:
a. there is evidence that the higher the patient's initial blood pressure, the bigger the impact
of the new drug.
b. there is evidence that the higher the patient's initial blood pressure, the smaller the impact
of the new drug.
c. there is evidence for an association of some kind between the patient's initial blood
pressure and the impact of the new drug on the patient's blood pressure
d. none of these are correct, this is a case of regression fallacy
Question 37:
A variety of summary statistics were collected for a small sample (10) of bivariate data, where the
dependent variable was y and an independent variable was x.
ΣX = 90        Σ(Y − Ȳ)(X − X̄) = 466
ΣY = 170       Σ(X − X̄)² = 234
n = 10         Σ(Y − Ȳ)² = 1434
SSE = 505.98
37.1 Use the formula r = Σ(X − X̄)(Y − Ȳ) / √[Σ(X − X̄)² · Σ(Y − Ȳ)²] to compute the sample
correlation coefficient:
a. 0.8045
b. -0.8045
c. 0
d. 1
37.2 The least squares estimate of b1 equals
a. 0.923
b. 1.991
c. -1.991
d. -0.923
37.3 The least squares estimate of b0 equals

a. 0.923
b. 1.991
c. -1.991
d. -0.923
37.4 The sum of squares due to regression (SSR) is

a. 1434
b. 505.98
c. 50.598
d. 928.02
37.5 The coefficient of determination equals

a. 0.6471
b. -0.6471
c. 0
d. 1
37.6 The point estimate of y when x = 0.55 is

a. 0.17205
b. 2.018
c. 1.0905
d. -2.018
e. -0.17205
MULTIPLE CHOICE ANSWERS
 1. c    11. b    21. b    31. c      37.5 a
 2. b    12. b    22. b    32. b      37.6 a
 3. b    13. e    23. d    33. d
 4. c    14. a    24. a    34. c
 5. a    15. d    25. b    35. d
 6. c    16. d    26. a    36. d
 7. a    17. c    27. a    37.1 a
 8. b    18. c    28. a    37.2 b
 9. a    19. b    29. b    37.3 d
10. c    20. d    30. a    37.4 d
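The Question 37 entries in the key can be checked from the summary statistics with the usual simple-regression formulas. The Python sketch below is illustrative only: the variable names are assumptions introduced here, and Python is used purely for convenience.

```python
import math

# Summary statistics given in Question 37
n = 10
sum_x, sum_y = 90, 170
sxy = 466      # sum of (Y - Ybar)(X - Xbar)
sxx = 234      # sum of (X - Xbar)^2
syy = 1434     # sum of (Y - Ybar)^2, i.e. SST
sse = 505.98

r = sxy / math.sqrt(sxx * syy)       # 37.1 sample correlation coefficient
b1 = sxy / sxx                       # 37.2 least squares slope
b0 = sum_y / n - b1 * sum_x / n      # 37.3 least squares intercept
ssr = syy - sse                      # 37.4 SSR = SST - SSE
r2 = ssr / syy                       # 37.5 coefficient of determination

# 37.6 point estimate at x = 0.55, using coefficients rounded to three decimals
y_hat = round(b0, 3) + round(b1, 3) * 0.55

print(r, b1, b0, ssr, r2, y_hat)
```

The computed values agree with the keyed answers to about three decimal places.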
1. The identification problem refers to the difficulties that a researcher encounters when
trying to
a. determine which independent variables influence quantity demanded.
b. find accurate data on the price of a commodity and on the quantity
demanded of a commodity.
c. estimate a demand function from data on commodity price and quantity
demanded.
d. measure the impact of extraneous variables on experimental market data.
2. The estimation of consumer demand by questioning a sample of consumers is referred to
as the
a. consumer survey approach.
b. observational research approach.
c. consumer clinic approach.
d. market experiment approach.
3. The estimation of consumer demand by setting up simulated stores, providing a sample of
consumers with money, and then allowing them to purchase and keep the commodities
they select in the stores is called the
4. The estimation of consumer demand by monitoring actual purchasing and consumption
behavior by a sample of consumers is called the
5. If the t ratio for the slope of a simple linear regression equation is -2.48 and the critical
values of the t distribution at the 1% and 5% levels, respectively, are 3.499 and 2.365,
then the slope is
a. not significantly different from zero.
b. significantly different from zero at both the 1% and the 5% levels.
c. significantly different from zero at the 1% level but not at the 5% level.
d. significantly different from zero at the 5% level but not at the 1% level.
6. Ordinary least squares is used to estimate a linear relationship between a firm's quantity
sold per month and its total promotional expenditures and the slope of the linear function is
found to be positive and significantly different from zero. Assuming that all other variables,
including product price, were constant during the period covered by the data set, this result
implies that
a. the firm should spend more on promotional expenditures.
b. the firm should spend less on promotional expenditures.
c. promotional expenditures influence demand.
d. promotional expenditures have no influence on demand.
7. Ordinary least squares is used to estimate a linear relationship between a firm's total
revenue per week (in $1,000s) and the average percentage discount from list price allowed
to customers by salespersons. A 95% confidence interval on the slope is calculated from
the regression output. The interval ranges from 1.05 to 2.38. Based on this result, the
researcher
a. can conclude that the slope is significantly different from zero at the 5%
level of significance.
b. can be 95% confident that the effect of a 1% increase in the average price
discount will increase weekly total revenue by between $1,050 and $2,380.
c. has one chance in twenty of incorrectly concluding that the slope is within
the estimated confidence interval.
d. All of the above are correct.
8. The coefficient of determination
a. is maximized by ordinary least squares.
b. has a value between zero and one.
c. will generally increase if additional independent variables are added to a
regression analysis.
9. The coefficient of correlation is
a. a measure of the strength and direction of the linear relationship between
two variables.
b. equal to the size of the change in the Y variable that is caused by a change
in the X variable.
c. equal to the proportion of the variation in the Y variable that is due to
variations in the X variable.
10. Multiple regression analysis is used when
a. there is not enough data to carry out simple linear regression analysis.
b. the dependent variable depends on more than one independent variable.
c. one or more of the assumptions of simple linear regression are not correct.
d. the relationship between the dependent variable and the independent
variables cannot be described by a linear function.
11. The adjusted value of the coefficient of determination
a. will always increase if additional independent variables are added to the
regression model.
b. is equal to the proportion of the sum of the squared deviations of the
dependent variable from its mean that is explained by the regression model.
c. is always greater than the proportion of the sum of the squared deviations
of the dependent variable from its mean that is explained by the regression model.
d. is always less than the proportion of the sum of the squared deviations of
the dependent variable from its mean that is explained by the regression model.
12. If the F test statistic for a regression is greater than the critical value from the F
distribution, it implies that
a. none of the independent variables in the regression model have a
significant effect on the dependent variable.
b. all of the independent variables in the regression model have significant
effects on the dependent variable.
c. one or more of the independent variables in the regression model have a
significant effect on the dependent variable.
d. None of the above is correct.
13. The standard error of the regression measures the
a. variability of the independent variable(s) relative to its (their) mean.
b. variability of the dependent variable relative to its mean.
c. variability of the dependent variable relative to the regression line.
d. average error that will result if the regression line is used to predict.
14. Multicollinearity refers to a situation in which
a. successive error terms derived from the application of regression analysis
to time series data are correlated.
b. there is a high degree of correlation between the independent variables
included in a multiple regression model.
c. the dependent variable is highly correlated with the independent variable(s)
in a regression analysis.
d. the application of a multiple regression model yields estimates that are
nonlinear in form.
15. Autocorrelation refers to a situation in which
a. successive error terms derived from the application of regression analysis
to time series data are correlated.
b. there is a high degree of correlation between two or more of the
independent variables included in a multiple regression model.
c. the dependent variable is highly correlated with the independent variable(s)
in a regression analysis.
d. the application of a multiple regression model yields estimates that are
nonlinear in form.
16. Heteroskedasticity refers to a situation in which the error terms from a regression analysis
a. do not have equal variance.
b. are not normally distributed.
c. do not have a mean of zero.
17. The Durbin-Watson statistic is used to test for
a. multicollinearity.
b. autocorrelation.
c. heteroskedasticity.
18. Autocorrelation may be the result of
a. the omission of an important explanatory variable.
b. the presence of a trend in the independent variable.
c. nonlinearities in the relationship between the dependent and independent
variables.
19. One advantage of estimating a function in which all variables have been transformed into
their natural logarithms is that
a. problems with multicollinearity will be eliminated.
b. problems with heteroskedasticity will be eliminated.
c. the estimated slope coefficients are all elasticities.
d. None of the above is correct.
20. One difference between foreign and domestic demand for a commodity exported by the
U.S. is that
a. foreign demand is unrelated to the dollar price of the commodity.
b. foreign demand depends on the exchange rate between domestic and
foreign currencies.
c. the domestic price elasticity of demand depends on the availability of
substitute commodities.
d. foreign-made commodities are not good substitutes for U.S. made
commodities.
This set of R Programming Language Multiple Choice Questions & Answers (MCQs)
focuses on “Linear Regression – 2”.
1. In practice, Line of best fit or regression line is found when _____________

a) Sum of residuals (∑(Y – h(X))) is minimum
b) Sum of the absolute value of residuals (∑|Y-h(X)|) is maximum
c) Sum of the squares of residuals (∑(Y – h(X))²) is minimum
d) Sum of the squares of residuals (∑(Y – h(X))²) is maximum
Answer: c
Explanation: Here we penalize higher error value much more as compared to the smaller
one, such that there is a significant difference between making big errors and small
errors, which makes it easy to differentiate and select the best fit line.
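The least squares criterion can be illustrated with a short Python sketch. It fits a line using the closed-form slope and intercept, then compares its sum of squared residuals with that of a few nearby lines. The data points and the helper names fit_line and sse are made up for illustration and do not come from the quiz.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sse(xs, ys, intercept, slope):
    """Sum of squared residuals for the line h(x) = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data, not taken from the quiz
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
best = sse(xs, ys, a, b)

# Nudging the intercept or slope away from the fitted values increases the SSE
nudged = [sse(xs, ys, a + da, b + db) for da in (-0.5, 0.5) for db in (-0.2, 0.2)]

# The raw residuals of the fitted line cancel out, so their plain sum is
# not a useful fitting criterion
raw_sum = sum(y - (a + b * x) for x, y in zip(xs, ys))

print(best, min(nudged), raw_sum)
```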
2. If a linear regression model fits perfectly, i.e., train error is zero, then
_____________________
a) Test error is also always zero
b) Test error is non zero
c) Couldn’t comment on Test error
d) Test error is equal to Train error
Answer: c
Explanation: Test Error depends on the test data. If the Test data is an exact
representation of train data then test error is always zero. But this may not be the case.
3. Which of the following metrics can be used for evaluating regression models?
i) R Squared
ii) Adjusted R Squared
iii) F Statistics
iv) RMSE / MSE / MAE
a) ii and iv
b) i and ii
c) ii, iii and iv
d) i, ii, iii and iv
Answer: d
Explanation: These (R Squared, Adjusted R Squared, F Statistics, RMSE / MSE / MAE)
are some metrics which you can use to evaluate your regression model.
4. How many coefficients do you need to estimate in a simple linear regression model
(One independent variable)?
a) 1
b) 2
c) 3
d) 4
Answer: b
Explanation: In simple linear regression, there is one independent variable so 2
coefficients (Y=a+bx+error).
5. In a simple linear regression model (one independent variable), if we change the input
variable by 1 unit, by how much will the output variable change?
a) by 1
b) no change
c) by intercept
d) by its slope
Answer: d
Explanation: For linear regression Y=a+bx+error. If we neglect the error, then Y=a+bx. If x
increases by 1, then Y = a+b(x+1), which implies Y=a+bx+b. So Y increases by its slope.
6. Function used for linear regression in R is __________
a) lm(formula, data)
b) lr(formula, data)
c) lrm(formula, data)
d) regression.linear(formula, data)
Answer: a
Explanation: lm(formula, data) fits a linear model in which formula is an object of
the class “formula”, representing the relation between variables. The formula is then
applied to the data to create a relationship model.
7. In syntax of linear model lm(formula,data,..), data refers to ______
a) Matrix
b) Vector
c) Array
d) List
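Answer: d
Explanation: The formula describes the relationship between the variables and is evaluated
against data. In lm(), data is an optional data frame, list, or environment; a data frame is
itself a list of equal-length columns, and in practice a data.frame is used.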
8. In the mathematical Equation of Linear Regression Y = β1 + β2X + ϵ, (β1, β2) refers to
__________
a) (X-intercept, Slope)
b) (Slope, X-Intercept)
c) (Y-Intercept, Slope)
d) (slope, Y-Intercept)
Answer: c
Explanation: The Y-intercept is β1 and the X-intercept is –(β1 / β2). An intercept is the
point at which the line crosses the corresponding axis.
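Both intercepts follow directly from Y = β1 + β2X: setting X = 0 gives the Y-intercept, and setting Y = 0 gives the X-intercept. A minimal Python sketch, with a hypothetical helper named intercepts and made-up coefficients:

```python
def intercepts(beta1, beta2):
    """Return (y_intercept, x_intercept) of the line Y = beta1 + beta2 * X."""
    if beta2 == 0:
        raise ValueError("a horizontal line has no single X-intercept")
    return beta1, -beta1 / beta2


# Hypothetical coefficients chosen for illustration
y_int, x_int = intercepts(beta1=4.0, beta2=-2.0)
print(y_int, x_int)
```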
1. ________ is an incredibly powerful tool for analyzing data.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Answer: a
Explanation: Linear regression is an incredibly powerful tool for analysing data. It
focuses on finding one of the simplest types of relationship: linear. This process is
unsurprisingly called linear regression, and it has many applications.
2. The square of the correlation coefficient, r², is never negative and is called the
________
a) Regression
b) Coefficient of determination
c) KNN
d) Algorithm
Answer: b
Explanation: The square of the correlation coefficient, r², is never negative and
is called the coefficient of determination. It is also equal to the proportion of the total
variability that’s explained by a linear model.
3. Predicting y for a value of x that’s outside the range of values we actually saw for x in
the original data is called ___________
a) Regression
b) Extrapolation
c) Interpolation
d) Polation
Answer: b
Explanation: Predicting y for a value of x that is within the interval of points that we saw
in the original data is called interpolation. Predicting y for a value of x that’s outside the
range of values we actually saw for x in the original data is called extrapolation.
4. What is predicting y for a value of x that is within the interval of points that we saw in
the original data called?
a) Regression
b) Extrapolation
c) Interpolation
d) Polation
Answer: c
Explanation: Predicting y for a value of x that is within the interval of points that we saw
in the original data is called interpolation. Predicting y for a value of x that’s outside the
range of values we actually saw for x in the original data is called extrapolation.
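The distinction comes down to whether the new x lies inside the range of the observed x values. A small Python sketch, where the function name prediction_kind and the sample values are hypothetical:

```python
def prediction_kind(x_new, observed_xs):
    """Label a prediction by where x_new falls relative to the observed x range."""
    lo, hi = min(observed_xs), max(observed_xs)
    return "interpolation" if lo <= x_new <= hi else "extrapolation"


observed = [1, 2, 3, 5, 8]   # hypothetical x values from the original data
print(prediction_kind(4, observed))
print(prediction_kind(12, observed))
```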
5. Analysis of variance in short form is?
a) ANOV
b) AVA
c) ANOVA
d) ANVA
Answer: c
Explanation: If the ANOVA test determines that the model explains a significant portion
of the variability in the data, then we can consider testing each of the hypotheses and
correcting for multiple comparisons.
6. ________ is a simple approach to supervised learning. It assumes that the
dependence of Y on X1, X2, . . . Xp is linear.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Answer: a
Explanation: Linear regression is a simple approach to supervised learning. It assumes
that the dependence of Y on X1, X2, . . . Xp is linear. Linear regression is an incredibly
powerful tool for analysing data.
7. Although it may seem overly simplistic, _______ is extremely useful both conceptually
and practically.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Answer: a
Explanation: Linear regression is a simple approach to supervised learning. It assumes
that the dependence of Y on X1, X2, . . . Xp is linear. Linear regression is an incredibly
powerful tool for analysing data.
8. When there is more than one independent variable in the model, the linear
model is termed a _______
a) Unimodal
b) Multiple model
c) Multiple Linear model
d) Multiple Logistic model
Answer: c
Explanation: When there is more than one independent variable in the model, the
linear model is termed the multiple linear regression model.
9. The parameter β0 is termed the intercept term and the parameter β1 is termed the
slope parameter. These parameters are usually called _________
a) Regressionists
b) Coefficients
c) Regressive
d) Regression coefficients
Answer: d
Explanation: The parameter β0 is termed the intercept term and the parameter β1 is
termed the slope parameter. These parameters are usually called regression
coefficients.
10. Minimizing the sum of squares of the differences between the observations and the line
in the horizontal direction of the scatter diagram to obtain the estimates is
generally called the:
a) reverse regression method
b) formal regression
c) logistic regression
d) simple regression
Answer: a
Explanation: The sum of squares of the differences between the observations and the line
in the horizontal direction in the scatter diagram can be minimized to obtain the estimates
of β0 and β1. This is generally called a reverse or inverse regression method.
11. ______ regression method is also known as the ordinary least squares estimation.
a) Simple
b) Direct
c) Indirect
d) Mutual
Answer: b
Explanation: The direct regression method is also known as ordinary least squares
estimation. It assumes that a set of n paired observations is available which satisfy the
linear regression model.
12. __________ refers to a group of techniques for fitting and studying the straight-line
relationship between two variables.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Answer: a
Explanation: Linear regression is an incredibly powerful tool for analysing data. It
focuses on finding one of the simplest types of relationship: linear. This process is
unsurprisingly called linear regression, and it has many applications.
13. In order to calculate confidence intervals and hypothesis tests, it is assumed that the
errors are independent and normally distributed with mean zero and _______
a) Mean
b) Variance
c) SD
d) KNN
Answer: b
Explanation: In order to calculate confidence intervals and hypothesis tests, it is
assumed that the errors are independent and normally distributed with mean zero and
constant variance.
14. How does linear regression treat a curvilinear relationship?
a) consider
b) ignore
c) may be considered
d) sometimes consider
Answer: b
Explanation: Linear regression models the straight-line relationship between Y and X.
Any curvilinear relationship is ignored. This assumption is most easily evaluated by using
a scatter plot.
15. When hypothesis tests and confidence limits are to be used, the residuals are
assumed to follow the __________distribution.
a) Formal
b) Mutual
c) Normal
d) Abnormal
Answer: c
Explanation: When hypothesis tests and confidence limits are to be used, the residuals
are assumed to follow the normal distribution.
1. Which of the following converts a matrix of phi coefficients to polychoric correlations?
a) poly()
b) qline()
c) phi2poly
d) multi.plot()
Answer: c
Explanation: In statistics, polychoric correlation is a technique for estimating the
correlation between two theorized normally distributed continuous latent variables, from
two observed ordinal variables.
2. Which of the following is used to plot multiple histograms?
a) multi.plot()
b) multi.hist
c) xyplot.multi()
d) poly()
Answer: b
Explanation: A histogram is a graphical representation of the distribution of numerical
data.
3. Which of the following counts the number of good cases when doing pairwise analysis?
a) count.pairwise
b) count() +
c) anova.para()
d) count.poly()
Answer: a
Explanation: Pairwise comparison generally is any process of comparing entities in pairs
to judge which of each entity is preferred.
4. Which of the following gives a summary of values such as the mean?
a) mean
b) sd
c) describe
d) lm
Answer: c
Explanation: describe gives the mean, sd, skew, n, and se.
5. The purpose of correct.cor is to correct _________ in values.
a) difference
b) reliability
c) error
d) similar
Answer: b
Explanation: correct.cor takes a correlation matrix and a vector of reliabilities and
corrects the correlations for attenuation due to unreliability.
6. What plot(s) are used to view the linear regression?
a) Scatterplot
b) Box plot
c) Density plot
d) Scatterplot, Boxplot, Density plot
Answer: d
Explanation: Each plot highlights a specific feature. A scatter plot is used to
visualise the relationship between the variables, and a box plot is used to spot
the outliers that affect the line of best fit.
7. Common Metrics which are used to select linear model are ____________
a) R-squared: lower the better; F-statistic: higher the better
b) R-squared: lower the better; F-statistic: lower the better
c) R-squared: higher the better; F-statistic: higher the better
d) R-squared: higher the better; F-statistic: lower the better
Answer: c
Explanation: For choosing linear regression model it is always advised to have more R-
squared and lower F-Statistic. It ensures the best fit for the given data.
8. In lm(response ~ terms), a terms specification of the form “first*second” is the same as __________
a) first+second
b) first:second
c) first+second+first:second
d) first:second+second:first
Answer: c
Explanation: A terms specification of the form “first*second” indicates the cross of
first and second. It is the same as first + second + first:second: all the terms in first,
all the terms in second, and the interactions between them.
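The expansion in question 8 can be illustrated with a small Python sketch. It is not R and not how lm builds its model matrix; it only shows, for two numeric predictors, which columns first*second stands for. The function name crossed_terms and the sample values are hypothetical.

```python
def crossed_terms(first, second):
    """Columns implied by first*second for two numeric predictors:
    both main effects plus the interaction column first:second."""
    if len(first) != len(second):
        raise ValueError("predictors must have the same length")
    # For numeric predictors the interaction is the element-wise product.
    interaction = [a * b for a, b in zip(first, second)]
    return {
        "first": list(first),
        "second": list(second),
        "first:second": interaction,
    }

# Hypothetical sample data, for illustration only.
cols = crossed_terms([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(list(cols))
print(cols["first:second"])
```

Dropping the "first:second" column from this result corresponds to option a (first+second), and keeping only that column corresponds to option b (first:second).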
Question 1
An analyst runs a regression of monthly value-stock returns on four independent
variables over 48 months. The total sum of squares for the regression is 360 and
the sum of squared errors is 120. Calculate the R2.
A. 42.1%
B. 50%
C. 33.3%
D. 66.7%
The correct answer is D.
$R^2 = \frac{ESS}{TSS} = \frac{360 - 120}{360} = 66.7\%$
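As a quick check of the arithmetic in Question 1, here is a minimal Python sketch. The function name r_squared is an assumption made for illustration; the inputs are the total sum of squares (360) and the sum of squared errors (120) given in the question.

```python
def r_squared(tss, sse):
    """R^2 = ESS / TSS, where the explained sum of squares is ESS = TSS - SSE."""
    if tss <= 0:
        raise ValueError("TSS must be positive")
    ess = tss - sse
    return ess / tss

r2 = r_squared(tss=360, sse=120)
print(f"{r2:.1%}")  # 66.7%
```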
Question 2
Refer to the previous problem and calculate the adjusted R2.
A. 27.1%
B. 63.6%
C. 72.9%
D. 36.4%
The correct answer is B.
$\bar{R}^2 = 1 - \frac{n-1}{n-k-1} \times (1 - R^2)$
$\bar{R}^2 = 1 - \frac{48-1}{48-4-1} \times (1 - 0.667) = 63.6\%$
Question 3
Refer to the previous problem. The analyst now adds four more independent
variables to the regression and the new R2 increases to 69%. What is the new
adjusted R2 and which model would the analyst prefer?
A. The analyst would prefer the model with four variables because its adjusted R2 is
higher.
B. The analyst would prefer the model with four variables because its adjusted R2 is
lower.
C. The analyst would prefer the model with eight variables because its adjusted
R2 is higher.
D. The analyst would prefer the model with eight variables because its adjusted
R2 is lower.
The correct answer is A.
New $R^2 = 69\%$
New adjusted $\bar{R}^2 = 1 - \frac{48-1}{48-8-1} \times (1 - 0.69) = 62.6\%$
The analyst would prefer the first model because it has a higher adjusted R2 (63.6%
versus 62.6%) and uses four independent variables rather than eight.
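The comparison in Questions 2 and 3 can be reproduced with a short Python sketch. The function name adjusted_r2 is hypothetical; n = 48 is the number of monthly observations and k is the number of independent variables, as stated in the questions.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (n - 1) / (n - k - 1) * (1 - R^2)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than parameters (n > k + 1)")
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# Question 2: four independent variables, R^2 = (360 - 120) / 360.
four_vars = adjusted_r2(r2=240 / 360, n=48, k=4)
# Question 3: eight independent variables, R^2 = 69%.
eight_vars = adjusted_r2(r2=0.69, n=48, k=8)

preferred = "four variables" if four_vars > eight_vars else "eight variables"
print(f"{four_vars:.1%} vs {eight_vars:.1%}: prefer the model with {preferred}")
```

Although the plain R2 rises from 66.7% to 69% when the extra variables are added, the penalty for the four additional regressors outweighs the gain, so the adjusted value falls.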
Question 4
An economist tests the hypothesis that GDP growth in a certain country can be
explained by interest rates and inflation.
Using 30 observations, the economist formulates the following regression equation:
$\text{GDP growth} = \hat{\beta}_0 + \hat{\beta}_1\,\text{Interest} + \hat{\beta}_2\,\text{Inflation}$
Regression estimates are as follows:
                 Coefficient   Standard error
Intercept        0.10          0.5%
Interest rates   0.20          0.05
Inflation        0.15          0.03
Is the coefficient for interest rates significant at 5%?
A. Since the test statistic < t-critical, we accept H0; the interest rate coefficient
is not significant at the 5% level.
B. Since the test statistic > t-critical, we reject H0; the interest rate coefficient
is not significant at the 5% level.
C. Since the test statistic > t-critical, we reject H0; the interest rate coefficient is
significant at the 5% level.
D. Since the test statistic < t-critical, we accept H1; the interest rate coefficient
is significant at the 5% level.
The correct answer is C.
We have GDP growth = 0.10 + 0.20(Int) + 0.15(Inf)
Hypothesis:
$H_0: \hat{\beta}_1 = 0 \quad \text{vs} \quad H_1: \hat{\beta}_1 \neq 0$
The test statistic is:
$t = \frac{0.20 - 0}{0.05} = 4$
Decision: Since the test statistic (4) exceeds the two-tailed 5% critical t-value for
30 − 2 − 1 = 27 degrees of freedom (approximately 2.05), we reject H0.
Conclusion: The interest rate coefficient is significant at the 5% level.
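The test in Question 4 can be sketched in Python as follows. The function name t_statistic is hypothetical, and the critical value is an assumption taken from a standard t-table (two-tailed, 5% level, 30 − 2 − 1 = 27 degrees of freedom), not something computed here.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """t = (estimated coefficient - hypothesized value) / standard error."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return (estimate - hypothesized) / std_error

# Interest rate coefficient and standard error from the regression estimates.
t_interest = t_statistic(estimate=0.20, std_error=0.05)

# Assumption: two-tailed 5% critical value for 27 degrees of freedom,
# read from a t-table rather than calculated.
T_CRITICAL = 2.052

significant = abs(t_interest) > T_CRITICAL
print(round(t_interest, 2), significant)
```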



a. determine which independent variables influence quantity demanded.

b. find accurate data on the price of a commodity and on the quantity

c. estimate a demand function from data on commodity price and quantity

d. measure the impact of extraneous variables on experimental market data.

2. The estimation of consumer demand by questioning a sample of consumers is referred to

a. consumer survey approach.

b. observational research approach.

c. consumer clinic approach.

d. market experiment approach.

3. The estimation of consumer demand by setting up simulated stores, providing a sample of

a. consumer survey approach.

b. observational research approach.

c. consumer clinic approach.

d. market experiment approach.

4. The estimation of consumer demand by monitoring actual purchasing and consumption

a. consumer survey approach.

b. observational research approach.

c. consumer clinic approach.

d. market experiment approach.

a. not significantly different from zero.

b. significantly different from zero at both the 1% and the 5% levels.

a. the firm should spend more on promotional expenditures.

b. the firm should spend less on promotional expenditures.

c. promotional expenditures influence demand.

d. promotional expenditures have no influence on demand.

d. All of the above are correct.

8. The coefficient of determination

a. is maximized by ordinary least squares.

b. has a value between zero and one.

c. will generally increase if additional independent variables are added to a

d. All of the above are correct.

9. The coefficient of correlation is

a. a measure of the strength and direction of the linear relationship between

c. is equal to the proportion of the variation in the Y variable that is due to

d. All of the above are correct.

10. Multiple regression analysis is used when

b. the dependent variable depends on more than one independent variable.

11. The adjusted value of the coefficient of determination

a. will always increase if additional independent variables are added to the