R07 Machine Learning - Answers
Explanation
Overfitting results when a large number of features (i.e., independent variables) are included in the data
sample. The resulting model can use the "noise" in the dependent variable to improve the model fit.
Overfitting the model in this way will actually decrease the accuracy of model forecasts on other (out-of-
sample) data.
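As a toy illustration (not from the reading, and with made-up data): a one-nearest-neighbor classifier memorizes its noisy training labels, so it fits the sample perfectly while predicting out-of-sample data less accurately:

```python
import random

random.seed(0)

def true_label(x):
    return 1 if x > 0.5 else 0

def noisy_sample(n, flip_rate=0.2):
    # labels are flipped flip_rate of the time to simulate noise in the data
    xs = [random.random() for _ in range(n)]
    return [(x, true_label(x) if random.random() > flip_rate else 1 - true_label(x))
            for x in xs]

train = noisy_sample(50)
test = noisy_sample(1000)

def one_nn_predict(x, data):
    # memorizes the training data, noise included
    return min(data, key=lambda point: abs(point[0] - x))[1]

train_acc = sum(one_nn_predict(x, train) == y for x, y in train) / len(train)
test_acc = sum(one_nn_predict(x, train) == y for x, y in test) / len(test)
# perfect in-sample fit, weaker out-of-sample accuracy
```

Fitting the noise drives in-sample accuracy to 100% while out-of-sample accuracy falls well below it.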
A) Typical data analytics tasks for supervised learning include classification and prediction.
Explanation
Supervised learning utilizes labeled training data to guide the ML program but does not need "human
intervention." Typical data analytics tasks for supervised learning include classification and prediction.
What is the appropriate remedy in the presence of an excessive number of features in a data set?
A) Dimension reduction.
C) Unsupervised learning.
Explanation
Big Data refers to very large data sets, which may include both structured (e.g., spreadsheet) data and
unstructured (e.g., emails, text, or pictures) data, and which include a large number of features as well as
a large number of observations. Dimension reduction seeks to remove the noise (i.e., those attributes that do
not contain much information) when the number of features in a data set (its dimension) is excessive.
Explanation
In unsupervised learning, the ML program is not given labeled training data. Instead, inputs are
provided without any conclusions about those inputs. In the absence of any tagged data, the program
seeks out structure or interrelationships in the data. Clustering is one example of the output of an
unsupervised ML program, while classification is suited for supervised learning.
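Clustering can be sketched with a minimal one-dimensional k-means (Lloyd's algorithm). The two groups below are made up for illustration; note the algorithm never sees any labels:

```python
import random

random.seed(5)

# two well-separated groups of points; the algorithm sees no labels
points = ([random.gauss(0, 0.5) for _ in range(30)]
          + [random.gauss(5, 0.5) for _ in range(30)])

centers = [min(points), max(points)]  # simple initialization

for _ in range(10):  # Lloyd's iterations: assign points, then update centers
    clusters = [[], []]
    for p in points:
        nearest = 0 if abs(p - centers[0]) < abs(p - centers[1]) else 1
        clusters[nearest].append(p)
    centers = [sum(c) / len(c) for c in clusters]
# the centers converge near the true group means (about 0 and 5)
```

The structure (two clusters) is discovered from the inputs alone, which is exactly what distinguishes this from supervised classification.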
Considering the various supervised machine learning algorithms, a linear classifier that seeks the optimal
hyperplane and is typically used for classification, best describes:
Explanation
Support vector machine (SVM) is a linear classifier that aims to seek the optimal hyperplane, i.e., the one
that separates the two sets of data points by the maximum margin. SVM is typically used for
classification.
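A minimal sketch of the idea, using sub-gradient descent on the hinge loss (one common way to train a linear SVM); the two clusters are made up for illustration:

```python
import random

random.seed(2)

# two linearly separable clusters in 2D; labels are +1 / -1 (made-up data)
pos = [(random.uniform(2, 3), random.uniform(2, 3)) for _ in range(20)]
neg = [(random.uniform(-3, -2), random.uniform(-3, -2)) for _ in range(20)]
data = [(p, 1) for p in pos] + [(p, -1) for p in neg]

w = [0.0, 0.0]
b = 0.0
lam, lr = 0.01, 0.1          # regularization strength and learning rate

for _ in range(200):
    for (x1, x2), y in data:
        if y * (w[0] * x1 + w[1] * x2 + b) < 1:
            # point is inside the margin: hinge-loss sub-gradient step
            w[0] += lr * (y * x1 - lam * w[0])
            w[1] += lr * (y * x2 - lam * w[1])
            b += lr * y
        else:
            # correctly classified with margin: only shrink w (widens the margin)
            w[0] -= lr * lam * w[0]
            w[1] -= lr * lam * w[1]

def predict(x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1

accuracy = sum(predict(x1, x2) == y for (x1, x2), y in data) / len(data)
```

The regularization term is what pushes the hyperplane toward the maximum-margin separator rather than any separator.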
Explanation
Out-of-sample error equals bias error plus variance error plus base error. Bias error is the extent to
which a model fits the training data; underfit models exhibit high bias error. Variance error describes the
degree to which a model's results change in response to new data from validation and test samples.
Base error comes from randomness in the data.
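The three components can be illustrated numerically. The sketch below (illustrative only; the data-generating process is made up) repeatedly refits a deliberately rigid model and measures each error component at one out-of-sample point:

```python
import random
import statistics

random.seed(1)
SIGMA = 0.5                 # irreducible noise, so base error = SIGMA ** 2

def f(x):                   # the true relationship
    return 2 * x

def train_set(n=20):
    xs = [random.uniform(0, 1) for _ in range(n)]
    return [(x, f(x) + random.gauss(0, SIGMA)) for x in xs]

x0 = 0.9                    # out-of-sample evaluation point
preds = []
for _ in range(2000):
    data = train_set()
    # a deliberately rigid (high-bias) model: always predict the mean of y
    preds.append(statistics.mean(y for _, y in data))

bias_error = (statistics.mean(preds) - f(x0)) ** 2   # systematic miss at x0
variance_error = statistics.pvariance(preds)          # sensitivity to the sample drawn
base_error = SIGMA ** 2                               # randomness in the data
expected_out_of_sample_mse = bias_error + variance_error + base_error
```

For this underfit model, bias error dominates; a more flexible model would shift the balance toward variance error, while base error is irreducible either way.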
The degree to which a machine learning model retains its explanatory power when predicting out-of-
sample is most commonly described as:
A) predominance.
B) generalization.
C) hegemony.
Explanation
Generalization describes the degree to which a machine learning model retains its explanatory power
when predicting out-of-sample.
An algorithm that involves an agent that performs actions that will maximize its rewards over time, taking
into consideration the constraints of its environment, best describes:
A) neural networks.
B) reinforcement learning.
Explanation
Reinforcement learning algorithms involve an agent that will perform actions that will maximize its
rewards over time, taking into consideration the constraints of its environment. Neural networks consist
of nodes connected by links; learning takes place in the hidden layer nodes, each of which consists of a
summation operator and an activation function. Neural networks with many hidden layers (often more
than 20) are known as deep learning nets (DLNs) and are used in artificial intelligence.
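The "summation operator plus activation function" structure of a single hidden-layer node can be written out directly; a sigmoid activation and the input numbers below are assumed for illustration:

```python
import math

def node(inputs, weights, bias):
    # summation operator: weighted sum of the inputs plus a bias
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # activation function: squashes z into (0, 1); sigmoid is one common choice
    return 1 / (1 + math.exp(-z))

out = node([1.0, 2.0], [0.5, -0.25], 0.1)  # z = 0.5 - 0.5 + 0.1 = 0.1
```

A network stacks many such nodes in layers, and learning consists of adjusting the weights and biases.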
Which supervised learning model is most appropriate (1) when the Y-variable is continuous and (2) when
the Y-variable is categorical?
Continuous Y-variable / Categorical Y-variable
Explanation
When the Y-variable is continuous, the appropriate approach is that of regression (used in a broad, ML
context). When the Y-variable is categorical (i.e., belonging to a category or classification) or ordinal (i.e.,
ordered or ranked), a classification model is used.
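As a minimal illustration (toy numbers, not from the reading): a continuous Y calls for a regression fit, while a categorical Y calls for a classification rule over the same inputs:

```python
# continuous Y: simple least-squares regression (closed-form slope and intercept)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]          # roughly y = 2x
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# categorical Y: a minimal threshold classifier over the same inputs
labels = [0, 0, 0, 1, 1]                  # class membership, not a quantity
def classify(x, threshold=3.5):
    return 1 if x > threshold else 0

preds = [classify(x) for x in xs]
```

The regression outputs a number anywhere on the real line; the classifier outputs one of a fixed set of categories.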
Explanation
One elementary way to think of ML algorithms is to "find the pattern, apply the pattern." Machine
learning attempts to extract knowledge from large amounts of data by learning from known examples in
order to determine an underlying structure in the data. The focus is on generating structure or
predictions without human intervention.
Explanation
Random forest is a collection of randomly generated classification trees from the same data set. A
randomly selected subset of features is used in creating each tree, and hence each tree is slightly
different from the others. Since each tree only uses a subset of features, random forests can mitigate
the problem of overfitting. Because errors across different trees tend to cancel each other out, using
random forests can increase the signal-to-noise ratio.
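A toy sketch of the mechanism, with decision stumps standing in for full classification trees and a fixed rotation of feature subsets standing in for random selection (to keep it short and deterministic); all data is made up:

```python
import random

random.seed(3)

def make_data(n):
    # the label depends only on the first feature; the other two are pure noise
    rows = []
    for _ in range(n):
        features = [random.uniform(-1, 1) for _ in range(3)]
        rows.append((features, 1 if features[0] > 0 else 0))
    return rows

train, test = make_data(200), make_data(200)

def stump_acc(rows, f, flip):
    return sum(((row[f] > 0) != flip) == (y == 1) for row, y in rows) / len(rows)

def fit_stump(rows, features):
    # pick the best "feature > 0" rule (possibly flipped) among allowed features
    return max(((f, flip) for f in features for flip in (False, True)),
               key=lambda rule: stump_acc(rows, *rule))

subsets = [(0, 1), (0, 2), (1, 2)]        # rotation standing in for random subsets
forest = []
for i in range(30):
    bootstrap = [random.choice(train) for _ in range(len(train))]
    forest.append(fit_stump(bootstrap, subsets[i % 3]))

def forest_predict(row):
    votes = sum(1 for f, flip in forest if (row[f] > 0) != flip)
    return 1 if votes > len(forest) / 2 else 0

acc = sum(forest_predict(row) == y for row, y in test) / len(test)
# the noise-only stumps' errors cancel in the majority vote; accuracy stays high
```

Stumps trained on noise-only subsets vote roughly at random, so the majority vote is carried by the stumps that saw the informative feature.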
The unsupervised machine learning algorithm that reduces highly correlated features into fewer
uncorrelated composite variables by transforming the feature covariance matrix best describes:
A) k-means clustering.
B) hierarchical clustering.
Explanation
Principal components analysis (PCA) is an unsupervised machine learning algorithm that reduces highly
correlated features into fewer uncorrelated composite variables by transforming the feature covariance
matrix. K-means partitions observations into a fixed number (k) of non-overlapping clusters.
Hierarchical clustering is an unsupervised iterative algorithm used to build a hierarchy of clusters.
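The mechanics can be sketched for two highly correlated features, where the 2x2 covariance matrix has a closed-form eigendecomposition (toy data, for illustration only):

```python
import math
import random

random.seed(4)

# two highly correlated features (f2 is f1 plus a little noise) -- toy data
f1 = [random.gauss(0, 1) for _ in range(500)]
f2 = [x + random.gauss(0, 0.1) for x in f1]

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

s11, s22, s12 = cov(f1, f1), cov(f2, f2), cov(f1, f2)

# eigenvalues of the 2x2 covariance matrix, in closed form
tr = s11 + s22
det = s11 * s22 - s12 * s12
lam1 = tr / 2 + math.sqrt(tr * tr / 4 - det)
lam2 = tr / 2 - math.sqrt(tr * tr / 4 - det)

explained = lam1 / (lam1 + lam2)   # variance share of the first principal component
```

Because the two features are nearly duplicates, one uncorrelated composite variable (the first principal component) captures almost all of the variance, which is the sense in which PCA reduces dimension.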
A) clustering.
B) supervised learning.
C) unsupervised learning.
Explanation
The technique in which a machine learns to model a set of output data from a given set of inputs is best
described as:
A) deep learning.
B) unsupervised learning.
C) supervised learning.
Explanation
Supervised learning is a machine learning technique in which a machine is given labelled input and
output data and models the output data based on the input data. In unsupervised learning, a machine is
given input data in which to identify patterns and relationships, but no output data to model. Deep
learning is a technique to identify patterns of increasing complexity and may use supervised or
unsupervised learning.
Considering the various supervised machine learning algorithms, a penalized regression where the penalty
term is the sum of the absolute values of the regression coefficients best describes:
Explanation
LASSO (least absolute shrinkage and selection operator) is a popular type of penalized regression in
which the penalty term comprises the sum of the absolute values of the regression coefficients. The more
features included, the larger the penalty. The result is that a feature needs to make a sufficient
contribution to model fit to offset the penalty from including it.
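Under the simplifying assumption of standardized, orthonormal features, the lasso solution has a closed form: each OLS coefficient is "soft-thresholded" by the penalty. The coefficient values below are made up to show a weak feature being dropped entirely:

```python
def soft_threshold(z, penalty):
    # closed-form lasso update for one coefficient (orthonormal-design case)
    if z > penalty:
        return z - penalty          # shrink toward zero
    if z < -penalty:
        return z + penalty          # shrink toward zero from below
    return 0.0                      # small coefficients are set exactly to zero

ols = {"strong_feature": 2.5, "weak_feature": 0.15}   # hypothetical OLS estimates
lasso = {name: soft_threshold(b, penalty=0.3) for name, b in ols.items()}
# strong_feature survives (shrunk); weak_feature is dropped entirely
```

This is why lasso performs feature selection: a coefficient whose contribution does not exceed the penalty is set exactly to zero, not merely shrunk.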