Question #1 of 23 Question ID: 1472209
Which of the following about unsupervised learning is most accurate?
A) There is no labeled data.
B) Unsupervised learning has lower forecasting accuracy compared to supervised learning.
C) Classification is an example of an unsupervised learning algorithm.
Explanation
In unsupervised learning, the ML program is not given labeled training data. Instead, inputs
are provided without any conclusions about those inputs. In the absence of any tagged data,
the program seeks out structure or inter-relationships in the data. Clustering is one example
of the output of an unsupervised ML program, while classification is suited for supervised
learning.
(Module 3.1, LOS 3.a)
Question #2 of 23 Question ID: 1508649
The unsupervised machine learning algorithm that reduces highly correlated features into
fewer uncorrelated composite variables by transforming the feature covariance matrix best
describes:
A) principal components analysis.
B) k-means clustering.
C) hierarchical clustering.
Explanation
Principal components analysis (PCA) is an unsupervised machine learning algorithm that
reduces highly correlated features into fewer uncorrelated composite variables by
transforming the feature covariance matrix. K-means partitions observations into a fixed
number (k) of non-overlapping clusters. Hierarchical clustering is an unsupervised iterative
algorithm used to build a hierarchy of clusters.
(Module 3.3, LOS 3.d)
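The idea behind PCA can be sketched numerically. The function below is a hypothetical, two-feature toy (real work would use a library such as scikit-learn): it builds the 2x2 feature covariance matrix and extracts its dominant eigenvector, which is the first uncorrelated composite variable PCA would produce.

```python
import math

def pca_first_component(xs, ys):
    """Return the first principal component (unit vector) of two features.

    Works on the 2x2 covariance matrix only -- a toy illustration of how
    PCA transforms correlated features into uncorrelated composites.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample covariance matrix entries.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]] via the quadratic formula.
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    lam = tr / 2 + math.sqrt(tr ** 2 / 4 - det)
    # Corresponding eigenvector, normalized to unit length.
    vx, vy = sxy, lam - sxx
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

# Two highly correlated features: y is roughly 2x, so the first
# component points along the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]
v = pca_first_component(xs, ys)
```

Because the two features are nearly collinear, almost all of the variance is captured by this single composite, which is how PCA shrinks a highly correlated feature set.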
Question #3 of 23 Question ID: 1472214
In machine learning, out-of-sample error equals:
A) standard error plus data error plus prediction error.
B) bias error plus variance error plus base error.
C) forecast error plus expected error plus regression error.
Explanation
Out-of-sample error equals bias error plus variance error plus base error. Bias error is the
extent to which a model fits the training data. Variance error describes the degree to which a
model's results change in response to new data from validation and test samples. Base error
comes from randomness in the data.
(Module 3.1, LOS 3.b)
Question #4 of 23 Question ID: 1472207
The technique in which a machine learns to model a set of output data from a given set of
inputs is best described as:
A) unsupervised learning.
B) supervised learning.
C) deep learning.
Explanation
Supervised learning is a machine learning technique in which a machine is given labeled
input and output data and models the output data based on the input data. In unsupervised
learning, a machine is given input data in which to identify patterns and relationships, but no
output data to model. Deep learning is a technique to identify patterns of increasing
complexity and may use supervised or unsupervised learning.
(Module 3.1, LOS 3.a)
Question #5 - 8 of 23 Question ID: 1472228
Nowak first tries to explain classification and regression tree (CART) to Kowalski. CART is least
likely to be applied to predict a:
A) categorical target variable, producing a classification tree.
B) discrete target variable, producing a cardinal tree.
C) continuous target variable, producing a regression tree.
Explanation
Classification and regression trees (CART) are generally applied to predict either a continuous
target variable, producing a regression tree, or a categorical target variable, producing a
classification tree.
(Module 3.2, LOS 3.c)
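A minimal sketch of the regression-tree case may help. The function below (hypothetical names, a single split only) finds the one threshold on a feature that minimizes total squared error of the continuous target; CART applies this step recursively to grow the full tree.

```python
def best_split(xs, ys):
    """Find the single split on x that minimizes total squared error of ys.

    A one-node regression tree -- the building block that CART applies
    recursively when the target variable is continuous.
    """
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    pairs = sorted(zip(xs, ys))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        err = sse(left) + sse(right)
        if err < best[0]:
            best = (err, threshold)
    return best[1]

# The target jumps from ~1 to ~5 between x = 3 and x = 4,
# so the tree splits at 3.5.
split = best_split([1, 2, 3, 4, 5, 6], [1.0, 1.1, 0.9, 5.0, 5.1, 4.9])
```

For a categorical target the same search would minimize a classification impurity measure instead of squared error, producing a classification tree.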
Question #6 - 8 of 23 Question ID: 1472229
Which of the following statements Nowak makes about hierarchical clustering is most accurate?
A) In divisive hierarchical clustering, the algorithm seeks out the two closest clusters.
B) Hierarchical clustering is a supervised iterative algorithm that is used to build a hierarchy of clusters.
C) Bottom-up hierarchical clustering begins with each observation being its own cluster.
Explanation
Agglomerative (bottom-up) hierarchical clustering begins with each observation being its own
cluster. Then, the algorithm finds the two closest clusters, and combines them into a new,
larger cluster. Hierarchical clustering is an unsupervised iterative algorithm. Divisive (top-
down) hierarchical clustering progressively partitions clusters into smaller clusters until each
cluster contains only one observation.
(Module 3.3, LOS 3.d)
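The agglomerative procedure described above can be sketched in a few lines. This is a toy single-linkage version on 1-D points (all names hypothetical): every observation starts as its own cluster, and the two closest clusters are merged repeatedly.

```python
def agglomerative(points, k):
    """Bottom-up (agglomerative) clustering of 1-D points.

    Start with each observation as its own cluster, then repeatedly
    merge the two closest clusters (single linkage) until k remain.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest gap between members.
        best = (float("inf"), 0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Two natural groups: {1, 2, 3} and {10, 11}.
groups = agglomerative([1.0, 2.0, 3.0, 10.0, 11.0], k=2)
```

Divisive clustering runs the same idea in reverse, starting from one all-inclusive cluster and splitting until each cluster holds a single observation.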
Question #7 - 8 of 23 Question ID: 1472230
Which of the following statements Nowak makes about neural networks is most accurate?
Neural networks:
A) are effective in tasks with non-linearities and complex interactions among variables.
B) have four types of layers: an input layer, agglomerative layers, regularization layers, and an output layer.
C) have an input layer node that consists of a summation operator and an activation function.
Explanation
Neural networks have been successfully applied to solve a variety of problems characterized
by non-linearities and complex interactions among variables. Neural networks have three
types of layers: an input layer, hidden layers, and an output layer. The hidden layer nodes
(not the input layer nodes) each consist of a summation operator and an activation function;
these nodes are where learning takes place.
(Module 3.3, LOS 3.e)
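The hidden-node structure described above — a summation operator feeding an activation function — can be sketched directly. This toy forward pass uses hypothetical weights chosen only to show the data flow through the three layer types.

```python
import math

def hidden_node(inputs, weights, bias):
    """One hidden-layer node: a summation operator followed by an
    activation function (sigmoid here)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # summation operator
    return 1.0 / (1.0 + math.exp(-z))                       # activation function

def forward(inputs, hidden_weights, output_weights):
    """Tiny network: input layer -> one hidden layer -> one linear output node."""
    hidden = [hidden_node(inputs, w, b) for w, b in hidden_weights]
    return sum(w * h for w, h in zip(output_weights, hidden))

# Hypothetical weights; training would adjust these to fit the data.
y = forward([0.5, -1.0],
            hidden_weights=[([1.0, 2.0], 0.0), ([-1.0, 0.5], 0.1)],
            output_weights=[0.7, 0.3])
```

The non-linear activation in each hidden node is what lets stacks of such layers capture the non-linearities and interactions that linear models miss.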
Question #8 - 8 of 23 Question ID: 1472231
Nowak tries to explain the reinforcement learning (RL) algorithm to Kowalski and makes a
number of statements about it. The reinforcement learning (RL) algorithm involves an agent
that is most likely to:
A) perform actions that will minimize costs over time.
B) take into consideration the constraints of its environment.
C) make use of direct labeled data and instantaneous feedback.
Explanation
The reinforcement learning (RL) algorithm involves an agent that will perform actions that will
maximize its rewards over time, taking into consideration the constraints of the environment.
Unlike supervised learning, reinforcement learning has neither instantaneous feedback nor
direct labeled data for each observation.
(Module 3.3, LOS 3.e)
Question #9 of 23 Question ID: 1472208
Which of the following statements about supervised learning is most accurate?
A) Supervised learning requires human intervention in the machine learning process.
B) Typical data analytics tasks for supervised learning include classification and prediction.
C) Supervised learning does not differentiate between tags and features.
Explanation
Supervised learning utilizes labeled training data to guide the ML program but does not need
"human intervention." Typical data analytics tasks for supervised learning include
classification and prediction.
(Module 3.1, LOS 3.a)
Question #10 of 23 Question ID: 1472210
Which supervised learning model is most appropriate (1) when the Y-variable is continuous and
(2) when the Y-variable is categorical?
   Continuous Y-variable    Categorical Y-variable
A) Classification           Neural networks
B) Decision trees           Regression
C) Regression               Classification
Explanation
When the Y-variable is continuous, the appropriate approach is that of regression (used in a
broad, ML context). When the Y-variable is categorical (i.e., belonging to a category or
classification) or ordinal (i.e., ordered or ranked), a classification model is used.
(Module 3.1, LOS 3.a)
Question #11 of 23 Question ID: 1472221
An algorithm that involves an agent that performs actions that will maximize its rewards over
time, taking into consideration the constraints of its environment, best describes:
A) reinforcement learning.
B) neural networks.
C) deep learning nets.
Explanation
Reinforcement learning algorithms involve an agent that will perform actions that will
maximize its rewards over time, taking into consideration the constraints of its environment.
Neural networks consist of nodes connected by links; learning takes place in the hidden layer
nodes, each of which consists of a summation operator and an activation function. Neural
networks with many hidden layers (often more than 20) are known as deep learning nets
(DLNs) and used in artificial intelligence.
(Module 3.3, LOS 3.e)
Question #12 of 23 Question ID: 1472213
The degree to which a machine learning model retains its explanatory power when predicting
out-of-sample is most commonly described as:
A) hegemony.
B) generalization.
C) predominance.
Explanation
Generalization describes the degree to which, when predicting out-of-sample, a machine
learning model retains its explanatory power.
(Module 3.1, LOS 3.b)
Question #13 of 23 Question ID: 1472218
What is the appropriate remedy in the presence of an excessive number of features in a data set?
A) Unsupervised learning.
B) Big data analysis.
C) Dimension reduction.
Explanation
Big Data refers to very large data sets that may include both structured (e.g., spreadsheet)
data and unstructured (e.g., emails, text, or pictures) data, with a large number of
features as well as a large number of observations. Dimension reduction seeks to remove the noise
(i.e., those attributes that do not contain much information) when the number of features in
a data set (its dimension) is excessive.
(Module 3.3, LOS 3.d)
Question #14 of 23 Question ID: 1472219
Dimension reduction is most likely to be an example of:
A) supervised learning.
B) unsupervised learning.
C) clustering.
Explanation
Dimension reduction and clustering are examples of unsupervised learning algorithms.
(Module 3.3, LOS 3.d)
Question #15 of 23 Question ID: 1508648
Considering the various supervised machine learning algorithms, a linear classifier that seeks
the optimal hyperplane and is typically used for classification, best describes:
A) classification and regression tree (CART).
B) support vector machine (SVM).
C) k-nearest neighbor (KNN).
Explanation
Support vector machine (SVM) is a linear classifier that seeks the optimal hyperplane,
i.e., the one that separates the two sets of data points by the maximum margin. SVM is
typically used for classification.
(Module 3.2, LOS 3.c)
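To make the hyperplane and margin concrete, here is a toy 2-D sketch (hypothetical data and weights, not a fitted SVM): points are classified by which side of the hyperplane w · x + b = 0 they fall on, and the margin is the distance from the hyperplane to the nearest point — the quantity the SVM training procedure maximizes.

```python
import math

def classify(w, b, x):
    """Assign a point to a class by which side of the hyperplane
    w . x + b = 0 it falls on."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s >= 0 else -1

def margin(w, b, points):
    """Distance from the hyperplane to the nearest point -- what an SVM
    maximizes when choosing among separating hyperplanes."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x in points)

# Hypothetical 2-D data separated by the line x1 + x2 = 3.
w, b = (1.0, 1.0), -3.0
pos = [(3.0, 2.0), (4.0, 4.0)]   # labeled +1
neg = [(0.0, 1.0), (1.0, 0.5)]   # labeled -1
m = margin(w, b, pos + neg)
```

Any of infinitely many hyperplanes could separate these four points; the SVM is the one whose margin m is largest.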
Question #16 of 23 Question ID: 1508647
Considering the various supervised machine learning algorithms, a penalized regression where
the penalty term is the sum of the absolute values of the regression coefficients best describes:
A) k-nearest neighbor (KNN).
B) support vector machine (SVM).
C) least absolute shrinkage and selection operator (LASSO).
Explanation
LASSO (least absolute shrinkage and selection operator) is a popular type of penalized
regression in which the penalty term is the sum of the absolute values of the
regression coefficients. The more features included, the larger the penalty. The result
is that a feature needs to make a sufficient contribution to model fit to offset the penalty
from including it.
(Module 3.2, LOS 3.c)
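The penalty mechanics can be shown with a small numeric sketch (hypothetical fit values, not an actual LASSO solver): the objective is the usual sum of squared errors plus lambda times the sum of absolute coefficient values, so a weak feature that barely improves fit is not worth its penalty.

```python
def lasso_objective(ys, preds, coefs, lam):
    """Penalized regression objective for LASSO: sum of squared errors
    plus lambda times the sum of absolute coefficient values."""
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    penalty = lam * sum(abs(b) for b in coefs)
    return sse + penalty

# Hypothetical comparison: the second coefficient (0.05) improves fit
# only slightly, but it still incurs a penalty of 0.05 * lambda.
with_extra = lasso_objective([1, 2, 3], [1.1, 1.9, 3.0], [0.8, 0.05], lam=1.0)
without    = lasso_objective([1, 2, 3], [1.2, 1.9, 3.1], [0.8], lam=1.0)
```

Here the model without the weak feature achieves a lower penalized objective, which is exactly why LASSO tends to shrink such coefficients all the way to zero, performing feature selection.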
Question #17 of 23 Question ID: 1472215
A random forest is least likely to:
A) provide a solution to overfitting problem.
B) be a classification tree.
C) reduce signal-to-noise ratio.
Explanation
A random forest is a collection of randomly generated classification trees built from the same data
set. A randomly selected subset of features is used in creating each tree, and hence each tree
is slightly different from the others. Since each tree only uses a subset of features, random
forests can mitigate the problem of overfitting. Because errors across different trees tend to
cancel each other out, using random forests can increase the signal-to-noise ratio.
(Module 3.2, LOS 3.c)
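The error-cancellation claim can be illustrated with a seeded simulation (each "tree" below is a stand-in that is right 70% of the time, not a real fitted tree): a majority vote across many such trees is right far more often than any single one.

```python
import random

def vote(predictions):
    """Majority vote across trees -- how a random forest combines its
    classification trees so that individual errors tend to cancel."""
    return 1 if sum(predictions) > len(predictions) / 2 else 0

random.seed(7)
true_label, n_trees, accuracy = 1, 101, 0.7

def noisy_tree():
    # Stand-in for one randomly grown tree: correct with probability 0.7.
    return true_label if random.random() < accuracy else 1 - true_label

# Accuracy of a single tree versus a 101-tree majority vote.
single_correct = sum(noisy_tree() == true_label for _ in range(1000)) / 1000
forest_correct = sum(
    vote([noisy_tree() for _ in range(n_trees)]) == true_label
    for _ in range(200)
) / 200
```

With 101 independent 70%-accurate voters, the chance that a majority is wrong is tiny, so the ensemble's accuracy approaches 100% — the signal-to-noise improvement the explanation describes.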
Question #18 of 23 Question ID: 1472212
Overfitting is least likely to result in:
A) higher forecasting accuracy in out-of-sample data.
B) higher number of features included in the data set.
C) inclusion of noise in the model.
Explanation
Overfitting results when a large number of features (i.e., independent variables) are included
in the data sample. The resulting model can use the "noise" in the dependent variables to
improve the model fit. Overfitting the model in this way will actually decrease the accuracy of
model forecasts on other (out-of-sample) data.
(Module 3.1, LOS 3.b)
Question #19 - 22 of 23 Question ID: 1472223
Tan is interested in using a supervised learning algorithm to analyze stocks. This task is least
likely to be a classification problem if the target variable is:
A) categorical.
B) ordinal.
C) continuous.
Explanation
Supervised learning can be divided into two categories: regression and classification. If the
target variable is categorical or ordinal (e.g., determining a firm's rating), then it is a
classification problem. If the target variable to be predicted is continuous, then the task is
one of regression.
(Module 3.1, LOS 3.a)
Question #20 - 22 of 23 Question ID: 1472224
After Tan implements a particular new supervised machine learning algorithm, she begins to
suspect that the holdout samples she is using are reducing the training set size too much. As a
result, she begins to make use of K-fold cross-validation. In the K-fold cross-validation
technique, after Tan shuffles the data randomly it is most likely that:
A) k – 1 samples will be used as validation samples.
B) k – 1 samples will be used as training samples.
C) the data will be divided into k – 1 equal sub-samples.
Explanation
In the K-fold cross-validation technique, the data is shuffled randomly and then divided into k
equal sub-samples. One sample is saved to be used as a validation sample, and the other k –
1 samples are used as training samples.
(Module 3.1, LOS 3.b)
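The K-fold procedure just described can be sketched directly (hypothetical helper names): shuffle, cut into k equal sub-samples, and rotate which fold serves as the validation sample while the other k − 1 folds are the training sample.

```python
import random

def k_fold_splits(data, k, seed=42):
    """Shuffle the data, divide it into k equal sub-samples, and yield
    (training, validation) pairs: each fold serves once as the validation
    sample while the remaining k - 1 folds form the training sample."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    fold_size = len(shuffled) // k
    folds = [shuffled[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [x for j, f in enumerate(folds) if j != i for x in f]
        yield training, validation

# 12 observations, k = 4: each validation fold has 3 observations,
# each training set the other 9.
splits = list(k_fold_splits(list(range(12)), k=4))
```

Because every observation appears in the validation role exactly once, the technique mitigates the training-set shrinkage Tan was worried about with simple holdout samples.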
Question #21 - 22 of 23 Question ID: 1472225
At first Tan bases her stock picks on the results of a single machine-learning model, but then
begins to wonder if she should instead be using the predictions of a group of models.
Compared to a single machine-learning model, an ensemble machine learning algorithm is
most likely to produce predictions that are:
A) less reliable but more steady.
B) more accurate and more stable.
C) more precise but less dependable.
Explanation
Ensemble learning, which is a technique of combining the predictions from a number of
models, generally results in more accurate and more stable predictions than a single model.
(Module 3.2, LOS 3.c)
Question #22 - 22 of 23 Question ID: 1627212
Tan is interested in applying neural networks, deep learning nets, and reinforcement learning
to her investment process. Regarding these techniques, which of the following statements is
most accurate?
A) Neural networks with one or more hidden layers would be considered deep learning nets (DLNs).
B) Reinforcement learning algorithms achieve maximum performance when they stay as far away from their constraints as possible.
C) Neural networks work well in the presence of non-linearities and complex interactions among variables.
Explanation
Neural networks have been successfully applied to a variety of investment tasks
characterized by non-linearities and complex interactions among variables.
Neural networks with at least two hidden layers are known as deep learning nets (DLNs).
Reinforcement learning algorithms use an agent that will maximize its rewards over time,
within the constraints of its environment.
(Module 3.3, LOS 3.e)
Question #23 of 23 Question ID: 1472211
A rudimentary way to think of machine learning algorithms is that they:
A) “synthesize the pattern, review the pattern.”
B) “develop the pattern, interpret the pattern.”
C) “find the pattern, apply the pattern.”
Explanation
One elementary way to think of ML algorithms is to "find the pattern, apply the pattern."
Machine learning attempts to extract knowledge from large amounts of data by learning from
known examples in order to determine an underlying structure in the data. The focus is on
generating structure or predictions without human intervention.
(Module 3.1, LOS 3.a)