Datamining Bits
1. ...................... is an essential process where intelligent methods are applied to extract data patterns.
A) Data warehousing
B) Data mining
C) Text mining
D) Data selection
i) Data streams
ii) Sequence data
iii) Networked data
iv) Text data
v) Spatial data
A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection
5. ............................. is a comparison of the general features of the target class data objects against the
general features of objects from one or multiple contrasting classes.
A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection
6. Strategic value of data mining is ......................
A) cost-sensitive
B) work-sensitive
C) time-sensitive
D) technical-sensitive
7. ............................. is the process of finding a model that describes and distinguishes data classes or
concepts.
A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection
A) i, ii and iv only
B) ii, iii and iv only
C) i, ii and iii only
D) All i, ii, iii and iv
A) Knowledge Database
B) Knowledge Discovery Database
C) Knowledge Data House
D) Knowledge Data Definition
A) Data
B) Information
C) Query
D) Useful information
12. The active data warehouse architecture includes which of the following?
A) A system that is used to run the business in real time and is based on historical data.
B) A system that is used to run the business in real time and is based on current data.
C) A system that is used to support decision making and is based on current data.
D) A system that is used to support decision making and is based on historical data.
17. To represent any n-dimensional data we need a series of _______-dimensional cubes.
A) (n–1)
B) n
C) n+1
D) n+2
18. Data about data is called _____________
A) Table
B) Database
C) Metadata
D) Integration
Answer A
2. Frequency of occurrence of an itemset is called _____
(A)Support
(B)Confidence
(C)Support Count
(D)Rules
Answer C
3. An itemset whose support is greater than or equal to a minimum support threshold is ______
(A)Itemset
(B)Frequent Itemset
(C)Infrequent items
(D)Threshold values
Answer B
4. What does FP growth algorithm do?
(A)It mines all frequent patterns through pruning rules with lesser support
(B)It mines all frequent patterns through pruning rules with higher support
(C)It mines all frequent patterns by constructing a FP tree
(D)It mines all frequent patterns by constructing an itemsets
Answer C
5. What techniques can be used to improve the efficiency of apriori algorithm?
(A)Hash-based techniques
(B)Transaction Increases
(C)Sampling
(D)Cleaning
Answer A
6. What do you mean by support(A)?
(A)Total number of transactions containing A
(B)Total Number of transactions not containing A
(C)Number of transactions containing A / Total number of transactions
(D)Number of transactions not containing A / Total number of transactions
Answer C
7. How do you calculate Confidence (A -> B)?
(A)Support(A ∩ B) / Support (A)
(B)Support(A ∩ B) / Support (B)
(C)Support(A ∪ B) / Support (A)
(D)Support(A ∪ B) / Support (B)
Answer A
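As a rough illustration of the support and confidence definitions in questions 6 and 7, the sketch below computes both over a small made-up transaction list. The transactions and the helper names `support` and `confidence` are assumptions for illustration, not data from this question bank. The numerator of the confidence counts transactions that contain both A and B, which is the quantity the keyed answer refers to.

```python
# Hypothetical transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

bread_support = support({"bread"}, transactions)            # 3 of 4 transactions
rule_conf = confidence({"bread"}, {"milk"}, transactions)   # 2 of the 3 bread transactions
print(bread_support, rule_conf)
```

Note that `antecedent | consequent` is a union of items, while the transactions matching it are the intersection of the transactions matching each side, which is why the numerator appears with either symbol depending on the textbook.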
8. Which of the following is the direct application of frequent itemset mining?
(A)Social Network Analysis
(B)Market Basket Analysis
(C)Outlier Detection
(D)Intrusion Detection
Answer B
9. What is not true about FP growth algorithms?
(A)It mines frequent itemsets without candidate generation
(B)There are chances that FP trees may not fit in the memory
(C)FP trees are very expensive to build
(D)It expands the original database to build FP trees
Answer D
10. When do you consider an association rule interesting?
(A)If it only satisfies min_support
(B)If it only satisfies min_confidence
(C)If it satisfies both min_support and min_confidence
(D)There are other measures to check so
Answer C
11. What is the relation between a candidate and frequent itemsets?
(A)A candidate itemset is always a frequent itemset
(B)A frequent itemset must be a candidate itemset
(C)No relation between these two
(D)Strong relation with transactions
Answer B
12. Which of the following is not a frequent pattern mining algorithm?
(A)Apriori
(B)FP growth
(C)Decision trees
(D)Eclat
Answer C
13. Which algorithm requires fewer scans of data?
(A)Apriori
(B)FP Growth
(C)Naive Bayes
(D)Decision Trees
Answer B
14. For the question given below consider the data Transactions :
Answer A
16. What will happen if support is reduced?
(A)Number of frequent itemsets remains the same
(B)Some itemsets will add to the current set of frequent itemsets
(C)Some itemsets will become infrequent while others will become frequent
(D)Can not say
Answer B
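The effect described in question 16 can be checked on a toy example. The sketch below enumerates frequent itemsets by brute force (not Apriori or FP growth) over made-up transactions at two thresholds; the data and the function name `frequent_itemsets` are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration only.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b"},
]

def frequent_itemsets(transactions, min_support):
    """Brute-force list of all itemsets whose support fraction meets min_support."""
    items = sorted(set().union(*transactions))
    result = set()
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count / len(transactions) >= min_support:
                result.add(candidate)
    return result

high = frequent_itemsets(transactions, 0.75)   # stricter threshold
low = frequent_itemsets(transactions, 0.50)    # reduced threshold
print(sorted(high))
print(sorted(low))
```

Every itemset that is frequent at the higher threshold stays frequent at the lower one, and further itemsets join the set; none drop out.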
UNIT III
5) You are given data about seismic activity in Japan, and you want to predict the magnitude
of the next earthquake. This is an example of…
A. Supervised learning
B. Unsupervised learning
C. Serration
D. Dimensionality reduction
Ans: A
6) Assume you want to perform supervised learning to predict the number of newborns
according to the size of the storks’ population (http://www.brixtonhealth.com/storksBabies.pdf). This
is an example of …
A. Classification
B. Regression
C. Clustering
D. Structural equation modeling
Ans: B
7) Discriminating between spam and ham e-mails is a classification task, true or false?
A. True
B. False
Ans: A
8) In the example of predicting the number of babies based on storks’ population size, the
number of babies is the…
A. outcome
B. feature
C. attribute
D. observation
Ans: A
9) It may be better to avoid the ROC curve as a metric because it can suffer from the accuracy paradox.
A. True
B. False
Ans: B
12) Which of the following issues is considered before investing in Data Mining?
A. Functionality
B. Vendor consideration
C. Compatibility
D. All of the above
Ans: D
16. Bias is
A. A class of learning algorithm that tries to find an optimum classification of a set of
examples using the probabilistic theory
B. Any mechanism employed by a learning system to constrain the search space of a
hypothesis
C. An approach to the design of learning algorithms that are inspired by the fact that
when people encounter new situations, they often explain them by reference to familiar
experiences, adapting the explanations to fit the new situation.
D. None of these
Ans: B
19. Classification is
A. A subdivision of a set of examples into a number of classes
B. A measure of the accuracy of the classification of a concept that is given by a certain theory
C. The task of assigning a classification to a set of examples
D. None of these
Ans: A
22. Cluster is
A. Group of similar objects that differ significantly from other objects
B. Operations on a database to transform or simplify data in order to prepare it for a machine-
learning algorithm
C. Symbolic representation of facts or ideas from which information can potentially be
extracted
D. None of these
Ans: A
UNIT IV
2. Is it possible that the assignment of observations to clusters does not change between successive iterations
in K-Means?
A) Yes
B) No
C) Can’t say
D) None of these
Solution: (A)
3. Which of the following clustering algorithms suffers from the problem of convergence at local optima?
A) K-Means clustering algorithm
B) Agglomerative clustering algorithm
C) Expectation-Maximization clustering algorithm
D) Diverse clustering algorithm
Solution: (A,C)
5. Which of the following is/are valid iterative strategies for treating missing values before clustering
analysis?
A) Imputation with mean
B) Nearest Neighbor assignment
C) Imputation with Expectation Maximization algorithm
D) All of the above
Solution: (C)
6. What will be the cluster centroids if you want to proceed for second iteration?
A) C1: (4,4), C2: (2,2), C3: (7,7)
B) C1: (6,6), C2: (4,4), C3: (9,9)
C) C1: (2,2), C2: (0,0), C3: (5,5)
D) None of these
Solution: (A)
7. What will be the Manhattan distance of observation (9, 9) from cluster centroid C1 in the second
iteration?
A) 10
B) 5*sqrt(2)
C) 13*sqrt(2)
D) None of these
Solution: (A)
Manhattan distance between centroid C1, i.e. (4, 4), and (9, 9) = |9-4| + |9-4| = 10
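The arithmetic above can be reproduced in a few lines. This is a minimal sketch; the helper name `manhattan` is an assumption, and the centroid (4, 4) is taken from the keyed answer to question 6. The Euclidean distance is computed alongside for comparison with option B.

```python
import math

def manhattan(p, q):
    """Sum of absolute coordinate differences between points p and q."""
    return sum(abs(a - b) for a, b in zip(p, q))

c1 = (4, 4)            # centroid C1 from the keyed answer to question 6
observation = (9, 9)

manhattan_distance = manhattan(observation, c1)
euclidean_distance = math.dist(observation, c1)   # straight-line distance
print(manhattan_distance, euclidean_distance)
```

The Manhattan result is 10 (option A), while the Euclidean distance for the same pair is 5*sqrt(2), which is the value offered as option B.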
8. Feature scaling is an important step before applying the K-Means algorithm. What is the reason behind this?
A) In distance calculation it will give the same weights for all features
B) You always get the same clusters whether or not you use feature scaling
C) In Manhattan distance it is an important step but in Euclidean it is not
D) None of these
Solution: (A)
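To see why scaling matters for distance-based clustering, the sketch below compares Euclidean distances before and after min-max scaling. The three observations (income, age) and the helper `min_max_scale` are hypothetical; min-max scaling is only one of several possible scaling choices.

```python
import math

# Hypothetical observations: (income in currency units, age in years).
points = [(30000.0, 25.0), (30500.0, 60.0), (52000.0, 26.0)]

def min_max_scale(rows):
    """Rescale every column to the [0, 1] range (assumes no constant column)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

scaled = min_max_scale(points)

# Ratio of distance(point 0, point 2) to distance(point 0, point 1).
raw_ratio = math.dist(points[0], points[2]) / math.dist(points[0], points[1])
scaled_ratio = math.dist(scaled[0], scaled[2]) / math.dist(scaled[0], scaled[1])
print(raw_ratio, scaled_ratio)
```

Unscaled, the income column dominates and the 35-year age gap barely registers, so the ratio is in the dozens. After scaling, both features carry comparable weight and the ratio is close to 1.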
9. Which of the following methods is used for finding the optimal number of clusters in the K-Means algorithm?
A) Elbow method
B) Manhattan method
C) Euclidean method
D) All of the above
Solution: (A)
10. Which of the following are the high and low bounds for the existence of F-Score?
A) [0,1]
B) (0,1)
C) [-1,1]
D) None of the above
Solution: (A)
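The bounds in question 10 follow from the F-score being a harmonic mean of precision and recall, each of which lies in [0, 1]. The sketch below uses the F1 variant; the function name `f1_score` and the counts are hypothetical, and returning 0 when there are no true positives is a common convention assumed here.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; taken as 0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

worst = f1_score(tp=0, fp=5, fn=5)     # no correct positive predictions
best = f1_score(tp=10, fp=0, fn=0)     # every positive found, no false alarms
middle = f1_score(tp=6, fp=2, fn=4)    # precision 0.75, recall 0.6
print(worst, middle, best)
```

Both endpoints are attainable, which is why the closed interval [0, 1] is the keyed answer rather than the open interval.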
12. In systematic sampling, population is 200 and selected sample size is 50 then sampling interval is
A) 250
B) 0.25
C) 4
D) 40
Solution: (C)
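The sampling interval in question 12 is the population size divided by the sample size, 200 / 50 = 4. A minimal sketch of systematic sampling with that interval follows; the unit labels 1..200 and the fixed random seed are assumptions for illustration.

```python
import random

population = list(range(1, 201))   # 200 units, labelled 1..200 (hypothetical labels)
sample_size = 50

interval = len(population) // sample_size      # sampling interval k = N / n
start = random.Random(0).randrange(interval)   # random start within the first interval
sample = population[start::interval]           # every k-th unit from the start
print(interval, len(sample))
```

Whatever start position is drawn, taking every fourth unit yields exactly 50 units.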
13. In cluster sampling, elements of selected clusters are classified as
A) elementary units
B) primary units
C) secondary units
D) proportional units
Solution: (A)
14. Type of sampling in which desired and useful information is gathered from best position holder is
classified as
A) quota sampling
B) convenience sampling
C) purposive sampling
D) judgment sampling
Solution: (D)
15. Method of sampling in which random sampling will not be possible because population is widely spread
is classified as
A) secondary stage sampling
B) multistage sampling
C) primary stage sampling
D) sub stage sampling
Solution: (B)
16. The most important part of _____ is selecting the variables on which clustering is based.
17. The most commonly used measure of similarity is the _____ or its square.
A) euclidean distance
B) city-block distance
C) Chebychev’s distance
D) Manhattan distance
Solution: (A)
18. _____ is a clustering procedure where all objects start out in one giant cluster. Clusters are formed by
dividing this cluster into smaller and smaller clusters.
A) Non-hierarchical clustering
B) Hierarchical clustering
C) Divisive clustering
D) Agglomerative clustering
Solution: (C)
19. The _____ method uses information on all pairs of distances, not merely the minimum or maximum
distances.
A) single linkage
B) medium linkage
C) complete linkage
D) average linkage
Solution: (D)
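The difference between the linkage rules in question 19 can be seen by computing all three on the same pair of clusters. The two clusters below are hypothetical; the distances are plain Euclidean distances between cross-cluster pairs of points.

```python
import math
from itertools import product

# Two hypothetical clusters of 2-D points.
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]

# Distance between every cross-cluster pair of points.
pair_distances = [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

single = min(pair_distances)                          # closest pair only
complete = max(pair_distances)                        # farthest pair only
average = sum(pair_distances) / len(pair_distances)   # all pairs contribute
print(single, average, complete)
```

Single and complete linkage each depend on one extreme pair, while average linkage changes whenever any of the pairwise distances changes.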
UNIT V
6. Which of the following is not one of the four types of variation that is estimated in time-series analysis?
A) Predictable
B) Trend
C) Cyclical
D) Irregular
8. In time-series analysis, which source of variation can be estimated by the ratio-to-trend method?
A) Cyclical
B) Trend
C) Seasonal
D) Irregular
9. If regression analysis is used to estimate the linear relationship between the natural logarithm of the
variable to be forecast and time, then the slope estimate is equal to
A) the linear trend.
B) the natural logarithm of the rate of growth.
C) the natural logarithm of one plus the rate of growth.
D) the natural logarithm of the square root of the rate of growth.
15. If 3 of the leading indicators move up, 2 move down, and the remaining 6 are constant, then the diffusion
index is
A) 3/6 = 50%
B) 3/11 = 27%
C) 5/11 = 45%
D) 6/11 = 55%
16. A single-equation econometric model of the demand for a product is a ________ equation in which
the quantity demanded of the product is an ________ variable.
A) structural, exogenous
B) structural, endogenous
C) definitional, exogenous
D) definitional, endogenous
Answers
1. A qualitative forecast
The correct answer was: b. predicts the direction, but not the magnitude, of change in a variable.
2. Which of the following is not a qualitative forecasting technique?
The correct answer was: d. Time-series analysis.
3. The first step in time-series analysis is to
The correct answer was: c. plot the data on a graph.
4. Forecasts are referred to as naive if they
The correct answer was: a. are based only on past values of the variable.
5. Time-series analysis is based on the assumption that
The correct answer was: c. past patterns in the variable to be forecast will continue unchanged into
the future.
6. Which of the following is not one of the four types of variation that is estimated in time-series
analysis?
The correct answer was: a. Predictable.
7. The cyclical component of time-series data is usually estimated using
The correct answer was: d. qualitative methods.
8. In time-series analysis, which source of variation can be estimated by the ratio-to-trend method?
The correct answer was: c. Seasonal.
9. If regression analysis is used to estimate the linear relationship between the natural logarithm of
the variable to be forecast and time, then the slope estimate is equal to
The correct answer was: c. the natural logarithm of one plus the rate of growth.
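The answer to question 9 can be verified numerically: if a variable grows at a constant rate g, then ln(y_t) = ln(y_0) + t * ln(1 + g), so the regression slope is ln(1 + g). The sketch below fits an ordinary least squares line to a synthetic series; the 5% growth rate and the starting value of 100 are assumptions for illustration.

```python
import math

growth = 0.05                       # assumed constant growth rate per period
periods = list(range(10))
series = [100.0 * (1 + growth) ** t for t in periods]
log_series = [math.log(y) for y in series]

# Ordinary least squares slope of ln(y) on t.
t_mean = sum(periods) / len(periods)
y_mean = sum(log_series) / len(log_series)
numerator = sum((t - t_mean) * (y - y_mean) for t, y in zip(periods, log_series))
denominator = sum((t - t_mean) ** 2 for t in periods)
slope = numerator / denominator

implied_growth = math.exp(slope) - 1   # invert slope = ln(1 + g)
print(slope, math.log(1 + growth), implied_growth)
```

The fitted slope matches ln(1.05) up to floating-point error, and exponentiating it recovers the 5% growth rate.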
10. The use of a smoothing technique is appropriate when
The correct answer was: a. random behavior is the primary source of variation.
11. The greatest smoothing effect is obtained by using
The correct answer was: b. exponential smoothing with a small weight value.
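A small experiment illustrates the answer to question 11. The sketch below applies simple exponential smoothing with two different weights to a made-up noisy series. It assumes the weight is the one applied to the most recent observation, which is the convention under which a small weight smooths more; the data and helper names are hypothetical.

```python
def exponential_smoothing(series, weight):
    """Simple exponential smoothing: s_t = weight * y_t + (1 - weight) * s_(t-1)."""
    smoothed = [series[0]]                 # initialise with the first observation
    for value in series[1:]:
        smoothed.append(weight * value + (1 - weight) * smoothed[-1])
    return smoothed

def value_range(values):
    """Spread between the largest and smallest value."""
    return max(values) - min(values)

noisy = [10, 14, 9, 15, 8, 16, 10, 13, 9, 15]   # hypothetical data
small_w = exponential_smoothing(noisy, 0.1)
large_w = exponential_smoothing(noisy, 0.9)
print(value_range(small_w), value_range(large_w))
```

With a weight of 0.1 the smoothed series moves within a much narrower band than with 0.9, where it tracks the noisy observations closely.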
12. The root-mean-square error is a measure of
The correct answer was: d. forecast accuracy.
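For reference, the root-mean-square error in question 12 is the square root of the average squared difference between actual and forecast values. A minimal sketch follows; the observed and forecast values are hypothetical.

```python
import math

def rmse(actual, forecast):
    """Square root of the mean squared forecast error."""
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

actual = [100, 110, 120, 130]       # hypothetical observed values
forecast = [98, 112, 118, 134]      # hypothetical forecasts
error = rmse(actual, forecast)
print(error)
```

A smaller value indicates forecasts that sit closer to the observed values; a perfect forecast gives 0.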
13. Barometric methods are used to forecast
The correct answer was: c. cyclical variation.
14. A leading indicator is a measure that usually
The correct answer was: c. changes in the same direction as the general economy before the general
economy changes.
15. If 3 of the leading indicators move up, 2 move down, and the remaining 6 are constant, then the
diffusion index is
The correct answer was: b. 3/11 = 27%.
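The keyed answer treats the diffusion index as the share of indicators that moved up. Under that reading, the arithmetic for question 15 is:

```python
up, down, unchanged = 3, 2, 6          # counts given in question 15
total = up + down + unchanged          # 11 leading indicators in all
diffusion_index = up / total           # share of indicators moving up
print(f"{up}/{total} = {diffusion_index:.0%}")
```

Only the rising indicators enter the numerator; the falling and unchanged ones count toward the total alone.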
16. A single-equation econometric model of the demand for a product is a ________ equation in
which the quantity demanded of the product is an ________ variable.
The correct answer was: b. structural, endogenous.
17. A reduced form equation expresses
The correct answer was: b. an endogenous variable as a function of exogenous variables.
18. Trend projection is an example of which kind of forecasting?
The correct answer was: b. Time-series.
19. Turning points in the level of economic activity can be forecast by using
The correct answer was: c. Barometric methods.
20. Econometric forecasts require
The correct answer was: d. all of the above.