Datamining Bits

Uploaded by

Gangadas G
Copyright
© All Rights Reserved

UNIT I

1. ...................... is an essential process where intelligent methods are applied to extract data patterns.

A) Data warehousing
B) Data mining
C) Text mining
D) Data selection

2. Data mining can also be applied to other forms of data such as ................

i) Data streams
ii) Sequence data
iii) Networked data
iv) Text data
v) Spatial data

A) i, ii, iii and v only
B) ii, iii, iv and v only
C) i, iii, iv and v only
D) All i, ii, iii, iv and v

3. Which of the following is not a data mining functionality?

A) Characterization and Discrimination
B) Classification and regression
C) Selection and interpretation
D) Clustering and Analysis

4. ............................. is a summarization of the general characteristics or features of a target class of data.

A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection

5. ............................. is a comparison of the general features of the target class data objects against the
general features of objects from one or multiple contrasting classes.

A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection

6. The strategic value of data mining is ......................

A) cost-sensitive
B) work-sensitive
C) time-sensitive
D) technical-sensitive

7. ............................. is the process of finding a model that describes and distinguishes data classes or
concepts.

A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection

8. The various aspects of data mining methodologies is/are ...................

i) Mining various and new kinds of knowledge
ii) Mining knowledge in multidimensional space
iii) Pattern evaluation and pattern- or constraint-guided mining
iv) Handling uncertainty, noise, or incompleteness of data

A) i, ii and iv only
B) ii, iii and iv only
C) i, ii and iii only
D) All i, ii, iii and iv

9. The full form of KDD is ..................

A) Knowledge Database
B) Knowledge Discovery in Databases
C) Knowledge Data House
D) Knowledge Data Definition

10. The output of KDD is.............

A) Data
B) Information
C) Query
D) Useful information

11. Data scrubbing is which of the following?

A) A process to reject data from the data warehouse and to create the necessary indexes
B) A process to load the data in the data warehouse and to create the necessary indexes
C) A process to upgrade the quality of data after it is moved into a data warehouse
D) A process to upgrade the quality of data before it is moved into a data warehouse

12. The active data warehouse architecture includes which of the following?

A) At least one data mart
B) Data that can be extracted from numerous internal and external sources
C) Near real-time updates
D) All of the above.

13. A goal of data mining includes which of the following?

A) To explain some observed event or condition
B) To confirm that data exists
C) To analyze data for expected relationships
D) To create a new data warehouse

14. An operational system is which of the following?

A) A system that is used to run the business in real time and is based on historical data.
B) A system that is used to run the business in real time and is based on current data.
C) A system that is used to support decision making and is based on current data.
D) A system that is used to support decision making and is based on historical data.

15. Which of the following describes a data warehouse?

A) It can be updated by end users.
B) It contains numerous naming conventions and formats.
C) It is organized around important subject areas.
D) It contains only current data.

16. Data mining is also referred to as ____________

A) Knowledge discovery in databases
B) Data cleaning
C) Data extraction
D) Data management

17. To represent any n-dimensional data, we need a series of _______-dimensional cubes.
A) (n-1)
B) n
C) n+1
D) n+2

18. Data about data is called _____________
A) Table
B) Database
C) Metadata
D) Integration

Answers

9. The full form of KDD is: B) Knowledge Discovery in Databases
10. The output of KDD is: D) Useful information
11. Answer: Option D
12. Answer: Option D
13. Answer: Option A
14. Answer: Option B
15. Answer: Option C
16. Answer: Option A
17. Answer: Option A
18. Answer: Option C

UNIT II

1. A collection of one or more items is called a(n) _____

(A) Itemset
(B) Support
(C) Confidence
(D) Support Count

Answer A

2. The frequency of occurrence of an itemset is called its _____
(A) Support
(B) Confidence
(C) Support Count
(D) Rules

Answer C

3. An itemset whose support is greater than or equal to a minimum support threshold is a(n) ______
(A) Itemset
(B) Frequent Itemset
(C) Infrequent items
(D) Threshold values

Answer B

4. What does the FP-growth algorithm do?
(A) It mines all frequent patterns by pruning rules with lower support
(B) It mines all frequent patterns by pruning rules with higher support
(C) It mines all frequent patterns by constructing an FP-tree
(D) It mines all frequent patterns by constructing itemsets

Answer C

5. What techniques can be used to improve the efficiency of the Apriori algorithm?
(A) Hash-based techniques
(B) Transaction Increases
(C) Sampling
(D) Cleaning

Answer A, C

6. What is meant by support(A)?
(A) Total number of transactions containing A
(B) Total number of transactions not containing A
(C) Number of transactions containing A / Total number of transactions
(D) Number of transactions not containing A / Total number of transactions

Answer C

7. How do you calculate Confidence (A -> B)?
(A) Support(A ∩ B) / Support(A)
(B) Support(A ∩ B) / Support(B)
(C) Support(A ∪ B) / Support(A)
(D) Support(A ∪ B) / Support(B)

Answer A
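
Questions 6 and 7 can be checked with a minimal Python sketch. In the quiz, Support(A ∩ B) means the fraction of transactions that contain both A and B; in code that is the support of the set union of the two itemsets. The basket data and the helper names `support` and `confidence` are hypothetical, for illustration only.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: set, transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(a: set, b: set, transactions: List[Transaction]) -> float:
    """confidence(A -> B) = support(A and B together) / support(A)."""
    return support(a | b, transactions) / support(a, transactions)


# Hypothetical toy baskets.
baskets = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

print(support({"bread"}, baskets))               # 3 of 4 baskets -> 0.75
print(confidence({"bread"}, {"milk"}, baskets))  # (2/4) / (3/4) -> about 0.667
```
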
8. Which of the following is a direct application of frequent itemset mining?
(A) Social Network Analysis
(B) Market Basket Analysis
(C) Outlier Detection
(D) Intrusion Detection

Answer B

9. Which of the following is not true about the FP-growth algorithm?
(A) It mines frequent itemsets without candidate generation
(B) FP-trees may not fit in memory
(C) FP-trees are very expensive to build
(D) It expands the original database to build FP-trees

Answer D

10. When do you consider an association rule interesting?
(A) If it only satisfies min_support
(B) If it only satisfies min_confidence
(C) If it satisfies both min_support and min_confidence
(D) Other measures must be checked

Answer C

11. What is the relation between candidate and frequent itemsets?
(A) A candidate itemset is always a frequent itemset
(B) A frequent itemset must be a candidate itemset
(C) There is no relation between the two
(D) Strong relation with transactions

Answer B

12. Which of the following is not a frequent pattern mining algorithm?
(A) Apriori
(B) FP-growth
(C) Decision trees
(D) Eclat

Answer C

13. Which algorithm requires fewer scans of the data?
(A) Apriori
(B) FP-growth
(C) Naive Bayes
(D) Decision Trees

Answer B
14. For the question given below, consider the following transactions:

T1: I1, I2, I3, I4, I5, I6
T2: I7, I2, I3, I4, I5, I6
T3: I1, I8, I4, I5
T4: I1, I9, I10, I4, I6
T5: I10, I2, I4, I11, I5

15. With a minimum support of 0.6, find all frequent itemsets.

(A) <I1>, <I2>, <I4>, <I5>, <I6>, <I1, I4>, <I2, I4>, <I2, I5>, <I4, I5>, <I4, I6>, <I2, I4, I5>
(B) <I2>, <I4>, <I5>, <I2, I4>, <I2, I5>, <I4, I5>, <I2, I4, I5>
(C) <I11>, <I4>, <I5>, <I6>, <I1, I4>, <I5, I4>, <I11, I5>, <I4, I6>, <I2, I4, I5>
(D) <I1>, <I4>, <I5>, <I6>

Answer A
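
Answer A can be verified with a minimal level-wise Apriori sketch over the five transactions listed in question 14 (an illustration, not an optimized implementation). Support is compared as a fraction, so a threshold of 0.6 over 5 transactions means an itemset must appear in at least 3 of them. The function name `apriori` is just a label for this sketch.

```python
from itertools import combinations

# The five transactions listed in question 14.
transactions = [
    {"I1", "I2", "I3", "I4", "I5", "I6"},
    {"I7", "I2", "I3", "I4", "I5", "I6"},
    {"I1", "I8", "I4", "I5"},
    {"I1", "I9", "I10", "I4", "I6"},
    {"I10", "I2", "I4", "I11", "I5"},
]


def apriori(transactions, min_support):
    """Level-wise Apriori; returns {frozenset(itemset): support_count}."""
    n = len(transactions)

    def frequent_only(candidates):
        # Count each candidate and keep those whose support fraction meets the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}

    # Level 1: every individual item is a candidate.
    level = frequent_only({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    size = 2
    while level:
        prev = set(level)
        # Join step: merge pairs of frequent (size-1)-itemsets into size-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == size}
        # Prune step: drop candidates that have any infrequent (size-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, size - 1))}
        level = frequent_only(candidates)
        result.update(level)
        size += 1
    return result


freq = apriori(transactions, min_support=0.6)
for itemset in sorted(freq, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), freq[itemset])
```

The loop stops when a level yields no frequent itemsets; here the only frequent 3-itemset is {I2, I4, I5}, with a count of 3.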
16. What will happen if the minimum support is reduced?
(A) The number of frequent itemsets remains the same
(B) Some itemsets will be added to the current set of frequent itemsets
(C) Some itemsets will become infrequent while others will become frequent
(D) Cannot say

Answer B
UNIT III

1) The problem of finding hidden structure in unlabeled data is called…

A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
Ans: B

2) The task of inferring a model from labeled training data is called…

A. Unsupervised learning
B. Supervised learning
C. Reinforcement learning
Ans: B

3) A telecommunications company wants to segment its customers into distinct groups in order to send appropriate subscription offers. This is an example of…
A. Supervised learning
B. Data extraction
C. Serration
D. Unsupervised learning
Ans: D

4) Self-organizing maps are an example of…

A. Unsupervised learning
B. Supervised learning
C. Reinforcement learning
D. Missing data imputation
Ans: A

5) You are given data about seismic activity in Japan, and you want to predict the magnitude of the next earthquake. This is an example of…
A. Supervised learning
B. Unsupervised learning
C. Serration
D. Dimensionality reduction
Ans: A

6) Assume you want to perform supervised learning to predict the number of newborns from the size of the stork population (http://www.brixtonhealth.com/storksBabies.pdf). This is an example of…

A. Classification
B. Regression
C. Clustering
D. Structural equation modeling
Ans: B

7) Discriminating between spam and ham e-mails is a classification task, true or false?
A. True
B. False
Ans: A

8) In the example of predicting the number of babies from the stork population size, the number of babies is the…

A. outcome
B. feature
C. attribute
D. observation
Ans: A

9) It may be better to avoid the ROC curve as a metric because it can suffer from the accuracy paradox.

A. True
B. False
Ans: B
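
The accuracy paradox behind question 9 can be illustrated with a small Python sketch on hypothetical imbalanced data (95 negatives, 5 positives). A model that always predicts the majority class with one constant score reaches 95% accuracy, while its ROC AUC, computed here as the probability that a random positive outscores a random negative (ties count one half), is 0.5, i.e. chance level. The helper names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as P(score of random positive > score of random negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A useless model: always predicts the majority class, always with the same score.
y_pred = [0] * 100
scores = [0.1] * 100

print(accuracy(y_true, y_pred))  # 0.95, looks good
print(roc_auc(y_true, scores))   # 0.5, no better than chance
```
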

10) Which of the following is not involved in data mining?

A. Knowledge extraction
B. Data archaeology
C. Data exploration
D. Data transformation
Ans: D

11) Which is the right approach to data mining?

A. Infrastructure, exploration, analysis, interpretation, exploitation
B. Infrastructure, exploration, analysis, exploitation, interpretation
C. Infrastructure, analysis, exploration, interpretation, exploitation
D. Infrastructure, analysis, exploration, exploitation, interpretation
Ans: A

12) Which of the following issues should be considered before investing in data mining?
A. Functionality
B. Vendor consideration
C. Compatibility
D. All of the above
Ans: D

13. Adaptive system management is

A. It uses machine-learning techniques. Here, programs can learn from past experience and adapt themselves to new situations.
B. A computational procedure that takes some value as input and produces some value as output.
C. The science of making machines perform tasks that would require intelligence when performed by humans.
D. None of these
Ans: A

14. Bayesian classifiers are

A. A class of learning algorithms that tries to find an optimum classification of a set of examples using probability theory.
B. Any mechanism employed by a learning system to constrain the search space of a hypothesis
C. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
D. None of these
Ans: A

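
Option A of question 14 describes the probabilistic idea behind Bayesian classifiers. Below is a minimal categorical naive Bayes sketch in Python, assuming features are conditionally independent given the class and using add-one (Laplace) smoothing, which is one common choice rather than the only one. The toy weather rows and the names `train_naive_bayes` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class frequencies of each feature value."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature index) -> Counter of values
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Return the class maximizing log P(class) + sum_i log P(x_i | class)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i, value in enumerate(row):
            # Add-one smoothing keeps an unseen value from zeroing out the product.
            num = value_counts[(label, i)][value] + 1
            den = n_c + len(feature_values[i])
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, windy) -> play?
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"), ("rainy", "no"), ("overcast", "no")]
labels = ["yes", "no", "no", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("sunny", "no")))   # yes
print(predict(model, ("rainy", "yes")))  # no
```

Working in log space avoids numeric underflow when many feature probabilities are multiplied together.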
15. An algorithm is
A. It uses machine-learning techniques. Here, programs can learn from past experience and adapt themselves to new situations.
B. A computational procedure that takes some value as input and produces some value as output
C. The science of making machines perform tasks that would require intelligence when performed by humans
D. None of these
Ans: B

16. Bias is
A. A class of learning algorithms that tries to find an optimum classification of a set of examples using probability theory
B. Any mechanism employed by a learning system to constrain the search space of a hypothesis
C. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
D. None of these
Ans: B

17. Background knowledge refers to

A. Additional knowledge used by a learning algorithm to facilitate the learning process
B. A neural network that makes use of a hidden layer
C. A form of automatic learning
D. None of these
Ans: A

18. Case-based learning is

A. A class of learning algorithms that tries to find an optimum classification of a set of examples using probability theory.
B. Any mechanism employed by a learning system to constrain the search space of a hypothesis
C. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
D. None of these
Ans: C

19. Classification is
A. A subdivision of a set of examples into a number of classes
B. A measure of the accuracy of the classification of a concept that is given by a certain theory
C. The task of assigning a classification to a set of examples
D. None of these
Ans: A

20. A binary attribute is

A. An attribute that takes only two values. In general, these values will be 0 and 1, and they can be coded as one bit.
B. The natural environment of a certain species
C. Systems that can be used without knowledge of internal operations
D. None of these
Ans: A

21. Classification accuracy is

A. A subdivision of a set of examples into a number of classes
B. A measure of the accuracy of the classification of a concept that is given by a certain theory
C. The task of assigning a classification to a set of examples
D. None of these
Ans: B

22. Cluster is
A. Group of similar objects that differ significantly from other objects
B. Operations on a database to transform or simplify data in order to prepare it for a machine-
learning algorithm
C. Symbolic representation of facts or ideas from which information can potentially be
extracted
D. None of these
Ans: A

23. A definition of a concept is _____ if it recognizes all the instances of that concept.

A. Complete
B. Consistent
C. Constant
D. None of these
Ans: A

UNIT IV

1. What is the minimum number of variables/features required to perform clustering?

A) 0
B) 1
C) 2
D) 3
Solution: (B)

2. Is it possible that the assignment of observations to clusters does not change between successive iterations in K-Means?
A) Yes
B) No
C) Can't say
D) None of these
Solution: (A)

3. Which of the following clustering algorithms suffers from the problem of convergence at local optima?
A) K-Means clustering algorithm
B) Agglomerative clustering algorithm
C) Expectation-Maximization clustering algorithm
D) Diverse clustering algorithm
Solution: (A, C)
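
Questions 2 and 3 can be seen in a minimal sketch of Lloyd's k-means in plain Python (Python 3.8+ for `math.dist`). Assumptions: Euclidean distance, starting centroids supplied by the caller, and a hypothetical four-point data set. The loop stops as soon as the assignments stop changing between iterations, and two different starting points converge to different solutions, one with a much larger within-cluster sum of squared errors (SSE), i.e. a local optimum.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means from the given starting centroids.

    Stops when the assignment of points to clusters does not change
    between two successive iterations.
    """
    centroids = list(centroids)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged: assignments are stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids


def sse(points, assignment, centroids):
    """Within-cluster sum of squared distances (the k-means objective)."""
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignment))


# Hypothetical data: four corners of a wide rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
a1, c1 = kmeans(points, [(0, 0), (4, 0)])  # good start: left vs right
a2, c2 = kmeans(points, [(0, 0), (0, 1)])  # bad start: bottom vs top
print(a1, sse(points, a1, c1))  # [0, 0, 1, 1] 1.0
print(a2, sse(points, a2, c2))  # [0, 1, 0, 1] 16.0
```

Because the result depends on the starting centroids, a common practice is to run k-means from several starts and keep the solution with the lowest SSE.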

4. Which of the following algorithms is most sensitive to outliers?

A) K-means clustering algorithm
B) K-medians clustering algorithm
C) K-modes clustering algorithm
D) K-medoids clustering algorithm
Solution: (A)

5. Which of the following is/are valid iterative strategies for treating missing values before clustering analysis?
A) Imputation with mean
B) Nearest Neighbor assignment
C) Imputation with Expectation Maximization algorithm
D) All of the above
Solution: (C)

6. What will be the cluster centroids if you want to proceed to the second iteration?
A) C1: (4,4), C2: (2,2), C3: (7,7)
B) C1: (6,6), C2: (4,4), C3: (9,9)
C) C1: (2,2), C2: (0,0), C3: (5,5)
D) None of these
Solution: (A)

7. What will be the Manhattan distance of observation (9, 9) from cluster centroid C1 in the second iteration?
A) 10
B) 5*sqrt(2)
C) 13*sqrt(2)
D) None of these
Solution: (A)

The Manhattan distance between centroid C1, i.e. (4, 4), and (9, 9) is |9 - 4| + |9 - 4| = 10.

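The arithmetic behind questions 6 and 7 can be sketched in Python: a centroid is the coordinate-wise mean of its cluster's points, and the Manhattan distance is the sum of absolute coordinate differences. This document does not reproduce the cluster memberships from the original question, so the clusters below are hypothetical, chosen only so that their means match option A.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def manhattan(p, q):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical first-iteration clusters, chosen so that their means match option A.
clusters = {
    "C1": [(2, 2), (4, 4), (6, 6)],
    "C2": [(0, 4), (4, 0)],
    "C3": [(5, 5), (9, 9)],
}
centroids = {name: centroid(pts) for name, pts in clusters.items()}
print(centroids)                            # {'C1': (4.0, 4.0), 'C2': (2.0, 2.0), 'C3': (7.0, 7.0)}
print(manhattan((9, 9), centroids["C1"]))   # 10.0
```
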
8. Feature scaling is an important step before applying the K-Means algorithm. What is the reason behind this?
A) In distance calculation, it gives the same weight to all features
B) You always get the same clusters whether or not you use feature scaling
C) It is an important step for Manhattan distance but not for Euclidean distance
D) None of these
Solution: (A)
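
The reason given in option A can be seen in a short Python sketch. With raw values, a feature measured in large units (income in dollars) dominates the Euclidean distance; after z-score standardization, each feature contributes on a comparable scale and the nearest neighbour of the first row changes. Z-scoring is one common choice (min-max scaling is another); the customer rows and the `standardize` helper are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column: (x - column mean) / column population std. dev."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [tuple((x - m) / s for x, (m, s) in zip(row, stats)) for row in rows]


# Hypothetical customers: (age in years, income in dollars).
rows = [(20, 50_000), (30, 51_000), (60, 50_000)]
scaled = standardize(rows)

# Raw: income dominates, so row 2 (same income, 40 years older) looks closest to row 0.
print(math.dist(rows[0], rows[1]), math.dist(rows[0], rows[2]))
# Scaled: both features count, and row 1 becomes the closer one.
print(math.dist(scaled[0], scaled[1]), math.dist(scaled[0], scaled[2]))
```
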

9. Which of the following methods is used for finding the optimal number of clusters in the K-Means algorithm?
A) Elbow method
B) Manhattan method
C) Euclidean method
D) All of the above
Solution: (A)

10. Which of the following gives the lower and upper bounds of the F-score?
A) [0,1]
B) (0,1)
C) [-1,1]
D) None of the above
Solution: (A)
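
The F-score bounds in question 10 follow from its definition. F1 is the harmonic mean of precision and recall, both of which lie in [0, 1], so F1 also lies in [0, 1]. It is 0 when there are no true positives and 1 only for a perfect classifier. A minimal Python sketch from confusion-matrix counts follows. The general F-beta form and the convention of returning 0.0 when precision and recall are both zero are the usual ones, stated here as assumptions. The counts are hypothetical.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from confusion-matrix counts; 0.0 when precision and recall are both 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(f_score(tp=6, fp=2, fn=6))   # precision 0.75, recall 0.5 -> 0.6
print(f_score(tp=10, fp=0, fn=0))  # perfect classifier -> 1.0 (upper bound)
print(f_score(tp=0, fp=5, fn=5))   # no true positives -> 0.0 (lower bound)
```
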

11. In systematic sampling, the value of k is classified as

A) sampling interval
B) sub stage interval
C) secondary stage interval
D) multistage interval
Solution: (A)

12. In systematic sampling, if the population size is 200 and the selected sample size is 50, then the sampling interval is
A) 250
B) 0.25
C) 4
D) 40
Solution: (C)

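Question 12's arithmetic, k = N / n = 200 / 50 = 4, can be sketched as systematic sampling in Python: choose a random start within the first interval, then take every k-th unit. Using integer division when N is not an exact multiple of n is an assumed convention for this sketch; the function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(population, n, rng=random):
    """Take every k-th unit after a random start, where k = N // n."""
    k = len(population) // n   # sampling interval: 200 // 50 = 4
    start = rng.randrange(k)   # random start in [0, k)
    return population[start::k][:n]


population = list(range(1, 201))  # N = 200 units, numbered 1..200
sample = systematic_sample(population, 50, random.Random(0))
print(len(sample), sample[:3])    # 50 units, consecutive picks 4 apart
```
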
13. In cluster sampling, elements of selected clusters are classified as
A) elementary units
B) primary units
C) secondary units
D) proportional units
Solution: (A)

14. The type of sampling in which the desired and useful information is gathered from the best position holders is classified as
A) quota sampling
B) convenience sampling
C) purposive sampling
D) judgment sampling
Solution: (D)

15. The method of sampling used when simple random sampling is not possible because the population is widely spread is classified as
A) secondary stage sampling
B) multistage sampling
C) primary stage sampling
D) sub stage sampling
Solution: (B)

16. The most important part of _____ is selecting the variables on which clustering is based.

A) interpreting and profiling clusters
B) selecting a clustering procedure
C) assessing the validity of clustering
D) formulating the clustering problem
Solution: (D)

17. The most commonly used measure of similarity is the _____ or its square.
A) Euclidean distance
B) city-block distance
C) Chebychev’s distance
D) Manhattan distance
Solution: (A)

18. _____ is a clustering procedure where all objects start out in one giant cluster. Clusters are formed by
dividing this cluster into smaller and smaller clusters.
A) Non-hierarchical clustering
B) Hierarchical clustering
C) Divisive clustering
D) Agglomerative clustering
Solution: (C)

19. The _____ method uses information on all pairs of distances, not merely the minimum or maximum
distances.
A) single linkage
B) medium linkage
C) complete linkage
D) average linkage
Solution: (D)
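
The three linkage rules in question 19 differ only in how they summarize the pairwise distances between two clusters. Single linkage takes the minimum, complete linkage takes the maximum, and average linkage takes the mean over all pairs. A minimal Python sketch with two hypothetical clusters:

```python
import math
from itertools import product


def single_linkage(a, b):
    """Minimum distance between any point of cluster a and any point of cluster b."""
    return min(math.dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    """Maximum pairwise distance between the two clusters."""
    return max(math.dist(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    """Mean of all |a| * |b| pairwise distances."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


# Hypothetical clusters; the pairwise distances are 4, 6, 3 and 5.
a = [(0, 0), (1, 0)]
b = [(4, 0), (6, 0)]
print(single_linkage(a, b), complete_linkage(a, b), average_linkage(a, b))  # 3.0 6.0 4.5
```
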

20. _____ is frequently referred to as k-means clustering.

A) Non-hierarchical clustering
B) Optimizing partitioning
C) Divisive clustering
D) Agglomerative clustering
Solution: (A)

UNIT V

1. A qualitative forecast

A) predicts the quality of a new product.
B) predicts the direction, but not the magnitude, of change in a variable.
C) is a forecast that is classified on a numerical scale from 1 (poor quality) to 10 (perfect quality).
D) is a forecast that is based on econometric methods.

2. Which of the following is not a qualitative forecasting technique?

A) Surveys of consumer expenditure plans
B) Perspectives of foreign advisory councils
C) Consumer intention polling
D) Time-series analysis

3. The first step in time-series analysis is to

A) perform preliminary regression calculations.
B) calculate a moving average.
C) plot the data on a graph.
D) identify relevant correlated variables.

4. Forecasts are referred to as naive if they

A) are based only on past values of the variable.
B) are short-term forecasts.
C) are long-term forecasts.
D) generally result in incorrect forecasts.

5. Time-series analysis is based on the assumption that

A) random error terms are normally distributed.
B) there are dependable correlations between the variable to be forecast and other independent variables.
C) past patterns in the variable to be forecast will continue unchanged into the future.
D) the data do not exhibit a trend.

6. Which of the following is not one of the four types of variation that are estimated in time-series analysis?
A) Predictable
B) Trend
C) Cyclical
D) Irregular

7. The cyclical component of time-series data is usually estimated using

A) linear regression analysis.
B) moving averages.
C) exponential smoothing.
D) qualitative methods.

8. In time-series analysis, which source of variation can be estimated by the ratio-to-trend method?
A) Cyclical
B) Trend
C) Seasonal
D) Irregular

9. If regression analysis is used to estimate the linear relationship between the natural logarithm of the variable to be forecast and time, then the slope estimate is equal to
A) the linear trend.
B) the natural logarithm of the rate of growth.
C) the natural logarithm of one plus the rate of growth.
D) the natural logarithm of the square root of the rate of growth.
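
The reasoning behind question 9: if a series grows at a constant rate g, then Y_t = Y_0 (1 + g)^t, so ln Y_t = ln Y_0 + t ln(1 + g). This is linear in t, with slope ln(1 + g). The Python sketch below fits an ordinary least-squares line to ln Y against t for a hypothetical, noise-free series growing 5% per period and recovers that slope.

```python
import math


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


g = 0.05                                # hypothetical growth rate per period
t = list(range(10))
y = [100 * (1 + g) ** ti for ti in t]   # Y_t = Y_0 (1 + g)^t with Y_0 = 100
slope = ols_slope(t, [math.log(v) for v in y])
print(slope, math.log(1 + g))           # both about 0.04879
print(math.exp(slope) - 1)              # recovers g, about 0.05
```
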

10. The use of a smoothing technique is appropriate when

A) random behavior is the primary source of variation.
B) seasonality is present.
C) data exhibit a strong trend.
D) all of the above are correct.

11. The greatest smoothing effect is obtained by using

A) a moving average based on a small number of periods.
B) exponential smoothing with a small weight value.
C) the root-mean-square error.
D) the barometric method.
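
In option B of question 11, the "weight value" is the smoothing constant alpha in S_t = alpha * y_t + (1 - alpha) * S_{t-1}. A small alpha gives more weight to past smoothed values, so each new observation moves the forecast less. The Python sketch below applies exponential smoothing to a hypothetical series that alternates around 10. It measures smoothness with a hypothetical `total_variation` helper, the sum of absolute period-to-period changes, and seeds the recursion with the first observation, which is one common convention.

```python
def exponential_smoothing(series, alpha):
    """S_t = alpha * y_t + (1 - alpha) * S_{t-1}, seeded with the first observation."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


def total_variation(values):
    """Sum of absolute period-to-period changes: smaller means smoother."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


# Hypothetical noisy series alternating around 10.
series = [8, 12, 8, 12, 8, 12, 8, 12, 8, 12]
heavy = exponential_smoothing(series, alpha=0.1)  # small weight: strong smoothing
light = exponential_smoothing(series, alpha=0.9)  # large weight: tracks the noise
print(total_variation(heavy), total_variation(light))
```
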

12. The root-mean-square error is a measure of

A) sample size.
B) moving average periods.
C) exponential smoothing.
D) forecast accuracy.
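
Question 12 in code: the root-mean-square error is the square root of the mean squared difference between actual values and forecasts, so a lower RMSE means a more accurate forecast. A minimal Python sketch with hypothetical actuals and two hypothetical forecasts:

```python
import math


def rmse(actual, forecast):
    """Root-mean-square error: sqrt(mean((actual - forecast) ** 2))."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


actual = [100, 110, 120, 130]
forecast_a = [98, 112, 118, 132]   # errors 2, -2, 2, -2
forecast_b = [99, 111, 119, 131]   # errors 1, -1, 1, -1
print(rmse(actual, forecast_a))    # 2.0
print(rmse(actual, forecast_b))    # 1.0, the more accurate forecast
```
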

13. Barometric methods are used to forecast

A) seasonal variation.
B) secular trend.
C) cyclical variation.
D) irregular variation.

14. A leading indicator is a measure that usually

A) changes at the same time and in the same direction as the general economy.
B) responds to a change in the general economy after a time lag.
C) changes in the same direction as the general economy before the general economy changes.
D) has all of the properties listed above.

15. If 3 of the leading indicators move up, 2 move down, and the remaining 6 are constant, then the diffusion index is
A) 3/6 = 50%
B) 3/11 = 27%
C) 5/11 = 45%
D) 6/11 = 55%

16. A single-equation econometric model of the demand for a product is a ________ equation in which the quantity demanded of the product is an ________ variable.
A) structural, exogenous
B) structural, endogenous
C) definitional, exogenous
D) definitional, endogenous

17. A reduced form equation expresses

A) an exogenous variable as a function of endogenous variables.
B) an endogenous variable as a function of exogenous variables.
C) an exogenous variable as a function of both endogenous and exogenous variables.
D) an endogenous variable as a function of both exogenous and endogenous variables.

18. Trend projection is an example of which kind of forecasting?

A) Qualitative
B) Time-series
C) Barometric
D) Econometric

19. Turning points in the level of economic activity can be forecast by using

A) Time-series analysis
B) Exponential smoothing
C) Barometric methods
D) Moving average

20. Econometric forecasts require

A) accurate estimates of the coefficients of structural equations.
B) forecasts of future values of exogenous variables.
C) appropriate theoretical models.
D) all of the above.

Answers
1. A qualitative forecast
The correct answer was: B) predicts the direction, but not the magnitude, of change in a variable.
2. Which of the following is not a qualitative forecasting technique?
The correct answer was: D) Time-series analysis.
3. The first step in time-series analysis is to
The correct answer was: C) plot the data on a graph.
4. Forecasts are referred to as naive if they
The correct answer was: A) are based only on past values of the variable.
5. Time-series analysis is based on the assumption that
The correct answer was: C) past patterns in the variable to be forecast will continue unchanged into the future.
6. Which of the following is not one of the four types of variation that are estimated in time-series analysis?
The correct answer was: A) Predictable.
7. The cyclical component of time-series data is usually estimated using
The correct answer was: D) qualitative methods.
8. In time-series analysis, which source of variation can be estimated by the ratio-to-trend method?
The correct answer was: C) Seasonal.
9. If regression analysis is used to estimate the linear relationship between the natural logarithm of the variable to be forecast and time, then the slope estimate is equal to
The correct answer was: C) the natural logarithm of one plus the rate of growth.
10. The use of a smoothing technique is appropriate when
The correct answer was: A) random behavior is the primary source of variation.
11. The greatest smoothing effect is obtained by using
The correct answer was: B) exponential smoothing with a small weight value.
12. The root-mean-square error is a measure of
The correct answer was: D) forecast accuracy.
13. Barometric methods are used to forecast
The correct answer was: C) cyclical variation.
14. A leading indicator is a measure that usually
The correct answer was: C) changes in the same direction as the general economy before the general economy changes.
15. If 3 of the leading indicators move up, 2 move down, and the remaining 6 are constant, then the diffusion index is
The correct answer was: B) 3/11 = 27%.
16. A single-equation econometric model of the demand for a product is a ________ equation in which the quantity demanded of the product is an ________ variable.
The correct answer was: B) structural, endogenous.
17. A reduced form equation expresses
The correct answer was: B) an endogenous variable as a function of exogenous variables.
18. Trend projection is an example of which kind of forecasting?
The correct answer was: B) Time-series.
19. Turning points in the level of economic activity can be forecast by using
The correct answer was: C) Barometric methods.
20. Econometric forecasts require
The correct answer was: D) all of the above.
