Data Mining Objectives
2020
1. The full form of KDD is
(a) Knowledge Database
(b) Knowledge discovery in databases
(c) Knowledge data division
(d) Knowledge data definition
Ans- b
2. You are given data about seismic activity in Japan, and you want to predict the magnitude of the next
earthquake. This is an example of
(a) Supervised Learning
(b) Unsupervised Learning
(c) Serration
(d) dimensionality reduction
Ans- a
3. Which of the following is not involved in data mining?
(a) Knowledge extraction
(b) Data archaeology
(c) Data exploration
(d) Data transformation
Ans- d
4. ____ is a comparison of the general features of the target class data objects against the general
features of objects from one or multiple contrasting classes
(a) Data Characterization
(b) Data classification
(c) Data discrimination
(d) Data selection
Ans- c
5. A Bayesian classifier is
(a) a class of learning algorithm that tries to find an optimum classification of a set of examples using
the probabilistic theory
(b) any mechanism employed by a learning system to constrain the search space of a hypothesis
(c) an approach to the design of learning algorithms that is inspired by the fact that when people
encounter new situations, they often explain them by reference to familiar experiences, adapting
the explanations to fit the new situation
(d) None of the above
Ans- a
6. The output of KDD is
(a) Data
(b) Information
(c) Query
(d) Useful information
Ans- d
7. Cluster is
(a) Group of similar objects that differ significantly from other objects
(b) Operations on a database to transform or simplify data in order to prepare it for a machine
learning algorithm
(c) Symbolic representation of facts or ideas from which information can potentially be extracted
(d) None of the above
Ans- a
8. Background knowledge refers to
(a) Additional acquaintance used by a learning algorithm to facilitate the learning process
(b) a neural network that makes use of a hidden layer
(c) it is a form of automatic learning
(d) None of the above
Ans- a
9. Case-based learning is
(a) A class of learning algorithm that tries to find an optimum classification of a set of examples using
the probabilistic theory
(b) Any mechanism employed by a learning system to constrain the search space of a hypothesis
(c) An approach to the design of learning algorithms that is inspired by the fact that when people
encounter new situations, they often explain them by reference to the familiar experiences,
adapting the explanations to fit the new situation
(d) None of the above
Ans- c
10. Some telecommunication companies want to segment their customers into distinct groups in
order to send appropriate subscription offers. This is an example of
(a) Supervised Learning
(b) Data extraction
(c) Serration
(d) Unsupervised Learning
Ans- d
2017
11. An ……………… system is market-oriented and is used for data analysis by knowledge workers, including
managers, executives, and analysts.
(a) OLAP
(b) OLTP
(c) Both of the above
(d) None of the above
Ans- a
12. Which of the following is not a kind of data warehouse application
(a) Information Processing
(b) Analytical Processing
(c) Data Mining
(d) Transaction Processing
Ans- d
13. Data can be frequently updated in the …………. environment
(a) Data Warehouse
(b) Data Mining
(c) Operational
(d) Informational
Ans- c
14. ……… is data about data.
(a) Metadata
(b) Microdata
(c) Minidata
(d) Multidata
Ans- a
15. Which of the following is required by K-means clustering?
(a) Defined distance metric
(b) Number of clusters
(c) Initial guess as to cluster centroids
(d) All of the above
Ans- d
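As an illustration of why all three inputs matter, here is a minimal K-means sketch in plain Python. The function names (`euclidean`, `k_means`) and the sample points are assumptions made for this example, not part of the question paper. The number of clusters is implied by how many initial centroids are supplied.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Distance metric: straight-line distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def k_means(
    points: Sequence[Point],
    initial_centroids: Sequence[Point],
    distance: Callable[[Point, Point], float] = euclidean,
    max_iter: int = 100,
) -> Tuple[List[Point], List[List[Point]]]:
    """Return (centroids, clusters); k is len(initial_centroids)."""
    centroids = list(initial_centroids)
    k = len(centroids)
    clusters: List[List[Point]] = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (8, 8), (9, 8)]
centroids, clusters = k_means(points, initial_centroids=[(0, 0), (10, 10)])
print(centroids)  # [(1.0, 1.5), (8.5, 8.0)]
```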
16. The sigmoid is also known as the ……….. function.
(a) Regression
(b) Logistic
(c) Probability
(d) Neural
Ans- b
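For reference, the logistic (sigmoid) function is 1 / (1 + e^(-x)). A small sketch follows; the function name `sigmoid` and the two-branch form, which avoids overflow for large negative inputs, are choices made for this example.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x is negative, so z lies in (0, 1)
    return z / (1.0 + z)


print(sigmoid(0.0))  # 0.5
```

The output always lies between 0 and 1, which is why the function is used to turn a score into a probability.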
17. The FP-tree growth algorithm can be implemented in ………... phases.
(a) One
(b) Two
(c) Three
(d) Five
Ans- b
18. DENCLUE clustering method is of type
(a) Partitioning
(b) Hierarchical
(c) Density Based
(d) Grid Based
Ans- c
19. STING in grid-based multi resolution clustering stands for
(a) Statistical information Grid
(b) Statistics in Geometric Data
(c) Standardization of Geometric
(d) None of the above
Ans- a
20. A tree structure called a dendrogram is commonly used in
(a) Partitioning-based clustering
(b) Hierarchical clustering
(c) Model-based clustering
(d) All of the above
Ans- b
2016
21. An operational system is which of the following?
(a) A system that is used to run the business in real time and is based on historical data
(b) A system that is used to run the business in real time and is based on current data
(c) A system that is used to support decision making and is based on current data
(d) A system that is used to support decision making and is based on historical data
Ans- b
22. A star schema has what type of relationship between a dimension and fact table
(a) Many to Many
(b) One to One
(c) One to many
(d) All of the above
Ans- c
23. A data warehouse is which of the following?
(a) Can be updated by end users
(b) Contains numerous naming conventions and formats
(c) Organized around important subject areas
(d) Contains only current data
Ans- c
24. Which of the following schema contains multiple fact tables?
(a) Star schema
(b) Snowflake schema
(c) Fact constellation schema
(d) All of the above
Ans- c
25. The ……. operation performs a selection on one dimension of the given cube, resulting in a subcube
(a) pivot
(b) slice
(c) roll-up
(d) drill down
Ans- b
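A slice can be pictured with a toy cube stored as a dictionary. This is only a sketch: the cube layout (keys are `(time, item, location)` tuples, values are sales figures), the data, and the function name `slice_cube` are assumptions made for illustration.

```python
from typing import Dict, Tuple

Cube = Dict[Tuple[str, ...], int]


def slice_cube(cube: Cube, dim: int, value: str) -> Cube:
    """Fix one dimension to a single value and drop it from the keys."""
    return {
        key[:dim] + key[dim + 1:]: measure
        for key, measure in cube.items()
        if key[dim] == value
    }


# Hypothetical sales cube with dimensions (time, item, location).
sales = {
    ("Q1", "phone", "Delhi"): 10,
    ("Q1", "laptop", "Delhi"): 5,
    ("Q2", "phone", "Delhi"): 7,
}

q1_subcube = slice_cube(sales, dim=0, value="Q1")
print(q1_subcube)  # {('phone', 'Delhi'): 10, ('laptop', 'Delhi'): 5}
```

The result has one dimension fewer than the original cube, which is what distinguishes a slice from a pivot or a roll-up.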
26. The process of partitioning the ranges of quantitative attributes into intervals, is called
(a) Splitting
(b) grouping
(c) binning
(d) None of the above
Ans- c
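One common way to bin a quantitative attribute is equal-width binning, sketched below. The function name `equal_width_bins` and the sample values are assumptions for this example; other schemes, such as equal-frequency binning, partition the range differently.

```python
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[List[float]]:
    """Partition values into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins: List[List[float]] = [[] for _ in range(n_bins)]
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything goes in the first bin
        else:
            # The maximum value would land at index n_bins, so clamp it.
            idx = min(int((v - lo) / width), n_bins - 1)
        bins[idx].append(v)
    return bins


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(equal_width_bins(prices, 3))
# [[4, 8], [15, 21, 21], [24, 25, 28, 34]]
```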
27. OPTICS clustering method is
(a) Partitioning method
(b) grid-based method
(c) hierarchical method
(d) density-based method
Ans- d
28. The Apriori algorithm forms frequent k-itemset candidates based on the
(a) frequent (k-5) itemsets
(b) frequent (k-3) itemsets
(c) frequent (k-2) itemsets
(d) frequent (k-1) itemsets
Ans- d
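The Apriori algorithm builds its k-itemset candidates by joining frequent (k-1)-itemsets and then pruning any candidate that has an infrequent (k-1)-subset. A rough sketch of that step follows; the function name `generate_candidates` and the sample itemsets are assumptions for illustration, and support counting against the transactions is left out.

```python
from itertools import combinations
from typing import FrozenSet, Set


def generate_candidates(
    frequent_prev: Set[FrozenSet[str]], k: int
) -> Set[FrozenSet[str]]:
    """Build k-itemset candidates from the frequent (k-1)-itemsets."""
    candidates: Set[FrozenSet[str]] = set()
    for a in frequent_prev:
        for b in frequent_prev:
            union = a | b
            if len(union) != k:
                continue  # join step: the two itemsets must differ in one item
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(
                frozenset(subset) in frequent_prev
                for subset in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets.
frequent_2 = {
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"B", "C"}),
    frozenset({"B", "D"}),
}
candidates_3 = generate_candidates(frequent_2, k=3)
print(candidates_3 == {frozenset({"A", "B", "C"})})  # True
```

Here {A, B, D} and {B, C, D} are pruned because {A, D} and {C, D} are not frequent.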
29. ID3, C4.5 and CART are used in
(a) Association rules generation
(b) Decision trees
(c) clustering
(d) Web mining
Ans- b
30. CLARANS stands for
Ans- Clustering Large Applications based on RANdomized Search
2015
31. Data mining is also referred to as
(a) Knowledge discovery in databases
(b) data cleaning
(c) data extraction
(d) data management
Ans- a
32. Data about data is called
(a) table
(b) database
(c) metadata
(d) integration
Ans- c
33. To represent any n-dimensional data we need a series of ……….-dimensional cubes.
(a) (n-1)
(b) n
(c) n+1
(d) n+2
Ans- a
34. The ……... operation performs a selection on one dimension of the given cube, resulting in a subcube.
(a) pivot
(b) slice
(c) roll-up
(d) drill-down
Ans- b
35. ________ servers support multidimensional views of data through array-based multidimensional storage
engines.
(a) ROLAP
(b) MOLAP
(c) Data warehouse
(d) database
Ans- b
36. The ______ software gives the user the opportunity to look at the data from a variety of different
dimensions.
(a) query tools
(b) multidimensional analysis
(c) data mining tools
(d) None of the above
Ans- b
37. _____ techniques can be used to reduce the number of values for a given continuous attribute by
dividing the range of the attribute into intervals
(a) Discretization
(b) Transformation
(c) Smoothing
(d) Generalization
Ans- a
38. FP tree growth algorithm can be implemented in
(a) one phase
(b) two phases
(c) Three phases
(d) four phases
Ans- b
39. Consider a scenario where a bin contains the values 4, 8 and 15. If smoothing by bin means is
applied to clean the data, then each of the original values in the bin will be replaced by
(a) 8
(b) 9
(c) 15
(d) 4
Ans- b
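The arithmetic behind this answer is (4 + 8 + 15) / 3 = 27 / 3 = 9. A short sketch of smoothing by bin means, with the function name `smooth_by_bin_means` chosen for this example:

```python
from typing import List, Sequence


def smooth_by_bin_means(bin_values: Sequence[float]) -> List[float]:
    """Replace every value in the bin with the bin's mean."""
    mean = sum(bin_values) / len(bin_values)
    return [mean] * len(bin_values)


smoothed = smooth_by_bin_means([4, 8, 15])
print(smoothed)  # [9.0, 9.0, 9.0]
```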
40. A ____ is a simple text file that is automatically generated every time someone accesses a web site.
(a) server session
(b) Log file
(c) User session
(d) None of the above
Ans- b