ACADEMY OF TECHNOLOGY
Subject (Code): Data Mining & Analytics (PEC-CSBS601D)
Question Bank (Part-1)
Semester/Branch: 6th Sem/ CSBS
1. The output of KDD is
(a) Data (b) Information (c) Knowledge (d) None of these
2. Which of the following is the right approach to Data Mining?
(a) Infrastructure, exploration, analysis, exploitation, interpretation
(b) Infrastructure, exploration, analysis, interpretation, exploitation
(c) Infrastructure, analysis, exploration, interpretation, exploitation
(d) None of these
3. Which of the following applies to a data warehouse?
a) Write only b) Read only c) Both (a) and (b) d) None of these
4. A subject-specific, smaller, local data warehouse is called ______________
a) data model b) database c) data mart d) metadata
5. To detect fraudulent usage of credit cards, the following data mining task should be used
a) Feature selection b) Association analysis c) Prediction d) Outlier analysis
6. Which one of the following is the analysis of data objects that do not comply
with the general behavior (or the model of the available data)?
a) Evaluation Analysis b) Outlier Analysis c) Classification d) Prediction
7. ___________ is the output of the KDD process.
(a) Query (b) Useful information (c) Information (d) Data
8. Data cleaning is
(a) Large collection of data mostly stored in a computer system
(b) The removal of noise, errors, and incorrect input from a database
(c) The systematic description of the syntactic structure of a specific database. It describes
the structure of the attributes, the tables, and the foreign key relationships.
(d) None of these
9. Which of the following refers to the step of the knowledge discovery process in
which several data sources are combined?
(a) Data selection (b) Data cleaning (c) Data transformation (d) Data Integration
10. Which one of the following can be considered as the correct application of data mining?
(a) Fraud detection (b) Corporate Analysis & Risk management
(c) Management & market analysis (d) All of the above
11. __________________ is data about data.
(a) Metadata (b) Microdata (c) Minidata (d) Multidata
12. What do data warehouses support?
(a) OLTP (b) OLAP (c) Operational databases (d) All of the above
13. A data warehouse is said to contain a ‘time-varying’ collection of data because:
(a) Its contents vary automatically with time (b) Its life-span is very limited
(c) It contains historical data (d) Its content has an explicit time-stamp
14. A data warehouse is an ‘integrated’ collection of data because:
(a) It is a collection of data of different types
(b) It is a collection of data derived from multiple sources
(c) It is a relational database
(d) It contains summarized data
15. ______________ is an essential process where intelligent methods are applied to extract data
patterns.
(a) Data warehousing (b) Data mining (c) Text mining (d) Data selection
16. The _______ data are stored in a data warehouse.
(a) Operational (b) Historical (c) Transactional (d) Optimized
17. Which of the following terms is used as a synonym for data mining?
(a) Knowledge discovery in databases (b) Data warehousing
(c) Regression analysis (d) Parallel processing in databases
18. The full form of OLAP is:
(a) Online Analytical Processing (b) Online Advanced Processing
(c) Online Advanced Preparation (d) Online Analytical Performance
19. The term "DMQL" stands for _____
a) Data Marts Query Language
b) DBMiner Query Language
c) Data Mining Query Language
d) None of the above
20. Data Warehousing is defined as________
a) A subject-oriented, integrated, time-variant, non-volatile collection of data in support
of management.
b) Selecting the right dataset for KDD
c) The real discovery stage of the KDD process
d) All of these
21. The expansion of DSS in DW is __________
a) Decision support system
b) Decision single system
c) Data storable system
d) Data support system
22. What are the functions of data mining?
a) Association analysis b) Correlation analysis
c) Prediction analysis d) All of the above
23. The content of a data warehouse is said to be ‘non-volatile’, because
a) It remains the same even after the system crashes
b) Its life-span is very limited
c) It is read-only data
d) It disappears when the system is switched off
24. The important aspect of the data warehouse environment is that data found within the data
warehouse is
a) Subject-oriented b) Time-variant c) Integrated d) All of the above
25. What does the acronym ETL stand for?
a) Explain, transfer and lead
b) Extract, transform and load
c) Extract, transfer and load
d) Effect, transfer and load
26. State which of the following statements about the OLTP system are true.
i) Clerks, database administrators, and database professionals are the users of
the OLTP system.
ii) It is used for long-term informational requirements.
iii) It has short and simple transactions.
a) (i) and (ii) only
b) (ii) and (iii) only
c) (i) and (iii) only
d) All of the above
27. State whether the following statements about the OLAP system are True or False.
i) Knowledge workers such as managers, executives, and analysts are the
users of the OLAP system.
ii) This system is used in day-to-day operations.
iii) The database size of an OLAP system is typically 100 GB to several TB.
a) i) True, ii) False, iii) True
b) i) False, ii) True, iii) True
c) i) True, ii) True, iii) False
d) i) False, ii) False, iii) True
28. Which among the following techniques can be used to aid decision-making when those
decisions depend upon some available data?
a) descriptive statistics
b) prescriptive analytics
c) predictive analytics
d) All of the above
29. Point out the correct statement.
a) Raw data is the original source of data
b) Pre-processed data is the original source of data
c) Raw data is the data obtained after processing steps
d) None of the mentioned
30. Which of the following is performed by a Data Scientist?
a) Define the question
b) Create reproducible code
c) Challenge results
d) All of the mentioned
31. What does the acronym “EDA” stand for in Data Mining?
a) Effective Data Algorithms
b) Extracted Data Aggregation
c) Exploratory Data Analysis
d) Efficient Data Assessment
32. What is the purpose of feature engineering in data preprocessing?
a) To make data more complex
b) To extract meaningful features
c) To perform data cleaning
d) To visualize data
33. Which of the following is used as the first step in the knowledge discovery process?
a) Data selection
b) Data cleaning
c) Data transformation
d) Data integration
34. What is data mining?
a) Deleting unnecessary data
b) Sorting data alphabetically
c) Storing data securely
d) Extracting useful patterns or information from large datasets
35. The median of the following data:
8, 15, 53, 49, 19, 62, 7, 15, 95, 77.
a) 19 b) 62 c) 34 d) 40
36. The mean of 88, 72, 33, 29, 70, 54, 86, 91, 57, 61
a) 61 b) 64.1 c) 70 d) 65.5
37. The median of 88, 72, 33, 29, 70, 54, 86, 91, 57, 61
a) 61 b) 64.1 c) 70 d) 65.5
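Answers to questions 35 to 37 can be checked mechanically. The following is a minimal Python sketch, assuming the standard-library statistics module is available; the variable names are illustrative and not part of the question bank.

```python
import statistics

# Data copied from questions 36 and 37.
scores = [88, 72, 33, 29, 70, 54, 86, 91, 57, 61]

# Mean: the sum of the values divided by the number of values.
mean_value = statistics.mean(scores)

# Median: the middle value of the sorted list. With an even number of
# values, statistics.median averages the two middle values.
median_value = statistics.median(scores)

print("mean:", mean_value)
print("median:", median_value)
```

Replacing the list with the data from question 35 or question 42 checks those medians in the same way.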
38. The mode of 7, 4, 3, 5, 6, 3, 3, 2, 4, 3, 4, 3, 3, 4, 4, 3, 4, 3, 4, 3, 1, 2, 3
a) 1 b) 2 c) 3 d) 4
39. The mode of the following data: 2, 1, 3, 2, 1, 5, 2, 2, 1, 6, 4, 2, 1, 3 is
a) 5 b) 3 c) 1 d) 2
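A mode can be checked by counting how often each value occurs. The sketch below assumes Python 3.8 or later, where statistics.multimode is available; the variable names are illustrative.

```python
from statistics import multimode

# Data copied from questions 39 and 43.
q39 = [2, 1, 3, 2, 1, 5, 2, 2, 1, 6, 4, 2, 1, 3]
q43 = [5, 3, 6, 3, 5, 6]

# multimode returns a list of the most frequent values in order of first
# appearance. More than one entry means the highest frequency is tied.
modes_q39 = multimode(q39)
modes_q43 = multimode(q43)

print(modes_q39)
print(modes_q43)
```

When every value is tied for the highest frequency, textbooks differ on whether to report all values as modes or to say that no mode exists, so the returned list needs interpreting against the convention used in the course.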
40. A.M. of 8, 1, 6 with frequencies 3, 2, 5 respectively is
a) 5 b) 1 c) 5.6 d) 10
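For data given with frequencies, the arithmetic mean is the frequency-weighted sum divided by the total frequency. A minimal sketch of that calculation in Python, with illustrative variable names:

```python
# Values and frequencies copied from question 40.
values = [8, 1, 6]
frequencies = [3, 2, 5]

# Each value contributes (value * frequency) to the weighted sum.
weighted_sum = sum(v * f for v, f in zip(values, frequencies))
total_frequency = sum(frequencies)

arithmetic_mean = weighted_sum / total_frequency
print(arithmetic_mean)
```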
41. The numbers 3.2, 5.8, 7.9 and 4.5 have frequencies x, (x+2), (x-3) and (x+6)
respectively. If A.M. is 4.876, the value of x:
a) 4.876 b) 5 c) 5.8 d) 6
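Question 41 reduces to a linear equation in x, because every frequency has the form x + c. One possible way to solve and verify it is sketched below, using exact fractions to avoid floating-point rounding; the names offsets, numerator and denominator are illustrative.

```python
from fractions import Fraction

# Values and target mean copied from question 41.
values = [Fraction("3.2"), Fraction("5.8"), Fraction("7.9"), Fraction("4.5")]
offsets = [0, 2, -3, 6]  # the frequency of each value is x + offset
target_mean = Fraction("4.876")

# sum(v * (x + c)) = M * sum(x + c) is linear in x, which rearranges to
# x = (M * sum(c) - sum(v * c)) / (sum(v) - M * n)
n = len(values)
numerator = target_mean * sum(offsets) - sum(v * c for v, c in zip(values, offsets))
denominator = sum(values) - target_mean * n
x = numerator / denominator

# Verify by recomputing the weighted mean with the solved frequencies.
frequencies = [x + c for c in offsets]
check_mean = sum(v * f for v, f in zip(values, frequencies)) / sum(frequencies)

print(x, check_mean == target_mean)
```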
42. A list of 5 pulse rates is: 70, 64, 80, 74, 92. What is the median for this list?
a) 80 b) 76 c) 74 d) 64
43. The mode of the observations: 5, 3, 6, 3, 5, 6.
a) 3 b) 5 c) 3, 5, 6 d) No mode exists
44. If the mean and mode of some data are 4 and 10 respectively, its median will be:
a) 1.5 b) 5.3 c) 6 d) 16
45. To find the _______________ you put all numbers in order from least to greatest and
find the number that is in the middle.
a) Median b) Mode c) Mean d) Range
46. Choose the correct keyword for the term:
A graphical representation of a data set
a) Data set b) Investigative Cycle c) Visualization d) None
47. Which of the following is an example of a continuous attribute?
a) Height of a person b) Name of a person c) Gender of a person d) None
48. Which of the following is usually the earliest step in the data mining process?
a) Visualization b) Preprocessing c) Modelling d) Deployment
49. Which of the following empirical relations is valid for moderately skewed data?
a) mean – mode ≈ 2 × (mean – median)
b) mean – mode ≈ 3 × (mean – median)
c) mean – mode ≈ 4 × (mean – median)
d) mean – mode ≈ mean – median
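Questions 44 and 49 both rest on Karl Pearson's empirical relation for moderately skewed data, mean – mode ≈ 3 × (mean – median). The sketch below rearranges that relation to estimate a median from a given mean and mode; the function name is hypothetical, and the result is only an approximation, not an exact identity.

```python
def median_from_mean_and_mode(mean, mode):
    """Estimate the median from the mean and mode.

    Rearranged from mean - mode ≈ 3 * (mean - median), which gives
    median ≈ mean - (mean - mode) / 3.
    """
    return mean - (mean - mode) / 3


# Mean and mode copied from question 44.
estimated_median = median_from_mean_and_mode(4, 10)
print(estimated_median)
```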
50. The full form of KDD is __________________.
a) Knowledge database
b) Knowledge discovery in databases
c) Knowledge data house
d) Knowledge data definition