Questions:
1. Skewness of a dataset measures:
a) Central Tendency
b) Spread of Data
c) Asymmetry of Distribution
d) Correlation Strength
Answer: c
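To make the idea of asymmetry concrete, here is a minimal sketch that computes the population (moment-based) skewness by hand. The function name `skewness` and the two sample lists are illustrative assumptions, not part of the question.

```python
def skewness(values):
    """Population skewness: third central moment divided by (variance ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment (variance)
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]        # mirror image around the mean
right_tailed = [1, 1, 2, 2, 10]    # one large value stretches the right tail

print(skewness(symmetric))     # zero for a symmetric sample
print(skewness(right_tailed))  # positive for a right-skewed sample
```

A value near zero suggests a roughly symmetric sample, a positive value a longer right tail, and a negative value a longer left tail.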
2. Which Python library is best suited for kernel density estimation plots?
a) NumPy
b) Pandas
c) Seaborn
d) Matplotlib
Answer: c
3. ROC curve is used to measure:
a) Feature correlation
b) Classifier performance
c) Clustering efficiency
d) Regression error
Answer: b
4. What is the output of the following code?
x = [1, 2, 3]
y = x
y += [4, 5]
print(x)
a) [1, 2, 3]
b) [1, 2, 3, 4, 5]
c) Error
d) [4, 5]
Answer: b
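The answer to question 4 follows from aliasing: `y = x` binds a second name to the same list, and `+=` on a list extends it in place. The short sketch below contrasts that with `b = b + [...]`, which builds a new list; the names `a` and `b` are added here only for comparison.

```python
x = [1, 2, 3]
y = x            # y is a second name for the same list object
y += [4, 5]      # in-place extend, so the change is visible through x
print(x)         # [1, 2, 3, 4, 5]
print(x is y)    # True

a = [1, 2, 3]
b = a
b = b + [4, 5]   # creates a new list and rebinds b; a is untouched
print(a)         # [1, 2, 3]
print(a is b)    # False
```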
5. What is "feature engineering"?
a) Testing the model
b) Creating new input variables from existing data
c) Building a user interface
d) Deploying the model
Answer: b
6. Which visualization technique is most suitable for outlier detection?
a) Histogram
b) Box Plot
c) Heatmap
d) Scatter Matrix
Answer: b
7. Cross-validation is used primarily to:
a) Reduce bias
b) Prevent overfitting
c) Increase dataset size
d) Normalize data
Answer: b
8. Which of the following is an unsupervised learning model?
a) Linear Regression
b) KNN
c) PCA
d) Logistic Regression
Answer: c
9. A/B testing is an example of:
a) Parametric Regression
b) Hypothesis Testing
c) Clustering
d) Feature Engineering
Answer: b
10. Kurtosis is used to measure:
a) Skewness of distribution
b) Shape (tailedness/peakedness) of distribution
c) Central tendency
d) Feature correlation
Answer: b
11. In classification, the contingency table is also known as:
a) Covariance Matrix
b) Confusion Matrix
c) PCA Matrix
d) Regression Table
Answer: b
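As an illustration of what a confusion matrix holds, the sketch below counts the four cells for a binary classifier. The helper `confusion_counts` and the label lists are hypothetical examples, not taken from any particular library.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

actual    = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
cm = confusion_counts(actual, predicted)
print(cm["TP"], cm["FP"], cm["FN"], cm["TN"])  # 2 1 1 2
```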
12. Which of the following distributions has the memoryless property?
a) Normal
b) Exponential
c) Poisson
d) Binomial
Answer: b
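The memoryless property can be checked in one line. Assuming X is exponential with rate λ > 0 (the symbols X, λ, s, and t are notation introduced here), its survival function is P(X > t) = e^{-λt}, so for s, t ≥ 0:

```latex
P(X > s + t \mid X > s)
  = \frac{P(X > s + t)}{P(X > s)}
  = \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}}
  = e^{-\lambda t}
  = P(X > t)
```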
13. For a normally distributed population, the (suitably scaled) sample variance
follows:
a) Normal Distribution
b) Chi-Square Distribution
c) t Distribution
d) F Distribution
Answer: b
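Stated more precisely, it is a scaled version of the sample variance that is chi-square distributed. Assuming X_1, ..., X_n are independent draws from N(μ, σ²) (notation introduced here for illustration):

```latex
S^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2,
\qquad
\frac{(n-1)\, S^2}{\sigma^2} \sim \chi^2_{\,n-1}
```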
14. In hypothesis testing, Type II error occurs when:
a) Rejecting H₀ when H₀ is true
b) Failing to reject H₀ when H₀ is false
c) Rejecting H₁ when H₁ is false
d) Rejecting both H₀ and H₁
Answer: b
15. The distribution of the ratio of two independent chi-square variables, each divided
by its degrees of freedom, is:
a) t Distribution
b) F Distribution
c) Normal Distribution
d) Exponential Distribution
Answer: b
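Written out, with U and V as independent chi-square variables with d_1 and d_2 degrees of freedom (symbols chosen here for illustration):

```latex
U \sim \chi^2_{d_1}, \quad V \sim \chi^2_{d_2}, \quad U \perp V
\quad \Longrightarrow \quad
F = \frac{U / d_1}{V / d_2} \sim F(d_1, d_2)
```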
16. Which test is suitable for testing independence of two categorical variables?
a) Z test
b) F test
c) Chi-Square test
d) t test
Answer: c
17. In NoSQL, a document-oriented database stores data in:
a) Tables and columns
b) Rows and joins
c) JSON-like key-value pairs
d) Graphs
Answer: c
18. The ACID property 'Durability' ensures:
a) Transaction changes are visible to others immediately
b) Changes are permanent even after a system crash
c) All concurrent transactions are serialized
d) Intermediate results are stored
Answer: b
19. In MongoDB, data is stored in:
a) Tables
b) Collections of JSON-like documents
c) Flat files
d) Tuples and relations
Answer: b
20. What is the default "_id" field in MongoDB used for?
a) Indexing only text fields
b) Unique primary key for each document
c) Foreign key linking
d) Reference to external file
Answer: b
21. What does db.collection.find({}).pretty() do?
a) Finds documents and outputs raw JSON
b) Inserts pretty-printed JSON
c) Displays documents in readable format
d) Lists only schema
Answer: c
22. To retrieve documents where field "age" > 25, which query is correct?
a) db.users.find({"age": ">25"})
b) db.users.find({age: {$gt: 25}})
c) db.users.find({age: $greater(25)})
d) db.users.get({age > 25})
Answer: b
23. Cassandra follows which type of database architecture?
a) Master-slave
b) Peer-to-peer
c) Hierarchical
d) Client-server
Answer: b
24. In Cassandra, a tombstone is:
a) A pointer to last write timestamp
b) A deleted row marker
c) A snapshot
d) A replica heartbeat
Answer: b
25. What will be the output of the expression typeof(5L) in R?
a) "double"
b) "numeric"
c) "integer"
d) "character"
Answer: c
26. Which of the following is a major task in data mining?
a) Data Normalization
b) Data Indexing
c) Pattern Discovery
d) Query Processing
Answer: c
27. In k-means clustering, which of the following is a limitation?
a) Handles non-numeric data well
b) Finds global optimum
c) Sensitive to initial centroids
d) Suitable for categorical data
Answer: c
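The sensitivity to initial centroids can be seen with a small sketch of Lloyd's algorithm on one-dimensional data. The function `kmeans_1d`, the data, and both starting points are assumptions made for illustration; real libraries usually reduce this problem with smarter seeding and multiple restarts.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm on 1-D data; returns (centroids, sse)."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    # Sum of squared distances from each point to its closest centroid.
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, sse

points = [1, 2, 3, 11, 12, 13, 21, 22, 23]  # three well-separated groups

good_centroids, good_sse = kmeans_1d(points, [2, 12, 22])  # one seed per group
bad_centroids, bad_sse = kmeans_1d(points, [1, 2, 3])      # all seeds in one group

print(good_centroids, good_sse)  # recovers the three groups, low error
print(bad_centroids, bad_sse)    # stuck in a worse local optimum, higher error
```

Both runs converge, but to different solutions, which is why the answer to question 27 is c and not b.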
28. What is the key difference between classification and clustering?
a) Classification uses labeled data; clustering doesn’t
b) Clustering uses supervised learning
c) Classification identifies structure
d) Clustering uses dependent variables
Answer: a
29. The major drawback of the Apriori algorithm is:
a) Cannot find association rules
b) Requires numeric data only
c) Expensive candidate generation
d) Only works for continuous data
Answer: c
30. Entropy in a dataset refers to:
a) Number of classes
b) Impurity (disorder) of the data
c) Error rate
d) Margin of classifier
Answer: b
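For reference, Shannon entropy is zero when every record has the same class and grows as the classes become more mixed. The sketch below computes it in bits; the function name `entropy` and the label lists are illustrative assumptions.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits: sum of p * log2(1 / p) over class proportions p."""
    n = len(labels)
    proportions = [count / n for count in Counter(labels).values()]
    return sum(p * log2(1 / p) for p in proportions)

pure = ["yes", "yes", "yes", "yes"]   # a single class
mixed = ["yes", "yes", "no", "no"]    # evenly split between two classes

print(entropy(pure))   # 0.0
print(entropy(mixed))  # 1.0
```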
31. What is the output of the expression list(range(2, 10, 2))?
a) [2, 3, 4, 5, 6, 7, 8, 9]
b) [2, 4, 6, 8]
c) [2, 6, 10]
d) [2, 5, 8]
Answer: b
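A quick check of how `range(start, stop, step)` behaves: the stop value is never included. The second and third calls are extra examples added here for contrast.

```python
evens = list(range(2, 10, 2))
print(evens)                    # [2, 4, 6, 8]  (10 is excluded)

inclusive = list(range(2, 11, 2))
print(inclusive)                # [2, 4, 6, 8, 10]

descending = list(range(10, 2, -2))
print(descending)               # [10, 8, 6, 4]
```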
32. Which of the following Python libraries is best suited for handling labeled data and
time series?
a) NumPy
b) Pandas
c) Matplotlib
d) Scikit-learn
Answer: b
33. Which NumPy function returns the standard deviation of an array?
a) stddev()
b) sd()
c) std()
d) var()
Answer: c
34. Which of the following libraries is used for machine learning in Python?
a) NumPy
b) Matplotlib
c) Seaborn
d) Scikit-learn
Answer: d
35. Which method is used to detect null values in pandas?
a) isna()
b) notnull()
c) isnan()
d) isnull()
Answer: a or d (isnull() is an alias of isna())
36. What is the purpose of ACID properties in databases?
a) Speed
b) Backup
c) Transaction reliability
d) Schema design
Answer: c