UNIT I
1. Need for Data Science
1. What is the primary purpose of data science?
A) Store data
B) Analyze and extract insights from data
C) Replace humans in decision-making
D) Develop hardware
Answer: B
2. Why has data science gained importance in recent years?
A) Increase in data availability
B) Decrease in computing power
C) Elimination of the internet
D) Reduced need for programming
Answer: A
3. Which industry heavily relies on data science for customer behavior analysis?
A) Banking
B) Retail
C) Healthcare
D) All of the above
Answer: D
4. What type of data is essential for data science?
A) Structured data only
B) Unstructured data only
C) Both structured and unstructured data
D) No data is required
Answer: C
5. Which of these roles is closely related to data science?
A) Web developer
B) Data analyst
C) Graphic designer
D) Network administrator
Answer: B
2. Benefits and Uses of Data Science
6. How does data science benefit organizations?
A) Reduces decision-making errors
B) Increases complexity in tasks
C) Replaces manual work with spreadsheets
D) Eliminates data collection needs
Answer: A
7. Which of these is NOT a use of data science?
A) Fraud detection
B) Data storage
C) Predictive modeling
D) Customer segmentation
Answer: B
8. What is one benefit of applying data science in healthcare?
A) Higher patient privacy violations
B) Personalized medicine recommendations
C) Increased medication costs
D) Reduced focus on patient care
Answer: B
9. Which sector uses data science for inventory management?
A) Retail
B) Transportation
C) Education
D) Law
Answer: A
10. What is an outcome of implementing data science in marketing?
A) Predicting customer churn
B) Increasing email spam
C) Lowering marketing efficiency
D) Decreasing customer satisfaction
Answer: A
3. Facets of Data
11. What are the facets of data?
A) Volume, Velocity, Variety, Veracity
B) Length, Width, Height
C) Shape, Texture, Density
D) None of the above
Answer: A
12. Which facet of data represents the speed at which data is generated?
A) Volume
B) Velocity
C) Variety
D) Veracity
Answer: B
13. What does "Variety" in data facets signify?
A) Quality of data
B) Different formats and types of data
C) Speed of data collection
D) Accuracy of data analysis
Answer: B
14. What is the challenge with the "Veracity" of data?
A) High cost of storing data
B) Inaccuracy and inconsistency of data
C) Too much data to analyze
D) Data moving too quickly
Answer: B
15. What does "Volume" in data refer to?
A) The size of data
B) The speed of data generation
C) The variety of data types
D) The reliability of data
Answer: A
4. Data Science Process
16. What is the first step in the data science process?
A) Data modeling
B) Data visualization
C) Problem definition
D) Model deployment
Answer: C
17. Which step involves cleaning and preprocessing data?
A) Data collection
B) Data preparation
C) Data analysis
D) Model evaluation
Answer: B
18. What does the data modeling step involve?
A) Generating hypotheses
B) Building algorithms to identify patterns
C) Visualizing insights
D) Collecting data
Answer: B
19. In which stage is data visualization used?
A) Preprocessing
B) Model building
C) Results interpretation
D) Data storage
Answer: C
20. What happens during the deployment phase?
A) The final model is put into production
B) Data is cleaned and structured
C) Features are engineered
D) Data is visualized
Answer: A
5. Basics of Python
21. Which file extension is used for Python scripts?
A) .py
B) .txt
C) .java
D) .exe
Answer: A
22. Which function is used to print output in Python?
A) output()
B) print()
C) display()
D) show()
Answer: B
23. How do you declare a variable in Python?
A) let x = 5
B) int x = 5
C) x = 5
D) declare x = 5
Answer: C
24. Which data type is mutable in Python?
A) Tuple
B) String
C) List
D) Integer
Answer: C
25. What is the result of 10 // 3 in Python?
A) 3.33
B) 3
C) 10.0
D) None
Answer: B
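The following minimal sketch illustrates questions 23 to 25. The variable names are illustrative and not part of the question bank.

```python
# Variables are created by plain assignment; no type declaration is needed.
x = 5

# Lists are mutable: an element can be changed in place.
numbers = [1, 2, 3]
numbers[0] = 10

# Tuples are immutable: item assignment raises TypeError.
point = (1, 2)
try:
    point[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False

# // is floor division, while / is true division.
floor_result = 10 // 3
true_result = 10 / 3

print(x, numbers, tuple_changed, floor_result, true_result)
```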
6. Setting Working Directory
26. Which library is commonly used to set the working directory in Python?
A) os
B) math
C) random
D) re
Answer: A
27. What function sets the working directory?
A) set_dir()
B) os.chdir()
C) os.getdir()
D) os.mkdir()
Answer: B
28. How do you check the current working directory?
A) os.checkdir()
B) os.getcwd()
C) os.curdir()
D) os.dir()
Answer: B
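As a rough sketch of questions 26 to 28, the snippet below reads and changes the working directory with the os module. The temporary directory only stands in for a real project folder and is an assumption of this example.

```python
import os
import tempfile

# Remember where we started so the change can be undone.
original_dir = os.getcwd()

# A throwaway directory stands in for a real project folder (assumption).
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)                # set the working directory
    inside_dir = os.getcwd()     # check the current working directory
    os.chdir(original_dir)       # restore before the temp folder is deleted

print(inside_dir)
```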
7. File Execution
29. How do you execute a Python script?
A) Run it in Notepad
B) Double-click the file
C) Use python script.py in the terminal
D) Compile it
Answer: C
30. Which IDE is commonly used for executing Python code?
A) Eclipse
B) PyCharm
C) IntelliJ
D) NetBeans
Answer: B
8. Variable Management
31. How do you delete a variable in Python?
A) del variable_name
B) remove variable_name
C) delete variable_name
D) clear variable_name
Answer: A
32. What is used to clear all variables in an IPython or Jupyter session?
A) os.clear()
B) %reset
C) del all()
D) None of the above
Answer: B
9. Commenting Script Files
33. How do you write single-line comments in Python?
A) /* comment */
B) // comment
C) # comment
D) <!-- comment -->
Answer: C
10. Data Types and Operators
34. Which is a numeric data type in Python?
A) int
B) str
C) list
D) dict
Answer: A
35. What does True and False evaluate to?
A) True
B) False
C) None
D) Error
Answer: B
36. What operator is used for exponentiation?
A) ^
B) **
C) %
D) //
Answer: B
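The operators from questions 34 to 36 can be checked directly. This is a small sketch; the variable names are illustrative.

```python
# Logical AND is True only when both operands are True.
both = True and False

# ** is exponentiation; ^ is bitwise XOR, not a power operator.
power = 2 ** 3
xor = 2 ^ 3

# % gives the remainder and // gives the floored quotient.
remainder = 10 % 3
quotient = 10 // 3

# int is one of Python's numeric types.
kind = type(5).__name__

print(both, power, xor, remainder, quotient, kind)
```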
UNIT II
1. Control Structures
1. What is the purpose of control structures in programming?
A) Organize data
B) Control the flow of execution
C) Store large data sets
D) None of the above
Answer: B
2. Which of these is a conditional control structure in Python?
A) for
B) if-else
C) while
D) break
Answer: B
3. What does the elif keyword represent in Python?
A) End of a loop
B) Else if
C) Initiates a loop
D) None of the above
Answer: B
4. Which of the following is NOT a valid control structure in Python?
A) if
B) elif
C) switch
D) None of the above
Answer: C
5. What is the default control flow in Python?
A) Sequential execution
B) Parallel execution
C) Random execution
D) Iterative execution
Answer: A
2. Loops
6. Which keyword is used to terminate a loop prematurely?
A) pass
B) break
C) continue
D) stop
Answer: B
7. What is the purpose of the continue keyword in loops?
A) Ends the loop
B) Skips the current iteration and proceeds to the next
C) Stops execution completely
D) Executes the loop condition again
Answer: B
8. Which loop is best for iterating over a range of numbers?
A) while
B) for
C) do-while
D) None of the above
Answer: B
9. What is the output of the following code?
for i in range(3):
    print(i)
A) 1 2 3
B) 0 1 2
C) 0 1 2 3
D) None of the above
Answer: B
10. What happens when the else block is used with a loop?
A) Runs only if the loop executes at least once
B) Executes when the loop finishes without encountering break
C) Skips to the next loop iteration
D) Only works with while loops
Answer: B
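A short sketch of break, continue, and the loop else block from questions 6 to 10. The list names are illustrative.

```python
# First loop: continue skips one iteration, break ends the loop early,
# so the else block does NOT run.
visited = []
for i in range(6):
    if i == 1:
        continue          # skip the rest of this iteration
    if i == 4:
        break             # leave the loop immediately
    visited.append(i)
else:
    visited.append("else ran")

# Second loop: no break occurs, so the else block runs after the last item.
finished = []
for i in range(3):
    finished.append(i)
else:
    finished.append("else ran")

print(visited, finished)
```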
3. Functions
11. Which keyword is used to define a function in Python?
A) func
B) function
C) def
D) define
Answer: C
12. What is the purpose of the return statement in functions?
A) To end a function
B) To pass back a value to the caller
C) To call another function
D) None of the above
Answer: B
13. Which of the following is NOT a valid function parameter type?
A) Positional
B) Keyword
C) Default
D) Constant
Answer: D
14. What is a lambda function in Python?
A) A function defined inside another function
B) An anonymous, inline function
C) A recursive function
D) None of the above
Answer: B
15. How do you call a function named my_func in Python?
A) call my_func()
B) my_func()
C) def my_func()
D) execute my_func()
Answer: B
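The sketch below shows def, return, a default parameter, positional and keyword arguments, and a lambda, as covered in questions 11 to 15. The function names are hypothetical.

```python
def power(base, exponent=2):
    """Return base raised to exponent; exponent defaults to 2."""
    return base ** exponent

# A lambda is an anonymous, inline function bound here to a name.
square = lambda n: n * n

positional = power(3, 3)               # arguments matched by position
keyword = power(base=2, exponent=5)    # arguments matched by name
default = power(4)                     # exponent falls back to its default

print(positional, keyword, default, square(6))
```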
4. Data Structures
16. Which data structure is mutable in Python?
A) List
B) Tuple
C) String
D) None of the above
Answer: A
17. How do you access the first element of a list named my_list?
A) my_list[0]
B) my_list(0)
C) my_list[1]
D) my_list.first()
Answer: A
18. What is a tuple?
A) An immutable list
B) A mutable dictionary
C) A mutable set
D) None of the above
Answer: A
19. What method is used to add an element to a set?
A) add()
B) append()
C) insert()
D) push()
Answer: A
20. Which of the following is a valid key type in a dictionary?
A) Integer
B) String
C) Tuple
D) All of the above
Answer: D
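A minimal sketch of the data structures in questions 16 to 20. All names and values are illustrative.

```python
# Lists are indexed from 0.
my_list = ["a", "b", "c"]
first = my_list[0]

# Sets grow with add(); adding an existing element changes nothing.
my_set = {1, 2}
my_set.add(3)
my_set.add(3)

# Dictionary keys must be hashable: int, str, and tuple all qualify.
lookup = {1: "int key", "name": "str key", (0, 1): "tuple key"}

# A list is mutable and therefore unhashable, so it cannot be a key.
try:
    lookup[[0, 1]] = "list key"
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(first, my_set, len(lookup), list_key_ok)
```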
5. NumPy Library
21. What is the primary purpose of the NumPy library?
A) Data visualization
B) Numerical computing
C) Text processing
D) Web development
Answer: B
22. How do you import NumPy in Python?
A) import numpy as np
B) include numpy
C) require numpy
D) import np
Answer: A
23. Which function creates an array of zeros in NumPy?
A) zeros()
B) empty()
C) ones()
D) array()
Answer: A
24. What is the shape of the following NumPy array?
np.array([[1, 2], [3, 4]])
A) (2, 2)
B) (1, 4)
C) (4,)
D) None of the above
Answer: A
25. What is the difference between a list and a NumPy array?
A) NumPy arrays are slower than lists
B) NumPy arrays support vectorized operations
C) Lists are immutable
D) None of the above
Answer: B
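The NumPy behavior in questions 22 to 25 can be sketched as follows, assuming NumPy is installed. The variable names are illustrative.

```python
import numpy as np

# zeros() builds an array filled with 0.0.
zeros = np.zeros(3)

# A 2 x 2 array: shape is (rows, columns).
matrix = np.array([[1, 2], [3, 4]])
shape = matrix.shape

# Vectorized arithmetic applies to every element at once.
doubled = matrix * 2

# The same operator on a plain list repeats the list instead.
list_doubled = [1, 2] * 2

print(zeros, shape, doubled.tolist(), list_doubled)
```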
6. Data Collection and Types
26. What is primary data?
A) Data collected by someone else
B) Data collected firsthand
C) Data from online sources
D) None of the above
Answer: B
27. Which of these is an example of structured data?
A) Audio files
B) Spreadsheets
C) Videos
D) Images
Answer: B
28. What type of data is “age in years”?
A) Categorical
B) Numerical
C) Ordinal
D) None of the above
Answer: B
29. Which method is NOT used for data collection?
A) Surveys
B) Experiments
C) Data cleaning
D) Interviews
Answer: C
30. What is metadata?
A) Data about data
B) Processed data
C) Data stored in arrays
D) Data visualizations
Answer: A
7. Data Preprocessing
31. Which of the following is a step in data preprocessing?
A) Data cleaning
B) Data visualization
C) Data modeling
D) Model deployment
Answer: A
32. What is the purpose of feature scaling?
A) Normalize data range
B) Add more features
C) Increase the dataset size
D) None of the above
Answer: A
33. What does one-hot encoding do?
A) Handles missing values
B) Encodes categorical variables
C) Scales numerical features
D) None of the above
Answer: B
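To make questions 32 and 33 concrete, here is a by-hand sketch of min-max scaling and one-hot encoding in plain Python. It is an illustration of the idea only, not how any particular library implements it, and the sample values are made up.

```python
# Min-max scaling maps the smallest value to 0 and the largest to 1.
ages = [20, 30, 40]
low, high = min(ages), max(ages)
scaled = [(a - low) / (high - low) for a in ages]

# One-hot encoding turns each category into its own 0/1 column.
colors = ["red", "green", "red"]
categories = sorted(set(colors))          # column order: green, red
one_hot = [[int(c == cat) for cat in categories] for c in colors]

print(scaled)
print(categories, one_hot)
```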
8. Exploratory Data Analysis (EDA)
34. What is the primary goal of EDA?
A) Build predictive models
B) Summarize and visualize data
C) Collect data
D) Scale data
Answer: B
35. Which library is commonly used for data visualization in Python?
A) NumPy
B) Matplotlib
C) os
D) random
Answer: B
36. What does a boxplot visualize?
A) Relationships between variables
B) Distribution and outliers
C) Missing values
D) Categorical data
Answer: B
UNIT III
1. Descriptive Statistics
1. What is the purpose of descriptive statistics?
A) Predict future outcomes
B) Summarize and describe data
C) Test hypotheses
D) Explore relationships between variables
Answer: B
2. Which of the following is NOT a measure of central tendency?
A) Mean
B) Median
C) Mode
D) Standard Deviation
Answer: D
3. Which of the following measures dispersion in a dataset?
A) Mean
B) Range
C) Mode
D) Median
Answer: B
4. The interquartile range (IQR) is calculated as:
A) Q1 - Q3
B) Q3 - Q1
C) Mean - Median
D) Median - Mode
Answer: B
5. What does the term "outlier" refer to in a dataset?
A) The average value
B) Values significantly different from others
C) The middle value
D) A value that repeats often
Answer: B
2. Mean
6. How is the mean calculated?
A) Sum of all values divided by the number of values
B) Middle value in a dataset
C) Most frequent value
D) Difference between maximum and minimum values
Answer: A
7. Which of the following affects the mean?
A) Outliers
B) Median
C) Mode
D) None of the above
Answer: A
8. What is the mean of the dataset {2, 4, 6, 8}?
A) 4
B) 5
C) 6
D) 10
Answer: B
9. What is the mean of {5, 10, 15}?
A) 10
B) 15
C) 12.5
D) 11
Answer: A
10. If all values in a dataset are increased by 5, how does the mean change?
A) Increases by 5
B) Decreases by 5
C) Remains the same
D) Doubles
Answer: A
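The properties of the mean in questions 6 to 10 can be verified with the standard library. The outlier value of 100 is an assumption added for illustration.

```python
from statistics import mean

data = [2, 4, 6, 8]
base_mean = mean(data)                    # (2 + 4 + 6 + 8) / 4

# Adding a constant to every value shifts the mean by that constant.
shifted_mean = mean([v + 5 for v in data])

# A single extreme value pulls the mean strongly toward it.
outlier_mean = mean(data + [100])

print(base_mean, shifted_mean, outlier_mean)
```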
3. Standard Deviation
11. What does standard deviation measure?
A) Central tendency
B) Spread of data around the mean
C) Median
D) Mode
Answer: B
12. If the standard deviation is 0, what can be inferred?
A) Data is widely spread
B) All data points are equal
C) Data has many outliers
D) Data has no mean
Answer: B
13. What happens to standard deviation if all data points are increased by a constant?
A) Increases by the same constant
B) Remains unchanged
C) Doubles
D) Becomes zero
Answer: B
14. What does a large standard deviation indicate?
A) Data points are close to the mean
B) Data points are widely spread
C) Data points are all identical
D) None of the above
Answer: B
15. Which of the following datasets has the largest standard deviation?
A) {5, 5, 5, 5}
B) {1, 5, 9}
C) {2, 4, 6, 8}
D) {10, 10, 10}
Answer: B
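A quick check of questions 12, 13, and 15 using the population standard deviation from the standard library. The dictionary labels mirror the options of question 15.

```python
from statistics import pstdev

datasets = {
    "A": [5, 5, 5, 5],
    "B": [1, 5, 9],
    "C": [2, 4, 6, 8],
    "D": [10, 10, 10],
}

# Population standard deviation of each candidate dataset.
spreads = {name: pstdev(values) for name, values in datasets.items()}
largest = max(spreads, key=spreads.get)

# Adding a constant moves every point equally, so the spread is unchanged.
shifted_spread = pstdev([v + 10 for v in datasets["B"]])

print(spreads, largest, shifted_spread)
```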
4. Skewness and Kurtosis
16. What does skewness measure?
A) The shape of the distribution
B) The spread of the data
C) The mean value
D) The correlation between variables
Answer: A
17. A positively skewed distribution has:
A) A longer tail on the left
B) A longer tail on the right
C) Equal tails on both sides
D) No tails
Answer: B
18. What does kurtosis measure?
A) Spread of data
B) Peakedness and tail heaviness of a distribution
C) Average of data
D) Number of outliers
Answer: B
19. Which distribution has kurtosis greater than 3?
A) Normal distribution
B) Platykurtic distribution
C) Leptokurtic distribution
D) Mesokurtic distribution
Answer: C
20. What does a negative skewness indicate?
A) Symmetrical distribution
B) Longer tail on the left
C) Longer tail on the right
D) No skewness
Answer: B
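The sign convention for skewness in questions 17 and 20 can be sketched with a hand-written moment-based formula. The skewness helper and the sample data are assumptions of this example, not a library function.

```python
from statistics import mean, pstdev

def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by sigma cubed."""
    m = mean(values)
    s = pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

right_tailed = [1, 2, 2, 3, 10]             # long tail on the right
left_tailed = [-v for v in right_tailed]    # mirror image: tail on the left
symmetric = [1, 2, 3]

print(skewness(right_tailed), skewness(left_tailed), skewness(symmetric))
```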
5. Inferential Statistics
21. What is the primary goal of inferential statistics?
A) Summarize data
B) Make conclusions about a population based on a sample
C) Collect data
D) Identify outliers
Answer: B
22. What is the null hypothesis (H₀)?
A) The hypothesis being tested
B) The hypothesis assumed true unless evidence suggests otherwise
C) The hypothesis that always gets rejected
D) None of the above
Answer: B
23. Which test is used to compare the means of two independent groups?
A) Chi-square test
B) t-test
C) ANOVA
D) Regression analysis
Answer: B
24. What does a p-value less than 0.05 indicate at the 5% significance level?
A) Fail to reject the null hypothesis
B) Reject the null hypothesis
C) Results are insignificant
D) None of the above
Answer: B
25. What type of error occurs when the null hypothesis is rejected but is actually true?
A) Type I error
B) Type II error
C) Sampling error
D) None of the above
Answer: A
6. Probability Theory
26. What is the range of probability values?
A) -1 to 1
B) 0 to 1
C) 0 to 100
D) None of the above
Answer: B
27. What is the probability of an impossible event?
A) 0
B) 0.5
C) 1
D) Undefined
Answer: A
28. What is the sum of probabilities of all outcomes in a sample space?
A) 0
B) 1
C) Infinity
D) Depends on the event
Answer: B
29. What does the conditional probability P(A|B) represent?
A) Probability of A given B
B) Probability of B given A
C) Probability of A and B
D) None of the above
Answer: A
30. What formula represents Bayes’ Theorem?
A) P(A∩B)×P(B)P(A)
B) P(A|B) = P(B|A) × P(A) / P(B)
C) P(A) + P(B)
D) None of the above
Answer: B
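As a worked sketch of Bayes' theorem, the snippet below computes the probability of a condition given a positive test. The function name and all probabilities are hypothetical numbers chosen for illustration.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical inputs: A = has the condition, B = test is positive.
p_condition = 0.01
p_pos_given_condition = 0.90
p_pos_given_healthy = 0.05

# Total probability of a positive test over both groups.
p_positive = (p_pos_given_condition * p_condition
              + p_pos_given_healthy * (1 - p_condition))

posterior = bayes_posterior(p_pos_given_condition, p_condition, p_positive)
print(round(posterior, 4))
```

Even with a fairly accurate test, the posterior stays low here because the assumed prior P(A) is small.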
7. Pandas Library
31. What is the primary purpose of the Pandas library?
A) Data manipulation and analysis
B) Web scraping
C) Numerical computing
D) None of the above
Answer: A
32. Which object in Pandas represents tabular data?
A) Series
B) DataFrame
C) Array
D) List
Answer: B
33. How do you import Pandas in Python?
A) import pandas as pd
B) include pandas
C) require pandas
D) import pd
Answer: A
34. Which method reads a CSV file into a DataFrame?
A) pd.read_table()
B) pd.read_csv()
C) pd.read_file()
D) None of the above
Answer: B
35. How do you select a column named "Age" from a DataFrame df?
A) df[Age]
B) df["Age"]
C) df.Age
D) Both B and C
Answer: D
8. DataFrame Operations
36. Which method returns a new DataFrame with an added column?
A) append()
B) insert()
C) assign()
D) None of the above
Answer: C
37. What does the head() method do?
A) Shows the first few rows of a DataFrame
B) Deletes rows
C) Sorts rows
D) Merges two DataFrames
Answer: A
38. Which method removes missing values from a DataFrame?
A) drop()
B) dropna()
C) fillna()
D) None of the above
Answer: B
39. How do you sort a DataFrame by a column?
A) sort()
B) sort_values()
C) arrange()
D) None of the above
Answer: B
40. Which method provides a summary of statistics for a DataFrame?
A) describe()
B) info()
C) summary()
D) stats()
Answer: A
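The pandas calls from questions 32 to 40 can be sketched together, assuming pandas is installed. The column names and values are made up for this example.

```python
import pandas as pd

# A small DataFrame with one missing age (illustrative data).
df = pd.DataFrame({"Name": ["Ann", "Bob", "Cy"], "Age": [31, None, 25]})

ages = df["Age"]                               # select one column
with_flag = df.assign(Adult=df["Age"] >= 18)   # new DataFrame with extra column
cleaned = df.dropna()                          # drop rows with missing values
ordered = cleaned.sort_values("Age")           # sort by a column
first_two = df.head(2)                         # first rows
summary = df.describe()                        # summary statistics

print(ordered)
print(summary)
```

Note that assign() leaves df itself unchanged and returns a new object.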
UNIT IV
1. Data Cleaning and Preparation
1. What is the primary goal of data cleaning?
A) Data modeling
B) Remove inconsistencies and errors
C) Predict future outcomes
D) Visualize data
Answer: B
2. Which of the following is NOT part of data preparation?
A) Data transformation
B) Model evaluation
C) Removing duplicates
D) Handling missing values
Answer: B
3. What is data normalization?
A) Removing duplicates
B) Converting data to a uniform scale
C) Identifying outliers
D) None of the above
Answer: B
4. Which method is commonly used for text data cleaning?
A) Encoding
B) Tokenization
C) Visualization
D) Regression
Answer: B
5. What is the process of reducing a dataset's dimensionality called?
A) Data cleaning
B) Feature selection
C) Data wrangling
D) Data scaling
Answer: B
2. Handling Missing Data
6. Which of the following can be used to handle missing data?
A) Replace with zeros
B) Remove rows/columns with missing values
C) Predict missing values
D) All of the above
Answer: D
7. Which method in pandas removes rows with missing values?
A) drop()
B) dropna()
C) fillna()
D) replace()
Answer: B
8. What does the fillna() method in pandas do?
A) Drops missing values
B) Fills missing values
C) Detects missing values
D) Removes duplicates
Answer: B
9. Which method in pandas can interpolate missing values?
A) fillna()
B) interpolate()
C) replace()
D) dropna()
Answer: B
10. What is imputation in the context of handling missing data?
A) Removing duplicates
B) Replacing missing values with statistical measures
C) Normalizing data
D) Creating new features
Answer: B
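The three pandas strategies from questions 7 to 10 are compared below on one small Series, assuming pandas is installed. The values are illustrative.

```python
import pandas as pd

# Two of the five readings are missing (illustrative data).
s = pd.Series([1.0, None, 3.0, None, 5.0])

dropped = s.dropna()                  # remove missing entries
filled_mean = s.fillna(s.mean())      # mean imputation; mean() skips NaN
interpolated = s.interpolate()        # linear interpolation between neighbors

print(dropped.tolist())
print(filled_mean.tolist())
print(interpolated.tolist())
```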
3. Data Transformations (pandas and sklearn)
11. What does the apply() function in pandas do?
A) Filters rows
B) Applies a function to a DataFrame or Series
C) Removes duplicates
D) Converts data types
Answer: B
12. Which sklearn function is used to standardize features?
A) MinMaxScaler
B) StandardScaler
C) LabelEncoder
D) OneHotEncoder
Answer: B
13. What is the default range of values after using MinMaxScaler?
A) -1 to 1
B) 0 to 1
C) No fixed range
D) None of the above
Answer: B
14. Which sklearn class is used for encoding categorical variables?
A) LabelEncoder
B) OneHotEncoder
C) Both A and B
D) None of the above
Answer: C
15. Which pandas method is used for renaming columns?
A) rename()
B) rename_columns()
C) reindex()
D) None of the above
Answer: A
4. Removing Duplicates
16. How do you remove duplicate rows in pandas?
A) drop_duplicates()
B) remove_duplicates()
C) delete_duplicates()
D) None of the above
Answer: A
17. What is the default behavior of drop_duplicates() in pandas?
A) Removes all duplicates
B) Removes the first occurrence of a duplicate
C) Keeps the first occurrence and removes the rest
D) Does not remove any rows
Answer: C
18. Which parameter in drop_duplicates() specifies columns to check for duplicates?
A) subset
B) columns
C) check_cols
D) None of the above
Answer: A
19. What does inplace=True do in pandas methods?
A) Creates a new DataFrame
B) Modifies the original DataFrame
C) Copies the DataFrame
D) None of the above
Answer: B
20. How can you detect duplicate rows in a DataFrame?
A) duplicated()
B) is_duplicate()
C) check_duplicates()
D) None of the above
Answer: A
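A small sketch of duplicated() and drop_duplicates() from questions 16 to 20, assuming pandas is installed. The column names and rows are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2, 2], "city": ["A", "A", "B", "C"]})

# duplicated() marks rows that repeat an earlier row in full.
mask = df.duplicated()

# By default the first occurrence is kept and later copies are dropped.
unique_rows = df.drop_duplicates()

# subset restricts the comparison to the listed columns.
unique_ids = df.drop_duplicates(subset=["id"])

print(mask.tolist())
print(unique_rows)
print(unique_ids)
```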
5. Replacing Values
21. Which pandas method is used to replace specific values?
A) replace()
B) update()
C) fillna()
D) None of the above
Answer: A
22. How do you replace all occurrences of 10 with 0 in a DataFrame?
A) df.replace(10, 0)
B) df.fillna(10, 0)
C) df.drop(10, 0)
D) df.update(10, 0)
Answer: A
23. Which parameter in replace() allows replacing values with a dictionary?
A) mapping
B) dict
C) to_replace
D) None of the above
Answer: C
24. What does the regex=True option in replace() enable?
A) Regex-based value matching
B) String replacement only
C) Numerical replacement only
D) None of the above
Answer: A
25. Can replace() work on both rows and columns?
A) Yes
B) No
C) Only rows
D) Only columns
Answer: A
6. Detecting Outliers
26. Which plot is most commonly used to detect outliers?
A) Box plot
B) Histogram
C) Scatter plot
D) Line plot
Answer: A
27. What is the IQR (Interquartile Range)?
A) Q1 - Q3
B) Q3 - Q1
C) Mean of the dataset
D) None of the above
Answer: B
28. Which formula is used to detect outliers based on IQR?
A) Values < Q1 - 1.5 * IQR or > Q3 + 1.5 * IQR
B) Values < Q1 - IQR or > Q3 + IQR
C) Values > Mean + Std. Dev
D) None of the above
Answer: A
29. Which library provides the IsolationForest algorithm for detecting outliers?
A) sklearn
B) pandas
C) matplotlib
D) seaborn
Answer: A
30. What does a Z-score measure in outlier detection?
A) Distance from the mean in terms of standard deviations
B) Distance from the median
C) Difference between two values
D) None of the above
Answer: A
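The IQR rule and the Z-score from questions 27 to 30 can be sketched with the standard library. The sample data and the Z-score cutoff of 2 are assumptions of this example; cutoffs of 2.5 or 3 are also common in practice.

```python
from statistics import mean, pstdev, quantiles

data = [10, 12, 11, 13, 12, 11, 10, 95]

# IQR rule: flag values beyond 1.5 * IQR from the quartiles.
q1, _, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = [v for v in data if v < lower or v > upper]

# Z-score: distance from the mean in units of standard deviation.
m, s = mean(data), pstdev(data)
z_scores = [(v - m) / s for v in data]
z_outliers = [v for v, z in zip(data, z_scores) if abs(z) > 2]

print(iqr, iqr_outliers, z_outliers)
```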
7. Data Visualization
31. Which library is used for creating static, interactive, and animated visualizations?
A) matplotlib
B) seaborn
C) pandas
D) sklearn
Answer: A
32. Which seaborn function is used to create pair plots?
A) pairplot()
B) scatterplot()
C) boxplot()
D) lineplot()
Answer: A
33. What type of plot is best for visualizing data distribution?
A) Histogram
B) Scatter plot
C) Line plot
D) Bar plot
Answer: A
34. Which plot visualizes the relationship between two variables?
A) Scatter plot
B) Histogram
C) Box plot
D) Pie chart
Answer: A
35. Which seaborn function creates a heatmap?
A) heatmap()
B) barplot()
C) scatterplot()
D) None of the above
Answer: A
8. Scatter Plot
36. What does a scatter plot show?
A) Relationships between two variables
B) Data distribution
C) Outliers only
D) None of the above
Answer: A
37. Which function creates a scatter plot in matplotlib?
A) plt.scatter()
B) plt.plot()
C) plt.scatterplot()
D) plt.line()
Answer: A
9. Line Plot
38. Which function is used to plot a line graph in matplotlib?
A) plt.plot()
B) plt.line()
C) plt.scatter()
D) None of the above
Answer: A
UNIT V
1. Supervised Learning: Basics
1. What is supervised learning?
A) Training a model with labeled data
B) Training a model with unlabeled data
C) Reinforcement through trial and error
D) None of the above
Answer: A
2. Which of the following is NOT an example of supervised learning?
A) Linear regression
B) Clustering
C) Decision tree
D) Logistic regression
Answer: B
3. What are the two main categories of supervised learning?
A) Regression and clustering
B) Classification and regression
C) Clustering and classification
D) Regression and reinforcement learning
Answer: B
2. Regression
4. What is the primary goal of regression?
A) Predict continuous values
B) Classify data into categories
C) Identify clusters in data
D) Reinforce learning from past actions
Answer: A
5. Which metric is commonly used to evaluate regression models?
A) Accuracy
B) Mean Squared Error (MSE)
C) Precision
D) Recall
Answer: B
6. In regression, the line that minimizes the sum of squared errors is called the:
A) Decision boundary
B) Regression line
C) Margin
D) Hyperplane
Answer: B
7. Which algorithm is commonly used for regression problems?
A) K-means
B) Linear regression
C) Naïve Bayes
D) DBSCAN
Answer: B
8. The slope in a simple linear regression equation represents:
A) Intercept
B) Rate of change in the dependent variable
C) Sum of squared errors
D) None of the above
Answer: B
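To illustrate questions 5 to 8, here is a minimal least-squares fit written in plain Python. The helper names fit_line and mse, and the sample points, are assumptions of this sketch rather than a library API.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Points that lie exactly on y = 2x + 1 (illustrative data).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
error = mse(ys, [slope * x + intercept for x in xs])
print(slope, intercept, error)
```

The slope is the change in the predicted value for a one-unit increase in x.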
3. Classification
9. What is the primary goal of classification?
A) Predict categories or labels
B) Predict continuous values
C) Identify clusters
D) Reinforce learning from actions
Answer: A
10. Which algorithm is used for binary classification?
A) Logistic regression
B) K-means
C) DBSCAN
D) PCA
Answer: A
11. Which of the following is a classification metric?
A) R-squared
B) Confusion matrix
C) Mean Absolute Error
D) Sum of squares
Answer: B
12. Which algorithm assumes conditional independence of features?
A) Decision tree
B) Random forest
C) Naïve Bayes
D) K-Nearest Neighbor
Answer: C
4. Linear Regression
13. What is the assumption in linear regression about the relationship between variables?
A) Non-linear
B) Linear
C) Polynomial
D) Exponential
Answer: B
14. What is the cost function used in linear regression?
A) Entropy
B) Mean Squared Error (MSE)
C) Log loss
D) Gini impurity
Answer: B
15. What does multicollinearity refer to in linear regression?
A) High correlation between independent variables
B) High correlation between dependent and independent variables
C) Low correlation between all variables
D) No correlation between variables
Answer: A
5. Logistic Regression
16. What is the primary use of logistic regression?
A) Regression tasks
B) Classification tasks
C) Clustering tasks
D) Reinforcement tasks
Answer: B
17. Which function does logistic regression use to predict probabilities?
A) Linear function
B) Sigmoid function
C) Polynomial function
D) Exponential function
Answer: B
18. What is the range of predicted values in logistic regression?
A) -∞ to ∞
B) 0 to 1
C) -1 to 1
D) None of the above
Answer: B
19. Which loss function is used in logistic regression?
A) Mean Squared Error
B) Log loss (Cross-Entropy)
C) Gini impurity
D) Entropy
Answer: B
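The sigmoid function and log loss from questions 17 to 19 can be sketched in a few lines. The function names and the sample predictions are assumptions of this example.

```python
import math

def sigmoid(z):
    """Map any real number to a probability strictly between 0 and 1."""
    return 1 / (1 + math.exp(-z))

def log_loss(y_true, y_prob):
    """Average cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

labels = [1, 0]
good_loss = log_loss(labels, [0.9, 0.1])   # confident and correct
bad_loss = log_loss(labels, [0.1, 0.9])    # confident and wrong

print(sigmoid(0), good_loss, bad_loss)
```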
6. Decision Tree
20. What is a decision tree?
A) A clustering algorithm
B) A model that splits data based on feature values
C) A reinforcement learning algorithm
D) None of the above
Answer: B
21. What is the purpose of entropy in a decision tree?
A) Measure impurity, from which information gain is computed
B) Identify clusters
C) Perform regression analysis
D) None of the above
Answer: A
22. Which algorithm is used to build decision trees using entropy?
A) Random Forest
B) ID3
C) KNN
D) Naïve Bayes
Answer: B
23. Information gain measures:
A) Reduction in entropy after a split
B) Increase in variance after a split
C) Distance between clusters
D) None of the above
Answer: A
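Entropy and information gain from questions 21 to 23 can be computed by hand as below. The helper functions and the tiny label sets are assumptions of this sketch.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]

# A split that separates the classes perfectly removes all uncertainty.
perfect_gain = information_gain(parent, [["yes", "yes"], ["no", "no"]])

# A split that leaves each child as mixed as the parent gains nothing.
useless_gain = information_gain(parent, [["yes", "no"], ["yes", "no"]])

print(entropy(parent), perfect_gain, useless_gain)
```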
7. Random Forest
24. What is random forest?
A) A single decision tree
B) An ensemble of decision trees
C) A clustering algorithm
D) None of the above
Answer: B
25. How does a random forest combine the predictions of its decision trees?
A) Averaging for regression, voting for classification
B) Clustering
C) Bagging
D) Boosting
Answer: A
8. K-Nearest Neighbors (KNN)
26. What is the main idea of KNN?
A) Identify the nearest neighbors and classify based on majority voting
B) Build a tree structure
C) Perform regression analysis
D) None of the above
Answer: A
27. K in KNN represents:
A) Number of clusters
B) Number of neighbors to consider
C) Number of classes
D) None of the above
Answer: B
28. Which distance metric is commonly used in KNN?
A) Manhattan
B) Euclidean
C) Minkowski
D) All of the above
Answer: D
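A minimal KNN classifier using Euclidean distance and majority voting, as described in questions 26 to 28. The knn_predict helper and the training points are hypothetical; math.dist requires Python 3.8 or later.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Sort training items by Euclidean distance to the query point.
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Two well-separated groups of labeled points (illustrative data).
train = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b"),
]

near_a = knn_predict(train, (2, 2))
near_b = knn_predict(train, (7, 7))
print(near_a, near_b)
```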
9. Unsupervised Learning: Clustering
29. What is the main goal of clustering?
A) Group similar data points
B) Predict future outcomes
C) Perform classification
D) None of the above
Answer: A
30. Which of the following is NOT a clustering algorithm?
A) K-means
B) DBSCAN
C) Random Forest
D) Agglomerative Clustering
Answer: C
31. Which parameter specifies the number of clusters in K-means?
A) k
B) max_iter
C) epsilon
D) None of the above
Answer: A
10. Reinforcement Learning
32. What is the primary goal of reinforcement learning?
A) Find an optimal policy to maximize cumulative reward
B) Classify data into categories
C) Reduce dimensionality
D) None of the above
Answer: A
33. In reinforcement learning, the agent interacts with:
A) The environment
B) A supervisor
C) Labeled data
D) None of the above
Answer: A
34. Which algorithm is commonly used in reinforcement learning?
A) Q-learning
B) Linear regression
C) K-means
D) Logistic regression
Answer: A
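A single Q-learning update step gives a feel for how the agent learns from reward. The state names, the learning rate alpha, and the discount factor gamma are assumptions of this sketch.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward reward + gamma * max Q(s', .)."""
    best_next = max(q[next_state].values())
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])

# A tiny Q-table: two states, two actions each (illustrative values).
q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 1.0},
}

# The agent takes "right" in s0, receives no reward, and lands in s1.
q_update(q, "s0", "right", reward=0.0, next_state="s1")
print(q["s0"])
```

The value of s1 flows back into s0, which is how estimates of cumulative reward propagate through the table.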