Here’s a set of Python coding questions :-
LINEAR REGRESSION QUESTIONS:-
1. Linear Regression (10 Questions)
1. Import the libraries required for linear regression.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
2. Load a dataset and split it into train and test sets.
df = pd.read_csv('data.csv')
X = df[['feature1', 'feature2']]
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
3. Fit a linear regression model.
model = LinearRegression()
model.fit(X_train, y_train)
4. Predict the target variable for test data.
y_pred = model.predict(X_test)
5. Calculate Mean Squared Error (MSE).
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, y_pred)
print(f'MSE: {mse}')
6. Display model coefficients and intercept.
print(f'Coefficients: {model.coef_}')
print(f'Intercept: {model.intercept_}')
7. Plot a regression line on a scatter plot.
import matplotlib.pyplot as plt
plt.scatter(X_test['feature1'], y_test, color='blue')
# the model uses two features, so this traces predictions against feature1 only
plt.plot(X_test['feature1'], y_pred, color='red')
plt.show()
8. Normalize features before fitting the model.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
model.fit(X_train_scaled, y_train)
9. Check the R-squared value of the model.
r_squared = model.score(X_test, y_test)
print(f'R-squared: {r_squared}')
10. Perform a train-test split with stratification.
# Note: stratify requires a categorical target; for a continuous y, bin it first.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
Here are 50 Python coding questions and answers focused on Linear
Regression, ranging from basic to intermediate levels:
---
Basic Linear Regression Questions
1. Q: Import the necessary libraries for Linear Regression in Python.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
2. Q: How do you create a simple dataset for linear regression using
NumPy?
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 6, 8, 10])
3. Q: How do you split data into training and test sets?
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
4. Q: Create and train a simple Linear Regression model using Scikit-
learn.
model = LinearRegression()
model.fit(X_train, y_train)
5. Q: How do you make predictions using a trained model?
predictions = model.predict(X_test)
print(predictions)
---
Data Preprocessing and Handling
6. Q: Load a dataset for Linear Regression from a CSV file.
df = pd.read_csv('data.csv')
X = df[['Feature1', 'Feature2']].values
y = df['Target'].values
7. Q: Check for null values in a dataset.
print(df.isnull().sum())
8. Q: Fill missing values with the mean of a column.
df['Feature1'] = df['Feature1'].fillna(df['Feature1'].mean())
9. Q: Normalize a dataset using Min-Max scaling.
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
10. Q: Encode categorical variables using OneHotEncoding.
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder()
X_encoded = encoder.fit_transform(df[['Category']]).toarray()
---
Evaluation and Metrics
11. Q: Calculate the Mean Squared Error (MSE) for predictions.
mse = mean_squared_error(y_test, predictions)
print("Mean Squared Error:", mse)
12. Q: Calculate the R-squared score.
r2_score = model.score(X_test, y_test)
print("R-squared:", r2_score)
13. Q: Plot the actual vs. predicted values.
import matplotlib.pyplot as plt
plt.scatter(y_test, predictions)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('Actual vs Predicted')
plt.show()
14. Q: Compute the Root Mean Squared Error (RMSE).
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print("Root Mean Squared Error:", rmse)
15. Q: Print the model’s coefficients and intercept.
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
---
Intermediate Linear Regression Questions
16. Q: Generate synthetic data for linear regression using Scikit-
learn.
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=100, n_features=1, noise=10,
random_state=42)
17. Q: Perform a train-test split with stratification (if applicable).
# Stratification applies to classification targets; for a continuous regression target, use a plain (or binned) split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
18. Q: Save a trained model using joblib.
import joblib
joblib.dump(model, 'linear_model.pkl')
19. Q: Load a saved model using joblib.
loaded_model = joblib.load('linear_model.pkl')
20. Q: Evaluate model performance on a validation dataset.
val_predictions = loaded_model.predict(X_test)
print("Validation R2:", loaded_model.score(X_test, y_test))
---
Advanced Concepts
21. Q: Implement polynomial regression using Scikit-learn.
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
model.fit(X_poly, y)
22. Q: Add interaction terms using PolynomialFeatures.
poly = PolynomialFeatures(degree=2, interaction_only=True)
X_interact = poly.fit_transform(X)
23. Q: Perform cross-validation for a Linear Regression model.
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)
print("Cross-Validation Scores:", scores)
24. Q: Perform Ridge Regression using Scikit-learn.
from sklearn.linear_model import Ridge
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, y_train)
25. Q: Perform Lasso Regression using Scikit-learn.
from sklearn.linear_model import Lasso
lasso_model = Lasso(alpha=0.1)
lasso_model.fit(X_train, y_train)
---
Plotting and Visualization
26. Q: Plot the regression line for a simple Linear Regression model.
plt.scatter(X, y, color='blue')
plt.plot(X, model.predict(X), color='red')
plt.show()
27. Q: Visualize residuals to check model assumptions.
residuals = y_test - predictions
plt.hist(residuals, bins=20)
plt.title('Residual Distribution')
plt.show()
28. Q: Create a heatmap of correlations for feature selection.
import seaborn as sns
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
29. Q: Plot feature importance for linear regression coefficients.
plt.bar(range(len(model.coef_)), model.coef_)
plt.show()
30. Q: Plot actual vs. predicted values for a test set.
plt.scatter(y_test, predictions)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('Actual vs. Predicted Values')
plt.show()
---
Coding Challenges
31. Q: Write a function to calculate the RMSE manually.
def calculate_rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred)**2))
32. Q: Write a function to calculate R-squared manually.
def r2_score_manual(y_true, y_pred):
    ss_total = np.sum((y_true - np.mean(y_true))**2)
    ss_residual = np.sum((y_true - y_pred)**2)
    return 1 - (ss_residual / ss_total)
33. Q: Implement Gradient Descent for Linear Regression manually.
def gradient_descent(X, y, lr=0.01, epochs=1000):
    m, n = X.shape
    weights = np.zeros(n)
    for _ in range(epochs):
        predictions = np.dot(X, weights)
        gradients = -2/m * np.dot(X.T, (y - predictions))
        weights -= lr * gradients
    return weights
More Linear Regression Coding Questions
34. Q: Calculate the cost function (Mean Squared Error) manually.
def calculate_cost(X, y, weights):
    m = len(y)
    predictions = np.dot(X, weights)
    cost = np.sum((predictions - y)**2) / (2 * m)
    return cost
35. Q: Implement a Linear Regression model manually using the normal equation.
def linear_regression_normal_equation(X, y):
    X_transpose = np.transpose(X)
    weights = np.linalg.inv(X_transpose.dot(X)).dot(X_transpose).dot(y)
    return weights
36. Q: Add a bias term to the feature matrix manually.
X_with_bias = np.c_[np.ones((X.shape[0], 1)), X]
37. Q: Perform Ridge Regression using manual calculations.
def ridge_regression(X, y, alpha):
    X_transpose = np.transpose(X)
    I = np.identity(X.shape[1])
    weights = np.linalg.inv(X_transpose.dot(X) + alpha * I).dot(X_transpose).dot(y)
    return weights
38. Q: Implement feature scaling manually (standardization).
def standardize_features(X):
    mean = np.mean(X, axis=0)
    std = np.std(X, axis=0)
    X_standardized = (X - mean) / std
    return X_standardized
39. Q: Implement a stochastic gradient descent (SGD) optimizer for
Linear Regression.
def stochastic_gradient_descent(X, y, lr=0.01, epochs=1000):
    m, n = X.shape
    weights = np.zeros(n)
    for epoch in range(epochs):
        for i in range(m):
            idx = np.random.randint(m)
            X_i = X[idx:idx+1]
            y_i = y[idx:idx+1]
            gradient = -2 * X_i.T.dot(y_i - X_i.dot(weights))
            weights -= lr * gradient
    return weights
40. Q: Write a function to split data into k folds for cross-validation.
from sklearn.model_selection import KFold
def k_fold_split(X, y, k=5):
    kf = KFold(n_splits=k)
    for train_index, test_index in kf.split(X):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        yield X_train, X_test, y_train, y_test
---
Exploratory Data Analysis (EDA)
41. Q: Check the correlation between independent and dependent
variables.
correlation = df.corr()
print(correlation['Target'])
42. Q: Plot a pairplot to visualize feature relationships.
import seaborn as sns
sns.pairplot(df)
43. Q: Plot a scatterplot for a feature against the target variable.
plt.scatter(df['Feature1'], df['Target'])
plt.xlabel('Feature1')
plt.ylabel('Target')
plt.show()
44. Q: Identify multicollinearity using the Variance Inflation Factor
(VIF).
from statsmodels.stats.outliers_influence import variance_inflation_factor
def calculate_vif(X):
    vif_data = pd.DataFrame()
    vif_data['Feature'] = X.columns
    vif_data['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
    return vif_data
45. Q: Visualize the residual errors of a regression model.
residuals = y_test - predictions
sns.histplot(residuals, kde=True)
plt.title('Residual Errors')
plt.show()
---
Advanced Model Evaluation
46. Q: Use cross-validation to evaluate the model’s performance.
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5,
scoring='neg_mean_squared_error')
print("Mean CV MSE:", -np.mean(scores))
47. Q: Compare training and test performance to identify
overfitting.
train_predictions = model.predict(X_train)
test_predictions = model.predict(X_test)
train_mse = mean_squared_error(y_train, train_predictions)
test_mse = mean_squared_error(y_test, test_predictions)
print("Train MSE:", train_mse)
print("Test MSE:", test_mse)
48. Q: Evaluate the impact of adding/removing features using
forward selection.
from sklearn.feature_selection import SequentialFeatureSelector
sfs = SequentialFeatureSelector(model, n_features_to_select=2,
direction='forward')
sfs.fit(X, y)
print("Selected Features:", sfs.get_support())
49. Q: Regularize features with L2 (Ridge) regularization.
from sklearn.linear_model import Ridge
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
print("Ridge Coefficients:", ridge.coef_)
50. Q: Perform grid search to optimize hyperparameters in Ridge
regression.
from sklearn.model_selection import GridSearchCV
param_grid = {'alpha': [0.1, 1.0, 10.0, 100.0]}
grid_search = GridSearchCV(Ridge(), param_grid,
scoring='neg_mean_squared_error', cv=5)
grid_search.fit(X_train, y_train)
print("Best Alpha:", grid_search.best_params_)
LOGISTIC REGRESSION QUESTIONS:-
Basic Logistic Regression Questions
1. How do you import Logistic Regression in Python?
from sklearn.linear_model import LogisticRegression
2. Write code to load a dataset for logistic regression.
from sklearn.datasets import load_iris
data = load_iris()
X, y = data.data, data.target
3. How do you split the dataset into training and testing sets?
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
4. Create and fit a logistic regression model.
model = LogisticRegression()
model.fit(X_train, y_train)
5. How do you make predictions using the logistic regression model?
y_pred = model.predict(X_test)
6. How do you check the accuracy of the logistic regression model?
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)
7. How do you calculate the confusion matrix?
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
8. How do you calculate precision and recall?
from sklearn.metrics import precision_score, recall_score
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
print(precision, recall)
9. What function retrieves logistic regression coefficients?
coefficients = model.coef_
print(coefficients)
10. How do you get the intercept in logistic regression?
intercept = model.intercept_
print(intercept)
---
Intermediate Logistic Regression Questions
11. How do you normalize data for logistic regression?
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
12. How do you handle categorical features?
import pandas as pd
# assumes the features are held in a pandas DataFrame
data = pd.get_dummies(data, drop_first=True)
13. How do you add regularization in logistic regression?
model = LogisticRegression(penalty='l2', C=1.0) # L2 regularization
model.fit(X_train, y_train)
14. What is the difference between L1 and L2 regularization in logistic regression?
L1: adds the sum of the absolute values of the coefficients as a penalty, which can shrink some coefficients to exactly zero (implicit feature selection).
L2: adds the sum of the squared coefficients as a penalty, which shrinks coefficients smoothly without zeroing them out.
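A minimal sketch of the practical difference (assuming X_train and y_train from the earlier questions; 'liblinear' is just one solver that supports both penalties):
from sklearn.linear_model import LogisticRegression
# L1 penalty: supported by the 'liblinear' and 'saga' solvers; tends to zero out coefficients
l1_model = LogisticRegression(penalty='l1', solver='liblinear', C=1.0).fit(X_train, y_train)
# L2 penalty (the default): shrinks coefficients but rarely makes them exactly zero
l2_model = LogisticRegression(penalty='l2', solver='liblinear', C=1.0).fit(X_train, y_train)
print("L1 coefficients:", l1_model.coef_)
print("L2 coefficients:", l2_model.coef_)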
15. How do you perform multi-class classification using logistic
regression?
model = LogisticRegression(multi_class='multinomial',
solver='lbfgs')
model.fit(X_train, y_train)
16. How do you calculate the ROC-AUC score?
from sklearn.metrics import roc_auc_score
y_prob = model.predict_proba(X_test)[:, 1]  # binary case: probability of the positive class
auc_score = roc_auc_score(y_test, y_prob)
print(auc_score)
17. How do you plot the ROC curve?
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt
fpr, tpr, _ = roc_curve(y_test, y_prob)
plt.plot(fpr, tpr)
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.show()
18. How do you tune hyperparameters using GridSearchCV?
from sklearn.model_selection import GridSearchCV
param_grid = {'C': [0.1, 1, 10], 'penalty': ['l1', 'l2'], 'solver':
['liblinear']}
grid = GridSearchCV(LogisticRegression(), param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)
19. How do you add class weighting in logistic regression?
model = LogisticRegression(class_weight='balanced')
model.fit(X_train, y_train)
20. What is the use of predict_proba() in logistic regression?
It gives the probability estimates for each class.
probabilities = model.predict_proba(X_test)
print(probabilities)
---
Feature Engineering and Model Evaluation
21. How do you calculate the log loss for logistic regression?
from sklearn.metrics import log_loss
log_loss_value = log_loss(y_test, y_prob)
print(log_loss_value)
22. How do you handle missing values in the dataset?
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
X = imputer.fit_transform(X)
23. How do you check feature importance in logistic regression?
import numpy as np
importance = np.abs(model.coef_[0])
print(importance)
24. How do you perform cross-validation?
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
25. How do you standardize data using MinMaxScaler?
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
26. How do you visualize the decision boundary for logistic
regression?
import matplotlib.pyplot as plt
import numpy as np
# assumes X has exactly two features, so the boundary can be drawn in 2-D
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
np.arange(y_min, y_max, 0.01))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k')
plt.show()
27. How do you implement polynomial logistic regression?
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
model = LogisticRegression()
model.fit(X_poly, y)
28. How do you compute the F1-score?
from sklearn.metrics import f1_score
f1 = f1_score(y_test, y_pred, average='weighted')
print(f1)
29. How do you use a custom threshold for classification?
threshold = 0.6
y_custom = (model.predict_proba(X_test)[:, 1] >
threshold).astype(int)
30. How do you evaluate the precision-recall curve?
from sklearn.metrics import precision_recall_curve
precision, recall, _ = precision_recall_curve(y_test, y_prob)
plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.show()
---
Advanced Questions
31. How do you save a logistic regression model?
import joblib
joblib.dump(model, 'logistic_model.pkl')
32. How do you load a saved logistic regression model?
model = joblib.load('logistic_model.pkl')
33. How do you handle imbalanced datasets?
Use class_weight='balanced' or SMOTE:
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state=42)
X_res, y_res = sm.fit_resample(X, y)
34. How do you visualize probabilities?
import seaborn as sns
sns.histplot(model.predict_proba(X_test)[:, 1], kde=True)
35. How do you set a custom solver in logistic regression?
model = LogisticRegression(solver='liblinear')
NAÏVE BAYES QUESTIONS:-
Basic Level Questions
1. What is Naive Bayes? Write a Python definition.
# Naive Bayes is a probabilistic classifier based on Bayes' theorem.
# It assumes independence between predictors.
2. Write Python code to import the Naive Bayes model.
from sklearn.naive_bayes import GaussianNB
3. How do you import a dataset to work with Naive Bayes?
import pandas as pd
data = pd.read_csv('dataset.csv')
4. Write Python code to split a dataset into training and testing
sets.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
5. How do you train a Naive Bayes classifier in Python?
model = GaussianNB()
model.fit(X_train, y_train)
6. Write code to make predictions using a trained Naive Bayes
model.
y_pred = model.predict(X_test)
7. How do you calculate the accuracy of a Naive Bayes classifier?
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
8. How do you load and preprocess text data for Naive Bayes?
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(text_data)
9. What library is used for Multinomial Naive Bayes in Python?
from sklearn.naive_bayes import MultinomialNB
10. Write code to instantiate a Multinomial Naive Bayes model.
model = MultinomialNB()
---
Intermediate Level Questions
11. What is Laplace smoothing in Naive Bayes, and how do you set it
in Python?
# Laplace smoothing is controlled using the 'alpha' parameter.
model = MultinomialNB(alpha=1.0)
12. How do you evaluate the confusion matrix of a Naive Bayes
model?
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
13. Write code to perform cross-validation on a Naive Bayes model.
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)
print(scores)
14. How do you handle categorical data for Naive Bayes?
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder()
X_encoded = encoder.fit_transform(X_categorical)
15. Write code to compute classification report for Naive Bayes.
from sklearn.metrics import classification_report
report = classification_report(y_test, y_pred)
print(report)
16. How do you use Naive Bayes with a pipeline?
from sklearn.pipeline import Pipeline
pipeline = Pipeline([
('vectorizer', CountVectorizer()),
('classifier', MultinomialNB())
])
pipeline.fit(X_train, y_train)
17. How do you compute log probabilities in Naive Bayes?
log_probs = model.predict_log_proba(X_test)
print(log_probs)
18. How do you visualize the confusion matrix for a Naive Bayes
classifier?
import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(cm, annot=True, fmt='d')
plt.show()
19. Write Python code for loading a text dataset for Naive Bayes.
from sklearn.datasets import fetch_20newsgroups
data = fetch_20newsgroups(subset='train', categories=['sci.space',
'rec.sport.baseball'])
20. How do you save a Naive Bayes model?
import joblib
joblib.dump(model, 'naive_bayes_model.pkl')
---
Scenario-Based Intermediate Questions
21. How do you implement Naive Bayes with TF-IDF?
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(text_data)
22. What is partial_fit in Naive Bayes? Provide an example.
# Useful for incremental learning
model = MultinomialNB()
model.partial_fit(X_train, y_train, classes=np.unique(y_train))
23. Write code to compute probabilities instead of predictions.
probabilities = model.predict_proba(X_test)
print(probabilities)
24. How do you balance a dataset before applying Naive Bayes?
from imblearn.over_sampling import SMOTE
smote = SMOTE()
X_resampled, y_resampled = smote.fit_resample(X, y)
25. Write code to test your model with new text data.
new_data = ["Space is vast and infinite."]
new_vector = vectorizer.transform(new_data)
prediction = model.predict(new_vector)
print(prediction)
---
Advanced Intermediate Questions
26. How do you plot ROC curve for Naive Bayes?
from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_test, model.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr)
plt.show()
27. Explain and implement prior probability in Naive Bayes.
# Prior probabilities are the class probabilities estimated before seeing the features.
# GaussianNB exposes them as class_prior_; MultinomialNB stores class_log_prior_.
print(model.class_prior_)
28. Write code for Multinomial Naive Bayes with binary data.
# MultinomialNB has no binarize parameter; BernoulliNB is the variant for binary features.
from sklearn.naive_bayes import BernoulliNB
binary_model = BernoulliNB(binarize=0.0)
binary_model.fit(X_train, y_train)
29. How do you tune hyperparameters in Naive Bayes?
from sklearn.model_selection import GridSearchCV
param_grid = {'alpha': [0.1, 0.5, 1.0]}
grid = GridSearchCV(MultinomialNB(), param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)
30. How do you handle missing data in Naive Bayes?
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
X_filled = imputer.fit_transform(X)
---
Comparison and Analysis
31. How do you compare GaussianNB and MultinomialNB?
# GaussianNB is for continuous data; MultinomialNB is for discrete count data.
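A small, self-contained illustration of that difference (the tiny arrays below are made-up sample data, not from the earlier questions):
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB
y = np.array([0, 0, 1, 1])
# GaussianNB: continuous, real-valued features (a per-class Gaussian per feature)
X_cont = np.array([[1.2, 3.4], [0.8, 2.9], [5.1, 7.2], [4.9, 6.8]])
print(GaussianNB().fit(X_cont, y).predict([[1.0, 3.0]]))
# MultinomialNB: non-negative counts, e.g. word counts from CountVectorizer
X_counts = np.array([[3, 0, 1], [2, 0, 2], [0, 4, 1], [1, 5, 0]])
print(MultinomialNB().fit(X_counts, y).predict([[0, 3, 1]]))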
32. Can Naive Bayes be used for regression?
# No, Naive Bayes is a classification algorithm.
33. How do you visualize word importance in a text Naive Bayes
model?
words = vectorizer.get_feature_names_out()
weights = model.feature_log_prob_[0]  # per-class log probabilities; coef_ is no longer available on MultinomialNB
word_importance = sorted(zip(words, weights), key=lambda x: x[1],
reverse=True)
print(word_importance[:10])
34. Write code to check the assumptions of Naive Bayes.
# Check (approximate) feature independence via pairwise correlations (assumes X is a DataFrame):
print(X.corr())
35. How do you evaluate precision and recall for Naive Bayes?
from sklearn.metrics import precision_score, recall_score
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')
print(precision, recall)
---
Special Use Cases
36. How do you use Naive Bayes with imbalanced datasets?
from imblearn.under_sampling import RandomUnderSampler
rus = RandomUnderSampler()
X_resampled, y_resampled = rus.fit_resample(X, y)
37. Write code to compare Naive Bayes with another classifier.
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
rf_score = rf.score(X_test, y_test)
nb_score = model.score(X_test, y_test)
print(nb_score, rf_score)
38. How do you compute log loss for a Naive Bayes model?
from sklearn.metrics import log_loss
loss = log_loss(y_test, model.predict_proba(X_test))
print(loss)
39. What datasets work best with Naive Bayes?
# Datasets where features are independent.
40. How do you visualize the decision boundary of Naive Bayes?
from mlxtend.plotting import plot_decision_regions
plot_decision_regions(X_test.values, y_test.values, clf=model)  # expects two-feature data and integer labels
plt.show()
---
Exploratory Questions
41. Why does Naive Bayes perform well on text data?
# The independence assumption aligns well with bag-of-words text representations.
42. How do you interpret alpha in MultinomialNB?
# Controls the impact of Laplace smoothing.
43. How does Naive Bayes scale with large datasets?
# It scales well due to low computational complexity.
44. How do you convert probabilities to labels?
labels = [1 if prob[1] > 0.5 else 0 for prob in probabilities]  # prob[1] is the positive-class probability
45. Can Naive Bayes work with dense features?
# Yes, GaussianNB can handle dense features.
46. How do you handle overfitting in Naive Bayes?
# Use smoothing and cross-validation.
47. What is the role of feature scaling in Naive Bayes?
# Naive Bayes generally does not require feature scaling: GaussianNB fits each feature's distribution directly, and MultinomialNB expects raw non-negative counts.
48. How do you visualize text classification results?
from wordcloud import WordCloud
wc = WordCloud().generate(" ".join(text_data))
plt.imshow(wc, interpolation='bilinear')
plt.show()
49. What metrics are used to evaluate Naive Bayes?
# Accuracy, Precision, Recall, F1-Score, ROC-AUC.
50. How do you interpret Naive Bayes predictions?
# Predictions are based on the posterior probability of each class.
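One way to make this concrete (a sketch reusing the fitted model and X_test from the earlier questions; the class with the highest posterior is exactly what predict returns):
import numpy as np
posteriors = model.predict_proba(X_test)    # per-class posterior estimates
best_idx = np.argmax(posteriors, axis=1)    # index of the most probable class
print(model.classes_[best_idx][:5])         # matches model.predict(X_test)[:5]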
MODEL SELECTION QUESTIONS:-
Basic Questions
1. What is model selection in machine learning?
Answer: Model selection is the process of choosing the best-
performing machine learning model from a set of candidates based
on evaluation metrics like accuracy, precision, recall, or other
criteria.
2. How can you split a dataset into training and testing sets in
Python?
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
3. What is cross-validation?
Answer: Cross-validation is a technique used to evaluate the
performance of a machine learning model by splitting the data into
multiple subsets and training/testing the model on different
combinations of these subsets.
4. Write code to perform K-Fold cross-validation.
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
scores = cross_val_score(model, X, y, cv=5)
print(scores)
5. What is the difference between validation set and test set?
Answer:
Validation Set: Used for tuning model hyperparameters.
Test Set: Used for evaluating final model performance.
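A common way to realize this in code (a sketch; X and y as loaded earlier): hold out the test set first, then carve a validation set out of the remainder.
from sklearn.model_selection import train_test_split
# 20% final test set, never used for tuning
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 25% of the remainder (20% overall) becomes the validation set used for tuning
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)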
6. What does train_test_split() do?
Answer: It splits a dataset into training and testing subsets for
model training and evaluation.
7. How to shuffle data before splitting?
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=True,
random_state=42)
8. What is the purpose of the random_state parameter in
train_test_split()?
Answer: It ensures reproducibility by fixing the random seed.
9. Write Python code to perform stratified sampling.
from sklearn.model_selection import StratifiedKFold
skf = StratifiedKFold(n_splits=5)
for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
10. How do you evaluate model performance during cross-
validation?
Answer: Use metrics like accuracy, precision, recall, or F1-score, passed through the scoring argument; see the sketch below.
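For example (model, X, y as in the surrounding questions; 'f1_macro' is just one of scikit-learn's built-in scorer names):
from sklearn.model_selection import cross_val_score
f1_scores = cross_val_score(model, X, y, cv=5, scoring='f1_macro')
print("Per-fold F1:", f1_scores)
print("Mean F1:", f1_scores.mean())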
---
Intermediate Questions
11. What is Grid Search? Write code to implement it.
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
param_grid = {'n_estimators': [10, 50, 100], 'max_depth': [None, 10,
20]}
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)
12. Explain RandomizedSearchCV. How is it different from
GridSearchCV?
Answer: RandomizedSearchCV randomly samples a subset of
hyperparameter combinations, making it faster than GridSearchCV,
which exhaustively tries all combinations.
13. Write code for RandomizedSearchCV.
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import randint
model = RandomForestClassifier()
param_dist = {'n_estimators': randint(10, 100), 'max_depth': [None,
10, 20]}
random_search = RandomizedSearchCV(model, param_dist, cv=5,
n_iter=10)
random_search.fit(X_train, y_train)
print(random_search.best_params_)
14. How do you compare multiple models using cross-validation?
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
models = {'LogisticRegression': LogisticRegression(),
'RandomForest': RandomForestClassifier()}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f'{name}: Mean Accuracy = {scores.mean()}')
15. What are some common hyperparameters for decision trees?
Answer:
max_depth: Maximum depth of the tree.
min_samples_split: Minimum number of samples to split.
min_samples_leaf: Minimum number of samples in a leaf node.
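For illustration, a sketch showing how those hyperparameters are passed (the values are arbitrary; assumes X_train and y_train exist):
from sklearn.tree import DecisionTreeClassifier
tree = DecisionTreeClassifier(
    max_depth=10,          # cap the depth of the tree
    min_samples_split=20,  # need at least 20 samples to split a node
    min_samples_leaf=5,    # each leaf must keep at least 5 samples
    random_state=42
)
tree.fit(X_train, y_train)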
16. How can you implement early stopping?
from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier(n_estimators=100,
validation_fraction=0.2, n_iter_no_change=10)
model.fit(X_train, y_train)
17. What is an overfitting model? How can it be avoided?
Answer:
An overfitting model performs well on training data but poorly on
test data. Avoid it using regularization, dropout, cross-validation, or
early stopping.
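A quick way to spot it in practice (a sketch, assuming X_train/X_test/y_train/y_test exist): compare train and test accuracy for an unconstrained tree versus a capacity-limited one.
from sklearn.tree import DecisionTreeClassifier
deep = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("Unconstrained tree:", deep.score(X_train, y_train), deep.score(X_test, y_test))
shallow = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
print("Depth-limited tree:", shallow.score(X_train, y_train), shallow.score(X_test, y_test))
A large gap between train and test accuracy for the unconstrained tree is the overfitting signature.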
18. What does the cv parameter do in GridSearchCV?
Answer: It specifies the number of cross-validation splits.
19. What is feature selection, and why is it important in model
selection?
Answer: Feature selection identifies the most relevant features to
improve model performance and reduce overfitting.
20. Write code for feature selection using SelectKBest.
from sklearn.feature_selection import SelectKBest, f_classif
selector = SelectKBest(score_func=f_classif, k=5)
X_new = selector.fit_transform(X, y)
---
Practical Implementation Questions
21. How do you handle imbalanced datasets during model selection?
Answer: Use techniques like oversampling, undersampling, or class
weighting.
22. How do you implement SMOTE for oversampling?
from imblearn.over_sampling import SMOTE
smote = SMOTE()
X_resampled, y_resampled = smote.fit_resample(X, y)
23. How can you visualize cross-validation results?
import matplotlib.pyplot as plt
scores = cross_val_score(model, X, y, cv=5)
plt.plot(scores)
plt.show()
24. What is nested cross-validation?
Answer: A technique that combines hyperparameter tuning (inner
loop) with model evaluation (outer loop) to reduce bias.
25. Implement nested cross-validation.
from sklearn.model_selection import cross_val_score, GridSearchCV
param_grid = {'n_estimators': [10, 50, 100]}
model = GridSearchCV(RandomForestClassifier(), param_grid, cv=3)
scores = cross_val_score(model, X, y, cv=5)
print(scores)
---
Advanced Concepts
26. How do you evaluate the performance of a classification model?
Answer: Use metrics like confusion matrix, accuracy, precision,
recall, F1-score, and ROC-AUC.
27. How to calculate precision and recall in Python?
from sklearn.metrics import precision_score, recall_score
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
28. What is the difference between a pipeline and GridSearchCV?
Answer: A pipeline automates preprocessing and model training,
while GridSearchCV optimizes hyperparameters.
29. How do you create a pipeline?
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
pipeline = Pipeline([('scaler', StandardScaler()), ('model',
RandomForestClassifier())])
pipeline.fit(X_train, y_train)
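The two also compose: GridSearchCV can tune a whole pipeline by prefixing parameter names with the step name (a sketch reusing the pipeline defined above; the grid values are illustrative).
from sklearn.model_selection import GridSearchCV
param_grid = {'model__n_estimators': [50, 100], 'model__max_depth': [None, 10]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)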
30. How can feature engineering affect model selection?
Answer: Better features can improve model performance and reduce
the need for complex models.
---
Miscellaneous Questions
31. What are the common issues with GridSearchCV?
32. How do you handle categorical features in model selection?
33. How do you evaluate a regression model?
34. What is StratifiedShuffleSplit?
35. How do you assess bias vs. variance?
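Questions 31-35 are listed without worked answers; as one illustration for question 34, a minimal StratifiedShuffleSplit sketch (assuming a feature array X and class labels y):
from sklearn.model_selection import StratifiedShuffleSplit
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
for train_idx, test_idx in sss.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]  # each split preserves the class proportions of y
    y_train, y_test = y[train_idx], y[test_idx]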
INFERENTIAL STATISTICS QUESTIONS:-
Basic Python and Statistics Questions
1. Question: How do you import key libraries for statistical
analysis in Python?
Answer:
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
2. Question: Write Python code to calculate the mean of a list of numbers.
Answer:
data = [10, 20, 30, 40, 50]
mean = np.mean(data)
print("Mean:", mean)
3. Question: How can you calculate the median of a dataset?
Answer:
data = [10, 20, 30, 40, 50]
median = np.median(data)
print("Median:", median)
4. Question: Calculate the mode of a dataset.
Answer:
data = [10, 20, 30, 30, 40, 50]
mode = stats.mode(data)
print("Mode:", mode.mode[0])
5. Question: How do you calculate the standard deviation in Python?
Answer:
data = [10, 20, 30, 40, 50]
std_dev = np.std(data)
print("Standard Deviation:", std_dev)
Intermediate Level Questions: Inferential Statistics
6. Question: Perform a one-sample t-test in Python.
Answer:
data = [2.5, 3.6, 2.9, 3.2, 3.1]
t_stat, p_value = stats.ttest_1samp(data, popmean=3.0)
print("T-statistic:", t_stat, "P-value:", p_value)
7. Question: Conduct a two-sample independent t-test.
Answer:
group1 = [2.1, 3.5, 2.8, 3.6]
group2 = [3.3, 3.9, 4.0, 3.7]
t_stat, p_value = stats.ttest_ind(group1, group2)
print("T-statistic:", t_stat, "P-value:", p_value)
8. Question: How do you perform a paired t-test?
Answer:
before = [2.5, 3.6, 2.9, 3.2]
after = [3.1, 3.7, 3.2, 3.5]
t_stat, p_value = stats.ttest_rel(before, after)
print("T-statistic:", t_stat, "P-value:", p_value)
9. Question: How can you test for normality in a dataset?
Answer:
data = [2.5, 3.6, 2.9, 3.2, 3.1]
stat, p_value = stats.shapiro(data)
print("Statistic:", stat, "P-value:", p_value)
10. Question: Conduct a chi-square test for independence.
Answer:
observed = np.array([[10, 20], [20, 40]])
chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print("Chi-squared:", chi2, "P-value:", p_value)
Visualizing Inferential Statistics
11. Question: Plot a histogram of a dataset.
Answer:
data = [2.5, 3.6, 2.9, 3.2, 3.1]
plt.hist(data, bins=5)
plt.show()
12. Question: Create a boxplot in Python.
Answer:
data = [2.5, 3.6, 2.9, 3.2, 3.1]
plt.boxplot(data)
plt.show()
13. Question: How do you generate a QQ plot for normality testing?
Answer:
import statsmodels.api as sm
data = [2.5, 3.6, 2.9, 3.2, 3.1]
sm.qqplot(np.array(data), line='s')
plt.show()
14. Question: Plot a confidence interval for a mean.
Answer:
data = [2.5, 3.6, 2.9, 3.2, 3.1]
mean = np.mean(data)
se = stats.sem(data)
ci = stats.t.interval(0.95, len(data)-1, loc=mean, scale=se)
print("95% Confidence Interval:", ci)
Hypothesis Testing
15. Question: What is the null hypothesis in a one-sample t-
test?
Answer:
The null hypothesis states that the sample mean is equal to the
population mean.
16. Question: How do you calculate a p-value in Python?
Answer:
t_stat, p_value = stats.ttest_1samp(data, popmean=3.0)
print("P-value:", p_value)
17. Question: How do you interpret a p-value < 0.05?
Answer:
Reject the null hypothesis; the result is statistically significant.
18. Question: Calculate Cohen’s d for effect size in Python.
Answer:
group1 = [2.1, 3.5, 2.8, 3.6]
group2 = [3.3, 3.9, 4.0, 3.7]
mean_diff = np.mean(group1) - np.mean(group2)
pooled_sd = np.sqrt((np.std(group1, ddof=1)**2 + np.std(group2, ddof=1)**2) / 2)
cohen_d = mean_diff / pooled_sd
print("Cohen's d:", cohen_d)
Correlation and Regression
19. Question: How do you calculate Pearson’s correlation
coefficient?
Answer:
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
corr, p_value = stats.pearsonr(x, y)
print("Correlation:", corr, "P-value:", p_value)
20. Question: Perform a simple linear regression in Python.
Answer:
from sklearn.linear_model import LinearRegression
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 6, 8, 10])
model = LinearRegression().fit(x, y)
print("Slope:", model.coef_[0], "Intercept:", model.intercept_)
Confidence Intervals and Effect Sizes
21. Question: Compute a 99% confidence interval for a
dataset.
Answer:
mean = np.mean(data)
se = stats.sem(data)
ci = stats.t.interval(0.99, len(data)-1, loc=mean, scale=se)
print("99% Confidence Interval:", ci)
22. Question: Calculate the effect size (Cohen's d) for a one-sample test.
Answer:
mean_diff = np.mean(data) - 3.0
std_dev = np.std(data, ddof=1)
cohen_d = mean_diff / std_dev
print("Cohen's d:", cohen_d)
DATA VISUALIZATION QUESTIONS:-
Basic Level Questions
1. How do you import Matplotlib for data visualization?
import matplotlib.pyplot as plt
2. How do you create a basic line plot in Matplotlib?
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.plot(x, y)
plt.show()
3. How can you add a title to a Matplotlib plot?
plt.title("My Plot Title")
4. How do you label axes in Matplotlib?
plt.xlabel("X-axis Label")
plt.ylabel("Y-axis Label")
5. How do you create a scatter plot in Matplotlib?
x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11]
y = [99, 86, 87, 88, 100, 86, 103, 87, 94, 78]
plt.scatter(x, y)
plt.show()
6. How do you change the color of a line in Matplotlib?
plt.plot(x, y, color='red')
7. How do you create a bar chart in Matplotlib?
categories = ['A', 'B', 'C']
values = [3, 7, 5]
plt.bar(categories, values)
plt.show()
8. How do you create a histogram in Matplotlib?
data = [7, 8, 5, 7, 6, 9, 8, 6, 5]
plt.hist(data, bins=5)
plt.show()
9. How do you save a plot to a file in Matplotlib?
plt.savefig("plot.png")
10. How do you create subplots in Matplotlib?
fig, ax = plt.subplots(1, 2)
ax[0].plot(x, y)
ax[1].bar(categories, values)
plt.show()
Intermediate Level Questions
11. How do you adjust the figure size in Matplotlib?
plt.figure(figsize=(10, 5))
12. How do you add a grid to a Matplotlib plot?
plt.grid(True)
13. How do you create a pie chart in Matplotlib?
sizes = [20, 30, 50]
labels = ['Category A', 'Category B', 'Category C']
plt.pie(sizes, labels=labels, autopct='%1.1f%%')
plt.show()
14. How do you create a box plot in Matplotlib?
data = [7, 8, 5, 6, 9, 8, 6, 5]
plt.boxplot(data)
plt.show()
15. How do you change the style of a Matplotlib plot?
plt.style.use('seaborn-darkgrid')  # on Matplotlib >= 3.6 the style is named 'seaborn-v0_8-darkgrid'
16. How do you add a legend to a Matplotlib plot?
plt.plot(x, y, label="Line 1")
plt.legend()
17. How do you create a heatmap in Seaborn?
import seaborn as sns
import numpy as np
data = np.random.rand(10, 10)
sns.heatmap(data, annot=True)
plt.show()
18. How do you load a dataset in Seaborn?
import seaborn as sns
data = sns.load_dataset('iris')
print(data.head())
19. How do you create a pairplot in Seaborn?
sns.pairplot(data, hue='species')
plt.show()
20. How do you create a regression plot in Seaborn?
sns.regplot(x='sepal_length', y='sepal_width', data=data)
plt.show()
Plot Customization Questions
21. How do you adjust the ticks in Matplotlib?
plt.xticks(rotation=45)
plt.yticks(fontsize=10)
22. How do you create a horizontal bar chart in Matplotlib?
plt.barh(categories, values)
plt.show()
23. How do you set a logarithmic scale for an axis in Matplotlib?
plt.xscale('log')
24. How do you add annotations to a Matplotlib plot?
plt.annotate('Peak', xy=(2, 25), xytext=(3, 30),
             arrowprops=dict(facecolor='black', shrink=0.05))
25. How do you create a violin plot in Seaborn?
sns.violinplot(x='species', y='sepal_length', data=data)
plt.show()
26. How do you create a strip plot in Seaborn?
sns.stripplot(x='species', y='sepal_length', data=data)
plt.show()
27. How do you create a jointplot in Seaborn?
sns.jointplot(x='sepal_length', y='sepal_width', data=data, kind='hex')
plt.show()
28. How do you create a swarm plot in Seaborn?
sns.swarmplot(x='species', y='petal_length', data=data)
plt.show()
29. How do you overlay multiple plots in Matplotlib?
plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.legend()
plt.show()
30. How do you set axis limits in Matplotlib?
plt.xlim(0, 10)
plt.ylim(0, 100)
Plotly Questions
31. How do you install Plotly?
pip install plotly
32. How do you create a basic scatter plot in Plotly?
import plotly.express as px
fig = px.scatter(x=[1, 2, 3], y=[4, 5, 6])
fig.show()
33. How do you create an interactive line chart in Plotly?
fig = px.line(x=[1, 2, 3], y=[3, 1, 6])
fig.show()
34. How do you create a bar chart in Plotly?
fig = px.bar(x=['A', 'B', 'C'], y=[4, 7, 3])
fig.show()
35. How do you create a histogram in Plotly?
fig = px.histogram(x=[1, 2, 2, 3, 3, 3])
fig.show()
Advanced Customization
36. How do you combine multiple plots in Plotly?
from plotly.subplots import make_subplots
import plotly.graph_objects as go
fig = make_subplots(rows=1, cols=2)
fig.add_trace(go.Scatter(x=[1, 2, 3], y=[4, 5, 6]), row=1, col=1)
fig.add_trace(go.Bar(x=['A', 'B'], y=[3, 5]), row=1, col=2)
fig.show()
37. How do you customize axis labels in Plotly?
fig.update_layout(xaxis_title="X Axis", yaxis_title="Y Axis")
38. How do you create a 3D scatter plot in Plotly?
fig = px.scatter_3d(x=[1, 2, 3], y=[4, 5, 6], z=[7, 8, 9])
fig.show()
39. How do you add tooltips in Plotly?
fig = px.scatter(x=[1, 2, 3], y=[4, 5, 6], hover_name=['Point 1', 'Point 2', 'Point 3'])
fig.show()
40. How do you create a bubble chart in Plotly?
fig = px.scatter(x=[1, 2, 3], y=[4, 5, 6], size=[10, 20, 30])
fig.show()
Challenge Questions
41. How do you create a time series plot in Matplotlib?
42. How do you use a colormap in Matplotlib scatter plots?
43. How do you visualize missing data with Seaborn?
44. How do you customize figure margins in Plotly?
45. How do you create a sunburst chart in Plotly?
46. How do you create a density plot in Seaborn?
47. How do you add multiple legends in Matplotlib?
48. How do you use categorical data in Seaborn?
49. How do you create annotations in Plotly?
50. How do you create a waterfall chart in Plotly?
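Questions 41-50 are listed without worked answers; as one illustration for question 41, a minimal time series plot sketch (the dates and values below are made-up sample data):
import matplotlib.pyplot as plt
import pandas as pd
dates = pd.date_range('2024-01-01', periods=30, freq='D')  # illustrative daily dates
values = list(range(30))                                   # illustrative measurements
plt.plot(dates, values)
plt.xticks(rotation=45)
plt.title('Time Series Example')
plt.show()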
EDA QUESTIONS:-
Basic EDA Questions
1. How do you import essential libraries for EDA in Python?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Answer: This imports libraries for data handling (pandas), numerical
computations (numpy), and visualization (matplotlib and seaborn).
2. How do you read a CSV file into a DataFrame?
df = pd.read_csv('file_name.csv')
Answer: pd.read_csv loads the file into a pandas DataFrame.
3. How do you display the first five rows of a DataFrame?
print(df.head())
Answer: df.head() returns the first five rows.
4. How do you display the last five rows of a DataFrame?
print(df.tail())
Answer: df.tail() returns the last five rows.
5. How do you check the shape (rows and columns) of a DataFrame?
print(df.shape)
Answer: Returns a tuple with the number of rows and columns.
6. How do you get a summary of the DataFrame?
print(df.info())
Answer: df.info() provides details about the DataFrame, including
column names, data types, and non-null counts.
7. How do you describe the numerical columns of a DataFrame?
print(df.describe())
Answer: df.describe() provides statistics like mean, median, and
standard deviation for numerical columns.
8. How do you list all column names in a DataFrame?
print(df.columns)
Answer: Returns an index object with column names.
9. How do you check for missing values in a DataFrame?
print(df.isnull().sum())
Answer: Shows the total number of missing values in each column.
10. How do you fill missing values with the mean of a column?
df['column_name'].fillna(df['column_name'].mean(), inplace=True)
---
Intermediate Data Manipulation
11. How do you drop rows with missing values?
df.dropna(inplace=True)
12. How do you drop columns with missing values?
df.dropna(axis=1, inplace=True)
13. How do you rename a column in a DataFrame?
df.rename(columns={'old_name': 'new_name'}, inplace=True)
14. How do you filter rows based on a condition?
filtered_df = df[df['column_name'] > 50]
15. How do you sort a DataFrame by a specific column?
sorted_df = df.sort_values(by='column_name', ascending=True)
16. How do you create a new column based on existing columns?
df['new_column'] = df['column1'] + df['column2']
17. How do you group data by a column and calculate an aggregate?
grouped_df = df.groupby('group_column')['target_column'].mean()
18. How do you reset the index of a DataFrame?
df.reset_index(drop=True, inplace=True)
19. How do you find duplicate rows in a DataFrame?
duplicates = df[df.duplicated()]
20. How do you drop duplicate rows?
df.drop_duplicates(inplace=True)
---
Visualization Questions
21. How do you plot a histogram of a column?
df['column_name'].hist(bins=30)
plt.show()
22. How do you create a boxplot for a column?
sns.boxplot(x=df['column_name'])
plt.show()
23. How do you create a scatter plot for two columns?
plt.scatter(df['column1'], df['column2'])
plt.show()
24. How do you create a heatmap of correlations?
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
plt.show()
25. How do you plot a line graph of a column?
df['column_name'].plot()
plt.show()
26. How do you create a bar plot for categorical data?
df['column_name'].value_counts().plot(kind='bar')
plt.show()
27. How do you create a pairplot for the DataFrame?
sns.pairplot(df)
plt.show()
28. How do you visualize missing data using a heatmap?
sns.heatmap(df.isnull(), cbar=False, cmap='viridis')
plt.show()
29. How do you plot a KDE (Kernel Density Estimation) plot?
sns.kdeplot(df['column_name'])
plt.show()
30. How do you create a count plot for categorical columns?
sns.countplot(x='column_name', data=df)
plt.show()
---
Advanced EDA Operations
31. How do you calculate correlation between two columns?
correlation = df['column1'].corr(df['column2'])
32. How do you find the unique values in a column?
unique_values = df['column_name'].unique()
33. How do you count occurrences of unique values in a column?
value_counts = df['column_name'].value_counts()
34. How do you sample random rows from a DataFrame?
sample_df = df.sample(n=10)
35. How do you detect outliers using IQR?
Q1 = df['column_name'].quantile(0.25)
Q3 = df['column_name'].quantile(0.75)
IQR = Q3 - Q1
outliers = df[(df['column_name'] < Q1 - 1.5 * IQR) |
(df['column_name'] > Q3 + 1.5 * IQR)]
36. How do you normalize a column?
df['normalized'] = (df['column_name'] - df['column_name'].min()) / (df['column_name'].max() - df['column_name'].min())
37. How do you standardize a column?
df['standardized'] = (df['column_name'] - df['column_name'].mean()) / df['column_name'].std()
38. How do you merge two DataFrames?
merged_df = pd.merge(df1, df2, on='common_column')
39. How do you concatenate two DataFrames?
concatenated_df = pd.concat([df1, df2])
40. How do you pivot a DataFrame?
pivot_df = df.pivot(index='index_column', columns='columns_name',
values='values_column')
---
Case Study-Like Questions
41. How do you detect skewness in a column?
skewness = df['column_name'].skew()
42. How do you calculate kurtosis for a column?
kurtosis = df['column_name'].kurt()
43. How do you split data into train and test sets?
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=0.2, random_state=42)
44. How do you encode categorical variables?
df['encoded_column'] = pd.factorize(df['column_name'])[0]
45. How do you convert a column to datetime?
df['date_column'] = pd.to_datetime(df['date_column'])
46. How do you extract the year from a datetime column?
df['year'] = df['date_column'].dt.year
47. How do you handle highly correlated features?
corr_matrix = df.corr().abs()
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape),
k=1).astype(bool))
to_drop = [column for column in upper.columns if
any(upper[column] > 0.9)]
df.drop(to_drop, axis=1, inplace=True)
48. How do you calculate z-scores for outlier detection?
from scipy.stats import zscore
df['z_score'] = zscore(df['column_name'])
49. How do you perform one-hot encoding on a column?
df = pd.get_dummies(df, columns=['column_name'])
50. How do you save a DataFrame to a CSV file?
df.to_csv('output.csv', index=False)
MySQL (BASIC TO ADVANCED) QUESTIONS:-
Basic MySQL Coding Questions
1. How to create a database in MySQL?
CREATE DATABASE my_database;
2. How to show all databases?
SHOW DATABASES;
3. How to create a table?
CREATE TABLE employees (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(50),
age INT,
department VARCHAR(50)
);
4. How to insert data into a table?
INSERT INTO employees (name, age, department)
VALUES ('John Doe', 30, 'Sales');
5. How to retrieve all data from a table?
SELECT * FROM employees;
6. How to retrieve specific columns from a table?
SELECT name, age FROM employees;
7. How to update data in a table?
UPDATE employees
SET age = 31
WHERE name = 'John Doe';
8. How to delete data from a table?
DELETE FROM employees
WHERE name = 'John Doe';
9. How to delete a table?
DROP TABLE employees;
10. How to add a new column to a table?
ALTER TABLE employees
ADD COLUMN salary DECIMAL(10, 2);
---
Intermediate MySQL Coding Questions
11. How to find the number of rows in a table?
SELECT COUNT(*) FROM employees;
12. How to filter data using conditions?
SELECT * FROM employees
WHERE age > 25 AND department = 'Sales';
13. How to sort data in ascending or descending order?
SELECT * FROM employees
ORDER BY age ASC; -- or DESC
14. How to use LIKE for pattern matching?
SELECT * FROM employees
WHERE name LIKE 'J%';
15. How to find unique values in a column?
SELECT DISTINCT department FROM employees;
16. How to calculate the sum of a column?
SELECT SUM(salary) FROM employees;
17. How to calculate the average of a column?
SELECT AVG(age) FROM employees;
18. How to find the maximum and minimum values in a column?
SELECT MAX(age) AS max_age, MIN(age) AS min_age FROM
employees;
19. How to group data by a column?
SELECT department, COUNT(*) AS total_employees
FROM employees
GROUP BY department;
20. How to filter grouped data using HAVING?
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING avg_salary > 50000;
---
Joins in MySQL
21. How to use an inner join?
SELECT e.name, d.department_name
FROM employees e
INNER JOIN departments d ON e.department = d.id;
22. How to use a left join?
SELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department = d.id;
23. How to use a right join?
SELECT e.name, d.department_name
FROM employees e
RIGHT JOIN departments d ON e.department = d.id;
24. How to use a cross join?
SELECT e.name, d.department_name
FROM employees e
CROSS JOIN departments d;
25. How to join more than two tables?
SELECT e.name, d.department_name, m.manager_name
FROM employees e
INNER JOIN departments d ON e.department = d.id
INNER JOIN managers m ON d.manager_id = m.id;
---
Advanced MySQL Coding Questions
26. How to create a stored procedure?
DELIMITER //
CREATE PROCEDURE GetEmployeeCount()
BEGIN
SELECT COUNT(*) FROM employees;
END //
DELIMITER ;
27. How to call a stored procedure?
CALL GetEmployeeCount();
28. How to create a trigger?
CREATE TRIGGER before_insert_employee
BEFORE INSERT ON employees
FOR EACH ROW
SET NEW.created_at = NOW();
29. How to use a subquery in the WHERE clause?
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
30. How to use a subquery in the FROM clause?
SELECT department, avg_salary
FROM (SELECT department, AVG(salary) AS avg_salary FROM
employees GROUP BY department) AS sub;
31. How to use a correlated subquery?
SELECT name, salary
FROM employees e1
WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE
e1.department = e2.department);
32. How to create an index?
CREATE INDEX idx_department ON employees(department);
33. How to find the second highest salary?
SELECT MAX(salary) FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
34. How to delete duplicate rows from a table?
DELETE FROM employees
WHERE id NOT IN (
SELECT MIN(id)
FROM employees
GROUP BY name, department
);
35. How to handle NULL values in queries?
SELECT name, IFNULL(salary, 0) AS salary FROM employees;
---
Performance and Optimization
36. How to check query execution time?
SET PROFILING = 1;
SELECT * FROM employees;
SHOW PROFILES;
37. How to optimize a slow query?
EXPLAIN SELECT * FROM employees WHERE department = 'Sales';
38. How to create a view?
CREATE VIEW employee_view AS
SELECT name, department FROM employees WHERE age > 30;
39. How to use a view?
SELECT * FROM employee_view;
40. How to drop a view?
DROP VIEW employee_view;
---
Data Integrity and Constraints
41. How to add a primary key to a table?
ALTER TABLE employees ADD PRIMARY KEY (id);
42. How to add a foreign key constraint?
ALTER TABLE employees
ADD CONSTRAINT fk_department
FOREIGN KEY (department) REFERENCES departments(id);
43. How to add a unique constraint?
ALTER TABLE employees
ADD CONSTRAINT unique_name UNIQUE (name);
44. How to drop a foreign key constraint?
ALTER TABLE employees DROP FOREIGN KEY fk_department;
45. How to enforce NOT NULL on a column?
ALTER TABLE employees MODIFY COLUMN name VARCHAR(50) NOT
NULL;
---
Miscellaneous
46. How to export a table to a file?
SELECT * FROM employees INTO OUTFILE '/tmp/employees.csv'
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
47. How to import data from a file?
LOAD DATA INFILE '/tmp/employees.csv'
INTO TABLE employees
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
48. How to perform a full-text search?
SELECT * FROM employees WHERE MATCH(name, department)
AGAINST('Sales');
49. How to use the CASE statement?
SELECT name,
CASE
WHEN age < 30 THEN 'Young'
WHEN age BETWEEN 30 AND 50 THEN 'Middle-aged'
ELSE 'Senior'
END AS age_group
FROM employees;
50. How to back up a database?
mysqldump -u root -p my_database > backup.sql
DATABASE DESIGN QUESTIONS:-
Here are 50 important and commonly asked coding questions
related to Database Design, ranging from basic to intermediate
levels, along with their answers:
Basic Questions
1. Create a Database in SQL.
CREATE DATABASE School;
2. Create a Table in SQL.
CREATE TABLE Students (
StudentID INT PRIMARY KEY,
Name VARCHAR(100),
Age INT,
Grade CHAR(1)
);
3. Insert Data into a Table.
INSERT INTO Students (StudentID, Name, Age, Grade)
VALUES (1, 'John Doe', 15, 'A');
4. Retrieve All Data from a Table.
SELECT * FROM Students;
5. Retrieve Specific Columns.
SELECT Name, Grade FROM Students;
6. Update Data in a Table.
UPDATE Students
SET Age = 16
WHERE StudentID = 1;
7. Delete Data from a Table.
DELETE FROM Students
WHERE StudentID = 1;
8. Add a New Column to an Existing Table.
ALTER TABLE Students
ADD Address VARCHAR(255);
9. Delete a Column from a Table.
ALTER TABLE Students
DROP COLUMN Address;
10. Create a Table with a Foreign Key.
CREATE TABLE Courses (
CourseID INT PRIMARY KEY,
CourseName VARCHAR(100),
StudentID INT,
FOREIGN KEY (StudentID) REFERENCES Students(StudentID)
);
Intermediate Questions
11. Write a Query to Retrieve Data Using a Join.
SELECT Students.Name, Courses.CourseName
FROM Students
INNER JOIN Courses
ON Students.StudentID = Courses.StudentID;
12. Write a Query to Count the Number of Rows in a Table.
SELECT COUNT(*) AS TotalStudents
FROM Students;
13. Find Duplicate Records in a Table.
SELECT Name, COUNT(*)
FROM Students
GROUP BY Name
HAVING COUNT(*) > 1;
14. Write a Query Using GROUP BY.
SELECT Grade, COUNT(*) AS StudentCount
FROM Students
GROUP BY Grade;
15. Write a Query Using HAVING.
SELECT Grade, COUNT(*) AS StudentCount
FROM Students
GROUP BY Grade
HAVING COUNT(*) > 2;
16. Find Students with a Specific Pattern in Their Name.
SELECT *
FROM Students
WHERE Name LIKE 'J%';
17. Find Students Older than a Certain Age.
SELECT *
FROM Students
WHERE Age > 15;
18. Delete Duplicate Rows from a Table.
DELETE FROM Students
WHERE StudentID NOT IN (
SELECT MIN(StudentID)
FROM Students
GROUP BY Name, Age, Grade
);
19. Write a Query to Create an Index.
CREATE INDEX idx_name
ON Students (Name);
20. Write a Query to Drop an Index.
DROP INDEX idx_name ON Students;
Design and Normalization
21. Explain First Normal Form (1NF) with an Example.
A table is in 1NF if all columns contain atomic values.
Example: A table where a column stores multiple phone numbers
violates 1NF.
22. Explain Second Normal Form (2NF) with an Example.
A table is in 2NF if it is in 1NF and has no partial dependencies.
Example: Remove attributes dependent only on part of a composite
key.
23. Explain Third Normal Form (3NF) with an Example.
A table is in 3NF if it is in 2NF and has no transitive dependencies.
Example: Move non-primary-key attributes dependent on other non-
primary-key attributes to a separate table.
24. Design a Table for Many-to-Many Relationships.
CREATE TABLE StudentCourses (
StudentID INT,
CourseID INT,
PRIMARY KEY (StudentID, CourseID),
FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);
25. Normalize a Table with Repeating Groups.
Convert a table with repeating columns:
| OrderID | Product1 | Product2 |
into one with a single row per product:
| OrderID | Product |
Queries for Real-World Problems
26. Write a Query to Find the Second Highest Age.
SELECT MAX(Age)
FROM Students
WHERE Age < (SELECT MAX(Age) FROM Students);
27. Write a Query to Find Students Without Courses.
SELECT *
FROM Students
WHERE StudentID NOT IN (
SELECT DISTINCT StudentID
FROM Courses
);
28. Write a Query to Find the Number of Students in Each
Course.
SELECT CourseName, COUNT(StudentID) AS StudentCount
FROM Courses
GROUP BY CourseName;
29. Write a Query to Retrieve the Top 3 Oldest Students.
SELECT *
FROM Students
ORDER BY Age DESC
LIMIT 3;
30. Write a Query to Find Students Enrolled in Multiple
Courses.
SELECT StudentID, COUNT(CourseID) AS CourseCount
FROM Courses
GROUP BY StudentID
HAVING COUNT(CourseID) > 1;
Database Administration
31. Grant Privileges to a User.
GRANT SELECT, INSERT ON Students TO 'user1'@'localhost';
32. Revoke Privileges from a User.
REVOKE INSERT ON Students FROM 'user1'@'localhost';
33. Create a View.
CREATE VIEW StudentGrades AS
SELECT Name, Grade
FROM Students;
34. Drop a View.
DROP VIEW StudentGrades;
35. Backup a Database.
mysqldump -u username -p School > backup.sql
Intermediate Query Problems
36. Write a Query Using a Subquery.
SELECT *
FROM Students
WHERE Age = (SELECT MAX(Age) FROM Students);
37. Write a Query to Find Overlapping Courses.
SELECT StudentID, COUNT(DISTINCT CourseID)
FROM StudentCourses
GROUP BY StudentID
HAVING COUNT(DISTINCT CourseID) > 1;
38. Write a Query Using EXISTS.
SELECT *
FROM Students
WHERE EXISTS (
SELECT * FROM Courses WHERE Students.StudentID =
Courses.StudentID
);
39. Write a Query to Create a Trigger.
CREATE TRIGGER UpdateTimestamp
BEFORE UPDATE ON Students
FOR EACH ROW
SET NEW.updated_at = NOW();
40. Write a Query to Drop a Trigger.
DROP TRIGGER UpdateTimestamp;
Miscellaneous Questions
41. Find Students with No Grades.
SELECT *
FROM Students
WHERE Grade IS NULL;
42. Write a Query to Remove Duplicate Rows.
DELETE FROM Students
WHERE StudentID NOT IN (
SELECT MIN(StudentID)
FROM Students
GROUP BY Name, Age, Grade
);
43. List All Table Names in a Database.
SHOW TABLES;
44. Find the Size of a Database.
SELECT table_schema AS DatabaseName,
SUM(data_length + index_length) / 1024 / 1024 AS SizeInMB
FROM information_schema.tables
GROUP BY table_schema;
45. Write a Query to Add a Composite Key.
ALTER TABLE StudentCourses
ADD PRIMARY KEY (StudentID, CourseID);
46. Find Tables Without Primary Keys.
SELECT TABLE_NAME
FROM information_schema.tables
WHERE TABLE_SCHEMA = 'School'
AND TABLE_NAME NOT IN (
SELECT TABLE_NAME
FROM information_schema.key_column_usage
WHERE TABLE_SCHEMA = 'School'
AND CONSTRAINT_NAME = 'PRIMARY'
);
47. Write a Query to Create a Recursive Relationship.
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
Name VARCHAR(100),
ManagerID INT,
FOREIGN KEY (ManagerID) REFERENCES Employees(EmployeeID)
);
48. Delete All Rows Without Dropping the Table.
DELETE FROM Students;
49. Find Unused Indexes (SQL Server).
SELECT *
FROM sys.dm_db_index_usage_stats
WHERE user_seeks = 0 AND user_scans = 0 AND user_lookups = 0;
50. Write a Query to Optimize a Table.
OPTIMIZE TABLE Students;
PYTHON PROGRAMMING QUESTIONS:-
Basic Python Questions
1. Write a Python program to check if a number is even or odd.
num = int(input("Enter a number: "))
if num % 2 == 0:
    print("Even")
else:
    print("Odd")
2. Write a Python program to swap two variables.
a, b = 5, 10
a, b = b, a
print(a, b)  # Output: 10 5
3. Write a Python program to find the factorial of a number.
def factorial(n):
    return 1 if n == 0 else n * factorial(n - 1)
print(factorial(5))  # Output: 120
4. Write a Python program to generate the Fibonacci series up to
n terms.
def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        print(a, end=" ")
        a, b = b, a + b
fibonacci(10)  # Output: 0 1 1 2 3 5 8 13 21 34
5. Write a Python program to reverse a string.
s = "hello"
print(s[::-1])  # Output: olleh
6. Write a Python program to check if a string is a palindrome.
s = "madam"
print(s == s[::-1])  # Output: True
7. Write a Python program to count vowels in a string.
s = "hello world"
vowels = "aeiou"
count = sum(1 for char in s if char in vowels)
print(count)  # Output: 3
8. Write a Python program to calculate the sum of digits of a
number.
num = 1234
print(sum(int(digit) for digit in str(num)))  # Output: 10
9. Write a Python program to check if a number is prime.
def is_prime(num):
    if num < 2:
        return False
    for i in range(2, int(num**0.5) + 1):
        if num % i == 0:
            return False
    return True
print(is_prime(7))  # Output: True
10. Write a Python program to find the largest number in a
list.
nums = [1, 5, 2, 8, 3]
print(max(nums))  # Output: 8
Intermediate Python Questions
11. Write a Python program to remove duplicates from a list.
nums = [1, 2, 2, 3, 4, 4, 5]
print(list(set(nums)))  # Output: [1, 2, 3, 4, 5] (set order is not guaranteed)
12. Write a Python program to find the second largest
number in a list.
nums = [1, 5, 2, 8, 3]
nums.sort()
print(nums[-2])  # Output: 5
13. Write a Python program to check if two strings are
anagrams.
s1, s2 = "listen", "silent"
print(sorted(s1) == sorted(s2))  # Output: True
14. Write a Python program to calculate the GCD of two
numbers.
import math
print(math.gcd(12, 18))  # Output: 6
15. Write a Python program to find the missing number in a
list.
nums = [1, 2, 4, 5]
print(set(range(1, 6)) - set(nums))  # Output: {3}
16. Write a Python program to count the frequency of
elements in a list.
from collections import Counter
nums = [1, 2, 2, 3, 3, 3]
print(Counter(nums))  # Output: Counter({3: 3, 2: 2, 1: 1})
17. Write a Python program to merge two dictionaries.
d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
d1.update(d2)
print(d1)  # Output: {'a': 1, 'b': 3, 'c': 4}
18. Write a Python program to check if a number is an
Armstrong number.
num = 153
print(num == sum(int(digit)**3 for digit in str(num)))  # Output: True
19. Write a Python program to find the intersection of two
lists.
l1 = [1, 2, 3]
l2 = [2, 3, 4]
print(list(set(l1) & set(l2)))  # Output: [2, 3] (set order is not guaranteed)
20. Write a Python program to find the longest word in a
string.
s = "Python programming is fun"
print(max(s.split(), key=len))  # Output: programming
Algorithmic Questions
21. Write a Python program to implement binary search.
def binary_search(nums, target):
    left, right = 0, len(nums) - 1
    while left <= right:
        mid = (left + right) // 2
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1
print(binary_search([1, 2, 3, 4, 5], 3))  # Output: 2
22. Write a Python program to implement bubble sort.
def bubble_sort(nums):
    n = len(nums)
    for i in range(n):
        for j in range(0, n - i - 1):
            if nums[j] > nums[j + 1]:
                nums[j], nums[j + 1] = nums[j + 1], nums[j]
nums = [64, 34, 25, 12, 22]
bubble_sort(nums)
print(nums)  # Output: [12, 22, 25, 34, 64]
23. Write a Python program to calculate the power of a
number without using **.
def power(base, exp):
    result = 1
    for _ in range(exp):
        result *= base
    return result
print(power(2, 3))  # Output: 8
24. Write a Python program to find all prime numbers up to a
given number.
def primes_upto(n):
    primes = []
    for i in range(2, n + 1):
        if all(i % p != 0 for p in primes):
            primes.append(i)
    return primes
print(primes_upto(10))  # Output: [2, 3, 5, 7]
25. Write a Python program to generate all permutations of a
list.
from itertools import permutations
print(list(permutations([1, 2, 3])))
# Output: [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]