Top 100 Data Analyst Interview Questions

1. What is the difference between INNER JOIN and LEFT JOIN in SQL?
INNER JOIN returns only matching rows; LEFT JOIN returns all rows from the left table and matches from the right,
with NULLs if no match.
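
The difference can be seen on a small example. This is a minimal sketch using Python's built-in sqlite3 module; the customers and orders tables and their contents are hypothetical.

```python
import sqlite3

# Hypothetical tables: customers is the left table, orders is the right table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 50.0), (1, 20.0), (3, 75.0);
""")

# INNER JOIN: only customers that have at least one order.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.amount
""").fetchall()

# LEFT JOIN: every customer; amount is NULL (None) when there is no order.
left_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.amount
""").fetchall()

print(inner_rows)
print(left_rows)
```

Ben has no orders, so he is missing from the INNER JOIN result and appears with a NULL amount in the LEFT JOIN result.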

2. How would you find duplicate records in a SQL table?

SELECT column, COUNT(*)
FROM table
GROUP BY column
HAVING COUNT(*) > 1;
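
To try the pattern end to end, here is a small sketch with Python's built-in sqlite3 module. The signups table and the email values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE signups (email TEXT);
    INSERT INTO signups VALUES
        ('a@x.com'), ('b@x.com'), ('a@x.com'), ('c@x.com'), ('a@x.com');
""")

# Group by the column of interest and keep only groups with more than one row.
duplicates = conn.execute("""
    SELECT email, COUNT(*) AS n
    FROM signups
    GROUP BY email
    HAVING COUNT(*) > 1
""").fetchall()

print(duplicates)
```

To treat a combination of columns as the duplicate key, list all of them in both SELECT and GROUP BY.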

3. Explain normalization in SQL and why it’s important.

Normalization organizes data to reduce redundancy. Common forms:

● 1NF: Atomic columns

● 2NF: Remove partial dependencies

● 3NF: Remove transitive dependencies

It improves integrity and efficiency.

4. What’s the difference between WHERE and HAVING in SQL?

● WHERE: filters rows before grouping

● HAVING: filters groups after aggregation

5. Write a query to get the second highest salary from a table.

SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
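
A quick way to check the behaviour with tied top salaries is to run the query in SQLite from Python. The employee rows below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ana', 100), ('Ben', 200), ('Cy', 300), ('Dee', 300);
""")

# The subquery finds the top salary; the outer MAX picks the largest value below it.
second_highest = conn.execute("""
    SELECT MAX(salary)
    FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
""").fetchone()[0]

print(second_highest)
```

Because the comparison is strict, two employees sharing the top salary do not affect the answer. If every row has the same salary, the query returns NULL.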

6. What is a window function in SQL?

A window function performs a calculation across a set of rows related to the current row without collapsing
them into a single row (e.g., ROW_NUMBER(), RANK(), LAG(), SUM() OVER ()).

7. What is a CTE in SQL and when do you use it?
Common Table Expression (WITH clause) improves readability and simplifies nested queries.

8. How do you use VLOOKUP in Excel?

VLOOKUP is a function in Excel that searches for a value in the first column of a range and returns a value in
the same row from a specified column.
=VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
Used to find corresponding data from another column.

9. What is a Pivot Table used for in Excel?

Summarizes large datasets dynamically by dragging fields into rows, columns, values, and filters.

10. How do you remove duplicates from a column in Excel?

Select column → Data tab → Remove Duplicates

11. What is the difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column) in SQL?

● COUNT(*): total rows

● COUNT(column): non-NULL values

● COUNT(DISTINCT column): unique non-NULL values
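
The three variants can be compared on a column that contains a NULL and a repeated value. This sketch uses sqlite3 with a hypothetical visits table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (city TEXT);
    INSERT INTO visits VALUES ('Pune'), ('Pune'), (NULL), ('Delhi');
""")

# One row in the result, with the three counts side by side.
total_rows, non_null, distinct_non_null = conn.execute(
    "SELECT COUNT(*), COUNT(city), COUNT(DISTINCT city) FROM visits"
).fetchone()

print(total_rows, non_null, distinct_non_null)
```

The NULL row is counted only by COUNT(*), and the repeated city is collapsed only by COUNT(DISTINCT city).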

12. How do you calculate the median in SQL?

Use PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY column) in databases that support it.

13. In pandas, how do you drop rows with missing values?

df.dropna()

14. What is the use of .groupby() in pandas?

Groups data based on one or more columns to apply aggregation like sum, count, mean, etc.

15. How do you merge two datasets in pandas?
pd.merge(df1, df2, on='key', how='inner')

16. What is the difference between supervised and unsupervised learning?

● Supervised: labeled data (e.g., regression, classification)

● Unsupervised: no labels (e.g., clustering, PCA)

17. What is linear regression used for?

Predicts a continuous value by modeling the relationship between dependent and independent variables.

18. What is overfitting in machine learning?

When a model learns the training data too well, including noise, leading to poor generalization on unseen
data.

19. How would you handle missing values in a dataset?

Options:

● Drop rows (dropna())

● Fill with mean/median/mode (fillna())

● Use prediction models or interpolation

20. How do you visualize distributions in Python?

Use seaborn.histplot() to see the shape of a distribution and seaborn.boxplot() to see spread and outliers.

21. What is the difference between correlation and covariance?

● Covariance: direction of relationship

● Correlation: direction + strength (scaled -1 to 1)
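
A short sketch in plain Python shows why correlation is easier to compare than covariance. The helper functions and the sample numbers are illustrative, and the sample (n - 1) form of covariance is assumed.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
ys_rescaled = [y * 100 for y in ys]  # same relationship, different units

print(covariance(xs, ys), covariance(xs, ys_rescaled))
print(correlation(xs, ys), correlation(xs, ys_rescaled))
```

Changing the units of one variable multiplies the covariance by the same factor, while the correlation stays the same.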

22. How would you detect outliers in a dataset?

● Boxplot (IQR method)

● Z-score method

● Visualization (scatter/histogram)
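
The IQR method can be sketched with the standard library alone. The data values are made up, the 1.5 multiplier is the usual convention rather than a fixed rule, and statistics.quantiles uses its default "exclusive" method here, so quartiles may differ slightly from other tools.

```python
import statistics

data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]

# Quartiles: quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, _median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR below Q1 and above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)
```

Only the value 102 falls outside the fences for this sample.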

23. What does the COUNTIFS function do in Excel?

Counts rows that meet multiple criteria.
=COUNTIFS(range1, criteria1, range2, criteria2, ...)

24. What is data wrangling?

Transforming and cleaning raw data into a usable format: handling NA, renaming, reshaping, type
conversion.

25. What is the purpose of exploratory data analysis (EDA)?

To understand patterns, detect anomalies, check assumptions, and build intuition before modeling.

26. What is a JOIN in SQL?

A JOIN combines rows from two or more tables based on a related column. Common types include:

● INNER JOIN

● LEFT JOIN

● RIGHT JOIN

● FULL OUTER JOIN

27. What is a subquery in SQL?

A subquery is a query within another query, used to return a result for the main query, often in WHERE or
FROM clauses.

28. How would you find the unique values in a SQL column?

SELECT DISTINCT column_name FROM table;

29. What are aggregate functions in SQL?
Functions that summarize data, such as:

● COUNT()

● SUM()

● AVG()

● MIN()

● MAX()

30. Explain what a Normal Distribution is in statistics.

A symmetrical, bell-shaped probability distribution where most observations cluster around the mean.

31. What is the purpose of GROUP BY in SQL?

Groups rows that share the same value into summary rows, typically so that aggregates such as SUM(), COUNT(), or AVG() can be computed per group.

32. How do you perform data cleaning in pandas?

Steps might include:

● Handling missing values (fillna(), dropna())

● Removing duplicates (drop_duplicates())

● Correcting data types (astype())

33. What are primary and foreign keys in SQL?

● Primary Key: Unique identifier for a record in a table

● Foreign Key: Links to a primary key in another table

34. How do you calculate the mode in SQL?

SELECT column, COUNT(*) AS freq
FROM table
GROUP BY column
ORDER BY freq DESC
LIMIT 1;

35. What is a UNION in SQL?

Combines results from two or more queries and eliminates duplicates.

SELECT column FROM table1
UNION
SELECT column FROM table2;

36. How do you use RANK() in SQL?

The RANK() function assigns a rank to each row within a partition of the result set. Tied rows receive the same rank, and the ranks that follow are skipped.

SELECT column1, RANK() OVER (ORDER BY column2 DESC) AS rnk
FROM table;

37. Explain the difference between a clustered and non-clustered index in SQL.

● Clustered Index: Determines the physical order of data in the table. Only one per table.

● Non-Clustered Index: A separate structure pointing to the data, multiple allowed.

38. How do you calculate cumulative sum in SQL?

SELECT column1,
       SUM(column2) OVER (ORDER BY column1) AS cumulative_sum
FROM table;
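
A runnable version of the running-total query, assuming a SQLite build with window functions (3.25 or later). The daily_sales table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_sales (day INTEGER, amount INTEGER);
    INSERT INTO daily_sales VALUES (1, 10), (2, 20), (3, 5), (4, 15);
""")

# SUM(...) OVER (ORDER BY day) adds up all rows up to and including the current day.
running_totals = conn.execute("""
    SELECT day, SUM(amount) OVER (ORDER BY day) AS cumulative_sum
    FROM daily_sales
    ORDER BY day
""").fetchall()

print(running_totals)
```

If the ORDER BY column contains ties, the default window frame gives tied rows the same cumulative value, so a unique ordering column is the safer choice.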
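
39. What is the difference between a procedure and a function in SQL?

● Procedure: Executes a set of SQL statements and can modify data. It does not have to return a value and cannot be called inside a SELECT.

● Function: Returns a value (a scalar or, in some databases, a table) and can be used in queries.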

40. How would you handle an imbalanced dataset in machine learning?

Techniques include:

● Resampling (over-sampling or under-sampling)

● Synthetic data generation (SMOTE)

● Adjusting class weights in models

41. What is a decision tree algorithm in machine learning?

A tree-like structure used for classification and regression tasks. Splits data into subsets based on feature
values.

42. How do you handle categorical variables in machine learning?

● Label Encoding

● One-Hot Encoding

● Frequency Encoding

43. What is cross-validation in machine learning?

Cross-validation splits the dataset into multiple folds to train and test the model on different subsets,
giving a more reliable performance estimate and helping detect overfitting.

44. What is precision and recall in classification problems?

● Precision: Measures the accuracy of positive predictions.

● Recall: Measures the ability to find all positive instances.

45. Explain the difference between linear regression and logistic regression.

● Linear Regression: Predicts continuous values.

● Logistic Regression: Predicts probabilities of categorical outcomes (usually binary).

46. How do you detect multicollinearity in a dataset?

Using:

● Correlation matrix

● Variance Inflation Factor (VIF)

47. What is the use of .apply() in pandas?

It allows you to apply a function along a specific axis (rows/columns) of a DataFrame.

48. How would you handle missing values in Excel?

● Manual fill: Replace with mean, median, or custom values.

● Data tools: Use Find & Replace or Go To Special for handling blanks.

49. How do you visualize a correlation matrix in Python?

Using seaborn.heatmap()

sns.heatmap(df.corr(), annot=True)

50. What is a confusion matrix in machine learning?

A table used to evaluate classification algorithms, showing the actual vs predicted classifications:

● True Positive (TP)

● False Positive (FP)

● True Negative (TN)

● False Negative (FN)

51. What is the purpose of using a "WHERE" clause in SQL?

The WHERE clause is used to filter records that meet specific conditions. It is applied before any aggregation
or grouping.

52. What is normalization in the context of data cleaning?

Normalization adjusts the scale of numeric data to a standard range (e.g., 0 to 1), which is important for
many machine learning models.

53. What is a histogram and how do you interpret it?

A histogram shows the frequency distribution of a continuous variable. It helps identify patterns like
skewness, spread, or outliers.

54. How would you identify trends in a dataset?

By:

● Plotting time series data (using line charts)

● Using moving averages

● Checking correlations over time

55. How do you use "GROUP BY" with an aggregate function in SQL?
Use GROUP BY to aggregate data by one or more columns. Example:

SELECT department, AVG(salary)
FROM employees
GROUP BY department;

56. What is a pivot chart and when would you use it?
A Pivot Chart is a graphical representation of data summarized in a Pivot Table. It’s useful for visualizing
trends and patterns in grouped data.

57. How do you perform outlier detection in Excel?

● Use Conditional Formatting to highlight data points above or below a certain threshold.

● Use Z-Score or IQR method in formulas to find outliers.

58. What is the difference between a LEFT JOIN and an OUTER JOIN in SQL?

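● LEFT JOIN: Returns all records from the left table, matched with records from the right table, filling
unmatched rows with NULL. It is shorthand for LEFT OUTER JOIN.

● OUTER JOIN: A general term covering LEFT, RIGHT, and FULL OUTER joins, all of which keep unmatched
rows from at least one of the tables.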

59. Explain the difference between covariance and correlation.

● Covariance: Measures the degree to which two variables change together.

● Correlation: A standardized version of covariance, scaled between -1 and 1.

60. What is a "case statement" in SQL?

A CASE statement allows conditional logic within SQL queries to return different values based on conditions.
Example:

SELECT name,
CASE
WHEN age >= 18 THEN 'Adult'
ELSE 'Minor'
END AS status
FROM users;
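
The query above can be run as is in SQLite. This sketch loads a few hypothetical users to show which branch each row takes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT, age INTEGER);
    INSERT INTO users VALUES ('Ana', 17), ('Ben', 18), ('Cy', 40);
""")

# CASE is evaluated per row; the first WHEN that matches decides the value.
labelled = conn.execute("""
    SELECT name,
           CASE
               WHEN age >= 18 THEN 'Adult'
               ELSE 'Minor'
           END AS status
    FROM users
    ORDER BY name
""").fetchall()

print(labelled)
```

Without an ELSE branch, rows that match no WHEN condition would get NULL.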

61. What is the role of a "foreign key" in SQL?

A foreign key is a column (or set of columns) in one table that references the primary key of another table,
establishing a relationship between them and enforcing referential integrity.

62. How do you calculate the percentile in SQL?

In SQL, you can use PERCENTILE_CONT() for calculating percentiles.
Example:

SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY column) AS percentile_90
FROM table;
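
For intuition, a continuous percentile can be sketched in plain Python as linear interpolation between the two nearest sorted values. The function name below is hypothetical, and the interpolation rule is the commonly documented one; exact behaviour should be checked against the database in use.

```python
def percentile_cont(values, fraction):
    """Continuous percentile by linear interpolation (fraction between 0 and 1)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    # Fractional position in the sorted list, counting from 0.
    position = fraction * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])

values = [10, 20, 30, 40, 50]
median = percentile_cont(values, 0.5)   # middle value of the sorted list
p90 = percentile_cont(values, 0.9)      # lies between 40 and 50
print(median, p90)
```

With a fraction of 0.5 this gives the median, which is why the same function answers both the median and the percentile questions.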

63. Explain the concept of data wrangling.
Data wrangling is the process of cleaning, structuring, and transforming raw data into a usable format for
analysis, often involving missing data handling, type conversion, and reshaping.

64. What are some common aggregation functions used in SQL?

● SUM()

● AVG()

● COUNT()

● MAX()

● MIN()

65. What is the difference between "UNION" and "UNION ALL" in SQL?

● UNION: Combines results and removes duplicates.

● UNION ALL: Combines results and includes duplicates.
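
A small sqlite3 sketch makes the difference visible. The tables t1 and t2 are hypothetical and share one value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (v INTEGER);
    CREATE TABLE t2 (v INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (3), (4);
""")

# UNION removes the duplicate 3; UNION ALL keeps both copies.
union_rows = [r[0] for r in conn.execute(
    "SELECT v FROM t1 UNION SELECT v FROM t2 ORDER BY v")]
union_all_rows = [r[0] for r in conn.execute(
    "SELECT v FROM t1 UNION ALL SELECT v FROM t2 ORDER BY v")]

print(union_rows)
print(union_all_rows)
```

UNION ALL skips the de-duplication step, so it is usually the cheaper choice when duplicates are impossible or acceptable.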

66. What is the use of "HAVING" in SQL?

HAVING is used to filter data after it’s grouped, often with aggregate functions like SUM(), COUNT(), or
AVG().

67. How do you perform linear regression in Python?

Using the LinearRegression class from sklearn:

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)

68. How do you handle categorical data in machine learning?

● Label Encoding: Converts categories into numbers.

● One-Hot Encoding: Creates binary columns for each category.

69. What is feature scaling and why is it important?

Feature scaling standardizes the range of features (using methods like MinMax Scaling or Standardization).
It’s important for algorithms like K-means or SVM that rely on distance.

70. Explain the purpose of using "RANK()" and "DENSE_RANK()" in SQL.

● RANK(): Assigns a rank but leaves gaps in ranking when there are ties.

● DENSE_RANK(): Assigns a rank without gaps, even when there are ties.
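
The gap behaviour is easiest to see side by side. This sketch assumes a SQLite build with window functions (3.25 or later) and uses a hypothetical scores table with one tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (name TEXT, score INTEGER);
    INSERT INTO scores VALUES ('A', 90), ('B', 90), ('C', 80);
""")

# Both functions give tied rows the same rank; only RANK() skips the next value.
ranked = conn.execute("""
    SELECT name,
           score,
           RANK()       OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM scores
    ORDER BY score DESC, name
""").fetchall()

for row in ranked:
    print(row)
```

A and B tie for first place, so C is ranked 3 by RANK() and 2 by DENSE_RANK().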

71. What is the difference between supervised and unsupervised learning?

● Supervised Learning: Uses labeled data for training. Common algorithms: Decision Trees, Linear
Regression.

● Unsupervised Learning: No labeled data, used to identify patterns or groupings. Common
algorithms: K-means, PCA.

72. How do you calculate the Z-score of a dataset?

Z-score is calculated as:

Z = (X - mean) / standard deviation

It shows how far a data point is from the mean in terms of standard deviations.
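
A minimal sketch of the calculation with Python's statistics module. The data values are made up, and the population standard deviation (pstdev) is an assumption; use statistics.stdev instead when the data is a sample.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
std_dev = statistics.pstdev(data)  # population standard deviation (assumed)

# One Z-score per data point: distance from the mean in standard deviations.
z_scores = [(x - mean) / std_dev for x in data]

print(mean, std_dev)
print(z_scores)
```

For this sample the mean is 5 and the standard deviation is 2, so the value 9 sits two standard deviations above the mean.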

73. What is the difference between classification and regression?

● Classification: Predicts discrete labels (e.g., spam vs. not spam).

● Regression: Predicts continuous values (e.g., house prices).

74. How do you create a scatter plot in Python?

Using matplotlib:

import matplotlib.pyplot as plt

plt.scatter(x, y)
plt.show()

75. What is the purpose of "Logistic Regression"?

Used for binary classification tasks where the output is either 0 or 1. It predicts the probability of the
positive class using the logistic (sigmoid) function.

76. What is the difference between bagging and boosting?

● Bagging: Reduces variance by training multiple models independently and averaging predictions
(e.g., Random Forest).

● Boosting: Reduces bias by sequentially training models, each correcting the previous one's errors
(e.g., XGBoost, AdaBoost).

77. What is a box plot and how do you interpret it?

A box plot shows the distribution of data based on quartiles. It highlights the median, interquartile range
(IQR), and outliers.

78. What are the key assumptions of linear regression?

1. Linearity

2. Independence

3. Homoscedasticity (constant variance)

4. Normality of errors

79. What is the purpose of cross-validation in machine learning?

Cross-validation splits the data into multiple subsets to evaluate the model’s performance across different
data sets, helping to detect overfitting.

80. How do you perform a Chi-Square test in statistics?

The Chi-Square test evaluates if there is a significant association between categorical variables.
Formula:

χ² = Σ [(O - E)² / E]

Where O = observed, E = expected frequency.
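
The statistic can be worked through by hand on a small contingency table. This sketch assumes a test of independence, where each expected count is (row total × column total) / grand total; the observed counts are invented.

```python
# Hypothetical 2 x 2 table of observed counts (rows: groups, columns: outcomes).
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count under independence of rows and columns.
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

# Degrees of freedom for an r x c table: (r - 1) * (c - 1).
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_square, 3), degrees_of_freedom)
```

The statistic is then compared with the chi-square distribution for the given degrees of freedom to obtain a p-value.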

81. What are the benefits of using Power BI over Excel for reporting?
Power BI provides more powerful data processing, visualization, and interactive dashboards, allowing
integration from multiple data sources with real-time updates.

82. What is the purpose of a confusion matrix in machine learning?

A confusion matrix is a table that helps evaluate the performance of a classification model by comparing
actual and predicted values.

83. How would you deal with missing values in a time series dataset?

● Interpolate missing values (e.g., linearly) or carry the last observation forward (forward fill).

● Use time-based imputation methods to fill in missing data points.

84. How do you calculate precision, recall, and F1-score in a classification problem?

● Precision = TP / (TP + FP)

● Recall = TP / (TP + FN)

● F1-score = 2 * (Precision * Recall) / (Precision + Recall)
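
The formulas above can be applied directly to lists of labels. The labels below are invented, and 1 is assumed to be the positive class.

```python
# Hypothetical true labels and model predictions (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]

# Count the confusion-matrix cells needed by the formulas.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_score = 2 * (precision * recall) / (precision + recall)

print(tp, fp, fn)
print(precision, recall, round(f1_score, 4))
```

In real code the denominators should be guarded against zero, for example when the model predicts no positives at all.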

85. How would you visualize a time series dataset in Python?

Using matplotlib or seaborn to plot data over time:

import matplotlib.pyplot as plt

plt.plot(time, data)
plt.show()

86. What is an ROC curve and what does it represent?

An ROC (Receiver Operating Characteristic) curve is used to evaluate classification models. It plots the True
Positive Rate (TPR) vs. False Positive Rate (FPR) at various thresholds.

87. What is multicollinearity and how do you detect it?
Multicollinearity occurs when two or more predictor variables are highly correlated. It can be detected using
the Variance Inflation Factor (VIF).

88. What is the difference between deep learning and machine learning?

● Machine Learning: Uses algorithms to learn patterns from data.

● Deep Learning: A subset of ML with neural networks that learn from large amounts of data,
especially for complex tasks like image recognition.

89. What is PCA (Principal Component Analysis)?

PCA is a dimensionality reduction technique that transforms correlated variables into a smaller set of
uncorrelated variables called principal components.

90. What are precision-recall curves, and when should they be used?
Precision-recall curves are used in classification problems with imbalanced datasets. They help evaluate the
trade-off between precision and recall at different thresholds.

91. How would you handle large datasets in Python?

● Use dask for parallel processing of large datasets.

● Use chunksize while reading large CSV files to avoid memory overload.

92. How do you handle categorical variables in Excel?

By creating dummy variables (one-hot encoding) or using VLOOKUP to map categories to numeric values.

93. What is a time series analysis and when would you use it?
Time series analysis is used to analyze data points collected or recorded at specific time intervals. It’s useful
for forecasting and detecting trends.

94. What is the difference between L1 and L2 regularization?

● L1 Regularization: Adds the sum of the absolute values of the coefficients to the loss function,
promoting sparsity (Lasso).

● L2 Regularization: Adds the sum of the squared coefficients, shrinking large coefficients but not
zeroing them out (Ridge).

95. What is clustering in machine learning?

Clustering is an unsupervised learning technique used to group similar data points together based on
certain characteristics (e.g., K-means, DBSCAN).

96. What is the difference between a sample and a population in statistics?

● Population: Entire set of data points you are interested in.

● Sample: A subset of the population used for analysis.

97. What is the significance of the p-value in hypothesis testing?

The p-value is the probability of observing results at least as extreme as the sample, assuming the null
hypothesis is true. A lower p-value indicates stronger evidence against the null hypothesis.

98. How do you identify the best-fit line in a scatter plot?

By applying linear regression, which finds the line that minimizes the sum of squared residuals between
actual and predicted data points.

99. How do you evaluate the performance of a regression model?

● R² (R-squared) value indicates the proportion of variance explained by the model.

● Mean Absolute Error (MAE) and Mean Squared Error (MSE) for error measurement.

100. How would you interpret a correlation coefficient of 0.8?

A correlation coefficient of 0.8 indicates a strong positive linear relationship between two variables.
