DIFFERENCE BETWEEN ML/DL/DATA SCIENCE/AI
✅ 1. Artificial Intelligence (AI)
AI is the science of creating systems that simulate human intelligence — enabling machines to reason,
learn, adapt, and act autonomously.
Scope: Broadest field — includes logic, rules-based systems, ML, DL, robotics, etc.
Key Trait: Decision-making ability like humans.
Examples: Self-driving cars, fraud detection, AI-powered chatbots.
💡 Think of AI as the goal: making machines behave smartly.
✅ 2. Machine Learning (ML)
ML is a subset of AI that uses algorithms to learn patterns from data and make predictions or
decisions without being explicitly programmed.
Scope: Statistical models trained on historical data.
Key Trait: Learns from data, improves over time.
Examples: Spam filters, churn prediction, recommendation engines.
💡 ML is the engine that powers AI behavior from data.
✅ 3. Deep Learning (DL)
DL is an advanced subset of ML that uses multi-layered neural networks to model complex patterns in
high-dimensional data. In simpler terms, it is loosely inspired by the human brain, where interconnected
neurons pass signals through successive layers.
Scope: Uses artificial neural networks (ANNs), particularly useful in unstructured data (images,
audio, text).
Key Trait: Learns features automatically — needs big data + compute.
Examples: Language translation, facial recognition, autonomous driving perception systems.
💡 DL mimics how the human brain works — with multiple layers of abstraction.
✅ 4. Data Science
Data Science is an interdisciplinary field that blends statistics, machine learning, domain expertise,
and programming to extract actionable insights from data.
Scope: End-to-end data workflow — from collection to modeling and storytelling.
Key Trait: Business impact from data-driven decisions.
Examples: Sales forecasting, customer segmentation, A/B testing.
💡 Data Science = Insight + Impact, not just models.
WHERE ARE ML AND DEEP LEARNING USED?
Criteria: Machine Learning vs. Deep Learning
Data Dependencies: ML performs well on small/medium datasets; DL needs large datasets to excel.
Hardware Dependencies: ML works on low-end machines; DL requires powerful machines with GPUs due to intensive matrix computations.
Feature Engineering: ML requires manual feature selection and domain understanding; DL automatically learns relevant features from data.
Execution Time: ML training ranges from minutes to hours; DL can take days to weeks because neural networks compute a large number of weights.
Interpretability: Some ML models (e.g., logistic regression, decision trees) are interpretable while others (e.g., SVM, XGBoost) are harder to interpret; DL is often difficult to interpret and considered a black-box approach.
Types of Data Analysis
Descriptive analysis
Descriptive analysis, as the name suggests, describes or summarizes raw data and makes it
interpretable. It involves analyzing historical data to understand what has happened in the past. This
type of analysis is used to identify patterns and trends over time.
For example, a business might use descriptive analysis to understand the average monthly sales for the
past year.
Diagnostic analysis
Diagnostic analysis goes a step further than descriptive analysis by determining why something
happened. It involves more detailed data exploration and comparing different data sets to understand
the cause of a particular outcome.
For instance, if a company's sales dropped in a particular month, diagnostic analysis could be used to
find out why.
Predictive analysis
Predictive analysis uses statistical models and forecasting techniques to understand the future. It
involves using data from the past to predict what could happen in the future. This type of analysis is
often used in risk assessment, marketing, and sales forecasting.
For example, a company might use predictive analysis to forecast the next quarter's sales based on
historical data.
Prescriptive analysis
Prescriptive analysis is the most advanced type of data analysis. It not only predicts future outcomes
but also suggests actions to benefit from these predictions. It uses sophisticated tools and technologies
like machine learning and artificial intelligence to recommend decisions.
For example, a prescriptive analysis might suggest the best marketing strategies to increase future sales.
How to Explain EDA in an Interview:
“EDA is the process of exploring and understanding the data before applying any modeling. It helps
detect data quality issues, understand distributions, spot patterns or anomalies, and generate
hypotheses. I usually break it into five stages:”
✅ Complete Explanation with Practical & Interview-Ready Detail
🔹 1. Mean (Average)
➤ What it is:
The arithmetic average — add up all values and divide by the number of values.
➤ Why it's important:
Gives a single number that represents the center of the data. It’s used in many statistical methods like
regression.
➤ When to use it:
When data is symmetrically distributed
No extreme outliers or heavy tails
For quick central tendency
➤ Insight from our example:
[30,000, 35,000, 38,000, 40,000, 5,000,000]
Mean = 1,028,600
💡 The mean is not representative here — it's misleading because the last value (5M) skews it
drastically.
➤ Interview Line:
“I always compare mean with median. If they differ a lot, the data is probably skewed — I may
need to use median instead.”
🔹 2. Median
➤ What it is:
The middle value in a sorted dataset. If the number of values is even, it’s the average of the two middle
values.
➤ Why it's important:
It’s robust to extreme values. It tells you what a typical individual looks like even when the data is
skewed.
➤ When to use:
If you suspect outliers or skewed distribution
When comparing typical behavior of users, prices, transactions, etc.
➤ Insight from example:
Sorted: [30K, 35K, 38K, 40K, 5M]
Median = 38,000
💡 The median gives a much more accurate central income than the mean.
➤ Interview Line:
“In skewed distributions like income or house prices, I prefer median since it’s resistant to extreme
values.”
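To make the mean-vs-median comparison concrete, here is a minimal sketch (pandas assumed) using the sample incomes from the example above:
🧪 Python:
import pandas as pd

# Sample incomes from the example above (note the extreme 5,000,000 value)
income = pd.Series([30_000, 35_000, 38_000, 40_000, 5_000_000])

print(income.mean())    # 1028600.0 -> pulled up by the single extreme value
print(income.median())  # 38000.0   -> a much better picture of a "typical" income

# A large gap between mean and median suggests a skewed distribution.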
🔹 3. Standard Deviation (std)
➤ What it is:
It tells how much the data varies from the mean. It’s the square root of variance.
➤ Why it's important:
Shows consistency or volatility
Useful in understanding spread in time, price, or behavior
➤ When to use:
To measure spread or variability
To identify if data points are close to mean or highly scattered
➤ Insight from example:
Here, std will be very large because 5M is so far from the mean → high variability.
💡 If you’re building a model, such high std means you need to either normalize or handle outliers.
➤ Interview Line:
“High std tells me the data is inconsistent — so I may need to segment it or transform it.”
🔹 4. Min
➤ What it is:
The smallest value in the column.
➤ Why it's important:
Helps identify errors, e.g., age = -10
Shows the lower bound of real-world values
➤ When to use:
During data validation or range checking
To detect possible invalid entries
➤ Insight from example:
Min = 30,000 → reasonable for income.
But if it were -3,000, that would indicate a data entry error.
➤ Interview Line:
“I always check min/max to make sure no values fall outside acceptable real-world ranges.”
🔹 5. Max
➤ What it is:
The largest value in the column.
➤ Why it's important:
Highlights potential outliers
Reveals maximum bounds, which may require capping or transformation
➤ When to use:
In fraud detection (e.g., suspiciously large transactions)
In business planning to understand best-case scenarios
➤ Insight from example:
Max = 5,000,000 → clearly an outlier
➤ Interview Line:
“If max is too far from median, I investigate whether it's an error, outlier, or VIP customer.”
🔹 6. Count
➤ What it is:
The number of non-null entries in a column.
➤ Why it's important:
Shows data completeness
Helps determine if a feature can be used as-is or needs imputation
➤ When to use:
Before modeling or visualizing
To decide whether to drop or fill missing values
➤ Insight:
If your dataset has 1000 rows, and income count is 850 → you have 150 missing values.
➤ Interview Line:
“I always check count early in EDA to find missing values that may break my model later.”
🔹 7. Skewness
➤ What it is:
Describes asymmetry of the distribution.
Value Interpretation
Skew = 0 Symmetrical
Skew > 0 Right-skewed (long right tail)
Skew < 0 Left-skewed (long left tail)
➤ When to use:
To decide whether to transform features
When choosing model assumptions (e.g., regression)
➤ Insight from example:
Skew = high positive → long tail on right
💡 Could signal income disparity. You may want to use log(income) in modeling.
➤ Interview Line:
“I apply log/square-root transformations when I see strong skewness in numeric variables.”
Summary (Metric: Best Used For; What It Tells You; What to Watch Out For)
Mean: symmetric data; central value (sensitive to outliers); watch out: skewed by large values.
Median: skewed data; middle value (robust to outliers); watch out: doesn't reflect spread.
Std: spread of data; high std = inconsistency; watch out: inflated by outliers.
Min/Max: validity check or outlier detection; range and anomalies; watch out: may not be realistic.
Count: completeness check; helps with missing-value handling; watch out: nulls need attention.
Skewness: distribution shape; guides transformation decisions; watch out: high skew = likely outliers.
Kurtosis: tailedness (outlier detection); shows risk of extreme values; watch out: high kurtosis = unstable model behavior.
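As a hedged illustration, most of the metrics in this summary can be pulled from pandas in a couple of calls (the income column is just the running example):
🧪 Python:
import pandas as pd

df = pd.DataFrame({'income': [30_000, 35_000, 38_000, 40_000, 5_000_000]})

# count, mean, std, min, quartiles and max in one call
print(df['income'].describe())

# Shape of the distribution: asymmetry and tailedness
print("skew:", df['income'].skew())      # strongly positive -> long right tail
print("kurtosis:", df['income'].kurt())  # high -> heavy tails / risk of extreme values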
📚 EDA & Descriptive Statistics Interview Questions
1. General EDA Concepts
What is EDA? What is its purpose?
Exploratory Data Analysis (EDA) is the process of examining and visualizing data to extract insights,
identify patterns, detect anomalies, test assumptions, and prepare for modeling.
The primary goal of EDA is to understand the structure and quality of the data before applying any
machine learning algorithms. It acts as a diagnostic phase where we assess:
Data distribution (e.g., normal vs skewed)
Missing values and outliers
Relationships between variables
Trends, clusters, and anomalies
📊 Key Techniques Include:
Descriptive statistics: mean, median, std, skew, kurtosis
Visualizations: histograms, boxplots, scatter plots, correlation heatmaps
Bivariate/multivariate analysis
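A minimal, illustrative EDA sketch that ties these techniques together (pandas, seaborn, and matplotlib assumed; the file name data.csv is a placeholder):
🧪 Python:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")                 # placeholder dataset

print(df.describe())                         # descriptive statistics
print(df.isnull().sum())                     # missing values per column

df.hist(figsize=(10, 6))                     # distributions of numeric columns
plt.show()

sns.boxplot(data=df.select_dtypes("number")) # spread and outliers
plt.show()

sns.heatmap(df.corr(numeric_only=True), annot=True)  # relationships between variables
plt.show()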
Describe the lifecycle of a data science project.
Phase 1: Problem Definition (Business Understanding)
Identify a business problem or opportunity that can be addressed through data analysis
Define project goals, objectives, and key performance indicators (KPIs)
Develop a clear understanding of the problem domain and stakeholders' needs
Create a project proposal and obtain stakeholder buy-in
Phase 2: Data Collection and Ingestion
Identify relevant data sources (internal and external)
Collect, store, and process data from various sources (e.g., databases, APIs, files)
Ensure data quality, integrity, and security
Perform initial data exploration and cleaning
Phase 3: Data Exploration and Analysis (Data Understanding)
Explore data distributions, relationships, and trends
Perform statistical analysis and data visualization
Identify correlations, patterns, and anomalies
Develop a deeper understanding of the data and its limitations
Phase 4: Data Preprocessing and Feature Engineering
Clean and preprocess data (e.g., handling missing values, outliers)
Transform and normalize data (e.g., scaling, encoding)
Create new features through feature engineering (e.g., dimensionality reduction, feature
extraction)
Prepare data for modeling
Phase 5: Modeling and Algorithm Development
Select suitable machine learning algorithms and techniques
Train and test models using various evaluation metrics (e.g., accuracy, precision, recall)
Perform hyperparameter tuning and model selection
Develop a robust and accurate predictive model
Phase 6: Model Evaluation and Validation
Evaluate model performance on unseen data (e.g., validation set, cross-validation)
Assess model interpretability and explainability
Validate model assumptions and limitations
Compare model performance to baseline models or benchmarks
Phase 7: Deployment and Integration
Deploy the model in a production-ready environment (e.g., API, container, cloud)
Integrate the model with existing systems and workflows
Ensure model scalability, reliability, and maintainability
Monitor model performance and data drift
Phase 8: Monitoring and Maintenance
Continuously monitor model performance and data quality
Update and retrain models as needed (e.g., concept drift, data changes)
Address model interpretability and explainability concerns
Refine and improve the model over time
Phase 9: Communication and Storytelling
Present findings and insights to stakeholders
Communicate model results and recommendations
Visualize and summarize complex data insights
Drive business decisions and actions through data-driven storytelling
Keep in mind that this is a general outline, and the specifics may vary depending on the project's
scope, complexity, and requirements. A data science project lifecycle may be iterative, and some
phases may overlap or repeat. Effective project management, collaboration, and communication
are essential to ensure successful project outcomes.
Explain the difference between descriptive and inferential statistics.
Descriptive Statistics
Descriptive statistics aim to describe and summarize the basic features of a dataset. This type of
statistics provides an overview of the data, including:
1. Measures of Central Tendency: mean, median, mode
2. Measures of Variability: range, variance, standard deviation
3. Data Distribution: histograms, box plots, density plots
Descriptive statistics help you:
Understand the data's shape and distribution
Identify patterns and outliers
Summarize large datasets
Examples of descriptive statistics:
Calculating the average age of customers
Creating a histogram to visualize the distribution of exam scores
Inferential Statistics
Inferential statistics aim to make inferences or conclusions about a population based on a
sample of data. This type of statistics helps you:
1. Test Hypotheses: determine if a relationship exists between variables
2. Estimate Population Parameters: make educated estimates about population characteristics
3. Predict Outcomes: forecast future events or trends
Inferential statistics involve:
1. Sampling: collecting a representative sample from a population
2. Statistical Modeling: using statistical techniques to analyze the sample data
3. Inference: drawing conclusions about the population based on the sample data
Examples of inferential statistics:
Conducting a t-test to compare the average scores of two groups
Using regression analysis to predict house prices based on features like location and size
Key differences:
1. Purpose: Descriptive statistics describe and summarize data, while inferential statistics make
inferences about a population.
2. Scope: Descriptive statistics focus on the sample data, while inferential statistics aim to
generalize findings to a larger population.
3. Methodology: Descriptive statistics involve simple calculations and visualizations, while
inferential statistics require more complex statistical techniques and modeling.
By understanding the difference between descriptive and inferential statistics, you'll be better
equipped to analyze and interpret data, make informed decisions, and communicate your
findings effectively.
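As a small, illustrative example of the inferential side (the scores below are made up purely for demonstration):
🧪 Python:
from scipy import stats

# Hypothetical exam scores for two groups
group_a = [72, 75, 78, 80, 82, 85]
group_b = [68, 70, 71, 74, 76, 77]

# Two-sample t-test: do the group means differ significantly?
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# If p < 0.05 we would reject the null hypothesis of equal means.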
2. Univariate / Bivariate / Multivariate Analysis
Define and provide examples of univariate, bivariate, and multivariate analysis.
🔹 1. Univariate Analysis
Definition:
Analysis of a single variable to understand its distribution, central tendency, and variability.
Usage:
Check data quality: spot outliers, missing values.
Feature selection: determine if a variable has useful variance.
Business reporting: e.g., average customer age, most frequent product sold.
Techniques:
Numeric: mean, median, std, histogram, boxplot
Categorical: value_counts(), bar plot
Example:
Understanding the age distribution of customers.
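A short univariate sketch (the age values are placeholders):
🧪 Python:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({'age': [22, 25, 25, 31, 34, 38, 41, 45, 52, 70]})

print(df['age'].describe())      # central tendency and spread of a single variable

sns.histplot(df['age'], bins=5)  # shape of the distribution
plt.show()

sns.boxplot(x=df['age'])         # spread and potential outliers
plt.show()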
🔹 2. Bivariate Analysis
Definition:
Analysis of two variables to discover relationships or dependencies.
Usage:
Explore feature-target relationships before modeling
Detect correlation (e.g., does experience influence salary?)
Find interactions (e.g., does gender affect churn?)
Techniques:
Numeric vs Numeric: scatter plot, correlation
Categorical vs Numeric: boxplot, grouped mean
Categorical vs Categorical: crosstab, stacked bar chart
Example:
Analyzing how experience impacts salary.
🧪 Python:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({'experience': [1, 2, 3, 4, 5], 'salary': [30, 40, 45, 60, 80]})
sns.scatterplot(x='experience', y='salary', data=df)
plt.title("Bivariate Analysis - Experience vs Salary")
plt.show()
🔹 3. Multivariate Analysis
Definition:
Analysis of three or more variables to understand combined effects, patterns, and interactions.
Usage:
Build predictive models (regression, classification)
Reduce dimensionality (e.g., using PCA)
Detect hidden patterns (e.g., clustering, segmentations)
Multicollinearity checks before modeling
Techniques:
pairplot, heatmap, multiple regression, PCA, clustering, decision trees
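A hedged multivariate sketch combining a correlation heatmap with PCA (scikit-learn assumed; the three columns are illustrative):
🧪 Python:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'experience': [1, 2, 3, 4, 5, 6],
    'salary':     [30, 40, 45, 60, 80, 90],
    'age':        [22, 25, 27, 30, 34, 37],
})

# Pairwise relationships among several variables at once
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
plt.show()

# Dimensionality reduction: project the three features onto two components
X_scaled = StandardScaler().fit_transform(df)
pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)
print(pca.explained_variance_ratio_)  # share of variance captured by each component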
How do you analyze numerical vs categorical variables?
Numerical plots (Method: Purpose; What It Shows)
Histogram: understand the distribution; frequency of values, skewness, modality.
Boxplot: detect outliers and spread; min, Q1, median, Q3, max, outliers.
Violin Plot: combine distribution and spread; density plus IQR by group.
Density Plot (KDE): smoothed histogram; distribution shape (useful for comparisons).
Line Plot: trend over time (time series); changes in a numeric value over time.
Scatter Plot: relationship between two numeric variables; correlation, clusters, patterns.
Heatmap (Correlation Matrix): numeric feature relationships; correlation values among features.
Categorical plots (Method: Purpose; What It Shows)
Bar Plot: show count or average per category; bar height = frequency or summary value.
Count Plot: quick count of categories; good for spotting class imbalance.
Pie Chart (less preferred): proportion of categories; relative percentages (only for a small number of categories).
Stacked Bar Plot: compare categorical variables; distribution within/between categories.
Boxplot (mixed with a numeric variable): numeric stats per category; e.g., income by gender.
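A small sketch of the typical plotting calls for numeric vs categorical columns (seaborn assumed; the gender/income columns are placeholders):
🧪 Python:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'gender': ['M', 'F', 'F', 'M', 'F', 'M'],
    'income': [40_000, 38_000, 52_000, 41_000, 60_000, 39_000],
})

# Numeric column: distribution and outliers
sns.histplot(df['income'], kde=True)
plt.show()

# Categorical column: class counts
sns.countplot(x='gender', data=df)
plt.show()

# Mixed: numeric stats per category (e.g., income by gender)
sns.boxplot(x='gender', y='income', data=df)
plt.show()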
3. Basic Descriptive Metrics
What is variance vs. standard deviation?
Variance (σ² or Var): The average of the squared differences from the mean. It measures how far each data point is from the mean on average, but in squared units.
Standard Deviation (σ or std): The square root of the variance. It brings the measure of spread back to the original units of the data, making it easier to interpret.
I use standard deviation when I want to understand or explain the spread in real-world units — for example, saying that salaries typically vary by 10,000 PKR is meaningful.
But I use variance when I'm doing internal calculations or modeling — it's mathematically convenient because it's additive and shows up in many algorithms like PCA, clustering, and Gaussian models.
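A quick numeric check of the relationship (NumPy assumed; the salaries are illustrative):
🧪 Python:
import numpy as np

salaries = np.array([30_000, 35_000, 38_000, 40_000, 42_000])

variance = salaries.var()   # average squared deviation from the mean (squared units)
std_dev = salaries.std()    # square root of the variance, back in original units

print(variance, std_dev)
print(np.isclose(std_dev, np.sqrt(variance)))  # True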
Define range, IQR (interquartile range), skewness, and kurtosis — what do they
tell us?
Range, IQR, Skewness, and Kurtosis
1️⃣ Range
Definition: The difference between the maximum and minimum value in a dataset.
Formula: Range = Max − Min
Use: Gives a quick estimate of how spread out the values are.
Limitation: Highly sensitive to outliers; doesn't show how values are distributed in between.
🔍 Example:
If income = [30k, 35k, 40k, 200k] → Range = 200k - 30k = 170k
2️⃣ IQR (Interquartile Range)
Definition: The range between the 25th percentile (Q1) and 75th percentile (Q3), i.e., the middle 50% of the data.
Formula: IQR = Q3 − Q1
Use: Measures the spread of the central values and is robust to outliers. Used in boxplots and outlier detection.
Outlier Rule: Anything below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR is commonly treated as an outlier.
🔍 Example:
If Q1 = 40k and Q3 = 70k → IQR = 30k
Percentile: Meaning
25th percentile (Q1): 25% of the data falls below this value
50th percentile (Median): 50% of the data falls below this value
75th percentile (Q3): 75% of the data falls below this value
Formula (Conceptual)
Percentiles are calculated by:
1. Sorting the data in ascending order
2. Finding the rank using: Rank = (P / 100) × (n + 1)
where P is the percentile and n is the number of data points.
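A short sketch computing percentiles and the IQR outlier bounds directly (NumPy assumed; note that NumPy's default interpolation can differ slightly from the (n + 1) rank formula above):
🧪 Python:
import numpy as np

data = np.array([30, 35, 40, 45, 50, 55, 60, 65, 70, 200])  # illustrative values (in thousands)

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
print(q1, median, q3, iqr)
print("outliers:", data[(data < lower) | (data > upper)])  # the 200 is flagged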
3️⃣ Skewness
Definition: Measures the asymmetry of the distribution.
Skew = 0: Data is symmetrical (normal distribution)
Skew > 0: Right-skewed (long tail on the right)
Skew < 0: Left-skewed (long tail on the left)
Use: Tells you whether to apply transformations (e.g., log) before modeling.
🔍 Example:
Income data is usually right-skewed: most people earn relatively modest amounts, while a few earn a lot more.
✅ Fix right skew: Apply log, sqrt, or Box-Cox transformation.
Positive skew: Mean > Median
Negative skew: Mean < Median
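A hedged sketch of checking skewness and applying a log transform (the income values are illustrative):
🧪 Python:
import numpy as np
import pandas as pd

income = pd.Series([30_000, 35_000, 38_000, 40_000, 42_000, 45_000, 500_000, 5_000_000])

print("skew before:", income.skew())      # large positive value -> right-skewed

log_income = np.log1p(income)             # log(1 + x) also handles zeros safely
print("skew after :", log_income.skew())  # much closer to 0

print(income.mean() > income.median())    # True: mean > median confirms positive skew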
4. Missing Data & Outliers
How do you detect and handle missing values? (e.g., MCAR, MAR, MNAR, imputation)
Type: Meaning; Example; Can it be fixed by imputation?
MCAR (Missing Completely at Random): missingness has no pattern — it's random. Example: a sensor fails to transmit some readings. Fixable by imputation: ✅ Yes.
MAR (Missing at Random): missingness depends on other observed variables. Example: income is missing more often for younger users. Fixable by imputation: ✅ Yes.
MNAR (Missing Not at Random): missingness depends on the unobserved value itself. Example: people with high income don't report it. Fixable by imputation: ❌ Hard to fix (requires modeling or domain knowledge).
Handling Missing Values (Imputation Strategies)
Strategy: When to Use; Notes
Drop rows/columns: when few rows/columns are missing; use only when the impact is negligible.
Mean/Median Imputation: MCAR or MAR for numerical values; median is more robust to outliers.
Mode Imputation: for categorical variables; fill with the most frequent value.
KNN Imputation: when data has patterns between features; uses neighboring points.
Regression Imputation: predict the missing value from other variables; more accurate but adds complexity.
Missing Indicator: create a new feature such as was_missing; especially useful for tree-based models.
Advanced: MICE (Multiple Imputation by Chained Equations): when missingness is MAR and you want statistical reliability.
I first analyze the missingness type — whether it’s MCAR, MAR, or MNAR — using .isnull(), heatmaps,
and checking if missingness correlates with other features.
For MCAR or MAR, I often use mean/median/mode imputation depending on data type. For more
accurate models, I might use KNN imputation or predictive models. If it’s MNAR, I consult domain
experts or use techniques like creating missing indicators.
I always assess the impact of imputation on the distribution and model performance using visualizations
and cross-validation
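A minimal sketch of that workflow (scikit-learn assumed; the columns and values are placeholders):
🧪 Python:
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer, KNNImputer

df = pd.DataFrame({
    'age':    [25, 32, np.nan, 41, 29, np.nan],
    'income': [40_000, np.nan, 52_000, 60_000, np.nan, 39_000],
})

# 1. Quantify missingness per column
print(df.isnull().sum())

# 2. Keep a missingness indicator before imputing (useful for tree-based models)
df['income_was_missing'] = df['income'].isna().astype(int)

# 3. Simple median imputation (robust to outliers) for MCAR/MAR numeric columns
df['income'] = SimpleImputer(strategy='median').fit_transform(df[['income']]).ravel()

# 4. KNN imputation when features carry information about each other
df['age'] = KNNImputer(n_neighbors=2).fit_transform(df[['age', 'income']])[:, 0]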
How do you detect outliers? (e.g., IQR, z-scores, boxplots)
1. IQR (Interquartile Range) Method
📌 Concept:
Based on the middle 50% of the data
Values below or above a certain range are flagged as outliers
🧮 Formula:
IQR = Q3 − Q1
Lower bound = Q1 − 1.5 × IQR
Upper bound = Q3 + 1.5 × IQR
Any data point outside this range is an outlier
✅ Best for:
Non-normal, skewed data
Easy and interpretable
2. Z-Score Method (Standard Deviation Method)
📌 Concept:
Measures how many standard deviations a point is from the mean
Z > 3 or Z < -3 → outlier (assuming normal distribution)
🧮 Formula:
Z = (x − μ) / σ
✅ Best for:
Normally distributed data
Fast and simple method
3️⃣ Boxplot – Visualization-Based Detection
🔹 Concept:
Boxplots display:
Median (Q2)
Q1 and Q3 (the box)
Whiskers (min/max within 1.5 × IQR)
Outliers (points outside the whiskers)
📊 Use:
Quickly visualize the spread
Spot outliers by eye
Great for comparing distributions by category
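A small detection sketch combining z-scores and a boxplot (illustrative data; the IQR-based filter appears in the removal example below):
🧪 Python:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# 99 "normal" values plus one extreme point
rng = np.random.default_rng(42)
s = pd.Series(np.append(rng.normal(40, 5, size=99), 500))

# Z-score method: flag points more than 3 standard deviations from the mean
z = (s - s.mean()) / s.std()
print(s[np.abs(z) > 3])   # only the extreme 500 is flagged

# Boxplot: outliers appear as points beyond the whiskers (1.5 × IQR)
sns.boxplot(x=s)
plt.show()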
What are strategies to treat outliers? (capping, winsorization, removal)
Strategies to Treat Outliers
(Capping, Winsorization, Removal & More)
1️⃣ Remove Outliers
🔹 What it is Completely delete rows containing outliers
✅ Best When Outliers are errors, or make up <5% of data
❌ Avoid When Data is small or outliers are important (e.g., fraud)
🧪 Python:
# Using IQR
Q1 = df['value'].quantile(0.25)
Q3 = df['value'].quantile(0.75)
IQR = Q3 - Q1
lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR
df_clean = df[(df['value'] >= lower) & (df['value'] <= upper)]
2️⃣ Capping (Truncation)
🔹 What it is Replace outliers with upper or lower thresholds (based on IQR or percentiles)
✅ Best When You want to keep dataset size but reduce extreme influence
❌ Avoid When Business logic demands raw extreme values
🧪 Python (95% cap):
lower_cap = df['value'].quantile(0.05)
upper_cap = df['value'].quantile(0.95)
df['value'] = df['value'].clip(lower_cap, upper_cap)
3️⃣ Winsorization
🔹 What it is Like capping, but replaces the top/bottom X% with the corresponding percentile values instead of fixed thresholds
✅ Best When You want a robust, statistical way to reduce influence
🔧 Tool Use scipy.stats.mstats.winsorize()
🧪 Python:
from scipy.stats.mstats import winsorize
df['value_winsor'] = winsorize(df['value'], limits=[0.05, 0.05]) # Cap bottom/top 5%
4️⃣ Transformation (Log / Sqrt)
🔹 What it is Apply a mathematical transformation to reduce skew
✅ Best When Right-skewed distributions or outliers that need soft adjustment
❌ Avoid When Data contains zeros or negatives (for log)
🧪 Python:
import numpy as np
df['log_value'] = np.log1p(df['value']) # log1p = log(1 + x)
5. Distribution Analysis
How do you check if data is normally distributed?
What is the empirical rule (68-95-99.7)?
6. Skewness & Kurtosis
Define skewness and kurtosis, and explain what they imply about data distribution.
How can skewness or kurtosis impact your model?
What transformations help address skewness/kurtosis issues?
7. Correlation & Multicollinearity
What is the difference between covariance and correlation?
How do you detect and handle multicollinearity? (e.g., correlation matrix, VIF)
8. Feature Reduction
How does PCA (Principal Component Analysis) work for dimensionality reduction?
9. Statistical Testing & Confidence
Explain hypothesis testing (null/alternative), t-tests, chi-square, ANOVA, p-values, and
confidence intervals.
What is the Central Limit Theorem and why is it important?
10. Advanced & Miscellaneous
What is autocorrelation, and how does it differ from correlation?
Explain sampling distribution vs. probability distribution.
What is the difference between one-tailed vs two-tailed hypothesis testing?
Define type I vs type II errors.