Uploaded by

nitdrjothilakshmi

Basic Statistics Concepts for Data Science

1. Descriptive Statistics

Descriptive statistics describe the basic features of a data set and summarize it. The data set may represent either an entire population or a sample of that population.

Common descriptive measures include:

• Mean: The arithmetic average, that is, the sum of all values divided by the number of values.

• Mode: The value that appears most often in a data set.

• Median: The middle value of the ordered data set, which divides it into two equal halves. When the number of values is even, the median is the average of the two middle values.
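The three measures above can be sketched with Python's standard `statistics` module. The data values here are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample, already sorted for readability
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum (30) divided by count (6)
median = statistics.median(data)  # average of the two middle values, 3 and 5
mode = statistics.mode(data)      # 3 appears twice, more than any other value

print(mean, median, mode)
```

Because the sample has an even number of values, the median falls between two observations rather than on one of them.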

2. Variability

Variability describes how spread out the data are. Common measures include:

• Standard Deviation: A measure of the dispersion of a data set relative to its mean. It is the square root of the variance.

• Variance: A measure of the spread between the numbers in a data set, calculated as the average of the squared differences from the mean. A large variance indicates that the numbers are far from the mean. A small variance indicates that the numbers are close to the mean. Zero variance indicates that all values in the set are identical.

• Range: The difference between the largest and smallest values of a data set.

• Percentile: The value below which a given percentage of the observations in the data set falls.

• Quartile: One of the three values that divide the ordered data points into four equal parts.

• Interquartile Range: The spread of the middle 50% of the data set, calculated as the difference between the third quartile and the first quartile.
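As a minimal sketch, the spread measures above can be computed with Python's `statistics` module. The data are hypothetical, and the sketch treats them as a whole population (`pvariance`, `pstdev`); for a sample, `variance` and `stdev` would be used instead. Quartile values depend on the interpolation convention, so other tools may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical data set, treated here as a complete population
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_variance = statistics.pvariance(data)  # mean of squared deviations from the mean
pop_std = statistics.pstdev(data)          # square root of the variance
value_range = max(data) - min(data)        # largest minus smallest value

# Three cut points that split the ordered data into four parts
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                              # width of the middle 50% of the data

print(pop_variance, pop_std, value_range, (q1, q2, q3), iqr)
```

The second quartile is the median, which gives a quick sanity check on any quartile calculation.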

3. Correlation

Correlation is one of the major statistical techniques for measuring the relationship between two variables. The correlation coefficient, which ranges from -1 to +1, indicates the strength and direction of the linear relationship between two variables.

• A correlation coefficient greater than zero indicates a positive relationship.

• A correlation coefficient less than zero indicates a negative relationship.

• A correlation coefficient of zero indicates that there is no linear relationship between the two variables.
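The three cases can be illustrated with a small hand-written Pearson correlation function. The function name `pearson_r` and the data are hypothetical, chosen for this sketch. The last example shows that a coefficient of zero rules out only a linear relationship: the y values are exactly the squares of the x values, yet the coefficient is zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [1, 2, 3, 4, 5]
r_pos = pearson_r(xs, [2, 4, 6, 8, 10])               # y rises with x
r_neg = pearson_r(xs, [10, 8, 6, 4, 2])               # y falls as x rises
r_zero = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x**2, symmetric around 0

print(r_pos, r_neg, r_zero)
```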

4. Probability Distribution

A probability distribution specifies the likelihood of all possible events. In simple terms, an event is the result of an experiment. Events are of two types: dependent and independent.

• Independent event: An event is independent when it is not affected by earlier events.

• Dependent event: An event is dependent when its occurrence depends on earlier events.

The probability that several independent events all occur is calculated by multiplying the probabilities of the individual events. For dependent events, it is calculated using conditional probability: P(A and B) = P(A) × P(B | A).
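A short worked example may help, using exact fractions. The dice and card scenarios are assumptions added for illustration: two rolls of a fair die are independent, while two draws from a deck without replacement are dependent.

```python
from fractions import Fraction

# Independent events: rolling a six on each of two fair dice
p_six = Fraction(1, 6)
p_two_sixes = p_six * p_six  # multiply the individual probabilities

# Dependent events: drawing two aces in a row without replacement
p_first_ace = Fraction(4, 52)               # 4 aces among 52 cards
p_second_ace_given_first = Fraction(3, 51)  # 3 aces left among 51 cards
p_two_aces = p_first_ace * p_second_ace_given_first  # P(A) * P(B | A)

print(p_two_sixes, p_two_aces)
```

In the card case, simply squaring the probability of one ace would overstate the result, because the first draw changes the composition of the deck.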

5. Regression

Regression is a method used to determine the relationship between one or more independent variables and a dependent variable. Regression is mainly of two types:

• Linear regression: It fits a regression model that explains the relationship between a numeric response variable and one or more predictor variables.

• Logistic regression: It fits a regression model that explains the relationship between a binary response variable and one or more predictor variables.
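The two model types can be sketched in plain Python. The helper names `fit_line` and `logistic_probability`, the data, and the coefficients are hypothetical. The first function fits a one-predictor linear model by ordinary least squares; the second only evaluates the logistic function for given coefficients and does not fit them.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def logistic_probability(x, intercept, slope):
    """Probability that a binary response equals 1 under a logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

# Hypothetical numeric response that lies exactly on y = 1 + 2x
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# Hypothetical logistic coefficients; the output is always between 0 and 1
p_low = logistic_probability(-2.0, 0.0, 1.5)
p_mid = logistic_probability(0.0, 0.0, 1.5)
p_high = logistic_probability(2.0, 0.0, 1.5)

print(slope, intercept, p_low, p_mid, p_high)
```

The linear model predicts an unbounded number, while the logistic model maps the same kind of linear combination to a probability, which is why it suits a binary response.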
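6. Normal Distribution

The normal distribution is defined by a probability density function for a continuous random variable. It has two parameters, the mean and the standard deviation. The standard normal distribution is the special case with mean 0 and standard deviation 1. When the distribution of a random variable is unknown, the normal distribution is often used as an approximation. The central limit theorem justifies this in many cases: the sum or mean of a large number of independent random variables with finite variance is approximately normally distributed.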
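A small simulation can illustrate both points, using `statistics.NormalDist` from the Python standard library. The sample sizes and the seed are arbitrary assumptions. The second part draws from a uniform distribution, which is not normal, and looks at the means of many samples.

```python
import random
import statistics

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = statistics.NormalDist(mu=0.0, sigma=1.0)
# Probability of falling within one standard deviation of the mean
within_one_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

# Central limit theorem sketch: means of samples from a uniform distribution
rng = random.Random(0)  # fixed seed so the run is repeatable
sample_means = [
    statistics.fmean([rng.random() for _ in range(30)])
    for _ in range(2000)
]
mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.stdev(sample_means)

print(round(within_one_sd, 4), mean_of_means, spread_of_means)
```

The sample means cluster around 0.5, the mean of the uniform distribution, with a spread close to the theoretical value of sqrt(1/12) / sqrt(30), roughly 0.053.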

7. Bias

In statistical terms, bias occurs when a model or sample is not representative of the complete population. Bias needs to be minimized to obtain reliable results.

The three most common types of bias are:

• Selection bias: It occurs when the data chosen for statistical analysis are not selected at random, so that the data are unrepresentative of the whole population.

• Confirmation bias: It occurs when the person performing the statistical analysis has a predefined assumption and favors results that support it.

• Time interval bias: It is caused by deliberately choosing a time range that favors a particular outcome.
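Selection bias can be shown with a toy example. The population, the selection rule, and the seed are all hypothetical: the sketch compares a sample chosen by a non-random rule with a sample drawn at random from the same population.

```python
import random
import statistics

# Hypothetical population: the integers 1 to 100
population = list(range(1, 101))
population_mean = statistics.mean(population)

# Non-random selection: only values above 50 make it into the sample
biased_sample = [value for value in population if value > 50]
biased_mean = statistics.mean(biased_sample)

# Random selection of the same size from the whole population
rng = random.Random(0)  # fixed seed so the run is repeatable
random_sample = rng.sample(population, 50)
random_mean = statistics.mean(random_sample)

print(population_mean, biased_mean, random_mean)
```

Both samples contain 50 values, so the gap between the biased mean and the population mean comes from how the values were selected, not from the sample size.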
