
DSC Unit 3 Cse

This document covers Unit-3 of a Computer Science and Engineering course, focusing on data analysis, statistics, and machine learning concepts. It explains central tendencies, variance, standard deviation, sampling distributions, and the Central Limit Theorem, along with basic machine learning algorithms such as linear regression, SVM, and Naive Bayes. Additionally, it discusses various types of statistical analysis including descriptive, inferential, predictive, prescriptive, exploratory, and causal analysis.

Uploaded by

aiswaryaj


DSC-UNIT-3 - CSE

Computer Science and Engineering (Swarna Bharathi Institute of Science and Technology)


Downloaded by AISWARYA J (aiswaryaj@skasc.ac.in)

UNIT-III
Data analysis: Introduction, Terminology and concepts, Introduction to statistics, Central tendencies
and distributions, Variance, Distribution properties and arithmetic, Samples/CLT, Basic machine
learning algorithms, Linear regression, SVM, Naive Bayes.

1. a) What is the role of statistics in data analysis?
b) Describe central tendencies and distributions.
2. a) How does the shape of a distribution influence the measures of central tendency? Explain.
b) Explain SVM briefly.
3. What is the importance of machine learning in data science? Explain with an example.
4. a) Write about sampling distributions.
b) Discuss the basic machine learning algorithms.
_______________________________________________________________________________________

Explain central tendencies and various distribution techniques:

A measure of central tendency is a single value that describes the centre of a dataset or probability distribution. There are three main measures of central tendency: the mean, the median and the mode.

Mean: Mean is the “Average” value of the dataset.

Mean = Sum of all data values (S) / Total number of data values (n) = S/n

Median: The middle value of the sorted dataset is called the median.

Step 1. The dataset is arranged in either increasing or decreasing order.

Step 2. If the dataset has an odd number of values (n odd), the value at position (n + 1)/2 is the median.

Step 3. If the dataset has an even number of values (n even), the median is the average of the two middle values, at positions n/2 and n/2 + 1.

Mode: The most frequently occurring value in the dataset is called mode.
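The three steps for the median, together with the definitions of the mean and the mode, can be sketched in Python as below. This is a minimal illustration written for these notes, not a library routine; the function names `mean`, `median` and `modes` and the sample data are hypothetical. The mode helper returns every most-frequent value, and an empty list when no value repeats.

```python
from collections import Counter


def mean(values):
    # Mean = S / n: sum of all values divided by how many values there are.
    return sum(values) / len(values)


def median(values):
    s = sorted(values)          # Step 1: arrange the data in increasing order.
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        # Step 2: n odd -> value at position (n + 1)/2, which is index n // 2.
        return s[mid]
    # Step 3: n even -> average of the two middle values (positions n/2 and n/2 + 1).
    return (s[mid - 1] + s[mid]) / 2


def modes(values):
    # Mode: the most frequently occurring value(s); none if nothing repeats.
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


data = [7, 2, 9, 4, 2, 6]       # hypothetical data with n = 6 (even)
print(mean(data))    # 5.0
print(median(data))  # 5.0
print(modes(data))   # [2]
```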


For example, let the weights (in kg) of 5 children be 36, 40, 32, 42, 30.

Mean = (36 + 40 + 32 + 42 + 30)/5 = 180/5 = 36 kg

Median: Arrange the data in ascending order: 30, 32, 36, 40, 42

The middle value is 36, so median = 36 kg.

Mode: Every value occurs exactly once, so no value occurs more often than the others and this dataset has no mode.
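The worked example can be cross-checked with Python's standard `statistics` module (Python 3.8 or later is assumed for `multimode`). `statistics.multimode` returns every value that ties for the highest count; here it returns all five weights, which is another way of saying that no single value is the mode.

```python
import statistics

weights = [36, 40, 32, 42, 30]   # weights of the 5 children, in kg

print(statistics.mean(weights))       # 36
print(statistics.median(weights))     # 36
# Each weight appears exactly once, so all five tie for "most frequent".
print(statistics.multimode(weights))  # [36, 40, 32, 42, 30]
```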

Discuss variance and standard deviation:

Variance and standard deviation are two important measures of spread (dispersion) in statistics.

Variance: Variance measures how far the data points spread out from the mean. It is the average of the squared deviations from the mean and is denoted by σ².

Properties of Variance
• It is always non-negative, because it is an average of squared deviations, each of which is either positive or zero.
• Variance always has squared units. For example, the variance of a set of weights measured in kilograms is expressed in kilograms squared (kg²).
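A direct translation of the definition (the average of the squared deviations from the mean, dividing by the number of values N) is sketched below; `population_variance` is a hypothetical helper name, not a library function. It also illustrates the properties above: the result is never negative, it is zero when all values are equal, and it comes out in squared units.

```python
def population_variance(values):
    # σ² = Σ(xᵢ − μ)² / N : the mean of the squared deviations from the mean.
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


weights_kg = [36, 40, 32, 42, 30]        # children's weights from the earlier example
print(population_variance(weights_kg))    # 20.8  (in kg², not kg)
print(population_variance([5, 5, 5]))     # 0.0   (identical values -> no spread)
```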

Standard Deviation

Standard deviation measures the spread (dispersion) of statistical data around the mean. It is the square root of the variance and is denoted by σ.

Properties of Standard Deviation

• It is the square root of the mean of the squared deviations from the mean, which is why it is also called the root-mean-square deviation.
• The smallest possible value of the standard deviation is 0, since it cannot be negative.
• When the data values in a group are similar, the standard deviation is low (close to zero). When the data values differ widely from each other, the standard deviation is high (far from zero).
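The last property can be seen by computing σ for two small hypothetical groups that share the same mean (50) but differ in spread. `population_sd` is an illustrative helper name; it takes the square root of the population variance, so the result is back in the same units as the data.

```python
import math


def population_sd(values):
    # σ = √σ², where σ² is the mean squared deviation from the mean.
    mu = sum(values) / len(values)
    variance = sum((x - mu) ** 2 for x in values) / len(values)
    return math.sqrt(variance)   # same units as the data


similar = [50, 51, 49, 50, 50]   # hypothetical values close to each other
spread = [10, 90, 30, 70, 50]    # hypothetical values far apart

print(round(population_sd(similar), 2))    # 0.63
print(round(population_sd(spread), 2))     # 28.28
print(population_sd([7, 7, 7]))            # 0.0
```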


Example: Suppose there are two cricket players, Pant and Kartik, and you have to select one of them for the Cricket World Cup. The scores of both players in their last five T-20 matches are as follows:

Match   Kartik   Pant
1       23       34
2       28       85
3       45       2
4       59       15
5       63       77

Answer: We compute the standard deviation (SD) of each player's scores; the player with the smaller SD is the more consistent one.

Case 1: Kartik

Mean = (23 + 28 + 45 + 59 + 63)/5 = 218/5 = 43.6

Runs (xᵢ)   Squared deviation (xᵢ − mean)²
23          (23 − 43.6)² = 424.36
28          (28 − 43.6)² = 243.36
45          (45 − 43.6)² = 1.96
59          (59 − 43.6)² = 237.16
63          (63 − 43.6)² = 376.36

Sum of squared deviations = 1283.2

Variance σ² = 1283.2/5 = 256.64, so SD σ = √256.64 ≈ 16.02
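The whole comparison, including the second case (Pant), can be checked with the standard `statistics` module. `statistics.pstdev` divides by n, matching the population σ used in Case 1; using the sample SD (`statistics.stdev`, which divides by n − 1) would change both values but not which player comes out as more consistent.

```python
import statistics

# Runs in the last five T-20 matches, from the table above.
scores = {
    "Kartik": [23, 28, 45, 59, 63],
    "Pant": [34, 85, 2, 15, 77],
}

# Population standard deviation (divides by n = 5), as in Case 1.
sds = {player: statistics.pstdev(runs) for player, runs in scores.items()}

for player, runs in scores.items():
    print(f"{player}: mean = {statistics.mean(runs)}, SD = {sds[player]:.2f}")

# A smaller SD means the scores stay closer to the mean, i.e. more consistent.
most_consistent = min(sds, key=sds.get)
print("More consistent:", most_consistent)
```

With the scores above this reports an SD of about 16.02 for Kartik and about 33.06 for Pant, so Kartik is the more consistent choice.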

Explain the types of statistical analysis in data science:

Statistical analysis is performed on data sets, and different kinds of analysis produce different types of output from the input data. For example, an analysis can summarise the characteristics of the input data, make predictions from it, or test a null hypothesis. The type and format of the output vary from one kind of analysis to another.


The types of statistical analysis are:

• Descriptive statistics: collecting, organizing, analyzing, and summarizing data sets in an understandable format, such as charts, graphs, and tables. It condenses a large data set and removes its complexity so that analysts can understand it.
• Inferential statistics: drawing conclusions about a large population from the analysis of sample data drawn from that population. Working with a sample instead of the whole population makes the process cost-efficient and time-efficient.
• Predictive analysis: predicting future events based on past and present data. It uses machine learning tools, data mining, big data, predictive modeling, artificial intelligence, and simulations.
• Prescriptive analysis: finding the best possible course of action based on the data. It supports efficient decision-making.
• Exploratory data analysis (EDA): studying data sets to highlight their main features, frequently with statistical graphics and data visualization methods.
• Causal analysis: focusing on cause and effect, that is, on why something happened. Based on data analysis, it is used to understand why something did not work out and the reasons behind failures in business and professional activities.

Sample/CLT:

A sample is a subset of members selected from the whole population list.


• Population size = N
• Population mean: μ = (ΣX)/N
• Population variance: σ² = Σ(Xᵢ − μ)² / N
• Sample size = n
• Sample mean: x̄ = (Σx)/n
• Sample variance: S² = Σ(xᵢ − x̄)² / (n − 1)
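The only difference between the two variance formulas is the divisor: N for the population, n − 1 for a sample (the n − 1 divisor makes S² an unbiased estimate of σ²). The sketch below implements both directly from the formulas and checks them against the standard library, where `statistics.pvariance` divides by N and `statistics.variance` divides by n − 1. The data values are hypothetical.

```python
import statistics


def population_variance(xs):
    # σ² = Σ(Xᵢ − μ)² / N
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    # S² = Σ(xᵢ − x̄)² / (n − 1)
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical values: mean 5, Σ(x − mean)² = 32

print(population_variance(data))   # 4.0  (32 / 8)
print(sample_variance(data))       # 32 / 7, about 4.571
# The standard library agrees with both formulas.
print(statistics.pvariance(data), statistics.variance(data))
```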

Sampling Distribution:

A sampling distribution is the probability distribution of a statistic, such as the sample mean, obtained from many samples drawn from the same population.

The mean of the sampling distribution is denoted by μₓ¯:

μₓ¯ = (Sum of all the sample means)/(Total number of samples)

The standard deviation of the sampling distribution (the standard error) = σ/√n, where σ is the population standard deviation and n is the sample size.
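To make the mean of the sampling distribution and the standard error concrete, the sketch below takes a tiny hypothetical population, lists every possible ordered sample of size n = 2 drawn with replacement, and computes the mean and standard deviation of all the sample means. Sampling with replacement is an assumption made here so that σ/√n holds exactly; for sampling without replacement from a small population, a finite-population correction factor √((N − n)/(N − 1)) would also be needed.

```python
import itertools
import math
import statistics

population = [2, 4, 6, 8]   # hypothetical small population
n = 2                       # sample size

# Every ordered sample of size n drawn WITH replacement (4**2 = 16 samples).
samples = list(itertools.product(population, repeat=n))
sample_means = [statistics.mean(s) for s in samples]

mu = statistics.mean(population)        # population mean
sigma = statistics.pstdev(population)   # population standard deviation

mu_xbar = sum(sample_means) / len(sample_means)   # mean of the sampling distribution
se = statistics.pstdev(sample_means)              # SD of the sampling distribution

print(mu, mu_xbar)                  # 5 5.0
print(se, sigma / math.sqrt(n))     # both equal √2.5
```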

The simplest way to draw a sample is simple random sampling, where we have a complete list of the elements of the population and select elements at random, so that every element of the population has the same probability of being included in the sample.

Another option is stratified sampling. Here we know something about the population: it consists of several homogeneous groups (strata), each of which should be represented in the sample.
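Both sampling methods can be sketched with Python's `random` module. The population of 100 IDs and the department strata below are hypothetical, and the stratified version assumes proportional allocation (the same fraction from every stratum, at least one element each); other allocation rules are possible.

```python
import random


def simple_random_sample(population, n, rng):
    # Each element has the same chance of being picked (sampling without replacement).
    return rng.sample(population, n)


def stratified_sample(strata, fraction, rng):
    # strata: dict mapping a stratum name to its list of elements.
    # Proportional allocation: take the same fraction from every stratum,
    # so that each homogeneous group is represented in the sample.
    sample = {}
    for name, members in strata.items():
        k = max(1, round(len(members) * fraction))
        sample[name] = rng.sample(members, k)
    return sample


rng = random.Random(42)                 # fixed seed for a repeatable example
population = list(range(1, 101))        # hypothetical population of 100 IDs
srs = simple_random_sample(population, 10, rng)

# Hypothetical strata: student IDs grouped by department.
strata = {
    "CSE": list(range(1, 61)),
    "ECE": list(range(61, 91)),
    "MECH": list(range(91, 101)),
}
strat = stratified_sample(strata, 0.1, rng)

print(len(srs), {name: len(members) for name, members in strat.items()})
# 10 {'CSE': 6, 'ECE': 3, 'MECH': 1}
```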

Central Limit Theorem:

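The Central Limit Theorem (CLT) states that if random samples of size n are drawn from any population with mean μ and finite standard deviation σ, the sampling distribution of the sample mean has the following properties, whatever the shape of the population distribution:

1. Sampling distribution mean (μₓ¯) = Population mean (μ)

2. Sampling distribution's standard deviation (standard error) = σ/√n ≈ S/√n, where the sample standard deviation S is used when σ is unknown.

3. As n grows, the sampling distribution approaches a normal distribution; a common rule of thumb is that it is approximately normal for n > 30.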
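A quick simulation illustrates the three properties. The sketch below assumes a strongly skewed population (an exponential distribution with rate 1, so μ = 1 and σ = 1), draws 5000 samples of size n = 40, and compares the sample means with the CLT predictions. The sample size, number of samples and seed are arbitrary choices for illustration; `statistics.fmean` needs Python 3.8 or later.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 40                   # sample size (above the n > 30 rule of thumb)
num_samples = 5000       # number of samples drawn

# Skewed population: exponential with rate 1, so μ = 1 and σ = 1.
mu, sigma = 1.0, 1.0

sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)   # property 1: close to μ
se_observed = statistics.stdev(sample_means)     # property 2: close to σ/√n
se_theory = sigma / math.sqrt(n)

# Property 3: for a normal distribution about 68% of values lie within 1 SD of the mean.
within_1se = sum(abs(m - mu) <= se_theory for m in sample_means) / num_samples

print(round(mean_of_means, 3), round(se_observed, 3), round(se_theory, 3), round(within_1se, 3))
```

The printed values should come out close to 1.0, 0.158, 0.158 and 0.68, even though the exponential population itself is far from bell-shaped.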


Normal distribution:
• The standard normal distribution is the special case with mean 0 and standard deviation 1; any normal value x can be converted to it with z = (x − μ)/σ.
• It forms a symmetric, bell-shaped curve.
• Most of the data lies close to the mean, and values become less frequent the further they are from the mean.
• Its two parameters are the mean (μ), which sets the centre, and the standard deviation (σ), which sets the spread.
• The mean, median and mode of a normal distribution are equal.
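These properties can be explored with `statistics.NormalDist` from the Python standard library (Python 3.8 or later assumed). The height distribution below is a hypothetical example; `within_k_sd` is an illustrative helper that uses the cumulative distribution function to measure how much probability lies within k standard deviations of the mean.

```python
from statistics import NormalDist


def within_k_sd(dist, k):
    # Probability that a value lies within k standard deviations of the mean.
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)


standard = NormalDist()                  # standard normal: mean 0, SD 1
heights = NormalDist(mu=170, sigma=10)   # hypothetical normal data (cm)

# For a normal distribution, mean, median and mode coincide.
print(heights.mean, heights.median, heights.mode)   # 170.0 170.0 170.0

# Most of the probability is concentrated near the mean.
for k in (1, 2, 3):
    print(k, round(within_k_sd(standard, k), 4))     # 0.6827, 0.9545, 0.9973

# Converting to the standard normal: z = (x − μ)/σ.
x = 185
z = (x - heights.mean) / heights.stdev               # 1.5
print(z, heights.cdf(x), standard.cdf(z))            # the two probabilities agree
```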

Basic machine learning algorithms: Refer ML

Linear regression: Refer ML

SVM: Refer ML

Naive Bayes: Refer ML
