Data Analysis

This document provides an overview of data analysis in Python, focusing on file handling and statistical calculations using libraries like NumPy and Pandas. It covers reading and writing files, as well as computing statistical parameters such as mean, median, variance, and standard deviation. Additionally, it includes practical examples and Python code snippets for performing these tasks.

Data analysis in Python

Dr. Santosh Prasad Gupta

Assistant Professor
Department of Physics
Patna University, Patna

6/27/2021 Department of Physics, PU: SP Gupta 1

In this document, we learn about:

• File handling: reading and writing files, along with other file handling options.

• Statistics of data: mean, median, variance, standard deviation and other parameters, using NumPy and pandas
    - Raw data written directly in the script
    - Raw data imported from a file

• Statistics of data: mean and standard deviation using NumPy and pandas, along with data visualization
    - Imported frequency data
    - Imported grouped frequency data


Creating and handling a file using Python
File handling is an important concept for any programmer. It is used to create, delete and
move files, and to store application data, user configurations, videos, images, etc. Python
supports file handling: it lets users read and write files and offers many other options for
operating on them. The mode passed to open() decides how the file is opened.

Read Only ('r'): Open the file for reading. The file must already exist.

Write Only ('w'): Open the file for writing. For an existing file, the data is truncated
and overwritten.
Write and Read ('w+'): Open the file for reading and writing. For an existing file,
the data is truncated and overwritten.
Append Only ('a'): Open the file for writing. The data being written is inserted at
the end, after the existing data.
Append and Read ('a+'): Open the file for reading and writing. The data being written
is inserted at the end, after the existing data.


Python code
# Script for creating a file and writing to it
data = open("D:\\PWC\\data_analysis\\test.txt", "w")
data.write("ram \t shyam \n 1 \t 2 \n 3 \t 4")
data.close()
# Reading the file after writing
data = open("D:\\PWC\\data_analysis\\test.txt", "r")
print(data.read())
data.close()

Output
ram      shyam
 1       2
 3       4

Python code
# Script for creating a file and writing to it
data = open("D:\\PWC\\data_analysis\\test.txt", "w")
data.write("ram \t shyam \n 1 \t 2 \n 3 \t 4")
data.close()
# Opening the same file in append mode
data = open("D:\\PWC\\data_analysis\\test.txt", "a")
data.write("""\n Hello! I have added….
\n this is one way of
\n multi-line writing""")
data.close()
# Reading the file after writing and appending
data = open("D:\\PWC\\data_analysis\\test.txt", "r")
print(data.read())
data.close()

Output (blank lines omitted)
ram      shyam
 1       2
 3       4
 Hello! I have added….
 this is one way of
 multi-line writing

Statistics of data using NumPy
Mean or Average: An average is a number expressing the central or typical value in a set of
data. It can refer to the mode, the median or (most commonly) the mean, which is calculated
by dividing the sum of the values in the set by their number. The basic formula for the
mean of n numbers x_1, x_2, ..., x_n is

$$x_{\text{mean}} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Use: np.average

# Python program to get the average of a list
# Importing the NumPy module
import numpy as np
# Taking a list of elements
values = [2, 4, 4, 4, 5, 5, 7, 9]
# Calculating the average using average()
print('mean is:', np.average(values))

Output
mean is: 5.0

Median: The median is the value that separates the higher half of a data sample or
probability distribution from the lower half. For an odd number of elements, the median
is the middle one. For an even number of elements, the median is the mean of the
two middle elements.

Use: np.median

# Python program to get the median of a list
# Importing the NumPy module
import numpy as np
# Taking a list of elements
values = [2, 4, 4, 4, 5, 5, 7, 9]
# Calculating the median using median()
print('median is:', np.median(values))

Output
median is: 4.5
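
The two definitions can also be applied by hand and compared with NumPy. The following is a small sketch for the same list used in the slides; the variable names (mean_by_hand, median_by_hand, ordered) are illustrative and not from the original.

```python
import numpy as np

values = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean from the definition: sum of the values divided by their number.
mean_by_hand = sum(values) / len(values)

# Median from the definition: sort, then take the middle element
# (odd count) or the mean of the two middle elements (even count).
ordered = sorted(values)
n = len(ordered)
if n % 2 == 1:
    median_by_hand = ordered[n // 2]
else:
    median_by_hand = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(mean_by_hand, np.mean(values))      # 5.0 5.0
print(median_by_hand, np.median(values))  # 4.5 4.5
```

Here the list has eight elements, so the median is the mean of the fourth and fifth sorted values, 4 and 5.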

Variance
Variance is the mean of the squared differences between each value and the mean.
The mathematical formula for the variance is as follows,

$$\text{variance} = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_{\text{mean}} - x_i)^2$$

Use: np.var

# Python program to get the variance of a list
# Importing the NumPy module
import numpy as np
# Taking a list of elements
values = [212, 231, 234, 564, 235]
# Calculating the variance using var()
print('variance is:', np.var(values))

Output
variance is: 18133.359999999997

Standard Deviation
Standard deviation is the square root of the variance. It is a measure of the extent to which
the data varies from the mean. The mathematical formula for calculating the standard deviation
is as follows,

$$\text{standard deviation} = \sigma = \sqrt{\text{variance}}$$

Use: np.std

# Python program to get the standard deviation of a list
import numpy as np
# Taking a list of elements
values = [2, 4, 4, 4, 5, 5, 7, 9]
# Calculating the standard deviation using std()
print('std. dev. is:', np.std(values))

Output
std. dev. is: 2.0
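
As a check on the formula, the variance can be built up term by term and compared with np.var and np.std. This is a sketch using the list from the variance example; the intermediate names (mean, squared_dev) are illustrative.

```python
import numpy as np

values = np.array([212, 231, 234, 564, 235])

mean = values.sum() / values.size            # x_mean
squared_dev = (values - mean) ** 2           # squared difference for each value
variance = squared_dev.sum() / values.size   # divide the sum by N
std_dev = np.sqrt(variance)                  # sigma = sqrt(variance)

print(variance, np.var(values))
print(std_dev, np.std(values))
```

Both lines should print matching pairs (up to floating-point rounding), since np.var and np.std divide by N by default.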

Statistics of data using pandas and NumPy
We learn how to import a data file and then how to calculate statistical parameters
such as the mean, median, variance and standard deviation.
Multicolumn data
Suppose we have multicolumn (y1, y2) data saved at some location on your laptop. Our
objective is to calculate the mean, median, variance and standard deviation of each
column of the data, and the same parameters along each row.

Here, I have saved the data at D:\PWC\data_analysis\set1.csv, that is, a file
named set1 with the csv extension. Let us first display the data.

Python script for displaying the file
# Reading and printing the file
data1 = open("D:\\PWC\\data_analysis\\set1.csv", "r")
print(data1.read())
data1.close()

Output
y1,y2
0,100
10,200
20,400
30,700
40,1200
50,1500
60,1800
70,2000
80,2200

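
Readers without the file on disk can reproduce the same data in memory. The sketch below assumes the file contents are exactly the lines displayed above and feeds them to pd.read_csv through io.StringIO; the name csv_text is illustrative.

```python
import io
import pandas as pd

# Contents of set1.csv as displayed in the slides, kept in memory.
csv_text = """y1,y2
0,100
10,200
20,400
30,700
40,1200
50,1500
60,1800
70,2000
80,2200
"""

# read_csv accepts any file-like object, not only a path.
data1 = pd.read_csv(io.StringIO(csv_text))

print(data1.shape)        # (9, 2)
print(data1["y2"].sum())  # 10100
```

The resulting DataFrame can be used in place of the one loaded from D:\PWC\data_analysis\set1.csv.
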
Displaying some information about the data using pandas

Python script
import pandas as pd

# Load the data by importing the data file
data1 = pd.read_csv("D:\\PWC\\data_analysis\\set1.csv")
# Print some information about the data
print(data1.shape)
print(data1.info())
# Print the first few lines of the data
print(data1.head())

Output
(9, 2)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   y1      9 non-null      int64
 1   y2      9 non-null      int64
dtypes: int64(2)
memory usage: 272.0 bytes
None
   y1    y2
0   0   100
1  10   200
2  20   400
3  30   700
4  40  1200

data1.info() gives information about the DataFrame and data1.head() gives the first five
lines of the data. The "None" in the output appears because info() prints its report itself
and returns None, which print() then displays.

Statistics of data using pandas and NumPy: mean
We learn how to import a data file and then how to calculate statistical
parameters such as the mean, variance and standard deviation.

import pandas as pd

# Load the data by importing the data file
data1 = pd.read_csv("D:\\PWC\\data_analysis\\set1.csv")
# Calculating the mean of y1 and y2 column-wise using mean()
print('mean of y1.:', data1.loc[:, 'y1'].mean())
print('mean of y2.:', data1.loc[:, 'y2'].mean())

# Calculating the mean of y1 and y2 row-wise
print('row-wise mean\n', data1.mean(axis=1)[0:7])

Output
mean of y1.: 40.0
mean of y2.: 1122.22222
row-wise mean
0     50.0
1    105.0
2    210.0
3    365.0
4    620.0
5    775.0
6    930.0
dtype: float64

The mean of each row is calculated by specifying the axis=1 argument. The slice [0:7]
keeps only the first seven rows of the result.
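
The difference between the two axes is easiest to see on a few rows. The sketch below types in the first three rows of the data directly rather than reading set1.csv; the names df, col_means and row_means are illustrative.

```python
import pandas as pd

# First three rows of set1.csv from the slides, typed in directly.
df = pd.DataFrame({"y1": [0, 10, 20], "y2": [100, 200, 400]})

col_means = df.mean(axis=0)   # one value per column (index: y1, y2)
row_means = df.mean(axis=1)   # one value per row (index: 0, 1, 2)

print(col_means)
print(row_means)
```

With axis=0 (the default) the result is indexed by column name; with axis=1 it is indexed by row label, which is why the row-wise output can be sliced by position.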

Statistics of data using pandas and NumPy: median, variance and standard deviation using the
methods median, var, std

import pandas as pd

# Load the data by importing the data file
data1 = pd.read_csv("D:\\PWC\\data_analysis\\set1.csv")

# Calculating the median of y1 and y2 column-wise using median()
print('median of y1.:', data1.loc[:, 'y1'].median())
print('median of y2.:', data1.loc[:, 'y2'].median())
# Calculating the variance of y1 and y2 column-wise using var()
print('variance of y1:', data1.loc[:, 'y1'].var())
print('variance of y2:', data1.loc[:, 'y2'].var())
# Calculating the std. dev. of y1 and y2 column-wise using std()
print('std. dev. of y1:', data1.loc[:, 'y1'].std())
print('std. dev. of y2:', data1.loc[:, 'y2'].std())

# Calculating the median of y1 and y2 row-wise, first two rows
print('row-wise median:\n', data1.median(axis=1)[0:2])
# Calculating the variance of y1 and y2 row-wise, first two rows
print('row-wise variance:\n', data1.var(axis=1)[0:2])
# Calculating the std. dev. of y1 and y2 row-wise, first two rows
print('row-wise std. dev.:\n', data1.std(axis=1)[0:2])

Output
median of y1.: 40.0
median of y2.: 1200.0
variance of y1: 750.0
variance of y2: 641944.4444444445
std. dev. of y1: 27.386127875258307
std. dev. of y2: 801.2143561147943
row-wise median:
0     50.0
1    105.0
dtype: float64
row-wise variance:
0     5000.0
1    18050.0
dtype: float64
row-wise std. dev.:
0     70.710678
1    134.350288
dtype: float64
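
Note that pandas and NumPy use different defaults: pandas var() and std() divide by N - 1 (ddof=1, the sample estimate), while np.var and np.std divide by N (ddof=0). The sketch below illustrates this for the y1 column, typed in directly; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

y1 = pd.Series([0, 10, 20, 30, 40, 50, 60, 70, 80], name="y1")

sample_var = y1.var()              # pandas default: ddof=1, divides by N - 1
population_var = y1.var(ddof=0)    # divides by N
numpy_var = np.var(y1.to_numpy())  # NumPy default: ddof=0, divides by N

print(sample_var)       # 750.0, as in the pandas output of the slides
print(population_var)
print(numpy_var)
```

The last two values agree with each other but not with 750.0, so the ddof argument should be set explicitly whenever results from the two libraries are compared.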

Statistics of data using pandas and NumPy: all values together using describe

import pandas as pd

# Load the data by importing the data file
data1 = pd.read_csv("D:\\PWC\\data_analysis\\set1.csv")
# Calculating the important statistical parameters at once
print(data1.describe())

Output
              y1           y2
count   9.000000     9.000000
mean   40.000000  1122.222222
std    27.386128   801.214356
min     0.000000   100.000000
25%    20.000000   400.000000
50%    40.000000  1200.000000
75%    60.000000  1800.000000
max    80.000000  2200.000000
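
The summary returned by describe() is itself a DataFrame, so single entries can be read from it by label. The sketch below assumes the same nine rows typed in directly; it also uses agg() to collect the variance, which describe() does not report.

```python
import pandas as pd

data1 = pd.DataFrame({
    "y1": [0, 10, 20, 30, 40, 50, 60, 70, 80],
    "y2": [100, 200, 400, 700, 1200, 1500, 1800, 2000, 2200],
})

summary = data1.describe()

# Rows of the summary are addressed by label with .loc.
print(summary.loc["mean", "y2"])
# The 50% row is the median of each column.
print(summary.loc["50%", "y1"], data1["y1"].median())  # 40.0 40.0

# agg() collects a chosen list of statistics in one table.
chosen = data1.agg(["median", "var"])
print(chosen)
```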

Calculation of mean and standard deviation of data
Suppose we have X-ray diffraction data: the variation of intensity with angle, as shown in the
table below. We want to calculate its mean and standard deviation.

Theta (T) (in degrees)    Intensity (I) (in counts)
20                        1
30                        5
40                        10
50                        15
60                        11
70                        9
80                        2

To calculate the mean and standard deviation, we follow these steps.
• First calculate (theta × intensity), that is, TI
• Calculate the mean: $T_{\text{mean}} = T_m = \frac{\sum T I}{\sum I}$
• Calculate $T - T_m$ and then $(T - T_m)^2 I$
• Calculate the standard deviation: $\sigma = \sqrt{\frac{\sum (T - T_m)^2 I}{\sum I}}$
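
The steps above amount to a mean and a standard deviation weighted by intensity. As a compact sketch (an alternative to a column-by-column calculation, not the method of the slides), NumPy's np.average with its weights argument can compute both; the array names T and I follow the table.

```python
import numpy as np

T = np.array([20, 30, 40, 50, 60, 70, 80])   # theta in degrees
I = np.array([1, 5, 10, 15, 11, 9, 2])       # intensity in counts

# Weighted mean: sum(T * I) / sum(I)
Tm = np.average(T, weights=I)

# Weighted standard deviation: sqrt(sum((T - Tm)^2 * I) / sum(I))
sigma = np.sqrt(np.average((T - Tm) ** 2, weights=I))

print(round(Tm, 2), round(sigma, 2))   # 52.26 13.82
```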

Table for calculating the various terms

T     I     TI     T - Tm     (T - Tm)^2 I
20    1     20     -32.26     1040.7076
30    5     150    -22.26     2477.538
40    10    400    -12.26     1503.076
50    15    750    -2.26      76.614
60    11    660    7.74       658.9836
70    9     630    17.74      2832.3684
80    2     160    27.74      1539.0152

$\sum I = 53$, $\sum TI = 2770$, $\sum (T - T_m)^2 I = 10128.3028$

$$T_{\text{mean}} = T_m = \frac{\sum TI}{\sum I} = \frac{2770}{53} \approx 52.26$$

$$\sigma = \sqrt{\frac{\sum (T - T_m)^2 I}{\sum I}} = \sqrt{\frac{10128.3028}{53}} = \sqrt{191.10} \approx 13.82$$

Calculation of mean and standard deviation of the same data using Python
Let us first visualize the data.

Python script for displaying the data file
# Reading and printing the file
da2 = open("D:\\PWC\\data_analysis\\set2.csv", "r")
print(da2.read())
da2.close()

Output
T,I
20,1
30,5
40,10
50,15
60,11
70,9
80,2

Displaying the data file using a bar plot
import pandas as pd
import matplotlib.pyplot as plt

# Load the data by importing the data file
da2 = pd.read_csv("D:\\PWC\\data_analysis\\set2.csv")
# Creating the bar plot
plt.bar(da2['T'], da2['I'], color='orange', width=1)
# Labeling the X and Y axes
plt.xlabel("Theta (in degrees)")
plt.ylabel("Intensity (in counts)")
plt.title("Variation of Intensity (I) with Theta (T)")
plt.show()

Calculation of the different components shown in the previous calculation table, and
calculation of the mean and standard deviation, using Python

import pandas as pd
from math import sqrt

# Load the data by importing the data file
da2 = pd.read_csv("D:\\PWC\\data_analysis\\set2.csv")
# Multiplication of theta (T) and intensity (I): TI
da2['TI'] = da2['T'] * da2['I']
# Mean Tm: sum of da2['TI'] / sum of da2['I'], and printing
Tm = da2['TI'].sum() / da2['I'].sum()
print('mean (Tm) is:', Tm)
# Calculation of T-Tm and printing
da2['T-Tm'] = da2['T'] - Tm
print('The column T-Tm is:\n', da2['T-Tm'])
# Calculation of (T-Tm)^2I and printing
da2['(T-Tm)^2I'] = (da2['T-Tm']) * (da2['T-Tm']) * da2['I']
print('The column (T-Tm)^2I is:\n', da2['(T-Tm)^2I'])
# Sigma: sqrt(sum of (T-Tm)^2I / sum of da2['I']), and printing
sigma = sqrt(da2['(T-Tm)^2I'].sum() / da2['I'].sum())
print('Standard deviation (sigma) is:', sigma)

Output
mean (Tm) is: 52.2642
The column T-Tm is:
0   -32.264151
1   -22.264151
2   -12.264151
3    -2.264151
4     7.735849
5    17.735849
6    27.735849
Name: T-Tm, dtype: float64
The column (T-Tm)^2I is:
0    1040.975436
1    2478.462086
2    1504.093984
3      76.895692
4     658.276967
5    2831.043076
6    1538.554646
Name: (T-Tm)^2I, dtype: float64
Standard deviation (sigma) is: 13.8238

Calculation of standard deviation for grouped data

Problem: In a class of students, 9 students scored 50 to 60, 7 students scored 61 to 70,
9 students scored 71 to 85, 12 students scored 86 to 95 and 8 students scored 96 to 100
in mathematics. Estimate the standard deviation.
Solution: The variation of the number of students with their score is summarized in the
following table.

Score (M)    No. of students (S)
50-60        9
61-70        7
71-85        9
86-95        12
96-100       8

We will estimate the standard deviation by using the following steps.

Step 1: Find the mid-point (Md) of each group or range of the score.
Step 2: Calculate the number of samples in the data set by summing the number of students (sum of S).
Step 3: Find the mean of the grouped data (Mm): multiply each group mid-point by its number of
students, add the products, and divide by the number of samples.
Step 4: Calculate the variance of the frequency table: multiply each squared deviation (Md - Mm)^2
by its number of students, add the products, and divide by the number of samples.
Step 5: Estimate the standard deviation of the frequency table by taking the square root of the variance:

$$\sigma = \sqrt{\frac{\sum (M_d - M_m)^2 S}{\sum S}}$$

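
The steps above can be sketched in plain Python before moving to pandas. This is an illustrative sketch, not the script of the slides: the list groups and the other variable names are assumptions, and only the score ranges and student counts come from the problem statement.

```python
from math import sqrt

# Score ranges and student counts from the problem statement.
groups = [((50, 60), 9), ((61, 70), 7), ((71, 85), 9), ((86, 95), 12), ((96, 100), 8)]

# Mid-point of each range.
mid_points = [(low + high) / 2 for (low, high), _ in groups]
counts = [count for _, count in groups]

# Number of samples: sum of the student counts.
n = sum(counts)

# Mean of the grouped data: sum(Md * S) / sum(S).
mean = sum(m * s for m, s in zip(mid_points, counts)) / n

# Variance: sum((Md - Mm)^2 * S) / sum(S); sigma is its square root.
variance = sum((m - mean) ** 2 * s for m, s in zip(mid_points, counts)) / n
sigma = sqrt(variance)

print(n, round(mean, 2), round(sigma, 2))   # 45 78.34 15.58
```

Because every score in a range is replaced by the range mid-point, the result is an estimate of the standard deviation rather than its exact value.
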
Table for calculating the various terms

M         S     Md      Md S      Md - Mm    (Md - Mm)^2 S
50-60     9     55.0    495.0     -23.34     4904.67
61-70     7     65.5    458.5     -12.84     1154.86
71-85     9     78.0    702.0     -0.34      1.07
86-95     12    90.5    1086.0    12.16      1773.09
96-100    8     98.0    784.0     19.66      3090.73

$\sum S = 45$, $\sum M_d S = 3525.5$, $\sum (M_d - M_m)^2 S = 10924.41$

$$M_m = \frac{\sum M_d S}{\sum S} = \frac{3525.5}{45} \approx 78.34$$

$$\sigma = \sqrt{\frac{\sum (M_d - M_m)^2 S}{\sum S}} = \sqrt{\frac{10924.41}{45}} = \sqrt{242.76} \approx 15.58$$

Calculation of mean and standard deviation of the same data using Python
Let us first visualize the data.

Python script for displaying the data file
# Reading and printing the file
da3 = open("D:\\PWC\\data_analysis\\set3.csv", "r")
print(da3.read())
da3.close()

Output
M,S
50-60,9
61-70,7
71-85,9
86-95,12
96-100,8

Displaying the data file using a scatter plot
import pandas as pd
import matplotlib.pyplot as plt

# Load the data by importing the data file
da3 = pd.read_csv("D:\\PWC\\data_analysis\\set3.csv")
# Creating the bar plot (a bar plot can be used as well)
# plt.bar(da3['M'], da3['S'], color='blue', width=1)
# Creating the scatter plot
plt.scatter(da3['M'], da3['S'], color='blue')
# Labeling the X and Y axes
plt.xlabel("Score obtained by student")
plt.ylabel("Number of students")
plt.title("Variation of number of students with their score")
plt.show()

Calculation of the different components shown in the previous calculation table, and
calculation of the mean and standard deviation, using Python

import pandas as pd
from math import sqrt

# Load the data by importing the data file
da3 = pd.read_csv("D:\\PWC\\data_analysis\\set3.csv")
# Creating the column of range mid-points (lower limit L, upper limit U) and printing
da3[['L', 'U']] = da3['M'].str.split('-', expand=True)
da3['Md'] = (da3['L'].astype(float) + da3['U'].astype(float)) / 2
print('The column Md is:\n', da3['Md'])
# Multiplication of mid-point (Md) and students (S): Md S
da3['MdS'] = da3['Md'] * da3['S']
print('The column MdS is:\n', da3['MdS'])
# Mean Mm: sum of da3['MdS'] / sum of da3['S'], and printing
Mm = da3['MdS'].sum() / da3['S'].sum()
print('mean (Mm) is:', Mm)
# Calculation of Md-Mm and printing
da3['Md-Mm'] = da3['Md'] - Mm
print('The column Md-Mm is:\n', da3['Md-Mm'])
# Calculation of (Md-Mm)^2S and printing
da3['(Md-Mm)^2S'] = (da3['Md-Mm']) * (da3['Md-Mm']) * da3['S']
print('The column (Md-Mm)^2S is:\n', da3['(Md-Mm)^2S'])
# Sigma: sqrt(sum of (Md-Mm)^2S / sum of da3['S']), and printing
sigma = sqrt(da3['(Md-Mm)^2S'].sum() / da3['S'].sum())
print('Standard deviation (sigma) is:', sigma)

Output
The column Md is:
0    55.0
1    65.5
2    78.0
3    90.5
4    98.0
Name: Md, dtype: float64
The column MdS is:
0     495.0
1     458.5
2     702.0
3    1086.0
4     784.0
Name: MdS, dtype: float64
mean (Mm) is: 78.3444
The column Md-Mm is:
0   -23.344444
1   -12.844444
2    -0.344444
3    12.155556
4    19.655556
Name: Md-Mm, dtype: float64
The column (Md-Mm)^2S is:
0    4904.667778
1    1154.858272
2       1.067778
3    1773.090370
4    3090.726914
Name: (Md-Mm)^2S, dtype: float64
Standard deviation (sigma) is: 15.5809

Assignments
Question 1: The marks obtained by M.Sc. students are given below:
40, 65, 45, 50, 80, 55, 76, 72, 62, 82, 59, 51, 61. Find the mean, median, variance and
standard deviation of this distribution.

Question 2: Estimate the standard deviation for the single-slit pattern data given
below.

Theta (in degrees)    Intensity (in counts)
-50                   1
-30                   5
-10                   10
0                     15
10                    11
30                    9
50                    2

Question 3: In a village, 200 people are in the age group 20 to 30 years, 300 people
are in the age group 31 to 40, 600 people are in the age group 41 to 60, and only 100
people are in the age group 61 to 90. Estimate the standard deviation.
