Section 7

This document provides an overview of NumPy and Pandas, two popular Python libraries for working with data. NumPy is introduced as a library for working with arrays and numerical data that aims to provide fast array operations. Key features of NumPy like ndarrays, array access, operations, and random number generation are demonstrated. Pandas is then introduced as a library built on NumPy for working with structured and labeled data. The basics of pandas Series and DataFrame objects are covered, including creation, accessing data, and common operations.

Uploaded by

emadelkhashab1234


Big Data

Section 7

By: Yosra Attaher
Agenda

● NumPy
● Pandas
● Talk about the project
What is NumPy?
● NumPy is a Python library used for working with arrays.
● It also has functions for working in the domain of linear algebra, Fourier transforms, and matrices.
● NumPy was created in 2005 by Travis Oliphant. It is an open-source project, and you can use it freely.
● NumPy stands for Numerical Python.
Why Use NumPy?
● In Python we have lists that serve the purpose of arrays, but they are slow to process.
● NumPy aims to provide an array object that can be up to 50x faster than traditional Python lists for numerical work.
● The array object in NumPy is called ndarray; it provides many supporting functions that make working with ndarrays easy.
● Arrays are used very frequently in data science, where speed and resources are very important.
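A minimal sketch of what "supporting functions" means in practice: arithmetic on an ndarray is applied element by element in one vectorized call, while the same operator on a plain list does something different. It deliberately does not time anything, since the actual speedup depends on the machine and the operation.

```python
import numpy as np

values = [1, 2, 3, 4]
arr = np.array(values)

# On a list, * 2 repeats the list; on an ndarray it doubles every element.
list_times_two = values * 2
array_times_two = arr * 2

# The list equivalent of the vectorized operation needs an explicit loop.
looped = [v * 2 for v in values]

# Vectorized operations also combine arrays of the same shape element-wise.
squares_sum = arr + arr ** 2

print(list_times_two)   # [1, 2, 3, 4, 1, 2, 3, 4]
print(array_times_two)  # [2 4 6 8]
print(looped)           # [2, 4, 6, 8]
print(squares_sum)      # [ 2  6 12 20]
```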
NumPy, Arrays
import numpy as np
# create NumPy ndarray objects with the array() function
a = np.array(45)                                  # 0-D array (a single value)
b = np.array([1, 2, 3])                           # 1-D array from a list
c = np.array((1, 2, 3))                           # 1-D array from a tuple
d = np.array([[1, 2, 3], [3, 4, 5], [2, 3, 4]])   # 2-D array
e = np.array((1, 2, 3), ndmin=5)                  # forced to have 5 dimensions
print(a.ndim, a)   # 0 45
print(a)           # 45
Access Arrays
print(b.ndim, b)                      # 1 [1 2 3]
print(b[0], b[2], sep=', ')           # 1, 3
print(c.ndim, c)                      # 1 [1 2 3]
print(c[0], c[1], c[-1], sep=', ')    # 1, 2, 3  (index -1 is the last element)
print(d.ndim, d)                      # 2 [[1 2 3] [3 4 5] [2 3 4]]
print(d[0, 0], d[1, 2], sep=', ')     # 1, 5  (index is [row, column])
print(e.ndim, e)                      # 5 [[[[[1 2 3]]]]]
print(e[0, 0, 0, 0, 2])               # 3
Access Arrays
x = np.array([1,2,3,4,5,6,7,8,9,10])
print(x[1:5]) # [2 3 4 5]
print(x[4:]) # [ 5 6 7 8 9 10]
print(x[:4]) # [1 2 3 4]
print(x[-3:-1]) # [8 9]
print(x[1:5:2]) # [2 4]
print(x[::2]) # [1 3 5 7 9]
Slice
x = np.array([[1, 2, 3, 5], [3, 4, 5, 9], [2, 3, 4, 12], [4, 2, 3, 1]])
print(x[1, 1:4])     # row 1, columns 1-3: [4 5 9]
print("--------------------------")
print(x[0:2, 2:4])   # rows 0-1, columns 2-3: [[3 5] [5 9]]
print("--------------------------")
print(x[1:3, 1:3])   # rows 1-2, columns 1-2: [[4 5] [3 4]]
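The same `[rows, columns]` slicing also selects whole columns, uses negative indices, and takes steps. A short sketch on the slide's 4x4 array:

```python
import numpy as np

x = np.array([[1, 2, 3, 5], [3, 4, 5, 9], [2, 3, 4, 12], [4, 2, 3, 1]])

first_column = x[:, 0]     # all rows, column 0
last_row = x[-1, :]        # last row, all columns
every_other_row = x[::2]   # rows 0 and 2 (step of 2)
corner = x[-2:, -2:]       # bottom-right 2x2 block

print(first_column)        # [1 3 2 4]
print(last_row)            # [4 2 3 1]
print(every_other_row)     # rows [1 2 3 5] and [2 3 4 12]
print(corner)              # rows [4 12] and [3 1]
```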
Conversion
x = np.array([1.2, 2.4, 3.5], 'f8')         # 'f8' = 64-bit float
print(x.dtype); print(x)                    # float64  [1.2 2.4 3.5]
y = x.astype('i4')                          # new array of 32-bit ints; fractions are truncated
print(y.dtype); print(y)                    # int32  [1 2 3]
x = np.array([1.2, 2.4, 3.5], 'i4')         # convert while creating the array
print(x.dtype); print(x)                    # int32  [1 2 3]
x = np.array([1.2, 2.4, 3.5], 'i1')         # 'i1' = 8-bit int
print(x.dtype); print(x)                    # int8  [1 2 3]
x = np.array(["1.2", "2.4", "3.5"], 'f8')   # numeric strings are parsed as floats
print(x.dtype); print(x)                    # float64  [1.2 2.4 3.5]
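Note that `astype` to an integer type drops the fractional part (toward zero) rather than rounding. If rounding is wanted, a minimal sketch is to call `np.round` first; keep in mind NumPy rounds exact halves to the nearest even number.

```python
import numpy as np

x = np.array([-1.7, -0.5, 0.5, 1.7, 2.5])

truncated = x.astype('i4')            # fractional part dropped, toward zero
rounded = np.round(x).astype('i4')    # round first (halves go to the even neighbour)

print(truncated)  # [-1  0  0  1  2]
print(rounded)    # [-2  0  0  2  2]
```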
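Copy and View
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = x.copy()            # copy: owns its own data
x[1] = 12
print(y)                # [ 1  2  3  4  5  6  7  8  9 10]  (the copy is unaffected)
y = x.view()            # view: shares x's data
x[1] = 12
print(y)                # [ 1 12  3  4  5  6  7  8  9 10]  (the change to x is visible)
y[1] = 15
print("---------------")
print(x)                # [ 1 15  3  4  5  6  7  8  9 10]  (the change to y is visible in x)
print(x.base, y.base)   # None [ 1 15  3  4  5  6  7  8  9 10]  (x owns its data; y.base is x)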
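The `.base` check above generalizes into a small test for "does this array own its memory?". In the sketch below, `owns_data` is a hypothetical helper name; it also shows that basic slicing returns a view, so writing through a slice changes the original array, and uses `np.shares_memory` as a second check.

```python
import numpy as np

def owns_data(arr):
    """Hypothetical helper: True if arr owns its memory, False if it is a view."""
    return arr.base is None

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
c = x.copy()
v = x.view()
s = x[2:5]          # basic slicing also returns a view

print(owns_data(x), owns_data(c), owns_data(v), owns_data(s))  # True True False False

s[0] = 99           # writing through the slice changes x as well
print(x)            # [ 1  2 99  4  5  6  7  8  9 10]
print(np.shares_memory(x, v), np.shares_memory(x, c))  # True False
```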
Reshape
x = np.array([[1, 2, 3, 5], [3, 4, 5, 9], [2, 3, 4, 12],
              [4, 2, 3, 1]])
print("--------------------")
print(x.shape)         # (4, 4)
y = x.reshape(2, 8)
print(y)               # [[ 1  2  3  5  3  4  5  9] [ 2  3  4 12  4  2  3  1]]
print("--------------------")
y = x.reshape(-1)      # -1 flattens the array to 1-D
print(y)               # [ 1  2  3  5  3  4  5  9  2  3  4 12  4  2  3  1]
print("--------------------")
print(y.base)          # prints the original 4x4 array x, because reshape returned a view
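`-1` can stand in for any one dimension, and NumPy infers it from the total number of elements; a shape whose element count does not match raises `ValueError`. A minimal sketch:

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

a = x.reshape(2, -1)      # NumPy infers the missing dimension: (2, 8)
b = x.reshape(-1, 2)      # (8, 2)
flat = x.reshape(-1)      # 1-D, 16 elements

failed = False
try:
    x.reshape(3, 5)       # 3 * 5 = 15 elements, but x has 16
except ValueError as err:
    failed = True
    print("reshape failed:", err)

print(a.shape, b.shape, flat.shape)  # (2, 8) (8, 2) (16,)
```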
Join
x = np.array([[1, 1, 1], [2, 2, 2]])
y = np.array([[3, 3, 3], [4, 4, 4]])
z = np.concatenate((x, y))           # default axis=0: stack the rows
print(z.shape, z)                    # (4, 3) [[1 1 1] [2 2 2] [3 3 3] [4 4 4]]
print("--------------")
z = np.concatenate((x, y), axis=0)   # same as the default
print(z.shape, z)                    # (4, 3) [[1 1 1] [2 2 2] [3 3 3] [4 4 4]]
print("--------------")
z = np.concatenate((x, y), axis=1)   # join side by side (along the columns)
print(z.shape, z)                    # (2, 6) [[1 1 1 3 3 3] [2 2 2 4 4 4]]
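NumPy also offers shorthand functions for these joins. For 2-D inputs, `np.vstack` matches `concatenate(..., axis=0)` and `np.hstack` matches `axis=1`, while `np.stack` is different: it adds a new axis instead of extending an existing one. A quick sketch on the same arrays:

```python
import numpy as np

x = np.array([[1, 1, 1], [2, 2, 2]])
y = np.array([[3, 3, 3], [4, 4, 4]])

v = np.vstack((x, y))   # rows stacked: same result as concatenate(axis=0) here
h = np.hstack((x, y))   # side by side: same result as concatenate(axis=1) here
s = np.stack((x, y))    # new leading axis: shape (2, 2, 3)

print(v.shape, h.shape, s.shape)  # (4, 3) (2, 6) (2, 2, 3)
```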
Search, Sort, Filter
#search
x = np.array([11, 31, 87, 19, 23, 43])
y = np.where(x == 19); print(y)   # (array([3]),)  19 is at index 3
#sort
x = np.array([11, 31, 87, 19, 23, 43])
y = np.sort(x); print(y)          # [11 19 23 31 43 87]  (a sorted copy)
#filter
x = np.array([11, 31, 87, 19, 23, 43])
s = [True, False, False, False, True, True]
y = x[s]; print(y)                # [11 23 43]  keeps the elements where s is True
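In practice the boolean filter is usually built from a condition rather than typed by hand. This sketch reuses the slide's array and also shows `np.argsort` (indices that would sort the array) and `np.searchsorted` (insertion point in a sorted array):

```python
import numpy as np

x = np.array([11, 31, 87, 19, 23, 43])

mask = x > 20                          # element-wise comparison -> boolean array
big = x[mask]                          # keep elements where mask is True
over_40 = np.where(x > 40)[0]          # indices of elements greater than 40
order = np.argsort(x)                  # indices that would sort x
pos = np.searchsorted(np.sort(x), 30)  # insertion point for 30 in the sorted array

print(mask)     # [False  True  True False  True  True]
print(big)      # [31 87 23 43]
print(over_40)  # [2 5]
print(order)    # [0 3 4 1 5 2]
print(pos)      # 3
```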
NUMPY, RANDOM
import numpy as np
from numpy import random
#basics (values change on every run; the numbers below come from one sample run)
x = random.rand(); print(x)                      # float in [0, 1), e.g. 0.7537900893332695
x = random.randint(100); print(x)                # int in [0, 100), e.g. 86
x = random.randint(100, size=5); print(x)        # 5 ints, e.g. [30 79 10 14 94]
x = random.randint(100, size=(5, 3)); print(x)   # 5x3 array of ints, e.g. [[ 7 91 46] [ 0 65 56] ...]
x = random.rand(5); print(x)                     # 5 floats, e.g. [0.89308242 0.11235977 0.57879863 0.63562923 0.68296079]
x = random.rand(5, 3); print(x)                  # 5x3 array of floats in [0, 1)
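Because these values change on every run, examples are easier to check with a seeded generator. The sketch below uses NumPy's `Generator` API (`np.random.default_rng`) rather than the legacy functions on the slide; `integers` and `random` play the roles of `randint` and `rand`. The seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce identical streams.
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)

a = rng1.integers(100, size=5)   # ints in [0, 100), like randint(100, size=5)
b = rng2.integers(100, size=5)   # same seed, so same values as a

f = rng1.random((5, 3))          # floats in [0.0, 1.0), like rand(5, 3)

print(a, b)
print(f.shape)                   # (5, 3)
```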

Random Choice
x = random.choice([5, 3, 7, 8]); print(x)                # one element, e.g. 5
x = random.choice([5, 3, 7, 8], size=10); print(x)       # e.g. [7 3 3 5 5 7 5 5 8 5]
x = random.choice([5, 3, 7, 8], size=(2, 3)); print(x)   # e.g. [[8 8 3] [7 8 3]]
# p gives each element's probability (must sum to 1); 8 has p=0.0, so it never appears
x = random.choice([5, 3, 7, 8], p=[0.1, 0.3, 0.6, 0.0], size=10)
print(x)                                                 # e.g. [7 7 7 7 3 3 7 7 7 7]
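`choice` can also sample without replacement, and counting many weighted draws makes the effect of `p` visible. A sketch with the `Generator` API (its `choice` method takes the same `size`, `replace`, and `p` arguments); the seed 0 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# replace=False draws each element at most once (size must not exceed the population).
hand = rng.choice([5, 3, 7, 8], size=3, replace=False)

# With these probabilities, 7 should be the most frequent and 8 should never appear.
draws = rng.choice([5, 3, 7, 8], p=[0.1, 0.3, 0.6, 0.0], size=1000)
values, counts = np.unique(draws, return_counts=True)

print(hand)
print(dict(zip(values.tolist(), counts.tolist())))
```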
Shuffle
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
o = x.copy()
random.shuffle(x)                # shuffles x in place and returns None
print('\n', o, '\n', x)          # [1 2 3 4 5 6 7 8], then e.g. [7 6 3 5 1 4 8 2]
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = random.permutation(x)        # returns a shuffled copy; x is unchanged
print('\n', x, '\n', y)          # [1 2 3 4 5 6 7 8], then e.g. [2 4 3 6 5 1 7 8]
Random Distribution
import numpy as np
from numpy import random
import matplotlib.pyplot as plt
# We can plot the Normal, Binomial, Poisson, Uniform, Logarithmic,
# Multinomial, Exponential, and Chi-Square distributions.
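As a sketch of where such plots start, the code below draws samples from several of the listed distributions with NumPy's `Generator` and prints each sample mean. The parameter values and the seed are arbitrary choices for illustration. The plotting step is left as a comment so the snippet runs without matplotlib.

```python
import numpy as np

rng = np.random.default_rng(1)

normal = rng.normal(loc=0.0, scale=1.0, size=10_000)
binomial = rng.binomial(n=10, p=0.5, size=10_000)
poisson = rng.poisson(lam=3.0, size=10_000)
uniform = rng.uniform(low=0.0, high=1.0, size=10_000)
exponential = rng.exponential(scale=2.0, size=10_000)
chisquare = rng.chisquare(df=2, size=10_000)

# Sample means should be close to the theoretical means: 0, 5, 3, 0.5, 2, 2.
for name, sample in [("normal", normal), ("binomial", binomial),
                     ("poisson", poisson), ("uniform", uniform),
                     ("exponential", exponential), ("chi-square", chisquare)]:
    print(f"{name:12s} mean={sample.mean():.3f}")

# To plot one of them (requires matplotlib):
# import matplotlib.pyplot as plt
# plt.hist(normal, bins=50); plt.show()
```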
What is Pandas?
● Pandas is a Python library used for working with data sets.
● It has functions for analyzing, cleaning, exploring, and manipulating data.
● The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was created by Wes McKinney in 2008.
Why Use Pandas?
● Pandas allows us to analyze big data and draw conclusions based on statistical theories.
● Pandas can clean messy data sets and make them readable and relevant.
● Relevant data is very important in data science.
Series, Creation
import pandas as pd
import numpy as np
#creating series
s = pd.Series([22, 32, 31, 42, 51]); print(s)              # default index 0..4
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data); print(s)
s = pd.Series(data, index=[100, 101, 102, 103]); print(s)   # custom index labels
Series, Creation
data = {'a':100, 'b':120, 'c':99}
s = pd.Series(data); print(s)
data = {'c':99, 'a':100, 'b':120}
s = pd.Series(data, index=['a', 'b', 'c', 'd']);
print(s)
s = pd.Series(5, index=['a', 'b', 'c', 'd']);
print(s)
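When a dict is combined with an explicit `index`, pandas matches values by label: labels with no value (like `'d'` above) become `NaN`, which forces the dtype to float. A short sketch that checks this behavior:

```python
import pandas as pd

data = {'c': 99, 'a': 100, 'b': 120}

s1 = pd.Series(data)                               # index follows the dict's key order
s2 = pd.Series(data, index=['a', 'b', 'c', 'd'])   # matched by label; 'd' has no value
s3 = pd.Series(5, index=['a', 'b', 'c', 'd'])      # scalar repeated for every label

print(s1.index.tolist())   # ['c', 'a', 'b']
print(s2)                  # a 100.0, b 120.0, c 99.0, d NaN (dtype float64)
print(s3.tolist())         # [5, 5, 5, 5]
```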
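Series, Accessing
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
# position-based access: use .iloc (plain s[0] on a labeled index is deprecated in pandas 2.x)
print(s.iloc[0])           # 1
print(s.iloc[1:3])         # b 2, c 3
print(s.iloc[:3])          # a 1, b 2, c 3
print(s.iloc[1:])          # b 2 ... e 5
print(s[:])                # the whole Series
print(s.iloc[-1])          # 5
print(s.iloc[-3:-1])       # c 3, d 4
# label-based access
print(s['a'])              # 1
print(s[['a', 'c', 'e']])  # a 1, c 3, e 5
print(s.iloc[[2, 4]])      # c 3, e 5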
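One detail worth checking when mixing the two styles: a position slice with `.iloc` excludes its end point, while a label slice with `.loc` includes it. A minimal sketch on the same Series:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

by_position = s.iloc[1:3]   # positions 1 and 2 -> b, c (end excluded)
by_label = s.loc['b':'d']   # labels b through d -> b, c, d (end included)
last = s.iloc[-1]           # last element by position
picked = s.loc[['a', 'e']]  # a list of labels

print(by_position.tolist(), by_label.tolist(), last, picked.tolist())
# [2, 3] [2, 3, 4] 5 [1, 5]
```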
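Series, Basic Functions
calories = {'day1': 200, 'day2': 380, 'day3': 480, 'day4': 290}
s = pd.Series(calories); print(s.axes)   # a list holding the index: day1 ... day4
print(s.empty)     # False
print(s.ndim)      # 1
print(s.size)      # 4
print(s.values)    # [200 380 480 290]
print(s.head(2))   # first 2 entries
print(s.tail(2))   # last 2 entries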
DataFrame, Creation
data = [12, 12, 13, 14, 15]
df = pd.DataFrame(data); print(df)                     # one column, named 0
df = pd.DataFrame(data, columns=['Temperature']); print(df)
df = pd.DataFrame(data, columns=['Temperature'], dtype=float); print(df)
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age']); print(df)
DataFrame, Creation
data = {
"calories": [200, 380, 480, 290],
"duration": [50, 40, 45, 30]
}
df = pd.DataFrame(data); print(df)
df = pd.DataFrame(data, index=['sat', 'sun', 'mon', 'tue']); print(df)
# keys missing from a dict become NaN: 'history' in row 0, 'physics' in row 1
df = pd.DataFrame([{'math': 88, 'physics': 90}, {'history': 75, 'math': 94}]); print(df)
DataFrame, Creation
data = {
'calories': pd.Series([200, 380, 480, 290], index=['sat', 'sun', 'mon', 'tue']),
'duration': pd.Series([50, 40, 45, 30], index=['sat', 'sun', 'mon', 'tue'])
}
df = pd.DataFrame(data); print(df)
# columns= selects and orders the columns; missing values become NaN
df = pd.DataFrame([{'math': 88, 'physics': 90}, {'art': 65, 'math': 94}],
                  index=['midterm', 'final'],
                  columns=['physics', 'math', 'art']); print(df)
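To confirm where the gaps land in the midterm/final example, a sketch that builds the same frame and counts missing cells per column:

```python
import pandas as pd

records = [{'math': 88, 'physics': 90}, {'art': 65, 'math': 94}]
df = pd.DataFrame(records, index=['midterm', 'final'],
                  columns=['physics', 'math', 'art'])

print(df)
print(df.isna().sum())        # missing cells per column: physics 1, math 0, art 1
print(df['math'].tolist())    # [88, 94]
```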
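DataFrame, Basic Functions
data = {
    'Name': pd.Series(['Tom', 'James', 'Steve', 'Smith', 'Jack']),
    'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
df = pd.DataFrame(data)   # Name has only 5 values, so rows 5 and 6 get NaN in that column
print(df)
print(df.T)         # transpose: rows become columns
print(df.axes)      # [row index, column index]
print(df.dtypes)    # Name object, Age int64, Rating float64
print(df.empty)     # False
print(df.ndim)      # 2
print(df.shape)     # (7, 3)
print(df.size)      # 21
print(df.values)    # the data as a NumPy array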
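The shape of this frame follows from index alignment: the three Series are aligned on the union of their indexes (0 to 6), so the shorter `Name` Series is padded with `NaN`. A sketch that checks the attributes listed above:

```python
import pandas as pd

data = {
    'Name': pd.Series(['Tom', 'James', 'Steve', 'Smith', 'Jack']),
    'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8]),
}
df = pd.DataFrame(data)

print(df.shape)                 # (7, 3): index union 0..6, three columns
print(df.size)                  # 21 = 7 rows * 3 columns
print(df['Name'].isna().sum())  # 2: rows 5 and 6 have no name
print(df.T.shape)               # (3, 7): the transpose swaps rows and columns
```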
DataFrame, Files
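df = pd.read_csv('data/data.csv')
print(df)
df = pd.read_json('data/data.json')
print(df)
print(df.head())      # first 5 rows
print(df.head(10))    # first 10 rows
print(df.tail())      # last 5 rows
print(df.tail(6))     # last 6 rows
df.info()             # info() prints its summary itself; print(df.info()) would also print None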
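The data files themselves are not part of these slides, so here is a self-contained sketch that feeds `read_csv` a small in-memory CSV through `io.StringIO`. The CSV content is hypothetical, reusing the calories/duration example from the DataFrame slides.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file such as 'data/data.csv'.
csv_text = """day,calories,duration
sat,200,50
sun,380,40
mon,480,45
tue,290,30
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))   # first 2 rows
print(df.tail(1))   # last row
df.info()           # column names, non-null counts, and dtypes
```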
DataFrame, Cleaning
df = pd.read_csv('data/wdata.csv')
print(df)
print(df.loc[[22, 26, 7, 11, 12, 18, 28]])   # inspect specific rows by index label
df.info()
DataFrame, Cleaning
dfcopy = df.dropna()                 # returns a new DataFrame without the rows that contain NaN
dfcopy.info()
df.dropna(inplace=True)              # modifies df itself
df.info()
df = pd.read_csv('data/wdata.csv')   # reload the original data
print(df.loc[[22, 26, 7, 11, 12, 18, 28]])
df.fillna(130, inplace=True)         # replace every NaN with 130
df.info()
print(df.loc[[22, 26, 7, 11, 12, 18, 28]])
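Since `data/wdata.csv` is not included here, the sketch below uses a small hypothetical table with gaps to show what `dropna` and `fillna` do. It also fills a single column with that column's mean using `assign`, which returns a new frame and so avoids the chained-assignment warnings that `df['col'].fillna(..., inplace=True)` can trigger in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap in each column.
df = pd.DataFrame({
    'Duration': [60, 45, np.nan, 60],
    'Calories': [409.1, np.nan, 340.0, 300.0],
})

dropped = df.dropna()      # new DataFrame; rows 1 and 2 contain NaN and are dropped
filled = df.fillna(130)    # every NaN replaced with 130
col_filled = df.assign(Calories=df['Calories'].fillna(df['Calories'].mean()))

print(len(df), len(dropped))   # 4 2  (df itself is unchanged)
print(filled)
print(col_filled)              # Calories gap filled with the column mean (about 349.7)
```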
DataFrame, Cleaning
df = pd.read_csv('data/wdata.csv')
df.dropna(subset=['Date'], inplace = True)
df.info()
df = pd.read_csv('data/wdata.csv')
print(df.duplicated())
df.drop_duplicates(inplace=True)
print(df.duplicated())
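A self-contained sketch of the last two cleaning steps on a tiny hypothetical table: `dropna(subset=['Date'])` only looks at the `Date` column, and `duplicated()` marks every repeat of an earlier row, which `drop_duplicates()` then removes (keeping the first occurrence by default).

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 1, and row 3 has no date.
df = pd.DataFrame({
    'Date': ['2020/12/01', '2020/12/02', '2020/12/02', None],
    'Calories': [409.1, 479.0, 479.0, 282.4],
})

print(df.duplicated().tolist())   # [False, False, True, False]

no_missing_date = df.dropna(subset=['Date'])   # only missing values in 'Date' count
deduped = df.drop_duplicates()                 # keeps the first occurrence

print(len(no_missing_date), len(deduped))      # 3 3
```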
Thanks
